Why calculate PI?
The most pristine purpose of the computer is to cope with the complicated computations
which human couldn't achieve. Among these computations is calculation of PI. Although
there is actually no much sense to know so many digits of PI, this's a challenge for a
computer programmer, and it's interesting. Further more, simple as it seems to be,
calculation of PI relates to many useful mathematical knowledges.
The first algorithm: series expansion of arctan
 PI/4 = 4 arctan(1/5)  arctan(1/239) (1)
 arctan(x) = x  x^{3}/3 + x^{5}/5  x^{7}/7 + ....
(2)
It is easy to understand that to get an ultrahigh precision value of PI real should be
store in the computer in the form of array which's size is in proportion to the significant
digits. In this algorithm, the significant digits of PI, n, increase linear with the number of
terms summed in (2). To calculate each term in
(2), a loop to perform an ultrahigh precision real division with
a small integer is needed. The loop times are in proportion to n. Thus, the overall time
complexity of this algorithm is O(n^{2}).
The virtue of the algorithm is simplicity, and only integer operation is encountered.
Below is my PI calculation program. In this program, I have taken some improvement on the speed:
Real are stored in the form of array with 64bits integer (long long) as its element type.
Each element stores 12 decimal digits. Estimates the number of 0 in the head and the tail of
x^{k} (x = 1/5, 1/239) and only calculates the none0 part.
 pi.cpp
C++ source code, compile with 'g++ pi.cpp o pi O2' under Linux
 pi.s
modified assemble code generated by g++, faster. compile with 'g++ pi.s o pi' under Linux
Besides, there are other formulas like (1). But they are
less useful. For example:
 PI/4 = arctan(1/2) + arctan(1/3)
 PI/4 = 8 arctan(1/10)  arctan(1/239)  4 arctan(1/515)
The second algorithm: series of 1/PI

1/PI = (sqrt(8) / 9801) sum_{k=0~inf}
{ [(4k)! (1103 + 26390k)] / [(k!)^{4} 396^{4k}] }
(Ramanujan)

1/PI = (sqrt(10005) / 4270934400) sum_{k=0~inf}
{ [(6k)! (13591409 + 545140134k)] / [(k!)^{3} (3k)! (640320)^{3k}] }
(Chudnovsky)
These two series above (there are other similar formulas, but less useful) are more
complicated than the Taylor series of arctan. Although they are also linear convergence
and their time complexity are also O(n^{2}), but their convergence are more faster.
(Ramanujan) adds 8 significant digits per term,
(Chudnovsky) adds 14.
In this algorithm, besides the division of ultrahigh precision real with small integer,
it also needs to find the square root and inverse of an ultrahigh precision real. This involve
FFT(Fast Fourier Transform), which will be described below.
The third algorithm: ArithmeticGeometric Mean and iteration
ArithmeticGeometric Mean (AGM) M(a, b) is defined as:

a_{0} = a, b_{0} = b
a_{k} = (a_{k1} + b_{k1}) / 2,
b_{k} = sqrt(a_{k1} b_{k1})
M(a, b) = lim_{k>inf} a_{k} = lim_{k>inf} b_{k}
Then, from a series of elliptic integral theories (which I don't understand), we get
the formula below:

a_{0} = 1, b_{0} = 1 / sqrt(2)
1/PI = { 1  sum_{k=0~inf}
[2^{k} (a_{k}^{2}  b_{k}^{2})] }
/ 2M(a_{0}, b_{0})^{2}
(AGM)
From the formula above, a proper algorithm could be created. During the iteration,
significant digits increase at order 2, that is, significant digits increase by a factor of 2
at each iteration. In this algorithm, the multiplication, division and square root of
ultrahigh precision real need FFT, which will be described below. Thinking about the time
complexity of FFT, the overall time complexity of this algorithm is O(n log(n)^{2}).
Besides (AGM), there are other iterations which have the same
time complexity. For instance, the series below converge to 1/PI at order 4:

y_{0} = sqrt(2)  1, a_{0} = 6  4 sqrt(2)
y_{k} = [1  sqrt(sqrt(1  y_{k1}^{4}))]
/ [1 + sqrt(sqrt(1  y_{k1}^{4}))],
a_{k} = (1 + y_{k})^{4} a_{k1}
 2^{2k+1} y_{k} (1 + y_{k} + y_{k}^{2})
1/PI = lim_{k>inf} a_{k} (Borwein)
FFT
According to above, multiplication, division and square root of ultrahigh precision real
(multidigits number in array form) are inevitable in the second and third algorithm.
Computed with normal method, the time complexity of multiplication of ndigits will reach
O(n^{2}). FFT will help to decrease the computation.
Suppose a[k] and b[k] (k=0~n1) are array in type complex. The forward and backward
Discrete Fourier Transform (DFT) are defined as: (i = sqrt(1))
 b = FFT_{forward}(a) :
b[k] = sum_{j=0~n1} ( a[j] e^{i*j*2PI*k/n} )
(3)
 b = FFT_{backward}(a) :
b[k] = (1/n) sum_{j=0~n1} ( a[j] e^{i*j*2PI*k/n} )
(4)
The (1/n) in (3) and (4) could be
placed in either formula, or in both formulas as (1/sqrt(n)). This ensures that it will not
differ with a factor when transform forward and back.
When n's prime factors are all small integers, especially when n is a power of 2,
reducing redundant computation with proper algorithm, the time complexities of DFT
(3) and (4) will decrease to O(n log(n)).
This is socalled Fast Fourier Transform (FFT). Please refer to other articles for more details.
Below is my FFT program for reference. Besides, there are some developed FFT libraries,
eg. FFTW, which could be use directly.
A multiplication of two numbers which have n_{1} and n_{2} digits could be
done like this: Allocate two arraies in size n(n>=n_{1}+n_{2},
2^{m} is the best). Fill the two arraies with the two numbers (lowerdigit first,
fill higherdigits with 0). Perform forward Fourier transform with each array. Multiply the
corresponding terms into one array before transform it back. Since FFT is done in the complex
field, digits in the final array should be rounded and carried to get the final result.
The precision of the Fourier transform should be paid attention. We know that real is stored
in computer in the form of single or double precision number which would cause error.
In the ndigits multiplication, n is usually very large. Error caused by the precision of real
will accumulate during the process of Fourier transform. Will the total error exceed the
permitted range? My opinion is that double precision number should be used in FFT. Considering
the disipline of statistics, the total error caused by precision won't be too large to cause
mistake. A method to check the correctness of the computation is to calculate the residue.
For example, dividing the numbers with 17, if the residues of the multiplicand and the
multiplicator are 8 and 6, the residue of the product should be 14.
From Newton iteration method, inverse and square root of multidigits number could be
calculated with a series of multiplications. For example, to calculate x = 1/a, surpose
f(x) = 1/x  a, we get the following iteration series:

x_{0} ~= 1/a
x_{k} = x_{k1}  f(x_{k1})/f'(x_{k1})
= 2x_{k1}  ax_{k1}^{2} (5)
To calculate x = sqrt(a), we should calculate x = 1 / sqrt(a) first. Suppose
f(x) = 1/x^{2}  a, we get:

x_{0} ~= 1 / sqrt(a)
x_{k} = x_{k1}  f(x_{k1})/f'(x_{k1})
= (3/2)x_{k1}  (1/2)ax_{k1}^{3} (6)
(5) and (6) converge to result at
order 2. There are other complicated iteration series which converge to result at higher order.
Demonstration
To give a demonstration to the whole course of calculating PI using AGM and FFT, here is
my new written PI program. I am sorry to say that my program is about 100 times slower
and consumes much more memory than other programs that you could find on the internet.
I don't know why at present. :( Anyway, this is my first PI program using AGM and FFT.
Congratulations! :)
Summarize
Three kinds of PI calculation algorithm are introduced above. The first algorithm is
the slowest, and is obsolete. The next two algorithm are more faster and most used. Although
the time complexity of iteration algorithm is smaller than that of 1/PI series algorithm,
the former is complicated to implement and seems to be not more faster than the latter.
Sorry for my poor English. Any discussion about this article or my English is welcome.
(2004.12.31) 