Numerical algorithms I: basic methods

This and subsequent chapters document the numerical algorithms used in Yacas for exact integer calculations as well as for multiple precision floating-point calculations. We give brief but self-contained descriptions of the non-trivial algorithms and estimates of their computational cost. Most of the algorithms were taken from referenced literature; the remaining algorithms were developed by us.


Adaptive plotting

The problem of generating plots of functions generally does not require calculations in high precision. However, the adaptive algorithm described below is able to approximate the function arbitrarily well (except for some particularly ill-behaved examples).

The adaptive plotting routine Plot2D'adaptive uses a simple algorithm to select the optimal grid to approximate a function f(x). The same algorithm for adaptive grid refinement could be used for numerical integration. The idea is that plotting and numerical integration require the same kind of detailed knowledge about the behavior of the function.

The algorithm first splits the interval into a specified initial number of equal subintervals, and then repeatedly splits each subinterval in half until the function is well enough approximated by the resulting grid. The integer parameter depth gives the maximum number of binary splittings for a given initial interval; thus, at most 2^depth additional grid points will be generated. The function Plot2D'adaptive should return a list of pairs of points {{x1,y1}, {x2,y2}, ...} to be used directly for plotting.

The adaptive bisection algorithm works like this:

This algorithm works well if the initial number of points and the depth parameter are large enough.

Singularities in the function are handled by step 3. Namely, the algorithm checks whether the function returns a non-number (e.g. Infinity) and, if so, the sign change is always considered to be "too rapid". Thus, the intervals immediately adjacent to the singularity will be plotted at the highest allowed refinement level. When plotting the resulting data, the singular points are simply not written to the data file, so the plotting programs have no trouble with them.
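
The following minimal sketch (in Python, not Yacas) illustrates the general idea of depth-limited adaptive bisection. The refinement criterion here (comparing the midpoint value with linear interpolation, using a hypothetical tolerance parameter tol) is only a stand-in; the actual Plot2D'adaptive uses the Newton-Cotes quadratures described below.

import math

def adaptive_plot(f, a, b, depth=5, tol=1e-3):
    # Return a list of (x, f(x)) pairs on [a, b]; each subinterval is bisected
    # until the midpoint value agrees with linear interpolation to within tol,
    # or until the maximum depth is reached.  Singular points are dropped.
    def value(x):
        try:
            y = f(x)
            return y if math.isfinite(y) else None
        except (OverflowError, ValueError, ZeroDivisionError):
            return None                      # treat non-numbers as singularities
    def refine(x1, y1, x2, y2, d):
        xm = 0.5 * (x1 + x2)
        ym = value(xm)
        smooth = (None not in (y1, y2, ym)
                  and abs(ym - 0.5 * (y1 + y2)) <= tol * (1.0 + abs(ym)))
        if d == 0 or smooth:
            return [(xm, ym)]
        return (refine(x1, y1, xm, ym, d - 1) + [(xm, ym)]
                + refine(xm, ym, x2, y2, d - 1))
    ya, yb = value(a), value(b)
    points = [(a, ya)] + refine(a, ya, b, yb, depth) + [(b, yb)]
    return [(x, y) for (x, y) in points if y is not None]

For example, adaptive_plot(lambda x: math.sin(1/x), 0.01, 1.0) should concentrate its grid points near the left end of the interval, where the function oscillates rapidly.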

The adaptive plotting algorithm uses Newton-Cotes quadratures to estimate the integral of the function over subintervals of the grid. The meaning of the Newton-Cotes quadrature coefficients is that an integral is approximated as

Integrate(x,a[0],a[n])f(x)<=>h*Sum(k,0,n,c[k]*f(a[k])),

where h:=a[1]-a[0] is the grid step, a[k] are the grid points, and c[k] are the quadrature coefficients. These coefficients c[k] are independent of the function f(x) and can be precomputed in advance for a given grid a[k] (not necessarily a grid with constant step h=a[k]-a[k-1]). The Newton-Cotes coefficients c[k] for grids with a constant step h can be found, for example, by solving a system of equations,

Sum(k,0,n,c[k]*k^p)=n^(p+1)/(p+1)

for p=0, 1, ..., n. This system of equations means that the quadrature correctly approximates the integrals of the n+1 functions f(x)=x^p, p=0, 1, ..., n, over the interval (0, n).

The solution of this system always exists and gives the quadrature coefficients as rational numbers. For example, with n=2 one obtains the Simpson quadrature c[0]=1/3, c[1]=4/3, c[2]=1/3 (these are the familiar Simpson weights 1/6, 2/3, 1/6 rescaled to the grid step h rather than to the length of the whole interval).

In the same way it is possible to find quadratures for the integral over a subinterval rather than over the whole interval of x. In the current implementation of the adaptive plotting algorithm, two quadratures are used: the 3-point quadrature (n=2) and the 4-point quadrature (n=3) for the integral over the first subinterval, Integrate(x,a[0],a[1])f(x). Their coefficients are (5/12, 2/3, -1/12) and (3/8, 19/24, -5/24, 1/24).
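
For illustration, the system above can be solved exactly over the rationals; here is a small sketch in Python (not the Yacas implementation) that reproduces the coefficients quoted above.

from fractions import Fraction

def newton_cotes(n, upper=None):
    # Coefficients c[0..n] for the grid points 0, 1, ..., n, defined by
    # Sum(k,0,n, c[k]*k^p) = upper^(p+1)/(p+1) for p = 0, ..., n;
    # upper=n gives the rule for the whole interval (0, n) and
    # upper=1 gives the rule for the first subinterval (0, 1).
    if upper is None:
        upper = n
    m = [[Fraction(k) ** p for k in range(n + 1)]
         + [Fraction(upper) ** (p + 1) / (p + 1)] for p in range(n + 1)]
    # Gauss-Jordan elimination in exact rational arithmetic; the matrix is a
    # transposed Vandermonde matrix with distinct nodes, so no pivoting is needed
    for i in range(n + 1):
        m[i] = [v / m[i][i] for v in m[i]]
        for j in range(n + 1):
            if j != i:
                factor = m[j][i]
                m[j] = [vj - factor * vi for vj, vi in zip(m[j], m[i])]
    return [row[n + 1] for row in m]

# newton_cotes(2)    gives 1/3, 4/3, 1/3  (Simpson's rule in units of the step h)
# newton_cotes(2, 1) gives 5/12, 2/3, -1/12
# newton_cotes(3, 1) gives 3/8, 19/24, -5/24, 1/24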


Cost of arbitrary-precision computations

A computer algebra system absolutely needs to be able to perform computations with very large integer numbers. Without this capability, many symbolic computations (such as exact GCD of polynomials or exact solution of polynomial equations) would be impossible.

A different question is whether a CAS really needs to be able to evaluate, say, 10,000 digits of the value of a Bessel function of some complex argument. It is almost certain that no applied problem in the natural sciences would need floating-point computations of special functions with such a high precision. However, arbitrary-precision computations are certainly useful in some mathematical applications; e.g. some numerical identities can be first guessed by a floating-point computation with many digits and then proved (by hand).

Very high precision computations of special functions might be useful in the future. It is however quite clear that computations with moderately high precision (50 or 100 digits) are useful for applied problems. Implementing an efficient algorithm that computes 100 digits of Sin(3/7) already involves many of the issues that would also be relevant for a 10,000 digit computation. Modern algorithms allow evaluations of all elementary functions in time that is asymptotically logarithmic in the number of digits P and linear in the cost of long multiplication (usually denoted M(P)). All special functions can be evaluated in time that is linear in P and in M(P).

Therefore Yacas strives to implement all of its numerical functions in arbitrary precision. All integer or rational functions return exact results, and all floating-point functions return their value with P correct decimal digits. The current value of P is accessed as GetPrecision() and may be changed by Precision(...).

Implementing an arbitrary-precision floating-point computation of a function f(x), such as f(x)=Exp(x), typically needs the following:

Selecting the algorithm for a computation is the most non-trivial part of the implementation. We want to achieve arbitrarily high precision, so we need to find either a series, or a continued fraction, or a sequence given by an explicit formula, that converges to the function in a controlled way. It is not enough to use a table of precomputed values or an approximation formula that has a limited precision. In recent years (1980--2000), interest in arbitrary-precision computations grew and many efficient algorithms for elementary and special functions were published.

In most cases it is imperative to know in advance how many iterations are needed for given x, P. This knowledge allows us to estimate the computational cost, in terms of the required precision P and of the cost of long multiplication M(P). Typically all operations will fall into the following categories (sorted by increasing cost):

Some algorithms (e.g. Taylor series summation) also need extra storage space, typically of order P*Ln(P) (i.e. O(Ln(P)) temporary P-digit numbers).

Here is a worked-out example of how we could estimate the required number of terms in the power series

Exp(x)=1+x+x^2/2! +...+x^n/n! +O(x^(n+1)),

if we need P digits of precision and x is a "small enough" number, say Abs(x)<1. (A similar calculation can be done for any other bound on x.) Suppose we truncate the series after the n-th term and the series converges "well enough" after that term. The error will be approximately equal to the first term we dropped (this is what we really mean by "converges well enough" and this will generally be the case in all applications, because we will not use series that do not converge well enough). The term we dropped is x^(n+1)/(n+1)!. To estimate n! for large n, one can use the inequality

e^(e-1)*(n/e)^n<n! <(n/e)^(n+1)

(valid for all n>=47) which provides tight bounds for the growth of the factorial, or a weaker inequality which is easier to use,

(n/e)^n<n! <((n+1)/e)^(n+1)

(valid for all n>=6). The latter inequality is quite enough for most purposes. If we use the upper bound on n! from this estimate, the term we dropped is bounded by

x^(n+1)/(n+1)! <(e/(n+2))^(n+2).

We need this to be smaller than 10^(-P). This leads to an inequality

(e/(n+2))^(n+2)<10^(-P),

which we now need to solve for n. The left hand side decreases with growing n. So it is clear that the inequality will hold for large enough n, say for n>=n0 where n0 is an unknown (integer) value. We can take a logarithm of both sides, replace n with n0 and obtain the following equation for n0:

(n0+2)*Ln((n0+2)/e)=P*Ln(10).

This equation cannot be solved exactly in terms of elementary functions; this is a typical situation in such estimates. However, we do not really need a precise solution; all we need is its integer part. It is also acceptable if our approximate n0 comes out a couple of units higher than necessary, because a couple of extra terms of the Taylor series will not significantly slow down the algorithm (but it is important that we do not underestimate n0). This is also a typical situation and it simplifies the analysis. Finally, we are mostly interested in having a good enough answer for large values of P. So let us assume that P is large (say, 100). We can try to guess what the result is: The largest term on the LHS grows as n0*Ln(n0) and it should be approximately equal to P*Ln(10); Ln(n0) grows very slowly, so this gives us a hint that n0 is proportional to P*Ln(10). As a first try, set n0=P*Ln(10)-2 and compare the RHS with the LHS; we see that we have overshot by a factor Ln(P)-1+Ln(Ln(10)), which is not a large factor. We can now compensate and divide n0 by this factor, so our second try is

n0=(P*Ln(10))/(Ln(P)-1+Ln(Ln(10)))-2.

(This approximation procedure is equivalent to solving the equation

x=(P*Ln(10))/(Ln(x)-1)

by iterations, starting from x=P*Ln(10).) If we substitute our second try for n0 into the equation, we shall find that we undershot a little bit (i.e. the LHS is a little smaller than the RHS), but our n0 is now smaller than it should be by a quantity that is smaller than 1 for large enough P. So we should stop at this point and simply add 1 to this approximate answer. We should also replace Ln(Ln(10))-1 by 0 for simplicity (this is safe because it will slightly increase n0.)

Our final result is that it is enough to take

n=(P*Ln(10))/Ln(P)-1

terms in the Taylor series to compute Exp(x) for Abs(x)<1 to P decimal digits. (Of course, if x is much smaller than 1, many fewer terms will suffice.)
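
In an implementation one can also avoid the asymptotic estimate altogether and determine the number of terms by a direct search, using the criterion that the first dropped term x^(n+1)/(n+1)! must be below 10^(-P). A Python sketch (assuming Abs(x)<1, so that it is enough to require (n+1)! > 10^P):

def exp_series_terms(P):
    # Smallest n such that 1/(n+1)! < 10^(-P); for Abs(x) < 1 this many terms
    # of the Taylor series give Exp(x) to P decimal digits.
    n, factorial = 0, 1              # factorial holds (n+1)!
    limit = 10 ** P
    while factorial <= limit:
        n += 1
        factorial *= n + 1
    return n

# exp_series_terms(100) returns 69, since 70! is the first factorial above 10^100.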

As the required precision P grows, an arbitrary-precision algorithm will need more iterations or more terms of the series. So the round-off error introduced by every floating-point operation will increase. When doing arbitrary-precision computations, we can always perform all calculations with a few more digits and compensate for round-off error. It is however imperative to know in advance how many more digits we need to take for our "working precision". We should also take that increase into account when estimating the total cost of the method. Of course, we should avoid very large roundoff errors; "repeat everything with twice as many digits" is not an adequate solution.

Here is a simple estimate of the round-off error in a computation of n terms of a power series. Suppose that adding each term requires two multiplications and one addition. If all calculations are performed with relative precision epsilon=10^(-P), then the accumulated relative round-off error is 3*n*epsilon. (This is an optimistic estimate that needs to be verified in each particular case; what if the series begins with some large terms but converges to a very small value?) If the relative error is 3*n*epsilon, it means that our answer is something like a*(1+3*n*epsilon) where a is the correct answer. We can see that out of the total P digits of this answer, only the first k decimal digits are correct, where k= -Ln(3*n*epsilon)/Ln(10). In other words, we have lost P-k digits because of accumulated roundoff error. So we found that we need P-k=Ln(3*n)/Ln(10) extra decimal digits to compensate for this round-off error.

In the previous exercise we found the number of terms n for Exp(x). So now we know how many extra digits of working precision we need for this particular case.

Below we shall have to perform similar estimates of the required number of terms and of the accumulated round-off error in our analysis of the algorithms.


Basic arbitrary-precision arithmetic

Currently, Yacas can be compiled to use either internal math (the yacasnumbers library) or the GNU multiple precision library gmp. The algorithms for basic arithmetic in the internal math mode are currently rather slow compared with gmp. If P is the number of digits of precision, then multiplication and division take M(P)=O(P^2) operations in the internal math. (Of course, multiplication and division by a short integer takes time linear in P.) Much faster algorithms for long multiplication (Karatsuba, Toom-Cook, FFT, Newton-Raphson division etc.) are implemented in gmp where the cost of multiplication is M(P)=O(P*Ln(P)) for very large precision. For the estimates of computation cost in this book we shall only assume that M(P) grows at least linearly with P (it may grow faster than linearly).

Warning: calculations with internal Yacas math using precision exceeding 10,000 digits are currently impractically slow.

In some algorithms it is necessary to compute the integer parts of expressions such as a*Ln(b)/Ln(10) or a*Ln(10)/Ln(2) where a, b are short integers of order O(P). Such expressions are frequently needed to estimate the number of terms in the Taylor series or similar parameters of the algorithms. In these cases, it is important that the result is not underestimated but it would be wasteful to compute Ln(10)/Ln(2) in floating point only to discard most of that information by taking the integer part of say 1000*Ln(10)/Ln(2). It is more efficient to approximate such constants from above by short rational numbers, for example, Ln(10)/Ln(2)<28738/8651 and Ln(2)<7050/10171. The error of such an approximation will be small enough for practical purposes. The function NearRational can be used to find optimal rational approximations. The function IntLog (see below) efficiently computes the integer part of a logarithm (for an integer base, not a natural logarithm). If more precision is desired in calculating Ln(a)/Ln(b) for integer a, b, one can compute IntLog(a^k,b) for some integer k and then divide by k.
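
For illustration, the integer part of a logarithm can be computed with integer arithmetic only; here is a simple-minded Python sketch (the actual IntLog presumably uses a more refined scheme):

def int_log(a, b):
    # Integer part of the base-b logarithm of a, for integers a >= 1 and b >= 2.
    result = 0
    power = b
    while power <= a:
        power *= b
        result += 1
    return result

# int_log(10**1000, 2) == 3321; the rational bound quoted above gives the same
# value here, since 1000*28738//8651 == 3321.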


How many digits of Sin(Exp(Exp(1000))) do we need?

Consider the problem of representing very large numbers such as x=Exp(Exp(1000)). Suppose we need a floating-point representation of the number x with P decimal digits of precision. In other words, we need to express

x<=>M*10^E,

where the mantissa 1<M<10 is a floating-point number and the exponent E is an integer, chosen so that the relative precision is 10^(-P). How much effort is needed to find M and E?

The exponent E is easier to obtain:

E=Floor(Ln(x)/Ln(10))=Floor(Exp(1000)/Ln(10)).

To compute Floor(y) exactly as an integer, we need to approximate y with at least Ln(y)/Ln(10) floating-point digits. In our example, we find that we need 434 decimal digits to represent E.

Once we have found E, we can write x=10^(E+m) where m=Exp(1000)/Ln(10)-E is a floating-point number, 0<m<1. Then M=10^m. To find M with P (decimal) digits, we also need to know m to at least P digits. Therefore, we actually need to evaluate Exp(1000)/Ln(10) with 434+P decimal digits before we can find P digits of the mantissa of x. In other words, we need a high-precision calculation even to find the first digit of x, because it is necessary to find the exponent E exactly as an integer.

What about finding Sin(x)? This is extremely difficult, because to find even the first digit of Sin(x) we have to evaluate x with absolute error at most 0.5. We know, however, that the number x has approximately 10^434 digits before the decimal point. Therefore, we would need to calculate x with at least that many digits which is clearly far beyond the capability of modern computers. It seems unlikely that even the sign of Sin(Exp(Exp(1000))) will be obtained in the near future, and even more unlikely that it would be of any use to anybody even if it could be obtained.

Suppose that N is the largest integer that our arbitrary-precision facility can reasonably handle. (For Yacas internal math library, N is about 10^10000.) Then we see that numbers x of order 10^N can be calculated with at most one (1) digit of floating-point precision, while larger numbers cannot be calculated with any precision at all.

It seems that very large numbers can be obtained in practice only through exponentiation or powers. It is unlikely that such numbers will arise from sums or products of reasonably-sized numbers in some formula.

A factorial function can produce rapidly growing results, but a factorial n! for large n is well approximated by Stirling's formula, which involves only powers and exponentials.
For example, suppose a program operates with numbers x of size N or smaller; a number such as 10^N can be obtained only by multiplying O(N) numbers x together. But since N is the largest representable number, it is certainly infeasible to perform O(N) operations on a computer (at least on a classical computer). However, it is feasible to obtain the N-th power of a number, since it requires only O(Ln(N)) operations.

If numbers larger than 10^N are created only by exponentiation operations, then special exponential notation could be used to represent them. For example, a very large number z could be stored and manipulated as an unevaluated exponential z=Exp(M*10^E) where M>=1 is a floating-point number with P digits of mantissa and E is an integer, 0<E<N. Let us call such objects "exponentially large numbers" or "exp-numbers" for short.

In practice, we should decide on a threshold value N and promote a number to an exp-number when its decimal exponent exceeds N.

Note that an exp-number z might be known to be positive or negative, e.g. z= -Exp(M*10^E).

Arithmetic operations can be applied to exp-numbers, although we should warn that exp-large arithmetic is of limited use because of an almost certain catastrophic loss of precision. The power and logarithm operations can be meaningfully performed on exp-numbers z. For example, if z=Exp(M*10^E) and p is a normal floating-point number, then z^p=Exp(p*M*10^E) and Ln(z)=M*10^E. We can also multiply or divide two exp-numbers. But it makes no sense to multiply an exp-number z by a normal number because we cannot represent the difference between z and say 2.52*z. Similarly, adding z to anything else would result in a total underflow, since we do not actually know a single digit of the decimal representation of z. So if z1 and z2 are exp-numbers, then z1+z2 is simply equal to either z1 or z2 depending on which of them is larger. Therefore, an exp-number z acts as an effective "infinity" compared with normal numbers. But they should not be used as a device for computing limits: the unavoidable underflow will almost certainly produce wrong results. For example, trying to verify

Limit(x,0)(Exp(x)-1)/x=1

by substituting x=1/z with some exp-number z gives 0 instead of 1.
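
A toy Python sketch of such exp-number arithmetic (keeping the logarithm M*10^E as an ordinary floating-point number, which is enough to illustrate the point; a real implementation would store the mantissa M and the integer exponent E separately):

class ExpNumber:
    # A positive exponentially large (or small) number z = Exp(L).
    def __init__(self, log_value):
        self.log_value = log_value          # L = Ln(z), an ordinary number
    def __mul__(self, other):               # z1*z2 = Exp(L1+L2)
        return ExpNumber(self.log_value + other.log_value)
    def __truediv__(self, other):           # z1/z2 = Exp(L1-L2)
        return ExpNumber(self.log_value - other.log_value)
    def __pow__(self, p):                   # z^p = Exp(p*L) for an ordinary p
        return ExpNumber(p * self.log_value)
    def __add__(self, other):               # addition underflows: keep the larger
        return self if self.log_value >= other.log_value else other
    def log(self):                          # Ln(z) is again a normal number
        return self.log_value

# z = ExpNumber(2.5e120) models Exp(2.5*10^120); (z**3).log() is 7.5e120, while
# (z + ExpNumber(1e100)).log() is still 2.5e120 (the smaller term underflows).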

Taking a logarithm of an exp-number brings it back to the realm of normal, representable numbers. However, taking an exponential of an exp-number results in a number which is not representable even as an exp-number. This is because an exp-number z needs to have its exponent E represented exactly as an integer, but Exp(z) has an exponent of order O(z) which is not a representable number. The monstrous number Exp(z) could be only written as Exp(Exp(M*10^E)), i.e. as a "doubly exponentially large" number, or "2-exp-number" for short. Thus we obtain a hierarchy of iterated exp-numbers. Each layer is "unrepresentably larger" than the previous one.

The same considerations apply to very small numbers of the order 10^(-N) or smaller. Such numbers can be manipulated as "exponentially small numbers", i.e. expressions of the form Exp(-M*10^E) with floating-point mantissa M>=1 and integer E satisfying 0<E<N.

Similarly to exponentially large numbers, only some arithmetic operations can be meaningfully performed. Exponentially small numbers act as an effective zero compared with normal numbers.

Taking a logarithm of an exp-small number makes it again a normal representable number. However, taking an exponential of an exp-small number produces 1 because of underflow. To obtain a "doubly exponentially small" number, we need to take the reciprocal of a doubly exponentially large number, or to take the exponential of a negative exponentially large number. In other words, Exp(-M*10^E) is exp-small, while Exp(-Exp(M*10^E)) is 2-exp-small.

The practical significance of exp-numbers is rather limited. We cannot obtain even a single significant digit of an exp-number. A "computation" with exp-numbers is essentially a floating-point computation with logarithms of these exp-numbers. A practical problem that needs numbers of this magnitude can probably be restated in terms of more manageable logarithms of such numbers. In practice, exp-numbers could be useful not as a means to get a numerical answer, but as a warning sign of critical overflow or underflow.


Continued fractions


Approximation of numbers by continued fractions

The function ContFrac converts a (rational) number r into a regular continued fraction,

r=n[0]+1/(n[1]+1/(n[2]+...)).

Here all numbers n[i] ("terms" of a continued fraction) are integers and all except n[0] must be positive. (Continued fractions may not converge unless their terms are positive and bounded from below.)

The algorithm for converting a rational number r=n/m into a continued fraction is simple. First, we determine the integer part of r, which is Div(n,m). If r is negative and not an integer, we need to subtract one from Div(n,m), so that r=n[0]+x and the remainder x is nonnegative and less than 1. The remainder x=Mod(n,m)/m is then inverted, r[1]:=1/x=m/Mod(n,m), and so we have completed the first step in the decomposition, r=n[0]+1/r[1]; now n[0] is an integer but r[1] is perhaps not. We repeat the same procedure on r[1], obtain the next integer term n[1] and the remainder r[2], and so on, until we reach an r[n] that is an integer and there is no more work to do. This process will always terminate because all floating-point values are actually rationals in disguise.
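
A Python sketch of this decomposition (the Yacas routine ContFrac of course works on Yacas numbers):

from fractions import Fraction

def cont_frac(r):
    # Terms n[0], n[1], n[2], ... of the regular continued fraction of a rational r.
    r = Fraction(r)
    terms = []
    while True:
        n0 = r.numerator // r.denominator   # floor; handles negative r correctly
        terms.append(n0)
        r -= n0                             # remainder, 0 <= r < 1
        if r == 0:
            return terms
        r = 1 / r                           # invert the remainder and repeat

# cont_frac(Fraction(17, 3)) returns [5, 1, 2].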

Continued fractions are useful in many ways. For example, if we know that a certain number x is rational but have only a floating-point representation of x with a limited precision, say, 1.5662650602409638, we can try to guess its rational form (in this example x=130/83). The function GuessRational uses continued fractions to find a rational number with "optimal" (small) numerator and denominator that is approximately equal to a given floating-point number.

Consider the following example. The number 17/3 has a continued fraction expansion {5,1,2}. Evaluated as a floating point number with limited precision, it may become something like 17/3+0.00001, where the small number represents a roundoff error. The continued fraction expansion of this number is {5, 1, 2, 11110, 1, 5, 1, 3, 2777, 2}. The presence of an unnaturally large term 11110 clearly signifies the place where the floating-point error was introduced; all terms following it should be discarded to recover the continued fraction {5,1,2} and from it the initial number 17/3.

If a continued fraction for a number x is cut right before an unusually large term and evaluated, the resulting rational number is very close to x but has an unusually small denominator. This works because truncated continued fractions provide "optimal" rational approximations of the final (irrational) number, and because the magnitude of the terms of the continued fraction is related to the magnitude of the denominator of the resulting rational approximation.

GuessRational(x, prec) needs to choose the place where it should cut the continued fraction. The algorithm for this is somewhat less precise than it could be but it works well enough. The idea is to cut the continued fraction when adding one more term would change the result by less than the specified precision. To realize this in practice, we need an estimate of how much a continued fraction changes when we add one term.

The routine GuessRational uses a (somewhat weak) upper bound for the difference of continued fractions that differ only by an additional last term:

Abs(delta):=Abs(1/(a[1]+1/(...+1/a[n]))-1/(a[1]+1/(...+1/a[n+1])))<1/((a[1]*...*a[n])^2*a[n+1]).

Thus we should compute the product of successive terms a[i] of the continued fraction and stop at the term a[n] at which this product exceeds 10 raised to the power of the maximum allowed number of digits. The routine GuessRational has a second parameter prec, which by default is half the number of decimal digits of current precision; it stops at the a[n] at which the product a[1]*...*a[n] exceeds 10^prec.
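
A Python sketch of this stopping criterion (a simplified stand-in for GuessRational; the refinement for terms equal to 1, discussed below, is omitted):

from fractions import Fraction

def guess_rational(x, prec):
    # Guess a small-denominator rational close to the float x: generate its
    # continued fraction and cut it at the first term that makes the product
    # of the terms exceed 10^prec, then rebuild the truncated fraction.
    r = Fraction(x)                         # a float is an exact binary rational
    n0 = r.numerator // r.denominator
    terms = [n0]
    r -= n0
    product = 1
    while r != 0:
        r = 1 / r
        a = r.numerator // r.denominator
        product *= a
        if product > 10 ** prec:            # cutting point found: discard the rest
            break
        terms.append(a)
        r -= a
    result = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        result = t + 1 / result
    return result

# guess_rational(1.5662650602409638, 4) recovers the fraction 130/83.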

The above estimate for delta hinges on the inequality

1/(a+1/(b+...))<1/a

and is suboptimal if some terms a[i]=1, because the product of a[i] does not increase when one of the terms is equal to 1, whereas in fact these terms do make delta smaller. A somewhat better estimate would be obtained if we use the inequality

1/(a+1/(b+1/(c+...)))<1/(a+1/(b+1/c)).

(See the next section for more explanations of precision of continued fraction approximations.) This does not lead to a significant improvement if a>1 but makes a difference when a=1. In the product a[1]*...*a[n], the terms a[i] which are equal to 1 should be replaced by

a[i]+1/(a[i+1]+1/a[i+2]).

Since the comparison of a[1]*...*a[n] with 10^prec is qualitative, it is enough to perform it with limited precision.

This algorithm works well if x is computed with enough precision; namely, it must be computed to at least as many digits as there are in the numerator and the denominator of the fraction combined. Also, the parameter prec should not be too large (or else the algorithm will find another rational number with a larger denominator that approximates x "better" than the precision to which you know x).

The related function NearRational(x, prec) works somewhat differently. The goal is to find an "optimal" rational number, i.e. with smallest numerator and denominator, that is within the distance 10^(-prec) of a given value x. The algorithm for this comes from the 1972 HAKMEM document, Item 101C. Their description is terse but clear:

Problem: Given an interval, find in it the
rational number with the smallest numerator and
denominator.
Solution: Express the endpoints as continued
fractions.  Find the first term where they differ
and add 1 to the lesser term, unless it's last. 
Discard the terms to the right.  What's left is
the continued fraction for the "smallest"
rational in the interval.  (If one fraction
terminates but matches the other as far as it
goes, append an infinity and proceed as above.)

The HAKMEM text (M. Beeler, R. W. Gosper, and R. Schroeppel: Memo No. 239, MIT AI Lab, 1972, available as HTML online from various places) contains several interesting insights relevant to continued fractions and other numerical algorithms.
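
A Python sketch of this idea, in the equivalent recursive formulation (take the common integer part and recurse on the reciprocals of the remainders; the endpoints are treated as belonging to the interval):

from fractions import Fraction

def simplest_in_interval(lo, hi):
    # Rational number with the smallest denominator in the interval [lo, hi].
    lo, hi = Fraction(lo), Fraction(hi)
    a = lo.numerator // lo.denominator      # floor of the lower endpoint
    if lo == a:                             # lo itself is an integer
        return Fraction(a)
    if hi >= a + 1:                         # an integer lies inside [lo, hi]
        return Fraction(a + 1)
    # both endpoints lie strictly between a and a+1: recurse on the reciprocals
    # of the fractional parts (note that their order is reversed)
    return a + 1 / simplest_in_interval(1 / (hi - a), 1 / (lo - a))

# For example, simplest_in_interval(Fraction(31415, 10000), Fraction(31417, 10000))
# returns 333/106.  A NearRational-style call corresponds to the interval
# (x - 10^(-prec), x + 10^(-prec)).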


Accurate computation of continued fractions

Sometimes an analytic function f(x) can be approximated using a continued fraction that contains x in its terms. Examples include: the inverse tangent ArcTan(x), the error function Erf(x) and the incomplete gamma function Gamma(a,x) (see below for details). For these functions, continued fractions provide a method of numerical calculation that works when the Taylor series converges slowly or not at all. However, continued fractions usually converge quickly for one value of x but slowly for another. Also, it is not as easy to obtain an analytic error bound for a continued fraction approximation as it is for power series.

In this section we describe two methods that compute a continued fraction: the simple "bottom-up" and the more complicated "top-down" method. The "bottom-up" method is faster but requires knowing the number of terms in advance. The "top-down" method essentially replaces the continued fraction by the sum of a certain series. This method provides an automatic error estimate and can be used to evaluate a continued fraction with more and more terms until the desired precision is achieved. The formula for the precision of the continued fraction approximation used in the "top-down" method sometimes allows one to estimate the number of terms in advance. If we know the number of terms, then the faster "bottom-up" method can be used.

A continued fraction

a[0]+b[0]/(a[1]+b[1]/(a[2]+...))

is specified by a set of terms (a[i], b[i]). [If continued fractions are used to approximate analytic functions such as ArcTan(x), then (a[i], b[i]) will be some infinite sequences that depend on x.] Let us denote by F[m][n] the truncated fraction containing only terms from m to n, i.e.

F[m][n]:=a[m]+b[m]/(a[m+1]+b[m+1]/(...+b[n]/a[n])).

In this notation, the continued fraction that we need to compute is F[0][n]. Our problem is first, to select a large enough n so that F[0][n] gives enough precision, and second, to compute that value.

The "bottom-up" method is to select some value of n and start evaluating the fraction from the bottom upwards. First we take F[n][n]=a[n] and then we use the obvious relation of backward recurrence,

F[m][n]=a[m]+b[m]/F[m+1][n],

to obtain successively F[n-1][n], ..., F[0][n]. This method requires one long division at each step.

An alternative implementation assumes that all a[i], b[i] are integers and obtains the numerator and the denominator of F[0][n] separately, via two simultaneous backward recurrences. If F[m+1][n]=p[m+1]/q[m+1], then p[m]=a[m]*p[m+1]+b[m]*q[m+1] and q[m]=p[m+1]. This method starts with p[n]=a[n], q[n]=1 and requires two long multiplications at each step; the only division is performed at the very end. Sometimes this method reduces the roundoff error.
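
A Python sketch of the bottom-up evaluation, where a and b are functions returning the terms a[i] and b[i]:

def bottom_up(a, b, n):
    # Evaluate F[0][n] = a(0) + b(0)/(a(1) + b(1)/(... + b(n-1)/a(n))),
    # starting from F[n][n] = a(n) and applying F[m][n] = a(m) + b(m)/F[m+1][n].
    f = a(n)
    for m in range(n - 1, -1, -1):
        f = a(m) + b(m) / f
    return f

# Example: the well-known continued fraction ArcTan(x) = x/(1+x^2/(3+4*x^2/(5+...)))
# has a(0)=0, b(0)=x and a(k)=2*k-1, b(k)=(k*x)^2 for k>=1; with x=0.5,
# bottom_up(lambda k: 2*k - 1 if k else 0, lambda k: (k*0.5)**2 if k else 0.5, 10)
# is close to ArcTan(0.5) = 0.46364...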

There is an improvement to this method that can sometimes increase the achieved precision without computing more terms (this is suggested in the Tsimring book). Namely, for the starting value of the recurrence we should choose not a[n] but another number that more closely approximates the infinite remainder of the fraction. The infinite remainder, which can be symbolically written as F[n][Infinity], can sometimes be estimated (although obviously we are not able to compute it exactly). In simple cases, F[n][Infinity] changes very slowly at large n. Suppose that F[n][Infinity] is approximately constant; then it must be approximately equal to F[n+1][Infinity]. Therefore, if we solve the (quadratic) equation

x=a[n]+b[n]/x,

we shall obtain the (positive) value x which may be a much better approximation for F[n][Infinity] than a[n]. But this depends on our assumption about the way the continued fraction converges. It may happen, for example, that for large n the value F[n][Infinity] is almost the same as F[n+2][Infinity] but is significantly different from F[n+1][Infinity]. Then we should instead solve the equation

x=a[n]+b[n]/(a[n+1]+b[n+1]/x)

and take the solution x as the approximation for F[n][Infinity].

The "bottom-up" method obviously requires to know the number of terms n in advance; calculations have to be started all over again if more terms are needed. Also, it provides no error estimate.

The "top-down" method is slower but provides an error estimate and allows to compute the fraction in the forward direction. The idea

This is a known result in the theory of continued fractions. We give an elementary derivation.
is to replace the continued fraction F[0][n] with a sum of a certain series, i.e.

a[0]+b[0]/(a[1]+b[1]/(...+b[n-1]/a[n]))=Sum(k,0,n,f[k]).

Here

f[k]:=F[0][k]-F[0][k-1]

(k>=1) is a sequence that can be calculated in the forward direction, starting from k=1. If we manage to convert the continued fraction to a series, then adding one more term f[k] of this series will be equivalent to recalculating the continued fraction with k terms instead of k-1 terms. This will automatically give an error estimate and allow us to compute to higher precision if necessary, without having to repeat the calculation from the beginning. (The transformation of the continued fraction into a series is exact, not merely an approximation.)

The formula for f[k] is the following. First the auxiliary sequence P[k], Q[k] for k>=1 needs to be defined by P[1]=0, Q[1]=1, and P[k+1]:=b[k]*Q[k], Q[k+1]:=P[k]+a[k]*Q[k]. Then define f[0]:=a[0] and

f[k]:=((-1)^(k-1)*b[0]*...*b[k-1])/(Q[k]*Q[k+1])

for k>=1. The "top-down" method consists of computing f[n] sequentially and adding them together, until n is large enough so that f[n]/f[0] is less than the required precision. Evaluating the next element f[k] requires four long multiplications and one division.

Note that typically all terms a[i], b[i] are positive, so the series we obtained will have terms f[k] of alternating signs. To reduce the round-off error, we can convert this series into a series with constant sign by adding together two adjacent elements, say f[2*k]+f[2*k+1]. (This method is used in H. C. Thacher, Jr., Algorithm 180, which refers to a suggestion by Hans Maehly.) The relevant formulae can be derived from the definition of f[k] using the recurrence relations for P[k], Q[k]:

f[2*k-1]+f[2*k]=(b[0]*...*b[2*k-2]*a[2*k])/(Q[2*k-1]*Q[2*k+1]),

f[2*k]+f[2*k+1]= -(b[0]*...*b[2*k-1]*a[2*k+1])/(Q[2*k]*Q[2*k+2]).

Now e.g. in the series f[0]+(f[1]+f[2])+(f[3]+f[4])+... all of the terms are positive, so the sign is constant as desired.

Here is a straightforward derivation of the above formula for f[k]. We need to compute the difference between successive approximations F[0][n] and F[0][n+1]. The recurrence relation we shall use is

F[m][n+1]-F[m][n]= -(b[m]*(F[m+1][n+1]-F[m+1][n]))/(F[m+1][n+1]*F[m+1][n]).

This can be easily seen by manipulating the two fractions, or by using the first recurrence relation above. So far we have reduced the difference between F[m][n+1] and F[m][n] to a similar difference on the next level m+1 instead of m; i.e. we can increment m but keep n fixed. We can apply this formula to F[0][n+1]-F[0][n], i.e. for m=0, and continue applying the same recurrence relation until m reaches n. The result is

F[0][n+1]-F[0][n]=((-1)^n*b[0]*...*b[n])/(F[1][n+1]*...*F[n+1][n+1]*F[1][n]*...*F[n][n]).

Now the problem is to simplify the two long products in the denominator. We notice that F[1][n] has F[2][n] in the denominator, and therefore F[1][n]*F[2][n]=F[2][n]*a[1]+b[1]. The next product is F[1][n]*F[2][n]*F[3][n] and it simplifies to a linear function of F[3][n], namely F[1][n]*F[2][n]*F[3][n] = (b[1]+a[1]*a[2])*F[3][n]+a[1]*b[2]. So we can see that there is a general formula

F[1][n]*...*F[k][n]=P[k]+Q[k]*F[k][n]

with some coefficients P[k], Q[k] which do not actually depend on n but only on the terms of the partial fraction up to k. In other words, these coefficients can be computed starting with P[1]=0, Q[1]=1 in the forward direction. The recurrence relations for P[k], Q[k] follow from the identity (P[k]+Q[k]*F[k][n])*F[k+1][n] = P[k+1]+Q[k+1]*F[k+1][n].

Having found the coefficients P[k], Q[k], we can now rewrite the long products in the denominator, e.g.

F[1][n]*...*F[n][n]=P[n]+Q[n]*F[n][n]=Q[n+1].

(We have used the recurrence relation for Q[n+1].) Now it follows that

f[n+1]:=F[0][n+1]-F[0][n]=((-1)^n*b[0]*...*b[n])/(Q[n+2]*Q[n+1]).

Thus we have converted the continued fraction into a series, i.e. F[0][n]=Sum(k,0,n,f[k]) with f[k] defined above.


Estimating convergence of continued fractions

Suppose we are given the terms a[k], b[k] that define an infinite continued fraction, and we are interested to estimate its rate of convergence. We need to find the number of terms n for which the error of approximation is less than a given epsilon. In our notation, we need to solve

Abs(f[n+1])<epsilon

for n.

The formula that we derived for f[n+1] gives an error estimate for the continued fraction truncated at the n-th term. But this formula contains numbers Q[k] in the denominator. The problem now is to find how quickly the sequence Q[k] grows.

It is not always easy to get a handle on this sequence; in most cases there is no closed-form expression for Q[k]. The recurrence relation for it can be rewritten as

Q[k+2]=a[k+1]*Q[k+1]+b[k]*Q[k],

for k>=0, with initial values Q[0]=0 and Q[1]=1.

A simple lower bound on the growth can be quickly obtained from this recurrence relation. Assume that a[k]>0, b[k]>0. It is clear that all Q[k] are positive, and so Q[k+1]>=a[k]*Q[k]. Therefore Q[k] grows at least as the product of all a[k]:

Q[k+1]>=a[1]*a[2]*...*a[k].

This result gives the following upper bound on the precision,

Abs(f[n+1])<=(b[0]*...*b[n])/((a[1]*...*a[n])^2*a[n+1]).

We have successfully used this bound to estimate the relative error of the continued fraction expansion for ArcTan(x) at small x. However, at large x this bound becomes greater than 1. This does not mean that the continued fraction does not converge; it merely indicates that the bound is too weak in some cases. The sequence Q[k] actually grows faster than the product of all a[k], and we sometimes need a tighter bound on this growth.

We shall now consider another example when the growth of this sequence can be estimated analytically. The complementary error function Erfc(x) can be computed using an auxiliary continued fraction

Sqrt(Pi)*x*Exp(x^2)*Erfc(x)=1/(1+v/(1+(2*v)/(1+(3*v)/(1+...)))),

where v:=(2*x^2)^(-1) is a small parameter, v<1/2. The terms of this continued fraction are: a[k]=1, b[k]=k*v, for k>=1, and a[0]=0, b[0]=1.

The simple bound would give Abs(f[n+1])<=v^n*n! and this expression grows with n. But we know that our continued fraction actually converges for any v and f[n+1] must tend to zero for large n. It seems that the simple bound is not strong enough for any v and we need a better bound.

The asymptotic growth of the sequence Q[k] can be estimated in this case by the method of stationary phase, or Laplace's method. (See, e.g., F. W. J. Olver, Asymptotics and special functions, Academic Press, 1974, ch. 3, sect. 7.5.) This method is somewhat complicated but quite powerful. The method requires that we find an integral representation for Q[k], and then it gives the dominant contribution to the integral as an asymptotic series in k^(-1). We actually only need the leading terms of the series which are easy to find (even without a CAS).

An integral representation for Q[k] can be obtained using the method of generating functions. Consider a function G(s) defined by the infinite series

G(s)=Sum(k,0,Infinity,Q[k+1]*(-I*s)^k/k!).

G(s) is usually called the "generating function" of a sequence. We shifted the index to k+1 for cosmetic convenience, since Q[0]=0. The imaginary unit in front of s is not strictly necessary to introduce but it usually makes the generating function nicer (on the real axis, that is).

It is generally the case that if we know a simple linear recurrence relation for a sequence, then we can also easily find its generating function. The generating function will satisfy a linear differential equation. To guess this equation, we write down the series for G(s) and its derivative D(s)G(s) and try to find their linear combination which is identically zero because of the recurrence relation. (There is, of course, a computer algebra algorithm for doing this automatically.)

In the case of our sequence Q[k] above, we find that

I*D(s)G(s)=(1-I*v*s)*G(s).

The solution with the obvious initial condition G(0)=1 is

G(s)=Exp(-I*s-v*s^2/2).

The second step is to obtain an integral representation for Q[k]. This is possible because Q[k+1] is equal to the k-th derivative of the generating function at s=0:

Q[k+1]=I^k*D(s,k)G(s=0).

So if we had an integral representation for G(s), we could also obtain one for Q[k] by differentiation.

Now we use the fact that derivatives of Fourier transforms are "easy". The generating function G(s) has the following Fourier representation,

G(s)=1/Sqrt(2*Pi*v)*Integrate(t,-Infinity,Infinity)Exp(-t^2/(2*v)-I*s*(t+1)).

Taking the derivatives with respect to s at s=0, we obtain an integral representation for Q[k]:

Q[n+1]=1/Sqrt(2*Pi*v)*Integrate(t,-Infinity,Infinity)(1+t)^n*Exp(-t^2/(2*v)).

Now we can use the method of stationary phase. We represent the integrand as an exponential of some function g(t,n) and find "stationary points" where this function has local maxima:

Q[n+1]=1/Sqrt(2*Pi*v)*Integrate(t,-Infinity,Infinity)Exp(g(t,n)),

g(t,n):= -t^2/(2*v)+n*Ln(1+t).

(Note that the logarithm here must acquire an imaginary part I*Pi for t<-1, but we shall see that the contribution of the region t<-1 is negligible.) We expect that when n is large, g(t,n) will have a peak or several peaks at certain values of t. At t away from the peaks, the value of g(t,n) is much smaller and, since g is in the exponential, the integral is dominated by the contribution of the peaks. This is the essence of the method of stationary phase.

We find that in our case, two peaks occur at approximately t1<=> -1/2+Sqrt(n*v) and t2<=> -1/2-Sqrt(n*v). We assume that n is large enough so that n*v>1/2. Then the first peak is at a positive t and the second peak is at a negative t.

The contribution of the peaks can be computed from the Taylor approximation of g(t,n) near the peaks. We can expand, for example,

g(t,n)<=>g(t1,n)+Deriv(t,2)g(t1,n)*(t-t1)^2/2

near t=t1. The values g(t1,n) and Deriv(t,2)g(t1,n), and likewise for t2, are constants that we already know since we know t1 and t2. Then the integral of Exp(g) will be approximated by the integral of the form

Integrate(t,-Infinity,Infinity)Exp(g(t1,n)+Deriv(t,2)g(t1,n)*(t-t1)^2/2).

(Note that Deriv(t,2)g(t1,n) is negative.) This is a Gaussian integral that can be easily evaluated, with the result

e^g(t1,n)*Sqrt(-(2*Pi)/Deriv(t,2)g(t1,n)).

This will be the leading term of the contribution of the peak at t1; there will be a similar expression for the contribution of t2. We find that the peak at t1 gives a larger contribution, by the factor Exp(2*Sqrt(n/v)). This factor is never small since n>1 and v<1/2. So it is safe to ignore the peak at t2 for the purposes of our error analysis.

We find, after some algebra, that the leading asymptotic of Q[k] is

Q[n+1]<=>1/Sqrt(2)*Exp(Sqrt(n/v)-1/(4*v)-n/2)*(v*n)^(n/2).

This, together with Stirling's formula

n! <=>Sqrt(2*Pi*n)*(n/e)^n,

allows us to estimate the error of the continued fraction approximation:

Abs(f[n+1])<=>2*Sqrt((2*Pi)/v)*Exp(1/(2*v)-2*Sqrt(n/v)).

Note that this is not merely a bound but an actual asymptotic estimate of Abs(f[n+1]). (By the way, Stirling's formula can also be derived using the method of stationary phase from an integral representation of the Gamma function, in a similar way.)

We find that the error decreases with n roughly as Exp(-2*Sqrt(n/v)). This is slower than a geometric progression, but it is still fast enough for the continued fraction to be useful in practice.


Newton's method and its improvements

The Newton-Raphson method of numerical solution of algebraic equations can be used to obtain multiple-precision values of several elementary functions.

The basic formula is widely known: If f(x)=0 must be solved, one starts with a value of x that is close to some root and iterates

x'=x-f(x)/(D(x)f(x)).

This formula is based on the approximation of the function f(x) by a tangent line at some point x. A Taylor expansion in the neighborhood of the root shows that (for an initial value x[0] sufficiently close to the root) each iteration gives at least twice as many correct digits of the root as the previous one ("quadratic convergence"). Therefore the complexity of this algorithm is proportional to a logarithm of the required precision and to the time it takes to evaluate the function and its derivative. Generalizations of this method require computation of higher derivatives of the function f(x) but successive approximations to the root converge several times faster (the complexity is still logarithmic).

Newton's method is particularly convenient for multiple precision calculations because of its insensitivity to accumulated errors: if x[k] at some iteration is found with a small error, the error will be corrected at the next iteration. Therefore it is not necessary to compute all iterations with the full required precision; each iteration needs to be performed only at the precision of the root expected from that iteration. For example, if we know that the initial approximation is accurate to 3 digits, then (assuming quadratic convergence) it is enough to perform the first iteration to 6 digits, the second iteration to 12 digits and so on. (This disregards the possibility that the convergence might be slightly slower. For example, when the precision at one iteration is n digits, it might be 2*n-10 digits at the next iteration. In these fringe cases, the initial approximation must already be precise enough, e.g. to at least 10 digits in this example.) In this way, multiple precision calculations are enormously speeded up.
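
A Python sketch of this precision-doubling strategy, using the standard decimal module and the Newton iteration x'=(x+a/x)/2 for the square root:

from decimal import Decimal, getcontext

def sqrt_newton(a, digits):
    # Square root of a > 0 to the requested number of decimal digits.  Each
    # iteration roughly doubles the number of correct digits, so the working
    # precision is doubled at every step instead of being kept at full precision.
    a = Decimal(a)
    x = Decimal(float(a) ** 0.5)                     # initial guess, ~15 digits
    precision = 15
    while precision < digits + 2:
        precision = min(2 * precision, digits + 2)   # two guard digits
        getcontext().prec = precision
        x = (x + a / x) / 2
    getcontext().prec = digits
    return +x                                        # round to `digits` digits

# sqrt_newton(2, 50) gives 1.4142135623730950488016887242096980785696718753769.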

However, Newton's method suffers from sensitivity to the initial guess. If the initial value x[0] is not chosen sufficiently close to the root, the iterations may converge very slowly or not converge at all. To remedy this, one can combine Newton's iteration with simple bisection. Once the root is bracketed inside an interval (a, b), one checks whether (a+b)/2 is a better approximation for the root than that obtained from Newton's iteration. This guarantees at least linear convergence in the worst case.

For some equations f(x)=0, Newton's method converges faster; for example, solving Sin(x)=0 in the neighborhood of x=3.14159 gives "cubic" convergence, i.e. the number of correct digits is tripled at each step. This happens because Sin(x) near its root x=Pi has vanishing second derivative and thus the function is particularly well approximated by a straight line.

Halley's method is an improvement over Newton's method; it transforms the equation so that it is well approximated by a straight line near the root. Edmund Halley computed fractional powers, x=a^(1/n), by the iteration

x'=x*(n*(a+x^n)+a-x^n)/(n*(a+x^n)-(a-x^n)).

This formula is equivalent to Newton's method applied to the equation x^(n-q)=a*x^(-q) with q=(n-1)/2. This iteration has a cubic convergence rate. This is the fastest method to compute n-th roots with multiple precision. Iterations with higher order of convergence, for example, the method with quintic convergence rate

x'=x*((n-1)/(n+1)*(2*n-1)/(2*n+1)*x^(2*n)+2*(2*n-1)/(n+1)*x^n*a+a^2)/(x^(2*n)+2*(2*n-1)/(n+1)*x^n*a+(n-1)/(n+1)*(2*n-1)/(2*n+1)*a^2),

require more arithmetic operations per step and are in fact less efficient at high precision.
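
A Python sketch of Halley's iteration for x=a^(1/n), with the working precision roughly tripled at each step to match the cubic convergence:

from decimal import Decimal, getcontext

def nth_root_halley(a, n, digits):
    # n-th root of a > 0 by the iteration
    # x' = x*(n*(a+x^n)+a-x^n)/(n*(a+x^n)-(a-x^n)).
    a = Decimal(a)
    x = Decimal(float(a) ** (1.0 / n))               # initial guess, ~15 digits
    precision = 15
    while precision < digits + 2:
        precision = min(3 * precision, digits + 2)
        getcontext().prec = precision
        xn = x ** n
        x = x * (n * (a + xn) + a - xn) / (n * (a + xn) - (a - xn))
    getcontext().prec = digits
    return +x

# nth_root_halley(2, 3, 50) agrees with 2^(1/3) to 50 decimal digits.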

Halley's method can be generalized to any function f(x). A cubically convergent iteration is always obtained if we replace the equation f(x)=0 by an equivalent equation

g(x):=f(x)/Sqrt(Abs(D(x)f(x)))=0

and use the standard Newton's method on it. Here the function g(x) is chosen so that its second derivative vanishes (D(x,2)g(x)=0) at the root of the equation f(x)=0, independently of where this root is. (There is no unique choice of the function g(x) and sometimes another choice is needed to make the iteration more easily computable.)

The Halley iteration for the equation f(x)=0 can be written as

x'=x-(2*f(x)*D(x)f(x))/(2*D(x)f(x)^2-f(x)*Deriv(x,2)f(x)).

For example, the equation Exp(x)=a is transformed into g(x):=Exp(x/2)-a*Exp(-x/2)=0.

Halley's iteration, despite its faster convergence rate, may be more cumbersome to evaluate than Newton's iteration and so it may not provide a more efficient numerical method for some functions. Only in some special cases is Halley's iteration just as simple to compute as Newton's iteration. But Halley's method has another advantage: it is generally less sensitive to the choice of the initial point x[0]. An extreme example of sensitivity to the initial point is the equation x^(-2)=12 for which Newton's iteration x'=3*x/2-6*x^3 converges to the root only from initial points 0<x[0]<0.5 and wildly diverges otherwise, while Halley's iteration converges to the root from any x[0]>0.

It is at any rate not true that Halley's method always converges better than Newton's method. For instance, it diverges on the equation 2*Cos(x)=x unless started at x[0] within the interval (-1/6*Pi, 7/6*Pi). Another example is the equation Ln(x)=a. This equation can be used to compute x=Exp(a) if a fast method for computing Ln(x) is available (e.g. the AGM-based method). For this equation, Newton's iteration

x'=x*(1+a-Ln(x))

converges for any 0<x<Exp(a+1), while Halley's iteration converges only for Exp(a-2)<x<Exp(a+2).

When it converges, Halley's iteration can still converge very slowly for certain functions f(x), for example, for f(x)=x^n-a if n^n>a. For such functions that have very large and rapidly changing derivatives, no general method can converge faster than linearly. In other words, a simple bisection will generally do just as well as any sophisticated iteration, until the root is approximated relatively precisely. Halley's iteration combined with bisection seems to be a good choice for such problems.

For practical evaluation, iterations must be supplemented with error control. For example, if x0 and x1 are two consecutive approximations that are already very close, we can quickly compute the achieved (relative) precision by finding the number of leading zeros in the number Abs(x0-x1)/Max(x0,x1). This is easily done using the integer logarithm. After performing a small number of initial iterations at low precision, we can make sure that x1 has at least a certain number of correct digits of the root. Then we know which precision to use for the next iteration (e.g. triple precision if we are using a cubically convergent scheme). It is important to perform each iteration at the precision of the root which it will give and not at a higher precision; this saves a great deal of time since multiple-precision calculations quickly become very slow at high precision.


Fast evaluation of Taylor series

Taylor series for elementary functions can be used for evaluating the functions when no faster method is available. For example, to straightforwardly evaluate

Exp(x)<=>Sum(k,0,N-1,x^k/k!)

with P decimal digits of precision and x<2, one would need about N<=>P*Ln(10)/Ln(P) terms of the series. To evaluate the truncated series term by term, one needs N-1 long multiplications. (Divisions by large integers k! can be replaced by a short division of the previous term by k.) In addition, about Ln(N)/Ln(10) decimal digits will be lost due to accumulated roundoff errors; therefore the working precision must be increased by this many digits.

If we do not know in advance how many terms of the Taylor series we need, we cannot do any better than just evaluate each term and check whether it is already small enough. So in this case we will have to do O(N) long multiplications. However, we can organize the calculation much more efficiently if we can estimate the necessary number of terms and if we can afford some storage. A "rectangular" algorithm uses 2*Sqrt(N) long multiplications (assuming that the coefficients of the series are short rational numbers) and Sqrt(N) units of storage (see D. M. Smith, "Efficient multiple-precision evaluation of elementary functions", 1985).

Suppose we need to evaluate Sum(k,0,N,a[k]*x^k) and we know the number of terms N in advance. Suppose also that the coefficients a[k] are rational numbers with small numerators and denominators, so a multiplication a[k]*x is not a long multiplication (usually, either a[k] or the ratio a[k]/a[k-1] is a short rational number). Then we can organize the calculation in a rectangular array with c columns and r rows like this,

a[0]+a[r]*x^r+...+a[(c-1)*r]*x^((c-1)*r)+

x*(a[1]+a[r+1]*x^r+...+a[(c-1)*r+1]*x^((c-1)*r))+

...+

x^(r-1)*(a[r-1]+a[2*r-1]*x^r+...).

To evaluate this rectangle, we first compute x^r (which, if done by the fast binary algorithm, requires O(Ln(r)) long multiplications). Then we compute the c-1 successive powers of x^r, namely x^(2*r), x^(3*r), ..., x^((c-1)*r) in c-1 long multiplications. The partial sums in the r rows are evaluated column by column as more powers of x^r become available. This requires storage of r intermediate results but no more long multiplications by x. If a simple formula relating the coefficients a[k] and a[k-1] is available, then a whole column can be computed and added to the accumulated row values using only short operations, e.g. a[r+1]*x^r can be computed from a[r]*x^r (note that each column contains some consecutive terms of the series). Otherwise, we would need to multiply each coefficient a[k] separately by the power of x; if the coefficients a[k] are short numbers, this is also a short operation. After this, we need r-1 more multiplications for the vertical summation of rows (using the Horner scheme). We have potentially saved time because we do not need to evaluate powers such as x^(r+1) separately, so we do not have to multiply x by itself quite so many times.

The total required number of long multiplications is r+c+Ln(r)-2. The minimum number of multiplications, given that r*c>=N, is around 2*Sqrt(N) at r<=>Sqrt(N)-1/2 (the formula r<=>Sqrt(N-Sqrt(N)) can be used with an integer square root algorithm). Therefore, by arranging the Taylor series in a rectangle of sides r and c, we have obtained an algorithm which costs O(Sqrt(N)) instead of O(N) long multiplications and requires Sqrt(N) units of storage.
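
A Python sketch of the rectangular arrangement for the exponential series (with ordinary floating-point numbers, so the savings in long multiplications are only indicated in the comments, not actually realized):

import math

def exp_rect(x, N):
    # Evaluate Sum(k,0,N-1, x^k/k!) arranged in a rectangle of r rows and c columns.
    # Only the c-1 successive powers of x^r and the r-1 Horner multiplications by x
    # would be long multiplications in an arbitrary-precision setting; the running
    # coefficient 1/k! is updated by short divisions.
    r = max(1, math.isqrt(N))            # number of rows, about Sqrt(N)
    c = -(-N // r)                       # number of columns, Ceil(N/r)
    xr = x ** r                          # x^r; O(Ln(r)) long multiplications by binary powering
    rows = [0.0] * r
    coeff, power, k = 1.0, 1.0, 0        # 1/k!, (x^r)^j, term index
    for j in range(c):                   # column j holds the terms k = j*r, ..., j*r + r - 1
        for i in range(r):
            if k < N:
                rows[i] += coeff * power
                k += 1
                coeff /= k               # 1/k! from 1/(k-1)! by a short division
        power *= xr                      # one long multiplication per column
    total = rows[r - 1]                  # Horner summation of the rows in powers of x
    for i in range(r - 2, -1, -1):
        total = total * x + rows[i]
    return total

# exp_rect(1.0, 25) agrees with math.exp(1.0) to about 15 digits.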

One might wonder if we should not try to arrange the Taylor series in a cube or another multidimensional matrix instead of a rectangle. However, calculations show that this does not save time: the optimal arrangement is the two-dimensional rectangle.

An additional speed-up is possible if the elementary function allows a transformation that reduces x and makes the Taylor series converge faster. For example, Ln(x)=2*Ln(Sqrt(x)), Cos(2*x)=2*Cos(x)^2-1, and Sin(3*x)=3*Sin(x)-4*Sin(x)^3 are such transformations. It may be faster to perform a number of such transformations before evaluating the Taylor series, if the time saved by its quicker convergence is more than the time needed to perform the transformations. The optimal number of transformations can be estimated. Using this technique in principle reduces the cost of Taylor series from O(Sqrt(N)) to O(N^(1/3)) long multiplications. However, additional roundoff error may be introduced by this procedure for some x.


Using asymptotic series for calculations

Several important analytic functions have asymptotic series expansions. For example, the error function Erf(x) and Euler's Gamma function Gamma(x) have asymptotic expansions at large (positive) x:

Erf(x)=1-e^(-x^2)/(x*Sqrt(Pi))*(1-1/(2*x^2)+...+(2*n-1)!! /(-2*x^2)^n+...),

Ln(Gamma(x))=(x-1/2)*Ln(x)-x+Ln(2*Pi)/2

+Sum(n,1,Infinity,B[2*n]/(2*n*(2*n-1)*x^(2*n-1)))

(here B[k] are Bernoulli numbers).

Usually, an asymptotic series does not actually converge for any x. This can be seen in the examples above. For instance, in the asymptotic series for Erf(x) the n-th term has (2*n-1)!! in the numerator, which grows faster than the n-th power of any fixed number. So for any fixed x, the terms of the series decrease at first but eventually start to grow.

The way to use an asymptotic series for a numerical calculation is to truncate the series well before the terms start to grow. The error of this approximation is usually not greater than the first discarded term.

The inherent limitation of the method of asymptotic series is that for a given x, there will be a certain place in the series where the term has the minimum absolute value (after that, the series is unusable), and the error of the approximation cannot be smaller than that term.

For example, take the above asymptotic series for Erf(x). The logarithm of the absolute value of the n-th term can be estimated using Stirling's formula for the factorial as

Ln((2*n-1)!! /(2*x^2)^n)<=>n*(Ln(n)-1-2*Ln(x)).

This function of n has its minimum at n=x^2 where it is equal to -x^2. Therefore the best we can do with this series is to truncate it before this term. The resulting approximation to Erf(x) will have relative precision of order Exp(-x^2).

We find that for a given finite x, no matter how large, there is a maximum precision that can be achieved with the asymptotic series; getting more precision requires a different method.
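
A Python sketch of this truncation rule for the complementary error function Erfc(x)=1-Erf(x) (the series is summed until its terms stop decreasing):

import math

def erfc_asymptotic(x):
    # Erfc(x) from the asymptotic series Exp(-x^2)/(x*Sqrt(Pi))*(1 - 1/(2*x^2) + ...),
    # truncated just before the smallest term (n about x^2), so the achievable
    # relative precision is only of order Exp(-x^2).
    prefactor = math.exp(-x * x) / (x * math.sqrt(math.pi))
    total, term, n = 1.0, 1.0, 0
    while True:
        next_term = term * (-(2 * n + 1)) / (2 * x * x)
        if abs(next_term) >= abs(term):       # the terms have started to grow: stop
            break
        term = next_term
        total += term
        n += 1
    return prefactor * total

# erfc_asymptotic(3.0) agrees with math.erfc(3.0) to roughly x^2/Ln(10) digits, i.e. about 4.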

However, sometimes the function we are evaluating allows identity transformations that relate f(x) to f(y) with y>x. For example, the Gamma function satisfies x*Gamma(x)=Gamma(x+1). In this case we can transform the function so that we would need to evaluate it at large enough x for the asymptotic series to give us enough precision.


The AGM sequence algorithms

Several algorithms are based on the arithmetic-geometric mean (AGM) sequence. If one takes two numbers a, b and computes their arithmetic mean and their geometric mean, (a+b)/2 and Sqrt(a*b), then the two means are generally much closer to each other than the original numbers. Repeating this process creates a rapidly converging sequence.

More formally, one can define the function of two arguments AGM(x,y) as the limit of the sequence a[k] where a[k+1]=1/2*(a[k]+b[k]), b[k+1]=Sqrt(a[k]*b[k]), and the initial values are a[0]=x, b[0]=y. (The limit of the sequence b[k] is the same.) This function has the obvious scaling property AGM(c*x,c*y)=c*AGM(x,y), so in principle it is enough to compute AGM(1,x), or one can select c arbitrarily for convenience.
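
A Python sketch of the AGM iteration itself (with ordinary floats; an arbitrary-precision version would iterate until the first P digits of a[k] and b[k] coincide):

import math

def agm(x, y, tolerance=1e-15):
    # Arithmetic-geometric mean of two positive numbers.
    a, b = x, y
    while abs(a - b) > tolerance * abs(a):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

# agm(1.0, 2.0) is about 1.456791031046907; only a handful of iterations are needed.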

Gauss and Legendre have shown that the limit of the AGM sequence is related to the complete elliptic integral by

Pi/2*1/AGM(a,Sqrt(a^2-b^2))=Integrate(x,0,Pi/2)1/Sqrt(a^2-b^2*Sin(x)^2).

(Here 0<b<a.) This integral can be rearranged to provide some other useful functions. For example, with suitable parameters a and b, this integral is equal to 1. Thus, one obtains a fast method of computing Pi.

The AGM sequence is also defined for complex values a, b. It requires to take a square root Sqrt(a*b), which needs a branch cut to be defined. Selecting the natural cut along the negative real semiaxis (Re(x)<0, Im(x)=0), we obtain an AGM sequence that converges for any initial values x, y with positive real part.

Let us estimate the convergence rate of the AGM sequence starting from x, y, following Brent's paper rpb028 (see reference below). Clearly the worst case is when the numbers x and y are very different (one is much larger than another). In this case the numbers a[k], b[k] become approximately equal after about k=1/Ln(2)*Ln(Abs(Ln(x/y))) iterations (note: Brent's paper online mistypes this as 1/Ln(2)*Abs(Ln(x/y))). Then one needs about Ln(n)/Ln(2) more iterations to make the first n (decimal) digits of a[k] and b[k] coincide, because the relative error epsilon=1-b/a decays approximately as epsilon[k]=1/8*Exp(-2^k).

Unlike the Newton iteration, the AGM sequence does not correct errors, so all elements need to be computed with full precision. Actually, slightly more precision is needed to compensate for accumulated roundoff error. Brent (in paper rpb028) says that O(Ln(Ln(n))) bits of accuracy are lost to roundoff error if there are a total of n iterations.

The AGM sequence can be used for fast computations of Pi, Ln(x) and ArcTan(x). However, currently the limitations of Yacas internal math make these methods less efficient than simpler methods based on Taylor series and Newton iterations.