The adaptive plotting routine Plot2D'adaptive uses a simple algorithm to select the optimal grid to approximate a function f(x). The same algorithm for adaptive grid refinement could be used for numerical integration. The idea is that plotting and numerical integration require the same kind of detailed knowledge about the behavior of the function.
The algorithm first splits the interval into a specified initial number of equal subintervals, and then repeatedly splits each subinterval in half until the function is well enough approximated by the resulting grid. The integer parameter depth gives the maximum number of binary splittings for a given initial interval; thus, at most 2^depth additional grid points will be generated. The function Plot2D'adaptive should return a list of pairs of points {{x1,y1}, {x2,y2}, ...} to be used directly for plotting.
The adaptive bisection algorithm works like this:
This algorithm works well if the initial number of points and the depth parameter are large enough.
Singularities in the function are handled by step 3. Namely, the algorithm checks whether the function returns a non-number (e.g. Infinity) and if so, the sign change is always considered to be "too rapid". Thus, the intervals immediately adjacent to the singularity will be plotted at the highest allowed refinement level. When plotting the resulting data, the singular points are simply not printed to the data file, so the plotting programs do not encounter any problems.
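To make this concrete, here is a minimal sketch (in Python rather than Yacas script) of adaptive grid refinement in the spirit of Plot2D'adaptive. The refinement criterion used here, comparing the midpoint value with the linear interpolation of the endpoint values against a tolerance tol, is an assumption chosen for brevity and is not the actual criterion of Plot2D'adaptive; the recursive bisection, the depth limit, and the treatment of non-numeric values follow the description above.

    import math

    def adaptive_plot(f, a, b, initial=8, depth=5, tol=1e-3):
        """Return a list of (x, f(x)) pairs suitable for plotting f on [a, b]."""
        def is_bad(y):                        # a non-number: Infinity, NaN, or an error
            return y is None or math.isinf(y) or math.isnan(y)

        def safe(x):
            try:
                return float(f(x))
            except (ValueError, ZeroDivisionError, OverflowError):
                return None

        def refine(x1, y1, x2, y2, d):
            xm = (x1 + x2) / 2
            ym = safe(xm)
            # A singularity, or a change that is still "too rapid", forces further splitting.
            too_rapid = (is_bad(y1) or is_bad(y2) or is_bad(ym)
                         or abs(ym - (y1 + y2) / 2) > tol * (abs(y1) + abs(y2) + 1))
            if d > 0 and too_rapid:
                return (refine(x1, y1, xm, ym, d - 1)
                        + [(xm, ym)]
                        + refine(xm, ym, x2, y2, d - 1))
            return [(xm, ym)]

        xs = [a + (b - a) * k / initial for k in range(initial + 1)]
        ys = [safe(x) for x in xs]
        points = [(xs[0], ys[0])]
        for i in range(initial):
            points += refine(xs[i], ys[i], xs[i + 1], ys[i + 1], depth)
            points += [(xs[i + 1], ys[i + 1])]
        # Singular points are simply dropped from the returned data, as described above.
        return [(x, y) for x, y in points if not is_bad(y)]

    # Example: 1/x on [-1, 1] is refined near the singularity, and the point x=0
    # (where 1/x is not a number) is omitted from the returned data.
    data = adaptive_plot(lambda x: 1 / x, -1.0, 1.0)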
The meaning of Newton-Cotes quadrature coefficients is that an integral is approximated as
The solution of this system always exists and gives quadrature coefficients as rational numbers. For example, the Simpson quadrature c[0]=1/6, c[1]=2/3, c[2]=1/6 is obtained with n=2.
In the same way it is possible to find quadratures for the integral over a subinterval rather than over the whole interval of x. In the current implementation of the adaptive plotting algorithm, two quadratures are used: the 3-point quadrature (n=2) and the 4-point quadrature (n=3) for the integral over the first subinterval, Integrate(x,a[0],a[1])f(x). Their coefficients are (5/12, 2/3, -1/12) and (3/8, 19/24, -5/24, 1/24).
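As an illustration (not taken from the Yacas source), the following Python sketch recovers these coefficients by requiring the quadrature to be exact for the monomials 1, x, ..., x^n on the equidistant grid a[k]=k/n, with the normalization that the integral over the target range is approximated by the length of that range times Sum(c[k]*f(a[k])).

    from fractions import Fraction

    def quadrature_coefficients(n, x_lo, x_hi):
        """Solve for c[0..n] such that
           Integral_{x_lo}^{x_hi} x^p dx = (x_hi - x_lo)*Sum_k c[k]*(k/n)^p
        holds exactly for p = 0..n (Gaussian elimination over rationals)."""
        nodes = [Fraction(k, n) for k in range(n + 1)]
        length = x_hi - x_lo
        # Augmented matrix: one exactness condition per monomial x^p.
        A = [[node**p for node in nodes] +
             [(x_hi**(p + 1) - x_lo**(p + 1)) / ((p + 1) * length)]
             for p in range(n + 1)]
        # Gauss-Jordan elimination with exact rational arithmetic.
        for i in range(n + 1):
            pivot = next(r for r in range(i, n + 1) if A[r][i] != 0)
            A[i], A[pivot] = A[pivot], A[i]
            A[i] = [v / A[i][i] for v in A[i]]
            for r in range(n + 1):
                if r != i and A[r][i] != 0:
                    A[r] = [vr - A[r][i] * vi for vr, vi in zip(A[r], A[i])]
        return [A[i][-1] for i in range(n + 1)]

    # Simpson quadrature (n=2) over the whole interval [0,1]:
    print(quadrature_coefficients(2, Fraction(0), Fraction(1)))
    # -> [Fraction(1, 6), Fraction(2, 3), Fraction(1, 6)]

    # 3-point quadrature over the first subinterval [0, 1/2]:
    print(quadrature_coefficients(2, Fraction(0), Fraction(1, 2)))
    # -> [Fraction(5, 12), Fraction(2, 3), Fraction(-1, 12)]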
A different question is whether a CAS really needs to be able to evaluate, say, 10,000 digits of the value of a Bessel function of some complex argument. It is almost certain that no applied problem of natural sciences would need floating-point computations of special functions with such a high precision. However, arbitrary-precision computations are certainly useful in some mathematical applications; e.g. some numerical identities can be first guessed by a floating-point computation with many digits and then proved (by hand).
Very high precision computations of special functions might be useful in the future. It is however quite clear that computations with moderately high precision (50 or 100 digits) are useful for applied problems. Implementing an efficient algorithm that computes 100 digits of Sin(3/7) already involves many of the issues that would also be relevant for a 10,000 digit computation. Modern algorithms allow evaluations of all elementary functions in time that is asymptotically logarithmic in the number of digits P and linear in the cost of long multiplication (usually denoted M(P)). All special functions can be evaluated in time that is linear in P and in M(P).
Therefore Yacas strives to implement all of its numerical functions in arbitrary precision. All integer or rational functions return exact results, and all floating-point functions return their value with P correct decimal digits. The current value of P is accessed as GetPrecision() and may be changed by Precision(...).
Implementing an arbitrary-precision floating-point computation of a function f(x), such as f(x)=Exp(x), typically needs the following:
Selecting algorithms for computations is the most non-trivial part of the implementation. We want to achieve arbitrarily high precision, so we need to find either a series, or a continued fraction, or a sequence given by an explicit formula, that converges to the function in a controlled way. It is not enough to use a table of precomputed values or an approximation formula that has a limited precision. Recently (1980--2000), interest in arbitrary-precision computations grew and many efficient algorithms for elementary and special functions were published.
In most cases it is imperative to know in advance how many iterations are needed for given x, P. This knowledge allows us to estimate the computational cost, in terms of the required precision P and of the cost of long multiplication M(P). Typically all operations will fall into the following categories (sorted by increasing cost):
Some algorithms (e.g. Taylor series summation) also need extra storage space, typically of order P*Ln(P) (i.e. O(Ln(P)) temporary P-digit numbers).
Here is a worked-out example of how we could estimate the required number of terms in the power series
Our final result is that it is enough to take
As the required precision P grows, an arbitrary-precision algorithm will need more iterations or more terms of the series. So the round-off error introduced by every floating-point operation will increase. When doing arbitrary-precision computations, we can always perform all calculations with a few more digits and compensate for round-off error. It is however imperative to know in advance how many more digits we need to take for our "working precision". We should also take that increase into account when estimating the total cost of the method. Of course, we should avoid very large roundoff errors; "repeat everything with twice as many digits" is not an adequate solution.
Here is a simple estimate of the round-off error in a computation of n terms of a power series. Suppose that adding each term requires two multiplications and one addition. If all calculations are performed with relative precision epsilon=10^(-P), then the accumulated relative round-off error is 3*n*epsilon. (This is an optimistic estimate that needs to be verified in each particular case; what if the series begins with some large terms but converges to a very small value?) If the relative error is 3*n*epsilon, it means that our answer is something like a*(1+3*n*epsilon) where a is the correct answer. We can see that out of the total P digits of this answer, only the first k decimal digits are correct, where k= -Ln(3*n*epsilon)/Ln(10). In other words, we have lost P-k digits because of accumulated roundoff error. So we found that we need P-k=Ln(3*n)/Ln(10) extra decimal digits to compensate for this round-off error.
In the previous exercise we found the number of terms n for Exp(x). So now we know how many extra digits of working precision we need for this particular case.
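For illustration, here is a small Python sketch of both estimates for the Taylor series of Exp(x): the number of terms N needed so that the first discarded term x^N/N! is below 10^(-P), and the corresponding number of extra working digits Ln(3*N)/Ln(10) from the round-off estimate above. The choice x=3/7, P=100 is just an example.

    import math

    def exp_series_terms(x, P):
        """Smallest N such that the term x^N/N! drops below 10^(-P)."""
        log_term, N = 0.0, 0
        while log_term > -P * math.log(10):
            N += 1
            log_term += math.log(abs(x)) - math.log(N)   # log of x^N/N!, built up incrementally
        return N

    P, x = 100, 3.0 / 7
    N = exp_series_terms(x, P)
    extra = math.ceil(math.log10(3 * N))
    print(N, extra)    # the number of terms, and the extra guard digits for round-off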
Below we shall have to perform similar estimates of the required number of terms and of the accumulated round-off error in our analysis of the algorithms.
Warning: calculations with internal Yacas math using precision exceeding 10,000 digits are currently impractically slow.
In some algorithms it is necessary to compute the integer parts of expressions such as a*Ln(b)/Ln(10) or a*Ln(10)/Ln(2), where a, b are short integers of order O(P). Such expressions are frequently needed to estimate the number of terms in the Taylor series or similar parameters of the algorithms. In these cases, it is important that the result is not underestimated, but it would be wasteful to compute Ln(10)/Ln(2) in floating point only to discard most of that information by taking the integer part of, say, 1000*Ln(10)/Ln(2). It is more efficient to approximate such constants from above by short rational numbers, for example, Ln(10)/Ln(2)<28738/8651 and Ln(2)<7050/10171. The error of such an approximation will be small enough for practical purposes. The function NearRational can be used to find optimal rational approximations. The function IntLog (see below) efficiently computes the integer part of a logarithm (for an integer base, not a natural logarithm). If more precision is desired in calculating Ln(a)/Ln(b) for integer a, b, one can compute IntLog(a^k,b) for some integer k and then divide by k.
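A hedged Python sketch of these two tricks is shown below; int_log plays the role of the IntLog function mentioned in the text, and the rational bound 28738/8651 is the one quoted above.

    from fractions import Fraction

    def int_log(n, base):
        """Integer part of log_base(n) for integers n >= 1, base >= 2."""
        e = 0
        power = base
        while power <= n:
            power *= base
            e += 1
        return e

    def binary_digits_upper_bound(P):
        """A safe (never underestimated) bound for P*Ln(10)/Ln(2),
        using the rational upper bound 28738/8651 > Ln(10)/Ln(2)."""
        bound = P * Fraction(28738, 8651)
        return -(-bound.numerator // bound.denominator)      # ceiling

    def log_ratio_approx(a, b, k=100):
        """IntLog(a^k, b)/k approaches log_b(a) from below as k grows."""
        return Fraction(int_log(a**k, b), k)

    print(binary_digits_upper_bound(1000))         # 3322: enough bits for 1000 decimal digits
    print(float(log_ratio_approx(10, 2, 1000)))    # a lower approximation to Ln(10)/Ln(2) = 3.3219...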
The exponent E is easier to obtain:
Once we have found E, we can write x=10^(E+m) where m=Exp(1000)/Ln(10)-E is a floating-point number, 0<m<1. Then M=10^m. To find M with P (decimal) digits, we also need m with at least P digits. Therefore, we actually need to evaluate Exp(1000)/Ln(10) with 434+P decimal digits before we can find P digits of the mantissa of x. In other words, we need a high-precision calculation even to find the first digit of x, because it is necessary to find the exponent E exactly as an integer.
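The following small Python sketch (using the mpmath library, which is an assumption of this illustration and not part of Yacas) shows the point numerically: resolving the integer exponent E of x=Exp(Exp(1000)) already consumes about 434 digits of working precision.

    from mpmath import mp, mpf, exp, log, floor

    mp.dps = 450                      # enough digits to resolve the integer part
    t = exp(mpf(1000)) / log(10)      # Exp(1000)/Ln(10), about 8.55*10^433
    E = int(floor(t))                 # the decimal exponent of x: a 434-digit integer
    m = t - E                         # fractional part, 0 < m < 1
    print(len(str(E)), float(m))      # 434, and the first digits of m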
What about finding Sin(x)? This is extremely difficult, because to find even the first digit of Sin(x) we have to evaluate x with absolute error at most 0.5. We know, however, that the number x has approximately 10^434 digits before the decimal point. Therefore, we would need to calculate x with at least that many digits which is clearly far beyond the capability of modern computers. It seems unlikely that even the sign of Sin(Exp(Exp(1000))) will be obtained in the near future, and even more unlikely that it would be of any use to anybody even if it could be obtained.
Suppose that N is the largest integer that our arbitrary-precision facility can reasonably handle. (For Yacas internal math library, N is about 10^10000.) Then we see that numbers x of order 10^N can be calculated with at most one (1) digit of floating-point precision, while larger numbers cannot be calculated with any precision at all.
It seems that very large numbers can be obtained in practice only through exponentiation or powers. It is unlikely that such numbers will arise from sums or products of reasonably-sized numbers in some formula.
If numbers larger than 10^N are created only by exponentiation operations, then special exponential notation could be used to represent them. For example, a very large number z could be stored and manipulated as an unevaluated exponential z=Exp(M*10^E) where M>=1 is a floating-point number with P digits of mantissa and E is an integer, 0<E<N. Let us call such objects "exponentially large numbers" or "exp-numbers" for short.
In practice, we should decide on a threshold value N and promote a number to an exp-number when its decimal exponent exceeds N.
Note that an exp-number z might be known to be positive or negative, e.g. z= -Exp(M*10^E).
Arithmetic operations can be applied to the exp-numbers, although we should warn that exp-large arithmetic is of limited use because of almost certainly critical loss of precision. The power and logarithm operations can be meaningfully performed on exp-numbers z. For example, if z=Exp(M*10^E) and p is a normal floating-point number, then z^p=Exp(p*M*10^E) and Ln(z)=M*10^E. We can also multiply or divide two exp-numbers. But it makes no sense to multiply an exp-number z by a normal number because we cannot represent the difference between z and say 2.52*z. Similarly, adding z to anything else would result in a total underflow, since we do not actually know a single digit of the decimal representation of z. So if z1 and z2 are exp-numbers, then z1+z2 is simply equal to either z1 or z2 depending on which of them is larger. Therefore, an exp-number z acts as an effective "infinity" compared with normal numbers. But they should not be used as a device for computing limits: the unavoidable underflow will almost certainly produce wrong results. For example, trying to verify
Taking a logarithm of an exp-number brings it back to the realm of normal, representable numbers. However, taking an exponential of an exp-number results in a number which is not representable even as an exp-number. This is because an exp-number z needs to have its exponent E represented exactly as an integer, but Exp(z) has an exponent of order O(z) which is not a representable number. The monstrous number Exp(z) could be only written as Exp(Exp(M*10^E)), i.e. as a "doubly exponentially large" number, or "2-exp-number" for short. Thus we obtain a hierarchy of iterated exp-numbers. Each layer is "unrepresentably larger" than the previous one.
The same considerations apply to very small numbers of the order 10^(-N) or smaller. Such numbers can be manipulated as "exponentially small numbers", i.e. expressions of the form Exp(-M*10^E) with floating-point mantissa M>=1 and integer E satisfying 0<E<N.
Similarly to exponentially large numbers, only some arithmetic operations can be meaningfully performed. Exponentially small numbers act as an effective zero compared with normal numbers.
Taking a logarithm of an exp-small number makes it again a normal representable number. However, taking an exponential of an exp-small number produces 1 because of underflow. To obtain a "doubly exponentially small" number, we need to take a reciprocal of a doubly exponentially large number, or take the exponential of a negative exponentially large number. In other words, Exp(-M*10^E) is exp-small, while Exp(-Exp(M*10^E)) is 2-exp-small.
The practical significance of exp-numbers is rather limited. We cannot obtain even a single significant digit of an exp-number. A "computation" with exp-numbers is essentially a floating-point computation with logarithms of these exp-numbers. A practical problem that needs numbers of this magnitude can probably be restated in terms of more manageable logarithms of such numbers. In practice, exp-numbers could be useful not as a means to get a numerical answer, but as a warning sign of critical overflow or underflow.
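To illustrate the rules stated above, here is a minimal Python sketch of exp-number arithmetic; an exp-number Exp(+/-M*10^E) is stored as its logarithm (M, E, sign), and only the operations identified above as meaningful are provided. This is an illustration, not the representation used inside Yacas.

    import math

    class ExpNumber:
        """Represents Exp(sign * M * 10**E) with 1 <= M < 10 and integer E."""
        def __init__(self, M, E, sign=+1):
            self.M, self.E, self.sign = M, E, sign

        def log(self):                     # Ln(z) = +/- M*10^E, an ordinary float
            return self.sign * self.M * 10.0 ** self.E

        def __pow__(self, p):              # z^p = Exp(p*M*10^E)
            return ExpNumber.from_log(p * self.log())

        def __mul__(self, other):          # z1*z2 = Exp(Ln(z1)+Ln(z2))
            return ExpNumber.from_log(self.log() + other.log())

        def __add__(self, other):          # total underflow: the larger number wins
            return self if self.log() >= other.log() else other

        @staticmethod
        def from_log(L):                   # normalize so that 1 <= M < 10
            sign = +1 if L >= 0 else -1
            E = int(math.floor(math.log10(abs(L))))
            return ExpNumber(abs(L) / 10.0 ** E, E, sign)

        def __repr__(self):
            return "Exp(%+.6f*10^%d)" % (self.sign * self.M, self.E)

    z = ExpNumber(1.5, 20)        # z = Exp(1.5*10^20), far beyond any normal number
    print(z ** 2, z * z, z + z)   # Exp(3*10^20) twice, and z+z is simply z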
The algorithm for converting a rational number r=n/m into a continued fraction is simple. First, we determine the integer part of r, which is Div(n,m). If r is negative (and not an integer), we need to subtract one from this integer part, so that r=n[0]+x and the remainder x is nonnegative and less than 1. The remainder x=Mod(n,m)/m is then inverted, r[1]:=1/x=m/Mod(n,m), and so we have completed the first step in the decomposition, r=n[0]+1/r[1]; now n[0] is an integer but r[1] is perhaps not an integer. We repeat the same procedure on r[1], obtain the next integer term n[1] and the remainder r[2], and so on, until we reach an n such that r[n] is an integer and there is no more work to do. This process will always terminate because all floating-point values are actually rationals in disguise.
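In Python (as an illustration, not the Yacas routine), the conversion looks like this; floor division automatically performs the "subtract one" adjustment for negative r.

    def continued_fraction(n, m):
        """Return the list of continued fraction terms of the rational n/m."""
        terms = []
        while True:
            q, r = divmod(n, m)   # floor division: q = n[k], r = Mod(n, m) >= 0
            terms.append(q)
            if r == 0:            # the remainder is zero: r[k] was an integer
                return terms
            n, m = m, r           # invert the remainder: r[k+1] = m / Mod(n, m)

    print(continued_fraction(17, 3))    # [5, 1, 2]
    print(continued_fraction(130, 83))  # [1, 1, 1, 3, 3, 1, 2]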
Continued fractions are useful in many ways. For example, if we know that a certain number x is rational but have only a floating-point representation of x with a limited precision, say, 1.5662650602409638, we can try to guess its rational form (in this example x=130/83). The function GuessRational uses continued fractions to find a rational number with "optimal" (small) numerator and denominator that is approximately equal to a given floating-point number.
Consider the following example. The number 17/3 has a continued fraction expansion {5,1,2}. Evaluated as a floating point number with limited precision, it may become something like 17/3+0.00001, where the small number represents a roundoff error. The continued fraction expansion of this number is {5, 1, 2, 11110, 1, 5, 1, 3, 2777, 2}. The presence of an unnaturally large term 11110 clearly signifies the place where the floating-point error was introduced; all terms following it should be discarded to recover the continued fraction {5,1,2} and from it the initial number 17/3.
If a continued fraction for a number x is cut right before an unusually large term and evaluated, the resulting rational number is very close to x but has an unusually small denominator. This works because partial continued fractions provide "optimal" rational approximations for the final (irrational) number, and because the magnitude of the terms of the partial fraction is related to the magnitude of the denominator of the resulting rational approximation.
GuessRational(x, prec) needs to choose the place where it should cut the continued fraction. The algorithm for this is somewhat less precise than it could be but it works well enough. The idea is to cut the continued fraction when adding one more term would change the result by less than the specified precision. To realize this in practice, we need an estimate of how much a continued fraction changes when we add one term.
The routine GuessRational uses a (somewhat weak) upper bound for the difference of continued fractions that differ only by an additional last term:
The above estimate for delta hinges on the inequality
This algorithm works well if x is computed with enough precision; namely, it must be computed to at least as many digits as there are in the numerator and the denominator of the fraction combined. Also, the parameter prec should not be too large (or else the algorithm will find another rational number with a larger denominator that approximates x "better" than the precision to which you know x).
The related function NearRational(x, prec) works somewhat differently. The goal is to find an "optimal" rational number, i.e. with smallest numerator and denominator, that is within the distance 10^(-prec) of a given value x. The algorithm for this comes from the 1972 HAKMEM document, Item 101C. Their description is terse but clear:
Problem: Given an interval, find in it the rational number with the smallest numerator and denominator. Solution: Express the endpoints as continued fractions. Find the first term where they differ and add 1 to the lesser term, unless it's last. Discard the terms to the right. What's left is the continued fraction for the "smallest" rational in the interval. (If one fraction terminates but matches the other as far as it goes, append an infinity and proceed as above.)
The HAKMEM text (M. Beeler, R. W. Gosper, and R. Schroeppel: Memo No. 239, MIT AI Lab, 1972, available as HTML online from various places) contains several interesting insights relevant to continued fractions and other numerical algorithms.
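As an illustration (not the Yacas NearRational implementation), the HAKMEM recipe can be written as a short recursion on integer parts, which is equivalent to comparing the continued fractions of the two endpoints term by term:

    from fractions import Fraction

    def simplest_in_interval(lo, hi):
        """Rational with the smallest denominator in the closed interval [lo, hi]."""
        flo = lo.numerator // lo.denominator       # floor(lo)
        if lo == flo:                              # lo itself is an integer
            return Fraction(flo)
        if flo < hi.numerator // hi.denominator:   # an integer lies inside the interval
            return Fraction(flo + 1)
        # Same integer part: strip it, invert the fractional parts, and recurse
        # (the interval flips under inversion).
        return flo + 1 / simplest_in_interval(1 / (hi - flo), 1 / (lo - flo))

    # Example: the simplest rational within 10^(-4) of 1.5662650602409638 is 130/83.
    x = Fraction("1.5662650602409638")
    print(simplest_in_interval(x - Fraction(1, 10**4), x + Fraction(1, 10**4)))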
In this section we describe two methods that compute a continued fraction: the simple "bottom-up" and the more complicated "top-down" method. The "bottom-up" method is faster but requires knowing the number of terms in advance. The "top-down" method essentially replaces the continued fraction by a sum of a certain series. This method provides an automatic error estimate and can be used to evaluate a continued fraction with more and more terms until the desired precision is achieved. The formula for the precision of the continued fraction approximation used in the "top-down" method sometimes allows one to estimate the number of terms in advance. If we know the number of terms, then the faster "bottom-up" method can be used.
A continued fraction
The "bottom-up" method is to select some value of n and start evaluating the fraction from the bottom upwards. First we take F[n][n]=a[n] and then we use the obvious relation of backward recurrence,
An alternative implementation is to imagine that all a[i], b[i] are integers, and to obtain the numerator and the denominator of F[0][n] separately as two simultaneous backward recurrences. If F[m+1][n]=p[m+1]/q[m+1], then p[m]=a[m]*p[m+1]+b[m]*q[m+1] and q[m]=p[m+1]. This method would start with p[n]=a[n], q[n]=1 and require two long multiplications at each step; the only division will be performed at the very end. Sometimes this method allows one to reduce the roundoff error.
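Here is a minimal Python sketch of the "bottom-up" evaluation in both forms described above: the direct backward recurrence, and the separate numerator/denominator recurrence with a single division at the end.

    from fractions import Fraction

    def bottom_up(a, b):
        """Evaluate a[0] + b[0]/(a[1] + b[1]/(... + b[n-1]/a[n]))
        directly, starting from the bottom."""
        F = a[-1]                        # F[n][n] = a[n]
        for m in range(len(a) - 2, -1, -1):
            F = a[m] + b[m] / F          # F[m][n] = a[m] + b[m]/F[m+1][n]
        return F

    def bottom_up_rational(a, b):
        """Same, but carrying the numerator p and denominator q separately;
        the only division is performed at the very end."""
        p, q = a[-1], 1                  # p[n] = a[n], q[n] = 1
        for m in range(len(a) - 2, -1, -1):
            p, q = a[m] * p + b[m] * q, p
        return Fraction(p, q)

    # Example: the continued fraction 5 + 1/(1 + 1/2) (all b[i] = 1) is 17/3.
    print(bottom_up_rational([5, 1, 2], [1, 1]))    # 17/3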
There is an improvement to this method that can sometimes increase the achieved precision without computing more terms (this is suggested in the Tsimring book). Namely, for the starting value of the recurrence we should choose not a[n] but another number that more closely approximates the infinite remainder of the fraction. The infinite remainder, which can be symbolically written as F[n][Infinity], can sometimes be estimated (obviously, we are not able to compute it exactly). In simple cases, F[n][Infinity] changes very slowly at large n. Suppose that F[n][Infinity] is approximately constant; then it must be approximately equal to F[n+1][Infinity]. Therefore, if we solve the (quadratic) equation
The "bottom-up" method obviously requires to know the number of terms n in advance; calculations have to be started all over again if more terms are needed. Also, it provides no error estimate.
The "top-down" method is slower but provides an error estimate and allows to compute the fraction in the forward direction. The idea
The formula for f[k] is the following. First the auxiliary sequence P[k], Q[k] for k>=1 needs to be defined by P[1]=0, Q[1]=1, and P[k+1]:=b[k]*Q[k], Q[k+1]:=P[k]+a[k]*Q[k]. Then define f[0]:=a[0] and
Note that typically all terms a[i], b[i] are positive, so the series we obtained will have terms f[k] of alternating signs. To reduce the round-off error,
Here is a straightforward derivation of the above formula for f[k]. We need to compute the difference between successive approximations F[0][n] and F[0][n+1]. The recurrence relation we shall use is
Now the problem is to simplify the two long products in the denominator. We notice that F[1][n] has F[2][n] in the denominator, and therefore F[1][n]*F[2][n]=F[2][n]*a[1]+b[1]. The next product is F[1][n]*F[2][n]*F[3][n] and it simplifies to a linear function of F[3][n], namely F[1][n]*F[2][n]*F[3][n] = (b[1]+a[1]*a[2])*F[3][n]+b[1]*a[2]. So we can see that there is a general formula
Having found the coefficients P[k], Q[k], we can now rewrite the long products in the denominator, e.g.
The formula that we derived for f[n+1] gives an error estimate for the continued fraction truncated at the n-th term. But this formula contains numbers Q[k] in the denominator. The problem now is to find how quickly the sequence Q[k] grows.
It is not always easy to get a handle on this sequence; in most cases there is no closed-form expression for Q[k]. The recurrence relation for it can be rewritten as
A simple lower bound on the growth can be quickly obtained from this recurrence relation. Assume that a[k]>0, b[k]>0. It is clear that all Q[k] are positive, and so Q[k+1]>=a[k]*Q[k]. Therefore Q[k] grows at least as the product of all a[k]:
We have successfully used this bound to estimate the relative error of the continued fraction expansion for ArcTan(x) at small x. However, at large x this bound becomes greater than 1. This does not mean that the continued fraction does not converge; it merely indicates that the bound is too weak in some cases. The sequence Q[k] actually grows faster than the product of all a[k], and we sometimes need a tighter bound on this growth.
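For concreteness, here is a hedged Python sketch of the "top-down" scheme described earlier in this section, applied to a continued fraction for ArcTan(x). The explicit formula used for f[k], namely (-1)^(k-1)*b[0]*...*b[k-1]/(Q[k]*Q[k+1]), is one consistent sign and index convention (an assumption of this sketch), and the particular expansion of ArcTan used in the example is the standard one, also assumed here rather than quoted from the text.

    import math

    def top_down(a, b, tol=1e-15, max_terms=10**6):
        """Evaluate a(0) + b(0)/(a(1) + b(1)/(a(2) + ...)) by summing the series
        f[0] + f[1] + ... of differences of successive convergents."""
        total = float(a(0))                       # f[0] = a[0]
        P, Q = 0.0, 1.0                           # P[1] = 0, Q[1] = 1
        b_prod, sign = 1.0, 1.0
        for k in range(1, max_terms):
            P, Q_next = b(k) * Q, P + a(k) * Q    # P[k+1], Q[k+1] (the old P is used)
            b_prod *= b(k - 1)                    # b[0]*b[1]*...*b[k-1]
            f_k = sign * b_prod / (Q * Q_next)    # f[k]; the terms alternate in sign
            total += f_k
            if abs(f_k) < tol:                    # the last term is also the error estimate
                return total
            Q, sign = Q_next, -sign
        raise RuntimeError("continued fraction did not converge")

    # Example: ArcTan(x) = x/(1 + x^2/(3 + 4*x^2/(5 + 9*x^2/(7 + ...)))).
    x = 0.5
    approx = top_down(lambda k: 2 * k - 1 if k else 0.0,
                      lambda k: (k * x) ** 2 if k else x)
    print(approx, math.atan(x))    # the two values should agree closely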
We shall now consider another example when the growth of this sequence can be estimated analytically. The complementary error function Erfc(x) can be computed using an auxiliary continued fraction
The simple bound would give Abs(f[n+1])<=v^n*n! and this expression grows with n. But we know that our continued fraction actually converges for any v and f[n+1] must tend to zero for large n. It seems that the simple bound is not strong enough for any v and we need a better bound.
The asymptotic growth of the sequence Q[k] can be estimated in this case by the method of stationary phase, or Laplace's method. (See, e.g., F. W. J. Olver, Asymptotics and special functions, Academic Press, 1974, ch. 3, sect. 7.5.) This method is somewhat complicated but quite powerful. The method requires that we find an integral representation for Q[k], and then it gives the dominant contribution to the integral as an asymptotic series in k^(-1). We actually only need the leading terms of the series which are easy to find (even without a CAS).
An integral representation for Q[k] can be obtained using the method of generating functions. Consider a function G(s) defined by the infinite series
It is generally the case that if we know a simple linear recurrence relation for a sequence, then we can also easily find its generating function. The generating function will satisfy a linear differential equation. To guess this equation, we write down the series for G(s) and its derivative D(s)G(s) and try to find their linear combination which is identically zero because of the recurrence relation. (There is, of course, a computer algebra algorithm for doing this automatically.)
In the case of our sequence Q[k] above, we find that
The second step is to obtain an integral representation for Q[k]. This is possible because Q[k+1] is equal to the k-th derivative of the generating function at s=0:
Now we use the fact that derivatives of Fourier transforms are "easy". The generating function G(s) has the following Fourier representation,
Now we can use the method of stationary phase. We represent the integrand as an exponential of some function g(t,n) and find "stationary points" where this function has local maxima:
We find that in our case, two peaks occur at approximately t1<=> -1/2+Sqrt(n*v) and t2<=> -1/2-Sqrt(n*v). We assume that n is large enough so that n*v>1/2. Then the first peak is at a positive t and the second peak is at a negative t.
The contribution of the peaks can be computed from the Taylor approximation of g(t,n) near the peaks. We can expand, for example,
We find, after some algebra, that the leading asymptotic of Q[k] is
Note that this is not merely a bound but an actual asymptotic estimate of Abs(f[n+1]). (By the way, Stirling's formula can be also derived using the method of stationary phase from an integral representation of the Gamma function, in a similar way.)
We find that indeed the error decreases with n in a geometric progression (i.e. roughly as some constant to the power of n).
The basic formula is widely known: If f(x)=0 must be solved, one starts with a value of x that is close to some root and iterates
Newton's method is particularly convenient for multiple precision calculations because of its insensitivity to accumulated errors: if x[k] at some iteration is found with a small error, the error will be corrected at the next iteration. Therefore it is not necessary to compute all iterations with the full required precision; each iteration needs to be performed at the precision of the root expected from that iteration. For example, if we know that the initial approximation is accurate to 3 digits, then (assuming quadratic convergence)
However, Newton's method suffers from sensitivity to the initial guess. If the initial value x[0] is not chosen sufficiently close to the root, the iterations may converge very slowly or not converge at all. To remedy this, one can combine Newton's iteration with simple bisection. Once the root is bracketed inside an interval (a, b), one checks whether (a+b)/2 is a better approximation for the root than that obtained from Newton's iteration. This guarantees at least linear convergence in the worst case.
For some equations f(x)=0, Newton's method converges faster; for example, solving Sin(x)=0 in the neighborhood of x=3.14159 gives "cubic" convergence, i.e. the number of correct digits is tripled at each step. This happens because Sin(x) near its root x=Pi has vanishing second derivative and thus the function is particularly well approximated by a straight line.
Halley's method is an improvement over Newton's method, based on transforming the equation so that it is well approximated by a straight line near the root. Edmund Halley computed fractional powers, x=a^(1/n), by the iteration
Halley's method can be generalized to any function f(x). A cubically convergent iteration is always obtained if we replace the equation f(x)=0 by an equivalent equation
The Halley iteration for the equation f(x)=0 can be written as
Halley's iteration, despite its faster convergence rate, may be more cumbersome to evaluate than Newton's iteration and so it may not provide a more efficient numerical method for some functions. Only in some special cases is Halley's iteration just as simple to compute as Newton's iteration. But Halley's method has another advantage: it is generally less sensitive to the choice of the initial point x[0]. An extreme example of sensitivity to the initial point is the equation x^(-2)=12 for which Newton's iteration x'=3*x/2-6*x^3 converges to the root only from initial points 0<x[0]<0.5 and wildly diverges otherwise, while Halley's iteration converges to the root from any x[0]>0.
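Here is a small numerical check (Python) of this claim for x^(-2)=12. The Newton iteration x'=3*x/2-6*x^3 is the one quoted above; the Halley iteration in the form x'=x*(3+12*x^2)/(1+36*x^2) is an algebraic rearrangement assumed by this sketch (obtained from the general Halley formula, which is not reproduced in the text).

    def iterate(step, x, n=60):
        for _ in range(n):
            x = step(x)
        return x

    newton = lambda x: 1.5 * x - 6 * x**3
    halley = lambda x: x * (3 + 12 * x**2) / (1 + 36 * x**2)

    print(12 ** -0.5)               # the positive root, about 0.288675
    print(iterate(newton, 0.3))     # Newton converges from x0 inside (0, 0.5)
    print(iterate(halley, 100.0))   # Halley converges even from a distant x0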
It is at any rate not true that Halley's method always converges better than Newton's method. For instance, it diverges on the equation 2*Cos(x)=x unless started at x[0] within the interval (-1/6*Pi, 7/6*Pi). Another example is the equation Ln(x)=a. This equation allows one to compute x=Exp(a) if a fast method for computing Ln(x) is available (e.g. the AGM-based method). For this equation, Newton's iteration
When it converges, Halley's iteration can still converge very slowly for certain functions f(x), for example, for f(x)=x^n-a if n^n>a. For such functions that have very large and rapidly changing derivatives, no general method can converge faster than linearly. In other words, a simple bisection will generally do just as well as any sophisticated iteration, until the root is approximated relatively precisely. Halley's iteration combined with bisection seems to be a good choice for such problems.
For practical evaluation, iterations must be supplemented with error control. For example, if x0 and x1 are two consecutive approximations that are already very close, we can quickly compute the achieved (relative) precision by finding the number of leading zeros in the number Abs(x0-x1)/Max(x0,x1). This is easily done using the integer logarithm. After performing a small number of initial iterations at low precision, we can make sure that x1 has at least a certain number of correct digits of the root. Then we know which precision to use for the next iteration (e.g. triple precision if we are using a cubically convergent scheme). It is important to perform each iteration at the precision of the root which it will give and not at a higher precision; this saves a great deal of time since multiple-precision calculations quickly become very slow at high precision.
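Here is a hedged Python sketch (using the standard decimal module rather than the Yacas library) of this strategy for the example of Sqrt(a) computed by the Newton iteration x'=(x+a/x)/2: each iteration roughly doubles the number of correct digits, so the working precision is doubled along with it, plus a couple of guard digits.

    from decimal import Decimal, getcontext
    import math

    def newton_sqrt(a, digits):
        """Compute Sqrt(a) to about `digits` decimal digits by the Newton
        iteration x' = (x + a/x)/2, doubling the working precision each step."""
        x = Decimal(repr(math.sqrt(a)))          # initial guess, ~15 digits correct
        a = Decimal(a)
        prec = 15
        while prec < digits:
            prec = min(2 * prec, digits) + 2     # digits double per step, plus guard digits
            getcontext().prec = prec
            x = (x + a / x) / 2                  # one Newton step at just enough precision
        getcontext().prec = digits
        return +x                                # round to the requested precision

    print(newton_sqrt(2, 50))                    # Sqrt(2) to 50 significant digits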
If we do not know in advance how many terms of the Taylor series we need, we cannot do any better than just evaluate each term and check if it is already small enough. So in this case we will have to do O(N) long multiplications. However, we can organize the calculation much more efficiently if we can estimate the necessary number of terms and if we can afford some storage. A "rectangular" algorithm uses 2*Sqrt(N) long multiplications (assuming that the coefficients of the series are short rational numbers) and Sqrt(N) units of storage. (See paper: D. M. Smith, Efficient multiple-precision evaluation of elementary functions, 1985.)
Suppose we need to evaluate Sum(k,0,N,a[k]*x^k) and we know the number of terms N in advance. Suppose also that the coefficients a[k] are rational numbers with small numerators and denominators, so a multiplication a[k]*x is not a long multiplication (usually, either a[k] or the ratio a[k]/a[k-1] is a short rational number). Then we can organize the calculation in a rectangular array with c columns and r rows like this,
The total required number of long multiplications is r+c+Ln(r)-2. The minimum number of multiplications, given that r*c>=N, is around 2*Sqrt(N) at r<=>Sqrt(N)-1/2 (the formula r<=>Sqrt(N-Sqrt(N)) can be used with an integer square root algorithm). Therefore, by arranging the Taylor series in a rectangle of sides r and c, we have obtained an algorithm which costs O(Sqrt(N)) instead of O(N) long multiplications and requires Sqrt(N) units of storage.
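A hedged Python sketch of this arrangement is shown below, using the exponential series a[k]=1/k! as an assumed example and exact rational arithmetic; multiplications of two "long" numbers (here, x and its powers) are counted separately from the cheap short-by-long multiplications by the coefficients. The count comes out close to r+c, in line with the estimate above.

    from fractions import Fraction
    from math import isqrt, factorial

    def rectangular_sum(coeffs, x, c):
        """Evaluate sum(coeffs[k]*x**k) arranged in rows of c terms each.
        Returns (value, number of long multiplications)."""
        long_mults = 0
        # Step 1: the powers x^0 .. x^c; one long multiplication per new power.
        powers = [Fraction(1), x]
        for _ in range(c - 1):
            powers.append(powers[-1] * x)
            long_mults += 1
        xc = powers[c]                        # x^c, used in the Horner step
        # Step 2: row sums need only short-by-long multiplications (not counted).
        rows = [coeffs[i:i + c] for i in range(0, len(coeffs), c)]
        row_sums = [sum(a * powers[j] for j, a in enumerate(row)) for row in rows]
        # Step 3: Horner scheme in x^c over the rows: r-1 long multiplications.
        S = row_sums[-1]
        for R in reversed(row_sums[:-1]):
            S = S * xc + R
            long_mults += 1
        return S, long_mults

    N = 25
    coeffs = [Fraction(1, factorial(k)) for k in range(N + 1)]
    x = Fraction(1, 3)
    c = isqrt(N) + 1                          # near-optimal: c ~ r ~ Sqrt(N)
    S, cost = rectangular_sum(coeffs, x, c)
    print(float(S), cost)                     # about Exp(1/3), with roughly 2*Sqrt(N) long multiplications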
One might wonder if we should not try to arrange the Taylor series in a cube or another multidimensional matrix instead of a rectangle. However, calculations show that this does not save time: the optimal arrangement is the two-dimensional rectangle.
An additional speed-up is possible if the elementary function allows a transformation that reduces x and makes the Taylor series converge faster. For example, Ln(x)=2*Ln(Sqrt(x)), Cos(2*x)=2*Cos(x)^2-1, and Sin(3*x)=3*Sin(x)-4*Sin(x)^3 are such transformations. It may be faster to perform a number of such transformations before evaluating the Taylor series, if the time saved by its quicker convergence is more than the time needed to perform the transformations. The optimal number of transformations can be estimated. Using this technique in principle reduces the cost of Taylor series from O(Sqrt(N)) to O(N^(1/3)) long multiplications. However, additional roundoff error may be introduced by this procedure for some x.
Usually, an asymptotic series does not actually converge for any x. This can be seen in the examples above. For instance, in the asymptotic series for Erf(x) the n-th term has (2*n-1)!! in the numerator which grows faster than n-th power of any number. So if we select a large value of x, the terms of the series decrease at first but then start to grow.
The way to use an asymptotic series for a numerical calculation is to truncate the series well before the terms start to grow. The error of this approximation is usually not greater than the first discarded term.
The inherent limitation of the method of asymptotic series is that for a given x, there will be a certain place in the series where the term has the minimum absolute value (after that, the series is unusable), and the error of the approximation cannot be smaller than that term.
For example, take the above asymptotic series for Erf(x). The logarithm of the absolute value of the n-th term can be estimated using Stirling's formula for the factorial as
We find that for a given finite x, no matter how large, there is a maximum precision that can be achieved with the asymptotic series; getting more precision requires a different method.
However, sometimes the function we are evaluating allows identity transformations that relate f(x) to f(y) with y>x. For example, the Gamma function satisfies x*Gamma(x)=Gamma(x+1). In this case we can transform the function so that we would need to evaluate it at large enough x for the asymptotic series to give us enough precision.
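A tiny Python illustration of this idea for the Gamma function (math.gamma stands in here for an asymptotic series that is accurate only at large arguments):

    import math

    def gamma_shifted(x, n=10):
        """Gamma(x) = Gamma(x+n)/(x*(x+1)*...*(x+n-1)), so a method that is
        accurate only for large arguments can still be used for small x."""
        prod = 1.0
        for k in range(n):
            prod *= x + k                 # x*(x+1)*...*(x+n-1)
        return math.gamma(x + n) / prod   # stand-in for the large-argument series

    print(gamma_shifted(0.5), math.gamma(0.5))   # both about Sqrt(Pi) = 1.7724...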
More formally, one can define the function of two arguments AGM(x,y) as the limit of the sequence a[k] where a[k+1]=1/2*(a[k]+b[k]), b[k+1]=Sqrt(a[k]*b[k]), and the initial values are a[0]=x, b[0]=y. (The limit of the sequence b[k] is the same.) This function scales linearly, AGM(c*x,c*y)=c*AGM(x,y), so in principle it is enough to compute AGM(1,x), or to select c arbitrarily for convenience.
Gauss and Legendre have shown that the limit of the AGM sequence is related to the complete elliptic integral by
The AGM sequence is also defined for complex values a, b. It requires taking a square root Sqrt(a*b), which needs a branch cut to be defined. Selecting the natural cut along the negative real semiaxis (Re(x)<0, Im(x)=0), we obtain an AGM sequence that converges for any initial values x, y with positive real part.
Let us estimate the convergence rate of the AGM sequence starting from x, y, following Brent's paper rpb028 (see reference below). Clearly the worst case is when the numbers x and y are very different (one is much larger than another). In this case the numbers a[k], b[k] become approximately equal after about k=1/Ln(2)*Ln(Abs(Ln(x/y))) iterations (note: Brent's paper online mistypes this as 1/Ln(2)*Abs(Ln(x/y))). Then one needs about Ln(n)/Ln(2) more iterations to make the first n (decimal) digits of a[k] and b[k] coincide, because the relative error epsilon=1-b/a decays approximately as epsilon[k]=1/8*Exp(-2^k).
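The following Python sketch (using the standard decimal module, not the Yacas library) implements the AGM recurrence defined above; printing the gap a[k]-b[k] shows the quadratic convergence, with the number of coinciding digits roughly doubling at each step once the two values are close.

    from decimal import Decimal, getcontext

    def agm(x, y, digits=60):
        """Arithmetic-geometric mean of x and y to about `digits` digits."""
        getcontext().prec = digits + 10              # guard digits against roundoff
        a, b = Decimal(x), Decimal(y)
        eps = Decimal(10) ** (-digits)
        while abs(a - b) > eps:
            a, b = (a + b) / 2, (a * b).sqrt()       # a[k+1]=(a+b)/2, b[k+1]=Sqrt(a*b)
            print("gap ~ 1e%d" % abs(a - b).adjusted())
        getcontext().prec = digits
        return +a                                    # round to the requested precision

    print(agm(1, 2))    # AGM(1,2), about 1.4567910...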
Unlike the Newton iteration, the AGM sequence does not correct errors and all elements need to be computed with full precision. Actually, slightly more precision is needed to compensate for accumulated roundoff error. Brent (in paper rpb028) says that O(Ln(Ln(n))) bits of accuracy are lost to roundoff error if there is a total of n iterations.
The AGM sequence can be used for fast computations of Pi, Ln(x) and ArcTan(x). However, currently the limitations of Yacas internal math make these methods less efficient than simpler methods based on Taylor series and Newton iterations.