Powers

This procedure would be recursive, or else it would require obtaining the bits
of the number **n** in reverse order.
It is easier to implement this algorithm in a slightly different form: suppose we have the bits **b[i]** of the number **n**, so that **n=b[0]+2*b[1]+...+b[m]*2^m**. Then

x^n = x^b[0] * (x^2)^b[1] * ... * (x^(2^m))^b[m] ,

i.e. the powers **x^(2^k)** are obtained by successive squaring, and each one is multiplied into the result whenever the corresponding bit **b[k]** is set.

In Yacas script form, the algorithm looks like this:

power(x_IsPositiveInteger, n_IsPositiveInteger) <--
[
  Local(result);
  result := 1;
  While (n != 0)
  [
    if (IsOdd(n))
      result := result*x;
    x := x*x;
    n := n>>1;
  ];
  result;
];

Since the "binary squaring" algorithm scans bits in the number ** n**,
the execution time is linear in the number of
digits of ** n**, i.e. logarithmic in ** n**.

Roots

To compute the integer part of the square root of an integer **N**, the following steps are performed:

- Find the highest set bit, **l2**, in the number **N**. The bit **1<<(l2/2)** is certainly set in the end result; start by setting that bit in the result, **u=1<<(l2/2)**. It is also the highest bit set in the result.
- Now traverse all the lower bits, one by one. For each lower bit, starting at **lnext=l2/2-1**, set **v=1<<lnext**. Now, **(u+v)^2=u^2+2*u*v+v^2**. If **(u+v)^2<=N**, then the bit set in **v** should also be set in the result, **u**; otherwise that bit should be cleared.
- Set **lnext=lnext-1** and repeat until all bits are tested, down to and including **lnext=0**; then return the result found.

The intermediate results **u^2**, **v^2**, and **2*u*v** can be
maintained easily too, due to the nature of the numbers
involved (**v** has only one bit set, and it is known which bit that
is).

For floating-point numbers, first the required number of decimals **p** after
the decimal point is determined. Then the input number **N** is
multiplied by a power of 10 until it has **2*p** decimal digits. Then the integer square root calculation
is performed, and the resulting number has **p** digits of precision.

Below is some **Yacas** script code to perform the calculation for integers.

//sqrt(1) = 1, sqrt(0) = 0
10 # BisectSqrt(0) <-- 0;
10 # BisectSqrt(1) <-- 1;
20 # BisectSqrt(N_IsPositiveInteger) <--
[
  Local(l2,u,v,u2,v2,uv2,n);
  // Find highest set bit, l2
  u := N;
  l2 := 0;
  While (u!=0)
  [
    u := u>>1;
    l2++;
  ];
  l2--;

  // 1<<(l2/2) now would be a good under estimate
  // for the square root. 1<<(l2/2) is definitely
  // set in the result. Also it is the highest
  // set bit.
  l2 := l2>>1;

  // initialize u and u2 (u2==u^2).
  u := 1 << l2;
  u2 := u << l2;

  // Now for each lower bit:
  l2--;
  // Note: bit 0 must be tested as well, so the
  // loop runs while l2 >= 0.
  While( l2 >= 0 )
  [
    // Get that bit in v, and v2 == v^2.
    v := 1<<l2;
    v2 := v<<l2;

    // uv2 == 2*u*v, where 2==1<<1, and
    // v==1<<l2, thus 2*u*v ==
    // (1<<1)*u*(1<<l2) == u<<(l2+1)
    uv2 := u<<(l2 + 1);

    // n = (u+v)^2 = u^2 + 2*u*v + v^2
    //   = u2+uv2+v2
    n := u2 + uv2 + v2;

    // if n (possible new best estimate for
    // sqrt(N)^2) is not larger than N, then the
    // bit l2 is set in the result; add v to u.
    if( n <= N )
    [
      u := u+v;   // u <- u+v
      u2 := n;    // u^2 <- u^2 + 2*u*v + v^2
    ];
    l2--;
  ];
  u; // return result, accumulated in u.
];

A separate function **IntNthRoot** is provided to compute the integer part of **n^(1/s)** for integer **n** and **s**.

Since we only need the integer part of the root, it is enough to use integer division in the Halley iteration. The sequence **x[k]** will monotonically approximate the number **n^(1/s)** from below if we start from an initial guess that is less than the exact value. (We start from below so that we have to deal with smaller integers rather than with larger integers.) If **n=p^s**, then after enough iterations the floating-point value of **x[k]** would be slightly less than **p**; our value is the integer part of **x[k]**. Therefore, at each step we check whether **1+x[k]** is a solution of **x^s=n**, in which case we are done; and we also check whether **(1+x[k])^s>n**, in which case the integer part of the root is **x[k]**.
To speed up the Halley iteration in the worst case when **s^s>n**, it is combined with bisection. The root bracket interval **x1<x<x2** is maintained, and the next iterate **x[k+1]** is set to the midpoint of the interval if the Newton formula does not give sufficiently rapid convergence. The initial root bracket interval can be taken as (**x[0]**, **2*x[0]**).
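To make the root bracketing concrete, below is a minimal bisection-only sketch in Yacas script; **IntNthRootSketch** is a hypothetical name, and the actual **IntNthRoot** routine combines bisection with the Halley iteration as described above.

// Integer part of n^(1/s) by pure bisection.
IntNthRootSketch(n_IsPositiveInteger, s_IsPositiveInteger) <--
[
  Local(x1, x2, xm);
  x1 := 1;      // lower bound: 1^s <= n
  x2 := n;      // upper bound: n^s >= n
  While (x2 - x1 > 1)
  [
    xm := Div(x1 + x2, 2);   // midpoint of the bracket
    if (xm^s <= n)
    [ x1 := xm; ]    // root is in [xm, x2]
    else
    [ x2 := xm; ];   // root is in [x1, xm]
  ];
  x1;   // x1^s <= n < (x1+1)^s
];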

Real powers (as opposed to integer powers and roots) are computed by using the exponential and logarithm functions, **a^b=Exp(b*Ln(a))**.
This is also faster on integers than the integer root algorithm if **b** is very large.

Logarithm

The "integer logarithm", defined as the integer part of **Ln(x)/Ln(b)**, where **x** and ** b** are integers, is computed using a special routine ** IntLog(x, b)** with purely integer math.
This is much faster than evaluating the full logarithm when both arguments are integers and only the integer part of the logarithm is needed. The algorithm consists of (integer) dividing
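A minimal Yacas-script sketch of this procedure (**IntLogSketch** is a hypothetical name; the actual **IntLog** routine may also speed up the search, e.g. by comparing **x** with successive squares of **b**):

// Integer part of Ln(x)/Ln(b) by repeated integer division.
IntLogSketch(x_IsPositiveInteger, b_IsPositiveInteger)_(b > 1) <--
[
  Local(n, q);
  n := 0;
  q := x;
  While (q >= b)
  [
    q := Div(q, b);   // one more factor of b fits into x
    n++;
  ];
  n;   // b^n <= x < b^(n+1)
];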

The logarithm function **Ln(x)** for general (complex) **x** can be computed using its Taylor series, **Ln(1+z)=z-z^2/2+z^3/3-...**, which converges for **Abs(z)<1**.

Currently the routine **LnNum** uses the Halley method for the equation **Exp(x)=a** (the Halley iteration for this equation is **x'=x-2*(Exp(x)-a)/(Exp(x)+a)**, which converges cubically).

A much faster algorithm based on the AGM sequence was given by Salamin (see R. P. Brent: *Multiple-precision zero-finding methods and the complexity of elementary function evaluation*, in *Analytic Computational Complexity*, ed. by J. F. Traub, Academic Press, 1975, p. 151; also available online from the Oxford Computing Laboratory as the paper **rpb028**).
The formula is based on an asymptotic relation, **Ln(x)=Pi/(2*AGM(1,4/x))*(1+O(x^(-2)))**, valid for large **x**.

The required number of AGM iterations is approximately **2*Ln(P)/Ln(2)**. For
smaller values of **x** (but **x>1**), one can either raise **x** to a large
integer power **r** (this is quick only if **x** is an integer or a rational)
and compute **1/r*Ln(x^r)**, or multiply **x** by a large integer power of 2
(this is better for floating-point **x**) and compute **Ln(2^s*x)-s*Ln(2)**.
Here the required powers **r** and **s** are determined by the condition that **x^r** (or **2^s*x**, respectively) must be large enough for the asymptotic relation to give the result with the required precision **P**.

If **x<1**, then (**-Ln(1/x)**) is computed.
Finally, there is a special case when **x** is very close to 1, where the Taylor series converges quickly but the AGM algorithm requires multiplying **x** by a large power of 2 and then subtracting two almost equal numbers, leading to a great loss of precision. Suppose **1<x<1+10^(-M)**, where **M** is large (say of order **P**). The Taylor series for **Ln(1+epsilon)** needs about **N= -P*Ln(10)/Ln(epsilon)=P/M** terms.
If we evaluate the Taylor series using the rectangular scheme, we need
**2*Sqrt(N)** multiplications and **Sqrt(N)** units of storage. On the other
hand, the main slow operation for the AGM sequence is the geometric mean **Sqrt(a*b)**. If **Sqrt(a*b)** takes an equivalent of **c** multiplications (Brent's estimate is **c=13/2**, but it may be more in practice), then the AGM sequence requires **2*c*Ln(P)/Ln(2)** multiplications. Therefore the Taylor series method is more efficient when **2*Sqrt(P/M)<2*c*Ln(P)/Ln(2)**, i.e. for **M>P*(Ln(2)/(c*Ln(P)))^2**.

For larger **x>1+10^(-M)**, the AGM method is more efficient. It is necessary to increase the working precision to **P+M*Ln(2)/Ln(10)** but this does not decrease the asymptotic speed of the algorithm. To compute **Ln(x)** with **P** digits of precision for any **x**, only **O(Ln(P))** long multiplications are required.
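For illustration, the AGM sequence itself can be sketched in Yacas script as follows (**AGMSketch** is a hypothetical name; a real implementation would manage the working precision more carefully). Each step roughly doubles the number of agreeing digits, which is why only **O(Ln(P))** iterations are needed.

// Arithmetic-geometric mean of a and b to about prec digits.
AGMSketch(a, b, prec) :=
[
  Local(x, y, x1);
  x := N(a, prec);
  y := N(b, prec);
  While (Abs(x - y) > 10^(-prec))
  [
    x1 := N((x + y)/2, prec);   // arithmetic mean
    y := N(Sqrt(x*y), prec);    // geometric mean
    x := x1;
  ];
  x;
];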

Exponential

An alternative way to compute **x=Exp(a)** at large precision would be to solve the equation **Ln(x)=a** using a fast logarithm routine. A cubically convergent formula is obtained if we replace **Ln(x)=a** by the equivalent equation **(Ln(x)-a)*Sqrt(x)=0**; Newton's method for this equation gives the iteration **x'=x*(2+a-Ln(x))/(2-a+Ln(x))**, which is Halley's method for the original equation.
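A schematic Yacas-script sketch of this iteration (**ExpByLog** is a hypothetical name; it assumes a fast logarithm routine such as the **LnNum** mentioned above, uses a crude initial guess that is adequate only for small **a**, and omits the careful precision management a real implementation would need):

// Solve Ln(x) = a by the cubically convergent iteration
// x' = x*(2 - d)/(2 + d), where d = Ln(x) - a.
ExpByLog(a, prec) :=
[
  Local(x, d);
  x := N(1 + a, prec);   // initial guess, adequate for small a
  d := 1;
  While (Abs(d) > 10^(-prec))
  [
    d := N(LnNum(x) - a, prec);      // current residual
    x := N(x*(2 - d)/(2 + d), prec); // Halley update
  ];
  x;
];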

Calculation of pi

Efficient iterative algorithms for computing **pi** with arbitrary precision have been developed by Brent, Salamin, Borwein, and others. However, limitations of the current multiple-precision implementation in Yacas (compiled with the "internal" math option) make these advanced algorithms run slower because they require many more arbitrary-precision multiplications at each iteration.

The file **examples/pi.ys** implements five different algorithms that
duplicate the functionality of the internal routine for computing **pi**: three Newton-type iterations, **PiMethod0()**, **PiMethod1()**, and **PiMethod2()**, and two AGM-type algorithms described below.

Since **pi** is a solution of **Sin(x)=0**, one may start sufficiently close, e.g. at **x0=3.14159265**, and iterate **x'=x-Tan(x)**. In fact it is faster to iterate
**x'=x+Sin(x)**, which solves a different equation for **pi**. **PiMethod0()** is the straightforward implementation of the latter iteration. A significant speed improvement is achieved by doing calculations at each iteration only with the precision of the root that we expect to get from that iteration. Any imprecision introduced by round-off will be automatically corrected at the next iteration.

If at some iteration **x=pi+epsilon** for small **epsilon**, then from the Taylor expansion of **Sin(x)** it follows that the value **x'** at the next iteration will differ from **pi** by **O(epsilon^3)**. Therefore, the number of correct digits triples at each iteration. If we know the number of correct digits of **pi** in the initial approximation, we can decide in advance how many iterations to compute and what precision to use at each iteration.
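The precision-tripling schedule can be sketched in Yacas script as follows (**PiSketch** is a hypothetical name, not the actual **PiMethod0()** code):

// Compute pi by the iteration x' = x + Sin(x), tripling the
// working precision at each step.
PiSketch(prec) :=
[
  Local(x, p);
  p := 10;            // correct digits in the initial value
  x := 3.141592653;
  While (p < prec)
  [
    p := 3*p;         // the number of correct digits triples
    x := N(x + Sin(x), Min(p, prec));
  ];
  x;
];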

The final speed-up in **PiMethod0()** is to avoid computing at unnecessarily high precision. This may happen if, for example, we need to evaluate 200 digits of **pi** starting from 10 correct digits: the tripling schedule 10, 30, 90, 270 overshoots the target, so the last iteration should be performed with only the 200 digits that are actually required.

Newton's method is based on approximating the function **f(x)** by a straight line. One can achieve better approximation and therefore faster convergence to the root if one approximates the function with a polynomial curve of higher order. The routine **PiMethod1()** uses a higher-order iteration obtained by keeping several terms of the series **ArcSin(s)=s+s^3/6+3/40*s^5+...** with **s=Sin(x)**; each additional term raises the order of convergence by two.

Both **PiMethod0()** and **PiMethod1()** evaluate **Sin(x)** at every iteration, so their speed is limited by the speed of the sine function.

The routines **PiBrentSalamin()** and **PiBorwein()** implement the fast AGM-based iterations due to Brent and Salamin and to Borwein.

Trigonometric functions

Inverse trigonometric functions are computed by Newton's method (for **ArcSin**) or by continued fraction expansion (for **ArcTan**).

By the identity **ArcCos(x):=Pi/2-ArcSin(x)**, the inverse cosine is reduced to the inverse sine. Newton's method for **ArcSin(x)** consists of solving the equation **Sin(y)=x** for **y**. Implementation is similar to the calculation of **pi** in **PiMethod0()**.
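A minimal Yacas-script sketch of this Newton iteration (**ArcSinNewton** is a hypothetical name; the precision-doubling trick used in **PiMethod0()** is omitted):

// Solve Sin(y) = x for y by Newton's method:
// y' = y - (Sin(y) - x)/Cos(y).
ArcSinNewton(x, prec) :=
[
  Local(y, d);
  y := N(x, prec);   // initial guess, adequate away from x = 1
  d := 1;
  While (Abs(d) > 10^(-prec))
  [
    d := N((Sin(y) - x)/Cos(y), prec);
    y := y - d;
  ];
  y;
];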

For **x** close to 1, Newton's method for **ArcSin(x)** converges very slowly. The identity **ArcSin(x)=Sign(x)*(Pi/2-ArcSin(Sqrt(1-x^2)))** is used in this case to move the argument away from 1.

Inverse tangent can also be related to inverse sine by **ArcTan(x)=ArcSin(x/Sqrt(1+x^2))**.

Hyperbolic and inverse hyperbolic functions are reduced to exponentials and logarithms:
**Cosh(x)=1/2*(Exp(x)+Exp(-x))**, **Sinh(x)=1/2*(Exp(x)-Exp(-x))**, **Tanh(x)=Sinh(x)/Cosh(x)**, **ArcCosh(x)=Ln(x+Sqrt(x^2-1))**, **ArcSinh(x)=Ln(x+Sqrt(x^2+1))**, **ArcTanh(x)=1/2*Ln((1+x)/(1-x))**.

The idea to use continued fraction expansions for **ArcTan** comes from the book by
Jack W. Crenshaw, *Math Toolkit for Real-Time Programming*.

The convergence of the continued fraction expansion of **ArcTan(x)** is indeed better than the convergence of the Taylor series. Namely, the Taylor series converges only for **Abs(x)<1** while the continued fraction converges for all **x**. However, the speed of its convergence is not uniform in **x**; the larger the value of **x**, the slower the convergence. The necessary number of terms of the continued fraction is in any case proportional to the required number of digits of precision, but the constant of proportionality depends on **x**.
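As an illustration, the standard continued fraction **ArcTan(x)=x/(1+x^2/(3+4*x^2/(5+9*x^2/(7+...))))** can be evaluated from the innermost denominator outwards with a fixed number of terms **m**. Below is a Yacas-script sketch; **ArcTanCF** is a hypothetical name, and the required **m** grows with the precision and with **Abs(x)**, as explained above.

// Evaluate m levels of the continued fraction for ArcTan(x),
// folding from the innermost denominator outwards.
ArcTanCF(x, m) :=
[
  Local(k, t);
  t := 2*m + 1;   // innermost denominator
  k := m;
  While (k > 0)
  [
    t := 2*k - 1 + k^2*x^2/t;   // fold in one more level
    k := k - 1;
  ];
  x/t;
];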

This can be understood by the following elementary argument. The difference between two partial continued fractions that differ only by one extra last term can be estimated by

If we compare the rate of convergence of the continued fraction for **ArcTan(x)** with the Taylor series

Factorials and binomial coefficients

There are two tasks related to the factorial: the exact integer calculation and an approximate calculation to some floating-point precision. The factorial of **n** has approximately **n*Ln(n)/Ln(10)** decimal digits, so an exact calculation is practical only for relatively small **n**. In the current implementation, exact factorials for **n>65535** are not computed; instead, an error message advises the user to avoid exact computations of large factorials. For example, **LnGammaNum(n+1)** is able to compute the logarithm of the factorial to a given floating-point precision even for very large **n**.

Exact factorials

A second method uses a binary tree arrangement of the numbers **1**, **2**, ..., **n**, similar to the recursive sorting routine ("merge-sort"). If we denote by **a *** b** the "partial factorial" product **a*(a+1)*...*(b-1)*b**, then the tree method computes **n! =1 *** n** by recursively splitting the range in half, **a *** b=(a *** m)*((m+1) *** b)** with **m=Floor((a+b)/2)**, so that factors of comparable size are multiplied together (see the sketch below).

Therefore, the tree method wins over the simple method if the cost of multiplication is lower than quadratic.
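A minimal Yacas-script sketch of the tree method (the names **PartProd** and **TreeFactorial** are hypothetical, not the actual Yacas routines):

// "Partial factorial" a*(a+1)*...*b by binary splitting.
10 # PartProd(_a, _b)_(b - a < 4) <--
[
  Local(r, i);
  r := a;
  i := a + 1;
  While (i <= b)
  [
    r := r*i;
    i := i + 1;
  ];
  r;
];
20 # PartProd(_a, _b) <--
[
  Local(m);
  m := Div(a + b, 2);
  // split the range in half so that factors of comparable
  // size are multiplied together
  PartProd(a, m) * PartProd(m + 1, b);
];
TreeFactorial(n) := PartProd(1, n);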

The tree method can also be used to compute "double factorials" (**n!!**). This is faster than using the identities
**(2*n)!! =2^n*n!** and **(2*n-1)!! =(2*n)! /(2^n*n!)**.

Binomial coefficients **Bin(n,m)** are found by first selecting the smaller of **m**, **n-m** and using the identity **Bin(n,m)=Bin(n,n-m)**. Then a partial factorial is used to compute **Bin(n,m)=((n-m+1) *** n)/m!** (with **m** the smaller of the two).
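A corresponding sketch for binomial coefficients, building on the **PartProd** sketch above (**BinSketch** is a hypothetical name):

// Bin(n,m) via a partial factorial over the smaller range.
BinSketch(n_IsPositiveInteger, m_IsPositiveInteger)_(m <= n) <--
[
  Local(k);
  k := Min(m, n - m);
  if (k = 0)
  [ 1; ]
  else
  [ PartProd(n - k + 1, n) / k!; ];
];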

Approximate factorials

Classical orthogonal polynomials: general case

In principle, one could choose any (non-negative) weight function **rho(x)** and any interval [**a**, **b**] and construct the corresponding family of orthogonal polynomials **q_n(x)**.
For example, take **q_0=1**, then take **q_1=x+c** with unknown **c** and find **c** such that **q_0** and **q_1** satisfy the orthogonality condition;
this requires solving a linear equation.
Then we can similarly find two unknown coefficients of **q_2**, and so on.
(This is called the Gram-Schmidt orthogonalization procedure.)

But of course not all weight functions **rho(x)** and not all intervals [**a**, **b**] are equally interesting.
There are several "classical" families of orthogonal polynomials that have been of use to theoretical and applied science.
The "classical" polynomials are always solutions of a simple second-order differential equation and are always a specific case of some hypergeometric function.

The construction of "classical" polynomials always follows the same scheme.
The function **rho(x)** must satisfy the so-called Pearson equation,

(Deriv(x)rho(x))/rho(x) = alpha(x)/beta(x) ,

where the polynomials **alpha(x)=alpha_0+alpha_1*x** and **beta(x)=beta_0+beta_1*x+beta_2*x^2** are of degree at most 1 and at most 2 respectively, and the product **rho(x)*beta(x)** must vanish at the endpoints **a**, **b** of the interval.

If the function **rho(x)** and the interval [**a**, **b**] are chosen in this way, then the corresponding orthogonal polynomials **q_n(x)** are solutions of the differential equation

(Deriv(x)(beta(x)*rho(x)*(Deriv(x)q_n(x)))) - n*(alpha_1+(n+1)*beta_2)*rho(x)*q_n(x) = 0 .

The polynomials **q_n(x)** are also given by the Rodrigues formula,

q_n(x) = A[n]/rho(x)*(D(x,n) rho(x)*beta(x)^n) ,

where **A[n]** is a normalization constant.
It is usual practice to normalize the polynomials so that

(Integrate(x,a,b) (q_n(x))^2*rho(x)) = 1 .

So the normalization constant can be chosen accordingly.

Finally, there is a formula for the generating function of the polynomials,

G(x,w) = Sum(n,0,Infinity, q_n(x)/n! *w^n) ,

where **q_n(x)** here denotes the non-normalized polynomials given by the Rodrigues formula with **A[n]=1**, i.e.

q_n(x) = 1/rho(x)*(D(x,n) rho(x)*beta(x)^n) .

The classical families of (normalized) orthogonal polynomials are obtained in this framework with the following definitions:

- The Legendre polynomials **OrthoP(n,x)**: **a= -1**, **b=1**, **rho(x)=1**, **alpha(x)=0**, **beta(x)=1-x^2**, **A[n]=(-1)^n/(2^n*n!)**.

- The Laguerre polynomials **(L^m)_n(x)**: **a=0**, **b= +Infinity**, **rho(x)=x^m*Exp(-x)** (here **m> -1**, or else the weight function is not integrable on the interval), **alpha(x)=m-x**, **beta(x)=x**, **A[n]=1/n!**.

- The Hermite polynomials **OrthoH(n,x)**: **a= -Infinity**, **b= +Infinity**, **rho(x)=Exp(-x^2)**, **alpha(x)= -2*x**, **beta(x)=1**, **A[n]=(-1)^n**.

- The Chebyshev polynomials of the first kind **OrthoT(n,x)**: **a= -1**, **b=1**, **rho(x)=1/Sqrt(1-x^2)**, **alpha(x)=x**, **beta(x)=1-x^2**, **A[n]=(-1)^n/(2*n-1)!!**.

Neither the Rodrigues formula nor the generating function is an efficient way to calculate the polynomials.
A better way is to use linear recurrence relations connecting **q[n+1]** with **q[n]** and **q[n-1]**.
(These recurrence relations can also be written out in full generality through **alpha(x)** and **beta(x)**, but we shall omit them here to save space.)

There are three computational tasks related to orthogonal polynomials:

- Compute the coefficients of a given polynomial **q_n(x)** exactly. The coefficients are usually rational numbers, but the numerators and denominators usually grow exponentially quickly with **n**, so that the **n**-th polynomial has at least some coefficients that are **n**-digit numbers. The array of coefficients can be obtained by applying the recurrence relations **n** times. The required number of operations is proportional to **n^2** (because there are **n** coefficients, with **n** recurrence steps for each of them) and to the multiplication time of **n**-digit integers, i.e. **O(n^2*M(n))**. Sometimes an exact formula for the coefficients is available (this is the case for the four "classical" families of polynomials above; see the next section). Then the computation time drops dramatically to **O(n^2)** because no recurrences are needed.
- Compute the numerical value of a given polynomial **q_n(x)** at a given **x**, either exactly (at rational **x**) or approximately in floating point. This requires only **O(n*M(n))** operations for an exact computation and **O(n*P)** operations in **P**-digit floating point.
- Compute a series of orthogonal polynomials with given coefficients **f[n]**, i.e. **f(x):=Sum(n,0,N,f[n]*q_n(x))**, at a given **x**. This task does not actually require computing the polynomials first, if we use the so-called Clenshaw-Smith procedure, which gives the value of **f(x)** directly in **n** iterations. (See the next section for more explanations.) The number of operations is again **O(n*P)** because only "short" multiplications and divisions are needed.

In the next section we shall give some formulae that allow one to calculate particular polynomials more efficiently.

Classical orthogonal polynomials: special cases

There is a way to implement this method without recursion.
(For the Chebyshev polynomials **OrthoT(n,x)**, the method is based on the pair of recurrence relations **OrthoT(2*n,x)=2*OrthoT(n,x)^2-1** and **OrthoT(2*n+1,x)=2*OrthoT(n+1,x)*OrthoT(n,x)-x**, which halve the index at each step.)
The idea is to build the sequence of numbers **n[1]**, **n[2]**, ... that are needed to compute **OrthoT(n,x)**.

For example, to compute **OrthoT(19,x)** using the second recurrence relation, we need **OrthoT(10,x)** and **OrthoT(9,x)**.
We can write this chain symbolically as **19<>c(9,10)**.
For **OrthoT(10,x)** we need only **OrthoT(5,x)**.
This we can write as **10<>c(5)**.
Similarly we find: **9<>c(4,5)**.
Therefore, we can find both **OrthoT(9,x)** and **OrthoT(10,x)** if we know **OrthoT(4,x)** and **OrthoT(5,x)**.
Eventually we find the following chain of pairs: **1**, **c(1,2)**, **c(2,3)**, **c(4,5)**, **c(9,10)**, and finally **19** itself.

There are about **2*Ln(n)/Ln(2)** elements in the chain that leads to the number **n**.
We can generate this chain in a straightforward way by examining the bits in the binary representation of **n**.
Therefore, this method requires no extra storage and takes time logarithmic in **n**.
A recursive routine would also take logarithmic time, but it would in addition require logarithmic storage space.
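For illustration, the chain of pairs can be generated by a short recursive routine that computes the pair {**OrthoT(k,x)**, **OrthoT(k+1,x)**} from the pair with the halved index; both the time and the recursion depth are then logarithmic in **k**. This Yacas-script sketch uses the recurrences quoted above, and **ChebPair** is a hypothetical name:

// Returns the pair {T_k(x), T_(k+1)(x)} using
// T(2m) = 2*T(m)^2 - 1 and T(2m+1) = 2*T(m+1)*T(m) - x.
10 # ChebPair(0, _x) <-- {1, x};
20 # ChebPair(k_IsPositiveInteger, _x) <--
[
  Local(p, a, b, result);
  p := ChebPair(Div(k, 2), x);   // {T_m, T_(m+1)}, m = Floor(k/2)
  a := p[1];
  b := p[2];
  if (IsEven(k))
  [ result := {2*a^2 - 1, 2*a*b - x}; ]    // {T_(2m), T_(2m+1)}
  else
  [ result := {2*a*b - x, 2*b^2 - 1}; ];   // {T_(2m+1), T_(2m+2)}
  result;
];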

Note that using these recurrence relations we do not obtain any individual coefficients of the Tschebyscheff polynomials.
This method does not seem very useful for symbolic calculations (with symbolic **x**), because the resulting expressions are rather complicated combinations of nested products.
It is difficult to expand such an expression into powers of **x** or to manipulate it in any other way, except to compute a numerical value.
However, these fast recurrences are numerically unstable, so numerical values need to be evaluated with extended working precision.
Currently this method is not used in Yacas, despite its speed.

Coefficients for the Legendre, Hermite, Laguerre, and Chebyshev polynomials can be obtained by explicit formulae. This is faster than using the recurrences if we need the entire polynomial symbolically, but still slower than the recurrences for numerical calculations.

- Tschebyscheff polynomials of the first and of the second kind:
If **k=Floor(n/2)**, then
**OrthoT(n,x)=n/2*Sum(i,0,k,(-1)^i*(2*x)^(n-2*i)/(n-i)*Bin(n-i,i)),**
**OrthoU(n,x)=Sum(i,0,k,(-1)^i*(2*x)^(n-2*i)*Bin(n-i,i)).**
Here it is assumed that **n>0** (the case **n=0** must be done separately). The summation is over integer values of **i** such that **0<=2*i<=n**, regardless of whether **n** is even or odd.

- Hermite polynomials:
For even **n=2*k** where **k>=0**,
**OrthoH(n,x)=(-2)^k*(n-1)!! *Sum(i,0,k,(x^(2*i)*(-4)^i*k!)/((2*i)! *(k-i)!)),**
and for odd **n=2*k+1** where **k>=0**,
**OrthoH(n,x)=2*(-2)^k*n!! *Sum(i,0,k,(x^(2*i+1)*(-4)^i*k!)/((2*i+1)! *(k-i)!)).**

- Legendre polynomials:
If **k=Floor(n/2)**, then
**OrthoP(n,x)=2^(-n)*Sum(i,0,k,(-1)^i*x^(n-2*i)*Bin(n,i)*Bin(2*n-2*i,n)).**
The summation is over integer values of **i** such that **0<=2*i<=n**, regardless of whether **n** is even or odd.

- Laguerre polynomials:
**OrthoL(n,a,x)=Sum(i,0,n,(-x)^i/i! *Bin(n+a,n-i)).**
Here the binomial coefficient must be defined for non-integer **a** through the Gamma function instead of factorials, which gives
**Bin(n+a,n-i)=((n+a)*...*(i+1+a))/(n-i)!.**
The result is a rational function of **a**.

In all formulae for the coefficients, there is no need to compute factorials every time:
the next coefficient can be obtained from the previous one by a few short multiplications and divisions.
Therefore this computation costs **O(n^2)** short operations.
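For example, the coefficients of the Legendre polynomials can be generated in this incremental way (a Yacas-script sketch; **LegendreCoeffs** is a hypothetical name):

// Coefficients of OrthoP(n,x): returns {c[0], c[1], ...} where
// c[i] is the coefficient of x^(n-2*i), each obtained from the
// previous one by short multiplications and divisions.
LegendreCoeffs(n_IsPositiveInteger) <--
[
  Local(c, coeffs, i);
  c := Bin(2*n, n)/2^n;   // leading coefficient (i = 0)
  coeffs := {c};
  i := 0;
  While (2*(i + 1) <= n)
  [
    // ratio c[i+1]/c[i] follows from the explicit formula above
    c := -c*(n - i)*(n - 2*i)*(n - 2*i - 1)/
          ((i + 1)*(2*n - 2*i)*(2*n - 2*i - 1));
    coeffs := Append(coeffs, c);
    i := i + 1;
  ];
  coeffs;
];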

Series of orthogonal polynomials

Suppose a family of functions **q_n(x)**,
**n=0**, 1, ..., satisfies known recurrence relations of the form

q_n(x) = A_n(x)*q_(n-1)(x) + B_n(x)*q_(n-2)(x) ,

where **A_n(x)** and **B_n(x)** are some known functions and **q_0(x)** and **q_1(x)** are known.

The procedure goes as follows. To compute **f(x)=Sum(n,0,N,f[n]*q_n(x))**, first compute the auxiliary sequence **X[n]** by the backward recurrence **X[N+1]=X[N+2]=0**, **X[n]=f[n]+A_(n+1)(x)*X[n+1]+B_(n+2)(x)*X[n+2]**; then the answer is **f(x)=X[0]*q_0(x)+X[1]*(q_1(x)-A_1(x)*q_0(x))**.

Luke's book warns that the recurrence relation for **X[n]** is not always numerically stable.

Note that in the book there seems to be some confusion as
to how the coefficient **A[1]** is defined.
(It is not defined explicitly there.)
Our final formula differs from the book for this reason.

The Clenshaw-Smith procedure is analogous to the Horner scheme for calculating polynomials.
This procedure can also be generalized for linear recurrence relations having
more than two terms.
The functions **q_0(x)**, **q_1(x)**, **A_n(x)**, and **B_n(x)**
do not actually have to be polynomials for this to work.
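To make the procedure concrete, here is a Yacas-script sketch for the particular case of the Tschebyscheff polynomials **OrthoT(n,x)**, for which **A_n(x)=2*x** and **B_n(x)= -1** (**ChebSum** is a hypothetical name; note that Yacas lists are 1-indexed, so the coefficient **f[n]** is stored in the list element **n+1**):

// Clenshaw-Smith summation of Sum(n,0,N, f[n]*OrthoT(n,x)).
ChebSum(f_IsList, _x) <--
[
  Local(n, X1, X2, t);
  X1 := 0;   // X[n+1]
  X2 := 0;   // X[n+2]
  n := Length(f) - 1;   // the highest degree N
  While (n >= 0)
  [
    t := f[n+1] + 2*x*X1 - X2;   // X[n] = f[n] + A*X[n+1] + B*X[n+2]
    X2 := X1;
    X1 := t;
    n := n - 1;
  ];
  // f(x) = X[0]*q_0 + X[1]*(q_1 - A_1*q_0) = X[0] - x*X[1]
  X1 - x*X2;
];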