The problem of numerical differentiation does not receive very much attention nowadays. Although the Taylor series plays a key role in much of classical analysis, the poor reputation enjoyed by numerical differentiation has led numerical analysts to construct techniques for most problems which avoid the explicit use of numerical differentiation.

One may briefly and roughly define the term numerical differentiation as any process in which numerical values of derivatives

f^{(s)} (x_{0})

are obtained from evaluations

f (x_{i})

of the function

f (x)

at several abscissae

x_{i}

near

x_{0}

. This problem can be stable, well-conditioned, and accurate when complex-valued abscissae are used. This was first pointed out by Lyness and Moler (1967). An item of numerical software for this appears in Lyness and Ande (1971). However, in many applications the use of complex-valued abscissae is either prohibitive or prohibited. The main difficulty in using real abscissae is that amplification of round-off error can completely obliterate meaningful results. In the days when one relied on hand calculating machines and tabular data, the frustration caused by this effect led to the abandonment of serious use of the process.

There are several reasons for believing that, in the present-day computing environment, numerical differentiation might find a useful role. The first is that, by present standards, it is rather a small-scale calculation, so its cost may well be negligible compared with any overall saving in cost that might result from its use. Secondly, the assignment of a step length

h

is now generally open. One does not have to rely on tabular data. Thirdly, although the amplification of round-off error is an integral part of the calculation, its effect can be measured reliably and automatically by the function at the time of the calculation.

Thus you do not have to gauge the round-off level (or noise level) of the function values or assess the effect of this on the accuracy of the results. A function does this automatically, returning with each result an error estimate which has already taken round-off error amplification into account.

We now illustrate, by means of a very simple example, the importance of the round-off error effect. A very simple approximation of

f^{'} (0)

, based on the identity

f^{'} (0) = (f (h) - f (- h)) / 2 h + (h^{2} / 3!) f^{'''} (ξ),

(1)

(f (h) - f (- h)) / 2 h .

If there were no precision problem, this formula would be the only one needed in the theory of first-order numerical differentiation. We could simply take

h = 10^{- 40}

(or

h = 10^{- 1000}

) to obtain an excellent approximation based on two function values. It is only when we consider in detail how a machine with finite precision comes to calculate

(f (h) - f (- h)) / 2 h

that the necessity for a sophisticated theory becomes apparent.

To simplify the subsequent description we shall assume that the quantities involved are neither so close to zero that the machine underflow characteristics need be considered nor so large that machine overflow occurs. The approximation mentioned above involves the function values

f (h)

and

f (- h)

. In general no computer has available these numbers exactly. Instead it uses approximations

\hat{f} (h)

and

\hat{f} (- h)

whose relative accuracy is less than some tolerance

ε_{f}

. If the function

f (x)

is a Library function, for example

\sin x

ε_{f}

may coincide with the machine accuracy parameter

ε_{m}

. More generally the function

f (x)

is calculated in a user-supplied function and

ε_{f}

is larger than

ε_{m}

by a small factor

5

6

if the calculation is short or by some larger factor in an extended calculation. This factor is not usually known by you.

\hat{f} (h)

and

\hat{f} (- h)

are related to

f (h)

and

f (- h)

\begin{array}{l} \hat{f} (h) = f (h) (1 + θ_{1} ε_{f}), & |θ_{1}| \leq 1 \\ \hat{f} (- h) = f (- h) (1 + θ_{2} ε_{f}), & |θ_{2}| \leq 1 . \end{array}

Thus even if the rest of the calculation were carried out exactly, it is trivial to show that

\frac{\hat{f} (h) - \hat{f} (- h)}{2 h} - \frac{f (h) - f (- h)}{2 h} ≃ 2 ϕ ε_{f} \frac{f (ξ)}{2 h}, |ϕ| \leq 1 .

The difference between the quantity actually calculated and the quantity which one attempts to calculate may be as large as

ε_{f} f (ξ) / h

; for example using

h = 10^{- 40}

and

ε_{m} = 10^{- 7}

this gives a ‘conditioning error’ of

10^{33} f (ξ)

In practice much more sophisticated formulae than (1) above are used, and for these and for the corresponding higher-derivative formulae the error analysis is different and more complicated in detail. But invariably the theory contains the same overall feature. In a finite length calculation, the error is composed of two main parts: a discretization error which increases as

h

becomes large and is zero for

h = 0

; and a ‘conditioning’ error due to amplification of round-off error in function evaluation, which increases as

h

becomes small and is infinite for

h = 0

The functions in this chapter have to take into account internally both these sources of error in the results. Thus they return pairs of results,

der [j - 1]

and

erest [j - 1]

where

der [j - 1]

is an approximation to

f^{(j)} (x_{0})

and

erest [j - 1]

is an estimate of the error in the approximation

der [j - 1]

. If the function has not been misled,

der [j - 1]

and

erest [j - 1]

satisfy

|der [j - 1] - f^{(j)} (x_{0})| \leq erest [j - 1] .

In this respect, numerical differentiation functions are fundamentally different from other functions. You do not specify an error criterion. Instead the function provides the error estimate and this may be very large.

We mention here a terminological distinction. A fully automatic function is one in which you do not need to provide any information other than that required to specify the problem. Such a function (at a cost in computing time) decides an appropriate internal parameter such as the step length

h

by itself. On the other hand a function which requires you to provide a step length

h

, but automatically chooses from several different formulae each based on the specified step length, is termed a semi-automatic function.

The situation described above must have seemed rather depressing when hand machines were in common use. For example in the simple illustration one does not know the values of the quantities

f^{'''} (ξ)

ε_{f}

involved in the error estimates, and the idea of altering the value of

h

and starting again must have seemed appalling. However, by present-day standards, it is a relatively simple matter to write a program which carries out all the previously considered time-consuming calculations which may seem necessary. None of the functions in this chapter are particularly revolutionary in concept. They simply utilize the computer for the sort of task for which it was originally designed. It carries out simple tedious calculations which are necessary to estimate the accuracy of the results of other even simpler tedious calculations.

2.2 Examples of Applications for Numerical Differentiation Routines

(a)One immediate use to which a set of derivatives at a point is likely to be put is that of constructing a Taylor series representation:
$f (x) = f (x_{0}) + \sum_{j = 1}^{n} \frac{f^{(j)} (x_{0})}{j!} {(x - x_{0})}^{j} + \frac{f^{(n + 1)} (ξ)}{(n + 1)!} {(x - x_{0})}^{n + 1}, |ξ - x_{0}| \leq x .$
This infinite series converges so long as $|x - x_{0}| < R_{c}$ (the radius of convergence) and it is only for these values of $x$ that such a series is likely to be used. In this case in forming the sum, the required accuracy in $f^{(j)} (x_{0})$ diminishes with increasing $j$ .
The series formed from the Taylor series using elementary operations such as integration or differentiation has the same overall characteristic. A technique based on a Taylor series expansion may be quite accurate, even if the individual derivatives are not, so long as the less accurate derivatives are associated with known small coefficients.

The error introduced by using $n$ approximate derivatives $der [j - 1]$ is bounded by
$\sum_{j = 1}^{n} erest [j - 1] {|x - x_{0}|}^{j} / j!$
Thus, if you are prepared to base the result on a truncated Taylor series, the additional error introduced by using approximate Taylor coefficients can be readily bounded from the values of $erest [j - 1]$ . However, in an automatic code you must be prepared to introduce some alternative approach in case this error bound turns out to be unduly high.

In this sort of application the accuracy of the result depends on the size of the errors in the numerical differentiation. There are other applications where the effect of large errors $erest [j - 1]$ is merely to prolong a calculation, but not to impair the final accuracy.
(b)A simple Taylor series approach such as described in (a) is used to find a starting value for a rapidly converging but extremely local iterative process.
(c)The technique known as ‘subtracting out the singularity’ as a preliminary to numerical quadrature may be extended and may be carried out approximately. Thus suppose we are interested in evaluating
$\int_{0}^{1} x^{- (1 / 2)} ϕ (x) d x,$
we have an automatic quadrature function available, and we have a function available for $ϕ (x)$ which we know to be an analytic function. An integrand function like $x^{- (1 / 2)} ϕ (x)$ is generally accepted to be difficult for an automatic integrator because of the singularity. If $ϕ (x)$ and some of its derivatives at the singularity $x = 0$ are known one may effectively ‘subtract out’ the singularity using the following identity:
$\int_{0}^{1} x^{- (1 / 2)} ϕ (x) d x = \int_{0}^{1} x^{- (1 / 2)} (ϕ (x) - ϕ (0) - A x - B x^{2} / 2) d x + 2 ϕ (0) + 2 A / 3 + B / 5$ (2)
with $A = ϕ^{'} (0)$ and $B = ϕ^{''} (0)$ .
The integrand function on the right of (2) has no singularity, but its third derivative does. Thus using numerical quadrature for this integral is much cheaper than using numerical quadrature for the original integral (in the left-hand side of (2)).

However, (2) is an identity whatever values of $A$ and $B$ are assigned. Thus the same procedure can be used with $A$ and $B$ being approximations to $ϕ^{'} (0)$ and $ϕ^{''} (0)$ provided by a numerical differentiation function. The integrand would now be more difficult to integrate than if $A$ and $B$ were correct but still much less difficult than the original integrand (on the left-hand side of (2)). But, assuming that the automatic quadrature function is reliable, the overall result would still be correct. The effect of using approximate derivatives rather than exact derivatives does not impair the accuracy of the result. It simply makes the result more expensive to obtain, but not nearly as expensive as if no derivatives were used at all.
(d)The calculation of a definite integral may be based on the Euler–Maclaurin expansion
$\int_{0}^{1} f (x) d x = \frac{1}{m} {\sum_{j = 0}^{m}}^{''} f (j / m) - \sum_{s = 1}^{l} \frac{B_{2 s}}{2 s!} \frac{(f^{(2 s - 1)} (1) - f^{(2 s - 1)} (0))}{m^{2 s}} + O (m^{(- 2 l - 2)}) .$
Here $B_{2 s}$ is a Bernoulli number. If one fixes a value of $l$ then as $m$ is increased the right-hand side (without the remainder term) approaches the true value of the integral. This statement remains true whatever values are used to replace $f^{(2 s - 1)} (1)$ and $f^{(2 s - 1)} (0)$ . If no derivatives are available, and this formula is used (effectively with the derivatives replaced by zero) the rate of convergence is slow (like $m^{- 2}$ ) and a large number of function evaluations may be used in calculating the trapezoidal rule sum for large $m$ before a sufficiently accurate result is attained. However, if approximate derivatives are used, the initial rate of convergence is enhanced. In fact, in this example any derivative approximation which is closer than the approximation zero is helpful. Thus the use of inaccurate derivatives may have the effect of shortening the overall calculation, since a sufficiently accurate result may be obtained using a smaller value of $m$ , without impairing the accuracy of the result. (The resemblance with Gregory's formula is superficial. Here $l$ is kept fixed and $m$ is increased, ensuring a convergent process.)
The examples given above are only intended to illustrate the sort of use to which approximate derivatives may be put. Very simple illustrations have been used for clarity, and in such simple cases there are usually more efficient approaches to the problem. The same ideas applied in a more complicated or restrictive setting may provide an efficient approach to a problem for which no simple standard approach exists.

3 Recommendations on Choice and Use of Available Functions

(a)At the present, there is only one numerical differentiation algorithm available in this chapter accessible using direct communication in d04aac, and using reverse communication in d04bac (see Section 7 in How to Use the NAG Library for a description of the difference between these two conventions). These are semi-automatic functions for obtaining approximations to the first fourteen derivatives of a real valued function $f (x)$ at a specified point $x_{0}$ . For d04aac, you must provide a function representing $f (x)$ , the value of $x_{0}$ , an upper limit $n \leq 14$ on the order of the derivatives required and a step length $h$ . For d04bac, you must supply the value of $x_{0}$ , $20$ other abscissae and the function values at those abscissae. Both functions return a set of approximations $der [j - 1]$ and corresponding error estimates $erest [j - 1]$ which hopefully satisfy
$|der [j - 1] - f^{(j)} (x_{0})| \leq erest [j - 1], j = 1, 2, \dots, n \leq 14 .$

It is important that the error estimate $erest [j - 1]$ be taken into consideration before any use of $der [j - 1]$ is made. The actual size of $erest [j - 1]$ depends on the analytic structure of the function, on the computational precision used and on the step size $h$ , and is difficult to predict. Thus you have to run the function to find out how accurate the results are. Usually you will find the higher-order derivatives are successively more inaccurate and that past a certain order, say $10$ or $11$ , the size of $erest [j - 1]$ actually exceeds $der [j - 1]$ . Clearly when this happens the approximation $der [j - 1]$ is unusable.
(b)There is available in the algorithm section of CACM (see Lyness and Ande (1971)) a semi-automatic Fortran function for numerical differentiation of an analytical function $f (z)$ at a possibly complex abscissa $z_{0}$ . This is a stable problem. It can be used for any problem for which d04aac might be used and produces more accurate results, and derivatives of arbitrary high order. However, even if $z_{0}$ is real and $f (z)$ is real for $z$ , this function requires a user-supplied function which evaluates $f (z)$ for complex values of $z$ and it makes use of complex arithmetic.
(c)Routines are available in Chapter E02 to differentiate functions which are polynomials (in Chebyshev series representation) or cubic splines (in B-spline representation).

4 Functionality Index

Generates abscissae for d04bac

d04bbc

Numerical derivatives,

direct communication

d04aac

reverse communication

d04bac

5 Auxiliary Functions Associated with Library Function Arguments

None.

6 Withdrawn or Deprecated Functions

None.

7 References

Lyness J N and Ande G (1971) Algorithm 413, ENTCAF and ENTCRE: evaluation of normalised Taylor coefficients of an analytic function Comm. ACM 14(10) 669–675

Lyness J N and Moler C B (1967) Numerical differentiation of analytic functions SIAM J. Numer. Anal. 4(2) 202–210

NAG Library Manual, Mark 27

Interfaces: FL CL CPP AD

D04 (Numdiff)

Chapter Contents

Chapter Introduction FL CL

NAG CL InterfaceD04 (Numdiff)Numerical Differentiation

▸▿ Contents

1 Scope of the Chapter

2 Background to the Problems