NAG CL Interface
e02bec (dim1_spline_auto)

1 Purpose

e02bec computes a cubic spline approximation to an arbitrary set of data points. The knots of the spline are located automatically, but a single argument must be specified to control the trade-off between closeness of fit and smoothness of fit.

2 Specification

#include <nag.h>
void  e02bec (Nag_Start start, Integer m, const double x[], const double y[], const double weights[], double s, Integer nest, double *fp, Nag_Comm *warmstartinf, Nag_Spline *spline, NagError *fail)
The function may be called by the names: e02bec, nag_fit_dim1_spline_auto or nag_1d_spline_fit.

3 Description

e02bec determines a smooth cubic spline approximation $s(x)$ to the set of data points $(x_r, y_r)$, with weights $w_r$, for $r = 1, 2, \ldots, m$.
The spline is given in the B-spline representation
$$s(x) = \sum_{i=1}^{n-4} c_i N_i(x), \qquad (1)$$
where $N_i(x)$ denotes the normalized cubic B-spline defined upon the knots $\lambda_i, \lambda_{i+1}, \ldots, \lambda_{i+4}$.
The total number $n$ of these knots and their values $\lambda_1, \ldots, \lambda_n$ are chosen automatically by the function. The knots $\lambda_5, \ldots, \lambda_{n-4}$ are the interior knots; they divide the approximation interval $[x_1, x_m]$ into $n-7$ sub-intervals. The coefficients $c_1, c_2, \ldots, c_{n-4}$ are then determined as the solution of the following constrained minimization problem:
minimize
$$\eta = \sum_{i=5}^{n-4} \delta_i^2 \qquad (2)$$
subject to the constraint
$$\theta = \sum_{r=1}^{m} \varepsilon_r^2 \le S, \qquad (3)$$
where $\delta_i$ stands for the discontinuity jump in the third order derivative of $s(x)$ at the interior knot $\lambda_i$,
$\varepsilon_r$ denotes the weighted residual $w_r (y_r - s(x_r))$,
and S is a non-negative number to be specified by you.
The quantity $\eta$ can be seen as a measure of the (lack of) smoothness of $s(x)$, while closeness of fit is measured through $\theta$. By means of the argument S, ‘the smoothing factor’, you can then control the balance between these two (usually conflicting) properties. If S is too large, the spline will be too smooth and signal will be lost (underfit); if S is too small, the spline will pick up too much noise (overfit). In the extreme cases the function will return an interpolating spline ($\theta = 0$) if S is set to zero, and the weighted least squares cubic polynomial ($\eta = 0$) if S is set very large. Experimenting with S values between these two extremes should result in a good compromise. (See Section 9.2 for advice on choice of S.)
The method employed is outlined in Section 9.3 and fully described in Dierckx (1975), Dierckx (1981) and Dierckx (1982). It involves an adaptive strategy for locating the knots of the cubic spline (depending on the function underlying the data and on the value of S), and an iterative method for solving the constrained minimization problem once the knots have been determined.
Values of the computed spline, or of its derivatives or definite integral, can subsequently be computed by calling e02bbc, e02bcc or e02bdc, as described in Section 9.4.
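To make the B-spline representation (1) and the closeness-of-fit measure $\theta$ of (3) concrete, the following is a minimal self-contained sketch in plain C. It is not the code used by e02bec or e02bbc: it evaluates the spline with the textbook de Boor recurrence, and the names bspline_eval and weighted_rss are hypothetical helpers introduced only for this illustration. It assumes n ≥ 8, distinct interior knots, four coincident knots at each end, and x inside the approximation interval.

```c
/* Illustrative sketch only; not part of the NAG Library. */

/* Evaluate s(x) = sum_i c_i N_i(x) for a cubic spline by de Boor's recurrence.
   lamda[0..n-1] : knots, with four coincident knots at each end
   c[0..n-5]     : B-spline coefficients
   Assumes n >= 8 and lamda[3] <= x <= lamda[n-4]. */
double bspline_eval(int n, const double lamda[], const double c[], double x)
{
    /* Locate the knot span j with lamda[j] <= x < lamda[j+1], 3 <= j <= n-5.
       The right end point x = lamda[n-4] is assigned to the last span. */
    int j = 3;
    while (j < n - 5 && x >= lamda[j + 1])
        j++;

    /* Only four B-splines are non-zero on this span. */
    double d[4];
    for (int i = 0; i < 4; i++)
        d[i] = c[j - 3 + i];

    /* Three stages of convex combinations (one per degree). */
    for (int r = 1; r <= 3; r++) {
        for (int i = 3; i >= r; i--) {
            int lo = j - 3 + i;
            double alpha = (x - lamda[lo]) / (lamda[lo + 4 - r] - lamda[lo]);
            d[i] = (1.0 - alpha) * d[i - 1] + alpha * d[i];
        }
    }
    return d[3];
}

/* theta of equation (3): sum of squared weighted residuals w_r (y_r - s(x_r)). */
double weighted_rss(int m, const double x[], const double y[],
                    const double w[], int n, const double lamda[],
                    const double c[])
{
    double theta = 0.0;
    for (int r = 0; r < m; r++) {
        double eps = w[r] * (y[r] - bspline_eval(n, lamda, c, x[r]));
        theta += eps * eps;
    }
    return theta;
}
```

With the knots and coefficients returned in spline, weighted_rss should reproduce the value returned in fp up to rounding; in production code the evaluation itself is better left to e02bbc.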

4 References

Dierckx P (1975) An algorithm for smoothing, differentiating and integration of experimental data using spline functions J. Comput. Appl. Math. 1 165–184
Dierckx P (1981) An improved algorithm for curve fitting with spline functions Report TW54 Department of Computer Science, Katholieke Universiteit Leuven
Dierckx P (1982) A fast algorithm for smoothing data on a rectangular grid while using spline functions SIAM J. Numer. Anal. 19 1286–1304
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

5 Arguments

1: start Nag_Start Input
On entry: must be set to Nag_Cold or Nag_Warm.
start=Nag_Cold
The function will build up the knot set starting with no interior knots. No values need be assigned to the argument spline->n, and memory will be allocated internally to spline->lamda, spline->c, warmstartinf->nag_w and warmstartinf->nag_iw.
start=Nag_Warm
The function will restart the knot-placing strategy using the knots found in a previous call of the function. In this case, all arguments except s must be unchanged from that previous call. This warm start can save much time in searching for a satisfactory value of the smoothing factor S.
Constraint: start=Nag_Cold or Nag_Warm.
2: m Integer Input
On entry: $m$, the number of data points.
Constraint: m ≥ 4.
3: x[m] const double Input
On entry: x[r-1] holds the value $x_r$ of the independent variable (abscissa) $x$, for $r = 1, 2, \ldots, m$.
Constraint: $x_1 < x_2 < \cdots < x_m$.
4: y[m] const double Input
On entry: y[r-1] holds the value $y_r$ of the dependent variable (ordinate) $y$, for $r = 1, 2, \ldots, m$.
5: weights[m] const double Input
On entry: the values $w_r$ of the weights, for $r = 1, 2, \ldots, m$. For advice on the choice of weights, see Section 2.1.2 in the E02 Chapter Introduction.
Constraint: weights[r-1] > 0.0, for $r = 1, 2, \ldots, m$.
6: s double Input
On entry: the smoothing factor, S.
If S=0.0, the function returns an interpolating spline.
If S is smaller than machine precision, it is assumed equal to zero.
For advice on the choice of S, see Sections 3 and 9.2.
Constraint: s ≥ 0.0.
7: nest Integer Input
On entry: an overestimate for the number, $n$, of knots required.
Constraint: nest ≥ 8. In most practical situations, nest=m/2 is sufficient. nest never needs to be larger than m+4, the number of knots needed for interpolation (s=0.0).
8: fp double * Output
On exit: the sum of the squared weighted residuals, θ, of the computed spline approximation. If fp=0.0, this is an interpolating spline. fp should equal s within a relative tolerance of 0.001 unless n=8 when the spline has no interior knots and so is simply a cubic polynomial. For knots to be inserted, s must be set to a value below the value of fp produced in this case.
9: warmstartinf Nag_Comm *
Pointer to structure of type Nag_Comm with the following members:
nag_w double * Input
On entry: if the warm start option is used, the values nag_w[0], ..., nag_w[spline->n-1] must be left unchanged from the previous call.
nag_iw Integer * Input
On entry: if the warm start option is used, the values nag_iw[0], ..., nag_iw[spline->n-1] must be left unchanged from the previous call.
Note that when the information contained in the pointers nag_w and nag_iw is no longer of use, or before a new call to e02bec with the same warmstartinf, you should free this storage using the NAG macro NAG_FREE. This storage will have been allocated only if this function returns with fail.code=NE_NOERROR, NE_SPLINE_COEFF_CONV or NE_NUM_KNOTS_1D_GT.
10: spline Nag_Spline *
Pointer to structure of type Nag_Spline with the following members:
n Integer Input/Output
On entry: if the warm start option is used, the value of n must be left unchanged from the previous call.
On exit: the total number, $n$, of knots of the computed spline.
lamda double * Input/Output
On entry: a pointer to which, if start=Nag_Cold, memory of size nest is internally allocated. If the warm start option is used, the values lamda[0], lamda[1], ..., lamda[n-1] must be left unchanged from the previous call.
On exit: the knots of the spline, i.e., the positions of the interior knots lamda[4], lamda[5], ..., lamda[n-5] as well as the positions of the additional knots lamda[0] = lamda[1] = lamda[2] = lamda[3] = x[0] and lamda[n-4] = lamda[n-3] = lamda[n-2] = lamda[n-1] = x[m-1] needed for the B-spline representation.
c double * Output
On exit: a pointer to which, if start=Nag_Cold, memory of size nest-4 is internally allocated. c[i-1] holds the coefficient $c_i$ of the B-spline $N_i(x)$ in the spline approximation $s(x)$, for $i = 1, 2, \ldots, n-4$.
Note that when the information contained in the pointers lamda and c is no longer of use, or before a new call to e02bec with the same spline, you should free this storage using the NAG macro NAG_FREE. This storage will have been allocated only if this function returns with fail.code=NE_NOERROR, NE_SPLINE_COEFF_CONV or NE_NUM_KNOTS_1D_GT.
11: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
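The entry constraints listed above can be checked before the call. The following sketch collects them in one plain C function; check_fit_inputs and the CheckStatus codes are hypothetical names invented for this illustration and are not NagError codes or part of the NAG Library. The function itself performs equivalent checks and reports violations through fail (see Section 6).

```c
/* Illustrative sketch only; not part of the NAG Library. */

/* Hypothetical status codes for this sketch (NOT NagError codes). */
typedef enum {
    CHECK_OK = 0,
    CHECK_M_TOO_SMALL,         /* m < 4                          */
    CHECK_NEST_TOO_SMALL,      /* nest < 8                       */
    CHECK_S_NEGATIVE,          /* s < 0.0                        */
    CHECK_X_NOT_INCREASING,    /* x not strictly increasing      */
    CHECK_WEIGHT_NOT_POSITIVE, /* some weights[r] <= 0.0         */
    CHECK_NEST_LT_M_PLUS_4     /* s == 0.0 but nest < m + 4      */
} CheckStatus;

/* Check the documented constraints on m, x, weights, s and nest. */
CheckStatus check_fit_inputs(int m, const double x[], const double weights[],
                             double s, int nest)
{
    if (m < 4)
        return CHECK_M_TOO_SMALL;
    if (nest < 8)
        return CHECK_NEST_TOO_SMALL;
    if (s < 0.0)
        return CHECK_S_NEGATIVE;
    for (int r = 1; r < m; r++)
        if (!(x[r - 1] < x[r]))
            return CHECK_X_NOT_INCREASING;
    for (int r = 0; r < m; r++)
        if (!(weights[r] > 0.0))
            return CHECK_WEIGHT_NOT_POSITIVE;
    /* Interpolation needs n = m + 4 knots, so nest must allow for them. */
    if (s == 0.0 && nest < m + 4)
        return CHECK_NEST_LT_M_PLUS_4;
    return CHECK_OK;
}
```

The order of the checks here is arbitrary; no claim is made that e02bec tests its arguments in the same order.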

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument start had an illegal value.
NE_ENUMTYPE_WARM
start=Nag_Warm at the first call of this function. start must be set to Nag_Cold at the first call.
NE_INT_ARG_LT
On entry, m=value.
Constraint: m ≥ 4.
On entry, nest=value.
Constraint: nest ≥ 8.
NE_NOT_STRICTLY_INCREASING
The sequence x is not strictly increasing: x[value] = value, x[value] = value.
NE_NUM_KNOTS_1D_GT
The number of knots needed is greater than nest, nest=value. If nest is already large, say nest > m/2, this may indicate that s is too small: s=value.
NE_REAL_ARG_LT
On entry, s=value.
Constraint: s ≥ 0.0.
NE_SF_D_K_CONS
On entry, nest=value, s=value, m=value.
Constraint: nest ≥ m + 4 when s=0.0.
NE_SPLINE_COEFF_CONV
The iterative process has failed to converge. Possibly s is too small: s=value.
NE_WEIGHTS_NOT_POSITIVE
On entry, the weights are not strictly positive: weights[value] = value.
If the function fails with an error exit of NE_SPLINE_COEFF_CONV or NE_NUM_KNOTS_1D_GT, a spline approximation is returned, but it fails to satisfy the fitting criterion (see (2) and (3)) – perhaps by only a small amount, however.

7 Accuracy

On successful exit, the approximation returned is such that its weighted sum of squared residuals $\theta$ (as in (3)) is equal to the smoothing factor S, up to a specified relative tolerance of 0.001 – except that if $n = 8$, $\theta$ may be significantly less than S: in this case the computed spline is simply a weighted least squares polynomial approximation of degree 3, i.e., a spline with no interior knots.
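The accuracy statement can be turned into a simple check on the returned values. The sketch below is an illustration under stated assumptions: fit_reached_target is a hypothetical helper, the tolerance is taken relative to s (the text does not say relative to which quantity), and s > 0 is assumed.

```c
/* Illustrative sketch only; not part of the NAG Library. */
#include <math.h>
#include <stdbool.h>

/* Return true if fp agrees with s to the documented relative tolerance of
   0.001, or if the spline has no interior knots (n == 8), in which case fp
   is allowed to be smaller than s.
   Assumptions: s > 0, and the tolerance is measured relative to s. */
bool fit_reached_target(double fp, double s, int n)
{
    const double tol = 0.001;
    if (n == 8)
        return fp <= s * (1.0 + tol);
    return fabs(fp - s) <= tol * s;
}
```

A false result after a call that returned NE_SPLINE_COEFF_CONV or NE_NUM_KNOTS_1D_GT is consistent with the remark in Section 6 that the spline returned then fails to satisfy the fitting criterion.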

8 Parallelism and Performance

e02bec is not threaded in any implementation.

9 Further Comments

9.1 Timing

The time taken for a call of e02bec depends on the complexity of the shape of the data, the value of the smoothing factor S, and the number of data points. If e02bec is to be called for different values of S, much time can be saved by setting start=Nag_Warm after the first call.

9.2 Choice of S

If the weights have been correctly chosen (see Section 2.1.2 in the E02 Chapter Introduction), the standard deviation of $w_r y_r$ would be the same for all $r$, equal to $\sigma$, say. In this case, choosing the smoothing factor S in the range $\sigma^2 (m \pm \sqrt{2m})$, as suggested by Reinsch (1967), is likely to give a good start in the search for a satisfactory value. Otherwise, experimenting with different values of S will be required from the start, taking account of the remarks in Section 3.
In that case, in view of computation time and memory requirements, it is recommended to start with a very large value for S and so determine the least squares cubic polynomial; the value returned in fp, call it $\theta_0$, gives an upper bound for S. Then progressively decrease the value of S to obtain closer fits – say by a factor of 10 in the beginning, i.e., $S = \theta_0/10$, $S = \theta_0/100$, and so on, and more carefully as the approximation shows more details.
The number of knots of the spline returned, and their location, generally depend on the value of S and on the behaviour of the function underlying the data. However, if e02bec is called with start=Nag_Warm, the knots returned may also depend on the smoothing factors of the previous calls. Therefore if, after a number of trials with different values of S and start=Nag_Warm, a fit can finally be accepted as satisfactory, it may be worthwhile to call e02bec once more with the selected value for S but now using start=Nag_Cold. Often, e02bec then returns an approximation with the same quality of fit but with fewer knots, which is therefore better if data reduction is also important.

9.3 Outline of Method Used

If $S = 0$, the requisite number of knots is known in advance, i.e., $n = m + 4$; the interior knots are located immediately as $\lambda_i = x_{i-2}$, for $i = 5, 6, \ldots, n-4$. The corresponding least squares spline (see e02bac) is then an interpolating spline and therefore a solution of the problem.
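The knot placement for the interpolating case can be written out directly. The sketch below uses zero-based arrays as in the argument descriptions, so that the interior knot $\lambda_i = x_{i-2}$ becomes lamda[j] = x[j-2] for j = 4, ..., n-5; interpolation_knots is a hypothetical helper, and the end knots follow the description of lamda in Section 5.

```c
/* Illustrative sketch only; not part of the NAG Library. */

/* Build the knot vector used when S = 0 (interpolation).
   x[0..m-1] : strictly increasing abscissae, m >= 4
   lamda     : caller-provided array with room for m + 4 values
   Returns n = m + 4, the total number of knots. */
int interpolation_knots(int m, const double x[], double lamda[])
{
    int n = m + 4;

    /* Four coincident knots at each end of the approximation interval. */
    for (int j = 0; j < 4; j++) {
        lamda[j] = x[0];
        lamda[n - 4 + j] = x[m - 1];
    }

    /* Interior knots: x[2], ..., x[m-3] (none at all when m == 4). */
    for (int j = 4; j <= n - 5; j++)
        lamda[j] = x[j - 2];

    return n;
}
```

For m = 6 and x = 0, 1, 2, 3, 4, 5 this gives n = 10 and the knots 0, 0, 0, 0, 2, 3, 5, 5, 5, 5: the second and the second-to-last data points are not used as knots.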
If $S > 0$, a suitable knot set is built up in stages (starting with no interior knots in the case of a cold start but with the knot set found in a previous call if a warm start is chosen). At each stage, a spline is fitted to the data by least squares (see e02bac) and $\theta$, the weighted sum of squares of residuals, is computed. If $\theta > S$, new knots are added to the knot set to reduce $\theta$ at the next stage. The new knots are located in intervals where the fit is particularly poor, their number depending on the value of S and on the progress made so far in reducing $\theta$. Sooner or later, we find that $\theta \le S$ and at that point the knot set is accepted. The function then goes on to compute the (unique) spline which has this knot set and which satisfies the full fitting criterion specified by (2) and (3). The theoretical solution has $\theta = S$. The function computes the spline by an iterative scheme which is ended when $\theta = S$ within a relative tolerance of 0.001. The main part of each iteration consists of a linear least squares computation of special form, done in a similarly stable and efficient manner as in e02bac.
An exception occurs when the function finds at the start that, even with no interior knots ($n = 8$), the least squares spline already has its weighted sum of squares of residuals $\le S$. In this case, since this spline (which is simply a cubic polynomial) also has an optimal value for the smoothness measure $\eta$, namely zero, it is returned at once as the (trivial) solution. It will usually mean that S has been chosen too large.
For further details of the algorithm and its use, see Dierckx (1981).

9.4 Evaluation of Computed Spline

The value of the computed spline at a given value x may be obtained in the double variable s by the call:
nag_fit_dim1_spline_eval(x, &s, &spline, &fail)
where spline is a structure of type Nag_Spline which is the output argument of e02bec.
The values of the spline and its first three derivatives at a given value x may be obtained in the array s of dimension at least 4 by the call:
nag_fit_dim1_spline_deriv(derivs, x, s, &spline, &fail)
where, if derivs=Nag_LeftDerivs, left-hand derivatives are computed and, if derivs=Nag_RightDerivs, right-hand derivatives are calculated. The value of derivs is only relevant if x is an interior knot (see e02bcc).
The value of the definite integral of the spline over the interval x[0] to x[m-1] can be obtained in the variable integral by the call:
nag_fit_dim1_spline_integ(&spline, &integral, &fail)
see e02bdc.

10 Example

This example reads in a set of data values, followed by a set of values of s. For each value of s it calls e02bec to compute a spline approximation, and prints the values of the knots and the B-spline coefficients $c_i$.
The program includes code to evaluate the computed splines, by calls to e02bbc, at the points $x_r$ and at points mid-way between them. These values are not printed out, however; instead the results are illustrated by plots of the computed splines, together with the data points (indicated by ×) and the positions of the knots (indicated by vertical lines): the effect of decreasing s can be clearly seen.

10.1 Program Text

Program Text (e02bece.c)

10.2 Program Data

Program Data (e02bece.d)

10.3 Program Results

Program Results (e02bece.r)
[Figure: Example Program, Calculation and Evaluation of B-spline Representation. Smoothing Factor S=1.0. Horizontal axis: x; vertical axis: B-spline. Shows the B-spline and the data points.]
[Figure: Smoothing Factor S=0.5. Horizontal axis: x; vertical axis: B-spline. Shows the B-spline and the data points.]
[Figure: Smoothing Factor S=0.1. Horizontal axis: x; vertical axis: B-spline. Shows the B-spline and the data points.]