e02be:: Curve and Surface Fitting (NAG Toolbox)

The quantity $η$ can be seen as a measure of the (lack of) smoothness of $s (x)$, while closeness of fit is measured through $θ$. By means of the argument $S$, ‘the smoothing factor’, you can then control the balance between these two (usually conflicting) properties. If $S$ is too large, the spline will be too smooth and signal will be lost (underfit); if $S$ is too small, the spline will pick up too much noise (overfit). In the extreme cases the function will return an interpolating spline $(θ = 0)$ if $S$ is set to zero, and the weighted least squares cubic polynomial $(η = 0)$ if $S$ is set very large. Experimenting with $S$ values between these two extremes should result in a good compromise. (See Choice of S for advice on choice of $S$.)

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Accuracy

On successful exit, the approximation returned is such that its weighted sum of squared residuals $θ$ (as in (3)) is equal to the smoothing factor $S$, up to a specified relative tolerance of $0.001$. The exception is the case $n = 8$, where $θ$ may be significantly less than $S$: in this case the computed spline is simply a weighted least squares polynomial approximation of degree $3$, i.e., a spline with no interior knots.
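
The tolerance statement can be checked against the figures printed in the Example. The following is a minimal sketch in plain MATLAB (no NAG Toolbox call is made); the values of `S` and `fp` are copied from the Example output, and the variable names `tol`, `relerr` and `ok` are illustrative only.

```matlab
% Smoothing factors and the fp values printed in the Example output.
S  = [1.0 0.5 0.1];
fp = [1.0003 0.5001 0.1000];

tol    = 0.001;                 % relative tolerance quoted in Accuracy
relerr = abs(fp - S) ./ S;      % relative deviation of theta from S
ok     = relerr <= tol;         % true where theta matches S within tol

for k = 1:numel(S)
  fprintf('S = %4.2f  fp = %6.4f  rel. deviation = %.1e  within tol: %d\n', ...
          S(k), fp(k), relerr(k), ok(k));
end
```

All three calls in the Example satisfy the criterion; none of them is the $n = 8$ case, so $θ$ is expected to match $S$ closely.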

Further Comments

Timing

Choice of S

If the weights have been correctly chosen (see Weighting of data points in the E02 Chapter Introduction), the standard deviation of $w_{r} y_{r}$ would be the same for all $r$, equal to $σ$, say. In this case, choosing the smoothing factor $S$ in the range $σ^{2} (m \pm \sqrt{2 m})$, as suggested by Reinsch (1967), is likely to give a good start in the search for a satisfactory value. Otherwise, experimenting with different values of $S$ will be required from the start, taking account of the remarks in Description.
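
As a small worked illustration of the suggested range, the sketch below evaluates $σ^{2} (m \pm \sqrt{2 m})$ in plain MATLAB. The value $m = 15$ matches the Example data; the value of `sigma` is purely hypothetical and must be replaced by an estimate appropriate to your data.

```matlab
m     = 15;      % number of data points, as in the Example
sigma = 0.1;     % HYPOTHETICAL common standard deviation of w_r*y_r

% End points of the range sigma^2 * (m -/+ sqrt(2*m)).
S_lo = sigma^2 * (m - sqrt(2*m));
S_hi = sigma^2 * (m + sqrt(2*m));

fprintf('Try S in [%.4f, %.4f]\n', S_lo, S_hi);
```

With these assumed values the range is roughly $0.095$ to $0.205$; it scales with $σ^{2}$, so a tenfold change in $σ$ shifts the range by a factor of $100$.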

In that case, in view of computation time and memory requirements, it is recommended to start with a very large value for $S$ and so determine the least squares cubic polynomial; the value returned in fp, call it $θ_{0}$, gives an upper bound for $S$. Then progressively decrease the value of $S$ to obtain closer fits, say by a factor of $10$ in the beginning, i.e., $S = θ_{0} / 10$, $S = θ_{0} / 100$, and so on, and more carefully as the approximation shows more details.
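
The starting point of this search can be sketched without the NAG Toolbox. The code below fits the weighted least squares cubic polynomial to the Example data with an ordinary linear least squares solve and then lists the first few trial values of $S$. It is an illustration only: `theta0` stands in for the value that would be returned in fp for a very large $S$, and the names `A`, `W`, `coef` and `S_trial` are not part of the function's interface.

```matlab
% Data from the Example: columns are x, y and weight w.
data = [0.0 -1.100 1.0; 0.5 -0.372 2.0; 1.0 0.431 1.5; 1.5 1.690 1.0;
        2.0  2.110 3.0; 2.5  3.100 1.0; 3.0 4.230 0.5; 4.0 4.350 1.0;
        4.5  4.810 2.0; 5.0  4.610 2.5; 5.5 4.790 1.0; 6.0 5.230 3.0;
        7.0  6.350 1.0; 7.5  7.190 2.0; 8.0 7.970 1.0];
x = data(:,1);  y = data(:,2);  w = data(:,3);

% Weighted least squares cubic: minimise sum((w .* (y - A*coef)).^2).
A    = [x.^3, x.^2, x, ones(size(x))];
W    = diag(w);
coef = (W*A) \ (W*y);

% Weighted sum of squared residuals of the cubic (the upper bound for S).
theta0 = sum((w .* (y - A*coef)).^2);

% First trial smoothing factors: theta0/10, theta0/100, theta0/1000.
S_trial = theta0 ./ 10.^(1:3);

fprintf('theta0 = %.4f\n', theta0);
fprintf('S = %.5f\n', S_trial);
```

Because the first call in the Example ($S = 1$) returned a spline with an interior knot rather than the cubic polynomial, $θ_{0}$ for this data set should exceed $1$.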

The number of knots of the spline returned, and their location, generally depend on the value of $S$ and on the behaviour of the function underlying the data. However, if nag_fit_1dspline_auto (e02be) is called with start = 'W', the knots returned may also depend on the smoothing factors of the previous calls. Therefore if, after a number of trials with different values of $S$ and start = 'W', a fit can finally be accepted as satisfactory, it may be worthwhile to call nag_fit_1dspline_auto (e02be) once more with the selected value for $S$ but now using start = 'C'. Often, nag_fit_1dspline_auto (e02be) then returns an approximation with the same quality of fit but with fewer knots, which is therefore better if data reduction is also important.

Outline of Method Used

If $S > 0$, a suitable knot set is built up in stages (starting with no interior knots in the case of a cold start but with the knot set found in a previous call if a warm start is chosen). At each stage, a spline is fitted to the data by least squares (see nag_fit_1dspline_knots (e02ba)) and $θ$, the weighted sum of squares of residuals, is computed. If $θ > S$, new knots are added to the knot set to reduce $θ$ at the next stage. The new knots are located in intervals where the fit is particularly poor, their number depending on the value of $S$ and on the progress made so far in reducing $θ$. Sooner or later, we find that $θ \leq S$ and at that point the knot set is accepted. The function then goes on to compute the (unique) spline which has this knot set and which satisfies the full fitting criterion specified by (2) and (3). The theoretical solution has $θ = S$. The function computes the spline by an iterative scheme which is ended when $θ = S$ within a relative tolerance of $0.001$. The main part of each iteration consists of a linear least squares computation of special form, done in a similarly stable and efficient manner as in nag_fit_1dspline_knots (e02ba).

An exception occurs when the function finds at the start that, even with no interior knots $(n = 8)$, the least squares spline already has its weighted sum of squares of residuals $\leq S$. In this case, since this spline (which is simply a cubic polynomial) also has an optimal value for the smoothness measure $η$, namely zero, it is returned at once as the (trivial) solution. It will usually mean that $S$ has been chosen too large.

Evaluation of Computed Spline

The values of the spline and its first three derivatives at a given value x may be obtained in the double array s of dimension at least $4$ by the call:

[s, ifail] = e02bc(lamda, c, x, left);

where if $left = 1$, left-hand derivatives are computed and if $left \neq 1$, right-hand derivatives are calculated. The value of left is only relevant if x is an interior knot (see nag_fit_1dspline_deriv (e02bc)).

Example

The program includes code to evaluate the computed splines, by calls to nag_fit_1dspline_eval (e02bb), at the points $x_{r}$ and at points mid-way between them. These values are not printed out, however; instead the results are illustrated by plots of the computed splines, together with the data points (indicated by $\times$) and the positions of the knots (indicated by vertical lines): the effect of decreasing $S$ can be clearly seen.

function e02be_example


fprintf('e02be example results\n\n');

start = 'C';
data = [0.0  -1.100  1.0;
        0.5  -0.372  2.0;
        1.0   0.431  1.5;
        1.5   1.690  1.0;
        2.0   2.110  3.0;
        2.5   3.100  1.0;
        3.0   4.230  0.5;
        4.0   4.350  1.0;
        4.5   4.810  2.0;
        5.0   4.610  2.5;
        5.5   4.790  1.0;
        6.0   5.230  3.0;
        7.0   6.350  1.0;
        7.5   7.190  2.0;
        8.0   7.970  1.0];
x = data(:,1);
y = data(:,2);
w = data(:,3);
m = size(data,1);

s = [1 0.5 0.1];

nest  = m + 4;
n     = int64(nest);
lamda = zeros(nest,1);
wrk   = zeros(4*m+16*nest+41, 1);
iwrk  = zeros(nest, 1, 'int64');
start = 'Cold';

xs(1:2:2*m-1) = x;
xs(2:2:2*m-2) = (x(1:m-1)+x(2:m))/2;

for is = 1:3
  % Get spline
  [n, lamda, c, fp, wrk, iwrk, ifail] = ...
  e02be( ...
         start, x, y, w, s(is), n, lamda, wrk, iwrk);

  % Print details of spline
  fprintf('\nCalling with smoothing factor S = %5.2f\n\n', s(is));
  fprintf('         Knots    Coeffs\n');
  fprintf('%4d%20.4f\n', 1, c(1));
  for j = 2:n-5
    fprintf('%4d%10.4f%10.4f\n', j, lamda(j+2), c(j));
  end
  fprintf('%4d%20.4f\n\n', n-4, c(n-4));
  fprintf('Weighted sum of squared residuals = %7.4f\n', fp);
  if fp==0
    fprintf('(The spline is an interpolating spline)\n');
  elseif n==8
    fprintf('(The spline is the weighted least-squares cubic polynomial)\n');
  end
  fprintf('\n');

  % Evaluate at x and mid-points (xs)
  for i = 1:2*m-1
    [fit(i,is), ifail] = e02bb( ...
                                lamda, c, xs(i),'ncap7',n);
  end

  start = 'Warm';
end

fig1 = figure;
hold on
plot(x,y,'*','Color','Green')
plot(xs,fit(:,1));
xlabel('x');
title('Evaluation of B-spline representation, S = 1.0');
legend('Evaluation points','B-spline','Location','NorthWest');
hold off;

fig2 = figure;
hold on
plot(x,y,'*','Color','Green')
plot(xs,fit(:,2));
xlabel('x');
title('Evaluation of B-spline representation, S = 0.5');
legend('Evaluation points','B-spline','Location','NorthWest');
hold off;

fig3 = figure;
hold on
plot(x,y,'*','Color','Green')
plot(xs,fit(:,3));
xlabel('x');
title('Evaluation of B-spline representation, S = 0.1');
legend('Evaluation points','B-spline','Location','NorthWest');
hold off;

e02be example results


Calling with smoothing factor S =  1.00

         Knots    Coeffs
   1             -1.3201
   2    0.0000    1.3542
   3    4.0000    5.5510
   4    8.0000    4.7031
   5              8.2277

Weighted sum of squared residuals =  1.0003


Calling with smoothing factor S =  0.50

         Knots    Coeffs
   1             -1.1072
   2    0.0000   -0.6571
   3    1.0000    0.4350
   4    2.0000    2.8061
   5    4.0000    4.6824
   6    5.0000    4.6416
   7    6.0000    5.1976
   8    8.0000    6.9008
   9              7.9979

Weighted sum of squared residuals =  0.5001


Calling with smoothing factor S =  0.10

         Knots    Coeffs
   1             -1.0901
   2    0.0000   -0.6401
   3    1.0000    0.0334
   4    1.5000    1.6390
   5    2.0000    2.1243
   6    3.0000    4.5591
   7    4.0000    4.2174
   8    4.5000    4.9105
   9    5.0000    4.5475
  10    5.5000    4.6960
  11    6.0000    5.7370
  12    8.0000    6.8179
  13              7.9953

Weighted sum of squared residuals =  0.1000

where $δ_{i}$ stands for the discontinuity jump in the third order derivative of $s (x)$ at the interior knot $λ_{i}$,
$ε_{r}$ denotes the weighted residual $w_{r} (y_{r} - s (x_{r}))$,
and $S$ is a non-negative number to be specified by you.
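
Read together with the Accuracy and Outline of Method Used sections, these definitions suggest that the criteria referred to as (2) and (3) take the form below. This is a reconstruction rather than a quotation of the original equations. In particular, the summation limits assume that the interior knots are $λ_{5}, \ldots, λ_{n-4}$ (so that $n = 8$ means no interior knots) and that there are $m$ data points.

```latex
\begin{aligned}
\text{minimize}\quad   & \eta = \sum_{i=5}^{n-4} \delta_{i}^{2} && (2) \\
\text{subject to}\quad & \theta = \sum_{r=1}^{m} \varepsilon_{r}^{2} \leq S && (3)
\end{aligned}
```

Under this reading, $η = 0$ exactly when the third derivative has no jumps, i.e., when $s (x)$ is a single cubic polynomial, and $θ = 0$ exactly when the spline interpolates every data point with non-zero weight.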

On entry, $start \neq$ 'C' or 'W',
or $m < 4$,
or $s < 0.0$,
or $s = 0.0$ and $nest < m + 4$,
or $nest < 8$,
or $lwrk < 4 \times m + 16 \times nest + 41$.

NAG Toolbox: nag_fit_1dspline_auto (e02be)


Purpose

Syntax

Description