1 Purpose

e02baf computes a weighted least squares approximation to an arbitrary set of data points by a cubic spline with knots prescribed by you. Cubic spline interpolation can also be carried out.

2 Specification

Fortran Interface

Subroutine e02baf (m, ncap7, x, y, w, lamda, work1, work2, c, ss, ifail)

Integer, Intent (In)	::	m, ncap7
Integer, Intent (Inout)	::	ifail
Real (Kind=nag_wp), Intent (In)	::	x(m), y(m), w(m)
Real (Kind=nag_wp), Intent (Inout)	::	lamda(ncap7)
Real (Kind=nag_wp), Intent (Out)	::	work1(m), work2(4*ncap7), c(ncap7), ss

C Header Interface

#include <nag.h>

void	e02baf_ (const Integer *m, const Integer *ncap7, const double x[], const double y[], const double w[], double lamda[], double work1[], double work2[], double c[], double *ss, Integer *ifail)

The routine may be called by the names e02baf or nagf_fit_dim1_spline_knots.

3 Description

e02baf determines a least squares cubic spline approximation $s (x)$ to the set of data points $(x_{r}, y_{r})$ with weights $w_{r}$, for $r = 1, 2, \dots, m$. The value of $ncap7 = \bar{n} + 7$, where $\bar{n}$ is the number of intervals of the spline (one greater than the number of interior knots), and the values of the knots $λ_{5}, λ_{6}, \dots, λ_{\bar{n} + 3}$, interior to the data interval, are prescribed by you.

$s (x)$ has the property that it minimizes $θ$, the sum of squares of the weighted residuals $ε_{r}$, for $r = 1, 2, \dots, m$, where

$ε_{r} = w_{r} (y_{r} - s (x_{r}))$ .

The routine produces this minimizing value of $θ$ and the coefficients $c_{1}, c_{2}, \dots, c_{q}$, where $q = \bar{n} + 3$, in the B-spline representation

$s (x) = \sum_{i = 1}^{q} c_{i} N_{i} (x)$ .
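As a small illustration of the quantity being minimized, the following sketch evaluates $θ$ for given data, weights and any callable trial function. The helper name `weighted_rss` and the sample values are assumptions made for illustration; they are not part of the NAG interface. Note that each weight multiplies the residual before squaring.

```python
def weighted_rss(x, y, w, s):
    """Return theta = sum_r (w_r * (y_r - s(x_r)))**2 for a callable s."""
    return sum((wr * (yr - s(xr))) ** 2 for xr, yr, wr in zip(x, y, w))


# Illustrative data (assumed, not taken from the NAG example program).
x = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 5.0]
w = [1.0, 0.5, 2.0]

# Trial function s(x) = 1 + x: only the last point has a nonzero residual.
theta = weighted_rss(x, y, w, lambda t: 1.0 + t)
print(theta)
```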

Here $N_{i} (x)$ denotes the normalized B-spline of degree $3$ defined upon the knots $λ_{i}, λ_{i + 1}, \dots, λ_{i + 4}$. In order to define the full set of B-splines required, eight additional knots $λ_{1}, λ_{2}, λ_{3}, λ_{4}$ and $λ_{\bar{n} + 4}, λ_{\bar{n} + 5}, λ_{\bar{n} + 6}, λ_{\bar{n} + 7}$ are inserted automatically by the routine. The first four of these are set equal to the smallest $x_{r}$ and the last four to the largest $x_{r}$.
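The construction of the complete knot set can be sketched as follows. This is only an illustration of the layout described above (four coincident knots at each end of the data range around the interior knots you supply); the function name `full_knots` and the sample data are assumptions, and the sketch is not the routine's own code.

```python
def full_knots(x, interior):
    """Return the ncap7 = nbar + 7 knots: four copies of min(x), the
    interior knots, then four copies of max(x)."""
    lo, hi = min(x), max(x)
    if any(not (lo < k < hi) for k in interior):
        raise ValueError("interior knots must lie strictly inside the data range")
    if any(a > b for a, b in zip(interior, interior[1:])):
        raise ValueError("interior knots must be in nondecreasing order")
    return [lo] * 4 + list(interior) + [hi] * 4


# Illustrative data: nbar = 3 intervals, hence 2 interior knots and ncap7 = 10.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
knots = full_knots(x, [1.5, 3.5])
print(len(knots), knots)
```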

The representation of $s (x)$ in terms of B-splines is the most compact form possible in that only $\bar{n} + 3$ coefficients, in addition to the $\bar{n} + 7$ knots, fully define $s (x)$.

The method employed involves forming and then computing the least squares solution of a set of $m$ linear equations in the coefficients $c_{i}$, for $i = 1, 2, \dots, \bar{n} + 3$. The equations are formed using a recurrence relation for B-splines that is unconditionally stable (see Cox (1972) and de Boor (1972)), even for multiple (coincident) knots. The least squares solution is also obtained in a stable manner by using orthogonal transformations, viz. a variant of Givens rotations (see Gentleman (1974) and Gentleman (1973)). This requires only one equation to be stored at a time. Full advantage is taken of the structure of the equations, there being at most four nonzero values of $N_{i} (x)$ for any value of $x$ and hence at most four coefficients in each equation.
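To make the "at most four nonzero values" property concrete, here is a minimal sketch of the standard Cox–de Boor recurrence for the B-splines that can be nonzero at a point. It is a textbook formulation written for illustration, not the routine's internal code; the name `nonzero_bsplines`, the 0-based indexing and the sample knots are assumptions.

```python
from bisect import bisect_right


def nonzero_bsplines(knots, x, degree=3):
    """Return (first, values): the degree + 1 B-splines that can be nonzero
    at x are N_first, ..., N_{first+degree} (0-based), with these values."""
    q = len(knots) - degree - 1  # number of B-splines (nbar + 3 for cubics)
    # Find span with knots[span] <= x < knots[span + 1]; clamp so that x
    # equal to the last knot falls in the final nonempty interval.
    span = min(max(bisect_right(knots, x) - 1, degree), q - 1)
    values = [1.0]
    left = [0.0] * (degree + 1)
    right = [0.0] * (degree + 1)
    for j in range(1, degree + 1):
        left[j] = x - knots[span + 1 - j]
        right[j] = knots[span + j] - x
        saved = 0.0
        new = []
        for r in range(j):
            # Denominator is a knot difference spanning the nonempty interval.
            t = values[r] / (right[r + 1] + left[j - r])
            new.append(saved + right[r + 1] * t)
            saved = left[j - r] * t
        new.append(saved)
        values = new
    return span - degree, values


# Illustrative knot set: nbar = 3, coincident end knots at 0 and 5.
knots = [0.0] * 4 + [1.5, 3.5] + [5.0] * 4
first, vals = nonzero_bsplines(knots, 2.0)
first_end, vals_end = nonzero_bsplines(knots, 5.0)
print(first, vals)
```

Each data point therefore contributes one equation with four consecutive coefficients, which is the banded structure the text refers to.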

For further details of the algorithm and its use see Cox (1974), Cox (1975) and Cox and Hayes (1973).

Subsequent evaluation of $s (x)$ from its B-spline representation may be carried out using e02bbf. If derivatives of $s (x)$ are also required, e02bcf may be used. e02bdf can be used to compute the definite integral of $s (x)$.

4 References

Cox M G (1972) The numerical evaluation of B-splines J. Inst. Math. Appl. 10 134–149

Cox M G (1974) A data-fitting package for the non-specialist user Software for Numerical Mathematics (ed D J Evans) Academic Press

Cox M G (1975) Numerical methods for the interpolation and approximation of data by spline functions PhD Thesis City University, London

Cox M G and Hayes J G (1973) Curve fitting: a guide and suite of algorithms for the non-specialist user NPL Report NAC26 National Physical Laboratory

de Boor C (1972) On calculating with B-splines J. Approx. Theory 6 50–62

Gentleman W M (1973) Least squares computations by Givens transformations without square roots J. Inst. Math. Appl. 12 329–336

Gentleman W M (1974) Algorithm AS 75. Basic procedures for large sparse or weighted linear least squares problems Appl. Statist. 23 448–454

Schoenberg I J and Whitney A (1953) On Polya frequency functions III Trans. Amer. Math. Soc. 74 246–259

5 Arguments

1: $m$ – Integer Input: On entry: the number $m$ of data points.

Constraint: $m \geq mdist \geq 4$ , where $mdist$ is the number of distinct $x$ values in the data.
2: $ncap7$ – Integer Input: On entry: $\bar{n} + 7$ , where $\bar{n}$ is the number of intervals of the spline (which is one greater than the number of interior knots, i.e., the knots strictly within the range $x_{1}$ to $x_{m}$ ) over which the spline is defined.

Constraint: $8 \leq ncap7 \leq mdist + 4$ , where $mdist$ is the number of distinct $x$ values in the data.
3: $x (m)$ – Real (Kind=nag_wp) array Input: On entry: the values $x_{r}$ of the independent variable (abscissa), for $r = 1, 2, \dots, m$ . The values must satisfy the Schoenberg–Whitney conditions (see Section 9).

Constraint: $x_{1} \leq x_{2} \leq \dots \leq x_{m}$ .
4: $y (m)$ – Real (Kind=nag_wp) array Input: On entry: the values $y_{r}$ of the dependent variable (ordinate), for $r = 1, 2, \dots, m$ .
5: $w (m)$ – Real (Kind=nag_wp) array Input: On entry: the values $w_{r}$ of the weights, for $r = 1, 2, \dots, m$ . For advice on the choice of weights, see the E02 Chapter Introduction.

Constraint: $w (r) > 0.0$ , for $r = 1, 2, \dots, m$ .
6: $lamda (ncap7)$ – Real (Kind=nag_wp) array Input/Output: On entry: $lamda (i)$ must be set to the $(i - 4)$ th (interior) knot, $λ_{i}$ , for $i = 5, 6, \dots, \bar{n} + 3$ .

Constraint: $x (1) < lamda (5) \leq lamda (6) \leq \dots \leq lamda (ncap7 - 4) < x (m)$ .

On exit: the input values are unchanged, and $lamda (i)$ , for $i = 1, 2, 3, 4$ , $ncap7 - 3$ , $ncap7 - 2$ , $ncap7 - 1$ , $ncap7$ , contains the additional (exterior) knots introduced by the routine. For advice on the choice of knots, see Section 3.3 in the E02 Chapter Introduction.
7: $work1 (m)$ – Real (Kind=nag_wp) array Workspace
8: $work2 (4 \times ncap7)$ – Real (Kind=nag_wp) array Workspace
9: $c (ncap7)$ – Real (Kind=nag_wp) array Output: On exit: the coefficient $c_{i}$ of the B-spline $N_{i} (x)$ , for $i = 1, 2, \dots, \bar{n} + 3$ . The remaining elements of the array are not used.
10: $ss$ – Real (Kind=nag_wp) Output: On exit: the residual sum of squares, $θ$ .
11: $ifail$ – Integer Input/Output: On entry: ifail must be set to $0$ , $−1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $−1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $−1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $−1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).
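The documented input constraints can be checked before calling the routine. The sketch below mirrors the constraints listed above and returns the corresponding ifail value from Section 6; the function name `check_inputs`, the 0-based indexing and the order in which the checks are made are assumptions, and the Schoenberg–Whitney conditions (ifail = 5) are not tested here.

```python
def check_inputs(x, w, lamda):
    """Return 0 if the documented constraints hold, else the matching
    ifail value from Section 6 (1, 2, 3 or 4). lamda has length ncap7;
    only the interior entries lamda[4:ncap7-4] (0-based) are inspected."""
    ncap7 = len(lamda)
    mdist = len(set(x))  # number of distinct abscissae
    if ncap7 < 8 or ncap7 > mdist + 4:
        return 4
    if any(a > b for a, b in zip(x, x[1:])):
        return 3  # x not in nondecreasing order
    if any(wi <= 0.0 for wi in w):
        return 2  # weights must be strictly positive
    interior = lamda[4:ncap7 - 4]
    if any(a > b for a, b in zip(interior, interior[1:])):
        return 1  # interior knots out of order
    if interior and not (x[0] < interior[0] and interior[-1] < x[-1]):
        return 1  # interior knots must lie strictly inside (x(1), x(m))
    return 0


# Illustrative call: 6 points, 2 interior knots, ncap7 = 10.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0] * 6
lamda = [0.0] * 4 + [1.5, 3.5] + [0.0] * 4  # exterior entries are set on exit
print(check_inputs(x, w, lamda))
```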

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $−1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $J = ⟨ value ⟩$ , $lamda (J) = ⟨ value ⟩$ and $lamda (J + 1) = ⟨ value ⟩$ .
Constraint: $lamda (J) \leq lamda (J + 1)$ .

On entry, $lamda (5) = ⟨ value ⟩$ and $x (1) = ⟨ value ⟩$ .
Constraint: $lamda (5) > x (1)$ .

On entry, $ncap7 = ⟨ value ⟩$ , $lamda (ncap7 - 4) = ⟨ value ⟩$ , $m = ⟨ value ⟩$ and $x (m) = ⟨ value ⟩$ .
Constraint: $lamda (ncap7 - 4) < x (m)$ .

$ifail = 2$: On entry, $i = ⟨ value ⟩$ and $w (i) = ⟨ value ⟩$ .
Constraint: $w (i) > 0.0$ .

$ifail = 3$: On entry, the x values are not in nondecreasing order. $I = ⟨ value ⟩$ , $x (I) = ⟨ value ⟩$ , $J = ⟨ value ⟩$ and $xdist (J) = ⟨ value ⟩$ .
Constraint: $x (I) \geq xdist (J)$ , where xdist is the set of distinct $x$ -values.

$ifail = 4$: On entry, $ncap7 = ⟨ value ⟩$ .
Constraint: $ncap7 \geq 8$ .

On entry, $ncap7 = ⟨ value ⟩$ and $m = ⟨ value ⟩$ .
Constraint: $ncap7 \leq m + 4$ .

On entry, $ncap7 = ⟨ value ⟩$ and $mdist = ⟨ value ⟩$ .
Constraint: $ncap7 \leq mdist + 4$ , where mdist is the number of distinct x-values.

$ifail = 5$: On entry, the Schoenberg–Whitney conditions fail to hold for at least one subset of the distinct data abscissae. $I = ⟨ value ⟩$ , $xdist (I) = ⟨ value ⟩$ , $J = ⟨ value ⟩$ and $lamda (J) = ⟨ value ⟩$ .
Constraint: $xdist (I) < lamda (J)$ , where xdist is the set of distinct $x$ -values.

On entry, the Schoenberg–Whitney conditions fail to hold for at least one subset of the distinct data abscissae. $J = ⟨ value ⟩$ , $xdist (J) = ⟨ value ⟩$ , $J + 4 = ⟨ value ⟩$ and $lamda (J + 4) = ⟨ value ⟩$ .
Constraint: $xdist (J) < lamda (J + 4)$ , where xdist is the set of distinct $x$ -values.

On entry, the Schoenberg–Whitney conditions fail to hold for at least one subset of the distinct data abscissae. $L = ⟨ value ⟩$ , $xdist (L) = ⟨ value ⟩$ , $I = ⟨ value ⟩$ and $lamda (I) = ⟨ value ⟩$ .
Constraint: $xdist (L) > lamda (I)$ , where xdist is the set of distinct $x$ -values.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

The rounding errors committed are such that the computed coefficients are exact for a slightly perturbed set of ordinates $y_{r} + δ y_{r}$. The ratio of the root-mean-square value for the $δ y_{r}$ to the root-mean-square value of the $y_{r}$ can be expected to be less than a small multiple of $κ \times m \times$ machine precision, where $κ$ is a condition number for the problem. Values of $κ$ for $20$–$30$ practical datasets all proved to lie between $4.5$ and $7.8$ (see Cox (1975)). (Note that for these datasets, replacing the coincident end knots at the end points $x_{1}$ and $x_{m}$ used in the routine by various choices of non-coincident exterior knots gave values of $κ$ between $16$ and $180$. Again see Cox (1975) for further details.) In general we would not expect $κ$ to be large unless the choice of knots results in near-violation of the Schoenberg–Whitney conditions.

A cubic spline which adequately fits the data and is free from spurious oscillations is more likely to be obtained if the knots are chosen to be grouped more closely in regions where the function (underlying the data) or its derivatives change more rapidly than elsewhere.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.

e02baf is not threaded in any implementation.

9 Further Comments

The time taken is approximately $C \times (2 m + \bar{n} + 7)$ seconds, where $C$ is a machine-dependent constant.

Multiple knots are permitted as long as their multiplicity does not exceed $4$, i.e., the complete set of knots must satisfy $λ_{i} < λ_{i + 4}$, for $i = 1, 2, \dots, \bar{n} + 3$ (see Section 6). At a knot of multiplicity one (the usual case), $s (x)$ and its first two derivatives are continuous. At a knot of multiplicity two, $s (x)$ and its first derivative are continuous. At a knot of multiplicity three, $s (x)$ is continuous, and at a knot of multiplicity four, $s (x)$ is generally discontinuous.
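The multiplicity rule and its effect on smoothness can be expressed in a few lines. This is an illustrative sketch only; the function names `multiplicity_ok` and `continuous_derivatives` and the sample knot sets are assumptions.

```python
def multiplicity_ok(knots):
    """True if lamda_i < lamda_{i+4} for every i, i.e. no value occurs
    more than four times in the nondecreasing list of all ncap7 knots."""
    return all(knots[i] < knots[i + 4] for i in range(len(knots) - 4))


def continuous_derivatives(multiplicity):
    """Highest derivative order of a cubic spline that remains continuous
    across a knot of the given multiplicity (-1: s itself may jump)."""
    return 3 - multiplicity


# Illustrative knot sets: the first has a double interior knot at 2.5,
# the second repeats 2.5 five times and so violates the rule.
good = [0.0] * 4 + [2.5, 2.5] + [5.0] * 4
bad = [0.0] * 4 + [2.5] * 5 + [5.0] * 4
print(multiplicity_ok(good), multiplicity_ok(bad))
print([continuous_derivatives(k) for k in (1, 2, 3, 4)])
```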

The routine can be used efficiently for cubic spline interpolation, i.e., if $m = \bar{n} + 3$. The abscissae must then of course satisfy $x_{1} < x_{2} < \dots < x_{m}$. Recommended values for the knots in this case are $λ_{i} = x_{i - 2}$, for $i = 5, 6, \dots, \bar{n} + 3$.
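The recommended interpolation knots amount to using every abscissa except the first two and the last two as an interior knot. A minimal sketch, assuming a hypothetical helper `interpolation_knots` and 0-based Python lists:

```python
def interpolation_knots(x):
    """Interior knots lamda_i = x_{i-2}, i = 5, ..., nbar + 3 (1-based),
    for interpolation with m = nbar + 3 strictly increasing abscissae."""
    m = len(x)
    if m < 4 or any(a >= b for a, b in zip(x, x[1:])):
        raise ValueError("need at least 4 strictly increasing abscissae")
    # 1-based x_3, ..., x_{m-2} are the 0-based entries x[2], ..., x[m-3].
    return list(x[2:m - 2])


# Illustrative data: m = 7 points, so nbar = 4 and ncap7 = m + 4 = 11.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
interior = interpolation_knots(x)
ncap7 = len(interior) + 8
print(interior, ncap7)
```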

The Schoenberg–Whitney conditions (see Schoenberg and Whitney (1953)) state that there must be a subset of $ncap7 - 4$ strictly increasing values, $x (R (1)), x (R (2)), \dots, x (R (ncap7 - 4))$, among the abscissae such that

$x (R (1)) < lamda (5) < x (R (5))$ ,
$x (R (2)) < lamda (6) < x (R (6))$ ,
$⋮$
$x (R (ncap7 - 8)) < lamda (ncap7 - 4) < x (R (ncap7 - 4))$ .

If this condition is not satisfied, then there is no unique solution: there are regions containing too many knots compared with the number of data points.
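A simple way to test these conditions in advance is a greedy scan over the distinct abscissae, sketched below. The sketch is written in terms of the interior knots lamda(5), ..., lamda(ncap7 - 4), in line with the ifail = 5 messages in Section 6 (for example, the constraint xdist(J) < lamda(J + 4)). The function name `schoenberg_whitney_ok` and the sample data are assumptions, and this is not necessarily how the routine performs its own test.

```python
def schoenberg_whitney_ok(x, interior):
    """Greedy test: can ncap7 - 4 strictly increasing abscissae x_R(j) be
    chosen so that x_R(j) < k_j < x_R(j+4) for every interior knot k_j?"""
    xd = sorted(set(x))      # distinct abscissae
    n_int = len(interior)    # nbar - 1 interior knots
    q = n_int + 4            # ncap7 - 4 values must be selected
    j = 0                    # next slot to fill (0-based)
    for xv in xd:
        if j == q:
            break
        # Slot j must lie above interior knot j - 4 and below interior knot j.
        lower = interior[j - 4] if j >= 4 else float("-inf")
        upper = interior[j] if j < n_int else float("inf")
        if xv >= upper:
            return False     # no remaining abscissa can fill slot j
        if xv > lower:
            j += 1           # smallest feasible value takes slot j
    return j == q


# Illustrative data: well-spread knots pass, knots crowded near x = 0 fail.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(schoenberg_whitney_ok(x, [1.5, 3.5]))
print(schoenberg_whitney_ok(x, [0.2, 0.4]))
```

The greedy choice is sufficient because both the lower and the upper bounds on successive slots are nondecreasing, so taking the smallest usable abscissa never rules out a later selection.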

10 Example

Determine a weighted least squares cubic spline approximation with five intervals (four interior knots) to a set of $14$ given data points. Tabulate the data and the corresponding values of the approximating spline, together with the residual errors, and also the values of the approximating spline at points half-way between each pair of adjacent data points.

The example program is written in a general form that will enable a cubic spline approximation with $\bar{n}$ intervals ($\bar{n} - 1$ interior knots) to be obtained to $m$ data points, with arbitrary positive weights, and the approximation to be tabulated. Note that e02bbf is used to evaluate the approximating spline. The program is self-starting in that any number of datasets can be supplied.

NAG FL Interface
e02baf (dim1_spline_knots)