PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_smooth_fit_spline_parest (g10ac)
Purpose
nag_smooth_fit_spline_parest (g10ac) estimates the values of the smoothing parameter and fits a cubic smoothing spline to a set of data.
Syntax
[
yhat,
c,
rss,
df,
res,
h,
crit,
rho,
ifail] = g10ac(
method,
x,
y,
crit, 'n',
n, 'wt',
wt, 'u',
u, 'tol',
tol, 'maxcal',
maxcal)
[
yhat,
c,
rss,
df,
res,
h,
crit,
rho,
ifail] = nag_smooth_fit_spline_parest(
method,
x,
y,
crit, 'n',
n, 'wt',
wt, 'u',
u, 'tol',
tol, 'maxcal',
maxcal)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: |
tol was made optional; weight was removed from the interface; wt was made optional |
Description
For a set of observations , for , the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.
Cubic smoothing splines arise as the unique real-valued solution function
, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes
where
is the (optional) weight for the
th observation and
is the smoothing argument. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing argument
weights these two aspects; larger values of
give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see
Hutchinson and de Hoog (1985) and
Reinsch (1967).
The fitted values,
, and weighted residuals,
, can be written as:
for a matrix
. The residual degrees of freedom for the spline is
and the diagonal elements of
are the leverages.
The parameter
can be estimated in a number of ways.
(i) |
The degrees of freedom for the spline can be specified, i.e., find such that for given . |
(ii) |
Minimize the cross-validation (CV), i.e., find such that the CV is minimized, where
|
(iii) |
Minimize the generalized cross-validation (GCV), i.e., find such that the GCV is minimized, where
|
nag_smooth_fit_spline_parest (g10ac) requires the
to be strictly increasing. If two or more observations have the same
value then they should be replaced by a single observation with
equal to the (weighted) mean of the
values and weight,
, equal to the sum of the weights. This operation can be performed by
nag_smooth_data_order (g10za)The algorithm is based on
Hutchinson (1986).
nag_roots_contfn_brent_rcomm (c05az) is used to solve for
given
and the method of
nag_opt_one_var_func (e04ab) is used to minimize the GCV or CV.
References
Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
Parameters
Compulsory Input Parameters
- 1:
– string (length ≥ 1)
-
Indicates whether the smoothing parameter is to be found by minimization of the CV or GCV functions, or by finding the smoothing parameter corresponding to a specified degrees of freedom value.
- Cross-validation is used.
- The degrees of freedom are specified.
- Generalized cross-validation is used.
Constraint:
, or .
- 2:
– double array
-
The distinct and ordered values
, for .
Constraint:
, for .
- 3:
– double array
-
The values
, for .
- 4:
– double scalar
-
If
, the required degrees of freedom for the spline.
If
or
,
crit need not be set.
Constraint:
.
Optional Input Parameters
- 1:
– int64int32nag_int scalar
-
Default:
the dimension of the arrays
x,
y. (An error is raised if these dimensions are not equal.)
, the number of observations.
Constraint:
.
- 2:
– double array
-
The dimension of the array
wt
must be at least
if
If
,
wt must contain the
weights. Otherwise
wt is not referenced and unit weights are assumed.
Constraint:
if , , for .
- 3:
– double scalar
Default:
The upper bound on the smoothing parameter. If
,
will be used instead. See
Further Comments for details on how this argument is used.
- 4:
– double scalar
Default:
The accuracy to which the smoothing parameter
rho is required.
tol should preferably be not much less than
, where
is the
machine precision. If
,
will be used instead.
- 5:
– int64int32nag_int scalar
Default:
The maximum number of spline evaluations to be used in finding the value of . If , will be used instead.
Output Parameters
- 1:
– double array
-
The fitted values,
, for .
- 2:
– double array
-
The spline coefficients. More precisely, the value of the spline approximation at is given by , where and .
-
The (weighted) residual sum of squares.
- 4:
– double scalar
-
The residual degrees of freedom. If this will be to the required accuracy.
- 5:
– double array
-
The (weighted) residuals,
, for .
- 6:
– double array
-
The leverages,
, for .
- 7:
– double scalar
-
If
, the value of the cross-validation, or if
, the value of the generalized cross-validation function, evaluated at the value of
returned in
rho.
- 8:
– double scalar
-
The smoothing parameter, .
- 9:
– int64int32nag_int scalar
unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
-
-
Constraint: if , .
Constraint: if , .
Constraint: .
Constraint: .
On entry,
method is not valid.
-
-
On entry, at least one element of .
-
-
On entry,
x is not a strictly ordered array.
-
-
For the specified degrees of freedom, :
- W
-
Accuracy of
tol cannot be achieved:
- W
-
maxcal iterations have been performed.
- W
-
Optimum value of
rho lies above
u:
-
An unexpected error has been triggered by this routine. Please
contact
NAG.
-
Your licence key may have expired or may not have been installed correctly.
-
Dynamic memory allocation failed.
Accuracy
When minimizing the cross-validation or generalized cross-validation, the error in the estimate of should be within . When finding for a fixed number of degrees of freedom the error in the estimate of should be within .
Given the value of , the accuracy of the fitted spline depends on the value of and the position of the values. The values of and are scaled and is transformed to avoid underflow and overflow problems.
Further Comments
The time to fit the spline for a given value of is of order .
When finding the value of
that gives the required degrees of freedom, the algorithm examines the interval
to
u. For small degrees of freedom the value of
can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and
is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval
to
u. If the function is decreasing in that range then the boundary value of
u will be returned. In either case, the larger the value of
u the more likely is the interval to contain the required solution, but the process will be less efficient.
Regression splines with a small
number of knots can be fitted by
nag_fit_1dspline_knots (e02ba) and
nag_fit_1dspline_auto (e02be).
Example
This example uses the data given by
Hastie and Tibshirani (1990), which consists of the age,
, and C-peptide concentration (pmol/ml),
, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by
nag_smooth_data_order (g10za) and a spline with
degrees of freedom is fitted by
nag_smooth_fit_spline_parest (g10ac). The fitted values and residuals are printed.
Open in the MATLAB editor:
g10ac_example
function g10ac_example
fprintf('g10ac example results\n\n');
x = [ 5.2 8.8 10.5 10.6 10.4 1.8 12.7 15.6 5.8 1.9 ...
2.2 4.8 7.9 5.2 0.9 11.8 7.9 11.5 10.6 8.5 ...
11.1 12.8 11.3 1.0 14.5 11.9 8.1 13.8 15.5 9.8 ...
11.0 12.4 11.1 5.1 4.8 4.2 6.9 13.2 9.9 12.5 ...
13.2 8.9 10.8];
y = [ 4.8 4.1 5.2 5.5 5.0 3.4 3.4 4.9 5.6 3.7 ...
3.9 4.5 4.8 4.9 3.0 4.6 4.8 5.5 4.5 5.3 ...
4.7 6.6 5.1 3.9 5.7 5.1 5.2 3.7 4.9 4.8 ...
4.4 5.2 5.1 4.6 3.9 5.1 5.1 6.0 4.9 4.1 ...
4.6 4.9 5.1];
[n, x, y, wt, rss, ifail] = g10za( ...
x, y);
x = x(1:n);
y = y(1:n);
crit = 12;
method = 'D';
[yhat, c, rss, df, res, h, crit, rho, ifail] = ...
g10ac( ...
method, x, y, crit, 'wt', wt);
fprintf('Residual sum of squares = %10.2f\n', rss);
fprintf('Degrees of freedom = %10.2f\n', df);
fprintf('rho = %10.2f\n', rho);
fprintf('\n Input data Output results\n');
fprintf(' i x y yhat h\n');
ivar = double(1:n)';
fprintf('%4d%8.3f%8.3f%14.3f%8.3f\n', [ivar x y yhat h]');
g10ac example results
Residual sum of squares = 10.35
Degrees of freedom = 25.00
rho = 2.68
Input data Output results
i x y yhat h
1 0.900 3.000 3.373 0.534
2 1.000 3.900 3.406 0.427
3 1.800 3.400 3.642 0.313
4 1.900 3.700 3.686 0.313
5 2.200 3.900 3.839 0.448
6 4.200 5.100 4.614 0.564
7 4.800 4.200 4.576 0.442
8 5.100 4.600 4.715 0.189
9 5.200 4.850 4.783 0.407
10 5.800 5.600 5.193 0.455
11 6.900 5.100 5.184 0.592
12 7.900 4.800 4.958 0.530
13 8.100 5.200 4.931 0.235
14 8.500 5.300 4.845 0.245
15 8.800 4.100 4.763 0.271
16 8.900 4.900 4.748 0.292
17 9.800 4.800 4.850 0.301
18 9.900 4.900 4.875 0.277
19 10.400 5.000 4.970 0.173
20 10.500 5.200 4.977 0.154
21 10.600 5.000 4.979 0.285
22 10.800 5.100 4.970 0.136
23 11.000 4.400 4.961 0.137
24 11.100 4.900 4.964 0.284
25 11.300 5.100 4.975 0.162
26 11.500 5.500 4.975 0.186
27 11.800 4.600 4.930 0.213
28 11.900 5.100 4.911 0.220
29 12.400 5.200 4.852 0.206
30 12.500 4.100 4.857 0.196
31 12.700 3.400 4.900 0.189
32 12.800 6.600 4.932 0.193
33 13.200 5.300 4.955 0.488
34 13.800 3.700 4.797 0.408
35 14.500 5.700 5.076 0.559
36 15.500 4.900 4.979 0.445
37 15.600 4.900 4.946 0.535
PDF version (NAG web site
, 64-bit version, 64-bit version)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015