NAG Library Function Document
nag_smooth_spline_estim (g10acc)
1 Purpose
nag_smooth_spline_estim (g10acc) estimates the values of the smoothing argument and fits a cubic smoothing spline to a set of data.
2 Specification
#include <nag.h>
#include <nagg10.h>
void nag_smooth_spline_estim (Nag_SmoothParamMethods method,
    Integer n,
    const double x[],
    const double y[],
    const double weights[],
    double yhat[],
    double coeff[],
    double *rss,
    double *df,
    double res[],
    double h[],
    double *crit,
    double *rho,
    double u,
    double tol,
    Integer maxcal,
    NagError *fail)
3 Description
For a set of $n$ observations $(x_i, y_i)$, for $i = 1, 2, \ldots, n$, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.
Cubic smoothing splines arise as the unique real-valued solution function, $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
$$\sum_{i=1}^{n} w_i \{y_i - f(x_i)\}^2 + \rho \int_{-\infty}^{\infty} \{f''(x)\}^2 \, dx,$$
where $w_i$ is the (optional) weight for the $i$th observation and $\rho$ is the smoothing argument. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing argument $\rho$ weights these two aspects; larger values of $\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).
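The fitted values, $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n)^T$, and weighted residuals, $r_i$, can be written as:
$$\hat{y} = Hy \quad \text{and} \quad r_i = \sqrt{w_i}\,(y_i - \hat{y}_i)$$
for a matrix $H$. The residual degrees of freedom for the spline is $\mathrm{trace}(I - H)$ and the diagonal elements of $H$, $h_{ii}$, are the leverages.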
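The argument $\rho$ can be estimated in a number of ways.
1. The degrees of freedom for the spline can be specified, i.e., find $\rho$ such that $\mathrm{trace}(H) = \nu_0$ for given $\nu_0$.
2. Minimize the cross-validation (CV), i.e., find $\rho$ such that the CV is minimized, where
$$\mathrm{CV} = \frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} \left[ \frac{r_i}{1 - h_{ii}} \right]^2.$$
3. Minimize the generalized cross-validation (GCV), i.e., find $\rho$ such that the GCV is minimized, where
$$\mathrm{GCV} = n^2 \left[ \frac{\sum_{i=1}^{n} r_i^2}{\left( \sum_{i=1}^{n} (1 - h_{ii}) \right)^2} \right].$$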
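As an illustration of the two criteria, the following sketch computes CV and GCV scores from weighted residuals and leverages. It assumes the forms CV $= \left(\sum_i w_i\right)^{-1} \sum_i \left[r_i/(1-h_{ii})\right]^2$ and GCV $= n^2 \sum_i r_i^2 / \left(\sum_i (1-h_{ii})\right)^2$; the exact normalization used inside the library is not confirmed here. The names `cv_score` and `gcv_score` are hypothetical.

```c
#include <stddef.h>

/* Illustrative helper (not a NAG function).
   Assumed form: CV = (1 / sum w_i) * sum (r_i / (1 - h_ii))^2.
   Passing w == NULL means unit weights are assumed. */
double cv_score(size_t n, const double r[], const double h[], const double w[])
{
    double sum_w = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double t = r[i] / (1.0 - h[i]);
        sum_sq += t * t;
        sum_w += (w != NULL) ? w[i] : 1.0;
    }
    return sum_sq / sum_w;
}

/* Illustrative helper (not a NAG function).
   Assumed form: GCV = n^2 * sum r_i^2 / (sum (1 - h_ii))^2. */
double gcv_score(size_t n, const double r[], const double h[])
{
    double sum_r2 = 0.0, sum_1mh = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum_r2 += r[i] * r[i];
        sum_1mh += 1.0 - h[i];
    }
    return (double)n * (double)n * sum_r2 / (sum_1mh * sum_1mh);
}
```

Both scores are undefined when a leverage equals one (CV) or when all leverages sum to $n$ (GCV); the sketch does not guard against those cases.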
nag_smooth_spline_estim (g10acc) requires the $x_i$ to be strictly increasing. If two or more observations have the same $x_i$ value then they should be replaced by a single observation with $y_i$ equal to the (weighted) mean of the $y$ values and weight, $w_i$, equal to the sum of the weights. This operation can be performed by nag_order_data (g10zac).
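The replacement of tied observations described above can be sketched as follows. This is a minimal stand-alone illustration, not the implementation of nag_order_data (g10zac): it assumes the data are already sorted by $x$, that all weights are positive, and that the output arrays have room for $n$ entries. The name `merge_ties` is hypothetical.

```c
#include <stddef.h>

/* Illustrative helper (not a NAG function). Collapses each run of equal x
   values, in data already sorted by x, into one observation whose y is the
   weighted mean of the run and whose weight is the sum of the run's weights.
   w_in == NULL means unit weights. Returns the number of distinct x values. */
size_t merge_ties(size_t n, const double x[], const double y[],
                  const double w_in[], double x_out[], double y_out[],
                  double w_out[])
{
    size_t m = 0;
    size_t i = 0;
    while (i < n) {
        double sum_w = 0.0, sum_wy = 0.0;
        size_t j = i;
        /* Accumulate over the run of observations sharing x[i]. */
        while (j < n && x[j] == x[i]) {
            double wj = (w_in != NULL) ? w_in[j] : 1.0;
            sum_w += wj;
            sum_wy += wj * y[j];
            ++j;
        }
        x_out[m] = x[i];
        y_out[m] = sum_wy / sum_w;
        w_out[m] = sum_w;
        ++m;
        i = j;
    }
    return m;
}
```

After merging, the output $x$ values are strictly increasing, which is the form this function requires.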
The algorithm is based on Hutchinson (1986).
4 References
Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
5 Arguments
- 1: method – Nag_SmoothParamMethods, Input
On entry: indicates whether the smoothing argument is to be found by minimization of the CV or GCV functions, or by finding the smoothing argument corresponding to a specified degrees of freedom value.
- Cross-validation is used.
- The degrees of freedom are specified.
- Generalized cross-validation is used.
Constraint:
, or .
- 2: n – Integer, Input
On entry: the number of observations, $n$.
Constraint:
.
- 3: x[n] – const double, Input
On entry: the distinct and ordered values $x_i$, for $i = 1, 2, \ldots, n$.
Constraint: $x_i < x_{i+1}$, for $i = 1, 2, \ldots, n-1$.
- 4: y[n] – const double, Input
On entry: the values $y_i$, for $i = 1, 2, \ldots, n$.
- 5: weights[n] – const double, Input
On entry: weights must contain the $n$ weights, if they are required. Otherwise, weights must be set to NULL.
Constraint:
if
weights are required, then
, for
.
- 6: yhat[n] – double, Output
On exit: the fitted values, $\hat{y}_i$, for $i = 1, 2, \ldots, n$.
- 7: coeff[] – double, Output
On exit: the spline coefficients. More precisely, the value of the spline approximation at is given by , where and .
- 8: rss – double *, Output
On exit: the (weighted) residual sum of squares.
- 9: df – double *, Output
On exit: the residual degrees of freedom. If method specifies the degrees of freedom, this will be $n -$ crit to the required accuracy.
- 10: res[n] – double, Output
On exit: the (weighted) residuals, $r_i$, for $i = 1, 2, \ldots, n$.
- 11: h[n] – double, Output
On exit: the leverages, $h_{ii}$, for $i = 1, 2, \ldots, n$.
- 12: crit – double *, Input/Output
On entry: if method specifies the degrees of freedom, the required degrees of freedom for the spline. If method selects cross-validation or generalized cross-validation, crit need not be set.
Constraint:
.
On exit: if method selects cross-validation, the value of the cross-validation function, or if method selects generalized cross-validation, the value of the generalized cross-validation function, evaluated at the value of $\rho$ returned in rho.
- 13: rho – double *, Output
On exit: the smoothing argument, $\rho$.
- 14: u – double, Input
On entry: the upper bound on the smoothing argument. See Section 9 for details on how this argument is used.
Suggested value:
.
Constraint:
.
- 15: tol – double, Input
On entry: the accuracy to which the smoothing argument rho is required.
tol should be preferably not much less than
, where
is the
machine precision.
Constraint:
.
- 16: maxcal – Integer, Input
On entry: the maximum number of spline evaluations to be used in finding the value of $\rho$.
Suggested value:
.
Constraint:
.
- 17: fail – NagError *, Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
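Because x must be strictly increasing, it can be convenient to check the ordering before the call. The following is a minimal sketch of such a check; the helper name `first_order_violation` is hypothetical and is not part of the NAG Library.

```c
#include <stddef.h>

/* Illustrative helper (not a NAG function). Returns the zero-based index i
   of the first element with x[i] <= x[i-1], or -1 if x is strictly
   increasing. Arrays with fewer than two elements are trivially ordered. */
long first_order_violation(size_t n, const double x[])
{
    for (size_t i = 1; i < n; ++i) {
        if (x[i] <= x[i - 1])
            return (long)i;
    }
    return -1;
}
```

A return value other than -1 indicates that the data should first be sorted and tied values merged, for example with nag_order_data (g10zac).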
6 Error Indicators and Warnings
- NE_2_REAL_ARG_LE
-
On entry, while . These arguments must satisfy .
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument method had an illegal value.
- NE_G10AC_ACC
A solution to the accuracy given by tol has not been achieved in maxcal iterations. Try increasing the value of tol and/or maxcal.
- NE_G10AC_CG_RHO
-
or
and the optimal value of
. Try a larger value of
u.
- NE_G10AC_DF_RHO
-
and the required value of
rho for specified degrees of freedom
. Try a larger value of
u.
- NE_G10AC_DF_TOL
The degrees of freedom were specified through method and the accuracy given by tol cannot be achieved. Try increasing the value of tol.
- NE_INT_ARG_LT
-
On entry, .
Constraint: .
On entry, .
Constraint: .
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_NOT_STRICTLY_INCREASING
-
The sequence
x is not strictly increasing:
,
.
- NE_REAL
-
On entry, .
Constraint: , if .
- NE_REAL_ARRAY_CONS
-
On entry, .
Constraint: , for .
- NE_REAL_INT_ARG_CONS
-
On entry, and . These arguments must satisfy , if .
- NE_REAL_MACH_PREC
-
On entry, , .
Constraint: .
7 Accuracy
When minimizing the cross-validation or generalized cross-validation, the error in the estimate of should be within . When finding for a fixed number of degrees of freedom the error in the estimate of should be within .
Given the value of , the accuracy of the fitted spline depends on the value of and the position of the values. The values of and are scaled and is transformed to avoid underflow and overflow problems.
8 Parallelism and Performance
Not applicable.
9 Further Comments
The time to fit the spline for a given value of $\rho$ is of order $n$.
When finding the value of $\rho$ that gives the required degrees of freedom, the algorithm examines the interval 0.0 to u. For small degrees of freedom the value of $\rho$ can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and $\rho$ is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval 0.0 to u. If the function is decreasing in that range then the boundary value of u will be returned. In either case, the larger the value of u the more likely the interval is to contain the required solution, but the process will be less efficient.
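The role of u, tol and maxcal in the degrees-of-freedom search can be illustrated with a plain bisection over the interval 0.0 to u. This is only a sketch of the idea and is not the algorithm used by the library: it assumes the degrees of freedom decrease monotonically as $\rho$ grows, and `toy_df` is a made-up stand-in for an actual spline fit. All names below are hypothetical.

```c
#include <stddef.h>

typedef double (*df_fn)(double rho, void *ctx);

/* Made-up model of degrees of freedom versus rho, for illustration only:
   10 at rho = 0, tending to 2 (a straight line) as rho grows. */
double toy_df(double rho, void *ctx)
{
    (void)ctx;
    return 2.0 + 8.0 / (1.0 + rho);
}

/* Illustrative bisection (not the library's method). Searches [0, u] for the
   rho at which df(rho) equals target, assuming df decreases with rho.
   Returns 0 on success, 1 if [0, u] does not bracket the target (a larger u
   is needed), 2 if maxcal evaluations were used before reaching tol. */
int find_rho_for_df(df_fn df, void *ctx, double target, double u, double tol,
                    int maxcal, double *rho)
{
    double lo = 0.0, hi = u;
    int calls = 2;

    if (maxcal < 2)
        return 2;
    if (df(lo, ctx) < target || df(hi, ctx) > target)
        return 1;
    while (hi - lo > tol) {
        if (calls >= maxcal) {
            *rho = 0.5 * (lo + hi);   /* best estimate so far */
            return 2;
        }
        double mid = 0.5 * (lo + hi);
        ++calls;
        if (df(mid, ctx) > target)
            lo = mid;                 /* too flexible: increase rho */
        else
            hi = mid;                 /* too smooth: decrease rho */
    }
    *rho = 0.5 * (lo + hi);
    return 0;
}
```

With the toy model, a target of 5 degrees of freedom is bracketed when u is 10 but not when u is 1, which mirrors the advice to try a larger value of u; a wider interval in turn needs more evaluations to reach the same tol.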
Regression splines with a small number of knots can be fitted by nag_1d_spline_fit_knots (e02bac) and nag_1d_spline_fit (e02bec).
10 Example
The data, given by Hastie and Tibshirani (1990), is the age, $x_i$, and C-peptide concentration (pmol/ml), $y_i$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by nag_order_data (g10zac), and a spline with 5 degrees of freedom is fitted by nag_smooth_spline_estim (g10acc). The fitted values and residuals are printed.
10.1 Program Text
Program Text (g10acce.c)
10.2 Program Data
Program Data (g10acce.d)
10.3 Program Results
Program Results (g10acce.r)