naginterfaces.library.smooth.fit_spline_parest

naginterfaces.library.smooth.fit_spline_parest(method, x, y, crit, wt=None, u=0.0, tol=0.0, maxcal=0)

fit_spline_parest estimates the value of the smoothing parameter and fits a cubic smoothing spline to a set of data.

For full information, please refer to the NAG Library document for g10ac:

https://support.nag.com/numeric/nl/nagdoc_31.1/flhtml/g10/g10acf.html

Parameters

method : str, length 1

Indicates whether the smoothing parameter is to be found by minimization of the CV or GCV functions, or by finding the smoothing parameter corresponding to a specified degrees of freedom value.

$\mathrm{method} = \text{'C'}$

Cross-validation is used.

$\mathrm{method} = \text{'D'}$

The degrees of freedom are specified.

$\mathrm{method} = \text{'G'}$

Generalized cross-validation is used.

x : float, array-like, shape $(n)$

The distinct and ordered values $x_{i}$, for $i = 1, 2, \dots, n$.

y : float, array-like, shape $(n)$

The values $y_{i}$, for $i = 1, 2, \dots, n$.

crit : float

If $\mathrm{method} = \text{'D'}$, the required degrees of freedom for the spline.

If $\mathrm{method} = \text{'C'}$ or $\text{'G'}$, $\mathrm{crit}$ need not be set.

wt : None or float, array-like, shape $(n)$, optional

If weights are supplied (that is, $\mathrm{wt}$ is not None), $\mathrm{wt}$ must contain the $n$ weights, each of which must be positive. Otherwise $\mathrm{wt}$ is not referenced and unit weights are assumed.

u : float, optional

The upper bound on the smoothing parameter. If $u \leq \mathrm{tol}$, $u = 1000.0$ will be used instead. See Further Comments in the NAG Library document for details on how this argument is used.

tol : float, optional

The accuracy to which the smoothing parameter $\mathrm{rho}$ is required. $\mathrm{tol}$ should preferably be not much less than $\sqrt{ϵ}$, where $ϵ$ is the machine precision. If $\mathrm{tol} < ϵ$, $\mathrm{tol} = \sqrt{ϵ}$ will be used instead.

maxcal : int, optional

The maximum number of spline evaluations to be used in finding the value of $ρ$. If $\mathrm{maxcal} < 3$, $\mathrm{maxcal} = 100$ will be used instead.
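The fallback rules for u, tol and maxcal can be sketched in plain Python. This is only an illustration of the documented defaults, not the library's code: the helper name `effective_controls` is hypothetical, `sys.float_info.epsilon` stands in for the library's machine precision, and the sketch assumes that u is compared against tol after tol's own fallback has been applied.

```python
import math
import sys


def effective_controls(u=0.0, tol=0.0, maxcal=0):
    """Return the (u, tol, maxcal) values implied by the documented fallbacks."""
    # Assumption: double-precision epsilon stands in for the library's
    # machine precision.
    eps = sys.float_info.epsilon
    # A tol below machine precision falls back to sqrt(eps).
    if tol < eps:
        tol = math.sqrt(eps)
    # Assumption: u is compared against the already-resolved tol.
    if u <= tol:
        u = 1000.0
    # Fewer than 3 spline evaluations falls back to 100.
    if maxcal < 3:
        maxcal = 100
    return u, tol, maxcal
```

With the default arguments of fit_spline_parest (all zero), every fallback applies; explicit values that satisfy the conditions are returned unchanged.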

Returns

yhat : float, ndarray, shape $(n)$

The fitted values, $\hat{y}_{i}$, for $i = 1, 2, \dots, n$.

c : float, ndarray, shape $(n - 1, 3)$

The spline coefficients. More precisely, the value of the spline approximation at $t$ is given by $((c[i - 1, 2] \times d + c[i - 1, 1]) \times d + c[i - 1, 0]) \times d + \hat{y}_{i}$, where $x_{i} \leq t < x_{i + 1}$ and $d = t - x_{i}$.

rss : float

The (weighted) residual sum of squares.

df : float

The residual degrees of freedom. If $\mathrm{method} = \text{'D'}$ this will be $n - \mathrm{crit}$ to the required accuracy.

res : float, ndarray, shape $(n)$

The (weighted) residuals, $r_{i}$, for $i = 1, 2, \dots, n$.

h : float, ndarray, shape $(n)$

The leverages, $h_{ii}$, for $i = 1, 2, \dots, n$.

crit : float

If $\mathrm{method} = \text{'C'}$, the value of the cross-validation function, or if $\mathrm{method} = \text{'G'}$, the value of the generalized cross-validation function, evaluated at the value of $ρ$ returned in $\mathrm{rho}$.

rho : float

The smoothing parameter, $ρ$.
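The piecewise formula given for c can be evaluated directly from the returned arrays. The following is a minimal sketch, not part of the library: the helper name `eval_spline` is hypothetical, the formula's 1-based index $i$ becomes the 0-based index `j`, and assigning $t = x_{n}$ to the last piece (and rejecting $t$ outside the data range) is a choice made for this sketch.

```python
from bisect import bisect_right


def eval_spline(x, yhat, c, t):
    """Evaluate the fitted cubic spline at t from the returned yhat and c."""
    if not x[0] <= t <= x[-1]:
        raise ValueError("t lies outside the range of the data points")
    # 0-based index j of the piece with x[j] <= t < x[j + 1]; the clamp
    # assigns t == x[-1] to the last piece (an assumption of this sketch).
    j = min(bisect_right(x, t) - 1, len(x) - 2)
    d = t - x[j]
    cj = c[j]
    # Horner form of the documented cubic in d.
    return ((cj[2] * d + cj[1]) * d + cj[0]) * d + yhat[j]
```

At a data point $x_{i}$ the offset $d$ is zero, so the sketch returns the fitted value $\hat{y}_{i}$ itself.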

Raises

NagValueError

(errno $1$)

On entry, $\mathrm{crit} = \langle\mathit{value}\rangle$.

Constraint: if $\mathrm{method} = \text{'D'}$, $\mathrm{crit} \leq n$.

(errno $1$)

On entry, $\mathrm{crit} = \langle\mathit{value}\rangle$.

Constraint: if $\mathrm{method} = \text{'D'}$, $\mathrm{crit} > 2.0$.

(errno $1$)

On entry, $\mathrm{method}$ is not valid: $\mathrm{method} = \langle\mathit{value}\rangle$.

(errno $1$)

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 3$.

(errno $2$)

On entry, at least one element of $\mathrm{wt} \leq 0.0$.

(errno $3$)

On entry, $x$ is not a strictly ordered array.

(errno $4$)

For the specified degrees of freedom, $\mathrm{rho} > u$: $u = \langle\mathit{value}\rangle$.
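The input constraints behind errno 1 to 3 can be checked before calling the routine. The sketch below is a hypothetical pre-check (`check_inputs` is not a library function) that raises a plain `ValueError` rather than `NagValueError`. It accepts only the upper-case method codes listed above, and whether the library is more lenient about case is not something this sketch assumes.

```python
def check_inputs(method, x, crit=0.0, wt=None):
    """Raise ValueError if a documented input constraint is violated."""
    n = len(x)
    if method not in ("C", "D", "G"):
        raise ValueError(f"method is not valid: {method!r}")
    if n < 3:
        raise ValueError(f"n = {n}; constraint: n >= 3")
    # crit is only constrained when the degrees of freedom are specified.
    if method == "D" and not 2.0 < crit <= n:
        raise ValueError(f"crit = {crit}; constraint: 2.0 < crit <= n")
    if wt is not None and any(w <= 0.0 for w in wt):
        raise ValueError("at least one element of wt <= 0.0")
    # x must be strictly increasing, so ties are rejected as well.
    if any(b <= a for a, b in zip(x, x[1:])):
        raise ValueError("x is not strictly increasing")
```

The condition behind errno 4 depends on the computed smoothing parameter, so it cannot be checked ahead of the call.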

Warns

NagAlgorithmicWarning

(errno $5$): Accuracy of $\mathrm{tol}$ cannot be achieved: $\mathrm{tol} = \langle\mathit{value}\rangle$.
(errno $6$): $\mathrm{maxcal}$ iterations have been performed.
(errno $7$): Optimum value of $\mathrm{rho}$ lies above $u$: $u = \langle\mathit{value}\rangle$.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

For a set of $n$ observations $(x_{i}, y_{i})$ , for $i = 1, 2, \dots, n$ , the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.

Cubic smoothing splines arise as the unique real-valued solution function $f$ , with absolutely continuous first derivative and squared-integrable second derivative, which minimizes

$$\sum_{i = 1}^{n} w_{i} {(y_{i} - f(x_{i}))}^{2} + ρ \int_{-\infty}^{\infty} {(f''(x))}^{2} \, dx,$$

where $w_{i}$ is the (optional) weight for the $i$ th observation and $ρ$ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing parameter $ρ$ weights these two aspects; larger values of $ρ$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).

The fitted values, $\hat{y} = {(\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{n})}^{T}$, and weighted residuals, $r_{i}$, can be written as:

$$\hat{y} = H y \quad \text{and} \quad r_{i} = \sqrt{w_{i}} \, (y_{i} - \hat{y}_{i})$$

for a matrix $H$. The residual degrees of freedom for the spline is $\mathrm{trace}(I - H)$ and the diagonal elements of $H$ are the leverages.
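The two relations above translate directly into code. The sketch below is illustrative only (both helper names are hypothetical): it computes the weighted residuals from the data and fitted values, and the residual degrees of freedom from the leverages, using the fact that $\mathrm{trace}(I - H)$ equals $n$ minus the sum of the diagonal of $H$.

```python
import math


def weighted_residuals(y, yhat, wt=None):
    """Return r_i = sqrt(w_i) * (y_i - yhat_i); unit weights if wt is None."""
    if wt is None:
        wt = [1.0] * len(y)
    return [math.sqrt(w) * (yi - fi) for w, yi, fi in zip(wt, y, yhat)]


def residual_df(h):
    """Return trace(I - H) computed from the leverages h_ii."""
    return len(h) - sum(h)
```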

The parameter $ρ$ can be estimated in a number of ways:

1. The degrees of freedom for the spline can be specified, i.e., find $ρ$ such that $\mathrm{trace}(H) = ν_{0}$ for given $ν_{0}$.

2. Minimize the cross-validation (CV), i.e., find $ρ$ such that the CV is minimized, where

$$CV = \frac{1}{\sum_{i = 1}^{n} w_{i}} \sum_{i = 1}^{n} {\left[\frac{r_{i}}{1 - h_{ii}}\right]}^{2}.$$

3. Minimize the generalized cross-validation (GCV), i.e., find $ρ$ such that the GCV is minimized, where

$$GCV = \frac{n^{2}}{\sum_{i = 1}^{n} w_{i}} \left[\frac{\sum_{i = 1}^{n} r_{i}^{2}}{{\left(\sum_{i = 1}^{n} (1 - h_{ii})\right)}^{2}}\right].$$
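The CV and GCV criteria can be written out in a few lines of Python. This is a plain transcription of the two formulas for illustration, taking the weighted residuals and leverages as inputs. The function names are hypothetical, and the sketch makes no claim about how the library computes or scales the value it returns in crit.

```python
def cv_score(res, h, wt=None):
    """Cross-validation score from weighted residuals and leverages."""
    # With unit weights the sum of the weights is n.
    sum_w = float(len(res)) if wt is None else sum(wt)
    return sum((r / (1.0 - hii)) ** 2 for r, hii in zip(res, h)) / sum_w


def gcv_score(res, h, wt=None):
    """Generalized cross-validation score from the same inputs."""
    n = len(res)
    sum_w = float(n) if wt is None else sum(wt)
    rss = sum(r * r for r in res)
    # Sum of (1 - h_ii), i.e. the residual degrees of freedom.
    resid_df = sum(1.0 - hii for hii in h)
    return (n * n / sum_w) * rss / resid_df ** 2
```

The only difference between the two is that CV divides each residual by its own $1 - h_{ii}$, whereas GCV uses the average of those factors for every observation.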

fit_spline_parest requires the $x_{i}$ to be strictly increasing. If two or more observations have the same $x_{i}$ value then they should be replaced by a single observation with $y_{i}$ equal to the (weighted) mean of the $y$ values and weight, $w_{i}$ , equal to the sum of the weights. This operation can be performed by data_order().
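The replacement rule for tied $x_{i}$ values is simple enough to sketch by hand. The helper below is a hypothetical pure-Python illustration of that rule, not the data_order() routine itself. It treats two x values as tied only when they compare exactly equal, and it returns the merged data sorted by x.

```python
def merge_ties(x, y, wt=None):
    """Merge observations with equal x into one weighted-mean observation."""
    if wt is None:
        wt = [1.0] * len(x)
    totals = {}  # x value -> [sum of weights, weighted sum of y]
    for xi, yi, wi in zip(x, y, wt):
        acc = totals.setdefault(xi, [0.0, 0.0])
        acc[0] += wi
        acc[1] += wi * yi
    xs = sorted(totals)
    # Weighted mean of y and total weight for each distinct x.
    ys = [totals[v][1] / totals[v][0] for v in xs]
    ws = [totals[v][0] for v in xs]
    return xs, ys, ws
```

For example, observations $(1, 2)$ with weight 1 and $(1, 4)$ with weight 3 collapse to a single observation $(1, 3.5)$ with weight 4.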

The algorithm is based on Hutchinson (1986). roots.contfn_brent_rcomm is used to solve for $ρ$ given $ν_{0}$ and the method of opt.one_var_func is used to minimize the GCV or CV.
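A minimal call, using the signature documented at the top of this page, might look like the following. The wrapper name `fit_gcv_spline` is hypothetical, and running it assumes a working, licensed installation of the NAG Library for Python. The import is placed inside the function so that defining the helper does not itself require the package.

```python
def fit_gcv_spline(x, y):
    """Fit a cubic smoothing spline, choosing rho by generalized cross-validation."""
    # Imported lazily so that defining this helper does not require the
    # NAG Library for Python to be installed and licensed.
    from naginterfaces.library import smooth

    # crit is a required argument but need not be set when method is 'G',
    # so a placeholder value is passed.
    return smooth.fit_spline_parest("G", x, y, 0.0)
```

The object returned by the call holds the values listed under Returns, including the fitted values yhat and the selected smoothing parameter rho.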

References

Hastie, T J and Tibshirani, R J, 1990, Generalized Additive Models, Chapman and Hall

Hutchinson, M F, 1986, Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines, ACM Trans. Math. Software (12), 150–153

Hutchinson, M F and de Hoog, F R, 1985, Smoothing noisy data with spline functions, Numer. Math. (47), 99–106

Reinsch, C H, 1967, Smoothing by spline functions, Numer. Math. (10), 177–183
