naginterfaces.library.correg.ridge_opt
- naginterfaces.library.correg.ridge_opt(x, isx, y, h, opt, niter, tol, orig, optloo, tau=0.0)
ridge_opt calculates a ridge regression, optimizing the ridge parameter h according to one of four prediction error criteria; a brief usage sketch is given below, after the Warns section.
For full information please refer to the NAG Library document for g02ka:
https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02kaf.html
- Parameters
- x : float, array-like, shape (n, m)
The values of independent variables in the data matrix X.
- isx : int, array-like, shape (m)
Indicates which independent variables are included in the model.
If isx[j-1] = 1, the jth variable in x will be included in the model, for j = 1, 2, …, m.
If isx[j-1] = 0, variable j is excluded.
- y : float, array-like, shape (n)
The n values of the dependent variable y.
- h : float
An initial value for the ridge regression parameter h; used as a starting point for the optimization.
- opt : int
The measure of prediction error used to optimize the ridge regression parameter h. opt must be set equal to one of:
opt = 1: generalized cross-validation (GCV);
opt = 2: unbiased estimate of variance (UEV);
opt = 3: future prediction error (FPE);
opt = 4: Bayesian information criterion (BIC).
- niter : int
The maximum number of iterations allowed to optimize the ridge regression parameter h.
- tol : float
Iterations of the ridge regression parameter h will halt when consecutive values of h lie within tol.
- orig : int
If orig = 1, the parameter estimates are calculated for the original data; otherwise orig = 2 and the parameter estimates are calculated for the standardized data.
- optloo : int
If optloo = 2, the leave-one-out cross-validation estimate of prediction error is calculated; otherwise no such estimate is calculated and optloo = 1.
- tau : float, optional
Singular values of the SVD of the data matrix X that are less than tau will be set equal to zero.
- Returns
- h : float
The optimized value of the ridge regression parameter h.
- niter : int
The number of iterations used to optimize the ridge regression parameter h within tol.
- nep : float
The number of effective parameters, \(\gamma\), in the model.
- b : float, ndarray, shape (ip + 1)
Contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by isx, where ip is the number of independent variables included in the model (the number of nonzero entries of isx). The first element of b contains the estimate for the intercept; b[j] contains the parameter estimate for the jth independent variable in the model, for j = 1, 2, …, ip.
- vif : float, ndarray, shape (ip)
The variance inflation factors in the order indicated by isx. For the jth independent variable in the model, vif[j-1] is the value of \(v_j\), for j = 1, 2, …, ip.
- res : float, ndarray, shape (n)
res[i-1] is the value of the ith residual for the fitted ridge regression model, for i = 1, 2, …, n.
- rss : float
The sum of squares of residual values.
- df : int
The degrees of freedom for the residual sum of squares rss.
- perr : float, ndarray, shape (5)
The first four elements contain, in this order, the measures of prediction error: GCV, UEV, FPE and BIC.
If optloo = 2, perr[4] is the LOOCV estimate of prediction error; otherwise perr[4] is not referenced.
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: , , or .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, and .
Constraint: .
- (errno )
On entry, ; .
Constraint: .
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, .
Constraint: .
- (errno )
SVD failed to converge.
- Warns
- NagAlgorithmicWarning
- (errno )
Maximum number of iterations used.
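As a usage illustration, the following minimal sketch calls ridge_opt on made-up data. The data values and the option choices (opt = 1 for GCV, orig = 1, optloo = 2) are assumptions made for the example, and it is assumed that the returned object exposes the documented results as attributes (h, nep, b, perr), as is usual for naginterfaces functions:

    from naginterfaces.library import correg

    # Made-up data: n = 7 observations of m = 3 candidate independent variables.
    x = [
        [0.5, 1.2, 3.1],
        [1.0, 0.9, 2.7],
        [1.5, 1.1, 3.3],
        [2.0, 1.4, 2.9],
        [2.5, 1.0, 3.0],
        [3.0, 1.3, 3.2],
        [3.5, 1.2, 2.8],
    ]
    isx = [1, 1, 1]                          # include all three variables
    y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.9, 7.2]

    # Start the optimization from h = 0.5; the values of opt, orig and optloo
    # follow the interpretation given in the argument descriptions above.
    result = correg.ridge_opt(
        x, isx, y, h=0.5, opt=1, niter=25, tol=1.0e-4, orig=1, optloo=2,
    )

    print('Optimized ridge parameter:', result.h)
    print('Effective number of parameters:', result.nep)
    print('Intercept and coefficients:', result.b)
    print('Prediction errors (GCV, UEV, FPE, BIC, LOOCV):', result.perr)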
- Notes
A linear model has the form:

\[ y = c + X\beta + \epsilon , \]

where

\(y\) is an \(n \times 1\) matrix of values of a dependent variable;
\(c\) is a scalar intercept term;
\(X\) is an \(n \times m\) matrix of values of independent variables;
\(\beta\) is an \(m \times 1\) matrix of unknown values of parameters;
\(\epsilon\) is an \(n \times 1\) matrix of unknown random errors such that the variance of \(\epsilon\) is \(\sigma^2 I\).
Let \(\tilde{X}\) be the mean-centred \(X\) and \(\tilde{y}\) the mean-centred \(y\). Furthermore, \(\tilde{X}\) is scaled such that the diagonal elements of the cross product matrix \(\tilde{X}^\mathrm{T}\tilde{X}\) are one. The linear model now takes the form:

\[ \tilde{y} = \tilde{X}\tilde{\beta} + \epsilon . \]

Ridge regression estimates the parameters \(\tilde{\beta}\) in a penalised least squares sense by finding the \(\tilde{b}\) that minimizes

\[ \lVert \tilde{X}\tilde{b} - \tilde{y} \rVert^2 + h \lVert \tilde{b} \rVert^2 , \qquad h > 0 , \]

where \(\lVert \cdot \rVert\) denotes the \(\ell_2\)-norm and \(h\) is a scalar regularization or ridge parameter. For a given value of \(h\), the parameter estimates \(\tilde{b}\) are found by evaluating

\[ \tilde{b} = \bigl( \tilde{X}^\mathrm{T}\tilde{X} + h I \bigr)^{-1} \tilde{X}^\mathrm{T} \tilde{y} . \]

Note that if \(h = 0\) the ridge regression solution is equivalent to the ordinary least squares solution.
Rather than calculate the inverse of \(\tilde{X}^\mathrm{T}\tilde{X} + h I\) directly, ridge_opt uses the singular value decomposition (SVD) of \(\tilde{X}\). After decomposing \(\tilde{X}\) into \(U D V^\mathrm{T}\), where \(U\) and \(V\) are orthogonal matrices and \(D\) is a diagonal matrix, the parameter estimates become

\[ \tilde{b} = V \bigl( D^\mathrm{T} D + h I \bigr)^{-1} D^\mathrm{T} U^\mathrm{T} \tilde{y} . \]

A consequence of introducing the ridge parameter is that the effective number of parameters, \(\gamma\), in the model is given by the sum of the diagonal elements of

\[ D^\mathrm{T} D \bigl( D^\mathrm{T} D + h I \bigr)^{-1} ; \]

see Moody (1992) for details.
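To make the SVD route concrete, here is a small NumPy sketch (illustrative only, not the library's implementation) that evaluates \(\tilde{b}\) and \(\gamma\) for a fixed ridge parameter from the thin SVD of the standardized data; all names are made up for the example:

    import numpy as np

    def ridge_via_svd(x_tilde, y_tilde, h):
        """Illustrative ridge estimates and effective parameters via the SVD.

        x_tilde : mean-centred, scaled data matrix, shape (n, m)
        y_tilde : mean-centred responses, shape (n,)
        h       : fixed ridge parameter, h > 0
        """
        u, d, vt = np.linalg.svd(x_tilde, full_matrices=False)  # x_tilde = U diag(d) V^T
        # b~ = V (D^T D + h I)^-1 D^T U^T y~, written on the singular values d_i
        b_tilde = vt.T @ ((d / (d**2 + h)) * (u.T @ y_tilde))
        # gamma = trace of D^T D (D^T D + h I)^-1 = sum_i d_i^2 / (d_i^2 + h)
        gamma = np.sum(d**2 / (d**2 + h))
        return b_tilde, gamma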
Any multi-collinearity in the design matrix \(X\) may be highlighted by calculating the variance inflation factors for the fitted model. The \(j\)th variance inflation factor, \(v_j\), is a scaled version of the multiple correlation coefficient between independent variable \(j\) and the other independent variables, \(R_j\), and is given by

\[ v_j = \frac{1}{1 - R_j^2} , \qquad j = 1, 2, \ldots, m . \]

The \(m\) variance inflation factors are calculated as the diagonal elements of the matrix:

\[ \bigl( \tilde{X}^\mathrm{T}\tilde{X} + h I \bigr)^{-1} \tilde{X}^\mathrm{T}\tilde{X} \bigl( \tilde{X}^\mathrm{T}\tilde{X} + h I \bigr)^{-1} , \]

which, using the SVD of \(\tilde{X}\), is equivalent to the diagonal elements of the matrix:

\[ V \bigl( D^\mathrm{T} D + h I \bigr)^{-1} D^\mathrm{T} D \bigl( D^\mathrm{T} D + h I \bigr)^{-1} V^\mathrm{T} . \]

Although parameter estimates \(\tilde{b}\) are calculated by using \(\tilde{X}\), it is usual to report the parameter estimates \(b\) associated with \(X\). These are calculated from \(\tilde{b}\), and the means and scalings of \(X\). Optionally, either \(\tilde{b}\) or \(b\) may be calculated.
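Under the same assumptions as the previous sketch, the variance inflation factors for a fixed \(h\) can be read off the singular values in the same way; again this is an illustration only:

    import numpy as np

    def vif_via_svd(x_tilde, h):
        """Illustrative VIFs: diagonal of V (D^T D + hI)^-1 D^T D (D^T D + hI)^-1 V^T."""
        _, d, vt = np.linalg.svd(x_tilde, full_matrices=False)
        scale = d**2 / (d**2 + h)**2
        # Diagonal of V diag(scale) V^T
        return (vt**2).T @ scale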
The method can adopt one of four criteria to minimize while calculating a suitable value for \(h\):

Generalized cross-validation (GCV):

\[ \frac{s n}{(n - \gamma)^2} ; \]

Unbiased estimate of variance (UEV):

\[ \frac{s}{n - \gamma} ; \]

Future prediction error (FPE):

\[ \frac{1}{n} \left( s + \frac{2 \gamma s}{n - \gamma} \right) ; \]

Bayesian information criterion (BIC):

\[ \frac{1}{n} \left( s + \frac{\log(n)\, \gamma s}{n - \gamma} \right) ; \]

where \(s\) is the sum of squares of residuals. However, the function returns all four of the above prediction errors regardless of the one selected to minimize the ridge parameter, \(h\). Furthermore, the function will optionally return the leave-one-out cross-validation (LOOCV) prediction error.
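The four criteria are simple functions of the residual sum of squares \(s\), the number of observations \(n\) and the effective number of parameters \(\gamma\); the following sketch evaluates them as given in the formulas reconstructed above (an illustration, not the library's internals):

    import math

    def prediction_errors(s, n, gamma):
        """Illustrative GCV, UEV, FPE and BIC as functions of s, n and gamma."""
        gcv = s * n / (n - gamma)**2
        uev = s / (n - gamma)
        fpe = (s + 2.0 * gamma * s / (n - gamma)) / n
        bic = (s + math.log(n) * gamma * s / (n - gamma)) / n
        return gcv, uev, fpe, bic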
- References
Hastie, T., Tibshirani, R. and Friedman, J., 2003, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer Series in Statistics
Moody, J. E., 1992, The effective number of parameters: An analysis of generalisation and regularisation in nonlinear learning systems, In: Neural Information Processing Systems (eds J. E. Moody, S. J. Hanson and R. P. Lippmann), 4, 847–854, Morgan Kaufmann, San Mateo, CA