naginterfaces.library.correg.ridge

naginterfaces.library.correg.ridge(x, isx, y, h, wantb, wantvf, pec=None)

ridge calculates a ridge regression, with ridge parameters supplied by you.

For full information, please refer to the NAG Library document for g02kb:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02kbf.html

Parameters

x : float, array-like, shape $(n, m)$

The values of independent variables in the data matrix $X$.

isx : int, array-like, shape $(m)$

Indicates which of the $m$ independent variables are included in the model.

$\mathit{isx}[j-1] = 1$

The $j$th variable in $x$ will be included in the model.

$\mathit{isx}[j-1] = 0$

Variable $j$ is excluded.

y : float, array-like, shape $(n)$

The $n$ values of the dependent variable $y$.

h : float, array-like, shape $(lh)$

$h[j-1]$ is the value of the $j$th ridge parameter $h$.

wantb : int

Defines the options for parameter estimates.

$\mathit{wantb} = 0$

Parameter estimates are not calculated and $b$ is not referenced.

$\mathit{wantb} = 1$

Parameter estimates $b$ are calculated for the original data.

$\mathit{wantb} = 2$

Parameter estimates $\tilde{b}$ are calculated for the standardized data.

wantvf : int

Defines the options for variance inflation factors.

$\mathit{wantvf} = 0$

Variance inflation factors are not calculated and the array $\mathit{vf}$ is not referenced.

$\mathit{wantvf} = 1$

Variance inflation factors are calculated.

pec : None or str, length 1, array-like, shape $(lpec)$, optional

If $\mathit{pec}$ is not None, $\mathit{pec}[j-1]$ defines the $j$th prediction error, for $j = 1, 2, \dots, lpec$; otherwise $\mathit{pec}$ is not referenced.

$\mathit{pec}[j-1] = \text{'B'}$

Bayesian information criterion (BIC).

$\mathit{pec}[j-1] = \text{'F'}$

Future prediction error (FPE).

$\mathit{pec}[j-1] = \text{'G'}$

Generalized cross-validation (GCV).

$\mathit{pec}[j-1] = \text{'L'}$

Leave-one-out cross-validation (LOOCV).

$\mathit{pec}[j-1] = \text{'U'}$

Unbiased estimate of variance (UEV).

Returns

nep : float, ndarray, shape $(lh)$

$\mathit{nep}[j-1]$ is the number of effective parameters, $γ$, in the $j$th model, for $j = 1, 2, \dots, lh$.

b : float, ndarray, shape $(:, :)$

If $\mathit{wantb} \neq 0$, $b$ contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by $\mathit{isx}$. $b[0, j-1]$, for $j = 1, 2, \dots, lh$, contains the estimate for the intercept; $b[i, j-1]$ contains the parameter estimate for the $i$th independent variable in the model fitted with ridge parameter $h[j-1]$, for $i = 1, 2, \dots, ip$.

vf : float, ndarray, shape $(:, :)$

If $\mathit{wantvf} = 1$, the variance inflation factors. For the $i$th independent variable in a model fitted with ridge parameter $h[j-1]$, $\mathit{vf}[i-1, j-1]$ is the value of $v_{i}$, for $i = 1, 2, \dots, ip$.

pe : None or float, ndarray, shape $(:, :)$

If $\mathit{pec}$ is None on entry, $\mathit{pe}$ is None; otherwise $\mathit{pe}[i-1, j-1]$ contains the prediction error of criterion $\mathit{pec}[i-1]$ for the model fitted with ridge parameter $h[j-1]$, for $j = 1, 2, \dots, lh$ and $i = 1, 2, \dots, lpec$.
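
The index ranges in the descriptions above suggest how the output shapes follow from the inputs. The NumPy-only sketch below uses made-up values to derive $ip$ from `isx` and to list the shapes those ranges imply. It does not call `ridge`; the shapes are inferred from the text, not taken from a run of the library.

```python
import numpy as np

isx = [1, 0, 1, 1]      # hypothetical: include variables 1, 3 and 4
h = [0.0, 0.5, 1.0]     # hypothetical ridge parameters
pec = ['G', 'U']        # hypothetical choice of criteria

ip = int(np.sum(np.asarray(isx) == 1))  # number of included variables
lh, lpec = len(h), len(pec)

# Shapes suggested by the index ranges in the Returns descriptions.
expected_shapes = {
    'nep': (lh,),
    'b': (ip + 1, lh),  # row 0 holds the intercept
    'vf': (ip, lh),
    'pe': (lpec, lh),
}
```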

Raises

NagValueError

(errno $1$)

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 1$.

(errno $1$)

On entry, $m = \langle\mathit{value}\rangle$ and $n = \langle\mathit{value}\rangle$.

Constraint: $m \leq n$.

(errno $1$)

On entry, $h[j-1] < 0$ for at least one $j$.

Constraint: $h[j-1] \geq 0.0$, for all $j$.

(errno $1$)

On entry, $lh = \langle\mathit{value}\rangle$.

Constraint: $lh > 0$.

(errno $1$)

On entry, $\mathit{wantb} = \langle\mathit{value}\rangle$.

Constraint: $\mathit{wantb} = 0$, $1$ or $2$.

(errno $1$)

On entry, $\mathit{wantvf} = \langle\mathit{value}\rangle$.

Constraint: $\mathit{wantvf} = 0$ or $1$.

(errno $1$)

On entry, $\mathit{pec}[j-1]$ is invalid for at least one $j$.

Constraint: if $\mathit{pec}$ is not None, $\mathit{pec}[j-1] = \text{'B'}$, $\text{'F'}$, $\text{'G'}$, $\text{'L'}$ or $\text{'U'}$, for all $j$.

(errno $2$)

On entry, $\mathit{isx}[j-1] \neq 0$ or $1$ for at least one $j$.

Constraint: $\mathit{isx}[j-1] = 0$ or $1$, for all $j$.

(errno $2$)

On entry, $ip$ is not equal to the sum of elements in $\mathit{isx}$.

Constraint: exactly $ip$ elements of $\mathit{isx}$ must be equal to $1$.

(errno $2$)

On entry, $ldb = \langle\mathit{value}\rangle$ and $ip = \langle\mathit{value}\rangle$.

Constraint: if $\mathit{wantb} \neq 0$, $ldb \geq ip + 1$.

(errno $2$)

On entry, $ldvf = \langle\mathit{value}\rangle$ and $ip = \langle\mathit{value}\rangle$.

Constraint: if $\mathit{wantvf} \neq 0$, $ldvf \geq ip$.

(errno $3$)

On entry, $\mathit{wantb} = 0$ and $\mathit{wantvf} = 0$.

Constraint: if $\mathit{wantb} = 0$, $\mathit{wantvf} = 1$.

Notes

A linear model has the form:

$y = c + X β + ϵ,$

where

$y$ is an $n \times 1$ matrix of values of a dependent variable;

$c$ is a scalar intercept term;

$X$ is an $n \times m$ matrix of values of independent variables;

$β$ is an $m \times 1$ matrix of unknown values of parameters;

$ϵ$ is an $n \times 1$ matrix of unknown random errors such that the variance of $ϵ$ is $σ^{2} I$.

Let $\tilde{X}$ be the mean-centred $X$ and $\tilde{y}$ the mean-centred $y$. Furthermore, $\tilde{X}$ is scaled such that the diagonal elements of the cross product matrix $\tilde{X}^{T} \tilde{X}$ are one. The linear model now takes the form:

$\tilde{y} = \tilde{X} \tilde{β} + ϵ.$
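
The centring and scaling described here can be sketched in NumPy as follows. This is an illustration of the stated definition only, with a hypothetical helper name `standardize` and random data; it is not the library's own code.

```python
import numpy as np

def standardize(x, y):
    """Mean-centre x and y, then scale each column of x to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = x.mean(axis=0)
    y_mean = y.mean()
    xc = x - x_mean
    scale = np.sqrt((xc ** 2).sum(axis=0))  # column norms of the centred data
    return xc / scale, y - y_mean, x_mean, scale, y_mean

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))   # made-up data: n = 20, m = 3
y = rng.normal(size=20)

xt, yt, x_mean, scale, y_mean = standardize(x, y)
gram_diag = np.diag(xt.T @ xt)  # diagonal of the cross product matrix
```

With this scaling, every entry of `gram_diag` equals one up to rounding, which is the property the text requires.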

Ridge regression estimates the parameters $\tilde{β}$ in a penalised least squares sense by finding the $\tilde{b}$ that minimizes

$\| \tilde{X} \tilde{b} - \tilde{y} \|^{2} + h \| \tilde{b} \|^{2}, \quad h > 0,$

where $\| \cdot \|$ denotes the $ℓ_{2}$-norm and $h$ is a scalar regularization or ridge parameter. For a given value of $h$, the parameter estimates $\tilde{b}$ are found by evaluating

$\tilde{b} = {(\tilde{X}^{T} \tilde{X} + h I)}^{-1} \tilde{X}^{T} \tilde{y}.$

Note that if $h = 0$ the ridge regression solution is equivalent to the ordinary least squares solution.
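
As a minimal sketch of the closed-form estimate and of the $h = 0$ case, the following NumPy code solves the penalised normal equations directly on made-up standardized data and compares the result with ordinary least squares. The helper name `ridge_normal_equations` is hypothetical, and solving the normal equations is used here only for illustration.

```python
import numpy as np

def ridge_normal_equations(xt, yt, h):
    """Solve (xt^T xt + h I) b = xt^T yt for b."""
    p = xt.shape[1]
    return np.linalg.solve(xt.T @ xt + h * np.eye(p), xt.T @ yt)

# Made-up data, centred and scaled to unit column length.
rng = np.random.default_rng(1)
xt = rng.normal(size=(30, 4))
xt -= xt.mean(axis=0)
xt /= np.sqrt((xt ** 2).sum(axis=0))
yt = rng.normal(size=30)
yt -= yt.mean()

b_ols = np.linalg.lstsq(xt, yt, rcond=None)[0]  # ordinary least squares
b_h0 = ridge_normal_equations(xt, yt, 0.0)      # ridge with h = 0
b_h1 = ridge_normal_equations(xt, yt, 1.0)      # ridge with h = 1
```

Here `b_h0` agrees with `b_ols` up to rounding, while `b_h1` has a smaller Euclidean norm, reflecting the shrinkage introduced by the penalty.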

Rather than calculate the inverse of $(\tilde{X}^{T} \tilde{X} + h I)$ directly, ridge uses the singular value decomposition (SVD) of $\tilde{X}$. After decomposing $\tilde{X}$ into $U D V^{T}$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix, the parameter estimates become

$\tilde{b} = V {(D^{T} D + h I)}^{-1} D U^{T} \tilde{y}.$
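
The benefit of the SVD form is that one decomposition serves every ridge parameter. The sketch below, assuming NumPy and a hypothetical helper `ridge_svd`, evaluates the expression above for several values of $h$ using a thin SVD (so $D$ is square) and keeps a direct solve alongside for comparison. It illustrates the formula and makes no claim about how the library organises its computation.

```python
import numpy as np

def ridge_svd(xt, yt, h_values):
    """Ridge estimates for several ridge parameters from one thin SVD of xt."""
    u, d, vt = np.linalg.svd(xt, full_matrices=False)
    uty = u.T @ yt
    # Column j holds V (D^T D + h_j I)^{-1} D U^T yt.
    return np.column_stack([vt.T @ (d / (d ** 2 + h) * uty) for h in h_values])

rng = np.random.default_rng(2)
xt = rng.normal(size=(25, 3))
xt -= xt.mean(axis=0)
xt /= np.sqrt((xt ** 2).sum(axis=0))
yt = rng.normal(size=25)
yt -= yt.mean()

h_values = [0.1, 1.0, 10.0]
b_svd = ridge_svd(xt, yt, h_values)

# Reference: solve the penalised normal equations separately for each h.
b_ref = np.column_stack([
    np.linalg.solve(xt.T @ xt + h * np.eye(3), xt.T @ yt) for h in h_values
])
```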

A consequence of introducing the ridge parameter is that the effective number of parameters, $γ$ , in the model is given by the sum of diagonal elements of

$D^{T} D {(D^{T} D + h I)}^{-1},$

see Moody (1992) for details.
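
Because $D$ is diagonal, this trace reduces to a sum over the singular values $d_{i}$ of $\tilde{X}$, namely $\sum_{i} d_{i}^{2} / (d_{i}^{2} + h)$. A minimal NumPy sketch, with a hypothetical helper name and made-up data, is given below. It evaluates only the trace formula stated here; whether the value returned in `nep` includes any further adjustment is not covered by this sketch.

```python
import numpy as np

def effective_parameters(xt, h):
    """Sum of d_i^2 / (d_i^2 + h) over the singular values d_i of xt."""
    d = np.linalg.svd(xt, compute_uv=False)
    return float(np.sum(d ** 2 / (d ** 2 + h)))

rng = np.random.default_rng(3)
xt = rng.normal(size=(40, 4))
xt -= xt.mean(axis=0)
xt /= np.sqrt((xt ** 2).sum(axis=0))

gamma_0 = effective_parameters(xt, 0.0)  # no penalty: one per column
gamma_1 = effective_parameters(xt, 1.0)  # penalty reduces the count

# Cross-check against the trace of xt (xt^T xt + h I)^{-1} xt^T for h = 1.
hat = xt @ np.linalg.solve(xt.T @ xt + np.eye(4), xt.T)
trace_1 = float(np.trace(hat))
```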

Any multi-collinearity in the design matrix $X$ may be highlighted by calculating the variance inflation factors for the fitted model. The $j$th variance inflation factor, $v_{j}$, is a scaled version of the multiple correlation coefficient between independent variable $j$ and the other independent variables, $R_{j}$, and is given by

$v_{j} = \frac{1}{1 - R_{j}}, \quad j = 1, 2, \dots, m.$

The $m$ variance inflation factors are calculated as the diagonal elements of the matrix:

${(\tilde{X}^{T} \tilde{X} + h I)}^{-1} \tilde{X}^{T} \tilde{X} {(\tilde{X}^{T} \tilde{X} + h I)}^{-1},$

which, using the SVD of $\tilde{X}$, is equivalent to the diagonal elements of the matrix:

$V {(D^{T} D + h I)}^{-1} D^{T} D {(D^{T} D + h I)}^{-1} V^{T}.$
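
The equivalence of the two matrix expressions can be checked numerically. The following NumPy sketch, using hypothetical helper names and made-up data with one deliberately collinear column, computes the diagonal both ways. It is an illustration of the formulas above, not the library's implementation.

```python
import numpy as np

def vif_direct(xt, h):
    """Diagonal of (X^T X + h I)^{-1} X^T X (X^T X + h I)^{-1}."""
    p = xt.shape[1]
    a_inv = np.linalg.inv(xt.T @ xt + h * np.eye(p))
    return np.diag(a_inv @ (xt.T @ xt) @ a_inv)

def vif_svd(xt, h):
    """Same diagonal, from the singular values and right singular vectors."""
    _, d, vt = np.linalg.svd(xt, full_matrices=False)
    w = d ** 2 / (d ** 2 + h) ** 2
    # diag(V diag(w) V^T)[i] = sum_k V[i, k]^2 * w[k]
    return (vt ** 2).T @ w

rng = np.random.default_rng(4)
x = rng.normal(size=(50, 3))
x[:, 2] = x[:, 0] + 0.1 * rng.normal(size=50)  # nearly collinear column
xt = x - x.mean(axis=0)
xt /= np.sqrt((xt ** 2).sum(axis=0))

vif_h0 = vif_svd(xt, 0.0)     # unpenalised: large for the collinear pair
vif_h1 = vif_svd(xt, 0.1)     # ridge penalty reduces the inflation
vif_check = vif_direct(xt, 0.1)
```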

Given a value of $h$ , any or all of the following prediction criteria are available:

Generalized cross-validation (GCV):

$\frac{n s}{{(n - γ)}^{2}};$

Unbiased estimate of variance (UEV):

$\frac{s}{n - γ};$

Future prediction error (FPE):

$\frac{1}{n} \left(s + \frac{2 γ s}{n - γ}\right);$

Bayesian information criterion (BIC):

$\frac{1}{n} \left(s + \frac{\log(n) γ s}{n - γ}\right);$

Leave-one-out cross-validation (LOOCV),

where $s$ is the sum of squares of residuals.
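
As a worked example of the four criteria that have a closed form above, the sketch below evaluates them for illustrative values $s = 12$, $n = 20$ and $γ = 4$. The function name is hypothetical, the natural logarithm is assumed for BIC, and LOOCV is omitted because no formula is given for it here. The formulas require $n > γ$.

```python
import math

def prediction_criteria(s, n, gamma):
    """GCV, UEV, FPE and BIC from residual sum of squares s, sample size n
    and effective number of parameters gamma (requires n > gamma)."""
    return {
        "GCV": n * s / (n - gamma) ** 2,
        "UEV": s / (n - gamma),
        "FPE": (s + 2.0 * gamma * s / (n - gamma)) / n,
        "BIC": (s + math.log(n) * gamma * s / (n - gamma)) / n,
    }

crit = prediction_criteria(s=12.0, n=20, gamma=4.0)
# GCV = 20 * 12 / 16**2 = 0.9375
# UEV = 12 / 16         = 0.75
# FPE = (12 + 6) / 20   = 0.9
```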

Although parameter estimates $\tilde{b}$ are calculated by using $\tilde{X}$, it is usual to report the parameter estimates $b$ associated with $X$. These are calculated from $\tilde{b}$, and the means and scalings of $X$. Optionally, either $\tilde{b}$ or $b$ may be calculated.
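
One way to carry out this back-transformation, assuming each centred column of $X$ was divided by its Euclidean norm, is to divide each standardized estimate by that column's scaling and recover the intercept from the means. The NumPy sketch below uses a hypothetical helper `to_original_scale` and made-up data; it follows from the definitions in this section and is not taken from the library source.

```python
import numpy as np

def to_original_scale(b_tilde, x_mean, scale, y_mean):
    """Map standardized estimates to an intercept and slopes for the raw data."""
    b = b_tilde / scale               # undo the column scaling
    intercept = y_mean - x_mean @ b   # absorb the centring of x and y
    return intercept, b

rng = np.random.default_rng(5)
x = rng.normal(loc=3.0, scale=2.0, size=(30, 3))
y = 1.5 + x @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=30)

# Standardize, then fit ridge with h = 0.5 on the standardized data.
x_mean, y_mean = x.mean(axis=0), y.mean()
xc = x - x_mean
scale = np.sqrt((xc ** 2).sum(axis=0))
xt, yt = xc / scale, y - y_mean
b_tilde = np.linalg.solve(xt.T @ xt + 0.5 * np.eye(3), xt.T @ yt)

intercept, b = to_original_scale(b_tilde, x_mean, scale, y_mean)
fitted_raw = intercept + x @ b       # prediction from the raw data
fitted_std = y_mean + xt @ b_tilde   # prediction from the standardized data
```

The two sets of fitted values coincide, which confirms that the transformed estimates describe the same model.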

References

Hastie, T, Tibshirani, R and Friedman, J, 2003, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer Series in Statistics

Moody, J E, 1992, The effective number of parameters: An analysis of generalisation and regularisation in nonlinear learning systems, In: Neural Information Processing Systems, (eds J E Moody, S J Hanson, and R P Lippmann), 4, 847–854, Morgan Kaufmann, San Mateo, CA
