naginterfaces.library.correg.lars_xtx

naginterfaces.library.correg.lars_xtx(mtype, n, dtd, dty, yty, pred=2, intcpt=1, isx=None, mnstep=None, ropt=None, io_manager=None)

lars_xtx performs Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO) using cross-product matrices.

For full information please refer to the NAG Library document for g02mb

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02mbf.html

Parameters

mtype : int

Indicates the type of model to fit.

$m t y p e = 1$

LARS is performed.

$m t y p e = 2$

Forward linear stagewise regression is performed.

$m t y p e = 3$

LASSO model is fit.

$m t y p e = 4$

A positive LASSO model is fit.

n : int

$n$ , the number of observations.

dtd : float, array-like, shape $(:, :)$

Note: the required extent for this argument in dimension 1 is determined as follows: if $d t d . s h a p e [0] = 1$ : $1$ ; if $d t d . s h a p e [1] = 1$ : $m \times (m + 1) / 2$ ; otherwise: $m$ .

Note: the required extent for this argument in dimension 2 is determined as follows: if $d t d . s h a p e [0] = 1$ : $m \times (m + 1) / 2$ ; if $d t d . s h a p e [1] = 1$ : $1$ ; otherwise: $m$ .

$D^{T} D$ , the cross-product matrix, which along with $i s x$ , defines the design matrix cross-product $X^{T} X$ .

If the cross-product matrix is packed into a single row or column of $d t d$ , $d t d [0, i \times (i - 1) / 2 + j - 1]$ or $d t d [i \times (i - 1) / 2 + j - 1, 0]$ must contain the cross-product of the $i$ th and $j$ th variable, for $j = 1, 2, \dots, i$ , for $i = 1, 2, \dots, m$ .

That is, the cross-product is stacked by columns, as returned by ssqmat(), for example.

Otherwise $d t d [i - 1, j - 1]$ must contain the cross-product of the $i$ th and $j$ th variable, for $j = 1, 2, \dots, m$ , for $i = 1, 2, \dots, m$ .

It should be noted that, even though $D^{T} D$ is symmetric, the full matrix must be supplied.

The matrix specified in $d t d$ must be a valid cross-products matrix.
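
As a minimal sketch of the packed layout, the helper below copies a full symmetric matrix into a single row using the index formula above. The name `pack_cross_products` is hypothetical and not part of naginterfaces. The sketch assumes that only entries with $j \leq i$ are stored, which is what the extent $m \times (m + 1) / 2$ implies.

```python
import numpy as np


def pack_cross_products(full):
    """Pack a symmetric m-by-m matrix into one row of length m*(m+1)/2.

    Hypothetical helper: element (i, j) with j <= i (1-based) is stored
    at position i*(i-1)/2 + j - 1 of the single row.
    """
    full = np.asarray(full, dtype=float)
    m = full.shape[0]
    packed = np.empty((1, m * (m + 1) // 2))
    for i in range(1, m + 1):
        for j in range(1, i + 1):
            packed[0, i * (i - 1) // 2 + j - 1] = full[i - 1, j - 1]
    return packed


# Illustrative 3-by-3 cross-product matrix (made-up values).
full = np.array([[4.0, 1.0, 2.0],
                 [1.0, 5.0, 3.0],
                 [2.0, 3.0, 6.0]])
packed = pack_cross_products(full)
```

Here `packed` has shape `(1, 6)`, matching the required extent for $m = 3$ when the matrix is supplied as a single row.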

dty : float, array-like, shape $(m)$

$D^{T} y$ , the cross-product between the dependent variable, $y$ , and the independent variables $D$ .

yty : float

$y^{T} y$ , the sums of squares of the dependent variable.
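
The three cross-product inputs can be formed directly from raw data with NumPy. The sketch below is illustrative only: the data are random, and it assumes that both the independent variables and the dependent variable are mean centred before the cross-products are taken, in line with the standardization described in the Notes. The call to lars_xtx is shown as a comment so that the block runs without a NAG licence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 3

# Made-up data: n observations on m independent variables.
D = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# Assumption: centre the columns of D and the response before forming
# the cross-products.
Dc = D - D.mean(axis=0)
yc = y - y.mean()

dtd = Dc.T @ Dc          # full m-by-m cross-product matrix D^T D
dty = Dc.T @ yc          # length-m vector D^T y
yty = float(yc @ yc)     # scalar y^T y

# With naginterfaces installed, these could then be passed on as, e.g.:
# from naginterfaces.library import correg
# result = correg.lars_xtx(mtype=1, n=n, dtd=dtd, dty=dty, yty=yty)
```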

pred : int, optional

Indicates the type of preprocessing to perform on the cross-products involving the independent variables, i.e., those supplied in $d t d$ and $d t y$ .

$p r e d = 0$

No preprocessing is performed.

$p r e d = 2$

Each independent variable is normalized, with the $j$ th variable scaled by $1 / \sqrt{x_{j}^{T} x_{j}}$ . The scaling factor used by variable $j$ is returned in $b [j - 1, n s t e p]$ .

intcpt : int, optional

Indicates the type of data preprocessing that was performed on the dependent variable, $y$ , prior to calling this function.

$i n t c p t = 0$

No preprocessing was performed.

$i n t c p t = 1$

The dependent variable, $y$ , was mean centred.

isx : None or int, array-like, shape $(lisx)$ , optional

Indicates which independent variables from $d t d$ will be included in the design matrix, $X$ .

If $i s x$ is None, all variables are included in the design matrix.

Otherwise $i s x [j - 1]$ must be set as follows, for $j = 1, 2, \dots, m$ :

$i s x [j - 1] = 1$

To indicate that the $j$ th variable, as supplied in $d t d$ , is included in the design matrix;

$i s x [j - 1] = 0$

To indicate that the $j$ th variable, as supplied in $d t d$ , is not included in the design matrix;

and $p = \sum_{j = 1}^{m} i s x [j - 1]$ .
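
To illustrate how $i s x$ selects variables, the sketch below computes $p$ and extracts the rows and columns of a full $d t d$ that correspond to the included variables. The matrix values are made up, and the routine performs this selection internally; the code only shows which cross-products end up in $X^{T} X$ .

```python
import numpy as np

# Made-up full cross-product matrix for m = 4 variables.
dtd = np.array([[4.0, 1.0, 2.0, 0.0],
                [1.0, 5.0, 3.0, 1.0],
                [2.0, 3.0, 6.0, 2.0],
                [0.0, 1.0, 2.0, 7.0]])

# Include variables 1, 3 and 4; exclude variable 2.
isx = np.array([1, 0, 1, 1])

p = int(isx.sum())                 # number of parameter estimates
keep = np.flatnonzero(isx)         # zero-based indices of included variables
xtx = dtd[np.ix_(keep, keep)]      # cross-products of the included variables
```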

mnstep : None or int, optional

Note: if this argument is None then a default value will be used, determined as follows: if $m t y p e = 1$ : $m$ ; otherwise: $200 \times m$ .

The maximum number of steps to carry out in the model fitting process.

If $m t y p e = 1$ , i.e., a LARS is being performed, the maximum number of steps the algorithm will take is $\min (p, n)$ if $i n t c p t = 0$ , otherwise $\min (p, n - 1)$ .

If $m t y p e = 2$ , i.e., a forward linear stagewise regression is being performed, the maximum number of steps the algorithm will take is likely to be several orders of magnitude more and is no longer bound by $p$ or $n$ .

If $m t y p e = 3$ or $4$ , i.e., a LASSO or positive LASSO model is being fit, the maximum number of steps the algorithm will take lies somewhere between that of the LARS and forward linear stagewise regression. Again, it is no longer bound by $p$ or $n$ .
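
The default and the LARS bound described above can be written out as two small functions. Both names are hypothetical and serve only to restate the rules; they are not part of naginterfaces.

```python
def default_mnstep(mtype: int, m: int) -> int:
    """Default used when mnstep is None: m for LARS, 200*m otherwise."""
    return m if mtype == 1 else 200 * m


def lars_max_steps(p: int, n: int, intcpt: int) -> int:
    """Upper bound on the number of LARS steps (mtype = 1)."""
    return min(p, n) if intcpt == 0 else min(p, n - 1)


lars_default = default_mnstep(mtype=1, m=5)
lasso_default = default_mnstep(mtype=3, m=5)
bound = lars_max_steps(p=5, n=4, intcpt=1)
```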

ropt : None or float, array-like, shape $(lropt)$ , optional

Options to control various aspects of the LARS algorithm.

The default value will be used for $r o p t [i - 1]$ if $lropt < i$ . Setting $lropt = 0$ therefore uses the default values for all options, in which case $r o p t$ need not be set and may be None.

The default value will also be used if an invalid value is supplied for a particular argument, for example, setting $r o p t [i - 1] = - 1$ will use the default value for argument $i$ .

$r o p t [0]$

The minimum step size that will be taken.

Default is $100 \times eps$ , where $eps$ is the machine precision returned by machine.precision.

$r o p t [1]$

General tolerance, used, amongst other things, for comparing correlations.

Default is $r o p t [0]$ .

$r o p t [2]$

If set to $1$ , parameter estimates are rescaled before being returned. If set to $0$ , no rescaling is performed. This argument has no effect when $p r e d = 0$ .

Default is for the parameter estimates to be rescaled.
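
As a small usage sketch, the list below requests parameter estimates on the normalized scale while leaving the first two options at their defaults. It relies on the rule stated above that an invalid value such as $- 1$ selects the default for that option.

```python
# ropt[0] and ropt[1]: -1 is invalid, so the documented defaults apply.
# ropt[2] = 0: do not rescale the parameter estimates before returning.
ropt = [-1.0, -1.0, 0.0]

# The length of ropt plays the role of lropt and must lie in [0, 3].
lropt = len(ropt)
```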

io_manager : FileObjManager, optional

Manager for I/O in this routine.

Returns

ip : int

$p$ , number of parameter estimates.

If $i s x$ is None, $p = m$ , i.e., the number of variables in $d t d$ .

Otherwise $p$ is the number of nonzero values in $i s x$ .

nstep : int

$K$ , the actual number of steps carried out in the model fitting process.

b : float, ndarray, shape $(i p, n s t e p + 1)$

$β$ , the parameter estimates, with $b [j - 1, k - 1] = β_{k j}$ , the parameter estimate for the $j$ th variable, $j = 1, 2, \dots, p$ , at the $k$ th step of the model fitting process, $k = 1, 2, \dots, n s t e p$ .

By default, when $p r e d = 2$ the parameter estimates are rescaled prior to being returned.

If the parameter estimates are required on the normalized scale, then this can be overridden via $r o p t$ .

The values held in the remaining part of $b$ depend on the type of preprocessing performed.

$\begin{matrix} \text{if } p r e d = 0 & b [j - 1, n s t e p] & = & 1 , \\ \text{if } p r e d = 2 & b [j - 1, n s t e p] & = & 1 / \sqrt{x_{j}^{T} x_{j}} , \end{matrix}$

for $j = 1, 2, \dots, p$ .
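
The layout of $b$ can be unpacked with plain slicing. The array below is a made-up stand-in for a returned $b$ with $i p = 3$ and $n s t e p = 2$ ; only the indexing is the point of the sketch.

```python
import numpy as np

ip, nstep = 3, 2

# Hypothetical returned array: columns 0..nstep-1 hold the estimates for
# steps 1..nstep, and column nstep holds the per-variable scaling factors.
b = np.array([[0.5, 0.7, 0.2],
              [0.0, 0.3, 0.1],
              [0.0, 0.0, 0.4]])

estimates = b[:, :nstep]      # shape (ip, nstep), one column per step
scaling = b[:, nstep]         # shape (ip,), scaling factor for each variable
final_step = estimates[:, -1] # estimates at step K = nstep
```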

fitsum : float, ndarray, shape $(6, m n s t e p + 1)$

Summaries of the model fitting process. When $k = 1, 2, \dots, n s t e p$

$f i t s u m [0, k - 1]$

${∥ β_{k} ∥}_{1}$ , the sum of the absolute values of the parameter estimates for the $k$ th step of the model fitting process. If $p r e d = 2$ , the scaled parameter estimates are used in the summation.

$f i t s u m [1, k - 1]$

${RSS}_{k}$ , the residual sums of squares for the $k$ th step, where ${RSS}_{k} = {∥ y - X^{T} β_{k} ∥}^{2}$ .

$f i t s u m [2, k - 1]$

$ν_{k}$ , approximate degrees of freedom for the $k$ th step.

$f i t s u m [3, k - 1]$

$C_{p}^{(k)}$ , a $C_{p}$ -type statistic for the $k$ th step, where $C_{p}^{(k)} = \frac{{RSS}_{k}}{σ^{2}} - n + 2 ν_{k}$ .

$f i t s u m [4, k - 1]$

$\hat{C}_{k}$ , correlation between the residual at step $k - 1$ and the most correlated variable not yet in the active set $A$ , where the residual at step $0$ is $y$ .

$f i t s u m [5, k - 1]$

$\hat{γ}_{k}$ , the step size used at step $k$ .

In addition

$f i t s u m [0, n s t e p]$

$0$ .

$f i t s u m [1, n s t e p]$

${RSS}_{0}$ , the residual sums of squares for the null model, where ${RSS}_{0} = y^{T} y$ .

$f i t s u m [2, n s t e p]$

$ν_{0}$ , the degrees of freedom for the null model, where $ν_{0} = 0$ if $i n t c p t = 0$ and $ν_{0} = 1$ otherwise.

$f i t s u m [3, n s t e p]$

$C_{p}^{(0)}$ , a $C_{p}$ -type statistic for the null model, where $C_{p}^{(0)} = \frac{{RSS}_{0}}{σ^{2}} - n + 2 ν_{0}$ .

$f i t s u m [4, n s t e p]$

$σ^{2}$ , where $σ^{2} = \frac{{RSS}_{K}}{n - ν_{K}}$ and $K = n s t e p$ .

Although the $C_{p}$ statistics described above are returned when $e r r n o$ = 122 they may not be meaningful due to the estimate $σ^{2}$ not being based on the saturated model.
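
The rows and columns of $f i t s u m$ described above can be read off with slicing. The array below is a made-up stand-in with $m n s t e p = 4$ and $n s t e p = 3$ , filled only where the sketch needs values.

```python
import numpy as np

mnstep, nstep = 4, 3

# Hypothetical returned array; only the Cp row and sigma^2 are filled in.
fitsum = np.zeros((6, mnstep + 1))
fitsum[3, :nstep] = [8.0, 3.0, 4.0]   # Cp statistic for steps 1..nstep
fitsum[3, nstep] = 20.0               # Cp statistic for the null model
fitsum[4, nstep] = 0.5                # sigma^2

cp_steps = fitsum[3, :nstep]
cp_null = fitsum[3, nstep]
sigma2 = fitsum[4, nstep]

# One-based index of the step with the smallest Cp statistic.
best_step = int(np.argmin(cp_steps)) + 1
```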

Raises

NagValueError

(errno $11$ )

On entry, $m t y p e = ⟨ v a l u e ⟩$ .

Constraint: $m t y p e = 1$ , $2$ , $3$ or $4$ .

(errno $21$ )

On entry, $p r e d = ⟨ v a l u e ⟩$ .

Constraint: $p r e d = 0$ or $2$ .

(errno $31$ )

On entry, $i n t c p t = ⟨ v a l u e ⟩$ .

Constraint: $i n t c p t = 0$ or $1$ .

(errno $41$ )

On entry, $n = ⟨ v a l u e ⟩$ .

Constraint: $n \geq 1$ .

(errno $51$ )

On entry, $m = ⟨ v a l u e ⟩$ .

Constraint: $m \geq 1$ .

(errno $61$ )

The cross-product matrix supplied in $d t d$ is not symmetric.

(errno $62$ )

On entry, $d t d [0, ⟨ v a l u e ⟩] = ⟨ v a l u e ⟩$ .

Constraint: diagonal elements of $D^{T} D$ must be positive.

(errno $62$ )

On entry, $i = ⟨ v a l u e ⟩$ and $d t d [i - 1, i - 1] = ⟨ v a l u e ⟩$ .

Constraint: diagonal elements of $D^{T} D$ must be positive.

(errno $81$ )

On entry, $i s x [⟨ v a l u e ⟩] = ⟨ v a l u e ⟩$ .

Constraint: $i s x [i] = 0$ or $1$ , for all $i$ .

(errno $82$ )

On entry, all values of $i s x$ are zero.

Constraint: at least one value of $i s x$ must be nonzero.

(errno $91$ )

On entry, $lisx = ⟨ v a l u e ⟩$ and $m = ⟨ v a l u e ⟩$ .

Constraint: $lisx = 0$ or $m$ .

(errno $111$ )

On entry, $y t y = ⟨ v a l u e ⟩$ .

Constraint: $y t y > 0.0$ .

(errno $112$ )

A negative value for the residual sums of squares was obtained. Check the values of $d t d$ , $d t y$ and $y t y$ .

(errno $121$ )

On entry, $m n s t e p = ⟨ v a l u e ⟩$ .

Constraint: $m n s t e p \geq 1$ .

(errno $191$ )

On entry, $lropt = ⟨ v a l u e ⟩$ .

Constraint: $0 \leq lropt \leq 3$ .

Warns

NagAlgorithmicWarning

(errno $122$ )

Fitting process did not finish in $m n s t e p$ steps. Try increasing the size of $m n s t e p$ and supplying larger output arrays.

All output is returned as documented, up to step $m n s t e p$ ; however, $σ^{2}$ and the $C_{p}$ statistics may not be meaningful.

(errno $171$ )

$σ^{2}$ is approximately zero and hence the $C_{p}$ -type criterion cannot be calculated. All other output is returned as documented.

(errno $172$ )

$ν_{K} = n$ , therefore, sigma has been set to a large value. Output is returned as documented.

(errno $173$ )

Degenerate model, no variables added and $n s t e p = 0$ . Output is returned as documented.

Notes

lars_xtx implements the LARS algorithm of Efron et al. (2004) as well as the modifications needed to perform forward stagewise linear regression and fit LASSO and positive LASSO models.

Given a vector of $n$ observed values, $y = \{ y_{i} : i = 1, 2, \dots, n \}$ , an $n \times p$ design matrix $X$ , where the $j$ th column of $X$ , denoted $x_{j}$ , is a vector of length $n$ representing the $j$ th independent variable, standardized such that $\sum_{i = 1}^{n} x_{i j} = 0$ and $\sum_{i = 1}^{n} x_{i j}^{2} = 1$ , and a set of model parameters $β$ to be estimated from the observed values, the LARS algorithm can be summarised as:

1. Set $k = 1$ and all coefficients to zero, that is $β = 0$ .
2. Find the variable most correlated with $y$ , say $x_{j_{1}}$ . Add $x_{j_{1}}$ to the ‘most correlated’ set $A$ . If $p = 1$ go to (8).
3. Take the largest possible step in the direction of $x_{j_{1}}$ (i.e., increase the magnitude of $β_{j_{1}}$ ) until some other variable, say $x_{j_{2}}$ , has the same correlation with the current residual, $y - x_{j_{1}} β_{j_{1}}$ .
4. Increment $k$ and add $x_{j_{k}}$ to $A$ .
5. If $| A | = p$ go to (8).
6. Proceed in the ‘least angle direction’, that is, the direction which is equiangular between all variables in $A$ , altering the magnitude of the parameter estimates of those variables in $A$ , until the $k$ th variable, $x_{j_{k}}$ , has the same correlation with the current residual.
7. Go to (4).
8. Let $K = k$ .
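
The steps above can be sketched in NumPy using the equiangular-direction formulas from Efron et al. (2004). This is an illustrative sketch working on the data matrix rather than on cross-products, not the implementation used by lars_xtx. It assumes $n > p$ , standardized columns and no ties, and it always finishes with a full step to the least-squares fit. The function name `lars_path` and the test data are hypothetical.

```python
import numpy as np


def lars_path(X, y):
    """Return the LARS coefficient path, one row per step."""
    n, p = X.shape
    beta = np.zeros(p)
    mu = np.zeros(n)                       # current fitted values X @ beta
    c = X.T @ (y - mu)
    active = [int(np.argmax(np.abs(c)))]   # most correlated variable
    path = []
    while True:
        c = X.T @ (y - mu)                 # current correlations
        C = np.max(np.abs(c))
        s = np.sign(c[active])
        XA = X[:, active] * s              # sign-adjusted active columns
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())
        w = A * g
        u = XA @ w                         # equiangular unit vector
        a = X.T @ u
        inactive = [j for j in range(p) if j not in active]
        gamma, j_new = np.inf, None
        if not inactive:
            gamma = C / A                  # final step: least-squares fit
        for j in inactive:
            # Smallest positive step at which variable j ties with the set A.
            for cand in ((C - c[j]) / (A - a[j]), (C + c[j]) / (A + a[j])):
                if 1e-12 < cand < gamma:
                    gamma, j_new = cand, j
        mu = mu + gamma * u
        beta[active] += gamma * s * w
        path.append(beta.copy())
        if j_new is None:
            break
        active.append(j_new)
    return np.array(path)


# Made-up, standardized test data.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
X -= X.mean(axis=0)
X /= np.sqrt((X ** 2).sum(axis=0))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(30)

path = lars_path(X, y)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Each row of `path` adds one variable to the active set, and the last row coincides with the ordinary least-squares solution `ols`.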

As well as being a model selection process in its own right, with a small number of modifications the LARS algorithm can be used to fit the LASSO model of Tibshirani (1996), a positive LASSO model, where the independent variables enter the model in their defined direction, forward stagewise linear regression (Hastie et al. (2001)) and forward selection (Weisberg (1985)). Details of the required modifications in each of these cases are given in Efron et al. (2004).

The LASSO model of Tibshirani (1996) is given by

$$\underset{α, β_{k} \in \mathbb{R}^{p}}{\text{minimize}} \; {∥ y - α - X^{T} β_{k} ∥}^{2} \quad \text{subject to} \quad {∥ β_{k} ∥}_{1} \leq t_{k}$$

for all values of $t_{k}$ , where $α = \bar{y} = n^{- 1} \sum_{i = 1}^{n} y_{i}$ . The positive LASSO model is the same as the standard LASSO model, given above, with the added constraint that

$$β_{k j} \geq 0 , \quad j = 1, 2, \dots, p .$$

Unlike the standard LARS algorithm, when fitting either of the LASSO models, variables can be dropped as well as added to the set $A$ . Therefore, the total number of steps $K$ is no longer bounded by $p$ .

Forward stagewise linear regression is an iterative procedure of the form:

1. Initialize $k = 1$ and the vector of residuals $r_{0} = y - α$ .
2. For each $j = 1, 2, \dots, p$ calculate $c_{j} = x_{j}^{T} r_{k - 1}$ . The value $c_{j}$ is, therefore, proportional to the correlation between the $j$ th independent variable and the vector of previous residual values, $r_{k - 1}$ .
3. Calculate $j_{k} = {argmax}_{j} | c_{j} |$ , the value of $j$ with the largest absolute value of $c_{j}$ .
4. If $| c_{j_{k}} | < ϵ$ then go to (7).
5. Update the residual values, with

$r_{k} = r_{k - 1} - δ s i g n (c_{j_{k}}) x_{j_{k}}$

where $δ$ is a small constant and $s i g n (c_{j_{k}}) = - 1$ when $c_{j_{k}} < 0$ and $1$ otherwise.
6. Increment $k$ and go to (2).
7. Set $K = k$ .
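
The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used by lars_xtx. It assumes standardized columns, takes $ϵ = δ$ so that the loop is guaranteed to stop, and uses the convention that each update reduces the residual by $δ$ times the signed column, which shrinks the largest correlation. The function name, the coefficient bookkeeping and the test data are hypothetical.

```python
import numpy as np


def forward_stagewise(X, y, delta=0.01, eps=0.01, max_steps=1_000_000):
    """Forward stagewise regression; returns (beta, residual, n_updates)."""
    beta = np.zeros(X.shape[1])
    r = y - y.mean()                      # step 1: r_0 = y - alpha
    n_updates = 0
    while n_updates < max_steps:
        c = X.T @ r                       # step 2: c_j = x_j^T r_{k-1}
        j = int(np.argmax(np.abs(c)))     # step 3: most correlated variable
        if abs(c[j]) < eps:               # step 4: stopping rule
            break
        s = -1.0 if c[j] < 0 else 1.0
        beta[j] += delta * s              # small move in coefficient j
        r = r - delta * s * X[:, j]       # step 5: update the residual
        n_updates += 1                    # step 6
    return beta, r, n_updates


# Made-up, standardized test data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X -= X.mean(axis=0)
X /= np.sqrt((X ** 2).sum(axis=0))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(50)

beta, r, n_updates = forward_stagewise(X, y)
```

Because every update changes one coefficient by only $δ$ , the number of updates is typically far larger than $p$ , which is why the default for $m n s t e p$ is much larger when $m t y p e = 2$ .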

If the largest possible step were to be taken, that is, $δ = | c_{j_{k}} |$ , then forward stagewise linear regression reverts to the standard forward selection method as implemented in linregm_fit_onestep().

The LARS procedure results in $K$ models, one for each step of the fitting process. To aid in choosing the most suitable one, Efron et al. (2004) introduced a $C_{p}$ -type statistic given by

$$C_{p}^{(k)} = \frac{{∥ y - X^{T} β_{k} ∥}^{2}}{σ^{2}} - n + 2 ν_{k} ,$$

where $ν_{k}$ is the approximate degrees of freedom for the $k$ th step and

$$σ^{2} = \frac{{∥ y - X^{T} β_{K} ∥}^{2}}{n - ν_{K}} .$$

One way of choosing a model is, therefore, to take the one with the smallest value of $C_{p}^{(k)}$ .
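
As a worked example of this selection rule, the sketch below evaluates the $C_{p}$ -type statistic for three steps and picks the smallest. The residual sums of squares, degrees of freedom, $n$ and $σ^{2}$ are made-up values supplied directly, and the function name `cp_statistic` is hypothetical.

```python
import numpy as np


def cp_statistic(rss, nu, n, sigma2):
    """Cp-type statistic RSS_k / sigma^2 - n + 2 * nu_k for each step k."""
    rss = np.asarray(rss, dtype=float)
    nu = np.asarray(nu, dtype=float)
    return rss / sigma2 - n + 2.0 * nu


# Made-up summaries for steps k = 1, 2, 3.
rss = [8.0, 4.5, 4.0]     # residual sums of squares
nu = [1, 2, 3]            # approximate degrees of freedom
cp = cp_statistic(rss, nu, n=10, sigma2=0.5)

# One-based index of the step with the smallest statistic.
best_step = int(np.argmin(cp)) + 1
```

With these values the statistics are 8, 3 and 4, so the second step would be chosen.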

References

Efron, B, Hastie, T, Johnstone, I and Tibshirani, R, 2004, Least Angle Regression, The Annals of Statistics (Volume 32) (2), 407–499

Hastie, T, Tibshirani, R and Friedman, J, 2001, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer (New York)

Tibshirani, R, 1996, Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B (Methodological) (Volume 58) (1), 267–288

Weisberg, S, 1985, Applied Linear Regression, Wiley
