NAG C Library Function Document

1

Purpose

nag_lars_xtx (g02mbc) performs Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO) using cross-product matrices.

2

Specification

#include <nag.h>
#include <nagg02.h>

void nag_lars_xtx (Nag_LARSModelType mtype, Nag_LARSPreProcess pred, Nag_LARSPreProcess intcpt, Integer n, Integer m, const double dtd[], Integer pddtd, const Integer isx[], const double dty[], double yty, Integer mnstep, Integer *ip, Integer *nstep, double b[], Integer pdb, double fitsum[], const double ropt[], Integer lropt, NagError *fail)

3

Description

nag_lars_xtx (g02mbc) implements the LARS algorithm of Efron et al. (2004) as well as the modifications needed to perform forward stagewise linear regression and fit LASSO and positive LASSO models.

Given a vector of $n$ observed values, $y = \{y_{i} : i = 1, 2, \dots, n\}$, and an $n \times p$ design matrix $X$, where the $j$th column of $X$, denoted $x_{j}$, is a vector of length $n$ representing the $j$th independent variable, standardized such that $\sum_{i = 1}^{n} x_{i j} = 0$ and $\sum_{i = 1}^{n} x_{i j}^{2} = 1$, and a set of model parameters $β$ to be estimated from the observed values, the LARS algorithm can be summarised as:

1.	Set $k = 1$ and all coefficients to zero, that is $β = 0$ .
2.	Find the variable most correlated with $y$ , say $x_{j_{1}}$ . Add $x_{j_{1}}$ to the ‘most correlated’ set $A$ . If $p = 1$ go to 8.
3.	Take the largest possible step in the direction of $x_{j_{1}}$ (i.e., increase the magnitude of $β_{j_{1}}$ ) until some other variable, say $x_{j_{2}}$ , has the same correlation with the current residual, $y - x_{j_{1}} β_{j_{1}}$ .
4.	Increment $k$ and add $x_{j_{k}}$ to $A$ .
5.	If $|A| = p$ go to 8.
6.	Proceed in the ‘least angle direction’, that is, the direction which is equiangular between all variables in $A$ , altering the magnitude of the parameter estimates of those variables in $A$ , until the $k$ th variable, $x_{j_{k}}$ , has the same correlation with the current residual.
7.	Go to 4.
8.	Let $K = k$ .

As well as being a model selection process in its own right, with a small number of modifications the LARS algorithm can be used to fit the LASSO model of Tibshirani (1996), a positive LASSO model, where the independent variables enter the model in their defined direction, forward stagewise linear regression (Hastie et al. (2001)) and forward selection (Weisberg (1985)). Details of the required modifications in each of these cases are given in Efron et al. (2004).

The LASSO model of Tibshirani (1996) is given by

$$\underset{α, β_{k} \in ℝ^{p}}{\text{minimize}} \; {‖y - α - X^{T} β_{k}‖}^{2} \quad \text{subject to} \quad {‖β_{k}‖}_{1} \leq t_{k}$$

for all values of $t_{k}$, where $α = \bar{y} = n^{- 1} \sum_{i = 1}^{n} y_{i}$. The positive LASSO model is the same as the standard LASSO model, given above, with the added constraint that

$$β_{k j} \geq 0, \quad j = 1, 2, \dots, p.$$

Unlike the standard LARS algorithm, when fitting either of the LASSO models, variables can be dropped as well as added to the set $A$. Therefore the total number of steps $K$ is no longer bounded by $p$.

Forward stagewise linear regression is an iterative procedure of the form:

1.	Initialize $k = 1$ and the vector of residuals $r_{0} = y - α$.
2.	For each $j = 1, 2, \dots, p$ calculate $c_{j} = x_{j}^{T} r_{k - 1}$. The value $c_{j}$ is therefore proportional to the correlation between the $j$th independent variable and the vector of previous residual values, $r_{k - 1}$.
3.	Calculate $j_{k} = \underset{j}{argmax} |c_{j}|$, the value of $j$ with the largest absolute value of $c_{j}$.
4.	If $|c_{j_{k}}| < ε$ then go to 7.
5.	Update the residual values, with $r_{k} = r_{k - 1} - δ \, sign (c_{j_{k}}) x_{j_{k}}$, where $δ$ is a small constant and $sign (c_{j_{k}}) = - 1$ when $c_{j_{k}} < 0$ and $1$ otherwise.
6.	Increment $k$ and go to 2.
7.	Set $K = k$.

If the largest possible step were to be taken, that is $δ = |c_{j_{k}}|$, then forward stagewise linear regression reverts to the standard forward selection method as implemented in nag_step_regsn (g02eec).
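The iteration above can be sketched in plain C as follows. This is an illustration only, not the implementation used by nag_lars_xtx (g02mbc): the function name, the column-by-column storage of $X$ and the use of raw data rather than cross-products are all assumptions made for the example. The residual is reduced along the selected column, so that $|c_{j_{k}}|$ shrinks by $δ$ at each step when the columns have unit length.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without pulling in <math.h>. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical helper, not part of the NAG interface.
   x    : p standardized columns of length n, column j at x[j*n .. j*n+n-1]
   r    : on entry the centred response y - alpha; on exit the residuals
   beta : on exit the accumulated coefficients (length p)
   Returns the number of coefficient updates performed. */
long forward_stagewise(size_t n, size_t p, const double *x, double *r,
                       double *beta, double delta, double eps, long max_steps)
{
    assert(n > 0 && p > 0 && delta > 0.0 && eps > 0.0);
    for (size_t j = 0; j < p; ++j) beta[j] = 0.0;

    long steps = 0;
    while (steps < max_steps) {
        /* c_j = x_j^T r, remembering the j with the largest |c_j| */
        size_t jbest = 0;
        double cbest = 0.0;
        for (size_t j = 0; j < p; ++j) {
            double c = 0.0;
            for (size_t i = 0; i < n; ++i) c += x[j * n + i] * r[i];
            if (j == 0 || abs_val(c) > abs_val(cbest)) { jbest = j; cbest = c; }
        }
        if (abs_val(cbest) < eps) break;   /* largest correlation below tolerance */

        /* take a step of size delta in the direction of sign(c) */
        double step = cbest < 0.0 ? -delta : delta;
        beta[jbest] += step;
        for (size_t i = 0; i < n; ++i) r[i] -= step * x[jbest * n + i];
        ++steps;
    }
    return steps;
}
```

With a small $δ$ the number of updates grows roughly as the total coefficient magnitude divided by $δ$, which is why the step count for this model type can be far larger than $p$.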

The LARS procedure results in $K$ models, one for each step of the fitting process. In order to aid in choosing which is the most suitable Efron et al. (2004) introduced a $C_{p}$-type statistic given by

$$C_{p}^{(k)} = \frac{{‖y - X^{T} β_{k}‖}^{2}}{σ^{2}} - n + 2 ν_{k},$$

where $ν_{k}$ is the approximate degrees of freedom for the $k$th step and

$$σ^{2} = \frac{n - y^{T} y}{ν_{K}}.$$

One way of choosing a model is therefore to take the one with the smallest value of $C_{p}^{(k)}$.
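As a minimal sketch of this selection rule, the following C fragment evaluates $C_{p}^{(k)}$ from a residual sum of squares, a value of $σ^{2}$ and the degrees of freedom, then returns the step with the smallest value. The function names and the separate input arrays are hypothetical, and $σ^{2}$ is passed in rather than computed.

```c
#include <assert.h>
#include <stddef.h>

/* C_p-type statistic for one step: RSS_k / sigma^2 - n + 2 * nu_k. */
double cp_statistic(double rss_k, double sigma2, double n, double nu_k)
{
    assert(sigma2 > 0.0);
    return rss_k / sigma2 - n + 2.0 * nu_k;
}

/* Returns the 0-based position of the step with the smallest C_p.
   rss and nu are hypothetical arrays holding RSS_k and nu_k per step. */
size_t smallest_cp_step(const double *rss, const double *nu, size_t nsteps,
                        double sigma2, double n)
{
    assert(nsteps > 0);
    size_t best = 0;
    double best_cp = cp_statistic(rss[0], sigma2, n, nu[0]);
    for (size_t k = 1; k < nsteps; ++k) {
        double cp = cp_statistic(rss[k], sigma2, n, nu[k]);
        if (cp < best_cp) { best_cp = cp; best = k; }
    }
    return best;
}
```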

4

References

Efron B, Hastie T, Johnstone I and Tibshirani R (2004) Least Angle Regression The Annals of Statistics (Volume 32) 2 407–499

Hastie T, Tibshirani R and Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction Springer (New York)

Tibshirani R (1996) Regression Shrinkage and Selection via the Lasso Journal of the Royal Statistical Society, Series B (Methodological) (Volume 58) 1 267–288

Weisberg S (1985) Applied Linear Regression Wiley

5

Arguments

1: $mtype$ – Nag_LARSModelType Input

On entry: indicates the type of model to fit.

$mtype = Nag_LARS_LAR$: LARS is performed.
$mtype = Nag_LARS_ForwardStagewise$: Forward linear stagewise regression is performed.
$mtype = Nag_LARS_LASSO$: LASSO model is fit.
$mtype = Nag_LARS_PositiveLASSO$: A positive LASSO model is fit.

Constraint: $mtype = Nag_LARS_LAR$, $Nag_LARS_ForwardStagewise$, $Nag_LARS_LASSO$ or $Nag_LARS_PositiveLASSO$.

2: $pred$ – Nag_LARSPreProcess Input

On entry: indicates the type of preprocessing to perform on the cross-products involving the independent variables, i.e., those supplied in dtd and dty.

$pred = Nag_LARS_None$: No preprocessing is performed.
$pred = Nag_LARS_Normalized$: Each independent variable is normalized, with the $j$ th variable scaled by $1 / \sqrt{x_{j}^{T} x_{j}}$ . The scaling factor used by variable $j$ is returned in $b [nstep \times pdb + j - 1]$ .

Constraint: $pred = Nag_LARS_None$ or $Nag_LARS_Normalized$.

3: $intcpt$ – Nag_LARSPreProcess Input

On entry: indicates the type of data preprocessing that was performed on the dependent variable, $y$, prior to calling this function.

$intcpt = Nag_LARS_None$: No preprocessing was performed.
$intcpt = Nag_LARS_Centered$: The dependent variable, $y$ , was mean centred.

Constraint: $intcpt = Nag_LARS_None$ or $Nag_LARS_Centered$.

4: $n$ – Integer Input

On entry: $n$, the number of observations.

Constraint: $n \geq 1$.

5: $m$ – Integer Input

On entry: $m$, the total number of independent variables.

Constraint: $m \geq 1$.

6: $dtd [\dim]$ – const double Input

Note: the dimension, dim, of the array dtd must be at least

$pddtd \times (m (m + 1) / 2)$ when $pddtd = 1$ ;
$pddtd \times m$ otherwise.

On entry: $D^{T} D$, the cross-product matrix, which along with isx, defines the design matrix cross-product $X^{T} X$.

If $pddtd = 1$, $dtd [(i \times (i - 1) / 2 + j - 1) \times pddtd]$ must contain the cross-product of the $i$th and $j$th variable, for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, m$. That is the cross-product stacked by columns as returned by nag_sum_sqs (g02buc), for example.

Otherwise $dtd [(j - 1) \times pddtd + i - 1]$ must contain the cross-product of the $i$th and $j$th variable, for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, m$. It should be noted that, even though $D^{T} D$ is symmetric, the full matrix must be supplied.

The matrix specified in dtd must be a valid cross-products matrix.
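To illustrate the packed layout used when $pddtd = 1$, the sketch below forms $D^{T} D$, $D^{T} y$ and $y^{T} y$ directly from raw data. It is a hedged example rather than a replacement for nag_sum_sqs (g02buc): the function name and the row-major storage of $D$ are assumptions, only elements with $j \leq i$ are written (the range for which the packed index is distinct), and the sums are uncentred. Whether the data should be centred beforehand depends on the model and is not addressed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the NAG interface.
   d   : n-by-m data matrix, row-major, element (r, c) at d[r*m + c]
   y   : dependent variable, length n
   dtd : output, m(m+1)/2 values, element (i, j) with 1 <= j <= i <= m
         stored at dtd[i*(i-1)/2 + j - 1]
   dty : output, m values
   Returns y^T y. */
double cross_products(size_t n, size_t m, const double *d, const double *y,
                      double *dtd, double *dty)
{
    assert(n > 0 && m > 0);
    for (size_t i = 1; i <= m; ++i) {
        for (size_t j = 1; j <= i; ++j) {
            double s = 0.0;
            for (size_t r = 0; r < n; ++r)
                s += d[r * m + (i - 1)] * d[r * m + (j - 1)];
            dtd[i * (i - 1) / 2 + j - 1] = s;
        }
        double t = 0.0;
        for (size_t r = 0; r < n; ++r) t += d[r * m + (i - 1)] * y[r];
        dty[i - 1] = t;
    }
    double yty = 0.0;
    for (size_t r = 0; r < n; ++r) yty += y[r] * y[r];
    return yty;
}
```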

7: $pddtd$ – Integer Input

On entry: the stride separating row elements in the two-dimensional data stored in the array dtd.

Constraint: $pddtd = 1$ or $pddtd \geq m$.

8: $isx [\dim]$ – const Integer Input

Note: the dimension, dim, of the array isx must be at least $m$, when isx is not NULL.

On entry: indicates which independent variables from dtd will be included in the design matrix, $X$.

If isx is NULL, all variables are included in the design matrix.

Otherwise, for $j = 1, 2, \dots, m$, $isx [j - 1]$ must be set as follows:

$isx [j - 1] = 1$: To indicate that the $j$ th variable, as supplied in dtd, is included in the design matrix;
$isx [j - 1] = 0$: To indicate that the $j$ th variable, as supplied in dtd, is not included in the design matrix;

and $p = \sum_{j = 1}^{m} isx [j - 1]$.

Constraint: $isx [j - 1] = 0$ or $1$ and at least one value of $isx [j - 1] \neq 0$, for $j = 1, 2, \dots, m$.
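The rule for $p$ can be checked before calling the function with a short helper such as the one below. This is a sketch only: the name is hypothetical, plain C integer types stand in for the NAG Integer type, and the return value of $-1$ for an invalid array is a convention chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns p, the number of variables included in
   the design matrix, or -1 if isx breaks the documented constraint
   (an entry other than 0 or 1, or no nonzero entry). */
long count_included(const int *isx, long m)
{
    assert(m >= 1);
    if (isx == NULL) return m;          /* NULL means all m variables */
    long p = 0;
    for (long j = 0; j < m; ++j) {
        if (isx[j] != 0 && isx[j] != 1) return -1;
        p += isx[j];
    }
    return p > 0 ? p : -1;
}
```

The value returned is the smallest stride that could be passed as pdb.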

9: $dty [m]$ – const double Input

On entry: $D^{T} y$, the cross-product between the dependent variable, $y$, and the independent variables $D$.

10: $yty$ – double Input

On entry: $y^{T} y$, the sums of squares of the dependent variable.

Constraint: $yty > 0.0$.

11: $mnstep$ – Integer Input

On entry: the maximum number of steps to carry out in the model fitting process.

If $mtype = Nag_LARS_LAR$, the maximum number of steps the algorithm will take is $\min (p, n)$ if $intcpt = Nag_LARS_None$, otherwise $\min (p, n - 1)$.

If $mtype = Nag_LARS_ForwardStagewise$, the maximum number of steps the algorithm will take is likely to be several orders of magnitude more and is no longer bounded by $p$ or $n$.

If $mtype = Nag_LARS_LASSO$ or $Nag_LARS_PositiveLASSO$, the maximum number of steps the algorithm will take lies somewhere between that of the LARS and forward linear stagewise regression; again it is no longer bounded by $p$ or $n$.

Constraint: $mnstep \geq 1$.
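For the LARS case the bound stated above is simple enough to compute when sizing the output arrays. The helper below is a sketch under that reading; its name and the use of an int flag for the intcpt setting are illustrative assumptions. No such bound is available for the stagewise or LASSO model types.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on the number of steps for
   mtype = Nag_LARS_LAR. 'centred' is nonzero when the dependent
   variable was mean centred (intcpt = Nag_LARS_Centered). */
long lar_step_bound(long p, long n, int centred)
{
    assert(p >= 1 && n >= 1);
    long cap = centred ? n - 1 : n;
    return p < cap ? p : cap;
}
```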

12: $ip$ – Integer * Output

On exit: $p$, number of parameter estimates.

If isx is NULL, $p = m$, i.e., the number of variables in dtd.

Otherwise $p$ is the number of nonzero values in isx.

13: $nstep$ – Integer * Output

On exit: $K$, the actual number of steps carried out in the model fitting process.

14: $b [\dim]$ – double Output

Note: the dimension, dim, of the array b must be at least $pdb \times (mnstep + 1)$.

On exit: $β$, the parameter estimates, with $b [(k - 1) \times pdb + j - 1] = β_{k j}$, the parameter estimate for the $j$th variable, $j = 1, 2, \dots, p$, at the $k$th step of the model fitting process, $k = 1, 2, \dots, nstep$.

By default, when $pred = Nag_LARS_Normalized$ the parameter estimates are rescaled prior to being returned. If the parameter estimates are required on the normalized scale, then this can be overridden via ropt.

The values held in the remaining part of b depend on the type of preprocessing performed.

$pred = Nag_LARS_None$: $b [nstep \times pdb + j - 1] = 1$,
$pred = Nag_LARS_Normalized$: $b [nstep \times pdb + j - 1] = 1 / \sqrt{x_{j}^{T} x_{j}}$,

for $j = 1, 2, \dots, p$.
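The indexing of b described above can be wrapped in two small accessors, sketched here with hypothetical names. They use the same 1-based $k$ and $j$ as the text and do no more than apply the documented offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter estimate beta_{k j}: step k = 1..nstep, variable j = 1..p. */
double lars_beta(const double *b, size_t pdb, size_t k, size_t j)
{
    assert(k >= 1 && j >= 1 && j <= pdb);
    return b[(k - 1) * pdb + j - 1];
}

/* Scaling factor for variable j, held in the block after the last step. */
double lars_scale(const double *b, size_t pdb, size_t nstep, size_t j)
{
    assert(j >= 1 && j <= pdb);
    return b[nstep * pdb + j - 1];
}
```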

15: $pdb$ – Integer Input

On entry: the stride separating row elements in the two-dimensional data stored in the array b.

Constraint: $pdb \geq p$, where $p$ is the number of parameter estimates as described in ip.

16: $fitsum [6 \times (mnstep + 1)]$ – double Output

On exit: summaries of the model fitting process. For $k = 1, 2, \dots, nstep$:

$fitsum [(k - 1) \times 6]$: ${‖β_{k}‖}_{1}$ , the sum of the absolute values of the parameter estimates for the $k$ th step of the model fitting process. If $pred = Nag_LARS_Normalized$ , the scaled parameter estimates are used in the summation.
$fitsum [(k - 1) \times 6 + 1]$: ${RSS}_{k}$ , the residual sums of squares for the $k$ th step, where ${RSS}_{k} = {‖y - X^{T} β_{k}‖}^{2}$ .
$fitsum [(k - 1) \times 6 + 2]$: $ν_{k}$ , approximate degrees of freedom for the $k$ th step.
$fitsum [(k - 1) \times 6 + 3]$: $C_{p}^{(k)}$ , a $C_{p}$ -type statistic for the $k$ th step, where $C_{p}^{(k)} = \frac{{RSS}_{k}}{σ^{2}} - n + 2 ν_{k}$ .
$fitsum [(k - 1) \times 6 + 4]$: ${\hat{C}}_{k}$ , correlation between the residual at step $k - 1$ and the most correlated variable not yet in the active set $A$ , where the residual at step $0$ is $y$ .
$fitsum [(k - 1) \times 6 + 5]$: ${\hat{γ}}_{k}$ , the step size used at step $k$ .

In addition

$fitsum [nstep \times 6]$: $0$ .
$fitsum [nstep \times 6 + 1]$: ${RSS}_{0}$ , the residual sums of squares for the null model, where ${RSS}_{0} = y^{T} y$ .
$fitsum [nstep \times 6 + 2]$: $ν_{0}$ , the degrees of freedom for the null model, where $ν_{0} = 0$ if $intcpt = Nag_LARS_None$ and $ν_{0} = 1$ otherwise.
$fitsum [nstep \times 6 + 3]$: $C_{p}^{(0)}$ , a $C_{p}$ -type statistic for the null model, where $C_{p}^{(0)} = \frac{{RSS}_{0}}{σ^{2}} - n + 2 ν_{0}$ .
$fitsum [nstep \times 6 + 4]$: $σ^{2}$ , where $σ^{2} = \frac{n - {RSS}_{K}}{ν_{K}}$ and $K = nstep$ .

Although the $C_{p}$ statistics described above are returned when fail.code = NW_LIMIT_REACHED they may not be meaningful due to the estimate $σ^{2}$ not being based on the saturated model.
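Because fitsum is a flat array of six values per step followed by one block for the null model, it can be convenient to unpack it into named fields. The struct and function names below are hypothetical; the offsets are those listed for fitsum in this argument description.

```c
#include <assert.h>
#include <stddef.h>

/* One row of fitsum, in the documented order. */
typedef struct {
    double l1_norm;  /* sum of absolute parameter estimates */
    double rss;      /* residual sum of squares             */
    double df;       /* approximate degrees of freedom      */
    double cp;       /* C_p-type statistic                  */
    double corr;     /* correlation with the residual       */
    double step;     /* step size                           */
} StepSummary;

/* Summary for step k, with k = 1..nstep as in the text. */
StepSummary step_summary(const double *fitsum, size_t nstep, size_t k)
{
    assert(k >= 1 && k <= nstep);
    const double *row = fitsum + (k - 1) * 6;
    StepSummary s = { row[0], row[1], row[2], row[3], row[4], row[5] };
    return s;
}

/* Entries of the final block, which describes the null model. */
double null_model_rss(const double *fitsum, size_t nstep)
{
    return fitsum[nstep * 6 + 1];
}

double sigma2_estimate(const double *fitsum, size_t nstep)
{
    return fitsum[nstep * 6 + 4];
}
```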

17: $ropt [lropt]$ – const double Input

On entry: optional parameters to control various aspects of the LARS algorithm.

The default value will be used for $ropt [i - 1]$ if $lropt < i$, therefore setting $lropt = 0$ will use the default values for all optional arguments and ropt need not be set and may be NULL. The default value will also be used if an invalid value is supplied for a particular argument, for example, setting $ropt [i - 1] = - 1$ will use the default value for argument $i$.

$ropt [0]$: The minimum step size that will be taken.
Default is $100 \times eps$, where $eps$ is the machine precision returned by nag_machine_precision (X02AJC).
$ropt [1]$: General tolerance, used amongst other things, for comparing correlations.
Default is $ropt [0]$ .
$ropt [2]$: If set to $1$ then parameter estimates are rescaled before being returned. If set to $0$ then no rescaling is performed. This argument has no effect when $pred = Nag_LARS_None$ .
Default is for the parameter estimates to be rescaled.

Constraints:

$ropt [0] > machine precision$ ;
$ropt [1] > machine precision$ .

18: $lropt$ – Integer Input

On entry: length of the options array ropt.

Constraint: $0 \leq lropt \leq 3$.

19: $fail$ – NagError * Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6

Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_ARRAY_SIZE: On entry, $lropt = 〈value〉$ .
Constraint: $0 \leq lropt \leq 3$ .

On entry, $pdb = 〈value〉$ and $m = 〈value〉$ .
Constraint: if isx is NULL then $pdb \geq m$ .

On entry, $pdb = 〈value〉$ and $p = 〈value〉$ .
Constraint: if isx is not NULL, $pdb \geq p$ .

On entry, $pddtd = 〈value〉$ and $m = 〈value〉$ .
Constraint: $pddtd = 1$ or $pddtd \geq m$ .
NE_BAD_PARAM: On entry, argument $〈value〉$ had an illegal value.
NE_INT: On entry, $m = 〈value〉$ .
Constraint: $m \geq 1$ .

On entry, $n = 〈value〉$ .
Constraint: $n \geq 1$ .
NE_INT_ARRAY: On entry, all values of isx are zero.
Constraint: at least one value of isx must be nonzero.

On entry, $isx [〈value〉] = 〈value〉$ .
Constraint: $isx [i] = 0$ or $1$ , for all $i$ .

On entry, $isx [〈value〉] = 〈value〉$ .
Constraint: $isx [j - 1] = 0$ or $1$ and at least one value of $isx [j - 1] \neq 0$ , for $j = 1, 2, \dots, m$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_MAX_STEP: On entry, $mnstep = 〈value〉$ .
Constraint: $mnstep \geq 1$ .
NE_NEG_ELEMENT: On entry, $dtd [〈value〉 \times pddtd] = 〈value〉$ .
Constraint: diagonal elements of $D^{T} D$ must be positive.

On entry, $i = 〈value〉$ and $dtd [(i - 1) \times pddtd + i - 1] = 〈value〉$ .
Constraint: diagonal elements of $D^{T} D$ must be positive.
NE_NEG_SX: A negative value for the residual sums of squares was obtained. Check the values of dtd, dty and yty.
NE_NO_LICENCE: Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_REAL: On entry, $yty = 〈value〉$ .
Constraint: $yty > 0.0$ .
NE_SYMM_MATRIX: The cross-product matrix supplied in dtd is not symmetric.
NW_LIMIT_REACHED: Fitting process did not finish in mnstep steps. Try increasing the size of mnstep and supplying larger output arrays.
All output is returned as documented, up to step mnstep; however, $σ$ and the $C_{p}$ statistics may not be meaningful.
NW_OVERFLOW_WARN: $ν_{K} = n$ , therefore sigma has been set to a large value. Output is returned as documented.

$σ^{2}$ is approximately zero and hence the $C_{p}$ -type criterion cannot be calculated. All other output is returned as documented.
NW_POTENTIAL_PROBLEM: Degenerate model, no variables added and $nstep = 0$ . Output is returned as documented.

7

Accuracy

Not applicable.

8

Further Comments

The solution path to the LARS, LASSO and stagewise regression analysis is continuous and piecewise linear. nag_lars_xtx (g02mbc) returns the parameter estimates at various points along this path. nag_lars_param (g02mcc) can be used to obtain estimates at different points along the path.

If you have the raw data values, that is $D$ and $y$, then nag_lars (g02mac) can be used instead of nag_lars_xtx (g02mbc).

9

Example

This example performs a LARS on a simulated dataset with $20$ observations and $6$ independent variables.

The example uses nag_sum_sqs (g02buc) to get the cross-products of the augmented matrix $[D y]$. The first $m (m + 1) / 2$ elements of the (column packed) cross-products matrix returned therefore contain the elements of $D^{T} D$, the next $m$ elements contain $D^{T} y$ and the last element $y^{T} y$.
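The split of the packed augmented cross-products described above can be written as a few pointer offsets. The sketch below assumes only the layout stated in this section; the struct and function names are hypothetical and no NAG routine is called.

```c
#include <assert.h>
#include <stddef.h>

/* Views into the packed cross-products of [D y]; nothing is copied. */
typedef struct {
    const double *dtd;  /* first m(m+1)/2 elements: D^T D (packed) */
    const double *dty;  /* next m elements: D^T y                  */
    double yty;         /* last element: y^T y                     */
} CrossProductViews;

/* c holds (m+1)(m+2)/2 packed values for the augmented matrix. */
CrossProductViews split_augmented(const double *c, size_t m)
{
    assert(m >= 1);
    size_t ntri = m * (m + 1) / 2;
    CrossProductViews v = { c, c + ntri, c[ntri + m] };
    return v;
}
```

The three fields correspond to the dtd, dty and yty arguments, with $pddtd = 1$ for the packed form.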

This example plot shows the regression coefficients ($β_{k}$) plotted against the scaled absolute sum of the parameter estimates (${‖β_{k}‖}_{1}$).

NAG C Library Manual, Mark 26
