NAG CPP Interface
nagcpp::correg::lars (g02ma)

1 Purpose

lars performs Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO).

2 Specification

#include "g02/nagcpp_g02ma.hpp"
template <typename D, typename ISX, typename Y, typename B, typename FITSUM>

void function lars(const types::f77_integer mtype, const D &d, const ISX &isx, const Y &y, types::f77_integer &ip, types::f77_integer &nstep, B &&b, FITSUM &&fitsum, OptionalG02MA opt)
template <typename D, typename ISX, typename Y, typename B, typename FITSUM>

void function lars(const types::f77_integer mtype, const D &d, const ISX &isx, const Y &y, types::f77_integer &ip, types::f77_integer &nstep, B &&b, FITSUM &&fitsum)

3 Description

lars implements the LARS algorithm of Efron et al. (2004) as well as the modifications needed to perform forward stagewise linear regression and fit LASSO and positive LASSO models.
Given a vector of n observed values, y = {yi:i=1,2,,n} and an n×p design matrix X, where the jth column of X, denoted xj, is a vector of length n representing the jth independent variable xj, standardized such that i=1 n xij =0 , and i=1 n xij2 =1 and a set of model parameters β to be estimated from the observed values, the LARS algorithm can be summarised as:
  1. 1.Set k=1 and all coefficients to zero, that is β=0.
  2. 2.Find the variable most correlated with y, say xj1. Add xj1 to the ‘most correlated’ set A. If p=1 go to 8.
  3. 3.Take the largest possible step in the direction of xj1 (i.e., increase the magnitude of βj1) until some other variable, say xj2, has the same correlation with the current residual, y-xj1βj1.
  4. 4.Increment k and add xjk to A.
  5. 5.If |A|=p go to 8.
  6. 6.Proceed in the ‘least angle direction’, that is, the direction which is equiangular between all variables in A, altering the magnitude of the parameter estimates of those variables in A, until the kth variable, xjk, has the same correlation with the current residual.
  7. 7.Go to 4.
  8. 8.Let K=k.
As well as being a model selection process in its own right, with a small number of modifications the LARS algorithm can be used to fit the LASSO model of Tibshirani (1996), a positive LASSO model, where the independent variables enter the model in their defined direction (i.e., βkj0), forward stagewise linear regression (Hastie et al. (2001)) and forward selection (Weisberg (1985)). Details of the required modifications in each of these cases are given in Efron et al. (2004).
The LASSO model of Tibshirani (1996) is given by
minimize α , βk p y-α-XTβk 2   subject to   βk1 tk  
for all values of tk, where α=y¯=n-1 i=1 n yi. The positive LASSO model is the same as the standard LASSO model, given above, with the added constraint that
βkj 0 ,   j=1,2,,p .  
Unlike the standard LARS algorithm, when fitting either of the LASSO models, variables can be dropped as well as added to the set A. Therefore the total number of steps K is no longer bounded by p.
Forward stagewise linear regression is an iterative procedure of the form:
  1. 1.Initialize k=1 and the vector of residuals r0=y-α.
  2. 2.For each j=1,2,,p calculate cj=xjTrk-1. The value cj is therefore proportional to the correlation between the jth independent variable and the vector of previous residual values, rk.
  3. 3.Calculate jk = argmax j |cj| , the value of j with the largest absolute value of cj.
  4. 4.If |cjk|<ε then go to 7.
  5. 5.Update the residual values, with
    rk = rk-1 + δ ​ ​ sign(cjk) xjk  
    where δ is a small constant and sign(cjk)=-1 when cjk<0 and 1 otherwise.
  6. 6.Increment k and go to 2.
  7. 7.Set K=k.
If the largest possible step were to be taken, that is δ=|cjk| then forward stagewise linear regression reverts to the standard forward selection method as implemented in g02eef (no CPP interface).
The LARS procedure results in K models, one for each step of the fitting process. In order to aid in choosing which is the most suitable Efron et al. (2004) introduced a Cp-type statistic given by
Cp(k) = y-XTβk 2 σ2 -n+2νk,  
where νk is the approximate degrees of freedom for the kth step and
σ2 = n-yTyνK .  
One way of choosing a model is therefore to take the one with the smallest value of Cp(k).

4 References

Efron B, Hastie T, Johnstone I and Tibshirani R (2004) Least Angle Regression The Annals of Statistics (Volume 32) 2 407–499
Hastie T, Tibshirani R and Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction Springer (New York)
Tibshirani R (1996) Regression Shrinkage and Selection via the Lasso Journal of the Royal Statistics Society, Series B (Methodological) (Volume 58) 1 267–288
Weisberg S (1985) Applied Linear Regression Wiley

5 Arguments

1: mtype types::f77_integer Input
On entry: indicates the type of model to fit.
mtype=1
LARS is performed.
mtype=2
Forward linear stagewise regression is performed.
mtype=3
LASSO model is fit.
mtype=4
A positive LASSO model is fit.
Constraint: mtype=1, 2, 3 or 4.
2: d(n,m) double array Input
On entry: D, the data, which along with pred and isx, defines the design matrix X. The ith observation for the jth variable must be supplied in d(i-1,j-1), for i=1,2,,n and j=1,2,,m.
3: isx(lisx) types::f77_integer array Input
On entry: indicates which independent variables from d will be included in the design matrix, X.
If isx is nullptr, all variables are included in the design matrix.
Otherwiseisx(j-1) must be set as follows, for j=1,2,,m:
isx(j-1)=1
To indicate that the jth variable, as supplied in d, is included in the design matrix;
isx(j-1)=0
To indicated that the jth variable, as supplied in d, is not included in the design matrix;
and p= j=1 m isx(j-1).
Constraint: if lisx=m, isx(j-1)=0 or 1 and at least one value of isx(j-1)0, for j=1,2,,m.
4: y(n) double array Input
On entry: y, the observations on the dependent variable.
5: ip types::f77_integer Output
On exit: p, number of parameter estimates.
If isx is nullptr, p=m, i.e., the number of variables in d.
Otherwise p is the number of nonzero values in isx.
6: nstep types::f77_integer Output
On exit: K, the actual number of steps carried out in the model fitting process.
7: b(vl_p,mnstep+2) double array Output
On exit: β the parameter estimates, with b(j-1,k-1)=βkj, the parameter estimate for the jth variable, j=1,2,,p at the kth step of the model fitting process, k=1,2,,nstep.
By default, when pred=2 or 3 the parameter estimates are rescaled prior to being returned. If the parameter estimates are required on the normalized scale, then this can be overridden via ropt.
The values held in the remaining part of b depend on the type of preprocessing performed.
If pred=0,
b(j-1,nstep) = 1 b(j-1,nstep+1) = 0
If pred=1,
b(j-1,nstep) = 1 b(j-1,nstep+1) = x¯j
If pred=2,
b(j-1,nstep) = 1/ xjTxj b(j-1,nstep+1) = 0
If pred=3,
b(j-1,nstep) = 1/ (xj-x¯j) T (xj-x¯j) b(j-1,nstep+1) = x¯j
for j=1,2,,p.
8: fitsum(6,mnstep+1) double array Output
On exit: summaries of the model fitting process. When k=1,2,,nstep,
fitsum(0,k-1)
βk1, the sum of the absolute values of the parameter estimates for the kth step of the modelling fitting process. If pred=2 or 3, the scaled parameter estimates are used in the summation.
fitsum(1,k-1)
RSSk, the residual sums of squares for the kth step, where RSSk= y-XTβk 2 .
fitsum(2,k-1)
νk, approximate degrees of freedom for the kth step.
fitsum(3,k-1)
Cp(k), a Cp-type statistic for the kth step, where Cp(k)=RSSkσ2-n+2νk.
fitsum(4,k-1)
C^k, correlation between the residual at step k-1 and the most correlated variable not yet in the active set A, where the residual at step 0 is y.
fitsum(5,k-1)
γ^k, the step size used at step k.
In addition
fitsum(0,nstep)
α, with α=y¯ if prey=1 and 0 otherwise.
fitsum(1,nstep)
RSS0, the residual sums of squares for the null model, where RSS0=yTy when prey=0 and RSS0=(y-y¯)T(y-y¯) otherwise.
fitsum(2,nstep)
ν0, the degrees of freedom for the null model, where ν0=0 if prey=0 and ν0=1 otherwise.
fitsum(3,nstep)
Cp(0), a Cp-type statistic for the null model, where Cp(0)=RSS0σ2-n+2ν0.
fitsum(4,nstep)
σ2, where σ2=n-RSSKνK and K=nstep.
Although the Cp statistics described above are returned when errorid=112 they may not be meaningful due to the estimate σ2 not being based on the saturated model.
9: opt OptionalG02MA Input/Output
Optional parameter container, derived from Optional.
Container for:
predtypes::f77_integer
This optional parameter may be set using the method OptionalG02MA::pred and accessed via OptionalG02MA::get_pred.
Default: 3
On entry: indicates the type of data preprocessing to perform on the independent variables supplied in d to comply with the standardized form of the design matrix.
pred=0
No preprocessing is performed.
pred=1
Each of the independent variables, xj, for j=1,2,,p, are mean centred prior to fitting the model. The means of the independent variables, x¯, are returned in b, with x¯j=b(j-1,nstep+1), for j=1,2,,p.
pred=2
Each independent variable is normalized, with the jth variable scaled by 1/xjTxj. The scaling factor used by variable j is returned in b(j-1,nstep).
pred=3
As pred=1 and 2, all of the independent variables are mean centred prior to being normalized.
Suggested value: pred=3.
Constraint: pred=0, 1, 2 or 3.
preytypes::f77_integer
This optional parameter may be set using the method OptionalG02MA::prey and accessed via OptionalG02MA::get_prey.
Default: 1
On entry: indicates the type of data preprocessing to perform on the dependent variable supplied in y.
prey=0
No preprocessing is performed, this is equivalent to setting α=0.
prey=1
The dependent variable, y, is mean centred prior to fitting the model, so α=y¯. Which is equivalent to fitting a non-penalized intercept to the model and the degrees of freedom etc. are adjusted accordingly.
The value of α used is returned in fitsum(0,nstep).
Suggested value: prey=1.
Constraint: prey=0 or 1.
mnsteptypes::f77_integer
This optional parameter may be set using the method OptionalG02MA::mnstep and accessed via OptionalG02MA::get_mnstep.
Default: if (mtype=1): m; otherwise: 200*m
On entry: the maximum number of steps to carry out in the model fitting process.
If mtype=1, i.e., a LARS is being performed, the maximum number of steps the algorithm will take is min(p,n) if prey=0, otherwise min(p,n-1).
If mtype=2, i.e., a forward linear stagewise regression is being performed, the maximum number of steps the algorithm will take is likely to be several orders of magnitude more and is no longer bound by p or n.
If mtype=3 or 4, i.e., a LASSO or positive LASSO model is being fit, the maximum number of steps the algorithm will take lies somewhere between that of the LARS and forward linear stagewise regression, again it is no longer bound by p or n.
Constraint: mnstep1.
roptvector<double> array
This optional parameter may be set using the method OptionalG02MA::ropt and accessed via OptionalG02MA::get_ropt.
On entry: optional parameters to control various aspects of the LARS algorithm.
The default value will be used for ropt(i-1) if the length of ropt is less than i, therefore to use the default values for all optional parameters ropt need not be set. The default value will also be used if an invalid value is supplied for a particular argument, for example, setting ropt(i-1)=-1 will use the default value for argument i.
ropt(0)
The minimum step size that will be taken.
Default is 100×eps, where eps is the machine precision returned by precision.
ropt(1)
General tolerance, used amongst other things, for comparing correlations.
Default is ropt(0).
ropt(2)
If set to 1, parameter estimates are rescaled before being returned.
If set to 0, no rescaling is performed.
This argument has no effect when pred=0 or 1.
Default is for the parameter estimates to be rescaled.
ropt(3)
If set to 1, it is assumed that the model contains an intercept during the model fitting process and when calculating the degrees of freedom.
If set to 0, no intercept is assumed.
This has no effect on the amount of preprocessing performed on y.
Default is to treat the model as having an intercept when prey=1 and as not having an intercept when prey=0.
ropt(4)
As implemented, the LARS algorithm can either work directly with y and X, or it can work with the cross-product matrices, XTy and XTX. In most cases it is more efficient to work with the cross-product matrices. This flag allows you direct control over which method is used, however, the default value will usually be the best choice.
If ropt(4)=1, y and X are worked with directly.
If ropt(4)=0, the cross-product matrices are used.
Default is 1 when p500 and n<p and 0 otherwise.
Constraints:
  • ropt(0)>machine precision;
  • ropt(1)>machine precision;
  • ropt(2)=0 or 1;
  • ropt(3)=0 or 1;
  • ropt(4)=0 or 1.

5.1Additional Quantities

1: n
n, the number of observations.
2: m
m, the total number of independent variables.
3: lisx
Length of the isx array.
4: ldb
The first dimension of the array b.
5: lropt
Length of the options array ropt

6 Exceptions and Warnings

Errors or warnings detected by the function:
Note: in some cases lars may return useful information.
All errors and warnings have an associated numeric error code field, errorid, stored either as a member of the thrown exception object (see errorid), or as a member of opt.ifail, depending on how errors and warnings are being handled (see Error Handling for more details).
Raises: ErrorException
errorid=11
On entry, mtype = value.
Constraint: mtype = 1,2,3​ or ​4.
errorid=21
On entry, pred = value.
Constraint: pred = 0,1,2​ or ​3.
errorid=31
On entry, prey = value.
Constraint: prey = 0​ or ​1.
errorid=41
On entry, n = value.
Constraint: n 1.
errorid=51
On entry, m = value.
Constraint: m 1.
errorid=81
On entry, isx[value] = value.
Constraint: isx[i] = 0​ or ​1, for all i.
errorid=82
On entry, all values of isx are zero.
Constraint: at least one value of isx must be nonzero.
errorid=91
On entry, lisx = value and m = value.
Constraint: lisx = 0​ or ​m.
errorid=111
On entry, mnstep = value.
Constraint: mnstep 1.
errorid=151
On entry, ldb = value and m = value.
Constraint: if lisx = 0 then ldb m.
errorid=152
On entry, ldb = value and p=value.
Constraint: if lisx = m then ldbp.
errorid=181
On entry, lropt = value.
Constraint: 0≤;lropt≤;5.
errorid=10601
On entry, argument value must be a vector of size value array.
Supplied argument has value dimensions.
errorid=10601
On entry, argument value must be a vector of size value array.
Supplied argument was a vector of size value.
errorid=10601
On entry, argument value must be a vector of size value array.
The size for the supplied array could not be ascertained.
errorid=10601
On entry, argument value must be a value x value array.
Supplied argument has value dimensions.
errorid=10601
On entry, argument value must be a value x value array.
Supplied argument was a value x value array.
errorid=10601
On entry, argument value must be a value x value array.
Not all of the sizes for the supplied array could be ascertained.
errorid=10602
On entry, the raw data component of value is null.
errorid=10603
On entry, unable to ascertain a value for value.
errorid=10604
On entry, the data in value is stored in value Major Order.
The data was expected to be in value Major Order.
errorid=10703
An exception was thrown during IO (writing).
errorid=-99
An unexpected error has been triggered by this routine.
errorid=-399
Your licence key may have expired or may not have been installed correctly.
errorid=-999
Dynamic memory allocation failed.
Raises: WarningException
errorid=112
Fitting process did not finish in mnstep steps.
Try increasing the size of mnstep and supplying larger output arrays.
All output is returned as documented, up to step mnstep, however,
σ and the Cp statistics may not be meaningful.
errorid=161
σ2 is approximately zero and hence the Cp-type criterion
cannot be calculated. All other output is returned as documented.
errorid=162
νK=n, therefore σ has been set to a large value.
Output is returned as documented.
errorid=163
Degenerate model, no variables added and nstep=0.
Output is returned as documented.
errorid=163
Degenerate model, no variables added and nstep=0.
Output is returned as documented.

7 Accuracy

Not applicable.

8 Parallelism and Performance

Please see the description for the underlying computational routine in this section of the FL Interface documentation.

9 Further Comments

lars returns the parameter estimates at various points along the solution path of a LARS, LASSO or stagewise regression analysis. If the solution is required at a different set of points, for example when performing cross-validation, then g02mcf (no CPP interface) can be used.
For datasets with a large number of observations, n, it may be impractical to store the full X matrix in memory in one go. In such instances the cross-product matrices XTy and XTX can be calculated, using for example, multiple calls to g02buf (no CPP interface) and g02bzf (no CPP interface), and g02mbf (no CPP interface) called to perform the analysis.
The amount of workspace used by lars depends on whether the cross-product matrices are being used internally (as controlled by ropt). If the cross-product matrices are being used then lars internally allocates approximately 2p2+4p+max(np) elements of real storage compared to p2+3p+max(np)+2n+n×p elements when X and y are used directly. In both cases approximately 5p elements of integer storage are also used. If a forward linear stagewise analysis is performed than an additional p2+5p elements of real storage are required.

10 Example

This example performs a LARS on a simulated dataset with 20 observations and 6 independent variables.
Source FileDataResults
ex_g02ma.cppNoneex_g02ma.r