g02mb: Correlation and Regression Analysis (NAG Toolbox)

Given a vector of $n$ observed values, $y = \{y_{i} : i = 1, 2, \dots, n\}$, and an $n \times p$ design matrix $X$, where the $j$th column of $X$, denoted $x_{j}$, is a vector of length $n$ representing the $j$th independent variable $x_{j}$, standardized such that $\sum_{i = 1}^{n} x_{i j} = 0$ and $\sum_{i = 1}^{n} x_{i j}^{2} = 1$, and a set of model parameters $β$ to be estimated from the observed values, the LARS algorithm can be summarised as:

1.	Set $k = 1$ and all coefficients to zero, that is $β = 0$.
2.	Find the variable most correlated with $y$, say $x_{j_{1}}$. Add $x_{j_{1}}$ to the ‘most correlated’ set $A$. If $p = 1$ go to 8.
3.	Take the largest possible step in the direction of $x_{j_{1}}$ (i.e., increase the magnitude of $β_{j_{1}}$) until some other variable, say $x_{j_{2}}$, has the same correlation with the current residual, $y - x_{j_{1}} β_{j_{1}}$.
4.	Increment $k$ and add $x_{j_{k}}$ to $A$.
5.	If $|A| = p$ go to 8.
6.	Proceed in the ‘least angle direction’, that is, the direction which is equiangular between all variables in $A$, altering the magnitude of the parameter estimates of those variables in $A$, until the $k$th variable, $x_{j_{k}}$, has the same correlation with the current residual.
7.	Go to 4.
8.	Let $K = k$.
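
Steps 1 to 3 can be illustrated with a short MATLAB sketch. The data are made up and the names `gamma`, `u`, `a` and `others` are illustrative only; they are not part of g02mb. The step length uses the usual closed-form LARS expression for a single active variable, and the sketch says nothing about how the NAG routine is implemented internally.

```matlab
% Made-up data: 6 observations, 3 independent variables (assumption)
X = [1 4 2; 2 1 5; 3 6 1; 4 2 6; 5 5 3; 6 3 7];
y = [1.1; 2.3; 2.8; 4.4; 4.9; 6.5];
p = size(X,2);

% Standardize each column: zero sum and unit sum of squares
X = bsxfun(@minus, X, mean(X,1));
X = bsxfun(@rdivide, X, sqrt(sum(X.^2,1)));

% Step 1: all coefficients zero, residual is the centred response
beta = zeros(p,1);
r = y - mean(y);

% Step 2: variable most correlated with the residual
c = X' * r;
[C, j1] = max(abs(c));
s = sign(c(j1));

% Step 3: move along s*x_{j1} until another variable ties in correlation
u = s * X(:,j1);
a = X' * u;
others = setdiff(1:p, j1);
cand = [(C - c(others)) ./ (1 - a(others)); ...
        (C + c(others)) ./ (1 + a(others))];
cand = cand(cand > 1e-12);      % keep strictly positive step lengths
gamma = min(cand);

beta(j1) = s * gamma;
r1 = r - gamma * u;             % residual after the first step
c1 = X' * r1;                   % correlations after the first step
```

After this step `abs(c1(j1))` equals `C - gamma`, and the largest `abs(c1(others))` matches it; the variable attaining that maximum is the one added to $A$ in step 4.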

Forward stagewise linear regression is an iterative procedure of the form:

1.	Initialize $k = 1$ and the vector of residuals $r_{0} = y - α$.
2.	For each $j = 1, 2, \dots, p$ calculate $c_{j} = x_{j}^{T} r_{k - 1}$. The value $c_{j}$ is therefore proportional to the correlation between the $j$th independent variable and the vector of previous residual values, $r_{k - 1}$.
3.	Calculate $j_{k} = \underset{j}{\operatorname{argmax}} |c_{j}|$, the value of $j$ with the largest absolute value of $c_{j}$.
4.	If $|c_{j_{k}}| < ε$ then go to 7.
5.	Update the residual values, with $r_{k} = r_{k - 1} - δ \, \operatorname{sign}(c_{j_{k}}) x_{j_{k}}$, where $δ$ is a small constant and $\operatorname{sign}(c_{j_{k}}) = -1$ when $c_{j_{k}} < 0$ and $1$ otherwise. Subtracting the step reduces $|c_{j_{k}}|$ by $δ$, because $x_{j_{k}}^{T} x_{j_{k}} = 1$.
6.	Increment $k$ and go to 2.
7.	Set $K = k$.
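
The forward stagewise procedure can be sketched in a few lines of MATLAB. The data, the choices `delta = 0.01` and `eps_tol = delta`, the iteration cap `maxit`, and taking $α$ as the mean of $y$ are all assumptions made for illustration. The sketch moves the coefficient of the selected variable by `delta*sign(c)` and subtracts the matching amount from the residual, so the largest correlation shrinks at every iteration.

```matlab
% Made-up data: 6 observations, 2 independent variables (assumption)
X = [1 0; 2 1; 3 0; 4 1; 5 0; 6 1];
y = [1.2; 2.9; 3.1; 5.2; 4.8; 7.1];
p = size(X,2);

% Standardize each column: zero sum and unit sum of squares
X = bsxfun(@minus, X, mean(X,1));
X = bsxfun(@rdivide, X, sqrt(sum(X.^2,1)));

alpha   = mean(y);      % intercept, assumed to be the mean of y
delta   = 0.01;         % small constant step
eps_tol = delta;        % stopping threshold on the largest |c_j|
maxit   = 100000;       % safety cap, not part of the described procedure

% Step 1: initialize
k = 1;
r = y - alpha;
beta = zeros(p,1);

while k <= maxit
    c = X' * r;                      % step 2: proportional to correlations
    [cmax, jk] = max(abs(c));        % step 3: most correlated variable
    if cmax < eps_tol                % step 4: stop when nothing is left
        break;
    end
    s = sign(c(jk));                 % c(jk) is nonzero here, so s is +1 or -1
    beta(jk) = beta(jk) + delta*s;   % step 5: nudge the coefficient ...
    r = r - delta*s*X(:,jk);         % ... and update the residual to match
    k = k + 1;                       % step 6
end
K = k;                               % step 7
```

With `eps_tol` at least as large as `delta`, each iteration reduces the squared residual norm by at least `delta^2`, so the loop terminates without needing the cap.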

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

Example

function g02mb_example


fprintf('g02mb example results\n\n');

% Going to be fitting a LAR model via g02mb
mtype = int64(1);

% Augmented matrix [D y]
dy = [10.28  1.77  9.69 15.58  8.23 10.44  -46.47;
       9.08  8.99 11.53  6.57 15.89 12.58  -35.80;
      17.98 13.10  1.04 10.45 10.12 16.68 -129.22;
      14.82 13.79 12.23  7.00  8.14  7.79  -42.44;
      17.53  9.41  6.24  3.75 13.12 17.08  -73.51;
       7.78 10.38  9.83  2.58 10.13  4.25  -26.61;
      11.95 21.71  8.83 11.00 12.59 10.52  -63.90;
      14.60 10.09 -2.70  9.89 14.67  6.49  -76.73;
       3.63  9.07 12.59 14.09  9.06  8.19  -32.64;
       6.35  9.79  9.40 12.79  8.38 16.79  -83.29;
       4.66  3.55 16.82 13.83 21.39 13.88  -16.31;
       8.32 14.04 17.17  7.93  7.39 -1.09   -5.82;
      10.86 13.68  5.75 10.44 10.36 10.06  -47.75;
       4.76  4.92 17.83  2.90  7.58 11.97   18.38;
       5.05 10.41  9.89  9.04  7.90 13.12  -54.71;
       5.41  9.32  5.27 15.53  5.06 19.84  -55.62;
       9.77  2.37  9.54 20.23  9.33  8.82  -45.28;
      14.28  4.34 14.23 14.95 18.16 11.03  -22.76;
      10.17  6.80  3.17  8.57 16.07 15.93 -104.32;
       5.39  2.67  6.37 13.56 10.68  7.35  -55.94];

% Number of observations in the dataset
n = int64(size(dy,1));

% Calculate the means and cross-product matrix around the mean
mean_p = 'M';
[~,wmean,dtd,ifail] = g02bu( ...
                             dy,'mean_p',mean_p);

% Number of variables
m = int64(size(dy,2) - 1);

% The first pm elements of dtd contain the cross-products of D
% with itself, the next m elements hold the cross-product of D
% and y and the last element holds the cross-product of y with
% itself
pm = m*(m+1)/2;

% g02mb can issue warnings, but return sensible results,
% so save current warning state and turn warnings on
warn_state = nag_issue_warnings();
nag_issue_warnings(true);

[b,fitsum,ifail] = g02mb( ...
                          mtype,n,dtd(1:pm),dtd((pm+1):(pm+m)),dtd(end));

% Reset the warning state to its initial value
nag_issue_warnings(warn_state);

% Print the results
ip = size(b,1);
nstep = size(b,2) - 1;

fprintf('  Step %s Parameter Estimate\n ',repmat(' ',1,max(ip-2,0)*5));
fprintf(repmat('-',1,5+ip*10));
fprintf('\n');
for k = 1:nstep
  fprintf('  %3d',k);
  for j = 1:ip
    fprintf(' %9.3f',b(j,k));
  end
  fprintf('\n');
end
fprintf('\n');
fprintf(' alpha: %9.3f\n', wmean(m+1));
fprintf('\n');
fprintf('  Step     Sum      RSS       df       Cp       Ck     Step Size\n ');
fprintf(repmat('-',1,64));
fprintf('\n');
for k = 1:nstep
  fprintf('  %3d %9.3f %9.3f %6.0f  %9.3f %9.3f %9.3f\n', ...
          k,fitsum(1,k),fitsum(2,k),fitsum(3,k), ...
          fitsum(4,k),fitsum(5,k),fitsum(6,k));
end
fprintf('\n');
fprintf(' sigma^2: %9.3f\n', fitsum(5,nstep+1));

% Plot the parameter estimates
fig1 = figure;
ip = size(b,1);
nstep = size(b,2) - 2;

% Extract the sum of the absolute parameter estimates
xpos = transpose(repmat(fitsum(1,1:nstep),ip,1));

% Extract the parameter estimates
ypos = transpose(b(1:ip,1:nstep));

% Start both xpos and ypos at zero
xpos = [zeros(1,ip);xpos];
ypos = [zeros(1,ip);ypos];

% Get min and max for X and Y
xmin = min(min(xpos));
xmax = max(max(xpos));
ymin = min(min(ypos));
ymax = max(max(ypos));

% Get a range that is 10% past the data
ext = 1 + [-0.1 0.1];
xrng = [min(xmin*ext),max(xmax*ext)];
yrng = [min(ymin*ext),max(ymax*ext)];

% Extend the end of the lines we plot to cover this range
xpos = [xpos;xrng(2)*ones(1,ip)];
ypos = [ypos;ypos(end,:)];

% Produce the plot
plot(xpos,ypos);

% Change the axis limits
xlim(xrng);
ylim(yrng);

% Add title and labels
title({'{\bf g02mb Example Plot}'; ...
       'Estimates for LAR model fitted to simulated dataset'});
xlabel('{\bf \Sigma_j |\beta_{kj} |}');
ylabel('{\bf Parameter Estimates (\beta_{kj})}');

% Add legend
label = [repmat('\beta_{k',ip,1) num2str(transpose(linspace(1,ip,ip))) ...
         repmat('}',ip,1)];
h = legend(label,'Location','SouthOutside','Orientation','Horizontal');
set(h,'FontSize',get(h,'FontSize')*0.8);

g02mb example results

  Step                      Parameter Estimate
 -----------------------------------------------------------------
    1     0.000     0.000     3.125     0.000     0.000     0.000
    2     0.000     0.000     3.792     0.000     0.000    -0.713
    3    -0.446     0.000     3.998     0.000     0.000    -1.151
    4    -0.628    -0.295     4.098     0.000     0.000    -1.466
    5    -1.060    -1.056     4.110    -0.864     0.000    -1.948
    6    -1.073    -1.132     4.118    -0.935    -0.059    -1.981

 alpha:   -50.037

  Step     Sum      RSS       df       Cp       Ck     Step Size
 ----------------------------------------------------------------
    1    72.446  8929.855      2     13.355   123.227    72.446
    2   103.385  6404.701      3      7.054    50.781    24.841
    3   126.243  5258.247      4      5.286    30.836    16.225
    4   145.277  4657.051      5      5.309    19.319    11.587
    5   198.223  3959.401      6      5.016    12.266    24.520
    6   203.529  3954.571      7      7.000     0.910     2.198

 sigma^2:   304.198
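
The way the example splits `dtd` into three pieces can be checked without the toolbox. The sketch below assumes the cross-product matrix about the mean is stored as its upper triangle packed by column, which is consistent with the comments in the example; the data and the names `ctr`, `S`, `packed`, `dty` and `yty` are made up for illustration.

```matlab
% Made-up augmented matrix [D y]: 4 observations, 2 variables, 1 response
dy = [1 2 3; 2 1 5; 3 4 4; 4 3 8];
m = size(dy,2) - 1;

% Cross-products about the column means
ctr = bsxfun(@minus, dy, mean(dy,1));
S = ctr' * ctr;

% Upper triangle packed by column (assumed storage scheme):
% logical indexing walks S column by column, keeping rows 1..j of column j
packed = S(triu(true(m+1)));

% Same split as in the example above
pm  = m*(m+1)/2;
dtd = packed(1:pm);              % cross-products of D with itself
dty = packed((pm+1):(pm+m));     % cross-products of D with y
yty = packed(end);               % cross-product of y with itself
```

For this small dataset the packed vector has six elements: three for D with itself, two for D with y, and one for y with itself.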

NAG Toolbox: nag_correg_lars_xtx (g02mb)
