For a set of $k$ possible independent variables there are $2^k$ linear regression models with from zero to $k$ independent variables in each model. For example, if $k = 3$ and the variables are $A$, $B$ and $C$, then the possible models are:
(i) null model
(ii) $A$
(iii) $B$
(iv) $C$
(v) $A$ and $B$
(vi) $A$ and $C$
(vii) $B$ and $C$
(viii) $A$, $B$ and $C$.
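The count of possible models can be checked with a short enumeration. The following is a minimal sketch in base MATLAB, not part of the toolbox; the labels `A`, `B` and `C` are placeholder variable names, and the models are generated in binary counting order, which differs from the order of the list above.

```matlab
% Placeholder variable names, for illustration only
names = {'A', 'B', 'C'};
k = numel(names);
nmodels = 2^k;                     % 8 models when k = 3
models = cell(nmodels, 1);
for j = 0:nmodels-1
    % Bit i of j is set when variable i is in the model
    mask = bitget(j, 1:k) == 1;
    if any(mask)
        models{j+1} = strjoin(names(mask), ', ');
    else
        models{j+1} = 'null model';
    end
end
fprintf('%s\n', models{:});
```

Each of the $2^k$ integers from 0 to $2^k - 1$ corresponds to exactly one subset of the variables, which is why the number of models doubles with every variable added.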
nag_correg_linregm_rssq (g02ea) calculates the residual sums of squares from each of the $2^k$ possible models. The method used involves a $QR$ decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations, and the residual sum of squares is computed for each model; see Clark (1981) and Smith and Bremner (1989).
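The connection between an orthogonal decomposition and residual sums of squares can be illustrated as follows. This is a sketch using the base MATLAB function `qr` on made-up data; it is not the routine's implementation and does not perform the Givens updates. It shows only that a single $QR$ factorization yields the residual sum of squares of every model formed from the leading columns, which is why reordering columns by rotations is enough to reach other models.

```matlab
% Made-up data: a column of ones (mean term) followed by two variables
X = [1 1 2; 1 2 1; 1 3 4; 1 4 3; 1 5 6];
y = [1; 3; 2; 5; 4];
[n, p] = size(X);

[Q, R] = qr(X);                    % full QR factorization, Q is n by n
c = Q' * y;

% RSS of the model using the first j columns is the squared norm
% of the trailing n - j elements of Q'*y
rss_qr = zeros(p, 1);
for j = 1:p
    rss_qr(j) = sum(c(j+1:end).^2);
end

% Check against a direct least squares fit of each nested model
rss_ls = zeros(p, 1);
for j = 1:p
    Xj = X(:, 1:j);
    r = y - Xj * (Xj \ y);
    rss_ls(j) = sum(r.^2);
end
fprintf('largest difference: %g\n', max(abs(rss_qr - rss_ls)));
```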
The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sum of squares. So the first model will always have the largest residual sum of squares and the $2^k$th will always have the smallest. This aids you in selecting the best possible model from the given set of independent variables.
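The ordering described above can be reproduced on a small problem by brute force. The sketch below fits every subset directly with the backslash operator, which is a simple substitute for the Givens-based method and is practical only for small numbers of variables. The data are made up, a mean term is assumed in every model, and the number of terms is taken here to be the number of independent variables in the model.

```matlab
% Made-up data: n = 6 observations on k = 3 candidate variables
x = [1 2 1; 2 1 3; 3 4 2; 4 3 5; 5 6 4; 6 5 7];
y = [1.1; 1.9; 3.2; 3.8; 5.1; 6.3];
[n, k] = size(x);
nmod = 2^k;

nterms = zeros(nmod, 1);
rss = zeros(nmod, 1);
masks = false(nmod, k);
for j = 0:nmod-1
    mask = bitget(j, 1:k) == 1;        % variables in this model
    Xj = [ones(n, 1), x(:, mask)];     % mean term plus chosen variables
    r = y - Xj * (Xj \ y);             % least squares residuals
    masks(j+1, :) = mask;
    nterms(j+1) = nnz(mask);
    rss(j+1) = sum(r.^2);
end

% Order by increasing number of terms, then by decreasing RSS
[~, idx] = sortrows([nterms, -rss]);
nterms = nterms(idx);
rss = rss(idx);
masks = masks(idx, :);

for j = 1:nmod
    fprintf('%2d %10.4f  %s\n', nterms(j), rss(j), mat2str(masks(j, :)));
end
```

The null model is the only model with zero terms and the full model is the only one with $k$ terms, so they occupy the first and last positions and have the largest and smallest residual sums of squares respectively.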
Clark M R B (1981) A Givens algorithm for moving from one linear model to another without going back to the data Appl. Statist. 30 198–203
For a discussion of the improved accuracy obtained by using a method based on the $QR$ decomposition see Smith and Bremner (1989).
The data for this example are given in Weisberg (1985). The independent variables and the dependent variable are read, as are the names of the variables. These names are as given in Weisberg (1985). The residual sums of squares are computed and printed with the names of the variables in the model.
function g02ea_example
fprintf('g02ea example results\n\n');
x = [ 0, 1125, 232, 7160, 85.9, 8905;
7, 920, 268, 8804, 86.5, 7388;
15, 835, 271, 8108, 85.2, 5348;
22, 1000, 237, 6370, 83.8, 8056;
29, 1150, 192, 6441, 82.1, 6960;
37, 990, 202, 5154, 79.2, 5690;
44, 840, 184, 5896, 81.2, 6932;
58, 650, 200, 5336, 80.6, 5400;
65, 640, 180, 5041, 78.4, 3177;
72, 583, 165, 5012, 79.3, 4461;
80, 570, 151, 4825, 78.7, 3901;
86, 570, 171, 4391, 78.0, 5002;
93, 510, 243, 4320, 72.3, 4665;
100, 555, 147, 3709, 74.9, 4642;
107, 460, 286, 3969, 74.4, 4840;
122, 275, 198, 3558, 72.5, 4479;
129, 510, 196, 4361, 57.7, 4200;
151, 165, 210, 3301, 71.8, 3410;
171, 244, 327, 2964, 72.5, 3360;
220, 79, 334, 2777, 71.9, 2599];
y = [ 1.5563; 0.8976; 0.7482; 0.7160; 0.3010;
0.3617; 0.1139; 0.1139; -0.2218; -0.1549;
0.0000; 0.0000; -0.0969; -0.2218; -0.3979;
-0.1549; -0.2218; -0.3979; -0.5229; -0.0458];
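[n,m] = size(x);
% Include a mean (intercept) term in every model
mean_p = 'M';
% isx(j) = 1 makes column j of x a candidate variable; DAY (column 1) is excluded
isx = ones(m,1,'int64');
isx(1) = 0;
vname = {'DAY'; 'BOD'; 'TKN'; 'TS '; 'TVS'; 'COD'};
% Residual sums of squares for all models of the selected variables
[nmod, model, rss, nterms, mrank, ifail] = ...
g02ea(mean_p, x, vname, isx, y);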
fprintf(' Parameters RSS rank model\n');
for j = 1:nmod
fprintf('%8d%11.4f%4d ', nterms(j), rss(j), mrank(j));
fprintf(' %s', model{j,:});
fprintf('\n');
end