One method of selecting a linear regression model from a given set of independent variables is forward selection. The following procedure is used:
(i) Select the best fitting independent variable, i.e., the independent variable which gives the smallest residual sum of squares. If the F statistic for this variable is greater than a chosen critical value, then include the variable in the model; otherwise stop.
(ii) Find the independent variable that leads to the greatest reduction in the residual sum of squares when added to the current model.
(iii) If the F statistic for this variable is greater than the chosen critical value, then include the variable in the model and go to (ii); otherwise stop.
At any step the variables not in the model are known as the free terms.
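The test applied in steps (i) and (iii) can be written out explicitly. The form below is the usual partial F statistic for adding a single term to a linear model; the text above does not state the formula, so treat it as an assumption about what the routine computes. The symbols $\mathrm{RSS}_{\text{cur}}$, $\mathrm{RSS}_{\text{new}}$, $\nu_{\text{new}}$ and $F_c$ are notation introduced here for illustration.

```latex
% RSS_cur : residual sum of squares of the current model
% RSS_new : residual sum of squares after adding the candidate variable
% nu_new  : residual degrees of freedom of the enlarged model
% F_c     : the chosen critical value
\[
  F \;=\; \frac{\mathrm{RSS}_{\text{cur}} - \mathrm{RSS}_{\text{new}}}
               {\mathrm{RSS}_{\text{new}} / \nu_{\text{new}}},
  \qquad
  \text{include the variable if } F > F_c .
\]
```

The numerator is the reduction in the residual sum of squares due to the candidate variable, so the free term with the greatest reduction also has the largest F statistic.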
nag_correg_linregm_fit_onestep (g02ee) computes one step of the forward selection procedure per call. The results produced at each step may be printed or used as inputs to nag_correg_linregm_update (g02dd) in order to compute the regression coefficients for the model fitted at that step. Repeated calls to nag_correg_linregm_fit_onestep (g02ee) should be made until addvar is returned as false, indicating that no further variable passes the test.
Note: after the initial call to nag_correg_linregm_fit_onestep (g02ee) with istep = 0, you must not change any argument other than fin between calls.
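The step-by-step procedure can be sketched in plain MATLAB without any NAG Toolbox calls. This is only an illustration of the selection logic, not the routine's implementation: the data are synthetic, a mean term is assumed to be always in the model, and all names (f_crit, rss_of, in_model, rss_history) are hypothetical.

```matlab
% Forward selection sketch on synthetic data (hypothetical names throughout).
n = 12;
t = (1:n)';
x = [t, (-1).^t, t.^2];            % three candidate (free) variables
y = 1 + 2*t + 0.1*sin(7*t);        % response driven by column 1 plus a small perturbation
f_crit = 4.0;                      % assumed critical value for the F-test

% Residual sum of squares of the least-squares fit of y on the columns of A
rss_of = @(A) sum((y - A*(A\y)).^2);

in_model = [];                     % indices of variables currently in the model
free = 1:size(x,2);                % indices of the free terms
A = ones(n,1);                     % design matrix: mean term only to begin with
rss = rss_of(A);
rss_history = rss;

while ~isempty(free)
  % Steps (i)/(ii): RSS obtained by adding each free variable in turn
  trial = arrayfun(@(j) rss_of([A, x(:,j)]), free);
  [rss_new, k] = min(trial);
  % Step (iii): partial F statistic for the best candidate
  df_new = n - size(A,2) - 1;      % residual degrees of freedom of the enlarged model
  f = (rss - rss_new) / (rss_new / df_new);
  if f <= f_crit
    break;                         % best candidate fails the test: stop
  end
  in_model(end+1) = free(k);       % move the variable from free to in_model
  A = [A, x(:,free(k))];
  free(k) = [];
  rss = rss_new;
  rss_history(end+1) = rss;
  fprintf('Added variable %d, F = %.2f, RSS = %.4e\n', in_model(end), f, rss);
end
```

Each pass through the loop corresponds to one call of the routine: the quantities carried between passes (the current model, its residual sum of squares and the free terms) play the role of the arguments that must be left unchanged between calls.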
The data, from an oxygen uptake experiment, are given by Weisberg (1985), and the variable names are those used there. The independent and dependent variables are read, and nag_correg_linregm_fit_onestep (g02ee) is called repeatedly until addvar = false. At each step the F statistic, the free variables and their extra sums of squares are printed; in addition, except when addvar = false, the new variable, the change in the residual sum of squares and the terms in the model are printed.
function g02ee_example
fprintf('g02ee example results\n\n');
x = [ 0, 1125, 232, 7160, 85.9, 8905;
7, 920, 268, 8804, 86.5, 7388;
15, 835, 271, 8108, 85.2, 5348;
22, 1000, 237, 6370, 83.8, 8056;
29, 1150, 192, 6441, 82.1, 6960;
37, 990, 202, 5154, 79.2, 5690;
44, 840, 184, 5896, 81.2, 6932;
58, 650, 200, 5336, 80.6, 5400;
65, 640, 180, 5041, 78.4, 3177;
72, 583, 165, 5012, 79.3, 4461;
80, 570, 151, 4825, 78.7, 3901;
86, 570, 171, 4391, 78.0, 5002;
93, 510, 243, 4320, 72.3, 4665;
100, 555, 147, 3709, 74.9, 4642;
107, 460, 286, 3969, 74.4, 4840;
122, 275, 198, 3558, 72.5, 4479;
129, 510, 196, 4361, 57.7, 4200;
151, 165, 210, 3301, 71.8, 3410;
171, 244, 327, 2964, 72.5, 3360;
220, 79, 334, 2777, 71.9, 2599];
y = [ 1.5563; 0.8976; 0.7482; 0.7160; 0.3010;
0.3617; 0.1139; 0.1139; -0.2218; -0.1549;
0.0000; 0.0000; -0.0969; -0.2218; -0.3979;
-0.1549; -0.2218; -0.3979; -0.5229; -0.0458];
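[n,m] = size(x);
% Include a mean (intercept) term in the model
mean_p = 'M';
% isx flags each column of x: 0 = excluded, 1 = free (candidate), 2 = forced into the model
isx = ones(m,1,'int64');
isx(1) = 0;
isx(m) = 2;
vname = {'DAY'; 'BOD'; 'TKN'; 'TS '; 'TVS'; 'COD'};
% Initial values; after the first call these are updated by g02ee and passed back unchanged
nzero = int64(0);
model = {' '; ' '; ' '; ' '; ' '; ' '};
nterm = nzero;
rss = 0;
idf = nzero;
ifr = nzero;
free = model;
q = zeros(n,m+2);
p = zeros(m+1,1);
% istep = 0 marks the initial call
istep = nzero;
addvar = true;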
while addvar
  [istep, addvar, newvar, chrss, f, model, nterm, ...
   rss, idf, ifr, free, exss, q, p, ifail] = ...
    g02ee( ...
           istep, mean_p, x, vname, isx, y, model, nterm, ...
           rss, idf, ifr, free, q, p);
  fprintf('Step %3d\n', istep);
  if ~addvar
    fprintf('No further variables added max F = %7.2f\n', f);
  else
    fprintf('Added variable is %s\n', newvar);
    fprintf('Change in residual sum of squares = %13.4e\n', chrss);
    fprintf('F Statistic = %7.2f\n\n', f);
    fprintf('Variables in model :');
    fprintf(' %s', model{1:nterm,1});
    fprintf('\n\nResidual sum of squares = %13.4e\n', rss);
    fprintf('Degrees of freedom = %2d\n\n', idf);
  end
  if ifr==0
    fprintf('No free variables remaining\n');
    addvar = false;
  else
    fprintf('Free variables :');
    fprintf(' %s', free{1:ifr,1});
    fprintf('\nChange in RSS for free variables:\n%33s',' ');
    fprintf('%8.4f', exss(1:ifr));
    fprintf('\n\n');
  end
end