When selecting a linear regression model for a set of $n$ observations, a balance has to be found between the number of independent variables in the model and the fit as measured by the residual sum of squares. The more variables included, the smaller the residual sum of squares will be. Two statistics can help in selecting the best model.
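(a) $R^2$ represents the proportion of variation in the dependent variable that is explained by the independent variables.
$$R^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}},$$
where Total Sum of Squares $= \mathrm{tss} = \sum (y - \bar{y})^2$ (if mean is fitted, otherwise $\mathrm{tss} = \sum y^2$) and Regression Sum of Squares $= \mathrm{RegSS} = \mathrm{tss} - \mathrm{rss}$, where $\mathrm{rss} =$ Residual Sum of Squares $= \sum (y - \hat{y})^2$.
The $R^2$-values can be examined to find a model with a high $R^2$-value but with a small number of independent variables.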
(b) $C_p$ statistic.
$$C_p = \frac{\mathrm{rss}}{\hat{\sigma}^2} - (n - 2p),$$
where $p$ is the number of parameters (including the mean) in the model and $\hat{\sigma}^2$ is an estimate of the true variance of the errors. This can often be obtained from fitting the full model.
A well-fitting model will have $C_p \simeq p$. $C_p$ is often plotted against $p$ to see which models are closest to the $C_p = p$ line.
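The two statistics can be sketched in a few lines of C, using $R^2 = (\mathrm{tss} - \mathrm{rss})/\mathrm{tss}$ and $C_p = \mathrm{rss}/\hat{\sigma}^2 - (n - 2p)$. This is only an illustration of the formulas, not the library's implementation: the function name `r2_and_cp`, the `include_mean` flag, and the use of plain `long` in place of the library's integer type are assumptions made for this example.

```c
/* Hypothetical helper, not part of the NAG Library.
   For each of nmod models, computes
     rsq[i] = (tss - rss[i]) / tss
     cp[i]  = rss[i] / sigsq - (n - 2p)
   where p = nterms[i] + 1 if the mean is fitted, else p = nterms[i].
   Inputs are assumed valid (tss > 0, sigsq > 0); no checks are made. */
void r2_and_cp(int include_mean, long n, double sigsq, double tss,
               long nmod, const long nterms[], const double rss[],
               double rsq[], double cp[])
{
    long i;
    for (i = 0; i < nmod; ++i) {
        /* Number of parameters, counting the mean when it is fitted. */
        long p = nterms[i] + (include_mean ? 1 : 0);

        rsq[i] = (tss - rss[i]) / tss;
        cp[i] = rss[i] / sigsq - (double)(n - 2 * p);
    }
}
```

As an illustration with made-up numbers, taking $n = 20$, $\hat{\sigma}^2 = 2$ and $\mathrm{tss} = 100$, a model with one independent variable plus the mean ($p = 2$) and $\mathrm{rss} = 40$ gives $R^2 = 0.6$ and $C_p = 40/2 - (20 - 4) = 4$.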
nag_cp_stat (g02ecc) may be called after nag_all_regsn (g02eac), which calculates the residual sums of squares for all possible linear regression models.
- 1:
mean – Nag_IncludeMean Input
On entry: indicates if a mean term is to be included.
- mean = Nag_MeanInclude: A mean term, intercept, will be included in the model.
- mean = Nag_MeanZero: The model will pass through the origin, zero-point.
Constraint:
mean = Nag_MeanInclude or Nag_MeanZero.
- 2:
n – Integer Input
On entry: $n$, the number of observations used in the regression model.
Constraint:
n must be greater than $2 \times p_{\max}$, where $p_{\max}$ is the largest number of independent variables fitted (including the mean if fitted).
- 3:
sigsq – double Input
On entry: the best estimate of the true variance of the errors, $\hat{\sigma}^2$.
Constraint:
sigsq > 0.0.
- 4:
tss – double Input
On entry: the total sum of squares for the regression model.
Constraint:
tss > 0.0.
- 5:
nmod – Integer Input
On entry: the number of regression models.
Constraint:
nmod > 0.
- 6:
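nterms[nmod] – const Integer Input
On entry: nterms[i] must contain the number of independent variables (not counting the mean) fitted to the $i$th model, for $i = 0, 1, \ldots, \mathrm{nmod} - 1$.
- 7:
rss[nmod] – const double Input
On entry: rss[i] must contain the residual sum of squares for the $i$th model.
Constraint:
rss[i] $\le$ tss, for $i = 0, 1, \ldots, \mathrm{nmod} - 1$.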
- 8:
rsq[nmod] – double Output
On exit: rsq[i] contains the $R^2$-value for the $i$th model, for $i = 0, 1, \ldots, \mathrm{nmod} - 1$.
- 9:
cp[nmod] – double Output
On exit: cp[i] contains the $C_p$-value for the $i$th model, for $i = 0, 1, \ldots, \mathrm{nmod} - 1$.
- 10:
fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
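The input constraints listed above can be gathered into a single pre-call check. The following is a minimal sketch, assuming the constraints are nmod > 0, sigsq > 0, tss > 0, rss[i] ≤ tss and n > 2 × p_max. The function name, the status codes, the `include_mean` flag and the plain `long` integer type are all hypothetical; the library itself reports constraint violations through the fail argument, not through codes like these.

```c
/* Hypothetical status codes for this sketch; these are not NAG error codes. */
enum check_status {
    CHECK_OK = 0,
    CHECK_BAD_NMOD,   /* nmod <= 0 */
    CHECK_BAD_SIGSQ,  /* sigsq <= 0 */
    CHECK_BAD_TSS,    /* tss <= 0 */
    CHECK_BAD_RSS,    /* some rss[i] > tss */
    CHECK_BAD_N       /* n <= 2 * p_max */
};

/* Hypothetical helper, not part of the NAG Library: checks the assumed
   input constraints and returns the first violation found. */
enum check_status check_cp_args(int include_mean, long n, double sigsq,
                                double tss, long nmod,
                                const long nterms[], const double rss[])
{
    long i, p_max = 0;

    if (nmod <= 0) return CHECK_BAD_NMOD;
    if (!(sigsq > 0.0)) return CHECK_BAD_SIGSQ;
    if (!(tss > 0.0)) return CHECK_BAD_TSS;

    for (i = 0; i < nmod; ++i) {
        /* Parameters in model i, counting the mean when it is fitted. */
        long p = nterms[i] + (include_mean ? 1 : 0);
        if (p > p_max) p_max = p;
        if (rss[i] > tss) return CHECK_BAD_RSS;
    }

    /* n must exceed twice the largest number of fitted parameters. */
    if (n <= 2 * p_max) return CHECK_BAD_N;

    return CHECK_OK;
}
```

Running such a check first gives a specific reason for rejection before any statistics are computed; for example, with two fitted variables plus the mean ($p_{\max} = 3$), $n = 6$ is rejected because $n$ must exceed 6.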
Accuracy is sufficient for all practical purposes.
Not applicable.
None.
The data, from an oxygen uptake experiment, is given by Weisberg (1985). The independent and dependent variables are read and the residual sums of squares for all possible models computed using nag_all_regsn (g02eac). The values of $R^2$ and $C_p$ are then computed and printed along with the names of variables in the models.