NAG Library Function Document
nag_all_regsn (g02eac)
1 Purpose
nag_all_regsn (g02eac) calculates the residual sums of squares for all possible linear regressions for a given set of independent variables.
2 Specification
#include <nag.h>
#include <nagg02.h>
void nag_all_regsn (Nag_OrderType order,
Nag_IncludeMean mean,
Integer n,
Integer m,
const double x[],
Integer pdx,
const char *var_names[],
const Integer sx[],
const double y[],
const double wt[],
Integer *nmod,
const char *model[],
double rss[],
Integer nterms[],
Integer mrank[],
NagError *fail)
3 Description
For a set of $k$ possible independent variables there are $2^k$ linear regression models with from zero to $k$ independent variables in each model. For example, if $k = 3$ and the variables are $A$, $B$ and $C$ then the possible models are:
(i) null model
(ii) $A$
(iii) $B$
(iv) $C$
(v) $A$ and $B$
(vi) $A$ and $C$
(vii) $B$ and $C$
(viii) $A$, $B$ and $C$.
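The enumeration above can be sketched with a bitmask, where bit $j$ of the mask says whether variable $j$ is in the model. This is only an illustration of the counting argument; the helper name `list_all_models` is hypothetical, is not part of the NAG Library, and does not reproduce the order in which nag_all_regsn (g02eac) returns its models.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the NAG Library): prints every subset
   of the k named variables, one model per line, and returns the number
   of models listed. Models appear in bitmask order, which is not the
   order in which nag_all_regsn returns them. */
static unsigned long list_all_models(const char *names[], unsigned k)
{
    unsigned long count = 0;
    unsigned long mask;
    unsigned j;

    for (mask = 0; mask < (1UL << k); ++mask) {
        if (mask == 0)
            printf("null model");
        for (j = 0; j < k; ++j) {
            if (mask & (1UL << j))   /* variable j is in this model */
                printf("%s ", names[j]);
        }
        printf("\n");
        ++count;
    }
    return count;
}
```

Called with the three names A, B and C, the helper lists eight models, one for each of the cases (i) to (viii).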
nag_all_regsn (g02eac) calculates the residual sums of squares from each of the $2^k$ possible models. The method used involves a $QR$ decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations and the residual sums of squares computed for each model; see Clark (1981) and Smith and Bremner (1989).
The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sums of squares. So the first model will always have the largest residual sum of squares and the $2^k$th will always have the smallest. This aids you in selecting the best possible model from the given set of independent variables.
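The ordering rule described above can be expressed as a comparison function for the standard `qsort`. This is a minimal sketch of the rule only, not of how the library orders its results internally; the type `ModelSummary` and the function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical record for one fitted model. */
typedef struct {
    int nterms;   /* number of independent variables in the model */
    double rss;   /* residual sum of squares for the model */
} ModelSummary;

/* Orders by increasing number of terms, then by decreasing rss. */
static int compare_models(const void *a, const void *b)
{
    const ModelSummary *p = a;
    const ModelSummary *q = b;

    if (p->nterms != q->nterms)
        return (p->nterms > q->nterms) - (p->nterms < q->nterms);
    return (p->rss < q->rss) - (p->rss > q->rss);   /* larger rss first */
}

static void order_models(ModelSummary *models, size_t nmod)
{
    qsort(models, nmod, sizeof models[0], compare_models);
}
```

Within each group of models that share the same number of terms, the best-fitting model therefore comes last.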
nag_all_regsn (g02eac) allows you to specify some independent variables that must be in the model, the forced variables. The other independent variables from which the possible models are to be formed are the free variables.
4 References
Clark M R B (1981) A Givens algorithm for moving from one linear model to another without going back to the data Appl. Statist. 30 198–203
Smith D M and Bremner J M (1989) All possible subset regressions using the $QR$ decomposition Comput. Statist. Data Anal. 7 217–236
Weisberg S (1985) Applied Linear Regression Wiley
5 Arguments
- 1:
order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order = Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order = Nag_RowMajor or Nag_ColMajor.
- 2:
mean – Nag_IncludeMean Input
On entry: indicates if a mean term is to be included.
- mean = Nag_MeanInclude: a mean term, intercept, will be included in the model.
- mean = Nag_MeanZero: the model will pass through the origin, zero-point.
Constraint: mean = Nag_MeanInclude or Nag_MeanZero.
- 3:
n – Integer Input
On entry: $n$, the number of observations.
Constraints:
- n ≥ 2;
- n must be at least the number of independent variables to be considered (forced plus free plus mean if included), as specified by mean and sx.
- 4:
m – Integer Input
On entry: the number of variables contained in x.
Constraint: m ≥ 2.
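- 5:
x[dim] – const double Input
Note: the dimension, dim, of the array x must be at least
- max(1, pdx × m) when order = Nag_ColMajor;
- max(1, n × pdx) when order = Nag_RowMajor.
Where $X(i,j)$ appears in this document, it refers to the array element
- x[(j − 1) × pdx + i − 1] when order = Nag_ColMajor;
- x[(i − 1) × pdx + j − 1] when order = Nag_RowMajor.
On entry: $X(i,j)$ must contain the $i$th observation for the $j$th independent variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.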
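As an illustration of the two storage schemes, the sketch below maps the one-based pair $(i,j)$ to a zero-based offset in x, assuming the usual NAG C convention: offset $(j-1) \times$ pdx $+ (i-1)$ for column-major storage and $(i-1) \times$ pdx $+ (j-1)$ for row-major storage. The enum `SketchOrder` is a hypothetical stand-in for Nag_OrderType so that the fragment compiles without the NAG headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } SketchOrder;

/* Offset in x of X(i,j), with i and j counted from 1 and pdx the
   stride between consecutive columns (column-major) or rows (row-major). */
static size_t x_index(SketchOrder order, size_t i, size_t j, size_t pdx)
{
    return (order == SKETCH_COL_MAJOR) ? (j - 1) * pdx + (i - 1)
                                       : (i - 1) * pdx + (j - 1);
}
```

With row-major storage and pdx = 4, for example, $X(2,3)$ is found at offset 6.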
- 6:
pdx – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
- if order = Nag_ColMajor, pdx ≥ n;
- if order = Nag_RowMajor, pdx ≥ m.
- 7:
var_names[m] – const char * Input
On entry: var_names[i − 1] must contain the name of the $i$th independent variable, i.e., the variable in column $i$ of $X$, for $i = 1, 2, \ldots, m$.
- 8:
sx[m] – const Integer Input
On entry: indicates which independent variables are to be considered in the model.
- sx[j − 1] ≥ 2: the variable contained in the $j$th column of $X$ is included in all regression models, i.e., is a forced variable.
- sx[j − 1] = 1: the variable contained in the $j$th column of $X$ is included in the set from which the regression models are chosen, i.e., is a free variable.
- sx[j − 1] = 0: the variable contained in the $j$th column of $X$ is not included in the models.
Constraints:
- sx[j − 1] ≥ 0, for $j = 1, 2, \ldots, m$;
- at least one value of sx = 1.
- 9:
y[n] – const double Input
On entry: y[i − 1] must contain the $i$th observation on the dependent variable, $y_i$, for $i = 1, 2, \ldots, n$.
- 10:
wt[] – const double Input
On entry: optionally, the weights to be used in the weighted regression.
If wt[i − 1] = 0.0, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, wt[i − 1] ≥ 0.0, for $i = 1, 2, \ldots, n$.
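The rule for the effective number of observations can be sketched as follows. The helper `effective_observations` is hypothetical and only restates the description above; `long` is used as a stand-in for the NAG Integer type.

```c
#include <stddef.h>

/* Hypothetical helper: number of observations that take part in the fit.
   A NULL weight array means every observation is used; otherwise only
   observations with a nonzero weight are counted. */
static long effective_observations(const double wt[], long n)
{
    long i, count = 0;

    if (wt == NULL)
        return n;
    for (i = 0; i < n; ++i) {
        if (wt[i] != 0.0)
            ++count;
    }
    return count;
}
```

Such a count can be compared with the number of variables under consideration before calling the function.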
- 11:
nmod – Integer * Output
On exit: the total number of models for which residual sums of squares have been calculated.
- 12:
model[dim] – const char * Output
Note: the dimension, dim, of the array model must be big enough to hold the names of all the free independent variables which appear in all the models. This will never exceed $2^k \times k$, where $k$ is the number of free variables in the model.
On exit: the names of the independent variables in each model, represented as pointers to the names provided by you in var_names. The model names are stored as follows:
- if the first model has three names, i.e., nterms[0] = 3, then model[0], model[1] and model[2] will contain these three names;
- if the second model has two names, i.e., nterms[1] = 2, then model[3] and model[4] will contain these two names.
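Because the names are stored consecutively, model after model, the position of a given model's first name is the sum of the term counts of the models before it. The sketch below walks the array on that assumption; the helper names are hypothetical and `long` stands in for the NAG Integer type.

```c
#include <stdio.h>

/* Hypothetical helper: index in model[] of the first name of the model
   numbered i (counting from 0), i.e. the sum of nterms[0..i-1]. */
static long model_offset(const long nterms[], long i)
{
    long offset = 0, t;

    for (t = 0; t < i; ++t)
        offset += nterms[t];
    return offset;
}

/* Prints the names in each model, one model per line. */
static void print_models(const char *model[], const long nterms[], long nmod)
{
    long i, j, start = 0;

    for (i = 0; i < nmod; ++i) {
        for (j = 0; j < nterms[i]; ++j)
            printf("%s ", model[start + j]);
        printf("\n");
        start += nterms[i];   /* move to the next model's names */
    }
}
```

For the two models in the list above, the offsets are 0 and 3, and the names occupy five entries in total.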
- 13:
rss[] – double Output
On exit: rss[i − 1] contains the residual sum of squares for the $i$th model, for $i = 1, 2, \ldots,$ nmod.
- 14:
nterms[] – Integer Output
On exit: nterms[i − 1] contains the number of independent variables in the $i$th model, not including the mean if one is fitted, for $i = 1, 2, \ldots,$ nmod.
- 15:
mrank[] – Integer Output
On exit: mrank[i − 1] contains the rank of the residual sum of squares for the $i$th model.
- 16:
fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
6 Error Indicators and Warnings
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.
- NE_FREE_VARS
There are no free x variables.
- NE_FULL_RANK
Full model is not of full rank.
- NE_INDEP_VARS_OBS
Number of requested x-variables ≥ number of observations.
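- NE_INT
On entry, m = ⟨value⟩.
Constraint: m ≥ 2.
On entry, n = ⟨value⟩.
Constraint: n ≥ 2.
On entry, pdx = ⟨value⟩.
Constraint: pdx > 0.
- NE_INT_2
On entry, pdx = ⟨value⟩ and m = ⟨value⟩.
Constraint: pdx ≥ m.
On entry, pdx = ⟨value⟩ and n = ⟨value⟩.
Constraint: pdx ≥ n.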
- NE_INT_ARRAY_ELEM_CONS
On entry, an element of sx violates the constraint sx[j − 1] ≥ 0.
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_REAL_ARRAY_ELEM_CONS
On entry, an element of wt violates the constraint wt[i − 1] ≥ 0.0.
7 Accuracy
For a discussion of the improved accuracy obtained by using a method based on the $QR$ decomposition see Smith and Bremner (1989).
8 Parallelism and Performance
nag_all_regsn (g02eac) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
nag_all_regsn (g02eac) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
Users' Note for your implementation for any additional implementation-specific information.
9 Further Comments
nag_cp_stat (g02ecc) may be used to compute $R^2$ and $C_p$-values from the results of nag_all_regsn (g02eac).
If a mean has been included in the model and no variables are forced in then rss[0] contains the total sum of squares and in many situations a reasonable estimate of the variance of the errors is given by rss[nmod − 1] / (n − 1 − nterms[nmod − 1]).
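The variance estimate mentioned above can be sketched as a small helper, assuming the estimate is the residual sum of squares of the last (largest) model divided by its residual degrees of freedom, $n - 1 -$ nterms[nmod − 1], with a mean fitted and no forced variables. The function name is hypothetical and `long` stands in for the NAG Integer type.

```c
/* Hypothetical helper: error-variance estimate from the last model
   returned, assuming a mean term was fitted and no variables were forced.
   Returns -1.0 if there are no residual degrees of freedom. */
static double error_variance_estimate(const double rss[], const long nterms[],
                                      long nmod, long n)
{
    long dof = n - 1 - nterms[nmod - 1];   /* residual degrees of freedom */

    return (dof > 0) ? rss[nmod - 1] / (double) dof : -1.0;
}
```

The negative return value is only a convention of this sketch for signalling that the estimate is undefined.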
10 Example
The data for this example is given in Weisberg (1985). The independent variables and the dependent variable are read, as are the names of the variables. These names are as given in Weisberg (1985). The residual sums of squares are computed and printed with the names of the variables in the model.
10.1 Program Text
Program Text (g02eace.c)
10.2 Program Data
Program Data (g02eace.d)
10.3 Program Results
Program Results (g02eace.r)