naginterfaces.library.correg.linregm_rssq

naginterfaces.library.correg.linregm_rssq(x, vname, isx, y, mean='M', wt=None)

linregm_rssq calculates the residual sums of squares for all possible linear regressions for a given set of independent variables.

For full information, please refer to the NAG Library document for g02ea:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02eaf.html

Parameters

x : float, array-like, shape $(n, m)$

$x[i - 1, j - 1]$ must contain the $i$th observation for the $j$th independent variable, for $j = 1, 2, \dots, m$, for $i = 1, 2, \dots, n$.

vname : str, array-like, shape $(m)$

$\mathrm{vname}[j - 1]$ must contain the name of the variable in column $j$ of $x$, for $j = 1, 2, \dots, m$.

isx : int, array-like, shape $(m)$

Indicates which independent variables are to be considered in the model.

$\mathrm{isx}[j - 1] \geq 2$

The variable contained in the $j$th column of $x$ is included in all regression models, i.e., it is a forced variable.

$\mathrm{isx}[j - 1] = 1$

The variable contained in the $j$th column of $x$ is included in the set from which the regression models are chosen, i.e., it is a free variable.

$\mathrm{isx}[j - 1] = 0$

The variable contained in the $j$th column of $x$ is not included in any model.

y : float, array-like, shape $(n)$

$y[i - 1]$ must contain the $i$th observation on the dependent variable, $y_{i}$, for $i = 1, 2, \dots, n$.

mean : str, length 1, optional

Indicates whether a mean term is to be included.

$\mathrm{mean} = \text{'M'}$

A mean term (intercept) will be included in the model.

$\mathrm{mean} = \text{'Z'}$

The model will pass through the origin (zero point).

wt : None or float, array-like, shape $(n)$, optional

If provided, $\mathrm{wt}$ must contain the weights to be used with the model.

If $\mathrm{wt}[i - 1] = 0.0$, the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.

If $\mathrm{wt}$ is not provided, the effective number of observations is $n$.
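The three `isx` codes partition the columns of `x` into forced, free and excluded variables. The sketch below uses a hypothetical helper, `split_isx` (not part of naginterfaces), to show that partition in plain Python. It mirrors the two errno 3 checks listed under Raises and counts the candidate models as $2^k$, assuming that $k$ is the number of free variables, as the output shapes $\max(2^k, m)$ suggest.

```python
def split_isx(vname, isx):
    """Partition variable names by isx code (hypothetical helper).

    isx >= 2 -> forced, isx == 1 -> free, isx == 0 -> excluded.
    Returns (forced, free, excluded, number of candidate models).
    """
    if len(vname) != len(isx):
        raise ValueError("vname and isx must both have length m")
    forced, free, excluded = [], [], []
    for name, code in zip(vname, isx):
        if code < 0:
            # Mirrors the errno 3 constraint isx[i - 1] >= 0.
            raise ValueError(f"isx entry for {name!r} is negative")
        if code >= 2:
            forced.append(name)
        elif code == 1:
            free.append(name)
        else:
            excluded.append(name)
    if not free:
        # Mirrors the errno 3 failure: no element of isx equals 1.
        raise ValueError("there are no free variables")
    # Assumption: each subset of the k free variables gives one model.
    return forced, free, excluded, 2 ** len(free)


forced, free, excluded, n_models = split_isx(["A", "B", "C", "D"], [2, 1, 1, 0])
print(forced, free, excluded, n_models)  # ['A'] ['B', 'C'] ['D'] 4
```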

Returns

nmod : int

The total number of models for which residual sums of squares have been calculated.

modl : str, ndarray, shape $(\max(2^{k}, m), m)$

The first $\mathrm{nterms}[i - 1]$ elements of the $i$th row of $\mathrm{modl}$ contain the names of the independent variables, as given in $\mathrm{vname}$, that are included in the $i$th model. Here $k$ is the number of free variables.

rss : float, ndarray, shape $(\max(2^{k}, m))$

$\mathrm{rss}[i - 1]$ contains the residual sum of squares for the $i$th model, for $i = 1, 2, \dots, \mathrm{nmod}$.

nterms : int, ndarray, shape $(\max(2^{k}, m))$

$\mathrm{nterms}[i - 1]$ contains the number of independent variables in the $i$th model, not including the mean if one is fitted, for $i = 1, 2, \dots, \mathrm{nmod}$.

mrank : int, ndarray, shape $(\max(2^{k}, m))$

$\mathrm{mrank}[i - 1]$ contains the rank of the residual sum of squares for the $i$th model.
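When reading the outputs row by row, only the first `nmod` rows are used, and only the first `nterms[i - 1]` entries of row $i$ of `modl` name variables. A minimal sketch, assuming the results have been unpacked in the order listed above (for example `nmod, modl, rss, nterms, mrank = correg.linregm_rssq(x, vname, isx, y)`); `summarize_models` is a hypothetical helper, and the arrays in the usage lines are placeholders, not real output.

```python
def summarize_models(nmod, modl, rss, nterms):
    """Return (variable names, RSS) for each of the nmod fitted models.

    Only the first nterms[i] entries of row i of modl name the variables
    in model i; any remaining entries in that row are ignored.
    """
    summary = []
    for i in range(nmod):
        names = [str(name).strip() for name in modl[i][: nterms[i]]]
        summary.append((names, float(rss[i])))
    return summary


# Placeholder arrays shaped like the documented outputs (not real results).
modl = [["", ""], ["A", ""], ["B", ""], ["A", "B"]]
rss = [9.0, 5.0, 4.0, 1.0]
nterms = [0, 1, 1, 2]
for names, value in summarize_models(4, modl, rss, nterms):
    print(names or "null model", value)
```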

Raises

NagValueError

(errno $1$)

On entry, $\mathrm{ldmodl} = \langle\mathit{value}\rangle$ and $m = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{ldmodl} \geq m$.

(errno $1$)

On entry, $\mathrm{weight} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{weight} = \text{'W'}$ or $\text{'U'}$.

(errno $1$)

On entry, $\mathrm{mean} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{mean} = \text{'M'}$ or $\text{'Z'}$.

(errno $1$)

On entry, $m = \langle\mathit{value}\rangle$.

Constraint: $m \geq 2$.

(errno $1$)

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 2$.

(errno $2$)

On entry, $\mathrm{wt}[\langle\mathit{value}\rangle] < 0.0$.

Constraint: $\mathrm{wt}[i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.

(errno $3$)

On entry, $\mathrm{isx}[\langle\mathit{value}\rangle] < 0$.

Constraint: $\mathrm{isx}[i - 1] \geq 0$, for $i = 1, 2, \dots, m$.

(errno $3$)

There are no free variables, i.e., no element of $\mathrm{isx}$ equals $1$.

(errno $4$)

On entry, $\mathrm{ldmodl} = \langle\mathit{value}\rangle$ and the number of possible models is $\langle\mathit{value}\rangle$.

Constraint: $\mathrm{ldmodl} \geq$ the number of possible models.

(errno $5$)

On entry, the number of independent variables to be considered (forced plus free, plus the mean if included) is greater than or equal to the effective number of observations.

(errno $6$)

The full model is not of full rank, i.e., some of the independent variables may be linear combinations of other independent variables. Variables must be excluded from the model to give full rank.
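Some of these conditions can be checked before calling the routine. The hypothetical helper below (not part of naginterfaces) computes the effective number of observations as described for `wt` and mirrors the errno 2 and errno 5 constraints; it does not attempt the full-rank check behind errno 6.

```python
def check_observation_count(isx, n, mean="M", wt=None):
    """Hypothetical pre-check mirroring the errno 2 and errno 5 conditions.

    Returns (effective number of observations, number of terms considered).
    """
    if wt is None:
        n_eff = n
    else:
        if len(wt) != n:
            raise ValueError("wt must have length n")
        if any(w < 0.0 for w in wt):
            raise ValueError("all weights must be >= 0.0")  # errno 2
        # Observations with zero weight do not count towards n_eff.
        n_eff = sum(1 for w in wt if w != 0.0)
    n_terms = sum(1 for code in isx if code >= 1)  # forced plus free
    if mean == "M":
        n_terms += 1  # the mean term also counts
    if n_terms >= n_eff:
        # errno 5: too many terms for the effective number of observations.
        raise ValueError(f"{n_terms} terms but only {n_eff} effective observations")
    return n_eff, n_terms


print(check_observation_count([2, 1, 1, 0], 5))  # (5, 4)
```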

Notes

For a set of $k$ possible independent variables, there are $2^{k}$ linear regression models, each containing between zero and $k$ independent variables. For example, if $k = 3$ and the variables are $A$, $B$ and $C$, then the possible models are:

null model
$A$
$B$
$C$
$A$ and $B$
$A$ and $C$
$B$ and $C$
$A$, $B$ and $C$.
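The $2^k$ models in this list are all the subsets of the $k$ variables. A short sketch using the standard library's `itertools.combinations` generates them in the same order, smallest models first; `all_models` is a hypothetical name.

```python
from itertools import combinations


def all_models(free):
    """List every subset of the free variables, smallest models first."""
    models = []
    for size in range(len(free) + 1):
        # combinations() yields subsets of one size in input order.
        models.extend(list(c) for c in combinations(free, size))
    return models


for model in all_models(["A", "B", "C"]):
    print(model if model else "null model")
```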

linregm_rssq calculates the residual sums of squares from each of the $2^{k}$ possible models. The method used involves a $QR$ decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations, and the residual sum of squares is computed for each model; see Clark (1981) and Smith and Bremner (1989).
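The link between a Givens $QR$ reduction and the residual sum of squares can be sketched in pure Python. This is a brute-force stand-in, not the library's algorithm: instead of updating one decomposition as variables move in and out, it re-triangularizes the augmented matrix $[X \mid y]$ for every subset, and the square of the last diagonal entry is the RSS. Assumptions: `mean='M'` is modelled as a column of ones, weights are applied by scaling each row by $\sqrt{w_i}$, forced variables are omitted for brevity, and `givens_rss` and `all_subset_rss` are hypothetical names.

```python
import math
from itertools import combinations


def givens_rss(columns, y):
    """RSS of the least-squares fit of y on the given columns.

    [X | y] is reduced to upper-triangular form with Givens rotations;
    the square of the last diagonal entry is the residual sum of squares.
    """
    n, p = len(y), len(columns)
    if n <= p:
        return 0.0  # simplification: an exact fit is possible if X has full rank
    # Row-major copy of the augmented matrix [X | y].
    a = [[col[i] for col in columns] + [y[i]] for i in range(n)]
    for j in range(p + 1):
        for i in range(j + 1, n):
            if a[i][j] == 0.0:
                continue  # already zero, no rotation needed
            r = math.hypot(a[j][j], a[i][j])
            c, s = a[j][j] / r, a[i][j] / r
            # Rotate rows j and i so that a[i][j] becomes zero.
            for k in range(j, p + 1):
                aj, ai = a[j][k], a[i][k]
                a[j][k] = c * aj + s * ai
                a[i][k] = -s * aj + c * ai
    return a[p][p] ** 2


def all_subset_rss(x_cols, y, mean="M", wt=None):
    """Brute-force RSS for every subset of the named columns in x_cols."""
    if mean not in ("M", "Z"):
        raise ValueError("mean must be 'M' or 'Z'")
    root_w = [1.0] * len(y) if wt is None else [math.sqrt(w) for w in wt]

    def scaled(col):
        # Weighted least squares: scale each row by sqrt(w_i).
        return [rw * v for rw, v in zip(root_w, col)]

    yw = scaled(y)
    # A mean term is a (weighted) column of ones.
    base = [list(root_w)] if mean == "M" else []
    names = list(x_cols)
    results = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            cols = base + [scaled(x_cols[name]) for name in subset]
            results.append((list(subset), givens_rss(cols, yw)))
    return results


x_cols = {"A": [0.0, 1.0, 2.0, 3.0], "B": [1.0, 0.0, 0.0, 1.0]}
y = [1.0, 3.0, 5.0, 7.0]
for names, value in all_subset_rss(x_cols, y):
    print(names or "null model", round(value, 6))
```

Here $y = 1 + 2A$ exactly, so every model containing $A$ has a residual sum of squares of (numerically) zero, while the null model and the model with $B$ alone both give 20.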

The computed residual sums of squares are then ordered first by increasing number of terms in the model and then, within each model size, by decreasing residual sum of squares. Because adding a variable to a model can never increase the residual sum of squares, the first model (the null model, or the forced variables alone) always has the largest residual sum of squares and the $2^{k}$th (the full model) always has the smallest. This ordering helps you select the best model from the given set of independent variables.
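The ordering rule amounts to a two-part sort key. In this sketch each model is a hypothetical `(names, rss)` pair, and the RSS values in the usage lines are placeholders for illustration only. Because forced variables appear in every model, counting all names gives the same order as counting only the free variables.

```python
def order_models(results):
    """Sort (names, rss) pairs: fewer terms first, then larger RSS first."""
    return sorted(results, key=lambda item: (len(item[0]), -item[1]))


# Placeholder RSS values, for illustration only.
models = [(["A", "B"], 1.0), ([], 20.0), (["A"], 5.0), (["B"], 8.0)]
for names, rss in order_models(models):
    print(names or "null model", rss)
```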

linregm_rssq allows you to specify some independent variables that must appear in every model (the forced variables). The remaining independent variables, from which the possible models are formed, are the free variables.
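With forced variables, each candidate model is the complete set of forced variables plus one subset of the free variables, so $k$ free variables still give $2^k$ models. A hypothetical sketch (`models_with_forced` is not a library function):

```python
from itertools import combinations


def models_with_forced(forced, free):
    """Every candidate model: all forced variables plus one subset of the free ones."""
    models = []
    for size in range(len(free) + 1):
        for subset in combinations(free, size):
            # Forced variables are prepended to every model.
            models.append(list(forced) + list(subset))
    return models


for model in models_with_forced(["F"], ["A", "B"]):
    print(model)
```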

References

Clark, M R B, 1981, A Givens algorithm for moving from one linear model to another without going back to the data, Appl. Statist. (30), 198–203

Smith, D M and Bremner, J M, 1989, All possible subset regressions using the QR decomposition, Comput. Statist. Data Anal. (7), 217–236

Weisberg, S, 1985, Applied Linear Regression, Wiley
