naginterfaces.library.blgm.lm_submodel

naginterfaces.library.blgm.lm_submodel(what, hform, hxdesc, lisx, lplab, lvinfo, lenlab=210)

lm_submodel produces labels for the columns of a design matrix and for the model parameters, together with a vector of column inclusion flags suitable for use with functions in submodule correg. This allows submodels to be fit using the same design matrix.

For full information please refer to the NAG Library document for g22yd:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g22/g22ydf.html

Parameters

what : str

Controls what labels are to be produced:

$w h a t = ‘S'$

Labels for a submodel are required. The submodel must be supplied in $h f o r m$ .

$w h a t = ‘X'$

Labels for the design matrix $X$ .

If $h x d e s c$ was returned by correg.lmm_init in $hlmm$ then $X$ is the design matrix associated with the fixed parameters.

$w h a t = ‘Z'$

Labels for the design matrix $Z$ .

If $h x d e s c$ was returned by correg.lmm_init in $hlmm$ then $Z$ is the part of the design matrix associated with the random parameters.

$w h a t = ‘V'$

Labels for the variance components.

hform : Handle

A G22 handle to the internal data structure containing a description of the required submodel $M_{S}$ , as returned in $hform$ by lm_formula(). If $w h a t \neq ‘S'$ , $h f o r m$ is not referenced and need not be set.

hxdesc : Handle

A G22 handle to the internal data structure containing a description of the design matrix, $D$ .

lisx : int

The length of $i s x$ .

lplab : int

The length of $p l a b$ .

As $p \leq m_{x} + 1$ , if labels are required, using $l p l a b = m_{x} + 1$ will always be sufficient.

lvinfo : int

The length of $v i n f o$ .

Let $n_{T}$ denote the number of terms in $M_{S}$ , $n_{T t}$ denote the number of variables in the $t$ th term and $m_{x t}$ denote the number of columns of $X$ corresponding to the $t$ th term.

The required size of $v i n f o$ , denoted $a$ , is given by:

$a = \sum_{t = 1}^{n_{T}} m_{x t} (1 + 3 n_{T t})$ .

If the model includes a mean effect, $a$ should be incremented by one.
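The size calculation above can be sketched in a few lines. This is an illustration only: the helper name `vinfo_size` and the example term sizes are hypothetical, and in practice the per-term values $m_{x t}$ and $n_{T t}$ are not known without expanding the formula.

```python
def vinfo_size(mx_per_term, nvars_per_term, has_mean_effect):
    """Return a = sum over terms of m_xt * (1 + 3 * n_Tt).

    mx_per_term[t]    : number of columns of X for term t (m_xt)
    nvars_per_term[t] : number of variables in term t (n_Tt)
    One extra element is added when the model has a mean effect.
    """
    if len(mx_per_term) != len(nvars_per_term):
        raise ValueError("both lists need one entry per term")
    a = sum(m * (1 + 3 * n) for m, n in zip(mx_per_term, nvars_per_term))
    return a + 1 if has_mean_effect else a


# Hypothetical model: two main effects (3 columns and 1 column) and
# their two-way interaction (3 columns), fitted with a mean effect.
example_size = vinfo_size([3, 1, 3], [1, 1, 2], has_mean_effect=True)
```

Here the three terms contribute 12, 4 and 21 elements, and the mean effect adds one more.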

The values $n_{T}$ , $n_{T t}$ and $m_{x t}$ are not trivial to calculate as they require the formula describing the model to be fully expanded and the contrast / dummy variable encoding to be known.

Therefore, if $l i s x$ , $l p l a b$ or $l v i n f o$ are too small and $l v i n f o \geq 3$ , $e r r n o$ = 102 is returned and the required sizes for these arrays are returned in $v i n f o [0]$ , $v i n f o [1]$ and $v i n f o [2]$ respectively.
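A minimal sketch of reading those required sizes back is given below. The helper `required_lengths` is hypothetical and takes the returned $v i n f o$ as a plain sequence; how that array is obtained once the errno 102 warning has been issued is not covered on this page and is left out here.

```python
def required_lengths(vinfo):
    """Read the minimum lisx, lplab and lvinfo from the first three
    elements of vinfo, as filled in when errno 102 is reported."""
    if vinfo is None or len(vinfo) < 3:
        raise ValueError("vinfo must hold at least three elements")
    lisx, lplab, lvinfo = (int(v) for v in vinfo[:3])
    return {"lisx": lisx, "lplab": lplab, "lvinfo": lvinfo}


# Hypothetical contents of vinfo after a call with arrays that were too small.
sizes = required_lengths([7, 8, 38])
```

The returned values could then be passed as $l i s x$ , $l p l a b$ and $l v i n f o$ in a second call.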

lenlab : int, optional

Length of the strings allocated in $p l a b$ . At most $l e n l a b$ characters will be written into each element of $p l a b$ .

Returns

intcpt : str

If $i n t c p t = ‘M'$ , in order to fit the model $M_{S}$ to $D$ using $X$ , any analysis function should include an implicit mean effect (intercept term).

If $i n t c p t = ‘Z'$ , either $M_{S}$ does not include a mean effect or the mean effect has been explicitly included in the design matrix.

ip : int

$p$ , the number of parameters in the (sub)model, including the intercept if one is present. If $w h a t = ‘S'$ , then the submodel is the one specified in $h f o r m$ ; otherwise the model is the one used when defining the design matrix described in $h x d e s c$ .

When $l i s x \neq 0$ : if $i n t c p t = ‘Z'$ , $p = \sum_{i = 1}^{m_{x}} i s x [i - 1]$ , otherwise $p = \sum_{i = 1}^{m_{x}} i s x [i - 1] + 1$ .
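As a quick check of this relationship, the parameter count can be recomputed from the returned flags. The helper name `count_parameters` is hypothetical; it simply restates the two sums above.

```python
def count_parameters(isx, intcpt):
    """Recompute p from the inclusion flags and the intercept flag.

    intcpt == 'M' means an implicit mean effect adds one parameter;
    intcpt == 'Z' means every parameter corresponds to a flagged column.
    """
    if intcpt not in ("M", "Z"):
        raise ValueError("intcpt must be 'M' or 'Z'")
    return sum(isx) + (1 if intcpt == "M" else 0)


# Hypothetical flags for a four-column design matrix.
p_with_mean = count_parameters([1, 0, 1, 1], "M")
p_without_mean = count_parameters([1, 0, 1, 1], "Z")
```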

isx : None or int, ndarray, shape $(l i s x)$

If $l i s x \neq 0$ , an array indicating which columns of the design matrix from the model specified in $h f o r m$ are to be used.

$i s x [j - 1] = 0$

The $j$ th column of the design matrix, $X$ , should not be included in the analysis.

$i s x [j - 1] = 1$

The $j$ th column of the design matrix, $X$ , should be included in the analysis.

If $l i s x = 0$ , $i s x$ is not referenced.
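The effect of the flags can be illustrated with a small, library-free sketch. The analysis functions accept $i s x$ directly, so extracting the columns by hand is not normally needed; the helper `select_columns` and the example matrix are hypothetical.

```python
def select_columns(x, isx):
    """Keep the columns of x (a list of rows) whose flag in isx is 1."""
    ncols = len(x[0]) if x else 0
    if len(isx) < ncols:
        raise ValueError("isx needs one flag per column of x")
    keep = [j for j in range(ncols) if isx[j] == 1]
    return [[row[j] for j in keep] for row in x]


# Hypothetical 2 x 3 design matrix; the second column is excluded.
x_full = [[1.0, 0.0, 2.5],
          [0.0, 1.0, 3.5]]
x_sub = select_columns(x_full, [1, 0, 1])
```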

plab : None or str, ndarray, shape $(min (i p, l p l a b))$

If $l p l a b \neq 0$ , the names associated with the $p$ parameters in the model.

If $i n t c p t = ‘Z'$ , the labels in $p l a b$ are also the labels for the columns of design matrix used in the analysis.

If $i n t c p t = ‘M'$ , elements $p l a b [1]$ to $p l a b [p - 1]$ are the corresponding column labels.

If a mean effect is present in $M_{S}$ , the corresponding label is always in $p l a b [0]$ .

If $l p l a b = 0$ , $p l a b$ is not referenced.
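The mapping from parameter labels to column labels described above can be sketched as follows. The helper `column_labels` and the label strings are hypothetical and do not reproduce the exact label format the library writes.

```python
def column_labels(plab, intcpt):
    """Return the labels of the design matrix columns used in the analysis.

    With intcpt == 'M' the first label belongs to the implicit mean
    effect, which has no column of its own, so it is skipped.
    """
    if intcpt not in ("M", "Z"):
        raise ValueError("intcpt must be 'M' or 'Z'")
    return list(plab[1:]) if intcpt == "M" else list(plab)


# Hypothetical parameter labels for a model with an implicit intercept.
labels = column_labels(["Intercept", "A_2", "A_3", "x"], "M")
```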

vinfo : None or int, ndarray, shape $(l v i n f o)$

If $l v i n f o \neq 0$ , information encoding a description of the parameters in the model.

The encoding information can be extracted as follows:

Set $k = 1$ .
Iterate $j$ from $1$ to $p$ .
1. Set $b = v i n f o [k - 1]$ .
2. Increment $k$ .
3. Iterate $i$ from $1$ to $b$ .
  1. Set $v_{i} = v i n f o [k - 1]$ .
  2. Set $l_{i} = v i n f o [k]$ .
  3. Set $c_{i} = v i n f o [k + 1]$ .
  4. Increment $k$ by $3$ .
4. The $j$ th model parameter corresponds to the interaction between the $b$ variables held in columns $v_{1}, v_{2}, \dots, v_{b}$ of $D$ . Therefore, $b = 1$ indicates a main effect, $b = 2$ a two-way interaction, etc.
  
  If $b = 0$ , the $j$ th model parameter corresponds to the mean effect.
  
  If $l_{i} = 0$ , the corresponding variable $v_{i}$ is binary, ordinal or continuous.
  
  Otherwise, $l_{i}$ is the level for the corresponding variable for model parameter $j$ .
  
  $c_{i}$ is a numeric flag indicating the contrast used in the case of a categorical variable, with $c_{i} = 0$ indicating that dummy variables were used for variable $v_{i}$ in this term.
  
  The remaining six types of contrast (treatment contrasts with respect to the first and last levels, sum contrasts with respect to the first and last levels, Helmert contrasts and polynomial contrasts, as described in lm_design_matrix()) are identified by the integers one to six respectively.
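The extraction steps above can be sketched in Python. This is a minimal illustration, not library code: `decode_vinfo` and the example array are hypothetical, the index `k` is zero-based (one less than the $k$ used in the steps), and the contrast names simply follow the order in which the six contrasts are listed above.

```python
CONTRASTS = {
    0: "dummy variables",
    1: "treatment (first level)",
    2: "treatment (last level)",
    3: "sum (first level)",
    4: "sum (last level)",
    5: "Helmert",
    6: "polynomial",
}


def decode_vinfo(vinfo, p):
    """Decode vinfo into one list of (v, l, c) triples per parameter.

    An empty list means the parameter is the mean effect (b = 0);
    one triple is a main effect, two a two-way interaction, and so on.
    """
    params = []
    k = 0  # zero-based position in vinfo
    for _ in range(p):
        b = vinfo[k]  # number of variables in this parameter's term
        k += 1
        triples = []
        for _ in range(b):
            # column of D, level (0 if not categorical), contrast flag
            triples.append((vinfo[k], vinfo[k + 1], vinfo[k + 2]))
            k += 3
        params.append(triples)
    return params


# Hypothetical encoding: mean effect, level 2 of variable 1 (treatment
# contrast), continuous variable 2, and their two-way interaction.
example_vinfo = [0, 1, 1, 2, 1, 1, 2, 0, 0, 2, 1, 2, 1, 2, 0, 0]
decoded = decode_vinfo(example_vinfo, p=4)
```

Each contrast flag in a decoded triple can be turned into a readable name with `CONTRASTS[c]`.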

If $l v i n f o = 0$ , $v i n f o$ is not referenced.

Raises

NagValueError

(errno $11$ )

On entry, $w h a t = ⟨ v a l u e ⟩$ was an illegal value.

(errno $12$ )

Supplied value of $w h a t$ is not valid for the G22 handle supplied in $h x d e s c$ .

(errno $21$ )

$h f o r m$ has not been initialized or is corrupt.

(errno $22$ )

$h f o r m$ is not a G22 handle as generated by lm_formula().

(errno $23$ )

A variable name used when creating $h f o r m$ is not present in $h x d e s c$ .

Variable name: $⟨ v a l u e ⟩$ .

(errno $24$ )

The model and the design matrix are not consistent. The design matrix was constructed in the presence of a mean effect and the model does not include a mean effect.

(errno $25$ )

The model and the design matrix are not consistent. The model includes a term not present in the design matrix.

Term: $⟨ v a l u e ⟩$ .

(errno $26$ )

The model and the design matrix are not consistent.

Term: $⟨ v a l u e ⟩$ .

This is likely due to the design matrix being constructed in the presence of either a mean effect or main effect that is not present in the model.

(errno $31$ )

$h x d e s c$ has not been initialized or is corrupt.

(errno $32$ )

$h x d e s c$ is not a G22 handle as generated by lm_design_matrix().

(errno $61$ )

On entry, $l i s x = ⟨ v a l u e ⟩$ and $m_{x} = ⟨ v a l u e ⟩$ .

Constraint: $l i s x = 0$ or $l i s x \geq m_{x}$ .

(errno $81$ )

On entry, $l p l a b = ⟨ v a l u e ⟩$ and $p = ⟨ v a l u e ⟩$ .

Constraint: $l p l a b = 0$ or $l p l a b \geq p$ .

(errno $91$ )

On entry, $p l a b$ is too short to hold the parameter labels. Long labels will be truncated.

The longest parameter label is $⟨ v a l u e ⟩$ .

(errno $101$ )

On entry, $l v i n f o$ is too small.

$l v i n f o = ⟨ v a l u e ⟩$ .

Constraint: $l v i n f o = 0$ or $l v i n f o \geq ⟨ v a l u e ⟩$ .

Warns

NagAlgorithmicWarning

(errno $27$ )

The model and the design matrix are not consistent. The model specifies different contrasts to those used when the design matrix was constructed. The contrasts specified in $h f o r m$ will be ignored.

(errno $28$ )

The model may not be as expected.

This is due to the model not containing the categorical variable adjusted to account for no mean effect when the design matrix was constructed.

(errno $33$ )

$h x d e s c$ has not passed through the model fitting function.

(errno $102$ )

On entry, one or more of $l i s x$ , $l p l a b$ or $l v i n f o$ are nonzero, but too small.

Minimum values are zero, or $⟨ v a l u e ⟩$ , $⟨ v a l u e ⟩$ and $⟨ v a l u e ⟩$ respectively.

The minimum values are returned in the first three elements of $v i n f o$ .

Notes

lm_submodel is a utility function for use with lm_formula(), lm_describe_data() and lm_design_matrix(). It can be used to construct labels for the columns for an $n \times m_{x}$ design matrix, $X$ , created by lm_design_matrix() and return additional input vectors and flags required by a number of NAG Library model fitting functions.

Many of the analysis functions that require a design matrix to be supplied allow submodels to be defined through the use of a vector of ones or zeros indicating whether a column of $X$ should be included in or excluded from the analysis (see for example $isx$ in correg.linregm_fit or correg.glm_normal). This allows nested models to be fit without having to reconstruct the design matrix for each analysis.

Let $M$ denote a model constructed by lm_formula(), $D$ a data matrix as described by lm_describe_data() and $X$ be the corresponding design matrix constructed by lm_design_matrix() from $M$ and $D$ . A different model, $M_{S}$ , is a submodel of $M$ if each term in $M_{S}$ , including the mean effect (intercept term), is also present in $M$ .

If $M_{S}$ is a submodel of $M$ , you can fit $M_{S}$ to $D$ using a design matrix whose columns are a subset of the columns of $X$ .
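The submodel condition amounts to a subset test on the terms of the two models. The sketch below assumes a hypothetical representation in which each term is a frozenset of variable names and the mean effect is the empty frozenset; the library itself works from formula strings and handles, not from this structure.

```python
MEAN = frozenset()  # hypothetical marker for the mean effect (intercept)


def is_submodel(sub_terms, full_terms):
    """Return True if every term of the submodel is also in the full model."""
    return set(sub_terms) <= set(full_terms)


# Hypothetical full model: mean effect, A, B and the A.B interaction.
full_model = {MEAN, frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})}
nested = is_submodel({MEAN, frozenset({"A"})}, full_model)
not_nested = is_submodel({MEAN, frozenset({"C"})}, full_model)
```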
