naginterfaces.library.correg.lmm_init

naginterfaces.library.correg.lmm_init(hlmm, hddesc, hfixed, y, dat, hrndm=None, wt=None)

lmm_init preprocesses a dataset prior to fitting a linear mixed effects regression model via lmm_fit().

Note: this function uses optional algorithmic parameters; see also: blgm.optset, blgm.optget, lmm_fit().

For full information, please refer to the NAG Library document for g02jf.

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02jff.html

Parameters

hlmm : Handle, modified in place

On entry: must be set to a null Handle or to an existing G22 handle. If an existing G22 handle is supplied, lmm_init destroys it as if blgm.handle_free had been called.

On exit: holds a G22 handle to the internal data structure containing a description of the model. You must not change the G22 handle other than through the functions in submodule correg or submodule blgm.

hddesc : Handle

A G22 handle to the internal data structure containing a description of the data matrix, $D$, as returned in $hddesc$ by blgm.lm_describe_data.

hfixed : Handle

A G22 handle to the internal data structure containing a description of the fixed part of the model $M_{f}$ as returned in $hform$ by blgm.lm_formula.

If $hfixed$ is a null Handle, the model is assumed to have no fixed part.

y : float, array-like, shape $(n)$

$y$, the vector of observations on the dependent variable.

dat : float, array-like, shape $(:, :)$

The data matrix, $D$. By default, $D_{i j}$, the $i$th value for the $j$th variable, for $j = 1, 2, \dots, m_{d}$, for $i = 1, 2, \dots, n$, should be supplied in $dat[i - 1, j - 1]$.

If the option ‘Storage Order’, described in blgm.lm_describe_data, is set to ‘VAROBS’, $D_{i j}$ should be supplied in $dat[j - 1, i - 1]$.
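The two storage layouts are transposes of each other. The sketch below uses NumPy and a small made-up data matrix (not taken from the NAG documentation) to check that $D_{i j}$ sits at $dat[i - 1, j - 1]$ in the default layout and at $dat[j - 1, i - 1]$ when ‘Storage Order’ is ‘VAROBS’. It only illustrates the indexing and does not call lmm_init.

```python
import numpy as np

# Made-up data matrix D with n = 4 observations on m_d = 3 variables,
# stored in the default layout: row i-1 holds observation i.
dat_default = np.array([
    [1.0, 2.0, 0.5],
    [1.0, 3.0, 0.7],
    [2.0, 2.0, 0.1],
    [2.0, 3.0, 0.9],
])

# For 'Storage Order' = 'VAROBS', row j-1 holds variable j instead,
# which is the transpose of the default layout.
dat_varobs = dat_default.T.copy()

n, m_d = dat_default.shape
for i in range(1, n + 1):
    for j in range(1, m_d + 1):
        # D_ij is at [i-1, j-1] in one layout and at [j-1, i-1] in the other.
        assert dat_default[i - 1, j - 1] == dat_varobs[j - 1, i - 1]
```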

If any of $y_{i}$, $w_{i}$ or $D_{i j}$, for a variable $j$ used in the model, is NaN (Not a Number), that value is treated as missing and the whole observation is excluded from the analysis.
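The missing-value rule can be mirrored with a boolean mask. This is a minimal sketch, assuming `dat` is in the default layout and that the caller knows which columns the model uses; the helper `missing_observations` and its `used_cols` argument are hypothetical and not part of naginterfaces.

```python
import numpy as np


def missing_observations(y, dat, used_cols, wt=None):
    """Return a boolean mask that is True for each observation treated as
    missing: y_i, w_i, or D_ij for a variable j used in the model is NaN.

    Hypothetical helper: dat is in the default layout (observation i in
    row i-1) and used_cols holds 0-based column indices of model variables.
    """
    y = np.asarray(y, dtype=float)
    dat = np.asarray(dat, dtype=float)
    # A NaN in any used column of a row flags the whole observation.
    mask = np.isnan(y) | np.isnan(dat[:, used_cols]).any(axis=1)
    if wt is not None:
        mask |= np.isnan(np.asarray(wt, dtype=float))
    return mask


nan = float("nan")
y = [1.0, nan, 3.0, 4.0]
dat = [[1.0, 2.0, nan],   # NaN only in column 2, which is not used below
       [1.0, 3.0, 0.5],
       [2.0, nan, 0.2],
       [2.0, 3.0, 0.1]]
wt = [1.0, 1.0, 1.0, nan]
print(missing_observations(y, dat, used_cols=[0, 1], wt=wt).tolist())
# prints [False, True, True, True]
```

Note that a NaN in a column the model does not use (row 1 above) does not exclude the observation.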

hrndm : None or Handle, list, shape $(nrndm)$, optional

A series of G22 handles to internal data structures containing a description of the random part of the model $M_{r}$ as returned in $hform$ by blgm.lm_formula.

wt : None or float, array-like, shape $(n)$, optional

Optionally, the diagonal elements of the weight matrix $W_{c}$.

If $wt[i - 1] = 0.0$, the $i$th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.

If weights are not provided then $wt$ must be set to None, and the effective number of observations is $n$.
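The weighting rules translate directly into a count. The following is a sketch under stated assumptions: `effective_num_obs` is a hypothetical helper, it ignores observations dropped for missing values, and it assumes `wt` contains no NaN entries.

```python
import numpy as np


def effective_num_obs(n, wt=None):
    """Effective number of observations implied by the wt rules,
    before any observations are dropped for missing values.

    Hypothetical helper; assumes wt contains no NaN values.
    """
    if wt is None:
        # No weights supplied: every one of the n observations counts.
        return n
    wt = np.asarray(wt, dtype=float)
    if wt.shape != (n,):
        raise ValueError("wt must have length n")
    if np.any(wt < 0.0):
        # lmm_init rejects negative weights (errno 91).
        raise ValueError("weights must be non-negative")
    # Observations with zero weight are not included in the model.
    return int(np.count_nonzero(wt))


print(effective_num_obs(4))                         # 4
print(effective_num_obs(4, [1.0, 0.0, 2.5, 0.0]))   # 2
```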

Returns

fnlsv : int

The number of levels for the overall subject variable in $M_{f}$. If there is no overall subject variable, $fnlsv = 1$.

nff : int

The number of fixed effects estimated in each of the $fnlsv$ subject blocks. The number of columns, $p$, in the design matrix $X$ is given by $p = nff \times fnlsv$.

rnlsv : int

The number of levels for the overall subject variable in $M_{r}$. If there is no overall subject variable, $rnlsv = 1$.

nrf : int

The number of random effects estimated in each of the $rnlsv$ subject blocks. The number of columns, $q$, in the design matrix $Z$ is given by $q = nrf \times rnlsv$.

nvpr : int

$g$, the number of variance components being estimated (excluding the overall variance, $σ_{R}^{2}$). This is defined by the number of terms in the random part of the model, $M_{r}$ (see Algorithmic Details).

comm : dict, communication object

Communication structure.
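The sizes of the design matrices follow from the returned values by simple multiplication. This sketch uses made-up values rather than output from an actual lmm_init call, and `design_matrix_columns` is a hypothetical helper.

```python
def design_matrix_columns(nff, fnlsv, nrf, rnlsv):
    """Columns in X and Z implied by lmm_init's return values:
    p = nff * fnlsv and q = nrf * rnlsv."""
    p = nff * fnlsv
    q = nrf * rnlsv
    return p, q


# Hypothetical values: 3 fixed effects and no overall subject variable
# in M_f (fnlsv = 1); 2 random effects per block for 10 subjects in M_r.
print(design_matrix_columns(nff=3, fnlsv=1, nrf=2, rnlsv=10))  # (3, 20)
```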

Other Parameters

‘Gamma Lower Bound’ : float

Default $= \sqrt{machine precision} / 100$

A lower bound for the elements of $γ^{*}$, where $γ^{*} = γ / σ_{R}^{2}$.

‘Gamma Upper Bound’ : float

Default $= 10^{20}$

An upper bound for the elements of $γ^{*}$, where $γ^{*} = γ / σ_{R}^{2}$.
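The bounds apply to the scaled components $γ^{*}$, not to $γ$ itself. The sketch below evaluates the documented default bounds and the scaling. It assumes machine precision is $2^{-53}$ for IEEE double precision (an assumption; Python's `sys.float_info.epsilon` is $2^{-52}$), uses made-up variance values, and does not show how lmm_fit() actually enforces the bounds.

```python
import math

# Assumption: "machine precision" for IEEE double precision is taken
# to be 2**-53 (about 1.11e-16). Python's sys.float_info.epsilon is 2**-52.
MACHINE_PRECISION = 2.0 ** -53

GAMMA_LOWER_DEFAULT = math.sqrt(MACHINE_PRECISION) / 100.0  # about 1.05e-10
GAMMA_UPPER_DEFAULT = 1.0e20


def scaled_components(gamma, sigma_r2):
    """gamma* = gamma / sigma_R^2, the quantities the bounds apply to."""
    return [g / sigma_r2 for g in gamma]


def within_default_bounds(gamma_star):
    """True if every element of gamma* lies within the default bounds."""
    return all(GAMMA_LOWER_DEFAULT <= g <= GAMMA_UPPER_DEFAULT for g in gamma_star)


gamma_star = scaled_components([2.0, 0.5], sigma_r2=4.0)
print(gamma_star, within_default_bounds(gamma_star))  # [0.5, 0.125] True
```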

‘Initial Distance’ : float

Default $= 100000.0$

The initial distance from the solution.

When $‘Solver' ='E04LB'$, lmm_fit() passes ‘Initial Distance’ to the solver as $stepmx$.

When $‘Solver' ='E04UC'$, this option is ignored.

‘Initial Value Strategy’ : int

Default $= special$

Controls how lmm_fit() will choose the initial values for the variance components, $γ$, if not supplied.

$‘Initial Value Strategy' = 0$: The MIVQUE0 estimates of the variance components based on the likelihood specified by ‘Likelihood’ are used.
$‘Initial Value Strategy' = 1$: The MIVQUE0 estimates based on the maximum likelihood are used, irrespective of the value of ‘Likelihood’.

See Rao (1972) for a description of the minimum variance quadratic unbiased estimators (MIVQUE0).

By default, for small problems, $‘Initial Value Strategy' = 0$ and for large problems $‘Initial Value Strategy' = 1$.

‘Likelihood’ : str

Default $='REML'$

‘Likelihood’ defines whether lmm_fit() will use the restricted maximum likelihood (REML) or the maximum likelihood (ML) when fitting the model.

‘Linear Minimization Accuracy’ : float

Default $= 0.9$

The accuracy of the linear minimizations.

When $‘Solver' ='E04LB'$, lmm_fit() passes ‘Linear Minimization Accuracy’ to the solver as $eta$.

When $‘Solver' ='E04UC'$, this option is ignored.

‘Line Search Tolerance’ : float

Default $= 0.9$

The line search tolerance.

When $‘Solver' ='E04LB'$, this option is ignored.

When $‘Solver' ='E04UC'$, lmm_fit() passes ‘Line Search Tolerance’ to the solver as ‘Line Search Tolerance’.

‘List’ : valueless

Option ‘List’ enables printing of each option specification as it is supplied. ‘NoList’ suppresses this printing.

‘NoList’ : valueless

Default

Option ‘List’ enables printing of each option specification as it is supplied. ‘NoList’ suppresses this printing.

‘Major Iteration Limit’ : int

Default $= special$

The maximum number of major iterations.

When $‘Solver' ='E04LB'$, lmm_fit() passes ‘Major Iteration Limit’ to the solver as $maxcal$. In this case, the default value used is $1000$.

When $‘Solver' ='E04UC'$, lmm_fit() passes ‘Major Iteration Limit’ to the solver as ‘Major Iteration Limit’. In this case, the default value used is $\max(50, 3 \times g)$, where $g$ is the number of variance components being estimated (excluding the overall variance, $σ_{R}^{2}$).
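The solver-dependent default can be written as a small lookup. This is a sketch only: `default_major_iteration_limit` is a hypothetical helper that restates the documented defaults, with `g` the value returned as nvpr; it does not query lmm_fit().

```python
def default_major_iteration_limit(solver, g):
    """Default 'Major Iteration Limit' documented for each solver, where g is
    the number of variance components (nvpr), excluding sigma_R^2."""
    if solver == "E04LB":
        return 1000
    if solver == "E04UC":
        return max(50, 3 * g)
    raise ValueError("solver must be 'E04LB' or 'E04UC'")


print(default_major_iteration_limit("E04LB", 4))   # 1000
print(default_major_iteration_limit("E04UC", 4))   # 50
print(default_major_iteration_limit("E04UC", 30))  # 90
```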

‘Major Print Level’ : int

Default $= special$

The frequency with which monitoring information is output to ‘Unit Number’.

When $‘Solver' ='E04LB'$, lmm_fit() passes ‘Major Print Level’ to the solver as $iprint$. In this case, the default value used is $-1$ and hence no monitoring information will be output.

When $‘Solver' ='E04UC'$, lmm_fit() passes ‘Major Print Level’ to the solver as ‘Major Print Level’. In this case, the default value used is $0$ and hence no monitoring information will be output.

‘Maximum Number of Threads’ : int

Default $= special$

Controls the maximum number of threads used by lmm_fit() in a multithreaded library. By default, the maximum number of available threads is used.

In a library that is not multithreaded, this option has no effect.

‘Minor Iteration Limit’ : int

Default $= \max(50, 3 \times g)$

The maximum number of minor iterations.

When $‘Solver' ='E04LB'$, this option is ignored.

When $‘Solver' ='E04UC'$, lmm_fit() passes ‘Minor Iteration Limit’ to the solver as ‘Minor Iteration Limit’. In this case, the default value used is $\max(50, 3 \times g)$, where $g$ is the number of variance components being estimated (excluding the overall variance, $σ_{R}^{2}$).

‘Minor Print Level’ : int

Default $= 0$

The frequency with which additional monitoring information is output to ‘Unit Number’.

When $‘Solver' ='E04LB'$, this option is ignored.

When $‘Solver' ='E04UC'$, lmm_fit() passes ‘Minor Print Level’ to the solver as ‘Minor Print Level’. The default value of $0$ means that no additional monitoring information will be output.

‘Optimality Tolerance’ : float

Default $= {machine precision}^{0.72}$

The optimality tolerance.

When $‘Solver' ='E04LB'$, this option is ignored.

When $‘Solver' ='E04UC'$, lmm_fit() passes ‘Optimality Tolerance’ to the solver as ‘Optimality Tolerance’.
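To get a feel for the size of the default, the sketch below evaluates ${machine precision}^{0.72}$. It assumes machine precision is $2^{-53}$ for IEEE double precision; that value is an assumption, not something stated on this page.

```python
# Assumption: machine precision is 2**-53 for IEEE double precision.
MACHINE_PRECISION = 2.0 ** -53

# Documented default for 'Optimality Tolerance' (used only with Solver = 'E04UC').
optimality_tolerance_default = MACHINE_PRECISION ** 0.72

# Roughly 3.3e-12, i.e. looser than machine precision but much tighter
# than its square root (about 1.05e-8).
print(f"{optimality_tolerance_default:.2e}")
```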

‘Parallelisation Strategy’ : int

Default $= special$

If $‘Maximum Number of Threads' > 0$ then ‘Parallelisation Strategy’ controls how lmm_fit() is parallelised in a multithreaded library.

$‘Parallelisation Strategy' = 1$: lmm_fit() will attempt to parallelise operations involving $Z$, even if $rnlsv = 1$.
$‘Parallelisation Strategy' = 2$: lmm_fit() will only attempt to parallelise operations involving $Z$ if $rnlsv > 1$.

By default, $‘Parallelisation Strategy' = 1$; however, for some models or datasets this may be slower than using $‘Parallelisation Strategy' = 2$ when $rnlsv = 1$.

In a library that is not multithreaded, this option has no effect.
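The two strategies differ only when $rnlsv = 1$. A minimal sketch of the documented rule, assuming a multithreaded library with ‘Maximum Number of Threads’ greater than zero; `attempts_parallel_z` is a hypothetical helper and says nothing about how lmm_fit() splits the work internally.

```python
def attempts_parallel_z(strategy, rnlsv):
    """Whether lmm_fit() attempts to parallelise operations involving Z under
    the documented 'Parallelisation Strategy' rules. Assumes a multithreaded
    library with 'Maximum Number of Threads' > 0."""
    if strategy == 1:
        return True          # attempt even when rnlsv == 1
    if strategy == 2:
        return rnlsv > 1     # only when there are several subject blocks
    raise ValueError("strategy must be 1 or 2")


for strategy in (1, 2):
    print(strategy, attempts_parallel_z(strategy, rnlsv=1), attempts_parallel_z(strategy, rnlsv=5))
```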

‘Solution Accuracy’ : float

Default $= 0.0$

The accuracy to which the solution is required.

When $‘Solver' ='E04LB'$, lmm_fit() passes ‘Solution Accuracy’ to the solver as $xtol$.

When $‘Solver' ='E04UC'$, this option is ignored.

‘Solver’ : str

Default $= special$

Controls which solver lmm_fit() will use when fitting the model. By default, $‘Solver' ='E04LB'$ is used for small problems and $‘Solver' ='E04UC'$ otherwise.

If $‘Solver' ='E04LB'$, then the solver used is the one implemented in opt.bounds_mod_deriv2_comp and if $‘Solver' ='E04UC'$, then the solver used is the one implemented in opt.nlp1_solve.

‘Sweep Tolerance’ : float

Default $= special$

The sweep tolerance used by lmm_fit() when performing the sweep operation (see Wolfinger et al. (1994)). The default value used is $‘Sweep Tolerance' = \max(ϵ, ϵ \times \max_{i} {(Z^{T} Z)}_{i i})$, where $ϵ = \sqrt{machine precision}$.
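The default sweep tolerance scales with the largest diagonal element of $Z^{T} Z$. The sketch below assumes the default reads $\max(ϵ, ϵ \times \max_{i} (Z^{T} Z)_{i i})$ with $ϵ = \sqrt{machine precision}$, takes machine precision as $2^{-53}$, and applies it to a small made-up $Z$; whether lmm_fit() uses the weighted or unweighted $Z$ here is not stated, so treat this as illustrative only. `default_sweep_tolerance` is a hypothetical helper.

```python
import math

import numpy as np

# Assumption: machine precision is 2**-53 for IEEE double precision.
EPS = math.sqrt(2.0 ** -53)


def default_sweep_tolerance(Z):
    """max(eps, eps * max_i (Z^T Z)_ii), with eps = sqrt(machine precision)."""
    Z = np.asarray(Z, dtype=float)
    # (Z^T Z)_ii is the sum of squares of column i of Z, so Z^T Z is never formed.
    ztz_diag = np.einsum("ki,ki->i", Z, Z)
    return max(EPS, EPS * float(ztz_diag.max()))


Z = [[1.0, 0.0],
     [1.0, 2.0],
     [0.0, 1.0]]
# Column sums of squares are 2 and 5, so the tolerance is 5 * EPS.
print(default_sweep_tolerance(Z) / EPS)
```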

‘Unit Number’ : int

Default $= advisory message unit number$

The monitoring unit number to which lmm_fit() will send any monitoring information.

Raises

NagValueError

(errno $11$)

On entry, $hlmm$ is not a null Handle or a recognised G22 handle.

(errno $21$)

$hddesc$ has not been initialized or is corrupt.

(errno $22$)

$hddesc$ is not a G22 handle as generated by blgm.lm_describe_data.

(errno $31$)

$hfixed$ has not been initialized or is corrupt.

(errno $32$)

$hfixed$ is not a G22 handle as generated by blgm.lm_formula.

(errno $33$)

A variable name used when creating $hfixed$ is not present in $hddesc$.

Variable name: $⟨value⟩$.

(errno $41$)

On entry, $nrndm = ⟨value⟩$.

Constraint: $nrndm \geq 0$.

(errno $51$)

$i = ⟨value⟩$.

$hrndm[i - 1]$ has not been initialized or is corrupt.

(errno $52$)

$i = ⟨value⟩$.

$hrndm[i - 1]$ is not a G22 handle as generated by blgm.lm_formula.

(errno $53$)

No model has been specified.

(errno $54$)

A variable name used when creating $hrndm$ is not present in $hddesc$.

Variable name: $⟨value⟩$.

(errno $61$)

On entry, $lwt = ⟨value⟩$ and $n = ⟨value⟩$.

Constraint: $lwt = 0$ or $n$.

(errno $71$)

On entry, $n = ⟨value⟩$.

Constraint: $n \geq 1$.

(errno $72$)

On entry, $n = ⟨value⟩$ and $n_{d} = ⟨value⟩$.

Constraint: $n \leq n_{d}$.

(errno $73$)

On entry, no observations due to zero weights or missing values.

(errno $91$)

On entry, $i = ⟨value⟩$ and $wt[i - 1] = ⟨value⟩$.

Constraint: $wt[i - 1] \geq 0.0$.

(errno $101$)

On entry, column $j$ of the data matrix, $D$, is not consistent with information supplied in $hddesc$, $j = ⟨value⟩$.

(errno $111$)

On entry, $n = ⟨value⟩$ and $lddat = ⟨value⟩$.

Constraint: $lddat \geq n$.

(errno $112$)

On entry, $m_{d} = ⟨value⟩$ and $lddat = ⟨value⟩$.

Constraint: $lddat \geq m_{d}$.

(errno $121$)

On entry, $m_{d} = ⟨value⟩$ and $sddat = ⟨value⟩$.

Constraint: $sddat \geq m_{d}$.

(errno $122$)

On entry, $n = ⟨value⟩$ and $sddat = ⟨value⟩$.

Constraint: $sddat \geq n$.

Warns

NagAlgorithmicWarning

(errno $34$): The fixed part of the model contains categorical variables, but no intercept or main effects terms have been requested.
(errno $102$): Column $j$ of the data matrix, $D$, required rounding more than expected when being treated as a categorical variable, $j = ⟨value⟩$.

Notes

lmm_init must be called prior to fitting a linear mixed effects regression model via lmm_fit().

The model is of the form:

y = X β + Z ν + ϵ

where	$y$ is a vector of $n$ observations on the dependent variable,
	$X$ is an $n \times p$ design matrix of fixed independent variables,
	$β$ is a vector of $p$ unknown fixed effects,
	$Z$ is an $n \times q$ design matrix of random independent variables,
	$ν$ is a vector of length $q$ of unknown random effects,
	$ϵ$ is a vector of length $n$ of unknown random errors.
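To make the roles of $X$, $Z$, $β$, $ν$ and $ϵ$ concrete, the sketch below simulates one draw of $y = Xβ + Zν + ϵ$ with NumPy. All sizes, covariates, group labels and variance values are made up for illustration; this generates data of the assumed form and does not fit anything.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)

n, q = 6, 3
# X: intercept plus one made-up covariate (p = 2 fixed effects).
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([1.0, 0.5])

# Z: indicator matrix assigning the 6 observations to 3 hypothetical groups,
# giving one random intercept per group (q = 3).
group = np.array([0, 0, 1, 1, 2, 2])
Z = np.eye(q)[group]

sigma2 = 2.0     # variance of the random effects (a single component, g = 1)
sigma2_r = 0.5   # residual variance sigma_R^2

nu = rng.normal(0.0, np.sqrt(sigma2), size=q)      # random effects
eps = rng.normal(0.0, np.sqrt(sigma2_r), size=n)   # random errors
y = X @ beta + Z @ nu + eps
```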

Both $ν$ and $ϵ$ are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by

\operatorname{Var}\begin{bmatrix} ν \\ ϵ \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}

where $R = σ_{R}^{2} I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g \leq q$ groups with each group being identically distributed with expectation zero and variance $σ_{i}^{2}$. The diagonal elements of matrix $G$, therefore, take one of the values $\{σ_{i}^{2} : i = 1, 2, \dots, g\}$, depending on which group the associated random variable belongs to.
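The structure of $G$ can be built from a group label per random effect. In the sketch below, `component_of` and both helpers are hypothetical names used for illustration. The second function uses $\operatorname{Var}(y) = Z G Z^{T} + R$, which follows from the model because $ν$ and $ϵ$ are uncorrelated with the stated covariances.

```python
import numpy as np


def build_G(component_of, sigma2):
    """Diagonal G: random effect k gets variance sigma2[component_of[k]],
    where component_of holds 0-based group labels (one per random effect)."""
    sigma2 = np.asarray(sigma2, dtype=float)
    return np.diag(sigma2[np.asarray(component_of)])


def marginal_covariance(Z, G, sigma2_r):
    """Var(y) = Z G Z^T + sigma_R^2 I, implied by the model above."""
    Z = np.asarray(Z, dtype=float)
    return Z @ G @ Z.T + sigma2_r * np.eye(Z.shape[0])


# Three random effects: the first two share variance component 1,
# the third belongs to component 2 (g = 2, q = 3).
G = build_G([0, 0, 1], [2.0, 0.3])
print(np.diag(G))  # [2.  2.  0.3]
```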

The model, therefore, contains three sets of unknowns: the fixed effects $β$, the random effects $ν$ and a vector of $g + 1$ variance components $γ$, where $γ = \{σ_{1}^{2}, σ_{2}^{2}, \dots, σ_{g - 1}^{2}, σ_{g}^{2}, σ_{R}^{2}\}$.

Case weights can be incorporated into the model by replacing $X$ and $Z$ with $W_{c}^{1 / 2} X$ and $W_{c}^{1 / 2} Z$, respectively, where $W_{c}$ is a diagonal weight matrix.
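Because $W_{c}$ is diagonal, multiplying by $W_{c}^{1 / 2}$ just scales row $i$ by $\sqrt{w_{i}}$. The sketch below applies the replacement described above to made-up matrices without forming the $n \times n$ weight matrix; `apply_case_weights` is a hypothetical helper, not a naginterfaces function.

```python
import numpy as np


def apply_case_weights(X, Z, wt):
    """Return W_c^{1/2} X and W_c^{1/2} Z for diagonal W_c = diag(wt).

    Scaling row i by sqrt(wt[i]) is the same as pre-multiplying by
    W_c^{1/2}, without forming the n x n matrix.
    """
    sw = np.sqrt(np.asarray(wt, dtype=float))[:, None]
    return sw * np.asarray(X, dtype=float), sw * np.asarray(Z, dtype=float)


X = np.array([[1.0, 0.2], [1.0, 0.4], [1.0, 0.9]])
Z = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
wt = [4.0, 1.0, 0.25]
Xw, Zw = apply_case_weights(X, Z, wt)
print(Xw[:, 0])  # [2.  1.  0.5]
```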

The design matrices, $X$ and $Z$, are constructed from an $n \times m_{d}$ data matrix, $D$, a description of the fixed independent variables, $M_{f}$, and a description of the random independent variables, $M_{r}$. See Algorithmic Details for further details.

References

Rao, C R, 1972, Estimation of variance and covariance components in a linear model, J. Am. Stat. Assoc. (67), 112–115

Wolfinger, R, Tobias, R and Sall, J, 1994, Computing Gaussian likelihoods and their derivatives for general linear mixed models, SIAM Sci. Statist. Comput. (15), 1294–1310
