Integer, Intent (In)	::	nrndm, n, lddat, sddat, lrcomm, licomm
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	fnlsv, nff, rnlsv, nrf, nvpr, icomm(licomm)
Real (Kind=nag_wp), Intent (In)	::	y(n), wt(*), dat(lddat,sddat)
Real (Kind=nag_wp), Intent (Out)	::	rcomm(lrcomm)
Character (1), Intent (In)	::	weight
Type (c_ptr), Intent (In)	::	hddesc, hfixed, hrndm(nrndm)
Type (c_ptr), Intent (Inout)	::	hlmm

C Header Interface

#include <nag.h>

void g02jff_ (void **hlmm, void **hddesc, void **hfixed, const Integer *nrndm, void ***hrndm[], const char *weight, const Integer *n, const double y[], const double wt[], const double dat[], const Integer *lddat, const Integer *sddat, Integer *fnlsv, Integer *nff, Integer *rnlsv, Integer *nrf, Integer *nvpr, double rcomm[], const Integer *lrcomm, Integer icomm[], const Integer *licomm, Integer *ifail, const Charlen length_weight)

The routine may be called by the names g02jff or nagf_correg_lmm_init.

3 Description

g02jff must be called prior to fitting a linear mixed effects regression model via g02jhf.

The model is of the form:

y = X β + Z ν + ε

where	$y$ is a vector of $n$ observations on the dependent variable,
	$X$ is an $n \times p$ design matrix of fixed independent variables,
	$β$ is a vector of $p$ unknown fixed effects,
	$Z$ is an $n \times q$ design matrix of random independent variables,
	$ν$ is a vector of length $q$ of unknown random effects,
	$ε$ is a vector of length $n$ of unknown random errors.

Both $ν$ and $ε$ are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by

\mathrm{Var} \begin{bmatrix} ν \\ ε \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}

where $R = σ_{R}^{2} I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g \leq q$ groups with each group being identically distributed with expectation zero and variance $σ_{i}^{2}$. The diagonal elements of matrix $G$, therefore, take one of the values $\{σ_{i}^{2} : i = 1, 2, \dots, g\}$, depending on which group the associated random variable belongs to.

The model, therefore, contains three sets of unknowns: the fixed effects $β$, the random effects $ν$ and a vector of $g + 1$ variance components $γ$, where

γ = \{σ_{1}^{2}, σ_{2}^{2}, \dots, σ_{g - 1}^{2}, σ_{g}^{2}, σ_{R}^{2}\}
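As a small illustration of how $G$ and $γ$ relate, the following sketch builds the diagonal of $G$ from a group label per random effect. It is not part of the NAG interface: the function name `g_diagonal` and all numeric values are invented for illustration.

```python
def g_diagonal(group_of_effect, sigma2):
    """Return the diagonal of G.

    group_of_effect[k] is the 1-based group index of the k-th random
    effect; sigma2[i - 1] is the variance of group i. Both inputs are
    hypothetical and only illustrate the structure described above.
    """
    g = len(sigma2)
    for grp in group_of_effect:
        if not 1 <= grp <= g:
            raise ValueError(f"group index {grp} outside 1..{g}")
    return [sigma2[grp - 1] for grp in group_of_effect]


# q = 5 random effects split into g = 2 groups (made-up values)
groups = [1, 1, 1, 2, 2]
sigma2 = [0.5, 2.0]           # sigma_1^2, sigma_2^2
sigma2_r = 1.5                # residual variance sigma_R^2

diag_g = g_diagonal(groups, sigma2)
gamma = sigma2 + [sigma2_r]   # the g + 1 variance components
```

Here `diag_g` repeats each group variance once per random effect in that group, and `gamma` has length $g + 1$ with $σ_{R}^{2}$ stored last.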

Case weights can be incorporated into the model by replacing $X$ and $Z$ with $W_{c}^{1 / 2} X$ and $W_{c}^{1 / 2} Z$ respectively, where $W_{c}$ is a diagonal weight matrix.
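Because $W_{c}$ is diagonal, forming $W_{c}^{1 / 2} X$ amounts to scaling each row of $X$ by the square root of its case weight. The sketch below shows this with plain Python lists; the helper name `apply_case_weights` and the data are hypothetical, and this is not how the routine is implemented internally.

```python
import math


def apply_case_weights(matrix, weights):
    """Scale row i of `matrix` by sqrt(weights[i]), i.e. form W_c^{1/2} M."""
    if len(matrix) != len(weights):
        raise ValueError("one weight is needed per row")
    if any(w < 0.0 for w in weights):
        raise ValueError("weights must be non-negative")
    return [[math.sqrt(w) * v for v in row] for row, w in zip(matrix, weights)]


x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]   # made-up design matrix
w = [4.0, 1.0, 0.0]                        # made-up case weights
xw = apply_case_weights(x, w)
```

A zero weight turns the corresponding row into zeros, which matches the idea that such an observation contributes nothing to the fit.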

The design matrices, $X$ and $Z$, are constructed from an $n \times m_{d}$ data matrix, $D$, a description of the fixed independent variables, $M_{f}$, and a description of the random independent variables, $M_{r}$. See Section 11 for further details.

4 References

Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115

Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310

5 Arguments

1: $hlmm$ – Type (c_ptr) Input/Output

On entry: must be set to c_null_ptr or, alternatively, an existing G22 handle may be supplied in which case g02jff will destroy the supplied G22 handle as if g22zaf had been called.

On exit: holds a G22 handle to the internal data structure containing a description of the model. You must not change the G22 handle other than through the routines in Chapters G02 or G22.

2: $hddesc$ – Type (c_ptr) Input

On entry: a G22 handle to the internal data structure containing a description of the data matrix, $D$, as returned in hddesc by g22ybf.

3: $hfixed$ – Type (c_ptr) Input

On entry: a G22 handle to the internal data structure containing a description of the fixed part of the model, $M_{f}$, as returned in hform by g22yaf.

If hfixed is c_null_ptr then the model is assumed to not have a fixed part.

4: $nrndm$ – Integer Input

On entry: the number of elements used to describe the random part of the model.

Constraint: $nrndm \geq 0$.

5: $hrndm (nrndm)$ – Type (c_ptr) Input

On entry: a series of G22 handles to internal data structures containing a description of the random part of the model, $M_{r}$, as returned in hform by g22yaf.

6: $weight$ – Character(1) Input

On entry: indicates if weights are to be used.

$weight ='U'$: No weights are used.
$weight ='W'$: Case weights are used and must be supplied in array wt.

Constraint: $weight ='U'$ or $'W'$.

7: $n$ – Integer Input

On entry: $n$, the number of observations in the dataset, $D$.

Constraint: $1 \leq n \leq n_{d}$, where $n_{d}$ is the value supplied in nobs when hddesc was created.

8: $y (n)$ – Real (Kind=nag_wp) array Input

On entry: $y$, the vector of observations on the dependent variable.

Constraint: $y (i) \neq 0.0$ for at least one $i = 1, 2, \dots, n$.

9: $wt (*)$ – Real (Kind=nag_wp) array Input

Note: the dimension of the array wt must be at least $n$ if $weight ='W'$.

On entry: if $weight ='W'$, wt must contain the diagonal elements of the weight matrix $W_{c}$. If $wt (i) = 0.0$, the $i$th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.

If $weight ='U'$, wt is not referenced and the effective number of observations is $n$.

Constraint: if $weight ='W'$, $wt (i) \geq 0.0$, for $i = 1, 2, \dots, n$.

10: $dat (lddat, sddat)$ – Real (Kind=nag_wp) array Input

On entry: the data matrix, $D$. By default, $D_{i j}$, the $i$th value for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m_{d}$, should be supplied in $dat (i, j)$.

If the optional parameter Storage Order, described in g22ybf, is set to $VAROBS$, $D_{i j}$ should be supplied in $dat (j, i)$.

If any of $y_{i}$, $w_{i}$ or $D_{i j}$, for a variable $j$ used in the model, is NaN (Not A Number) then that value is treated as missing and the whole observation is excluded from the analysis.
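The exclusion rules for zero weights and NaN values can be pictured with a small filter. The following is a sketch of those rules only, not of the routine's internals; the function `effective_rows` and the data are hypothetical, and 0-based indices are used as is usual in Python.

```python
import math


def effective_rows(y, dat, wt=None, used_vars=None):
    """Return the 0-based indices of observations that would be kept.

    dat is a list of rows (observation by variable); used_vars lists the
    0-based column indices of variables used in the model (default: all).
    """
    kept = []
    for i, (yi, row) in enumerate(zip(y, dat)):
        cols = range(len(row)) if used_vars is None else used_vars
        wi = 1.0 if wt is None else wt[i]
        values = [yi, wi] + [row[j] for j in cols]
        if any(math.isnan(v) for v in values):
            continue          # missing value: drop the whole observation
        if wi == 0.0:
            continue          # zero case weight: observation not included
        kept.append(i)
    return kept


nan = float("nan")
y = [1.2, 0.7, nan, 2.1]
dat = [[1.0, 10.0], [2.0, nan], [1.0, 11.0], [2.0, 12.0]]
wt = [1.0, 1.0, 1.0, 0.0]

kept_all = effective_rows(y, dat, wt)                   # both columns used
kept_col0 = effective_rows(y, dat, wt, used_vars=[0])   # second column unused
```

A NaN in a column that the model does not use leaves the observation in place, which is why `kept_col0` retains one more row than `kept_all`.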

11: $lddat$ – Integer Input

On entry: the first dimension of the array dat as declared in the (sub)program from which g02jff is called.

Constraints:

if the optional parameter Storage Order, described in g22ybf, is set to $VAROBS$ , $lddat \geq m_{d}$ ;
otherwise $lddat \geq n$ .

12: $sddat$ – Integer Input

On entry: the second dimension of the array dat as declared in the (sub)program from which g02jff is called.

Constraints:

if the optional parameter Storage Order, described in g22ybf, is set to $VAROBS$ , $sddat \geq n$ ;
otherwise $sddat \geq m_{d}$ .
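The two storage layouts and the matching constraints on lddat and sddat can be summarised as below. This is a sketch assuming Fortran column-major storage (relevant when passing a flat `dat` array through the C header interface); the helper names and the boolean flag `varobs` are hypothetical.

```python
def min_dat_dims(n, m_d, varobs=False):
    """Smallest (lddat, sddat) satisfying the constraints above."""
    return (m_d, n) if varobs else (n, m_d)


def dat_offset(i, j, lddat, varobs=False):
    """0-based offset of D_ij (i and j are 1-based) in a column-major array.

    With varobs=False, D_ij lives in dat(i, j); with varobs=True it lives
    in dat(j, i).
    """
    row, col = (j, i) if varobs else (i, j)
    return (row - 1) + (col - 1) * lddat


dims_default = min_dat_dims(90, 12)
dims_varobs = min_dat_dims(90, 12, varobs=True)
off_default = dat_offset(3, 2, lddat=90)
off_varobs = dat_offset(3, 2, lddat=12, varobs=True)
```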

13: $fnlsv$ – Integer Output

On exit: the number of levels for the overall subject variable in $M_{f}$. If there is no overall subject variable, $fnlsv = 1$.

14: $nff$ – Integer Output

On exit: the number of fixed effects estimated in each of the fnlsv subject blocks. The number of columns, $p$, in the design matrix $X$ is given by $p = nff \times fnlsv$.

15: $rnlsv$ – Integer Output

On exit: the number of levels for the overall subject variable in $M_{r}$. If there is no overall subject variable, $rnlsv = 1$.

16: $nrf$ – Integer Output

On exit: the number of random effects estimated in each of the rnlsv subject blocks. The number of columns, $q$, in the design matrix $Z$ is given by $q = nrf \times rnlsv$.

17: $nvpr$ – Integer Output

On exit: $g$, the number of variance components being estimated (excluding the overall variance, $σ_{R}^{2}$). This is defined by the number of terms in the random part of the model, $M_{r}$ (see Section 11 for details).
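The five integer outputs determine the sizes of the quantities in Section 3. A minimal sketch of that arithmetic follows; the function name and the sample values are hypothetical and are not taken from any real call.

```python
def model_dimensions(fnlsv, nff, rnlsv, nrf, nvpr):
    """Derive p, q and the length of gamma from the integer outputs."""
    return {
        "p": nff * fnlsv,       # columns of X
        "q": nrf * rnlsv,       # columns of Z
        "n_gamma": nvpr + 1,    # g variance components plus sigma_R^2
    }


# Made-up output values, for illustration only
dims = model_dimensions(fnlsv=1, nff=4, rnlsv=3, nrf=10, nvpr=8)
```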

18: $rcomm (lrcomm)$ – Real (Kind=nag_wp) array Communication Array

On exit: a communication array as required by the routines g02jgf or g02jhf.

19: $lrcomm$ – Integer Input

On entry: the dimension of the array rcomm as declared in the (sub)program from which g02jff is called.

20: $icomm (licomm)$ – Integer array Communication Array

On exit: a communication array as required by the routines g02jgf or g02jhf.

If licomm or lrcomm are too small and $licomm \geq 2$, then $ifail = 191$ and $icomm (1)$ holds the minimum required value for licomm and $icomm (2)$ holds the minimum required value for lrcomm.
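One way to use this behaviour is a two-pass call: first with deliberately small arrays to obtain the required sizes, then with arrays of those sizes. The sketch below follows the error descriptions in Section 6, where $ifail = 191$ reports the sizes in icomm and $ifail = 192$ means icomm cannot hold them. `mock_init` is a stand-in written for this illustration, not the NAG routine, and the required sizes in it are made-up numbers.

```python
def mock_init(licomm, lrcomm, need_licomm=34, need_lrcomm=120):
    """Stand-in for the sizing behaviour only; NOT the NAG routine.

    Returns (ifail, icomm). icomm[0] and icomm[1] play the role of
    icomm(1) and icomm(2) in the Fortran interface.
    """
    icomm = [0] * licomm
    if licomm >= need_licomm and lrcomm >= need_lrcomm:
        return 0, icomm
    if licomm >= 2:
        icomm[0], icomm[1] = need_licomm, need_lrcomm
        return 191, icomm
    return 192, icomm


# First pass: arrays just large enough to receive the required sizes.
ifail, icomm = mock_init(licomm=2, lrcomm=0)
if ifail == 191:
    licomm, lrcomm = icomm[0], icomm[1]
    # Second pass: arrays of the reported minimum size.
    ifail, icomm = mock_init(licomm, lrcomm)
```

With the real routine, such a query would be made with ifail set to $-1$ or $1$ on entry so that execution continues after the first pass.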

21: $licomm$ – Integer Input

On entry: the dimension of the array icomm as declared in the (sub)program from which g02jff is called.

22: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 11$: On entry, hlmm is not c_null_ptr or a recognised G22 handle.

$ifail = 21$: hddesc has not been initialized or is corrupt.

$ifail = 22$: hddesc is not a G22 handle as generated by g22ybf.

$ifail = 31$: hfixed has not been initialized or is corrupt.

$ifail = 32$: hfixed is not a G22 handle as generated by g22yaf.

$ifail = 33$: A variable name used when creating hfixed is not present in hddesc.
Variable name: $⟨ value ⟩$ .

$ifail = 34$: The fixed part of the model contains categorical variables, but no intercept or main effects terms have been requested.

$ifail = 41$: On entry, $nrndm = ⟨ value ⟩$ .
Constraint: $nrndm \geq 0$ .

$ifail = 51$: $i = ⟨ value ⟩$ .
$hrndm (i)$ has not been initialized or is corrupt.

$ifail = 52$: $i = ⟨ value ⟩$ .
$hrndm (i)$ is not a G22 handle as generated by g22yaf.

$ifail = 53$: No model has been specified.

$ifail = 54$: A variable name used when creating hrndm is not present in hddesc.
Variable name: $⟨ value ⟩$ .

$ifail = 61$: On entry, weight had an illegal value.
Constraint: $weight ='U'$ or $'W'$ .

$ifail = 71$: On entry, $n = ⟨ value ⟩$ .
Constraint: $n \geq 1$ .

$ifail = 72$: On entry, $n = ⟨ value ⟩$ and $n_{d} = ⟨ value ⟩$ .
Constraint: $n \leq n_{d}$ , where $n_{d}$ is the value supplied in nobs when hddesc was created.

$ifail = 73$: On entry, no observations due to zero weights or missing values.

$ifail = 91$: On entry, $i = ⟨ value ⟩$ and $wt (i) = ⟨ value ⟩$ .
Constraint: $wt (i) \geq 0.0$ .

$ifail = 101$: On entry, column $j$ of the data matrix, $D$ , is not consistent with information supplied in hddesc, $j = ⟨ value ⟩$ .

$ifail = 102$: Column $j$ of the data matrix, $D$ , required rounding more than expected when being treated as a categorical variable, $j = ⟨ value ⟩$ .
All output is returned using the rounded value(s).

$ifail = 111$: On entry, $n = ⟨ value ⟩$ and $lddat = ⟨ value ⟩$ .
Constraint: $lddat \geq n$ .

$ifail = 112$: On entry, $m_{d} = ⟨ value ⟩$ and $lddat = ⟨ value ⟩$ .
Constraint: $lddat \geq m_{d}$ .

$ifail = 121$: On entry, $m_{d} = ⟨ value ⟩$ and $sddat = ⟨ value ⟩$ .
Constraint: $sddat \geq m_{d}$ .

$ifail = 122$: On entry, $n = ⟨ value ⟩$ and $sddat = ⟨ value ⟩$ .
Constraint: $sddat \geq n$ .

$ifail = 191$: On entry, $licomm = ⟨ value ⟩$ and $lrcomm = ⟨ value ⟩$ .
Constraint: $licomm \geq ⟨ value ⟩$ and $lrcomm \geq ⟨ value ⟩$ . The minimum array sizes for licomm and lrcomm are held in the first two elements of icomm respectively.

$ifail = 192$: On entry, $licomm = ⟨ value ⟩$ and $lrcomm = ⟨ value ⟩$ .
Constraint: $licomm \geq ⟨ value ⟩$ and $lrcomm \geq ⟨ value ⟩$ . icomm is not large enough to hold the minimum array sizes.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

Not applicable.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.

g02jff makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

None.

10 Example

This example fits a random effects model with three random submodels and two fixed effects to a simulated dataset with $90$ observations and $12$ variables. The model is fit using maximum likelihood (ML). Standard labels for the parameter estimates and variance components are obtained from g22ydf. See g02jhf for an example of how to construct custom labels.

11 Algorithmic Details

11.1 Fixed Effects Design Matrix, $X$

The fixed effects design matrix, $X$, is constructed from the data matrix $D$ and $M_{f}$, as encoded in hfixed. Details of the construction are described in Section 3 in g22yaf and Section 3 in g22ycf.

It is possible to store the cross-product matrix, $X^{T} X$, in a block diagonal form if $M_{f}$ contains an overall subject effect, $S_{f}$. In this context $S_{f}$ is defined as a main effect or interaction term that is contained in all other terms. For example, if $M_{f}$ simplifies to $V_{1} . V_{4} + V_{1} . V_{2} . V_{4} + V_{1} . V_{2} . V_{3} . V_{4}$, then $S_{f} = V_{1} . V_{4}$. If it is advantageous to do so, g02jff will make use of this block diagonal structure and fnlsv will be set to the number of levels in $S_{f}$, otherwise $fnlsv = 1$.
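The definition of an overall subject effect can be checked mechanically by treating each term as a set of variable names. The sketch below is a simplified reading of that definition (it ignores intercepts and any other formula syntax); the function name `overall_subject` is hypothetical.

```python
def overall_subject(terms):
    """Return the term contained in every other term, or None.

    Each term is written like "V1.V2.V4" and is treated as the set of
    variable names joined by ".".
    """
    sets = [frozenset(t.split(".")) for t in terms]
    for term, s in zip(terms, sets):
        if all(s <= other for other in sets):
            return term
    return None


s_f = overall_subject(["V1.V4", "V1.V2.V4", "V1.V2.V3.V4"])
no_s = overall_subject(["V1", "V2", "V1.V2"])
```

The same check applied to the subject terms of the random submodels corresponds to the definition of $S_{r}$ in Section 11.2.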

11.2 Random Effects Design Matrix, $Z$

The random effects design matrix, $Z$, is constructed from the data matrix $D$ and $M_{r}$, which is made up of nrndm submodels, $M_{r i}$, where $M_{r i}$ is encoded in $hrndm (i)$. Each submodel is made up of two parts, the random effects and a subject term. The random effects are specified as described in Section 3 in g22yaf and the subject term is specified via the g22yaf optional parameter Subject. The design matrix $Z$ is constructed as described in Section 3 in g22ycf using a model constructed from the nrndm submodels. As an example, if there were $3$ submodels:

```
-1+V07+V08+V09 / SUBJECT = V13
-1+V05+V06     / SUBJECT = V11.V12
V03+V04        / SUBJECT = V10.V11.V12
```

then $Z$ would be constructed as if g22ycf was called using the model

(V07+V08+V09).V13 + (V05+V06).V11.V12 + (V10.V11.V12 + (V03+V04).V10.V11.V12)

It should be noted that unless specified otherwise (by the inclusion of -1) a submodel will contain an intercept. This results in a term corresponding to the subject term being included in the combined model (V10.V11.V12 in this instance).

The above model expands out further to:

V07.V13 + V08.V13 + V09.V13 + V05.V11.V12 + V06.V11.V12 + V10.V11.V12 + V03.V10.V11.V12 + V04.V10.V11.V12

Each term in the expanded model corresponds to a variance component, so in this case, $g = 8$.
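The expansion of the three example submodels, and the resulting count of variance components, can be reproduced with a few lines of string handling. This is a sketch only: the parser below understands just '+'-separated main effects and an optional `-1`, which is far less than the full g22yaf formula syntax, and the function name `expand_submodel` is hypothetical.

```python
def expand_submodel(effects, subject):
    """Expand one 'effects / SUBJECT = subject' submodel into terms."""
    intercept = True
    variables = []
    for part in (p.strip() for p in effects.split("+")):
        if part == "-1":
            intercept = False      # explicit removal of the intercept
        elif part:
            variables.append(part)
    # An intercept contributes the subject term itself.
    terms = [subject] if intercept else []
    terms += [f"{v}.{subject}" for v in variables]
    return terms


submodels = [
    ("-1+V07+V08+V09", "V13"),
    ("-1+V05+V06", "V11.V12"),
    ("V03+V04", "V10.V11.V12"),
]
terms = [t for eff, subj in submodels for t in expand_submodel(eff, subj)]
g = len(terms)
```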

When constructing $Z$, all contrast information specified when the submodels are constructed in calls to g22yaf is ignored and dummy variables are used throughout.
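To picture what dummy variable coding means for a single categorical variable, the sketch below produces one 0/1 indicator column per level. It assumes full indicator coding with levels taken in sorted order; both the ordering and the function name `dummy_columns` are assumptions made for this illustration, and the exact column layout used by g22ycf is described in its own documentation.

```python
def dummy_columns(values):
    """One 0/1 indicator column per distinct level, in sorted level order."""
    levels = sorted(set(values))
    columns = [[1.0 if v == lev else 0.0 for v in values] for lev in levels]
    return levels, columns


levels, columns = dummy_columns([2, 1, 3, 1])
# Each observation belongs to exactly one level, so rows sum to one.
row_sums = [sum(col[i] for col in columns) for i in range(4)]
```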

It is possible to store the cross-product matrix, $Z^{T} Z$, in a block diagonal form if $M_{r}$ contains an overall subject effect, $S_{r}$. In this context $S_{r}$ is defined as a main effect or interaction term that is contained in all other subject terms. For example, if the random effects model is constructed from $3$ submodels with subject terms $V_{1} . V_{4}$, $V_{1} . V_{2} . V_{4}$ and $V_{1} . V_{2} . V_{3} . V_{4}$, then $S_{r} = V_{1} . V_{4}$ and rnlsv will be set to the number of levels in $S_{r}$, otherwise $rnlsv = 1$.

12 Optional Parameters

As well as the optional parameters common to all G22 handles described in g22zmf and g22znf, a number of additional optional parameters can be specified for a G22 handle holding the description of a linear mixed model, as returned by g02jff in hlmm.

Each writeable optional parameter has an associated default value; to set any of them to a non-default value, use g22zmf. The value of any optional parameter can be queried using g22znf.

Most of the optional parameters described in this section are related to the behaviour of g02jhf when fitting the model. These descriptions should, therefore, be read in conjunction with the documentation for that routine.

The remainder of this section can be skipped if you wish to use the default values for all optional parameters.

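The following is a list of the optional parameters available. A full description of each optional parameter is provided in Section 12.1.

Gamma Lower Bound
Gamma Upper Bound
Initial Distance
Initial Value Strategy
Likelihood
Linear Minimization Accuracy
Line Search Tolerance
List
Major Iteration Limit
Major Print Level
Maximum Number of Threads
Minor Iteration Limit
Minor Print Level
Optimality Tolerance
Parallelisation Strategy
Solution Accuracy
Solver
Sweep Tolerance
Unit Number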

12.1 Description of the Optional Parameters

For each option, we give a summary line, a description of the optional parameter and details of constraints.

The summary line contains:

a parameter value, where the letters $a$ , $i$ and $r$ denote options that take character, integer and real values respectively;
the default value.

Keywords and character values are case and white space insensitive.

Gamma Lower Bound   $r$   Default $= \sqrt{machine precision} / 100$

A lower bound for the elements of $γ^{*}$, where $γ^{*} = γ / σ_{R}^{2}$.

Gamma Upper Bound   $r$   Default $= 10^{20}$

An upper bound for the elements of $γ^{*}$, where $γ^{*} = γ / σ_{R}^{2}$.
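The scaling $γ^{*} = γ / σ_{R}^{2}$ and the two default bounds can be evaluated as in the sketch below. It assumes that $σ_{R}^{2}$ is stored as the last element of $γ$ (as in Section 3) and, as a further assumption, takes "machine precision" to be the double precision epsilon reported by Python, which may differ from the value the library uses. The function name and sample values are hypothetical.

```python
import math
import sys


def scaled_components(gamma):
    """Return gamma* = gamma / sigma_R^2, with sigma_R^2 the last element."""
    sigma2_r = gamma[-1]
    if sigma2_r <= 0.0:
        raise ValueError("sigma_R^2 must be positive")
    return [v / sigma2_r for v in gamma]


# Assumption: 'machine precision' is taken as the double precision epsilon.
eps = sys.float_info.epsilon
lower = math.sqrt(eps) / 100.0   # default Gamma Lower Bound
upper = 1.0e20                   # default Gamma Upper Bound

gamma = [0.5, 2.0, 4.0]          # made-up values; sigma_R^2 = 4.0
gamma_star = scaled_components(gamma)
within = [lower <= v <= upper for v in gamma_star]
```

After scaling, the element corresponding to $σ_{R}^{2}$ is always one, so the bounds effectively constrain the ratios $σ_{i}^{2} / σ_{R}^{2}$.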

Initial Distance   $r$   Default $= 100000.0$

The initial distance from the solution.

When $Solver = E04LB$ , g02jhf passes Initial Distance to the solver as stepmx.
When $Solver = E04UC$ , this option is ignored.

Initial Value Strategy   $i$   Default $= special$

Controls how g02jhf will choose the initial values for the variance components, $γ$, if not supplied.

$Initial Value Strategy = 0$: The MIVQUE0 estimates of the variance components based on the likelihood specified by Likelihood are used.
$Initial Value Strategy = 1$: The MIVQUE0 estimates based on the maximum likelihood are used, irrespective of the value of Likelihood.

See Rao (1972) for a description of the minimum variance quadratic unbiased estimators (MIVQUE0).

By default, for small problems, $Initial Value Strategy = 0$ and for large problems $Initial Value Strategy = 1$.

Constraint: $Initial Value Strategy = 0$ or $1$.

Likelihood   $a$   Default $= REML$

Likelihood defines whether g02jhf will use the restricted maximum likelihood (REML) or the maximum likelihood (ML) when fitting the model.

Constraint: $Likelihood = REML$ or $ML$.

Linear Minimization Accuracy   $r$   Default $= 0.9$

The accuracy of the linear minimizations.

When $Solver = E04LB$ , g02jhf passes Linear Minimization Accuracy to the solver as eta.
When $Solver = E04UC$ , this option is ignored.

Line Search Tolerance   $r$   Default $= 0.9$

The line search tolerance.

When $Solver = E04LB$ , this option is ignored.
When $Solver = E04UC$ , g02jhf passes Line Search Tolerance to the solver as Line Search Tolerance.

List
NoList   Default

Optional parameter List enables printing of each optional parameter specification as it is supplied. NoList suppresses this printing.

Major Iteration Limit   $i$   Default $= special$

The number of major iterations.

When $Solver = E04LB$ , g02jhf passes Major Iteration Limit to the solver as maxcal. In this case, the default value used is $1000$ .
When $Solver = E04UC$ , g02jhf passes Major Iteration Limit to the solver as Major Iteration Limit. In this case, the default value used is $\max (50, 3 \times g)$ , where $g$ is the number of variance components being estimated (excluding the overall variance, $σ_{R}^{2}$ ).

Major Print Level   $i$   Default $= special$

The frequency that monitoring information is output to Unit Number.

When $Solver = E04LB$ , g02jhf passes Major Print Level to the solver as iprint. In this case, the default value used is $- 1$ and hence no monitoring information will be output.
When $Solver = E04UC$ , g02jhf passes Major Print Level to the solver as Major Print Level. In this case, the default value used is $0$ and hence no monitoring information will be output.

Maximum Number of Threads   $i$   Default $= special$

Controls the maximum number of threads used by g02jhf in a multithreaded library. By default, the maximum number of available threads are used.

In a library that is not multithreaded, this option has no effect.

Constraint: $Maximum Number of Threads \geq 0$.

Minor Iteration Limit   $i$   Default $= \max (50, 3 \times g)$

The number of minor iterations.

When $Solver = E04LB$ , this option is ignored.
When $Solver = E04UC$ , g02jhf passes Minor Iteration Limit to the solver as Minor Iteration Limit. In this case, the default value used is $\max (50, 3 \times g)$ , where $g$ is the number of variance components being estimated (excluding the overall variance, $σ_{R}^{2}$ ).

Minor Print Level   $i$   Default $= 0$

The frequency that additional monitoring information is output to Unit Number.

When $Solver = E04LB$ , this option is ignored.
When $Solver = E04UC$ , g02jhf passes Minor Print Level to the solver as Minor Print Level. The default value of $0$ means that no additional monitoring information will be output.

Optimality Tolerance   $r$   Default $= {machine precision}^{0.72}$

The optimality tolerance.

When $Solver = E04LB$ , this option is ignored.
When $Solver = E04UC$ , g02jhf passes Optimality Tolerance to the solver as Optimality Tolerance.

Parallelisation Strategy   $i$   Default $= special$

If $Maximum Number of Threads > 0$ then Parallelisation Strategy controls how g02jhf is parallelised in a multithreaded library.

$Parallelisation Strategy = 1$: g02jhf will attempt to parallelise operations involving $Z$ , even if $rnlsv = 1$ .
$Parallelisation Strategy = 2$: g02jhf will only attempt to parallelise operations involving $Z$ , if $rnlsv > 1$ .

By default, $Parallelisation Strategy = 1$, however, for some models / datasets, this may be slower than using $Parallelisation Strategy = 2$ when $rnlsv = 1$.

In a library that is not multithreaded, this option has no effect.

Constraint: $Parallelisation Strategy = 1$ or $2$.

Solution Accuracy   $r$   Default $= 0.0$

The accuracy to which the solution is required.

When $Solver = E04LB$ , g02jhf passes Solution Accuracy to the solver as xtol.
When $Solver = E04UC$ , this option is ignored.

Solver   $a$   Default $= special$

Controls which solver g02jhf will use when fitting the model. By default, $Solver = E04LB$ is used for small problems and $Solver = E04UC$, otherwise.

If $Solver = E04LB$, then the solver used is the one implemented in e04lbf and if $Solver = E04UC$, then the solver used is the one implemented in e04ucf/e04uca.

Constraint: $Solver = E04LB$ or $E04UC$.

Sweep Tolerance   $r$   Default $= special$

The sweep tolerance used by g02jhf when performing the sweep operation (see Wolfinger et al. (1994)). The default value used is $Sweep Tolerance = \max (ε, ε \times (\max_{i} {(Z^{T} Z)}_{i i}))$, where $ε = \sqrt{machine precision}$.

Unit Number   $i$   Default $=$ advisory message unit number

The monitoring unit number to which g02jhf will send any monitoring information.

NAG Library Manual, Mark 28.6
