NAG FL Interface
g02jff (lmm_init)
1
Purpose
g02jff preprocesses a dataset prior to fitting a linear mixed effects regression model via
g02jhf.
2
Specification
Fortran Interface
Subroutine g02jff ( |
hlmm, hddesc, hfixed, nrndm, hrndm, weight, n, y, wt, dat, lddat, sddat, fnlsv, nff, rnlsv, nrf, nvpr, rcomm, lrcomm, icomm, licomm, ifail) |
Integer, Intent (In) |
:: |
nrndm, n, lddat, sddat, lrcomm, licomm |
Integer, Intent (Inout) |
:: |
ifail |
Integer, Intent (Out) |
:: |
fnlsv, nff, rnlsv, nrf, nvpr, icomm(licomm) |
Real (Kind=nag_wp), Intent (In) |
:: |
y(n), wt(*), dat(lddat,sddat) |
Real (Kind=nag_wp), Intent (Out) |
:: |
rcomm(lrcomm) |
Character (1), Intent (In) |
:: |
weight |
Type (c_ptr), Intent (In) |
:: |
hddesc, hfixed, hrndm(nrndm) |
Type (c_ptr), Intent (Inout) |
:: |
hlmm |
|
C Header Interface
#include <nag.h>
void |
g02jff_ (void **hlmm, void **hddesc, void **hfixed, const Integer *nrndm, void ***hrndm[], const char *weight, const Integer *n, const double y[], const double wt[], const double dat[], const Integer *lddat, const Integer *sddat, Integer *fnlsv, Integer *nff, Integer *rnlsv, Integer *nrf, Integer *nvpr, double rcomm[], const Integer *lrcomm, Integer icomm[], const Integer *licomm, Integer *ifail, const Charlen length_weight) |
|
C++ Header Interface
#include <nag.h> extern "C" {
void |
g02jff_ (void *&hlmm, void *&hddesc, void *&hfixed, const Integer &nrndm, void **&hrndm[], const char *weight, const Integer &n, const double y[], const double wt[], const double dat[], const Integer &lddat, const Integer &sddat, Integer &fnlsv, Integer &nff, Integer &rnlsv, Integer &nrf, Integer &nvpr, double rcomm[], const Integer &lrcomm, Integer icomm[], const Integer &licomm, Integer &ifail, const Charlen length_weight) |
}
|
The routine may be called by the names g02jff or nagf_correg_lmm_init.
3
Description
g02jff must be called prior to fitting a linear mixed effects regression model via
g02jhf.
The model is of the form:
where |
is a vector of observations on the dependent variable, |
|
is an by design matrix of fixed independent variables, |
|
is a vector of unknown fixed effects, |
|
is an by design matrix of random independent variables, |
|
is a vector of length of unknown random effects, |
|
is a vector of length of unknown random errors. |
Both
and
are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by
where
,
is the
identity matrix and
is a diagonal matrix. It is assumed that the random variables,
, can be subdivided into
groups with each group being identically distributed with expectation zero and variance
. The diagonal elements of matrix
therefore take one of the values
, depending on which group the associated random variable belongs to.
The model therefore contains three sets of unknowns: the fixed effects , the random effects and a vector of variance components , where
.
Case weights can be incorporated into the model by replacing and with and respectively where is a diagonal weight matrix.
The design matrices,
and
, are constructed from an
data matrix,
, a description of the fixed independent variables,
, and a description of the random independent variables,
. See
Section 11 for further details.
4
References
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310
5
Arguments
-
1:
– Type (c_ptr)
Input/Output
-
On entry: must be set to
c_null_ptr or, alternatively, an existing G22 handle may be supplied in which case
g02jff will destroy the supplied G22 handle as if
g22zaf had been called.
On exit: holds a G22 handle to the internal data structure containing a description of the model. You
must not change the G22 handle other than through the routines in
Chapters G02 or
G22.
-
2:
– Type (c_ptr)
Input
-
On entry: a G22 handle to the internal data structure containing a description of the data matrix,
, as returned in
hddesc by
g22ybf.
-
3:
– Type (c_ptr)
Input
-
On entry: a G22 handle to the internal data structure containing a description of the fixed part of the model
as returned in
hform by
g22yaf.
If
hfixed is
c_null_ptr then the model is assumed to not have a fixed part.
-
4:
– Integer
Input
-
On entry: the number of elements used to describe the random part of the model.
Constraint:
.
-
5:
– Type (c_ptr)
Input
-
On entry: a series of G22 handles to internal data structures containing a description of the random part of the model
as returned in
hform by
g22yaf.
-
6:
– Character(1)
Input
-
On entry: indicates if weights are to be used.
- No weights are used.
- Case weights are used and must be supplied in array wt.
Constraint:
or .
-
7:
– Integer
Input
-
On entry: , the number of observations in the dataset, .
Constraint:
, where
is the value supplied in
nobs when
hddesc was created.
-
8:
– Real (Kind=nag_wp) array
Input
-
On entry: , the vector of observations on the dependent variable.
Constraint:
for at least one .
-
9:
– Real (Kind=nag_wp) array
Input
-
Note: the dimension of the array
wt
must be at least
if
.
On entry: if
,
wt must contain the diagonal elements of the weight matrix
.
If , the th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.
If
,
wt is not referenced and the effective number of observations is
.
Constraint:
if , , for .
-
10:
– Real (Kind=nag_wp) array
Input
-
On entry: the data matrix,
. By default,
, the
th value for the
th variable, for
and
, should be supplied in
.
If the optional parameter
Storage Order, described in
g22ybf, is set to
,
should be supplied in
.
If either , or , for a variable used in the model, is NaN (Not A Number) then that value is treated as missing and the whole observation is excluded from the analysis.
-
11:
– Integer
Input
-
On entry: the first dimension of the array
dat as declared in the (sub)program from which
g02jff is called.
Constraints:
- if the optional parameter Storage Order, described in g22ybf, is set to , ;
- otherwise .
-
12:
– Integer
Input
-
On entry: the second dimension of the array
dat as declared in the (sub)program from which
g02jff is called.
Constraints:
- if the optional parameter Storage Order, described in g22ybf, is set to , ;
- otherwise .
-
13:
– Integer
Output
-
On exit: the number of levels for the overall subject variable in . If there is no overall subject variable, .
-
14:
– Integer
Output
-
On exit: the number of fixed effects estimated in each of the
fnlsv subject blocks. The number of columns,
, in the design matrix
is given by
.
-
15:
– Integer
Output
-
On exit: the number of levels for the overall subject variable in . If there is no overall subject variable, .
-
16:
– Integer
Output
-
On exit: the number of random effects estimated in each of the
rnlsv subject blocks. The number of columns,
, in the design matrix
is given by
.
-
17:
– Integer
Output
-
On exit:
, the number of variance components being estimated (excluding the overall variance,
). This is defined by the number of terms in the random part of the model,
(see
Section 11 for details).
-
18:
– Real (Kind=nag_wp) array
Communication Array
-
On exit: a communication array as required by the routines
g02jgf or
g02jhf.
-
19:
– Integer
Input
-
On entry: the dimension of the array
rcomm as declared in the (sub)program from which
g02jff is called.
-
20:
– Integer array
Communication Array
-
On exit: a communication array as required by the routines
g02jgf or
g02jhf.
If
licomm or
lrcomm are too small and
, then
and
holds the minimum required value for
licomm and
holds the minimum required value for
lrcomm.
-
21:
– Integer
Input
-
On entry: the dimension of the array
icomm as declared in the (sub)program from which
g02jff is called.
-
22:
– Integer
Input/Output
-
On entry:
ifail must be set to
,
or
to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of means that an error message is printed while a value of means that it is not.
If halting is not appropriate, the value
or
is recommended. If message printing is undesirable, then the value
is recommended. Otherwise, the value
is recommended.
When the value or is used it is essential to test the value of ifail on exit.
On exit:
unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
or
, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
-
On entry,
hlmm is not
c_null_ptr or a recognised G22 handle.
-
hddesc has not been initialized or is corrupt.
-
hddesc is not a G22 handle as generated by
g22ybf.
-
hfixed has not been initialized or is corrupt.
-
hfixed is not a G22 handle as generated by
g22yaf.
-
A variable name used when creating
hfixed is not present in
hddesc.
Variable name:
.
-
The fixed part of the model contains categorical variables, but no intercept or main effects terms have been requested.
-
On entry, .
Constraint: .
-
.
has not been initialized or is corrupt.
-
.
is not a G22 handle as generated by
g22yaf.
-
No model has been specified.
-
A variable name used when creating
hrndm is not present in
hddesc.
Variable name:
.
-
On entry,
weight had an illegal value.
Constraint:
or
.
-
On entry, .
Constraint: .
-
On entry,
and
.
Constraint:
, where
is the value supplied in
nobs when
hddesc was created.
-
On entry, no observations due to zero weights or missing values.
-
On entry, and .
Constraint: .
-
On entry, column
of the data matrix,
, is not consistent with information supplied in
hddesc,
.
-
Column of the data matrix, , required rounding more than expected when being treated as a categorical variable, .
All output is returned using the rounded value(s).
-
On entry, and .
Constraint: .
-
On entry, and .
Constraint: .
-
On entry, and .
Constraint: .
-
On entry, and .
Constraint: .
-
On entry,
and
.
Constraint:
and
. The minimum array sizes for
licomm and
lrcomm are held in the first two elements of
icomm repectively.
-
On entry,
and
.
Constraint:
and
.
icomm is not large enough to hold the minimum array sizes.
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See
Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See
Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
Not applicable.
8
Parallelism and Performance
g02jff makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
None.
10
Example
This example fits a random effects model with three random submodels and two fixed effects to a simulated dataset with
observations and
variables. The model is fit using maximum likelihood (ML). Standard labels for the parameter estimates and variance components are obtained from
g22ydf. See
Section 10 in
g02jhf for an example of how to construct custom labels.
10.1
Program Text
10.2
Program Data
10.3
Program Results
11
Algorithmic Details
11.1
Fixed Effects Design Matrix,
The fixed effects design matrix,
, is constructed from the data matrix
and
, as encoded in
hfixed. Details of the construction are described in
Section 3 in
g22yaf and
Section 3 in
g22ycf.
It is possible to store the cross-product matrix,
in a block diagonal form if
contains an overall subject effect,
. In this context
is defined as a main effect or interaction term that is contained in all other terms. For example, if
simplifies to
, then
. If it is advantageous to do so,
g02jff will make use of this block diagonal structure and
fnlsv will be set to the number of levels in
, otherwise
.
11.2
Random Effects Design Matrix,
The random effects design matrix,
, is constructed from the data matrix
and
which is made up of
nrndm submodels,
, where
is encoded in
. Each submodel is made up of two parts, the random effects and a subject term. The random effects are specified as described in
Section 3 in
g22yaf and the subject term is specified via the
g22yaf optional parameter
Subject. The design matrix
is constructed as described in
Section 3 in
g22ycf using a model constructed from the
nrndm submodels. As an example, if there were
submodels:
-1+V07+V08+V09 / SUBJECT = V13
-1+V05+V06 / SUBJECT = V11.V12
V03+V04 / SUBJECT = V10.V11.V12
then
would be constructed as if
g22ycf was called using the model
It should be noted that unless specified otherwise (by the inclusion of
-1) a submodel will contain an intercept. This results in a term corresponding to the subject term being included in the combined model (
V10.V11.V12 in this instance).
The above model expands out further to:
Each term in the expanded model corresponds to a variance component, so in this case,
.
When constructing
all contrast information specified when the submodels are constructed in calls to
g22yaf is ignored and dummy variables are used throughout.
It is possible to store the cross-product matrix,
in a block diagonal form if
contains an overall subject effect,
. In this context
is defined as a main effect or interaction term that is contained in all other subject terms. For example, if the random effects model is constructed from
submodels with subject terms
,
and
, then
and
rnlsv will be set to the number of levels in
, otherwise
.
12
Optional Parameters
As well as the optional parameters common to all G22 handles described in
g22zmf and
g22znf, a number of additional optional parameters can be specified for a G22 handle holding the description of a linear mixed model, as returned by
g02jff in
hlmm.
Each writeable optional parameter has an associated default value; to set any of them to a non-default value, use
g22zmf. The value of any optional parameter can be queried using
g22znf.
Most of the optional parameters described in this section are related to the behaviour
g02jhf when fitting the model. These descriptions should therefore be read in conjunction with the documentation for that routine.
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The following is a list of the optional parameters available. A full description of each optional parameter is provided in
Section 12.1.
12.1
Description of the Optional Parameters
For each option, we give a summary line, a description of the optional parameter and details of constraints.
The summary line contains:
- a parameter value,
where the letters , and denote options that take character, integer and real values respectively;
- the default value.
Keywords and character values are case and white space insensitive.
Gamma Lower Bound | | Default |
A lower bound for the elements of , where .
Gamma Upper Bound | | Default |
An upper bound for the elements of , where .
Initial Distance | | Default |
The initial distance from the solution.
Initial Value Strategy | | Default |
Controls how
g02jhf will choose the initial values for the variance components,
, if not supplied.
- The MIVQUE0 estimates of the variance components based on the likelihood specified by Likelihood are used.
- The MIVQUE0 estimates based on the maximum likelihood are used, irrespective of the value of Likelihood.
See
Rao (1972) for a description of the minimum variance quadratic unbiased estimators (MIVQUE0).
By default, for small problems, and for large problems .
Constraint:
or .
Likelihood defines whether
g02jhf will use the restricted maximum likelihood (REML) or the maximum likelihood (ML) when fitting the model.
Constraint:
or .
Linear Minimization Accuracy | | Default |
The accuracy of the linear minimizations.
Line Search Tolerance | | Default |
The line search tolerance.
Optional parameter
List enables printing of each optional parameter specification as it is supplied.
NoList suppresses this printing.
Major Iteration Limit | | Default |
The number of major iterations.
Major Print Level | | Default |
The frequency that monitoring information is output to
Unit Number.
- When , g02jhf passes Major Print Level to the solver as
iprint.
In this case, the default value used is and hence no monitoring information will be output.
- When , g02jhf passes Major Print Level to the solver as
Major Print Level.
In this case, the default value used is and hence no monitoring information will be output.
Maximum Number of Threads | | Default |
Controls the maximum number of threads used by
g02jhf in a multithreaded library. By default, the maximum number of available threads are used.
In a library that is not multithreaded, this option has no effect.
Constraint:
.
Minor Iteration Limit | | Default |
The number of minor iterations.
- When , this option is ignored.
- When , g02jhf passes Minor Iteration Limit to the solver as
Minor Iteration Limit.
In this case, the default value used is , where is the number of variance components being estimated (excluding the overall variance, ).
Minor Print Level | | Default |
The frequency that additional monitoring information is output to
Unit Number.
- When , this option is ignored.
- When , g02jhf passes Minor Print Level to the solver as
Minor Print Level.
The default value of means that no additional monitoring information will be output.
Optimality Tolerance | | Default |
The optimality tolerance.
Parallelisation Strategy | | Default |
If
then
Parallelisation Strategy
controls how
g02jhf is parallelised in a multithreaded library.
- g02jhf will attempt to parallelise operations involving , even if .
- g02jhf will only attempt to parallelise operations involving , if .
By default,
, however, for some models / datasets, this may be slower than using
when
.
In a library that is not multithreaded, this option has no effect.
Constraint:
or .
Solution Accuracy | | Default |
The accuracy to which the solution is required.
Controls which solver
g02jhf will use when fitting the model. By default,
is used for small problems and
, otherwise.
If
, then the solver used is the one implemented in
e04lbf and if
, then the solver used is the one implemented in
e04ucf/e04uca.
Constraint:
or .
Sweep Tolerance | | Default |
The sweep tolerance used by
g02jhf when performing the sweep operation
Wolfinger et al. (1994). The default value used is
, where
.
Unit Number | | Default
|
The monitoring unit number to which
g02jhf will send any monitoring information.