NAG FL Interface
g02jdf (mixeff_hier_reml)
1
Purpose
g02jdf fits a multi-level linear mixed effects regression model using restricted maximum likelihood (REML). Prior to calling
g02jdf the initialization routine
g02jcf must be called.
2
Specification
Fortran Interface
Subroutine g02jdf ( |
lvpr, vpr, nvpr, gamma, effn, rnkx, ncov, lnlike, lb, id, ldid, b, se, czz, ldczz, cxx, ldcxx, cxz, ldcxz, rcomm, icomm, iopt, liopt, ropt, lropt, ifail) |
Integer, Intent (In) |
:: |
lvpr, vpr(lvpr), nvpr, lb, ldid, ldczz, ldcxx, ldcxz, iopt(liopt), liopt, lropt |
Integer, Intent (Inout) |
:: |
icomm(*), ifail |
Integer, Intent (Out) |
:: |
effn, rnkx, ncov, id(ldid,*) |
Real (Kind=nag_wp), Intent (In) |
:: |
ropt(lropt) |
Real (Kind=nag_wp), Intent (Inout) |
:: |
gamma(nvpr+1), czz(ldczz,*), cxx(ldcxx,*), cxz(ldcxz,*), rcomm(*) |
Real (Kind=nag_wp), Intent (Out) |
:: |
lnlike, b(lb), se(lb) |
|
C Header Interface
#include <nag.h>
void |
g02jdf_ (const Integer *lvpr, const Integer vpr[], const Integer *nvpr, double gamma[], Integer *effn, Integer *rnkx, Integer *ncov, double *lnlike, const Integer *lb, Integer id[], const Integer *ldid, double b[], double se[], double czz[], const Integer *ldczz, double cxx[], const Integer *ldcxx, double cxz[], const Integer *ldcxz, double rcomm[], Integer icomm[], const Integer iopt[], const Integer *liopt, const double ropt[], const Integer *lropt, Integer *ifail) |
|
C++ Header Interface
#include <nag.h> extern "C" {
void |
g02jdf_ (const Integer &lvpr, const Integer vpr[], const Integer &nvpr, double gamma[], Integer &effn, Integer &rnkx, Integer &ncov, double &lnlike, const Integer &lb, Integer id[], const Integer &ldid, double b[], double se[], double czz[], const Integer &ldczz, double cxx[], const Integer &ldcxx, double cxz[], const Integer &ldcxz, double rcomm[], Integer icomm[], const Integer iopt[], const Integer &liopt, const double ropt[], const Integer &lropt, Integer &ifail) |
}
|
The routine may be called by the names g02jdf or nagf_correg_mixeff_hier_reml.
3
Description
g02jdf fits a model of the form:
where |
is a vector of observations on the dependent variable, |
|
is a known by design matrix for the fixed independent variables, |
|
is a vector of length of unknown fixed effects, |
|
is a known by design matrix for the random independent variables, |
|
is a vector of length of unknown random effects, |
and |
is a vector of length of unknown random errors. |
Both
and
are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by
where
,
is the
identity matrix and
is a diagonal matrix. It is assumed that the random variables,
, can be subdivided into
groups with each group being identically distributed with expectation zero and variance
. The diagonal elements of matrix
therefore take one of the values
, depending on which group the associated random variable belongs to.
The model therefore contains three sets of unknowns: the fixed effects
, the random effects
and a vector of
variance components
, where
. Rather than working directly with
,
g02jdf uses an iterative process to estimate
. Due to the iterative nature of the estimation a set of initial values,
, for
is required.
g02jdf allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by
Rao (1972).
g02jdf fits the model by maximizing the restricted log-likelihood function:
where
Once the final estimates for
have been obtained, the value of
is given by
Case weights, , can be incorporated into the model by replacing and with and respectively, for a diagonal weight matrix .
The log-likelihood,
, is calculated using the sweep algorithm detailed in
Wolfinger et al. (1994).
4
References
Goodnight J H (1979) A tutorial on the SWEEP operator The American Statistician 33(3) 149–158
Harville D A (1977) Maximum likelihood approaches to variance component estimation and to related problems JASA 72 320–340
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Stroup W W (1989) Predictable functions and prediction space in the mixed model procedure Applications of Mixed Models in Agriculture and Related Disciplines Southern Cooperative Series Bulletin No. 343 39–48
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310
5
Arguments
Note: prior to calling
g02jdf the initialization routine
g02jcf must be called, therefore this documention should be read in conjunction with the document for
g02jcf. In particular some argument names and conventions described in that document are also relevant here, but their definition has not been repeated. Specifically,
rndm,
weight,
n,
nff,
nrf,
nlsv,
levels,
fixed,
dat,
licomm and
lrcomm should be interpreted identically in both routines.
-
1:
– Integer
Input
-
On entry: the sum of the number of random parameters and the random intercept flags specified in the call to
g02jcf.
Constraint:
.
-
2:
– Integer array
Input
-
On entry: a vector of flags indicating the mapping between the random variables specified in
rndm and the variance components,
. See
Section 9 for more details.
Constraint:
, for .
-
3:
– Integer
Input
-
On entry: , the number of variance components being estimated (excluding the overall variance, ).
Constraint:
.
-
4:
– Real (Kind=nag_wp) array
Input/Output
-
On entry: holds the initial values of the variance components,
, with
the initial value for
, for
.
If
, the remaining elements of
gamma are ignored and the initial values for the variance components are estimated from the data using MIVQUE0.
On exit: , for , holds the final estimate of and holds the final estimate for .
Constraint:
or , for .
-
5:
– Integer
Output
-
On exit: effective number of observations. If no weights were supplied to
g02jcf or all supplied weights were nonzero,
.
-
6:
– Integer
Output
-
On exit: the rank of the design matrix, , for the fixed effects.
-
7:
– Integer
Output
-
On exit: number of variance components not estimated to be zero. If none of the variance components are estimated to be zero, then .
-
8:
– Real (Kind=nag_wp)
Output
-
On exit:
where
is the log of the restricted maximum likelihood calculated at
, the estimated variance components returned in
gamma.
-
9:
– Integer
Input
-
On entry: the dimension of the arrays
b and
se as declared in the (sub)program from which
g02jdf is called.
Constraint:
.
-
10:
– Integer array
Output
Note: the second dimension of the array
id
must be at least
(see
g02jcf).
On exit: an array describing the parameter estimates returned in
b. The first
columns of
id describe the parameter estimates for the random effects and the last
nff columns the parameter estimates for the fixed effects.
The example program for this routine includes a demonstration of decoding the parameter estimates given in
b using information from
id.
For fixed effects:
- for
- if contains the parameter estimate for the intercept, then
- if contains the parameter estimate for the th level of the th fixed variable, that is the vector of values held in the th column of dat when then
- if the th variable is continuous or binary, that is , ;
- any remaining rows of the th column of id are set to .
For random effects:
- let
- denote the number of random variables in the th random statement, that is ;
- denote the th random variable from the th random statement, that is the vector of values held in the th column of dat when ;
- denote the number of subject variables in the th random statement, that is ;
- denote the th subject variable from the th random statement, that is the vector of values held in the th column of dat when ;
- denote the number of levels for , that is ;
- then
- for , if contains the parameter estimate for the th level of when
, for and , i.e., is a valid value for the th subject variable, then
- if the parameter being estimated is for the intercept, then ;
- if the th variable is continuous, or binary, that is , then ;
- the remaining rows of the th column of id are set to .
In some situations, certain combinations of variables are never observed. In such circumstances all elements of the
th row of
id are set to
.
-
11:
– Integer
Input
-
On entry: the first dimension of the array
id as declared in the (sub)program from which
g02jdf is called.
Constraint:
, i.e.,
maximum number of subject variables (see
g02jcf).
-
12:
– Real (Kind=nag_wp) array
Output
-
On exit: the parameter estimates, with the first
elements of
containing the parameter estimates for the random effects,
, and the remaining
nff elements containing the parameter estimates for the fixed effects,
. The order of these estimates are described by the
id argument.
-
13:
– Real (Kind=nag_wp) array
Output
-
On exit: the standard errors of the parameter estimates given in
b.
-
14:
– Real (Kind=nag_wp) array
Output
Note: the second dimension of the array
czz
must be at least
(see
g02jcf).
On exit: if
, then
czz holds the lower triangular portion of the matrix
, where
and
are the estimates of
and
respectively. If
, then
czz holds this matrix in compressed form, with the first
nrf columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next
nrf columns the part corresponding to the second level of the overall subject variable etc.
-
15:
– Integer
Input
-
On entry: the first dimension of the array
czz as declared in the (sub)program from which
g02jdf is called.
Constraint:
.
-
16:
– Real (Kind=nag_wp) array
Output
Note: the second dimension of the array
cxx
must be at least
.
On exit:
cxx holds the lower triangular portion of the matrix
, where
is the estimated value of
.
-
17:
– Integer
Input
-
On entry: the first dimension of the array
cxx as declared in the (sub)program from which
g02jdf is called.
Constraint:
.
-
18:
– Real (Kind=nag_wp) array
Output
Note: the second dimension of the array
cxz
must be at least
(see
g02jcf).
On exit: if
, then
cxz holds the matrix
, where
and
are the estimates of
and
respectively. If
, then
cxz holds this matrix in compressed form, with the first
nrf columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next
nrf columns the part corresponding to the second level of the overall subject variable etc.
-
19:
– Integer
Input
-
On entry: the first dimension of the array
cxz as declared in the (sub)program from which
g02jdf is called.
Constraint:
.
-
20:
– Real (Kind=nag_wp) array
Communication Array
-
On entry: communication array initialized by a call to
g02jcf.
-
21:
– Integer array
Communication Array
-
On entry: communication array initialized by a call to
g02jcf.
-
22:
– Integer array
Input
-
On entry: optional parameters passed to the optimization routine.
By default
g02jdf fits the specified model using a modified Newton optimization algorithm as implemented in
e04lbf. In some cases, where the calculation of the derivatives is computationally expensive it may be more efficient to use a sequential QP algorithm. The sequential QP algorithm as implemented in
e04uca can be chosen by setting
. If
or
, then the modified Newton algorithm will be used.
Different optional parameters are available depending on the optimization routine used. In all cases, using a value of
will cause the default value to be used. In addition only the first
liopt values of
iopt are used, so for example, if only the first element of
iopt needs changing and default values for all other optional parameters are sufficient
liopt can be set to
.
The following table lists the association between elements of
iopt and arguments in the optimizer when the modified Newton algorithm is being used.
|
Description | Equivalent argument |
Default Value |
| Number of iterations | maxcal | |
| Unit number for monitoring information | n/a | As returned by x04abf |
| Print optional parameters ( print) | n/a | (no printing performed) |
| Frequency that monitoring information is printed | iprint | |
| Optimizer used | n/a | n/a |
If requested, monitoring information is displayed in a similar format to that given by the modified Newton optimizer.
The following table lists the association between elements of
iopt and optional parameters in the optimizer when the sequential QP algorithm is being used.
|
Description | Equivalent optional parameter |
Default Value |
| Number of iterations | Major Iteration Limit | |
| Unit number for monitoring information | n/a | As returned by x04abf |
| Print optional parameters ( print, otherwise no print) | List/Nolist | (no printing performed) |
| Frequency that monitoring information is printed | Major Print Level | |
| Optimizer used | n/a | n/a |
| Number of minor iterations | Minor Iteration Limit | |
| Frequency that additional monitoring information is printed | Minor Print Level | |
If
, default values are used for all optional parameters and
iopt is not referenced.
-
23:
– Integer
Input
-
On entry: length of the options array
iopt.
-
24:
– Real (Kind=nag_wp) array
Input
-
On entry: optional parameters passed to the optimization routine.
Different optional parameters are available depending on the optimization routine used. In all cases, using a value of
will cause the default value to be used. In addition only the first
lropt values of
ropt are used, so for example, if only the first element of
ropt needs changing and default values for all other optional parameters are sufficient
lropt can be set to
.
The following table lists the association between elements of
ropt and arguments in the optimizer when the modified Newton algorithm is being used.
|
Description | Equivalent argument |
Default Value |
| Sweep tolerance | n/a | |
| Lower bound for | n/a | |
| Upper bound for | n/a | |
| Accuracy of linear minimizations | eta | |
| Accuracy to which solution is required | xtol | |
| Initial distance from solution | stepmx | |
The following table lists the association between elements of
ropt and optional parameters in the optimizer when the sequential QP algorithm is being used.
|
Description | Equivalent optional parameter |
Default Value |
| Sweep tolerance | n/a | |
| Lower bound for | n/a | |
| Upper bound for | n/a | |
| Line search tolerance | Line Search Tolerance | |
| Optimality tolerance | Optimality Tolerance | |
where
is the
machine precision returned by
x02ajf and
denotes the
diagonal element of
.
If
, then default values are used for all optional parameters and
ropt is not referenced.
-
25:
– Integer
Input
-
On entry: length of the options array
ropt.
-
26:
– Integer
Input/Output
-
On entry:
ifail must be set to
,
or
to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of means that an error message is printed while a value of means that it is not.
If halting is not appropriate, the value
or
is recommended. If message printing is undesirable, then the value
is recommended. Otherwise, the value
is recommended.
When the value or is used it is essential to test the value of ifail on exit.
On exit:
unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
or
, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
-
On entry, .
Constraint: .
-
On entry, and .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, .
Constraint: or .
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry,
icomm has not been initialized correctly.
-
On entry, at least one value of
i, for
, does not appear in
vpr.
-
Optimal solution found, but requested accuracy not achieved.
-
Too many major iterations.
-
Current point cannot be improved upon.
-
At least one negative estimate for
gamma was obtained. All negative estimates have been set to zero.
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See
Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See
Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
Not applicable.
8
Parallelism and Performance
g02jdf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02jdf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
The argument
vpr gives the mapping between the random variables and the variance components. In most cases
, for
. However, in some cases it might be necessary to associate more than one random variable with a single variance component, for example, when the columns of
dat hold dummy variables.
Consider a dataset with three variables:
where the first column corresponds to a categorical variable with three levels, the next to a categorical variable with two levels and the last column to a continuous variable. So in a call to
g02jcf
also assume a model with no fixed effects, no random intercept, no nesting and all three variables being included as random effects, then
Each of the three columns in
dat therefore correspond to a single variable and hence there are three variance components, one for each random variable included in the model, so
This is the recommended way of supplying the data to
g02jdf, however it is possible to reformat the above dataset by replacing each of the categorical variables with a series of dummy variables, one for each level. The dataset then becomes
where each column only has one level
Again a model with no fixed effects, no random intercept, no nesting and all variables being included as random effects is required, so
With the data entered in this manner, the first three columns of
dat correspond to a single variable (the first column of the original dataset) as do the next two columns (the second column of the original dataset). Therefore
vpr must reflect this
In most situations it is more efficient to supply the data to
g02jcf in terms of categorical variables rather than transform them into dummy variables.
9.1
Internal Changes
Internal changes have been made to this routine as follows:
- At Mark 27: The algorithm underlying this routine has been altered to improve efficiency. The design matrices are now reordered internally before performing the analysis. It is therefore no longer necessary to construct rndm in the previously recommended manner.
For details of all known issues which have been reported for the NAG Library please refer to the
Known Issues.
10
Example
This example fits a random effects model with three levels of nesting to a simulated dataset with observations and variables.
10.1
Program Text
10.2
Program Data
10.3
Program Results