None of the routines g02jaf, g02jbf, g02jcf, g02jdf and g02jef has a direct one-to-one replacement; instead, they have been replaced by a new suite of routines. This new suite allows a linear mixed effects model to be specified using a modelling language. This gives a more natural way of specifying the model, allows interaction terms to be specified, and means that it is no longer necessary to create dummy variables when the model contains categorical variables.
The new suite of routines consists of:
g22ybf used to describe the dataset of interest. Calling this routine allows labels to be assigned to variables, which can then be used when specifying the model.
g22yaf multiple calls to this routine are used to specify the fixed and random parts of the model. The model is specified using strings written in a modelling language; for example, a single string can specify a model with the main effects of two variables and the interaction term between them. The modelling language is explained in detail in Section 3 in g22yaf.
g02jff pre-processes the dataset prior to calling the model fitting routine.
g02jhf fits the model and returns the parameter estimates etc.
In addition to the routines listed above, the following can also be used:
g02jgf combines information returned by multiple calls to g02jff. This is useful for large problems as it allows the dataset to be split up into smaller subsets of data, pre-processing each one separately before combining them into a single set of information as if g02jff had been called on the full dataset.
g22ydf can be used to obtain labels for the parameter estimates returned by g02jhf.
g22znf can be used to return the value of any optional arguments.
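The idea of pre-processing a dataset in pieces and then combining the results can be illustrated with ordinary cross-product summaries, which add up across blocks of rows. The sketch below is only an illustration of that general principle; it is not a description of what g02jff or g02jgf compute internally, and the helper names `cross_products` and `combine` are hypothetical.

```python
def cross_products(rows):
    """Return (X'X, X'y) for a list of (x_vector, y) rows, using plain lists."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """Add two (X'X, X'y) summaries element-wise."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


# Four observations: each row is (x_vector, y), with a leading 1 for an intercept.
data = [([1.0, 2.0], 3.0), ([1.0, 0.0], 1.0), ([1.0, 5.0], 4.0), ([1.0, 1.0], 2.0)]

full = cross_products(data)                      # summary of the whole dataset
merged = combine(cross_products(data[:2]),       # summary of the first half ...
                 cross_products(data[2:]))       # ... plus summary of the second half
```

Because the summaries are sums over rows, `merged` and `full` agree, which is why a large dataset can in principle be summarised block by block.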
By default, the model fitting routine, g02jhf, fits the linear mixed effects model using restricted maximum likelihood (REML). In order to fit the model using maximum likelihood (ML) you need to call the optional argument setting routine, g22zmf with optstr set to , between the call to g02jff and the call to g02jhf.
The routine may be called by the names g02jaf or nagf_correg_mixeff_reml.
3 Description
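g02jaf fits a model of the form:
$$y = X\beta + Z\nu + \epsilon$$
where
$y$ is a vector of $n$ observations on the dependent variable,
$X$ is a known $n \times p$ design matrix for the fixed independent variables,
$\beta$ is a vector of length $p$ of unknown fixed effects,
$Z$ is a known $n \times q$ design matrix for the random independent variables,
$\nu$ is a vector of length $q$ of unknown random effects,
and
$\epsilon$ is a vector of length $n$ of unknown random errors.
Both $\nu$ and $\epsilon$ are assumed to have a Gaussian distribution with expectation zero and
$$\mathrm{Var}\begin{bmatrix} \nu \\ \epsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$$
where $R = \sigma_R^2 I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g \le q$ groups with each group being identically distributed with expectation zero and variance $\sigma_i^2$. The diagonal elements of matrix $G$, therefore, take one of the values $\{\sigma_i^2 : i = 1, 2, \ldots, g\}$, depending on which group the associated random variable belongs to.
The model, therefore, contains three sets of unknowns: the fixed effects, $\beta$, the random effects, $\nu$, and a vector of $g + 1$ variance components, $\gamma$, where $\gamma = \{\sigma_1^2, \sigma_2^2, \ldots, \sigma_g^2, \sigma_R^2\}$. Rather than working directly with $\gamma$, g02jaf uses an iterative process to estimate $\gamma^* = \{\sigma_1^2/\sigma_R^2, \sigma_2^2/\sigma_R^2, \ldots, \sigma_g^2/\sigma_R^2, 1\}$. Due to the iterative nature of the estimation a set of initial values, $\gamma_0$, for $\gamma^*$ is required. g02jaf allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972).
g02jaf fits the model using a quasi-Newton algorithm to maximize the restricted log-likelihood function:
$$-2 l_R = \log(|V|) + (n - p) \log(r^T V^{-1} r) + \log(|X^T V^{-1} X|) + (n - p)\left(1 + \log\left(\frac{2\pi}{n - p}\right)\right)$$
where
$$V = Z G Z^T + R, \quad r = y - X b \quad \text{and} \quad b = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$$
Once the final estimates for $\gamma^*$ have been obtained, the value of $\sigma_R^2$ is given by:
$$\sigma_R^2 = \frac{r^T V^{-1} r}{n - p}.$$
Case weights, $W_c$, can be incorporated into the model by replacing $X^T X$ and $Z^T Z$ with $X^T W_c X$ and $Z^T W_c Z$ respectively, for a diagonal weight matrix $W_c$.
The log-likelihood, $l_R$, is calculated using the sweep algorithm detailed in Wolfinger et al. (1994).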
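As a sanity check on the restricted log-likelihood, it can help to evaluate it by hand in the simplest special case. The sketch below assumes no random effects, a single fixed intercept and unit error scaling, so that the profiled form reduces to $-2 l_R = (n-1)\log(r^T r) + \log(n) + (n-1)(1 + \log(2\pi/(n-1)))$ with $r = y - \bar{y}$. This is the standard profiled REML expression for that case, not output from g02jaf, and the function name is hypothetical.

```python
import math


def neg2_restricted_loglik_intercept_only(y):
    """Return (-2 * l_R, sigma2) for y = mu + e, e ~ N(0, sigma2 * I).

    Special case assumed here: V = I, X = column of ones, so p = 1,
    |V| = 1, |X'X| = n and b is the sample mean.
    """
    n = len(y)
    p = 1
    b = sum(y) / n                                # generalised least squares estimate
    rss = sum((yi - b) ** 2 for yi in y)          # r' V^{-1} r with V = I
    neg2_l = ((n - p) * math.log(rss)
              + math.log(n)
              + (n - p) * (1.0 + math.log(2.0 * math.pi / (n - p))))
    sigma2 = rss / (n - p)                        # residual variance estimate
    return neg2_l, sigma2


y = [4.0, 6.0, 5.0, 9.0]
neg2_l, sigma2 = neg2_restricted_loglik_intercept_only(y)
```

In this special case the residual variance estimate is the usual unbiased sample variance (divisor $n - 1$ rather than $n$), which is the familiar motivation for REML over ML.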
4 References
Goodnight J H (1979) A tutorial on the SWEEP operator The American Statistician 33(3) 149–158
Harville D A (1977) Maximum likelihood approaches to variance component estimation and to related problems J. Am. Stat. Assoc. 72 320–340
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Stroup W W (1989) Predictable functions and prediction space in the mixed model procedure Applications of Mixed Models in Agriculture and Related Disciplines Southern Cooperative Series Bulletin No. 343 39–48
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310
5 Arguments
1: n – Integer Input
On entry: $n$, the number of observations.
Constraint:
.
2: ncol – Integer Input
On entry: the number of columns in the data matrix, dat.
Constraint:
.
3: lddat – Integer Input
On entry: the first dimension of the array dat as declared in the (sub)program from which g02jaf is called.
Constraint:
.
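4: dat – Real (Kind=nag_wp) array Input
On entry: array containing all of the data. For the $i$th observation:
dat($i$, yvid) holds the dependent variable, $y$;
if cwid $\ne 0$, dat($i$, cwid) holds the case weights;
if svid $\ne 0$, dat($i$, svid) holds the subject variable.
The remaining columns hold the values of the independent variables.
Constraints:
if cwid $\ne 0$, dat($i$, cwid) $\ge 0.0$;
if levels($j$) $\ne 1$, $1 \le$ dat($i$, $j$) $\le$ levels($j$).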
5: levels – Integer array Input
On entry: levels($i$) contains the number of levels associated with the $i$th variable of the data matrix dat. If this variable is continuous or binary (i.e., only takes the values zero or one) then levels($i$) should be $1$; if the variable is discrete then levels($i$) is the number of levels associated with it and dat($j$, $i$) is assumed to take the values $1$ to levels($i$), for $j = 1, 2, \ldots, n$.
Constraint:
levels($i$) $\ge 1$, for $i = 1, 2, \ldots,$ ncol.
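Since categorical columns must be coded as consecutive level numbers, it can be worth checking the data before calling the routine. The sketch below is a hypothetical pre-check, not part of the NAG Library. It assumes the coding described for this argument: a levels entry of 1 marks a continuous or binary column, and a larger entry means the column takes integer values from 1 up to that number of levels. The function name `check_levels` is made up for illustration.

```python
def check_levels(dat, levels):
    """Return the 1-based (row, column) positions whose value breaks the level coding.

    dat    : list of rows, each a list of floats (one entry per column)
    levels : list of ints, one per column; 1 means 'not categorical' (assumed)
    """
    bad = []
    for i, row in enumerate(dat, start=1):
        for j, (value, nlev) in enumerate(zip(row, levels), start=1):
            if nlev > 1:
                # Categorical column: value must be a whole number in 1..nlev.
                if not (value == int(value) and 1 <= value <= nlev):
                    bad.append((i, j))
    return bad


dat = [[3.2, 1.0, 2.0],
       [4.1, 2.0, 3.0],
       [5.0, 3.0, 1.0]]   # row 3, column 2 holds level 3 of a two-level factor
levels = [1, 2, 3]        # column 1 continuous, column 2 has 2 levels, column 3 has 3

violations = check_levels(dat, levels)
```

Here `violations` lists the single offending cell, which is the kind of data the routine reports as a categorical variable with a value greater than that specified in levels.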
6: yvid – Integer Input
On entry: the column of dat holding the dependent, $y$, variable.
Constraint:
.
7: cwid – Integer Input
On entry: the column of dat holding the case weights.
If cwid $= 0$, no weights are used.
Constraint:
.
8: nfv – Integer Input
On entry: the number of independent variables in the model which are to be treated as being fixed.
Constraint:
.
9: fvid – Integer array Input
On entry: the columns of the data matrix dat holding the fixed independent variables with fvid($i$) holding the column number corresponding to the $i$th fixed variable.
Constraint:
, for .
10: fint – Integer Input
On entry: flag indicating whether a fixed intercept is included (fint $= 1$).
Constraint:
fint $= 0$ or $1$.
11: nrv – Integer Input
On entry: the number of independent variables in the model which are to be treated as being random.
Constraints:
;
.
12: rvid – Integer array Input
On entry: the columns of the data matrix holding the random independent variables with rvid($i$) holding the column number corresponding to the $i$th random variable.
Constraint:
, for .
13: nvpr – Integer Input
On entry: if and , nvpr is the number of variance components being , (), else .
If , is not referenced.
Constraint:
if , .
14: vpr – Integer array Input
On entry: holds a flag indicating the variance of the th random variable. The variance of the th random variable is , where if and and otherwise. Random variables with the same value of are assumed to be taken from the same distribution.
Constraint:
, for .
15: rint – Integer Input
On entry: flag indicating whether a random intercept is included (rint $= 1$).
16: svid – Integer Input
On entry: the column of dat holding the subject variable.
If svid $= 0$, no subject variable is used.
Specifying a subject variable is equivalent to specifying the interaction between that variable and all of the random-effects. Letting the notation denote the interaction between variables and , fitting a model with , random-effects and subject variable is equivalent to fitting a model with random-effects and no subject variable. If the model is equivalent to fitting and no subject variable.
Constraint:
.
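The equivalence between a subject variable and an interaction term can be made concrete by building the indicator columns for the interaction between a subject variable and one categorical random variable. The sketch below is an illustration only: the function name is hypothetical, and the column ordering (factor level varying fastest within each subject level) is an assumption made for the example rather than a statement about how g02jaf orders its design matrix.

```python
def interaction_columns(subject, factor, n_subject, n_factor):
    """Indicator columns for the subject-by-factor interaction.

    One column per (subject level, factor level) pair; each observation has a
    single 1 in the column matching its own pair of levels.
    """
    z = []
    for s, f in zip(subject, factor):
        row = [0] * (n_subject * n_factor)
        row[(s - 1) * n_factor + (f - 1)] = 1   # levels are coded 1..n
        z.append(row)
    return z


# Two subjects, each observed at both levels of a two-level factor.
subject = [1, 1, 2, 2]
factor = [1, 2, 1, 2]
z = interaction_columns(subject, factor, n_subject=2, n_factor=2)
```

With the observations ordered by subject, the nonzero entries fall into one block per subject, which is the block structure that a subject variable expresses without the interaction columns having to be entered explicitly.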
17: gamma – Real (Kind=nag_wp) array Input/Output
On entry: holds the initial values of the variance components, , with
the initial value for , for . If and , , else .
If , the remaining elements of gamma are ignored and the initial values for the variance components are estimated from the data using MIVQUE0.
On exit: , for , holds the final estimate of and holds the final estimate for .
Constraint:
or , for .
18: nff – Integer Output
On exit: the number of fixed effects estimated (i.e., the number of columns, $p$, in the design matrix $X$).
19: nrf – Integer Output
On exit: the number of random effects estimated (i.e., the number of columns, $q$, in the design matrix $Z$).
20: df – Integer Output
On exit: the degrees of freedom.
21: reml – Real (Kind=nag_wp) Output
On exit: $-2 l_R(\hat{\gamma})$ where $l_R$ is the log of the restricted maximum likelihood calculated at $\hat{\gamma}$, the estimated variance components returned in gamma.
23: b – Real (Kind=nag_wp) array Output
On exit: the parameter estimates, $(\beta, \nu)$, with the first nff elements of b containing the fixed effect parameter estimates, $\beta$, and the next nrf elements of b containing the random effect parameter estimates, $\nu$.
Fixed effects
If , contains the estimate of the fixed intercept. Let denote the number of levels associated with the th fixed variable, that is . Define
if , else if , ;
, .
Then for :
if ,
contains the parameter estimate for the th level of the th fixed variable, for ;
if , contains the parameter estimate for the th fixed variable.
Random effects
Redefining to denote the number of levels associated with the th random variable, that is . Define
if , else if , ; , .
Then for :
if ,
if ,
contains the parameter estimate for the th level of the th random variable, for ;
if , contains the parameter estimate for the th random variable;
if ,
let denote the number of levels associated with the subject variable, that is ;
if ,
contains the parameter estimate for the interaction between the th level of the subject variable and the th level of the th random variable, for and ;
if ,
contains the parameter estimate for the interaction between the th level of the subject variable and the th random variable, for ;
if , contains the estimate of the random intercept.
24: se – Real (Kind=nag_wp) array Output
On exit: the standard errors of the parameter estimates given in b.
25: maxit – Integer Input
On entry: the maximum number of iterations.
If , the default value of is used.
If , the parameter estimates and corresponding standard errors are calculated based on the value of supplied in gamma.
26: tol – Real (Kind=nag_wp) Input
On entry: the tolerance used to assess convergence.
If , the default value of is used, where is the machine precision.
27: warn – Integer Output
On exit: is set to if a variance component was estimated to be a negative value during the fitting process. Otherwise warn is set to .
If , the negative estimate is set to zero and the estimation process allowed to continue.
28: ifail – Integer Input/Output
On entry: ifail must be set to , or to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of means that an error message is printed while a value of means that it is not.
If halting is not appropriate, the value or is recommended. If message printing is undesirable, then the value is recommended. Otherwise, the value is recommended. When the value or is used it is essential to test the value of ifail on exit.
On exit: unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry or , explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
On entry, and .
Constraint: and any supplied weights must be .
On entry, .
Constraint: number of observations with nonzero weights must be greater than one.
On entry, .
Constraint: .
On entry, and .
Constraint: .
On entry, and .
Constraint: and .
On entry, and .
Constraint: and ( or ).
On entry, .
Constraint: or .
On entry, and .
Constraint: .
On entry, and .
Constraint: .
On entry, , for at least one .
On entry, invalid data: categorical variable with value greater than that specified in levels.
On entry, , for at least one .
On entry, .
Constraint: , for all .
On entry, .
Constraint: , for all .
On entry, .
Constraint: , for all .
Degrees of freedom : .
This is due to the number of parameters exceeding the effective number of observations.
Routine failed to converge in maxit iterations: .
See Section 9 for advice.
Routine failed to converge to specified tolerance: .
See Section 9 for advice.
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7 Accuracy
The accuracy of the results can be adjusted through the use of the tol argument.
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g02jaf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02jaf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9 Further Comments
Wherever possible any block structure present in the design matrix should be modelled through a subject variable, specified via svid, rather than being explicitly entered into dat.
g02jaf uses an iterative process to fit the specified model and for some problems this process may fail to converge (see Section 6). If the routine fails to converge, then the maximum number of iterations (see maxit) or the tolerance (see tol) may need increasing, or a different starting estimate can be supplied in gamma. Alternatively, the model can be fitted using maximum likelihood (see g02jbf) or using the noniterative MIVQUE0.
To fit the model just using MIVQUE0, the first element of gamma should be set to and maxit should be set to zero.
Although the quasi-Newton algorithm used in g02jaf tends to require more iterations before converging than the Newton–Raphson algorithm recommended by Wolfinger et al. (1994), it does not require the second derivatives of the likelihood function to be calculated and consequently takes significantly less time per iteration.
10 Example
The following dataset is taken from Stroup (1989) and arises from a balanced split-plot design with the whole plots arranged in a randomized complete block-design.
In this example the full design matrix for the random independent variable, , is given by:
(1)
where
The block structure evident in (1) is modelled by specifying a four-level subject variable, taking the values . The first column of is added to by setting . The remaining columns of are specified by a three level factor, taking the values, .