NAG FL Interface
g02jef (mixeff_hier_ml)

Note: this routine is deprecated. Replaced by g02jhf.

1 Purpose

g02jef fits a multi-level linear mixed effects regression model using maximum likelihood (ML). Prior to calling g02jef the initialization routine g02jcf must be called.

2 Specification

Fortran Interface
Subroutine g02jef ( lvpr, vpr, nvpr, gamma, effn, rnkx, ncov, lnlike, lb, id, ldid, b, se, czz, ldczz, cxx, ldcxx, cxz, ldcxz, rcomm, icomm, iopt, liopt, ropt, lropt, ifail)
Integer, Intent (In) :: lvpr, vpr(lvpr), nvpr, lb, ldid, ldczz, ldcxx, ldcxz, iopt(liopt), liopt, lropt
Integer, Intent (Inout) :: icomm(*), ifail
Integer, Intent (Out) :: effn, rnkx, ncov, id(ldid,*)
Real (Kind=nag_wp), Intent (In) :: ropt(lropt)
Real (Kind=nag_wp), Intent (Inout) :: gamma(nvpr+1), czz(ldczz,*), cxx(ldcxx,*), cxz(ldcxz,*), rcomm(*)
Real (Kind=nag_wp), Intent (Out) :: lnlike, b(lb), se(lb)
C Header Interface
#include <nag.h>
void  g02jef_ (const Integer *lvpr, const Integer vpr[], const Integer *nvpr, double gamma[], Integer *effn, Integer *rnkx, Integer *ncov, double *lnlike, const Integer *lb, Integer id[], const Integer *ldid, double b[], double se[], double czz[], const Integer *ldczz, double cxx[], const Integer *ldcxx, double cxz[], const Integer *ldcxz, double rcomm[], Integer icomm[], const Integer iopt[], const Integer *liopt, const double ropt[], const Integer *lropt, Integer *ifail)
The routine may be called by the names g02jef or nagf_correg_mixeff_hier_ml.

3 Description

g02jef fits a model of the form:
$$y = X\beta + Z\nu + \varepsilon$$
where $y$ is a vector of $n$ observations on the dependent variable,
$X$ is a known $n \times p$ design matrix for the fixed independent variables,
$\beta$ is a vector of length $p$ of unknown fixed effects,
$Z$ is a known $n \times q$ design matrix for the random independent variables,
$\nu$ is a vector of length $q$ of unknown random effects,
and $\varepsilon$ is a vector of length $n$ of unknown random errors.
Both $\nu$ and $\varepsilon$ are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by
$$\operatorname{Var}\begin{bmatrix} \nu \\ \varepsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$$
where $R = \sigma_R^2 I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g \le q$ groups with each group being identically distributed with expectation zero and variance $\sigma_i^2$. The diagonal elements of matrix $G$, therefore, take one of the values $\{\sigma_i^2 : i = 1, 2, \ldots, g\}$, depending on which group the associated random variable belongs to.
The model, therefore, contains three sets of unknowns: the fixed effects $\beta$, the random effects $\nu$ and a vector of $g+1$ variance components $\gamma$, where $\gamma = \{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{g-1}^2, \sigma_g^2, \sigma_R^2\}$. Rather than working directly with $\gamma$, g02jef uses an iterative process to estimate $\gamma^* = \{\sigma_1^2/\sigma_R^2, \sigma_2^2/\sigma_R^2, \ldots, \sigma_{g-1}^2/\sigma_R^2, \sigma_g^2/\sigma_R^2, 1\}$. Due to the iterative nature of the estimation, a set of initial values, $\gamma_0$, for $\gamma^*$ is required. g02jef allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972).
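The relationship between the variance components and their scaled form can be sketched in a few lines of Python. This is an illustration only, not part of the NAG interface: the function names `scale_variance_components` and `unscale_variance_components` are hypothetical, and the residual variance is assumed to be stored last, as in the definition of the variance components above.

```python
def scale_variance_components(gamma):
    """Return gamma* = gamma / sigma_R^2; sigma_R^2 is assumed to be the last element."""
    sigma_r2 = gamma[-1]
    if sigma_r2 <= 0.0:
        raise ValueError("the residual variance sigma_R^2 must be positive")
    return [g / sigma_r2 for g in gamma]


def unscale_variance_components(gamma_star, sigma_r2):
    """Recover gamma from gamma* once an estimate of sigma_R^2 is available."""
    return [g * sigma_r2 for g in gamma_star]


# Hypothetical values: sigma_1^2 = 2.0, sigma_2^2 = 0.5, sigma_R^2 = 4.0
gamma = [2.0, 0.5, 4.0]
gamma_star = scale_variance_components(gamma)
print(gamma_star)  # [0.5, 0.125, 1.0]
```

The last element of the scaled vector is always 1, which is why only the first g elements need initial values.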
g02jef fits the model by maximizing the log-likelihood function:
$$-2 l_R = \log(|V|) + n \log(r^{T} V^{-1} r) + \log(2\pi/n)$$
where
$$V = Z G Z^{T} + R, \quad r = y - Xb \quad \text{and} \quad b = (X^{T} V^{-1} X)^{-1} X^{T} V^{-1} y.$$
Once the final estimates for $\gamma^*$ have been obtained, the value of $\sigma_R^2$ is given by
$$\sigma_R^2 = (r^{T} V^{-1} r)/(n-p).$$
Case weights, $W_c$, can be incorporated into the model by replacing $X^{T}X$ and $Z^{T}Z$ with $X^{T}W_cX$ and $Z^{T}W_cZ$ respectively, for a diagonal weight matrix $W_c$.
The log-likelihood, $l_R$, is calculated using the sweep algorithm detailed in Wolfinger et al. (1994).
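As a rough illustration of the quantities in this section, the following Python sketch evaluates the objective for a small dense problem by direct Gaussian elimination. It is not the sweep algorithm used by g02jef and is not intended for real data. It assumes that V is formed from the scaled components, so that R is replaced by the identity matrix; all function names are hypothetical.

```python
import math


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve_and_logdet(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting.

    Returns (x, log|det(a)|); a must be square and non-singular.
    """
    n = len(a)
    m = [list(a[i]) + list(rhs[i]) for i in range(n)]
    logdet = 0.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            raise ValueError("singular matrix")
        m[c], m[pivot] = m[pivot], m[c]
        logdet += math.log(abs(m[c][c]))
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, len(m[r])):
                m[r][k] -= f * m[c][k]
    ncols = len(rhs[0])
    x = [[0.0] * ncols for _ in range(n)]
    for r in range(n - 1, -1, -1):  # back substitution
        for j in range(ncols):
            s = m[r][n + j] - sum(m[r][k] * x[k][j] for k in range(r + 1, n))
            x[r][j] = s / m[r][r]
    return x, logdet


def minus_two_loglik(y, X, Z, g_diag):
    """Evaluate -2*l_R and b for V = Z G Z^T + I (assumption: scaled components)."""
    n, p = len(y), len(X[0])
    ZG = [[z * g for z, g in zip(row, g_diag)] for row in Z]
    V = matmul(ZG, transpose(Z))
    for i in range(n):
        V[i][i] += 1.0
    # Solve V [A | c] = [X | y] in a single elimination pass.
    sol, logdet_v = solve_and_logdet(V, [X[i] + [y[i]] for i in range(n)])
    Vinv_X = [row[:p] for row in sol]
    Vinv_y = [[row[p]] for row in sol]
    Xt = transpose(X)
    b, _ = solve_and_logdet(matmul(Xt, Vinv_X), matmul(Xt, Vinv_y))
    r = [y[i] - sum(X[i][j] * b[j][0] for j in range(p)) for i in range(n)]
    Vinv_r, _ = solve_and_logdet(V, [[v] for v in r])
    rVr = sum(r[i] * Vinv_r[i][0] for i in range(n))
    value = logdet_v + n * math.log(rVr) + math.log(2.0 * math.pi / n)
    return value, [row[0] for row in b]


# Hypothetical data: intercept-only fixed part, one random column.
y = [1.0, 2.0, 3.0]
X = [[1.0], [1.0], [1.0]]
Z = [[1.0], [1.0], [0.0]]
value0, b0 = minus_two_loglik(y, X, Z, [0.0])  # G = 0 reduces to ordinary least squares
value1, b1 = minus_two_loglik(y, X, Z, [1.0])
```

With G = 0 the matrix V is the identity, so b is the ordinary least squares estimate, which gives a simple check on the implementation.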

4 References

Goodnight J H (1979) A tutorial on the SWEEP operator The American Statistician 33(3) 149–158
Harville D A (1977) Maximum likelihood approaches to variance component estimation and to related problems J. Amer. Statist. Assoc. 72 320–340
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Amer. Statist. Assoc. 67 112–115
Stroup W W (1989) Predictable functions and prediction space in the mixed model procedure Applications of Mixed Models in Agriculture and Related Disciplines Southern Cooperative Series Bulletin No. 343 39–48
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310

5 Arguments

Note: prior to calling g02jef the initialization routine g02jcf must be called. This documentation should therefore be read in conjunction with the document for g02jcf. In particular, some argument names and conventions described in that document are also relevant here, but their definitions have not been repeated. Specifically, rndm, weight, n, nff, nrf, nlsv, levels, fixed, dat, licomm and lrcomm should be interpreted identically in both routines.
1: lvpr Integer Input
On entry: the sum of the number of random parameters and the random intercept flags specified in the call to g02jcf.
Constraint: lvpr $= \sum_i$ (rndm(1,i)+rndm(2,i)), where the sum is taken over the random statements.
2: vpr(lvpr) Integer array Input
On entry: a vector of flags indicating the mapping between the random variables specified in rndm and the variance components, $\sigma_i^2$. See Section 9 for more details.
Constraint: 1≤vpr(i)≤nvpr, for i=1,2,…,lvpr.
3: nvpr Integer Input
On entry: $g$, the number of variance components being estimated (excluding the overall variance, $\sigma_R^2$).
Constraint: 1≤nvpr≤lvpr.
4: gamma(nvpr+1) Real (Kind=nag_wp) array Input/Output
On entry: holds the initial values of the variance components, $\gamma_0$, with gamma(i) the initial value for $\sigma_i^2/\sigma_R^2$, for i=1,2,…,nvpr.
If gamma(1)=-1.0, the remaining elements of gamma are ignored and the initial values for the variance components are estimated from the data using MIVQUE0.
On exit: gamma(i), for i=1,2,…,nvpr, holds the final estimate of $\sigma_i^2$ and gamma(nvpr+1) holds the final estimate for $\sigma_R^2$.
Constraint: gamma(1)=-1.0 or gamma(i)≥0.0, for i=1,2,…,g.
5: effn Integer Output
On exit: effective number of observations. If no weights were supplied to g02jcf or all supplied weights were nonzero, effn=n.
6: rnkx Integer Output
On exit: the rank of the design matrix, X, for the fixed effects.
7: ncov Integer Output
On exit: number of variance components not estimated to be zero. If none of the variance components are estimated to be zero, then ncov=nvpr.
8: lnlike Real (Kind=nag_wp) Output
On exit: $-2 l_R(\hat{\gamma})$ where $l_R$ is the log of the maximum likelihood calculated at $\hat{\gamma}$, the estimated variance components returned in gamma.
9: lb Integer Input
On entry: the dimension of the arrays b and se as declared in the (sub)program from which g02jef is called.
Constraint: lb≥nff+nrf×nlsv.
10: id(ldid,*) Integer array Output
Note: the second dimension of the array id must be at least lb (see g02jcf).
On exit: an array describing the parameter estimates returned in b. The first nlsv×nrf columns of id describe the parameter estimates for the random effects and the last nff columns the parameter estimates for the fixed effects.
The example program for this routine includes a demonstration of decoding the parameter estimates given in b using information from id.
For fixed effects:
  • for l=nrf×nlsv+1,…,nrf×nlsv+nff
    • if b(l) contains the parameter estimate for the intercept, then
      id(1,l) = id(2,l) = id(3,l) = 0 ;  
    • if b(l) contains the parameter estimate for the ith level of the jth fixed variable, that is the vector of values held in the kth column of dat when fixed(j+2)=k then
      id(1,l)=0,  id(2,l)=j,  id(3,l)=i;  
    • if the jth variable is continuous or binary, that is levels(fixed(j+2))=1, id(3,l)=0;
    • any remaining rows of the lth column of id are set to 0.
For random effects:
  • let
    • $N_{R_b}$ denote the number of random variables in the bth random statement, that is $N_{R_b}$=rndm(1,b);
    • $R_{jb}$ denote the jth random variable from the bth random statement, that is the vector of values held in the kth column of dat when rndm(2+j,b)=k;
    • $N_{S_b}$ denote the number of subject variables in the bth random statement, that is $N_{S_b}$=rndm(3+$N_{R_b}$,b);
    • $S_{jb}$ denote the jth subject variable from the bth random statement, that is the vector of values held in the kth column of dat when rndm(3+$N_{R_b}$+j,b)=k;
    • $L(S_{jb})$ denote the number of levels for $S_{jb}$, that is $L(S_{jb})$=levels(rndm(3+$N_{R_b}$+j,b));
  • then
    • for l=1,2,…,nrf×nlsv, if b(l) contains the parameter estimate for the ith level of $R_{jb}$ when $S_{kb}=s_k$, for k=1,2,…,$N_{S_b}$ and $1 \le s_k \le L(S_{jb})$, i.e., $s_k$ is a valid value for the kth subject variable, then
      id(1,l)=b,  id(2,l)=j,  id(3,l)=i,  id(3+k,l)=$s_k$, k=1,2,…,$N_{S_b}$;
    • if the parameter being estimated is for the intercept, then id(2,l)=id(3,l)=0;
    • if the jth variable is continuous, or binary, that is $L(S_{jb})$=1, then id(3,l)=0;
    • the remaining rows of the lth column of id are set to 0.
In some situations, certain combinations of variables are never observed. In such circumstances all elements of the lth column of id are set to −999.
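The decoding rules for id can be sketched as follows. This Python fragment is illustrative only: it assumes that each column of id has been copied into a list (element 0 holding id(1,l)), that zero entries beyond the third row are unused subject positions, and that an unobserved combination is marked by −999 in every element. The function names are hypothetical; the example program for this routine remains the reference for decoding.

```python
def describe_id_column(col, is_fixed):
    """Describe one column of id, given as the list [id(1,l), id(2,l), ...]."""
    if all(v == -999 for v in col):
        return "combination never observed"
    b, j, i = col[0], col[1], col[2]
    if is_fixed:
        if j == 0 and i == 0:
            return "fixed intercept"
        if i == 0:
            return f"fixed variable {j} (continuous or binary)"
        return f"fixed variable {j}, level {i}"
    # Rows 4 onwards hold the subject levels s_k; unused rows are zero.
    subjects = [s for s in col[3:] if s != 0]
    where = f" at subject levels {subjects}" if subjects else ""
    if j == 0 and i == 0:
        return f"random statement {b}: intercept{where}"
    if i == 0:
        return f"random statement {b}: variable {j} (continuous or binary){where}"
    return f"random statement {b}: variable {j}, level {i}{where}"


def describe_estimates(id_columns, nrf, nlsv):
    """Label every element of b; the first nrf*nlsv columns are random effects."""
    n_random = nrf * nlsv
    return [describe_id_column(col, is_fixed=(l >= n_random))
            for l, col in enumerate(id_columns)]


# Hypothetical id contents with ldid = 5, nrf = 1, nlsv = 2 and nff = 2.
labels = describe_estimates(
    [[1, 1, 2, 3, 0], [-999] * 5, [0, 0, 0, 0, 0], [0, 2, 0, 0, 0]],
    nrf=1, nlsv=2)
```

Position in b, rather than the value of id(1,l), is used here to separate random from fixed effects, because the first nrf×nlsv elements of b are documented as the random effects.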
11: ldid Integer Input
On entry: the first dimension of the array id as declared in the (sub)program from which g02jef is called.
Constraint: ldid≥3+$\max_j$(rndm(3+rndm(1,j),j)), i.e., 3 + maximum number of subject variables (see g02jcf).
12: b(lb) Real (Kind=nag_wp) array Output
On exit: the parameter estimates, with the first nrf×nlsv elements of b containing the parameter estimates for the random effects, ν, and the remaining nff elements containing the parameter estimates for the fixed effects, β. The order of these estimates is described by the id argument.
13: se(lb) Real (Kind=nag_wp) array Output
On exit: the standard errors of the parameter estimates given in b.
14: czz(ldczz,*) Real (Kind=nag_wp) array Output
Note: the second dimension of the array czz must be at least nlsv×nrf (see g02jcf).
On exit: if nlsv=1, then czz holds the lower triangular portion of the matrix $(1/\sigma^2)(Z^{T}\hat{R}^{-1}Z + \hat{G}^{-1})$, where $\hat{R}$ and $\hat{G}$ are the estimates of $R$ and $G$ respectively. If nlsv>1, then czz holds this matrix in compressed form, with the first nrf columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next nrf columns the part corresponding to the second level of the overall subject variable etc.
15: ldczz Integer Input
On entry: the first dimension of the array czz as declared in the (sub)program from which g02jef is called.
Constraint: ldczz≥nrf.
16: cxx(ldcxx,*) Real (Kind=nag_wp) array Output
Note: the second dimension of the array cxx must be at least nff.
On exit: cxx holds the lower triangular portion of the matrix $(1/\sigma^2) X^{T}\hat{V}^{-1}X$, where $\hat{V}$ is the estimated value of $V$.
17: ldcxx Integer Input
On entry: the first dimension of the array cxx as declared in the (sub)program from which g02jef is called.
Constraint: ldcxx≥nff.
18: cxz(ldcxz,*) Real (Kind=nag_wp) array Output
Note: the second dimension of the array cxz must be at least nlsv×nrf (see g02jcf).
On exit: if nlsv=1, then cxz holds the matrix $(1/\sigma^2)(X^{T}\hat{V}^{-1}Z)\hat{G}$, where $\hat{V}$ and $\hat{G}$ are the estimates of $V$ and $G$ respectively. If nlsv>1, then cxz holds this matrix in compressed form, with the first nrf columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next nrf columns the part corresponding to the second level of the overall subject variable etc.
19: ldcxz Integer Input
On entry: the first dimension of the array cxz as declared in the (sub)program from which g02jef is called.
Constraint: ldcxz≥nff.
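The dimension constraints on lb, id, czz, cxx and cxz can be collected into a small helper before the arrays are allocated. The following Python sketch is an assumption-laden illustration: `minimum_dimensions` is a hypothetical name, and each random statement is assumed to be supplied as a complete column of rndm, including the element that holds the number of subject variables.

```python
def minimum_dimensions(nff, nrf, nlsv, rndm_columns):
    """Smallest array dimensions permitted by the documented constraints.

    rndm_columns holds one list per random statement b, with element 0
    corresponding to rndm(1,b).
    """
    # rndm(3 + rndm(1,b), b) in 1-based indexing is element 2 + rndm(1,b) here.
    max_subjects = max(col[2 + col[0]] for col in rndm_columns)
    lb = nff + nrf * nlsv
    return {
        "lb": lb,
        "ldid": 3 + max_subjects,
        "id_columns": lb,
        "ldczz": nrf,
        "czz_columns": nlsv * nrf,
        "ldcxx": nff,
        "cxx_columns": nff,
        "ldcxz": nff,
        "cxz_columns": nlsv * nrf,
    }


# Hypothetical model: three random variables (columns 1 to 3 of dat),
# no random intercept, one subject variable (column 4 of dat).
dims = minimum_dimensions(nff=2, nrf=3, nlsv=4,
                          rndm_columns=[[3, 0, 1, 2, 3, 1, 4]])
print(dims["lb"], dims["ldid"])  # 14 4
```

Allocating at these minimum sizes satisfies the constraints checked under ifail=9, 11, 15, 17 and 19.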
20: rcomm(*) Real (Kind=nag_wp) array Communication Array
On entry: communication array initialized by a call to g02jcf.
21: icomm(*) Integer array Communication Array
On entry: communication array initialized by a call to g02jcf.
22: iopt(liopt) Integer array Input
On entry: optional parameters passed to the optimization routine.
By default g02jef fits the specified model using a modified Newton optimization algorithm as implemented in e04lbf. In some cases, where the calculation of the derivatives is computationally expensive, it may be more efficient to use a sequential QP algorithm. The sequential QP algorithm as implemented in e04uca can be chosen by setting iopt(5)=1. If liopt<5 or iopt(5)≠1, then the modified Newton algorithm will be used.
Different optional parameters are available depending on the optimization routine used. In all cases, using a value of -1 will cause the default value to be used. In addition only the first liopt values of iopt are used, so for example, if only the first element of iopt needs changing and default values for all other optional parameters are sufficient liopt can be set to 1.
The following table lists the association between elements of iopt and arguments in the optimizer when the modified Newton algorithm is being used.

i | Description | Equivalent argument | Default Value
1 | Number of iterations | maxcal | 1000
2 | Unit number for monitoring information | n/a | As returned by x04abf
3 | Print optional parameters (1 = print) | n/a | -1 (no printing performed)
4 | Frequency that monitoring information is printed | iprint | -1
5 | Optimizer used | n/a | n/a
If requested, monitoring information is displayed in a similar format to that given by the modified Newton optimizer.
The following table lists the association between elements of iopt and optional parameters in the optimizer when the sequential QP algorithm is being used.

i | Description | Equivalent optional parameter | Default Value
1 | Number of iterations | Major Iteration Limit | max(50,3×nvpr)
2 | Unit number for monitoring information | n/a | As returned by x04abf
3 | Print optional parameters (1 = print, otherwise no print) | List/Nolist | -1 (no printing performed)
4 | Frequency that monitoring information is printed | Major Print Level | 0
5 | Optimizer used | n/a | n/a
6 | Number of minor iterations | Minor Iteration Limit | max(50,3×nvpr)
7 | Frequency that additional monitoring information is printed | Minor Print Level | 0
If liopt≤0, default values are used for all optional parameters and iopt is not referenced.
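The rule for selecting the optimizer from iopt can be expressed compactly. The Python sketch below mirrors that rule on a plain list standing in for iopt (element 0 corresponds to iopt(1)); `selected_optimizer` and `make_iopt` are hypothetical helpers, not NAG routines.

```python
NEWTON = "modified Newton"
SQP = "sequential QP"


def selected_optimizer(iopt):
    """Return the optimizer implied by iopt, using liopt = len(iopt)."""
    if len(iopt) >= 5 and iopt[4] == 1:
        return SQP
    return NEWTON


def make_iopt(max_iterations=None, use_sqp=False):
    """Build a five-element iopt; -1 requests the default for an element."""
    iopt = [-1] * 5
    if max_iterations is not None:
        iopt[0] = max_iterations
    if use_sqp:
        iopt[4] = 1
    return iopt


print(selected_optimizer([]))                         # modified Newton
print(selected_optimizer([200]))                      # modified Newton
print(selected_optimizer(make_iopt(use_sqp=True)))    # sequential QP
```

A one-element list such as `[200]` changes only the iteration limit, which corresponds to calling the routine with liopt=1.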
23: liopt Integer Input
On entry: length of the options array iopt.
24: ropt(lropt) Real (Kind=nag_wp) array Input
On entry: optional parameters passed to the optimization routine.
Different optional parameters are available depending on the optimization routine used. In all cases, using a value of -1.0 will cause the default value to be used. In addition only the first lropt values of ropt are used, so for example, if only the first element of ropt needs changing and default values for all other optional parameters are sufficient lropt can be set to 1.
The following table lists the association between elements of ropt and arguments in the optimizer when the modified Newton algorithm is being used.

i | Description | Equivalent argument | Default Value
1 | Sweep tolerance | n/a | max(eps, eps×$\max_i(zz_{ii})$)
2 | Lower bound for $\gamma^*$ | n/a | eps/100
3 | Upper bound for $\gamma^*$ | n/a | $10^{20}$
4 | Accuracy of linear minimizations | eta | 0.9
5 | Accuracy to which solution is required | xtol | 0.0
6 | Initial distance from solution | stepmx | 100000.0
The following table lists the association between elements of ropt and optional parameters in the optimizer when the sequential QP algorithm is being used.

i | Description | Equivalent optional parameter | Default Value
1 | Sweep tolerance | n/a | max(eps, eps×$\max_i(zz_{ii})$)
2 | Lower bound for $\gamma^*$ | n/a | eps/100
3 | Upper bound for $\gamma^*$ | n/a | $10^{20}$
4 | Line search tolerance | Line Search Tolerance | 0.9
5 | Optimality tolerance | Optimality Tolerance | $\mathrm{eps}^{0.72}$
where eps is the machine precision returned by x02ajf and $zz_{ii}$ denotes the ith diagonal element of $Z^{T}Z$.
If lropt≤0, then default values are used for all optional parameters and ropt is not referenced.
25: lropt Integer Input
On entry: length of the options array ropt.
26: ifail Integer Input/Output
On entry: ifail must be set to 0, −1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of −1 means that an error message is printed while a value of 1 means that it is not.
If halting is not appropriate, the value −1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value 0 is recommended. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry ifail=0 or −1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
On entry, lvpr=value.
Constraint: lvpr≥value.
ifail=2
On entry, vpr(value)=value and nvpr=value.
Constraint: 1≤vpr(i)≤nvpr.
ifail=3
On entry, nvpr=value.
Constraint: 1≤nvpr≤value.
ifail=4
On entry, gamma(value)=value.
Constraint: gamma(1)=-1.0 or gamma(i)≥0.0.
ifail=9
On entry, lb=value.
Constraint: lb≥value.
ifail=11
On entry, ldid=value.
Constraint: ldid≥value.
ifail=15
On entry, ldczz=value.
Constraint: ldczz≥value.
ifail=17
On entry, ldcxx=value.
Constraint: ldcxx≥value.
ifail=19
On entry, ldcxz=value.
Constraint: ldcxz≥value.
ifail=21
On entry, icomm has not been initialized correctly.
ifail=32
On entry, at least one value of i, for i=1,2,…,nvpr, does not appear in vpr.
ifail=101
Optimal solution found, but requested accuracy not achieved.
ifail=102
Too many major iterations.
ifail=103
Current point cannot be improved upon.
ifail=104
At least one negative estimate for gamma was obtained. All negative estimates have been set to zero.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
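When ifail is set to −1 or 1 on entry, the calling program has to interpret the value returned. The lookup below is a hedged Python sketch of how the codes listed in this section might be grouped after the call. The grouping into "argument", "estimation" and "system" categories is an assumption made for illustration and is not NAG terminology; `describe_ifail` is a hypothetical name.

```python
# Argument associated with each "On entry" error code listed in this section.
IFAIL_ARGUMENT = {1: "lvpr", 2: "vpr", 3: "nvpr", 4: "gamma", 9: "lb",
                  11: "ldid", 15: "ldczz", 17: "ldcxx", 19: "ldcxz",
                  21: "icomm", 32: "vpr"}

# Codes raised during or after the optimization, with the documented messages.
IFAIL_ESTIMATION = {
    101: "Optimal solution found, but requested accuracy not achieved.",
    102: "Too many major iterations.",
    103: "Current point cannot be improved upon.",
    104: "At least one negative estimate for gamma was obtained. "
         "All negative estimates have been set to zero.",
}

IFAIL_SYSTEM = {
    -99: "An unexpected error has been triggered by this routine.",
    -399: "Your licence key may have expired or may not have been installed correctly.",
    -999: "Dynamic memory allocation failed.",
}


def describe_ifail(ifail):
    """Return (category, message) for a value of ifail returned by the routine."""
    if ifail == 0:
        return ("ok", "no error or warning flagged")
    if ifail in IFAIL_ARGUMENT:
        return ("argument", f"invalid value on entry for {IFAIL_ARGUMENT[ifail]}")
    if ifail in IFAIL_ESTIMATION:
        return ("estimation", IFAIL_ESTIMATION[ifail])
    if ifail in IFAIL_SYSTEM:
        return ("system", IFAIL_SYSTEM[ifail])
    return ("unknown", f"ifail={ifail} is not listed in this section")


category, message = describe_ifail(104)
```

Whether results returned alongside a code in the 101 to 104 range are usable is a judgement for the caller; this section does not state it explicitly.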

7 Accuracy

Not applicable.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.
g02jef is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02jef makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The argument vpr gives the mapping between the random variables and the variance components. In most cases vpr(i)=i, for i=1,2,…,lvpr, where lvpr $= \sum_j$ (rndm(1,j)+rndm(2,j)). However, in some cases it might be necessary to associate more than one random variable with a single variance component, for example, when the columns of dat hold dummy variables.
Consider a dataset with three variables:
$$\mathrm{dat} = \begin{pmatrix} 1 & 1 & 3.6 \\ 2 & 1 & 4.5 \\ 3 & 1 & 1.1 \\ 1 & 2 & 8.3 \\ 2 & 2 & 7.2 \\ 3 & 2 & 6.1 \end{pmatrix}$$
where the first column corresponds to a categorical variable with three levels, the next to a categorical variable with two levels and the last column to a continuous variable. So in a call to g02jcf
$$\mathrm{levels} = \begin{pmatrix} 3 & 2 & 1 \end{pmatrix}.$$
Also assume a model with no fixed effects, no random intercept, no nesting and all three variables being included as random effects, then
$$\mathrm{fixed} = \begin{pmatrix} 0 & 0 \end{pmatrix}; \quad \mathrm{rndm} = \begin{pmatrix} 3 & 0 & 1 & 2 & 3 \end{pmatrix}^{T}.$$
Each of the three columns in dat, therefore, corresponds to a single variable and hence there are three variance components, one for each random variable included in the model, so
$$\mathrm{vpr} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}.$$
This is the recommended way of supplying the data to g02jef. However, it is possible to reformat the above dataset by replacing each of the categorical variables with a series of dummy variables, one for each level. The dataset then becomes
$$\mathrm{dat} = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 3.6 \\ 0 & 1 & 0 & 1 & 0 & 4.5 \\ 0 & 0 & 1 & 1 & 0 & 1.1 \\ 1 & 0 & 0 & 0 & 1 & 8.3 \\ 0 & 1 & 0 & 0 & 1 & 7.2 \\ 0 & 0 & 1 & 0 & 1 & 6.1 \end{pmatrix}$$
where each column only has one level
$$\mathrm{levels} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}.$$
Again a model with no fixed effects, no random intercept, no nesting and all variables being included as random effects is required, so
$$\mathrm{fixed} = \begin{pmatrix} 0 & 0 \end{pmatrix}; \quad \mathrm{rndm} = \begin{pmatrix} 6 & 0 & 1 & 2 & 3 & 4 & 5 & 6 \end{pmatrix}^{T}.$$
With the data entered in this manner, the first three columns of dat correspond to a single variable (the first column of the original dataset), as do the next two columns (the second column of the original dataset). Therefore, vpr must reflect this:
$$\mathrm{vpr} = \begin{pmatrix} 1 & 1 & 1 & 2 & 2 & 3 \end{pmatrix}.$$
In most situations it is more efficient to supply the data to g02jcf in terms of categorical variables rather than transform them into dummy variables.
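The construction of vpr for the dummy-variable layout can be sketched in Python. The helpers below are hypothetical and are given for illustration only. They assume that each original variable contributes one dummy column per level, and they repeat the checks described for ifail=2, ifail=3 and ifail=32.

```python
def lvpr_from_rndm(rndm_columns):
    """Sum rndm(1,i) + rndm(2,i) over the random statements."""
    return sum(col[0] + col[1] for col in rndm_columns)


def vpr_for_dummy_columns(levels_per_variable):
    """Map every dummy column back to the variance component of its source variable."""
    vpr = []
    for component, n_levels in enumerate(levels_per_variable, start=1):
        vpr.extend([component] * n_levels)
    return vpr


def check_vpr(vpr, nvpr):
    """Raise ValueError if vpr and nvpr violate the documented constraints."""
    if not 1 <= nvpr <= len(vpr):
        raise ValueError("nvpr must satisfy 1 <= nvpr <= lvpr (ifail=3)")
    if any(v < 1 or v > nvpr for v in vpr):
        raise ValueError("each vpr(i) must lie between 1 and nvpr (ifail=2)")
    missing = sorted(set(range(1, nvpr + 1)) - set(vpr))
    if missing:
        raise ValueError(f"values {missing} do not appear in vpr (ifail=32)")


# The example above: variables with 3, 2 and 1 levels become 6 dummy columns.
vpr = vpr_for_dummy_columns([3, 2, 1])
print(vpr)  # [1, 1, 1, 2, 2, 3]
check_vpr(vpr, nvpr=3)
print(lvpr_from_rndm([[6, 0, 1, 2, 3, 4, 5, 6]]))  # 6
```

The length of vpr equals lvpr for the dummy-variable form of rndm, while nvpr stays at 3 because the number of variance components is unchanged by the recoding.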

9.1 Internal Changes

Internal changes have been made to this routine as follows:
For details of all known issues which have been reported for the NAG Library please refer to the Known Issues.

10 Example

This example fits a random effects model with three levels of nesting to a simulated dataset with 90 observations and 12 variables.

10.1 Program Text

Program Text (g02jefe.f90)

10.2 Program Data

Program Data (g02jefe.d)

10.3 Program Results

Program Results (g02jefe.r)