NAG Toolbox: nag_correg_mixeff_ml (g02jb)
Purpose
nag_correg_mixeff_ml (g02jb) fits a linear mixed effects regression model using maximum likelihood (ML).
Syntax
[nff, nrf, df, ml, b, se, gamma, warn, ifail] = g02jb(nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, dat, fint, rint, lb, gamma, 'n', n, 'ncol', ncol, 'nfv', nfv, 'nrv', nrv, 'maxit', maxit, 'tol', tol)
[nff, nrf, df, ml, b, se, gamma, warn, ifail] = nag_correg_mixeff_ml(nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, dat, fint, rint, lb, gamma, 'n', n, 'ncol', ncol, 'nfv', nfv, 'nrv', nrv, 'maxit', maxit, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
- At Mark 23: maxit and tol were made optional.
- At Mark 22: n was made optional.
Description
nag_correg_mixeff_ml (g02jb) fits a model of the form:
$$y = X\beta + Z\nu + \epsilon$$
where
- $y$ is a vector of $n$ observations on the dependent variable,
- $X$ is a known $n$ by $p$ design matrix for the fixed independent variables,
- $\beta$ is a vector of length $p$ of unknown fixed effects,
- $Z$ is a known $n$ by $q$ design matrix for the random independent variables,
- $\nu$ is a vector of length $q$ of unknown random effects;
and
- $\epsilon$ is a vector of length $n$ of unknown random errors.
Both $\nu$ and $\epsilon$ are assumed to have a Gaussian distribution with expectation zero and
$$\mathrm{Var}\begin{bmatrix} \nu \\ \epsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$$
where $R = \sigma_R^2 I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix. It is assumed that the random variables, $Z$, can be subdivided into $g \le q$ groups with each group being identically distributed with expectation zero and variance $\sigma_i^2$. The diagonal elements of matrix $G$ therefore take one of the values $\{\sigma_i^2 : i = 1, 2, \ldots, g\}$, depending on which group the associated random variable belongs to.
The model therefore contains three sets of unknowns: the fixed effects, $\beta$, the random effects, $\nu$, and a vector of $g + 1$ variance components, $\gamma$, where $\gamma = \{\sigma_1^2, \sigma_2^2, \ldots, \sigma_g^2, \sigma_R^2\}$. Rather than working directly with $\gamma$, nag_correg_mixeff_ml (g02jb) uses an iterative process to estimate $\gamma^* = \{\sigma_1^2/\sigma_R^2, \sigma_2^2/\sigma_R^2, \ldots, \sigma_g^2/\sigma_R^2, 1\}$. Due to the iterative nature of the estimation a set of initial values, $\gamma_0$, for $\gamma^*$ is required.
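The relationship between the variance components, their scaled form and the diagonal covariance matrix of the random effects can be illustrated with a minimal sketch. It assumes that the scaled components are the ratios of each group variance to the residual variance; the numerical values, the group assignment and all variable names (sigma2, sigmaR2, gammaFull, gammaStar, group) are hypothetical and chosen only for illustration.

```matlab
% Hypothetical variance components for g = 2 groups plus the residual.
sigma2  = [46.8; 11.5];   % group variances (made-up values)
sigmaR2 = 7.0;            % residual variance (made-up value)

gammaFull = [sigma2; sigmaR2];     % the g + 1 variance components
gammaStar = gammaFull / sigmaR2;   % scaled components; last element is 1

% Assign q = 5 random effects to the two groups and build the
% q-by-q diagonal covariance matrix of the random effects.
group = [1; 2; 2; 2; 1];           % assumed group membership of each column of Z
G = diag(sigma2(group));

disp(gammaStar.');
```

Each diagonal element of G is simply the variance of the group to which the corresponding random effect belongs.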
nag_correg_mixeff_ml (g02jb) allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by
Rao (1972).
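nag_correg_mixeff_ml (g02jb) fits the model using a quasi-Newton algorithm to maximize the log-likelihood function:
$$-2 l_R = \log(|V|) + n \log(r^T V^{-1} r) + \log(2\pi/n)$$
where
$$V = Z G Z^T + R, \quad r = y - Xb \quad \text{and} \quad b = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$$
Once the final estimates for $\gamma^*$ have been obtained, the value of $\sigma_R^2$ is given by:
$$\sigma_R^2 = (r^T V^{-1} r)/(n - p).$$
Case weights, $W_c$, can be incorporated into the model by replacing $X^T X$ and $Z^T Z$ with $X^T W_c X$ and $Z^T W_c Z$ respectively, for a diagonal weight matrix $W_c$.
The log-likelihood, $l_R$, is calculated using the sweep algorithm detailed in
Wolfinger et al. (1994).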
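To make the quantities entering the likelihood concrete, the following minimal sketch evaluates the marginal covariance, the generalised least squares estimate of the fixed effects and the full (unprofiled) Gaussian value of minus twice the log-likelihood for fixed variance components. It is a direct dense calculation on made-up data, not the sweep algorithm or the quasi-Newton iteration used by the routine, and every variable name is an assumption introduced for the illustration.

```matlab
% Small synthetic problem: n = 6 observations, p = 2 fixed and q = 3 random effects.
X = [ones(6,1), (1:6).'];             % fixed-effects design matrix
Z = kron(eye(3), ones(2,1));          % random-effects design: 3 groups of 2 rows
y = [1.2; 0.9; 2.1; 2.4; 3.3; 2.9];   % made-up responses

G = 0.5 * eye(3);                     % covariance of the random effects (one group)
R = 0.2 * eye(6);                     % covariance of the errors
V = Z*G*Z.' + R;                      % marginal covariance of y

b = (X.'*(V\X)) \ (X.'*(V\y));        % generalised least squares estimate
r = y - X*b;                          % residuals

Lc = chol(V, 'lower');                % V = Lc*Lc.'
logdetV = 2*sum(log(diag(Lc)));       % log-determinant via the Cholesky factor
m2ll = logdetV + r.'*(V\r) + numel(y)*log(2*pi);   % -2 * Gaussian log-likelihood
fprintf('-2 log-likelihood = %.4f\n', m2ll);
```

Using the Cholesky factor for the log-determinant avoids the overflow and underflow that a direct determinant can suffer for larger problems.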
References
Goodnight J H (1979) A tutorial on the SWEEP operator The American Statistician 33(3) 149–158
Harville D A (1977) Maximum likelihood approaches to variance component estimation and to related problems J. Am. Stat. Assoc. 72 320–340
Rao C R (1972) Estimation of variance and covariance components in a linear model J. Am. Stat. Assoc. 67 112–115
Stroup W W (1989) Predictable functions and prediction space in the mixed model procedure Applications of Mixed Models in Agriculture and Related Disciplines Southern Cooperative Series Bulletin No. 343 39–48
Wolfinger R, Tobias R and Sall J (1994) Computing Gaussian likelihoods and their derivatives for general linear mixed models SIAM Sci. Statist. Comput. 15 1294–1310
Parameters
Compulsory Input Parameters
- 1: nvpr – int64/int32/nag_int scalar
If rint = 1 and svid ≠ 0, nvpr is the number of variance components being estimated minus 2, ($g - 1$), else nvpr = $g$.
If nrv = 0, nvpr is not referenced.
Constraint: if nrv ≠ 0, 1 ≤ nvpr ≤ nrv.
- 2: levels(ncol) – int64/int32/nag_int array
levels(i) contains the number of levels associated with the ith variable of the data matrix dat. If this variable is continuous or binary (i.e., only takes the values zero or one) then levels(i) should be 1; if the variable is discrete then levels(i) is the number of levels associated with it and dat(j,i) is assumed to take the values 1 to levels(i), for j = 1, 2, …, n.
Constraint: levels(i) ≥ 1, for i = 1, 2, …, ncol.
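One way the level counts might be derived from the data is sketched below. It assumes that you already know which columns are discrete (the flag vector isDiscrete is hypothetical) and that every level of each discrete variable occurs at least once, so that the largest observed code equals the number of levels; the toy data are made up.

```matlab
% Toy data matrix: column 1 is continuous, columns 2 and 3 are discrete factors.
dat = [56 1 1;
       50 1 2;
       39 2 1;
       30 2 3];
isDiscrete = [false, true, true];   % assumption: supplied by the user

levels = ones(size(dat, 2), 1);     % continuous or binary variables get 1
levels(isDiscrete) = max(dat(:, isDiscrete), [], 1).';   % largest code per factor
levels = int64(levels);             % integer type expected by the interface

disp(levels.');
```

If some level is absent from the data, the count should be set by hand rather than inferred in this way.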
- 3: yvid – int64/int32/nag_int scalar
The column of dat holding the dependent, $y$, variable.
Constraint: 1 ≤ yvid ≤ ncol.
- 4: fvid(nfv) – int64/int32/nag_int array
The columns of the data matrix dat holding the fixed independent variables with fvid(i) holding the column number corresponding to the ith fixed variable.
Constraint: 1 ≤ fvid(i) ≤ ncol, for i = 1, 2, …, nfv.
- 5: rvid(nrv) – int64/int32/nag_int array
The columns of the data matrix dat holding the random independent variables with rvid(i) holding the column number corresponding to the ith random variable.
Constraint: 1 ≤ rvid(i) ≤ ncol, for i = 1, 2, …, nrv.
- 6: svid – int64/int32/nag_int scalar
The column of dat holding the subject variable.
If svid = 0, no subject variable is used.
Specifying a subject variable is equivalent to specifying the interaction between that variable and all of the random-effects. Letting the notation $Z_1 \times Z_S$ denote the interaction between variables $Z_1$ and $Z_S$, fitting a model with rint = 0, random-effects $Z_1 + Z_2$ and subject variable $Z_S$ is equivalent to fitting a model with random-effects $Z_1 \times Z_S + Z_2 \times Z_S$ and no subject variable. If rint = 1 the model is equivalent to fitting $Z_S + Z_1 \times Z_S + Z_2 \times Z_S$ and no subject variable.
Constraint: 0 ≤ svid ≤ ncol.
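The equivalence between a subject variable and an explicit interaction can be sketched by building the indicator columns by hand. The example below uses a made-up subject variable with two levels and one random factor with three levels; the variable names and the column ordering are illustrative assumptions and are not claimed to match the ordering the routine uses internally.

```matlab
subj = [1; 1; 1; 2; 2; 2];   % subject variable with LS = 2 levels (toy values)
fac  = [1; 2; 3; 1; 2; 3];   % random factor with L = 3 levels (toy values)
LS = 2;
L  = 3;

% Indicator columns for the subject variable itself (the random intercept part).
ZS = double(bsxfun(@eq, subj, 1:LS));

% One code per (subject level, factor level) pair, then its indicator columns.
cellCode = (subj - 1)*L + fac;
ZSxF = double(bsxfun(@eq, cellCode, 1:LS*L));

% With a random intercept, the explicit design is the subject columns
% followed by the subject-by-factor interaction columns.
Z = [ZS, ZSxF];
disp(size(Z));
```

Supplying the subject variable instead of these explicit columns keeps dat small and lets the block structure be exploited.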
- 7: cwid – int64/int32/nag_int scalar
The column of dat holding the case weights.
If cwid = 0, no weights are used.
Constraint: 0 ≤ cwid ≤ ncol.
- 8: vpr(nrv) – int64/int32/nag_int array
vpr(i) holds a flag indicating the variance of the ith random variable. The variance of the ith random variable is $\sigma_j^2$, where j = vpr(i) + 1 if rint = 1 and svid ≠ 0 and j = vpr(i) otherwise. Random variables with the same value of j are assumed to be taken from the same distribution.
Constraint: 1 ≤ vpr(i) ≤ nvpr, for i = 1, 2, …, nrv.
- 9: dat(lddat, ncol) – double array
lddat, the first dimension of the array, must satisfy the constraint lddat ≥ n.
Array containing all of the data. For the ith observation:
- dat(i, yvid) holds the dependent variable, $y$;
- if cwid ≠ 0, dat(i, cwid) holds the case weights;
- if svid ≠ 0, dat(i, svid) holds the subject variable.
The remaining columns hold the values of the independent variables.
Constraints:
- if cwid ≠ 0, dat(i, cwid) ≥ 0.0;
- if levels(j) ≠ 1, 1 ≤ dat(i, j) ≤ levels(j).
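A pre-flight check of the data matrix can catch problems before the routine is called. The sketch below assumes that case weights must be non-negative and that every discrete column must hold integer codes between 1 and its declared number of levels; the toy data and the names okWeights and okLevels are hypothetical.

```matlab
% Toy data: response, subject, factor, case weight.
dat = [56 1 1 0.5;
       50 1 2 1.0;
       39 2 3 2.0];
levels = [1; 2; 3; 1];   % declared number of levels for each column
cwid = 4;                % column holding the case weights (0 would mean none)

% Weights, when present, are assumed to be non-negative.
okWeights = (cwid == 0) || all(dat(:, cwid) >= 0);

% Every discrete column must contain integer codes in 1..levels(j).
okLevels = true;
for j = find(levels ~= 1).'
    col = dat(:, j);
    okLevels = okLevels && all(col >= 1 & col <= levels(j) & col == round(col));
end

fprintf('weights ok: %d, levels ok: %d\n', okWeights, okLevels);
```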
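- 10: fint – int64/int32/nag_int scalar
Flag indicating whether a fixed intercept is included (fint = 1).
Constraint: fint = 0 or 1.
- 11: rint – int64/int32/nag_int scalar
Flag indicating whether a random intercept is included (rint = 1).
If svid = 0, rint is not referenced.
Constraint: rint = 0 or 1.
- 12: lb – int64/int32/nag_int scalar
Constraint:
$$\mathrm{lb} \ge \mathrm{fint} + \sum_{i=1}^{\mathrm{nfv}} \max(\mathrm{levels}(\mathrm{fvid}(i)) - 1, 1) + L_S \times \Big(\mathrm{rint} + \sum_{i=1}^{\mathrm{nrv}} \mathrm{levels}(\mathrm{rvid}(i))\Big)$$
where $L_S$ = levels(svid) if svid ≠ 0 and $L_S = 1$ otherwise.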
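A smallest admissible value for lb can be computed before the call. The sketch below assumes the bound counts one coefficient per non-reference level of each discrete fixed variable (one coefficient for a continuous one), plus the intercept, plus one block of random coefficients per subject level; this agrees with the way lb is computed in the example program for this routine. The settings are those of that example, and the names nFixed, nRandom and lbMin are hypothetical.

```matlab
levels = [1; 4; 3; 2; 3];   % levels of the five data columns in the example
fvid = [3; 4; 5];           % fixed variables
rvid = 3;                   % random variable
svid = 2;                   % subject variable
fint = 1;
rint = 1;

% Fixed part: intercept plus (levels - 1) coefficients per discrete variable,
% or a single coefficient for a continuous variable.
nFixed = fint + sum(max(levels(fvid) - 1, 1));

% Random part: one block per subject level (a single block without a subject).
if svid ~= 0
    LS = levels(svid);
else
    LS = 1;
end
nRandom = LS * (rint + sum(levels(rvid)));

lbMin = nFixed + nRandom;
fprintf('minimum lb = %d\n', lbMin);
```

For the example settings this gives 6 fixed and 16 random coefficients, matching the number of estimates printed in the example results.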
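- 13: gamma(nvpr + 2) – double array
Holds the initial values of the variance components, $\gamma_0$, with gamma(i) the initial value for $\sigma_i^2/\sigma_R^2$, for i = 1, 2, …, $g$. If rint = 1 and svid ≠ 0, $g$ = nvpr + 1, else $g$ = nvpr.
If gamma(1) = −1.0, the remaining elements of gamma are ignored and the initial values for the variance components are estimated from the data using MIVQUE0.
Constraint: gamma(1) = −1.0 or gamma(i) ≥ 0.0, for i = 1, 2, …, $g$.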
Optional Input Parameters
- 1: n – int64/int32/nag_int scalar
Default: the first dimension of the array dat.
n, the number of observations.
Constraint: n ≥ 1.
- 2: ncol – int64/int32/nag_int scalar
Default: the dimension of the array levels and the second dimension of the array dat. (An error is raised if these dimensions are not equal.)
The number of columns in the data matrix, dat.
Constraint: ncol ≥ 1.
- 3: nfv – int64/int32/nag_int scalar
Default: the dimension of the array fvid.
The number of independent variables in the model which are to be treated as being fixed.
Constraint: 0 ≤ nfv < ncol.
- 4: nrv – int64/int32/nag_int scalar
Default: the dimension of the arrays rvid, vpr. (An error is raised if these dimensions are not equal.)
The number of independent variables in the model which are to be treated as being random.
- 5: maxit – int64/int32/nag_int scalar
Default: −1
The maximum number of iterations.
If maxit < 0, the default value of 100 is used.
If maxit = 0, the parameter estimates $(\beta, \nu)$ and corresponding standard errors are calculated based on the value of $\gamma_0$ supplied in gamma.
- 6: tol – double scalar
Default: 0.0
The tolerance used to assess convergence.
If tol ≤ 0.0, the default value of $\epsilon^{0.7}$ is used, where $\epsilon$ is the machine precision.
Output Parameters
- 1: nff – int64/int32/nag_int scalar
The number of fixed effects estimated (i.e., the number of columns, $p$, in the design matrix $X$).
- 2: nrf – int64/int32/nag_int scalar
The number of random effects estimated (i.e., the number of columns, $q$, in the design matrix $Z$).
- 3: df – int64/int32/nag_int scalar
The degrees of freedom.
- 4: ml – double scalar
$-2 l_R(\hat{\gamma})$ where $l_R$ is the log of the maximum likelihood calculated at $\hat{\gamma}$, the estimated variance components returned in gamma.
- 5: b(lb) – double array
The parameter estimates, $(\beta, \nu)$, with the first nff elements of b containing the fixed effect parameter estimates, $\beta$, and the next nrf elements of b containing the random effect parameter estimates, $\nu$.
Fixed effects
If fint = 1, b(1) contains the estimate of the fixed intercept. Let $L_i$ denote the number of levels associated with the ith fixed variable, that is $L_i$ = levels(fvid(i)). Define
- if fint = 1, $F_1 = 2$, else if fint = 0, $F_1 = 1$;
- $F_{i+1} = F_i + \max(L_i - 1, 1)$, $i \ge 1$.
Then for i = 1, 2, …, nfv:
- if $L_i > 1$, b($F_i + j - 2$) contains the parameter estimate for the jth level of the ith fixed variable, for j = 2, 3, …, $L_i$;
- if $L_i \le 1$, b($F_i$) contains the parameter estimate for the ith fixed variable.
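The layout of the fixed effects in b can be sketched as a running start index for each fixed variable. The code below assumes that a discrete variable with more than one level contributes one coefficient per level other than the first and that a continuous variable contributes one coefficient, which is how the example program for this routine steps through b. The names Lfix and F are hypothetical.

```matlab
fint = 1;
Lfix = [3; 2; 3];                 % levels of the three fixed variables in the example

F = zeros(numel(Lfix) + 1, 1);    % F(i) is the first element of b used by variable i
F(1) = 1 + fint;                  % b(1) is taken by the intercept when fint = 1
for i = 1:numel(Lfix)
    F(i+1) = F(i) + max(Lfix(i) - 1, 1);
end
nff = F(end) - 1;                 % total number of fixed effects

for i = 1:numel(Lfix)
    fprintf('fixed variable %d occupies b(%d:%d)\n', i, F(i), F(i+1) - 1);
end
```

With these settings the three variables occupy b(2:3), b(4:4) and b(5:6), in agreement with the six fixed effects reported in the example results.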
Random effects
Redefining
to denote the number of levels associated with the
th random variable, that is
. Define
- if , else if , ;
, .
Then for
:
- if ,
- if ,
contains the parameter estimate for the th level of the th random variable, for ;
- if , contains the parameter estimate for the th random variable;
- if ,
- let denote the number of levels associated with the subject variable, that is ;
- if ,
contains the parameter estimate for the interaction between the th level of the subject variable and the th level of the th random variable, for and ;
- if ,
contains the parameter estimate for the interaction between the th level of the subject variable and the th random variable, for ;
- if , contains the estimate of the random intercept.
- 6: se(lb) – double array
The standard errors of the parameter estimates given in b.
- 7: gamma(nvpr + 2) – double array
gamma(i), for i = 1, 2, …, $g$, holds the final estimate of $\sigma_i^2$ and gamma($g + 1$) holds the final estimate for $\sigma_R^2$.
- 8: warn – int64/int32/nag_int scalar
Is set to 1 if a variance component was estimated to be a negative value during the fitting process. Otherwise warn is set to 0.
If warn = 1, the negative estimate is set to zero and the estimation process allowed to continue.
- 9: ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
- ifail = 1
On entry, | , |
or | , |
or | , |
or | or , |
or | or , |
or | or , |
or | and , |
or | or or , |
or | or , |
or | and , |
or | or , |
or | lb is too small. |
- ifail = 2
On entry, | , for at least one , |
or | , or , for at least one , |
or | , or , for at least one , |
or | or , for at least one , |
or | at least one discrete variable in array dat has a value greater than that specified in levels, |
or | , for at least one , and . |
- ifail = 3
Degrees of freedom < 1. The number of parameters exceeds the effective number of observations.
- ifail = 4
The function failed to converge to the specified tolerance in maxit iterations. See Further Comments for advice.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
Accuracy
The accuracy of the results can be adjusted through the use of the
tol argument.
Further Comments
Wherever possible any block structure present in the design matrix $Z$ should be modelled through a subject variable, specified via svid, rather than being explicitly entered into dat.
nag_correg_mixeff_ml (g02jb) uses an iterative process to fit the specified model and for some problems this process may fail to converge (see ifail = 4). If the function fails to converge then the maximum number of iterations (see maxit) or tolerance (see tol) may require increasing, or a different starting estimate may be supplied in gamma. Alternatively, the model can be fit using restricted maximum likelihood (see nag_correg_mixeff_reml (g02ja)) or using the noniterative MIVQUE0.
To fit the model just using MIVQUE0, the first element of gamma should be set to −1.0 and maxit should be set to zero.
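A MIVQUE0-only fit might be requested as sketched below. The sketch assumes that the value −1.0 in the first element of gamma is what triggers the MIVQUE0 starting values and that gamma has nvpr + 2 elements, as in the example program. The call is attempted only when the NAG Toolbox is on the path and the remaining inputs (levels, yvid, fvid, rvid, svid, cwid, vpr, dat, fint, rint and lb) have already been defined, for instance as in the example program; the name gamma0 is hypothetical.

```matlab
nvpr = int64(1);

% First element -1 asks for MIVQUE0 starting values; the rest are ignored.
gamma0 = [-1; zeros(double(nvpr) + 1, 1)];

% Zero iterations: the estimates are based on the starting values alone.
maxit = int64(0);

% Guard the call so the snippet is harmless without the toolbox or the data.
if exist('g02jb', 'file') && exist('dat', 'var')
    [nff, nrf, df, ml, b, se, gamma, warn, ifail] = ...
        g02jb(nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, ...
              dat, fint, rint, lb, gamma0, 'maxit', maxit);
end
```

The same pattern, with a larger maxit or a user-supplied gamma0, can be used to retry a fit that stopped with a convergence failure.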
Although the quasi-Newton algorithm used in
nag_correg_mixeff_ml (g02jb) tends to require more iterations before converging than the Newton–Raphson algorithm recommended by
Wolfinger et al. (1994), it does not require the second derivatives of the likelihood function to be calculated and consequently takes significantly less time per iteration.
Example
The following dataset is taken from
Stroup (1989) and arises from a balanced split-plot design with the whole plots arranged in a randomized complete block-design.
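In this example the full design matrix for the random independent variable, $Z$, is given by:
$$Z = \begin{pmatrix} A & 0 & 0 & 0 \\ 0 & A & 0 & 0 \\ 0 & 0 & A & 0 \\ 0 & 0 & 0 & A \end{pmatrix} \qquad (1)$$
where
$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.$$
The block structure evident in (1) is modelled by specifying a four-level subject variable, taking the values $\{1, 2, 3, 4\}$. The first column of $A$ is added to $Z$ by setting rint = 1. The remaining columns of $A$ are specified by a three-level factor, taking the values $\{1, 2, 3\}$.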
function g02jb_example

fprintf('g02jb example results\n\n');

dat = [56, 1, 1, 1, 1;
       50, 1, 2, 1, 1;
       39, 1, 3, 1, 1;
       30, 2, 1, 1, 1;
       36, 2, 2, 1, 1;
       33, 2, 3, 1, 1;
       32, 3, 1, 1, 1;
       31, 3, 2, 1, 1;
       15, 3, 3, 1, 1;
       30, 4, 1, 1, 1;
       35, 4, 2, 1, 1;
       17, 4, 3, 1, 1;
       41, 1, 1, 2, 1;
       36, 1, 2, 2, 2;
       35, 1, 3, 2, 3;
       25, 2, 1, 2, 1;
       28, 2, 2, 2, 2;
       30, 2, 3, 2, 3;
       24, 3, 1, 2, 1;
       27, 3, 2, 2, 2;
       19, 3, 3, 2, 3;
       25, 4, 1, 2, 1;
       30, 4, 2, 2, 2;
       18, 4, 3, 2, 3];

[n,ncol] = size(dat);
levels = [int64(1);4;3;2;3];
yvid = int64(1);
fvid = [int64(3); 4; 5];
rvid = [int64(3)];
svid = int64(2);
cwid = int64(0);
fint = int64(1);
rint = int64(1);
lb = (rint + sum(levels(rvid)))*prod(levels(svid)) + ...
     fint + sum(levels(fvid)) - numel(fvid);
vpr = [int64(1)];
nvpr = int64(numel(vpr));
gamma = [1; 1; 0];

[nff, nrf, df, ml, b, se, gamma, warn, ifail] = ...
  g02jb( ...
         nvpr, levels, yvid, fvid, rvid, svid, cwid, vpr, ...
         dat, fint, rint, lb, gamma);

if warn
  fprintf(['Warning: At least one variance component was ', ...
           'estimated to be negative and then reset to zero\n\n']);
end

fprintf('Fixed effects (Estimate and Standard Deviation)\n\n');
k = 1;
if fint==1
  fprintf('Intercept%15s%10.4f%10.4f\n', ' ',b(k), se(k));
  k = k + 1;
end
for i = 1:numel(fvid)
  for j = 1:levels(fvid(i))
    if levels(fvid(i))==1 || j>1
      fprintf('Variable %3d Level %3d: %10.4f%10.4f\n', i, j, b(k), se(k));
      k = k + 1;
    end
  end
end

fprintf('\nRandom Effects (Estimate and Standard Deviation\n\n');
if svid==0
  for i = 1:numel(rvid)
    for j = 1:levels(rvid(i))
      fprintf('Variable %4d Level %4d: %10.4f %10.4f\n', i, j, b(k), se(k));
      k = k + 1;
    end
  end
else
  for l = 1:levels(svid)
    if (rint==1)
      fprintf('Intercept for Subject Level %4d:%12s%10.4f%10.4f\n', ...
              l, ' ', b(k), se(k));
      k = k + 1;
    end
    for i = 1:numel(rvid)
      for j = 1:levels(rvid(i))
        fprintf('Subject Level %4d Variable %4d Level %4d: %10.4f%10.4f\n', ...
                l, i, j, b(k), se(k));
        k = k + 1;
      end
    end
  end
end

fprintf('\n Variance Components\n');
for i = 1:nvpr+rint
  fprintf('%4d%10.4f\n',i,gamma(i));
end
fprintf('\nsigma^2 = %10.4f\n', gamma(nvpr+rint+1));
fprintf('-2log likelihood = %10.4f\n', ml);
fprintf('DF = %16d\n', df);
g02jb example results

Fixed effects (Estimate and Standard Deviation)

Intercept                  37.0000    4.0421
Variable   1 Level   2:     1.0000    3.0461
Variable   1 Level   3:   -11.0000    3.0461
Variable   2 Level   2:    -8.2500    1.8736
Variable   3 Level   2:     0.5000    2.6497
Variable   3 Level   3:     7.7500    2.6497

Random Effects (Estimate and Standard Deviation

Intercept for Subject Level    1:               10.7631    3.8855
Subject Level    1 Variable    1 Level    1:     3.7276    2.6268
Subject Level    1 Variable    1 Level    2:    -1.4476    2.6268
Subject Level    1 Variable    1 Level    3:     0.3733    2.6268
Intercept for Subject Level    2:               -0.5269    3.8855
Subject Level    2 Variable    1 Level    1:    -3.7171    2.6268
Subject Level    2 Variable    1 Level    2:    -1.2253    2.6268
Subject Level    2 Variable    1 Level    3:     4.8125    2.6268
Intercept for Subject Level    3:               -5.6450    3.8855
Subject Level    3 Variable    1 Level    1:     0.5903    2.6268
Subject Level    3 Variable    1 Level    2:     0.3987    2.6268
Subject Level    3 Variable    1 Level    3:    -2.3806    2.6268
Intercept for Subject Level    4:               -4.5912    3.8855
Subject Level    4 Variable    1 Level    1:    -0.6009    2.6268
Subject Level    4 Variable    1 Level    2:     2.2742    2.6268
Subject Level    4 Variable    1 Level    3:    -2.8052    2.6268

 Variance Components
   1   46.7969
   2   11.5365

sigma^2 =     7.0208
-2log likelihood =   141.6877
DF =               16
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015