The routine may be called by the names g02jcf or nagf_correg_mixeff_hier_init.
3 Description
g02jcf must be called prior to fitting a linear mixed effects regression model with either g02jdf or g02jef.
The model fitting routines g02jdf and g02jef fit a model of the following form:
$$y = X\beta + Z\nu + \epsilon$$
where
$y$ is a vector of $n$ observations on the dependent variable,
$X$ is an $n \times p$ design matrix of fixed independent variables,
$\beta$ is a vector of $p$ unknown fixed effects,
$Z$ is an $n \times q$ design matrix of random independent variables,
$\nu$ is a vector of length $q$ of unknown random effects,
$\epsilon$ is a vector of length $n$ of unknown random errors,
and $\nu$ and $\epsilon$ are Normally distributed with expectation zero and variance/covariance matrix defined by
$$\mathrm{Var}\begin{bmatrix} \nu \\ \epsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$$
where
$R = \sigma_R^2 I$,
$I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix.
Case weights, $W_c$, can be incorporated into the model by replacing $X^{T}X$ and $Z^{T}Z$
with $X^{T}W_cX$ and $Z^{T}W_cZ$ respectively where $W_c$ is a diagonal weight matrix.
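The effect of a diagonal case-weight matrix on a cross-product can be sketched in a few lines. This is an illustration only, not the routine's implementation: `weighted_crossprod` is a hypothetical helper, the design matrix is stored as a list of rows, and the weights are the diagonal of the weight matrix.

```python
def weighted_crossprod(a, w):
    """Return A^T W A for a matrix `a` (list of rows) and diagonal weights `w`.

    Hypothetical helper for illustration; W is the diagonal matrix diag(w).
    """
    n = len(a)
    m = len(a[0]) if n else 0
    return [
        [sum(w[i] * a[i][r] * a[i][c] for i in range(n)) for c in range(m)]
        for r in range(m)
    ]


# Two observations, two columns (say, an intercept and one covariate).
x = [[1.0, 2.0], [1.0, 3.0]]
unweighted = weighted_crossprod(x, [1.0, 1.0])   # ordinary X^T X
second_dropped = weighted_crossprod(x, [1.0, 0.0])  # zero weight removes a row
```

An observation with zero weight contributes nothing to the cross-product, which is consistent with such observations being excluded from the effective number of observations.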
4 References
None.
5 Arguments
1: weight – Character(1) Input
On entry: indicates if weights are to be used.
weight = 'U'
No weights are used.
weight = 'W'
Case weights are used and must be supplied in array wt.
Constraint:
weight = 'U' or 'W'.
2: n – Integer Input
On entry: $n$, the number of observations.
The effective number of observations, that is the number of observations with nonzero weight (see wt for more detail), must be greater than the number of fixed effects in the model (as returned in nff).
Constraint:
$\mathrm{n} \ge 1$.
3: ncol – Integer Input
On entry: the number of columns in the data matrix, dat.
Constraint:
$\mathrm{ncol} \ge 0$.
4: dat(lddat, *) – Real (Kind=nag_wp) array Input
Note: the second dimension of the array dat must be at least ncol.
On entry: a matrix of data, with $\mathrm{dat}(i,j)$ holding the $i$th observation on the $j$th variable. The two design matrices $X$ and $Z$ are constructed from dat and the information given in fixed (for $X$) and rndm (for $Z$).
Constraint:
if $\mathrm{levels}(j) \ne 1$, $1 \le \mathrm{dat}(i,j) \le \mathrm{levels}(j)$.
5: lddat – Integer Input
On entry: the first dimension of the array dat as declared in the (sub)program from which g02jcf is called.
Constraint:
$\mathrm{lddat} \ge \mathrm{n}$.
6: levels(ncol) – Integer array Input
On entry: $\mathrm{levels}(i)$ contains the number of levels associated with the $i$th variable held in dat.
If the $i$th variable is continuous or binary (i.e., only takes the values zero or one), then $\mathrm{levels}(i)$ must be set to $1$. Otherwise the $i$th variable is assumed to take an integer value between $1$ and $\mathrm{levels}(i)$ (i.e., the $i$th variable is discrete with $\mathrm{levels}(i)$ levels).
Constraint:
$\mathrm{levels}(i) \ge 1$, for $i = 1, 2, \ldots, \mathrm{ncol}$.
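The relationship between the data matrix and the per-variable level counts can be checked before calling the routine. The sketch below is a hypothetical pre-check, not part of the library. It assumes the convention that a level count of 1 marks a continuous or binary variable and that a discrete variable with L levels takes integer values from 1 to L. `check_levels` and its arguments are illustrative names, with the data held as a list of rows.

```python
def check_levels(dat, levels):
    """Return 1-based (observation, variable, value) triples that break the
    assumed rule: discrete variables hold integers in 1..levels[j]."""
    for j, nlev in enumerate(levels, start=1):
        if nlev < 1:
            raise ValueError(f"levels({j}) = {nlev} must be at least 1")
    problems = []
    for i, row in enumerate(dat, start=1):
        for j, (value, nlev) in enumerate(zip(row, levels), start=1):
            if nlev == 1:
                continue  # continuous or binary: value is not range-checked
            if value != int(value) or not 1 <= value <= nlev:
                problems.append((i, j, value))
    return problems


# Variable 1 is continuous, variable 2 is discrete with three levels.
data = [[0.5, 1.0], [2.3, 3.0], [1.1, 4.0]]
bad = check_levels(data, [1, 3])  # the third observation has level 4 > 3
```

Reporting every offending entry at once, rather than stopping at the first, tends to be more convenient when cleaning a data set.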
7: y(n) – Real (Kind=nag_wp) array Input
On entry: $y$, the vector of observations on the dependent variable.
8: wt(*) – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least n if weight = 'W'.
On entry: if weight = 'W', wt must contain the diagonal elements of the weight matrix $W_c$.
If $\mathrm{wt}(i) = 0.0$, the $i$th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.
If weight = 'U', wt is not referenced and the effective number of observations is $n$.
Constraint:
if weight = 'W', $\mathrm{wt}(i) \ge 0.0$, for $i = 1, 2, \ldots, n$.
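The effective number of observations described above can be computed as follows. This is a minimal sketch under stated assumptions: `effective_n` is a hypothetical helper, and the flag values 'U' (unweighted) and 'W' (weighted) are assumed for the weight indicator.

```python
def effective_n(n, weight, wt=None):
    """Effective number of observations for a given weighting scheme.

    Assumes weight is 'U' (no weights) or 'W' (case weights in wt).
    """
    if weight == "U":
        return n  # wt is not referenced
    if weight != "W":
        raise ValueError("weight must be 'U' or 'W'")
    if wt is None or len(wt) < n:
        raise ValueError("wt must hold at least n weights")
    if any(w < 0.0 for w in wt[:n]):
        raise ValueError("weights must be non-negative")
    # Observations with zero weight are excluded from the model.
    return sum(1 for w in wt[:n] if w != 0.0)


n_eff = effective_n(4, "W", [1.0, 0.0, 2.5, 0.0])
```

The result can then be compared with the number of fixed effects, which the effective number of observations must exceed.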
9: fixed(lfixed) – Integer array Input
On entry: defines the structure of the fixed effects design matrix, $X$.
$\mathrm{fixed}(1)$
The number of variables, $N_F$, to include as fixed effects (not including the intercept if present).
$\mathrm{fixed}(2)$
The fixed intercept flag which must contain $1$ if a fixed intercept is to be included and $0$ otherwise.
$\mathrm{fixed}(2+i)$
The column of dat holding the $i$th fixed variable, for $i = 1, 2, \ldots, \mathrm{fixed}(1)$.
See Section 9.1 for more details on the construction of $X$.
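Packing the fixed effects description into an integer array can be sketched as below. The layout assumed here is: first the number of fixed variables, then the intercept flag (1 or 0), then the 1-based columns of the data matrix holding each variable. `make_fixed` is a hypothetical helper, not a library routine.

```python
def make_fixed(columns, intercept, ncol):
    """Build the integer array describing the fixed effects.

    columns   : 1-based data-matrix columns of the fixed variables, in model order
    intercept : True if a fixed intercept is wanted
    ncol      : number of columns in the data matrix (used for range checking)
    """
    for c in columns:
        if not 1 <= c <= ncol:
            raise ValueError(f"column index {c} is outside 1..{ncol}")
    return [len(columns), 1 if intercept else 0, *columns]


# Two fixed variables held in columns 3 and 5 of the data, plus an intercept.
fixed = make_fixed([3, 5], intercept=True, ncol=6)
```

The order of `columns` matters, because the variables enter the design matrix in the order in which they are listed.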
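10: lfixed – Integer Input
On entry: length of the vector fixed.
Constraint:
$\mathrm{lfixed} \ge 2 + \mathrm{fixed}(1)$.
11: nrndm – Integer Input
On entry: the number of columns in rndm.
Constraint:
$\mathrm{nrndm} > 0$.
12: rndm(ldrndm, *) – Integer array Input
Note: the second dimension of the array rndm must be at least nrndm.
On entry: $\mathrm{rndm}(i,j)$ defines the structure of the random effects design matrix, $Z$. The $b$th column of rndm defines a block of columns in the design matrix $Z$.
$\mathrm{rndm}(1,b)$
The number of variables, $N_{R_b}$, to include as random effects in the $b$th block (not including the random intercept if present).
$\mathrm{rndm}(2,b)$
The random intercept flag which must contain $1$ if block $b$ includes a random intercept and $0$ otherwise.
$\mathrm{rndm}(2+i,b)$
The column of dat holding the $i$th random variable in the $b$th block, for $i = 1, 2, \ldots, \mathrm{rndm}(1,b)$.
$\mathrm{rndm}(3+N_{R_b},b)$
The number of subject variables, $N_{S_b}$, for the $b$th block. The subject variables define the nesting structure for this block.
$\mathrm{rndm}(3+N_{R_b}+i,b)$
The column of dat holding the $i$th subject variable in the $b$th block, for $i = 1, 2, \ldots, \mathrm{rndm}(3+N_{R_b},b)$.
See Section 9.2 for more details on the construction of $Z$.
Constraints:
$\mathrm{rndm}(1,b) \ge 0$;
$\mathrm{rndm}(2,b) = 0$ or $1$;
at least one random variable or random intercept must be specified in each block, i.e., $\mathrm{rndm}(1,b) + \mathrm{rndm}(2,b) > 0$;
the column identifiers associated with the random variables must be in the range $1$ to ncol, i.e., $1 \le \mathrm{rndm}(2+i,b) \le \mathrm{ncol}$, for $i = 1, 2, \ldots, \mathrm{rndm}(1,b)$;
$\mathrm{rndm}(3+N_{R_b},b) \ge 0$;
the column identifiers associated with the subject variables must be in the range $1$ to ncol, i.e., $1 \le \mathrm{rndm}(3+N_{R_b}+i,b) \le \mathrm{ncol}$, for $i = 1, 2, \ldots, \mathrm{rndm}(3+N_{R_b},b)$.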
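One way to assemble the columns of rndm while enforcing the constraints listed above is sketched here. The layout assumed for each column is: the number of random variables, the intercept flag, the random variable columns, the number of subject variables, then the subject variable columns. `make_rndm_column` and `make_rndm` are hypothetical helpers. Padding short columns with zeros is an assumption made for the sketch, not documented behaviour.

```python
def make_rndm_column(random_cols, intercept, subject_cols, ncol):
    """Pack one random statement (one column of rndm) as a list of integers."""
    if not random_cols and not intercept:
        raise ValueError("each block needs a random variable or an intercept")
    for c in list(random_cols) + list(subject_cols):
        if not 1 <= c <= ncol:
            raise ValueError(f"column index {c} is outside 1..{ncol}")
    return [len(random_cols), 1 if intercept else 0, *random_cols,
            len(subject_cols), *subject_cols]


def make_rndm(blocks, ncol):
    """Return (ldrndm, columns), with every column zero-padded to ldrndm.

    blocks is a sequence of (random_cols, intercept, subject_cols) tuples.
    """
    cols = [make_rndm_column(r, i, s, ncol) for r, i, s in blocks]
    ldrndm = max(len(c) for c in cols)  # 3 + N_R + N_S for the longest block
    return ldrndm, [c + [0] * (ldrndm - len(c)) for c in cols]


# Block 1: random intercept and slope (data column 4), nested in columns 2 and 1.
# Block 2: random intercept only, nested in column 1.
ldrndm, rndm = make_rndm([([4], True, [2, 1]), ([], True, [1])], ncol=5)
```

Each column is held as a separate list here. A Fortran caller would store the same values column by column in a two-dimensional integer array.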
13: ldrndm – Integer Input
On entry: the first dimension of the array rndm as declared in the (sub)program from which g02jcf is called.
Constraint:
$\mathrm{ldrndm} \ge \max_b (N_{R_b} + N_{S_b} + 3)$.
14: nff – Integer Output
On exit: $p$, the number of fixed effects estimated, i.e., the number of columns in the design matrix $X$.
15: nlsv – Integer Output
On exit: the number of levels for the overall subject variable (see Section 9.2 for a description of what this means). If there is no overall subject variable, $\mathrm{nlsv} = 1$.
16: nrf – Integer Output
On exit: the number of random effects estimated in each of the overall subject blocks. The number of columns in the design matrix $Z$ is given by $q = \mathrm{nrf} \times \mathrm{nlsv}$.
17: rcomm(lrcomm) – Real (Kind=nag_wp) array Communication Array
On exit: communication array as required by the analysis routines g02jdf and g02jef.
18: lrcomm – Integer Input
On entry: the dimension of the array rcomm as declared in the (sub)program from which g02jcf is called.
Constraint:
.
19: icomm(licomm) – Integer array Communication Array
On exit: if $\mathrm{licomm} = 2$, $\mathrm{icomm}(1)$ holds the minimum required value for licomm and $\mathrm{icomm}(2)$ holds the minimum required value for lrcomm, otherwise icomm is a communication array as required by the analysis routines g02jdf and g02jef.
20: licomm – Integer Input
On entry: the dimension of the array icomm as declared in the (sub)program from which g02jcf is called.
Constraint:
or
where
,
,
,
,
, and
21: ifail – Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $-1$ is recommended. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.
On exit: $\mathrm{ifail} = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry $\mathrm{ifail} = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
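On entry, weight had an illegal value.
Constraint: weight = 'U' or 'W'.
On entry, $\mathrm{n} = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{n} \ge 1$.
On entry, $\mathrm{ncol} = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{ncol} \ge 0$.
On entry, variable $j$ of observation $i$ is less than $1$ or greater than $\mathrm{levels}(j)$: $i = \langle\mathit{value}\rangle$, $j = \langle\mathit{value}\rangle$, value $= \langle\mathit{value}\rangle$, $\mathrm{levels}(j) = \langle\mathit{value}\rangle$.
On entry, $\mathrm{lddat} = \langle\mathit{value}\rangle$ and $\mathrm{n} = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{lddat} \ge \mathrm{n}$.
On entry, $\mathrm{levels}(\langle\mathit{value}\rangle) = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{levels}(i) \ge 1$.
On entry, $\mathrm{wt}(\langle\mathit{value}\rangle) = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{wt}(i) \ge 0.0$.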
On entry, number of fixed parameters, is less than zero.
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, number of random parameters for random statement is less than : , number of parameters .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, more fixed factors than observations, .
Constraint: .
On entry, no observations due to zero weights.
On entry, invalid value for fixed intercept flag: value .
On entry, invalid value for random intercept flag for random statement : , value .
On entry, index of fixed variable is less than or greater than : , index and .
On entry, must be at least one parameter, or an intercept in each random statement : .
On entry, index of random variable in random statement is less than or greater than : , , index and .
On entry, number of subject parameters for random statement is less than : , number of parameters .
On entry, nesting variable in random statement has one level: , .
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7 Accuracy
Not applicable.
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g02jcf is not threaded in any implementation.
9 Further Comments
9.1 Construction of the fixed effects design matrix, $X$
Let
$N_F$ denote the number of fixed variables, that is $N_F = \mathrm{fixed}(1)$;
$F_j$ denote the $j$th fixed variable, that is the vector of values held in the $k$th column of dat when $\mathrm{fixed}(2+j) = k$;
$F_{ij}$ denote the $i$th element of $F_j$;
$L(F_j)$ denote the number of levels for $F_j$, that is $L(F_j) = \mathrm{levels}(\mathrm{fixed}(2+j))$;
$D_v(F_j)$ denote an indicator function that returns a vector of values whose $i$th element is $1$ if $F_{ij} = v$ and $0$ otherwise.
The design matrix for the fixed effects, $X$, is constructed as follows:
set $k$ to one and the flag done_first to false;
if a fixed intercept is included, that is $\mathrm{fixed}(2) = 1$,
    set the first column of $X$ to a vector of $1$s;
    set $k = k + 1$;
    set done_first to true;
loop over each fixed variable, so for each $j = 1, 2, \ldots, N_F$,
    if $L(F_j) = 1$,
        set the $k$th column of $X$ to be $F_j$;
        set $k = k + 1$;
    else
        if done_first is false then
            set the $L(F_j)$ columns, $k$ to $k + L(F_j) - 1$, of $X$ to $D_v(F_j)$, for $v = 1, 2, \ldots, L(F_j)$;
            set $k = k + L(F_j)$;
            set done_first to true;
        else
            set the $L(F_j) - 1$ columns, $k$ to $k + L(F_j) - 2$, of $X$ to $D_v(F_j)$, for $v = 2, 3, \ldots, L(F_j)$;
            set $k = k + L(F_j) - 1$.
The number of columns in the design matrix, $X$, is, therefore, given by $p = k - 1$, where $k$ is the value held on completion of the loop.
In summary, g02jcf converts all non-binary categorical variables (i.e., where $L(F_j) > 1$) to dummy variables. If a fixed intercept is included in the model then the first level of all such variables is dropped. If a fixed intercept is not included in the model then the first level of all such variables, other than the first, is dropped. The variables are added into the model in the order they are specified in fixed.
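The dummy-variable rules summarised above can be made concrete with a short sketch. This is an illustration of the described construction, not the library's code. `build_fixed_design` is a hypothetical function, the data matrix is a list of rows, and the layout assumed for the fixed array is the variable count, then the intercept flag, then 1-based data columns. A level count of 1 is assumed to mark a continuous or binary variable.

```python
def build_fixed_design(dat, levels, fixed):
    """Return the fixed effects design matrix as a list of rows."""
    n = len(dat)
    n_f, intercept = fixed[0], fixed[1]
    columns = []         # design-matrix columns, in order
    done_first = False   # True once a full set of dummies (or an intercept) exists
    if intercept == 1:
        columns.append([1.0] * n)
        done_first = True
    for j in range(n_f):
        col = fixed[2 + j] - 1               # convert to a 0-based data column
        f = [row[col] for row in dat]
        nlev = levels[col]
        if nlev == 1:
            columns.append([float(x) for x in f])   # used as is
        else:
            first = 2 if done_first else 1   # drop level 1 unless this is the first
            for v in range(first, nlev + 1):
                columns.append([1.0 if x == v else 0.0 for x in f])
            done_first = True
    if not columns:
        return [[] for _ in range(n)]
    return [list(r) for r in zip(*columns)]


# One three-level factor (data column 1) and one continuous variable (column 2).
data = [[1, 0.5], [2, 1.5], [3, 2.5], [2, 3.5]]
x_with = build_fixed_design(data, [3, 1], [2, 1, 1, 2])     # with intercept
x_without = build_fixed_design(data, [3, 1], [2, 0, 1, 2])  # without intercept
```

In this example both matrices have four columns. With the intercept the factor contributes two dummy columns, and without it the factor contributes all three.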
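9.2 Construction of random effects design matrix, $Z$
Let
$N_{R_b}$ denote the number of random variables in the $b$th random statement, that is $N_{R_b} = \mathrm{rndm}(1,b)$;
$R_{jb}$ denote the $j$th random variable from the $b$th random statement, that is the vector of values held in the $k$th column of dat when $\mathrm{rndm}(2+j,b) = k$;
$R_{ijb}$ denote the $i$th element of $R_{jb}$;
$L(R_{jb})$ denote the number of levels for $R_{jb}$, that is $L(R_{jb}) = \mathrm{levels}(\mathrm{rndm}(2+j,b))$;
$D_v(R_{jb})$ denote an indicator function that returns a vector of values whose $i$th element is $1$ if $R_{ijb} = v$ and $0$ otherwise;
$N_{S_b}$ denote the number of subject variables in the $b$th random statement, that is $N_{S_b} = \mathrm{rndm}(3+N_{R_b},b)$;
$S_{jb}$ denote the $j$th subject variable from the $b$th random statement, that is the vector of values held in the $k$th column of dat when $\mathrm{rndm}(3+N_{R_b}+j,b) = k$;
$S_{ijb}$ denote the $i$th element of $S_{jb}$;
$L(S_{jb})$ denote the number of levels for $S_{jb}$, that is $L(S_{jb}) = \mathrm{levels}(\mathrm{rndm}(3+N_{R_b}+j,b))$;
$I_b(s_1, s_2, \ldots, s_{N_{S_b}})$ denote an indicator function that returns a vector of values whose $i$th element is $1$ if $S_{ijb} = s_j$ for all $j = 1, 2, \ldots, N_{S_b}$ and $0$ otherwise.
The design matrix for the random effects, $Z$, is constructed as follows:
set $k$ to one;
loop over each random statement, so for each $b = 1, 2, \ldots, \mathrm{nrndm}$,
    loop over each level of the last subject variable, so for each $s_{N_{S_b}} = 1, 2, \ldots, L(S_{N_{S_b}b})$,
        $\vdots$
        loop over each level of the second subject variable, so for each $s_2 = 1, 2, \ldots, L(S_{2b})$,
            loop over each level of the first subject variable, so for each $s_1 = 1, 2, \ldots, L(S_{1b})$,
                if a random intercept is included, that is $\mathrm{rndm}(2,b) = 1$,
                    set the $k$th column of $Z$ to $I_b(s_1, s_2, \ldots, s_{N_{S_b}})$;
                    set $k = k + 1$;
                loop over each random variable in the $b$th random statement, so for each $j = 1, 2, \ldots, N_{R_b}$,
                    if $L(R_{jb}) = 1$,
                        set the $k$th column of $Z$ to $R_{jb} \times I_b(s_1, s_2, \ldots, s_{N_{S_b}})$, where $\times$ indicates an element-wise multiplication between the two vectors, $R_{jb}$ and $I_b(\ldots)$;
                        set $k = k + 1$;
                    else
                        set the $L(R_{jb})$ columns, $k$ to $k + L(R_{jb}) - 1$, of $Z$ to $D_v(R_{jb}) \times I_b(s_1, s_2, \ldots, s_{N_{S_b}})$, for $v = 1, 2, \ldots, L(R_{jb})$. As before, $\times$ indicates an element-wise multiplication between the two vectors, $D_v(\ldots)$ and $I_b(\ldots)$;
                        set $k = k + L(R_{jb})$.
In summary, each column of rndm defines a block of consecutive columns in $Z$. g02jcf converts all non-binary categorical variables (i.e., where $L(R_{jb}) \ne 1$ or $L(S_{jb}) \ne 1$) to dummy variables. All random variables defined within a column of rndm are nested within all subject variables defined in the same column of rndm. In addition, the subject variables are nested within each other, starting with the first (i.e., each of the $R_{jb}$, for $j = 1, 2, \ldots, N_{R_b}$, is nested within $S_{1b}$, which in turn is nested within $S_{2b}$, which in turn is nested within $S_{3b}$, etc.).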
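The nested-loop construction of the random effects design matrix can be sketched as follows. This is a simplified illustration rather than the library's implementation. `build_random_design` is a hypothetical function, the data matrix is a list of rows, and each random statement is given as a list laid out as: the number of random variables, the intercept flag, the random variable columns, the number of subject variables, then the subject variable columns (all 1-based). A level count of 1 is assumed to mark a continuous or binary variable.

```python
from itertools import product


def build_random_design(dat, levels, rndm_columns):
    """Return the random effects design matrix as a list of rows."""
    n = len(dat)
    columns = []
    for blk in rndm_columns:
        n_r, intercept = blk[0], blk[1]
        rand_cols = [c - 1 for c in blk[2:2 + n_r]]
        n_s = blk[2 + n_r]
        subj_cols = [c - 1 for c in blk[3 + n_r:3 + n_r + n_s]]
        # product() varies its last argument fastest, so the ranges are listed
        # from the last subject variable to the first: the first subject
        # variable is then the innermost loop.
        ranges = [range(1, levels[c] + 1) for c in reversed(subj_cols)]
        for combo in product(*ranges):
            s = combo[::-1]  # s[j] is the level of subject variable j + 1
            ind = [1.0 if all(row[c] == lv for c, lv in zip(subj_cols, s)) else 0.0
                   for row in dat]
            if intercept == 1:
                columns.append(ind)
            for c in rand_cols:
                r = [row[c] for row in dat]
                if levels[c] == 1:
                    columns.append([x * m for x, m in zip(r, ind)])
                else:
                    for v in range(1, levels[c] + 1):
                        columns.append([(1.0 if x == v else 0.0) * m
                                        for x, m in zip(r, ind)])
    if not columns:
        return [[] for _ in range(n)]
    return [list(row) for row in zip(*columns)]


# Random intercept and slope (data column 2) within a two-level subject (column 1).
data = [[1, 0.5], [1, 1.5], [2, 2.5], [2, 3.5]]
z = build_random_design(data, [2, 1], [[1, 1, 2, 1, 1]])
```

In this example the matrix has four columns: an intercept and a slope column for each of the two subject levels, so each row is nonzero only in the pair of columns belonging to its own subject.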
If the last subject variable in each column of rndm is the same (i.e., $S_{N_{S_1}1} = S_{N_{S_2}2} = \cdots = S_{N_{S_b}b}$) then all random effects in the model are nested within this variable. In such instances the last subject variable ($S_{N_{S_1}1}$) is called the overall subject variable. The fact that all of the random effects in the model are nested within the overall subject variable means that $Z^{T}Z$ is block diagonal in structure. This fact can be utilised to improve the efficiency of the underlying computation and reduce the amount of internal storage required. The number of levels in the overall subject variable is returned in $\mathrm{nlsv} = L(S_{N_{S_1}1})$.
If the last $k$ subject variables in each column of rndm are the same, for $k > 1$, then the overall subject variable is defined as the interaction of these $k$ variables and
$$\mathrm{nlsv} = \prod_{j = N_{S_1} - k + 1}^{N_{S_1}} L(S_{j1}).$$
If there is no overall subject variable then $\mathrm{nlsv} = 1$.
The number of columns, $q$, in the design matrix $Z$ is given by $q = \mathrm{nrf} \times \mathrm{nlsv}$.
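The number of levels of the overall subject variable can be worked out from the subject variables of each random statement. The sketch below rests on two assumptions: two subject variables count as "the same" when they refer to the same data column, and the level count of an interaction is the product of the level counts of its components. `overall_subject_levels` is a hypothetical helper, not a library routine.

```python
def overall_subject_levels(levels, subject_lists):
    """Return the number of levels of the overall subject variable (1 if none).

    levels        : level counts for each data column
    subject_lists : for each random statement, the 1-based data columns of its
                    subject variables, in the order they are specified
    """
    shortest = min(len(s) for s in subject_lists)
    k = 0  # number of trailing subject variables shared by every statement
    while k < shortest and len({s[-1 - k] for s in subject_lists}) == 1:
        k += 1
    first = subject_lists[0]
    nlsv = 1
    for c in first[len(first) - k:]:  # explicit start index, so k = 0 gives nothing
        nlsv *= levels[c - 1]
    return nlsv


lv = [4, 3, 1]
shared_last = overall_subject_levels(lv, [[2, 1], [1]])     # column 1 shared
shared_two = overall_subject_levels(lv, [[2, 1], [2, 1]])   # columns 2 and 1 shared
none_shared = overall_subject_levels(lv, [[1], [2]])        # nothing shared
```

Multiplying the result by the number of random effects per overall subject block gives the total number of columns in the random effects design matrix.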