NAG FL Interface
g02jcf (mixeff_hier_init)
1
Purpose
g02jcf preprocesses a dataset prior to fitting a linear mixed effects regression model of the following form via either g02jdf or g02jef:
$$y = X\beta + Z\nu + \varepsilon$$
2
Specification
Fortran Interface
Subroutine g02jcf (weight, n, ncol, dat, lddat, levels, y, wt, fixed, lfixed, nrndm, rndm, ldrndm, nff, nlsv, nrf, rcomm, lrcomm, icomm, licomm, ifail)
Integer, Intent (In) :: n, ncol, lddat, levels(ncol), fixed(lfixed), lfixed, nrndm, rndm(ldrndm,*), ldrndm, lrcomm, licomm
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: nff, nlsv, nrf, icomm(licomm)
Real (Kind=nag_wp), Intent (In) :: dat(lddat,*), y(n), wt(*)
Real (Kind=nag_wp), Intent (Out) :: rcomm(lrcomm)
Character (1), Intent (In) :: weight
C Header Interface
#include <nag.h>
void g02jcf_ (const char *weight, const Integer *n, const Integer *ncol, const double dat[], const Integer *lddat, const Integer levels[], const double y[], const double wt[], const Integer fixed[], const Integer *lfixed, const Integer *nrndm, const Integer rndm[], const Integer *ldrndm, Integer *nff, Integer *nlsv, Integer *nrf, double rcomm[], const Integer *lrcomm, Integer icomm[], const Integer *licomm, Integer *ifail, const Charlen length_weight);
C++ Header Interface
#include <nag.h>
extern "C" {
void g02jcf_ (const char *weight, const Integer &n, const Integer &ncol, const double dat[], const Integer &lddat, const Integer levels[], const double y[], const double wt[], const Integer fixed[], const Integer &lfixed, const Integer &nrndm, const Integer rndm[], const Integer &ldrndm, Integer &nff, Integer &nlsv, Integer &nrf, double rcomm[], const Integer &lrcomm, Integer icomm[], const Integer &licomm, Integer &ifail, const Charlen length_weight);
}
The routine may be called by the names g02jcf or nagf_correg_mixeff_hier_init.
3
Description
g02jcf must be called prior to fitting a linear mixed effects regression model with either g02jdf or g02jef.
The model fitting routines g02jdf and g02jef fit a model of the following form:
$$y = X\beta + Z\nu + \varepsilon$$
where
$y$ is a vector of $n$ observations on the dependent variable,
$X$ is an $n$ by $p$ design matrix of fixed independent variables,
$\beta$ is a vector of $p$ unknown fixed effects,
$Z$ is an $n$ by $q$ design matrix of random independent variables,
$\nu$ is a vector of length $q$ of unknown random effects,
$\varepsilon$ is a vector of length $n$ of unknown random errors,
and $\nu$ and $\varepsilon$ are Normally distributed with expectation zero and variance/covariance matrix defined by
$$\mathrm{Var}\begin{bmatrix} \nu \\ \varepsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$$
where $R = \sigma_R^2 I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix.
Case weights can be incorporated into the model by replacing $X$ and $Z$ with $W_c^{1/2}X$ and $W_c^{1/2}Z$ respectively, where $W_c$ is a diagonal weight matrix.
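Because $W_c$ is diagonal, premultiplying a design matrix by $W_c^{1/2}$ amounts to multiplying row $i$ by the square root of the $i$th weight. The C++ fragment below is a minimal sketch of that row scaling only. The function name `scale_rows_by_sqrt_weights` and the row-major storage are illustrative assumptions and are not part of the NAG interface. The sketch does not describe how g02jdf or g02jef apply the weights internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a NAG routine): forms W_c^{1/2} A for a diagonal
// weight matrix W_c whose diagonal is w. A is n-by-m and stored row-major.
// Row i is multiplied by sqrt(w[i]); the weights are assumed non-negative.
std::vector<double> scale_rows_by_sqrt_weights(const std::vector<double>& a,
                                               std::size_t n, std::size_t m,
                                               const std::vector<double>& w) {
    assert(a.size() == n * m && w.size() == n);
    std::vector<double> out(a);
    for (std::size_t i = 0; i < n; ++i) {
        const double s = std::sqrt(w[i]);
        for (std::size_t j = 0; j < m; ++j) {
            out[i * m + j] *= s;
        }
    }
    return out;
}
```

The same helper would be applied to both $X$ and $Z$, because the model replaces both by their weighted counterparts.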
4
References
None.
5
Arguments
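1: weight – Character(1) Input
On entry: indicates if weights are to be used.
- $\mathrm{weight} = \text{'U'}$: No weights are used.
- $\mathrm{weight} = \text{'W'}$: Case weights are used and must be supplied in array wt.
Constraint: $\mathrm{weight} = \text{'U'}$ or $\text{'W'}$.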
2: n – Integer Input
On entry: $n$, the number of observations.
The effective number of observations, that is the number of observations with nonzero weight (see wt for more detail), must be greater than the number of fixed effects in the model (as returned in nff).
Constraint: $\mathrm{n} \ge 1$.
3: ncol – Integer Input
On entry: the number of columns in the data matrix, dat.
Constraint:
.
4: dat(lddat,*) – Real (Kind=nag_wp) array Input
Note: the second dimension of the array dat must be at least ncol.
On entry: a matrix of data, with $\mathrm{dat}(i,j)$ holding the $i$th observation on the $j$th variable. The two design matrices $X$ and $Z$ are constructed from dat and the information given in fixed (for $X$) and rndm (for $Z$).
Constraint: $1 \le \mathrm{dat}(i,j) \le \mathrm{levels}(j)$ if $\mathrm{levels}(j) \ne 1$.
5: lddat – Integer Input
On entry: the first dimension of the array dat as declared in the (sub)program from which g02jcf is called.
Constraint: $\mathrm{lddat} \ge \mathrm{n}$.
6: levels(ncol) – Integer array Input
On entry: $\mathrm{levels}(i)$ contains the number of levels associated with the $i$th variable held in dat.
If the $i$th variable is continuous or binary (i.e., only takes the values zero or one), then $\mathrm{levels}(i)$ must be set to $1$. Otherwise the $i$th variable is assumed to take an integer value between $1$ and $\mathrm{levels}(i)$ (i.e., the $i$th variable is discrete with $\mathrm{levels}(i)$ levels).
Constraint: $\mathrm{levels}(i) \ge 1$, for $i = 1, 2, \ldots, \mathrm{ncol}$.
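The coding rule for discrete variables can be checked before g02jcf is called. The sketch below is a hypothetical pre-flight check. The name `first_level_violation` is illustrative and is not a NAG routine. It assumes Fortran column-major storage of dat with leading dimension lddat. Using 1-based indices, it reports the first observation whose value of a discrete variable lies outside the range 1 to levels(j). It checks only the range, not whether the value is an integer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical check (not a NAG routine). dat is column-major with leading
// dimension lddat, so dat(i,j) is dat[i + j*lddat] for 0-based i and j.
// Columns with levels[j] == 1 (continuous or binary) are skipped.
// Returns the 1-based (observation, variable) pair of the first violation.
std::optional<std::pair<int, int>> first_level_violation(
    const std::vector<double>& dat, int n, int ncol, int lddat,
    const std::vector<int>& levels) {
    for (int j = 0; j < ncol; ++j) {
        if (levels[j] == 1) continue;
        for (int i = 0; i < n; ++i) {
            const double v = dat[static_cast<std::size_t>(i) +
                                 static_cast<std::size_t>(j) * lddat];
            if (v < 1.0 || v > levels[j]) {
                return std::make_pair(i + 1, j + 1);
            }
        }
    }
    return std::nullopt;
}
```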
7: y(n) – Real (Kind=nag_wp) array Input
On entry: $y$, the vector of observations on the dependent variable.
8: wt(*) – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least n if $\mathrm{weight} = \text{'W'}$.
On entry: if $\mathrm{weight} = \text{'W'}$, wt must contain the diagonal elements of the weight matrix $W_c$.
If $\mathrm{wt}(i) = 0.0$, the $i$th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.
If $\mathrm{weight} = \text{'U'}$, wt is not referenced and the effective number of observations is $n$.
Constraint: if $\mathrm{weight} = \text{'W'}$, $\mathrm{wt}(i) \ge 0.0$, for $i = 1, 2, \ldots, n$.
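The effective number of observations is easy to compute up front. In this hypothetical sketch, `effective_observations` and `enough_observations` are illustrative names. It assumes the option values 'U' (unweighted) and 'W' (weighted) for weight and drops observations whose weight is exactly zero. The count must then exceed the number of fixed effects (nff).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: with weight == 'U' every observation counts and wt is
// not read; otherwise only observations with a nonzero weight count.
int effective_observations(char weight, const std::vector<double>& wt, int n) {
    if (weight == 'U') return n;
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (wt[i] != 0.0) ++count;
    }
    return count;
}

// The effective number of observations must be greater than nff.
bool enough_observations(char weight, const std::vector<double>& wt, int n,
                         int nff) {
    return effective_observations(weight, wt, n) > nff;
}
```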
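9: fixed(lfixed) – Integer array Input
On entry: defines the structure of the fixed effects design matrix, $X$.
- $\mathrm{fixed}(1)$: The number of variables, $N_F$, to include as fixed effects (not including the intercept if present).
- $\mathrm{fixed}(2)$: The fixed intercept flag, which must contain $1$ if a fixed intercept is to be included and $0$ otherwise.
- $\mathrm{fixed}(2+i)$: The column of dat holding the $i$th fixed variable, for $i = 1, 2, \ldots, \mathrm{fixed}(1)$.
See Section 9.1 for more details on the construction of $X$.
Constraints:
- $\mathrm{fixed}(1) \ge 0$;
- $\mathrm{fixed}(2) = 0$ or $1$;
- $1 \le \mathrm{fixed}(2+i) \le \mathrm{ncol}$, for $i = 1, 2, \ldots, \mathrm{fixed}(1)$.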
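To illustrate the fixed layout, the hypothetical helper `pack_fixed` below builds the array from a list of 1-based dat column numbers and an intercept flag. It is not a NAG routine. The result has exactly $2 + N_F$ elements, which is the number of entries the layout uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: packs fixed in the documented layout.
//   fixed(1)   = N_F, the number of fixed variables
//   fixed(2)   = 1 for a fixed intercept, 0 otherwise
//   fixed(2+i) = 1-based column of dat holding the i-th fixed variable
std::vector<int> pack_fixed(const std::vector<int>& columns, bool intercept) {
    std::vector<int> fixed;
    fixed.reserve(2 + columns.size());
    fixed.push_back(static_cast<int>(columns.size()));
    fixed.push_back(intercept ? 1 : 0);
    fixed.insert(fixed.end(), columns.begin(), columns.end());
    return fixed;  // length 2 + N_F
}
```

For example, `pack_fixed({1, 3}, true)` yields {2, 1, 1, 3}. This describes two fixed variables, taken from columns 1 and 3 of dat, plus a fixed intercept.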
10: lfixed – Integer Input
On entry: length of the vector fixed.
Constraint: $\mathrm{lfixed} \ge 2 + \mathrm{fixed}(1)$.
11: nrndm – Integer Input
On entry: the number of columns in rndm.
Constraint:
.
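12: rndm(ldrndm,*) – Integer array Input
Note: the second dimension of the array rndm must be at least nrndm.
On entry: rndm defines the structure of the random effects design matrix, $Z$. The $b$th column of rndm defines a block of columns in the design matrix $Z$.
- $\mathrm{rndm}(1,b)$: The number of variables, $N_{R_b}$, to include as random effects in the $b$th block (not including the random intercept if present).
- $\mathrm{rndm}(2,b)$: The random intercept flag, which must contain $1$ if block $b$ includes a random intercept and $0$ otherwise.
- $\mathrm{rndm}(2+i,b)$: The column of dat holding the $i$th random variable in the $b$th block, for $i = 1, 2, \ldots, \mathrm{rndm}(1,b)$.
- $\mathrm{rndm}(3+N_{R_b},b)$: The number of subject variables, $N_{S_b}$, for the $b$th block. The subject variables define the nesting structure for this block.
- $\mathrm{rndm}(3+N_{R_b}+i,b)$: The column of dat holding the $i$th subject variable in the $b$th block, for $i = 1, 2, \ldots, \mathrm{rndm}(3+N_{R_b},b)$.
See Section 9.2 for more details on the construction of $Z$.
Constraints:
- $\mathrm{rndm}(1,b) \ge 0$;
- $\mathrm{rndm}(2,b) = 0$ or $1$;
- at least one random variable or random intercept must be specified in each block, i.e., $\mathrm{rndm}(1,b) + \mathrm{rndm}(2,b) > 0$;
- the column identifiers associated with the random variables must be in the range $1$ to ncol, i.e., $1 \le \mathrm{rndm}(2+i,b) \le \mathrm{ncol}$, for $i = 1, 2, \ldots, \mathrm{rndm}(1,b)$;
- $\mathrm{rndm}(3+N_{R_b},b) \ge 0$;
- the column identifiers associated with the subject variables must be in the range $1$ to ncol, i.e., $1 \le \mathrm{rndm}(3+N_{R_b}+i,b) \le \mathrm{ncol}$, for $i = 1, 2, \ldots, \mathrm{rndm}(3+N_{R_b},b)$.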
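Packing rndm programmatically makes its layout easier to see. This sketch is illustrative only. `RandomBlock`, `pack_block` and `pack_rndm` are hypothetical names, and storage is assumed to be Fortran column-major. ldrndm is chosen as the length of the longest packed column, $\max_b(3 + N_{R_b} + N_{S_b})$, since that is the largest row index the layout uses. Unused trailing entries in shorter columns are set to zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one random-effects block (one column of rndm).
struct RandomBlock {
    std::vector<int> random_columns;   // 1-based dat columns of random variables
    bool intercept;                    // random intercept flag
    std::vector<int> subject_columns;  // 1-based dat columns of subject variables
};

// Packs one column: N_R, intercept flag, random columns, N_S, subject columns.
std::vector<int> pack_block(const RandomBlock& blk) {
    std::vector<int> col;
    col.push_back(static_cast<int>(blk.random_columns.size()));
    col.push_back(blk.intercept ? 1 : 0);
    col.insert(col.end(), blk.random_columns.begin(), blk.random_columns.end());
    col.push_back(static_cast<int>(blk.subject_columns.size()));
    col.insert(col.end(), blk.subject_columns.begin(), blk.subject_columns.end());
    return col;  // length 3 + N_R + N_S
}

// Packs all blocks into a column-major ldrndm-by-nrndm array and sets ldrndm
// to the length of the longest packed column.
std::vector<int> pack_rndm(const std::vector<RandomBlock>& blocks, int& ldrndm) {
    std::vector<std::vector<int>> cols;
    ldrndm = 0;
    for (const RandomBlock& blk : blocks) {
        cols.push_back(pack_block(blk));
        ldrndm = std::max(ldrndm, static_cast<int>(cols.back().size()));
    }
    std::vector<int> rndm(static_cast<std::size_t>(ldrndm) * blocks.size(), 0);
    for (std::size_t b = 0; b < cols.size(); ++b) {
        std::copy(cols[b].begin(), cols[b].end(),
                  rndm.begin() + static_cast<std::ptrdiff_t>(b) * ldrndm);
    }
    return rndm;
}
```

In the test data, the first block is a random intercept with subject column 4. The second block has random variables in columns 2 and 3 and subject variables in columns 5 and 4, listed first to last. Both blocks therefore end with the same subject column.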
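13: ldrndm – Integer Input
On entry: the first dimension of the array rndm as declared in the (sub)program from which g02jcf is called.
Constraint: $\mathrm{ldrndm} \ge \max_{b=1,\ldots,\mathrm{nrndm}}(3 + N_{R_b} + N_{S_b})$, where $N_{R_b} = \mathrm{rndm}(1,b)$ and $N_{S_b} = \mathrm{rndm}(3+N_{R_b},b)$.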
14: nff – Integer Output
On exit: $p$, the number of fixed effects estimated, i.e., the number of columns in the design matrix $X$.
15: nlsv – Integer Output
On exit: the number of levels for the overall subject variable (see Section 9.2 for a description of what this means). If there is no overall subject variable, $\mathrm{nlsv} = 1$.
16: nrf – Integer Output
On exit: the number of random effects estimated in each of the overall subject blocks. The number of columns in the design matrix $Z$ is given by $q = \mathrm{nrf} \times \mathrm{nlsv}$.
17: rcomm(lrcomm) – Real (Kind=nag_wp) array Communication Array
On exit: communication array as required by the analysis routines g02jdf and g02jef.
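18: lrcomm – Integer Input
On entry: the dimension of the array rcomm as declared in the (sub)program from which g02jcf is called.
Constraint: if $\mathrm{licomm} \ne 2$, lrcomm must be at least the minimum required value for lrcomm, which is returned in $\mathrm{icomm}(2)$ when g02jcf is called with $\mathrm{licomm} = 2$.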
19: icomm(licomm) – Integer array Communication Array
On exit: if $\mathrm{licomm} = 2$, $\mathrm{icomm}(1)$ holds the minimum required value for licomm and $\mathrm{icomm}(2)$ holds the minimum required value for lrcomm; otherwise icomm is a communication array as required by the analysis routines g02jdf and g02jef.
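20: licomm – Integer Input
On entry: the dimension of the array icomm as declared in the (sub)program from which g02jcf is called.
Constraint: $\mathrm{licomm} = 2$, or licomm must be at least the minimum required value for licomm, which is returned in $\mathrm{icomm}(1)$ when g02jcf is called with $\mathrm{licomm} = 2$.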
21:
– Integer
Input/Output
-
On entry:
ifail must be set to
,
. If you are unfamiliar with this argument you should refer to
Section 4 in the Introduction to the NAG Library FL Interface for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
is recommended. If the output of error messages is undesirable, then the value
is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
.
When the value is used it is essential to test the value of ifail on exit.
On exit:
unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry $\mathrm{ifail} = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
-
On entry, weight had an illegal value.
Constraint: $\mathrm{weight} = \text{'U'}$ or $\text{'W'}$.
-
On entry, $\mathrm{n} = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{n} \ge 1$.
-
On entry, .
Constraint: .
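-
On entry, variable $j$ of observation $i$ is less than $1$ or greater than $\mathrm{levels}(j)$: $i = \langle\mathit{value}\rangle$, $j = \langle\mathit{value}\rangle$, value $= \langle\mathit{value}\rangle$, $\mathrm{levels}(j) = \langle\mathit{value}\rangle$.
-
On entry, $\mathrm{lddat} = \langle\mathit{value}\rangle$ and $\mathrm{n} = \langle\mathit{value}\rangle$.
Constraint: $\mathrm{lddat} \ge \mathrm{n}$.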
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, number of fixed parameters, $\langle\mathit{value}\rangle$, is less than zero.
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, number of random parameters for random statement $i$ is less than $0$: $i = \langle\mathit{value}\rangle$, number of parameters $= \langle\mathit{value}\rangle$.
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, .
Constraint: .
-
On entry, more fixed factors than observations, .
Constraint: .
-
On entry, no observations due to zero weights.
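-
On entry, invalid value for fixed intercept flag: value $= \langle\mathit{value}\rangle$.
-
On entry, invalid value for random intercept flag for random statement $i$: $i = \langle\mathit{value}\rangle$, value $= \langle\mathit{value}\rangle$.
-
On entry, index of fixed variable $j$ is less than $1$ or greater than ncol: $j = \langle\mathit{value}\rangle$, index $= \langle\mathit{value}\rangle$ and $\mathrm{ncol} = \langle\mathit{value}\rangle$.
-
On entry, there must be at least one parameter, or an intercept, in each random statement $i$: $i = \langle\mathit{value}\rangle$.
-
On entry, index of random variable $j$ in random statement $i$ is less than $1$ or greater than ncol: $i = \langle\mathit{value}\rangle$, $j = \langle\mathit{value}\rangle$, index $= \langle\mathit{value}\rangle$ and $\mathrm{ncol} = \langle\mathit{value}\rangle$.
-
On entry, number of subject parameters for random statement $i$ is less than $0$: $i = \langle\mathit{value}\rangle$, number of parameters $= \langle\mathit{value}\rangle$.
-
On entry, nesting variable $j$ in random statement $i$ has one level: $j = \langle\mathit{value}\rangle$, $i = \langle\mathit{value}\rangle$.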
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
Not applicable.
8
Parallelism and Performance
g02jcf is not threaded in any implementation.
9.1
Construction of the fixed effects design matrix, $X$
Let
- $N_F$ denote the number of fixed variables, that is $N_F = \mathrm{fixed}(1)$;
- $V_i$ denote the $i$th fixed variable, that is the vector of values held in column $\mathrm{fixed}(2+i)$ of dat;
- $V_{ij}$ denote the $j$th element of $V_i$;
- $L(V_i)$ denote the number of levels for $V_i$, that is $L(V_i) = \mathrm{levels}(\mathrm{fixed}(2+i))$;
- $I(V_i = a)$ denote an indicator function that returns a vector of values whose $j$th element is $1$ if $V_{ij} = a$ and $0$ otherwise.
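The design matrix for the fixed effects, $X$, is constructed as follows:
- set $p$ to one and the flag $T_X$ to false;
- if a fixed intercept is included, that is $\mathrm{fixed}(2) = 1$,
  - set the first column of $X$ to a vector of $1$s;
  - set $p = p + 1$;
  - set $T_X$ to true;
- loop over each fixed variable, so for each $i = 1, 2, \ldots, N_F$,
  - if $L(V_i) = 1$,
    - set the $p$th column of $X$ to be $V_i$;
    - set $p = p + 1$;
  - else
    - if $T_X$ is false then
      - set the $L(V_i)$ columns, $p$ to $p + L(V_i) - 1$, of $X$ to $I(V_i = a)$, for $a = 1, 2, \ldots, L(V_i)$;
      - set $p = p + L(V_i)$;
      - set $T_X$ to true;
    - else
      - set the $L(V_i) - 1$ columns, $p$ to $p + L(V_i) - 2$, of $X$ to $I(V_i = a)$, for $a = 2, 3, \ldots, L(V_i)$;
      - set $p = p + L(V_i) - 1$.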
The number of columns in the design matrix, $X$, is therefore given by $p - 1$, where $p$ is the value of the counter after the loop has finished. This quantity is returned in nff.
In summary, g02jcf converts all non-binary categorical variables (i.e., where $L(V_i) \ne 1$) to dummy variables. If a fixed intercept is included in the model then the first level of all such variables is dropped. If a fixed intercept is not included in the model then the first level is dropped from all such variables except the first one. The variables are added into the model in the order they are specified in fixed.
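The construction of $X$ can be written down directly. This is a sketch of the steps as described in this section, not NAG's implementation. `build_fixed_design` is a hypothetical name. dat is assumed to be stored column-major with leading dimension lddat, and fixed is passed as a 0-based C++ vector whose element 0 holds fixed(1). The number of columns returned corresponds to nff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;

// Hypothetical sketch of the fixed effects design matrix construction.
std::vector<Column> build_fixed_design(const std::vector<double>& dat, int n,
                                       int lddat, const std::vector<int>& levels,
                                       const std::vector<int>& fixed) {
    std::vector<Column> x;
    const int nf = fixed[0];
    bool t_x = false;  // flag T_X: true once an intercept or a full dummy set is in X
    if (fixed[1] == 1) {
        x.push_back(Column(n, 1.0));  // fixed intercept
        t_x = true;
    }
    for (int i = 1; i <= nf; ++i) {
        const int k = fixed[1 + i];  // fixed(2+i): 1-based column of dat
        const double* v = &dat[static_cast<std::size_t>(k - 1) * lddat];
        const int nl = levels[k - 1];
        if (nl == 1) {
            x.push_back(Column(v, v + n));  // continuous or binary: used as is
            continue;
        }
        // Categorical: one indicator column per level, starting at level 2
        // if T_X is already true, otherwise at level 1.
        const int first = t_x ? 2 : 1;
        for (int a = first; a <= nl; ++a) {
            Column ind(n);
            for (int j = 0; j < n; ++j) ind[j] = (v[j] == a) ? 1.0 : 0.0;
            x.push_back(ind);
        }
        t_x = true;
    }
    return x;
}
```

With a fixed intercept, a three-level factor contributes two indicator columns (levels 2 and 3). Without an intercept, the first such factor contributes all three.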
9.2
Construction of random effects design matrix, $Z$
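Let
- $N_{R_i}$ denote the number of random variables in the $i$th random statement, that is $N_{R_i} = \mathrm{rndm}(1,i)$;
- $R_{ij}$ denote the $j$th random variable from the $i$th random statement, that is the vector of values held in column $\mathrm{rndm}(2+j,i)$ of dat;
- $R_{ijk}$ denote the $k$th element of $R_{ij}$;
- $L(R_{ij})$ denote the number of levels for $R_{ij}$, that is $L(R_{ij}) = \mathrm{levels}(\mathrm{rndm}(2+j,i))$;
- $I(R_{ij} = a)$ denote an indicator function that returns a vector of values whose $k$th element is $1$ if $R_{ijk} = a$ and $0$ otherwise;
- $N_{S_i}$ denote the number of subject variables in the $i$th random statement, that is $N_{S_i} = \mathrm{rndm}(3+N_{R_i},i)$;
- $S_{ij}$ denote the $j$th subject variable from the $i$th random statement, that is the vector of values held in column $\mathrm{rndm}(3+N_{R_i}+j,i)$ of dat;
- $S_{ijk}$ denote the $k$th element of $S_{ij}$;
- $L(S_{ij})$ denote the number of levels for $S_{ij}$, that is $L(S_{ij}) = \mathrm{levels}(\mathrm{rndm}(3+N_{R_i}+j,i))$;
- $I(S_i = s)$, with $s = (s_1, s_2, \ldots, s_{N_{S_i}})$, denote an indicator function that returns a vector of values whose $k$th element is $1$ if $S_{ijk} = s_j$ for all $j = 1, 2, \ldots, N_{S_i}$ and $0$ otherwise.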
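The design matrix for the random effects, $Z$, is constructed as follows:
- set $q$ to one;
- loop over each random statement, so for each $i = 1, 2, \ldots, \mathrm{nrndm}$,
  - loop over each level of the last subject variable, so for each $s_{N_{S_i}} = 1, 2, \ldots, L(S_{iN_{S_i}})$,
    - $\ldots$
      - loop over each level of the second subject variable, so for each $s_2 = 1, 2, \ldots, L(S_{i2})$,
        - loop over each level of the first subject variable, so for each $s_1 = 1, 2, \ldots, L(S_{i1})$,
          - if a random intercept is included, that is $\mathrm{rndm}(2,i) = 1$,
            - set the $q$th column of $Z$ to $I(S_i = s)$, where $s = (s_1, s_2, \ldots, s_{N_{S_i}})$;
            - set $q = q + 1$;
          - loop over each random variable in the $i$th random statement, so for each $j = 1, 2, \ldots, N_{R_i}$,
            - if $L(R_{ij}) = 1$,
              - set the $q$th column of $Z$ to $R_{ij} * I(S_i = s)$, where $*$ indicates an element-wise multiplication between the two vectors, and $s = (s_1, s_2, \ldots, s_{N_{S_i}})$;
              - set $q = q + 1$;
            - else
              - set the $L(R_{ij})$ columns, $q$ to $q + L(R_{ij}) - 1$, of $Z$ to $I(R_{ij} = a) * I(S_i = s)$, for $a = 1, 2, \ldots, L(R_{ij})$. As before, $*$ indicates an element-wise multiplication between the two vectors, and $s = (s_1, s_2, \ldots, s_{N_{S_i}})$;
              - set $q = q + L(R_{ij})$.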
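The nested loops for one random statement can be sketched with an odometer over the subject levels. The first subject variable varies fastest, which corresponds to the loop over it being innermost. This is a hypothetical illustration: `append_random_block` is not a NAG routine. The sketch assumes column-major dat and one column of rndm passed as a 0-based vector. In this sketch, a categorical random variable contributes one indicator column for every level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;

// Hypothetical sketch: appends to z the columns of Z generated by one random
// statement. rcol[0] holds rndm(1,b), rcol[1] holds rndm(2,b), and so on.
void append_random_block(const std::vector<double>& dat, int n, int lddat,
                         const std::vector<int>& levels,
                         const std::vector<int>& rcol, std::vector<Column>& z) {
    auto column = [&](int k) { return &dat[static_cast<std::size_t>(k - 1) * lddat]; };
    const int nr = rcol[0];                // N_R
    const bool intercept = rcol[1] == 1;   // random intercept flag
    const int ns = rcol[2 + nr];           // N_S = rndm(3+N_R, b)
    std::vector<int> subj(ns);
    for (int j = 0; j < ns; ++j) subj[j] = rcol[3 + nr + j];

    std::vector<int> s(ns, 1);  // current subject levels (s_1, ..., s_NS)
    while (true) {
        // I(S = s): 1 where every subject variable equals its current level
        Column ind(n, 1.0);
        for (int j = 0; j < ns; ++j) {
            const double* sv = column(subj[j]);
            for (int k = 0; k < n; ++k) if (sv[k] != s[j]) ind[k] = 0.0;
        }
        if (intercept) z.push_back(ind);
        for (int i = 0; i < nr; ++i) {
            const int c = rcol[2 + i];  // rndm(2+i+1, b)
            const double* rv = column(c);
            const int nl = levels[c - 1];
            if (nl == 1) {
                Column col(n);
                for (int k = 0; k < n; ++k) col[k] = rv[k] * ind[k];
                z.push_back(col);
            } else {
                for (int a = 1; a <= nl; ++a) {
                    Column col(n);
                    for (int k = 0; k < n; ++k) col[k] = (rv[k] == a ? 1.0 : 0.0) * ind[k];
                    z.push_back(col);
                }
            }
        }
        // Advance the odometer; the first subject variable varies fastest.
        int j = 0;
        while (j < ns && s[j] == levels[subj[j] - 1]) { s[j] = 1; ++j; }
        if (j == ns) break;
        ++s[j];
    }
}
```

For a random intercept and slope nested within a two-level subject variable, this produces four columns. The intercept and slope for subject level 1 come first, followed by the intercept and slope for level 2.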
In summary, each column of rndm defines a block of consecutive columns in $Z$.
g02jcf converts all non-binary categorical variables (i.e., where $L(R_{ij}) \ne 1$ or $L(S_{ij}) \ne 1$) to dummy variables. All random variables defined within a column of rndm are nested within all subject variables defined in the same column of rndm. In addition, the subject variables are nested within each other, starting with the first (i.e., each $R_{ij}$ is nested within $S_{i1}$, which in turn is nested within $S_{i2}$, which in turn is nested within $S_{i3}$, etc.).
If the last subject variable is the same in every column of rndm (i.e., $\mathrm{rndm}(3+N_{R_i}+N_{S_i},i) = \mathrm{rndm}(3+N_{R_1}+N_{S_1},1)$, for $i = 2, 3, \ldots, \mathrm{nrndm}$) then all random effects in the model are nested within this variable. In such instances the last subject variable ($S_{1N_{S_1}}$) is called the overall subject variable. Because all of the random effects in the model are nested within the overall subject variable, $Z$ is block diagonal in structure. This can be exploited to improve the efficiency of the underlying computation and to reduce the amount of internal storage required. The number of levels in the overall subject variable is returned in $\mathrm{nlsv} = L(S_{1N_{S_1}})$.
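If the last $k$ subject variables are the same in every column of rndm, for some $k > 1$, then the overall subject variable is defined as the interaction of these $k$ variables and
$$\mathrm{nlsv} = \prod_{j=N_{S_1}-k+1}^{N_{S_1}} L(S_{1j}).$$
If there is no overall subject variable then $\mathrm{nlsv} = 1$.
The number of columns in the design matrix $Z$ is given by $q = \mathrm{nrf} \times \mathrm{nlsv}$.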
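The rule for the overall subject variable can also be sketched. The hypothetical helper `overall_subject_levels` (an illustrative name) compares the trailing dat column identifiers of the subject variables across all random statements. It finds the largest $k$ for which the last $k$ agree and returns the product of their numbers of levels, or 1 when none are shared. Taking the product assumes that every combination of levels of the shared variables counts as one level of their interaction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: subjects[b] lists the 1-based dat columns of the
// subject variables of random statement b, first to last, as in rndm.
long long overall_subject_levels(const std::vector<std::vector<int>>& subjects,
                                 const std::vector<int>& levels) {
    if (subjects.empty()) return 1;
    const std::vector<int>& s0 = subjects[0];
    std::size_t k = 0;  // number of trailing subject variables shared by all
    while (true) {
        bool same = true;
        for (const auto& s : subjects) {
            if (s.size() <= k || s[s.size() - 1 - k] != s0[s0.size() - 1 - k]) {
                same = false;
                break;
            }
        }
        if (!same) break;
        ++k;
    }
    long long nlsv = 1;
    for (std::size_t j = 0; j < k; ++j) {
        nlsv *= levels[s0[s0.size() - 1 - j] - 1];
    }
    return nlsv;
}
```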
10
Example
See Section 10 in g02jdf and g02jef.