NAG Library Function Document
nag_hier_mixed_init (g02jcc)
1 Purpose
nag_hier_mixed_init (g02jcc) preprocesses a dataset prior to fitting a linear mixed effects regression model (described in Section 3) via either
nag_reml_hier_mixed_regsn (g02jdc) or
nag_ml_hier_mixed_regsn (g02jec).
2 Specification
#include <nag.h>
#include <nagg02.h>
void nag_hier_mixed_init (Nag_OrderType order,
     Integer n,
     Integer ncol,
     const double dat[],
     Integer pddat,
     const Integer levels[],
     const double y[],
     const double wt[],
     const Integer fixed[],
     Integer lfixed,
     Integer nrndm,
     const Integer rndm[],
     Integer lrndm,
     Integer *nff,
     Integer *nlsv,
     Integer *nrf,
     double rcomm[],
     Integer lrcomm,
     Integer icomm[],
     Integer licomm,
     NagError *fail)
3 Description
nag_hier_mixed_init (g02jcc) must be called prior to fitting a linear mixed effects regression model with either
nag_reml_hier_mixed_regsn (g02jdc) or
nag_ml_hier_mixed_regsn (g02jec).
The model fitting functions
nag_reml_hier_mixed_regsn (g02jdc) and
nag_ml_hier_mixed_regsn (g02jec) fit a model of the following form:
$$y = X\beta + Z\nu + \varepsilon,$$
where
- $y$ is a vector of $n$ observations on the dependent variable,
- $X$ is an $n$ by $p$ design matrix of fixed independent variables,
- $\beta$ is a vector of $p$ unknown fixed effects,
- $Z$ is an $n$ by $q$ design matrix of random independent variables,
- $\nu$ is a vector of length $q$ of unknown random effects,
- $\varepsilon$ is a vector of length $n$ of unknown random errors,
and $\nu$ and $\varepsilon$ are Normally distributed with expectation zero and variance/covariance matrix defined by
$$\operatorname{Var}\begin{bmatrix}\nu\\ \varepsilon\end{bmatrix} = \begin{bmatrix} G & 0\\ 0 & R\end{bmatrix},$$
where $R = \sigma_R^2 I$, $I$ is the $n \times n$ identity matrix and $G$ is a diagonal matrix.
Case weights can be incorporated into the model by replacing $X$ and $Z$ with $W_c^{1/2}X$ and $W_c^{1/2}Z$ respectively, where $W_c$ is a diagonal weight matrix.
4 References
None.
5 Arguments
- 1:
order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order=Nag_RowMajor or Nag_ColMajor.
- 2:
n – Integer Input
On entry: $n$, the number of observations.
The effective number of observations, that is the number of observations with nonzero weight (see
wt for more detail), must be greater than the number of fixed effects in the model (as returned in
nff).
Constraint:
.
- 3:
ncol – Integer Input
On entry: the number of columns in the data matrix,
dat.
Constraint:
.
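- 4:
dat[dim] – const double Input
Note: the dimension, dim, of the array dat must be at least
- max(1, pddat × ncol) when order=Nag_ColMajor;
- max(1, n × pddat) when order=Nag_RowMajor.
Where DAT(i,j) appears in this document, it refers to the array element
- dat[(j−1)×pddat+i−1] when order=Nag_ColMajor;
- dat[(i−1)×pddat+j−1] when order=Nag_RowMajor.
On entry: a matrix of data, with DAT(i,j) holding the $i$th observation on the $j$th variable. The two design matrices $X$ and $Z$ are constructed from dat and the information given in fixed (for $X$) and rndm (for $Z$).
Constraint: if levels[j−1] ≠ 1, 1 ≤ DAT(i,j) ≤ levels[j−1].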
- 5:
pddat – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array dat.
Constraints:
- if order=Nag_ColMajor, pddat ≥ n;
- if order=Nag_RowMajor, pddat ≥ ncol.
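As an illustration of the DAT(i,j) mapping and of the role of pddat, the following C sketch reads a single element from dat under either storage order. It is not part of the NAG Library: the enum sketch_order is a hypothetical stand-in for Nag_OrderType so that the fragment compiles without <nag.h>, int is used in place of Integer, and the index formulae assumed are the usual NAG ones, dat[(j−1)×pddat+i−1] for column-major and dat[(i−1)×pddat+j−1] for row-major storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType (real values: Nag_RowMajor,
 * Nag_ColMajor), so that this sketch compiles without <nag.h>. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_order;

/* Return DAT(i, j), the ith observation (1-based) on the jth variable
 * (1-based), from the flat array dat whose stride is pddat. */
double dat_elem(sketch_order order, const double *dat, int pddat,
                int i, int j)
{
    assert(i >= 1 && j >= 1 && pddat >= 1);
    if (order == SKETCH_COL_MAJOR)
        /* column-major: each variable occupies pddat consecutive elements */
        return dat[(size_t)(j - 1) * (size_t)pddat + (size_t)(i - 1)];
    /* row-major: each observation occupies pddat consecutive elements */
    return dat[(size_t)(i - 1) * (size_t)pddat + (size_t)(j - 1)];
}
```

For a 3 by 2 data matrix, column-major storage would typically use pddat = n = 3 and row-major storage pddat = ncol = 2; in both cases dat_elem(…, 3, 2) returns the third observation on the second variable.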
- 6:
levels[ncol] – const Integer Input
On entry: levels[i−1] contains the number of levels associated with the $i$th variable held in dat.
If the $i$th variable is continuous or binary (i.e., only takes the values zero or one) then levels[i−1] must be set to 1. Otherwise the $i$th variable is assumed to take an integer value between 1 and levels[i−1] (i.e., the $i$th variable is discrete with levels[i−1] levels).
Constraint: levels[i−1] ≥ 1, for i = 1, 2, …, ncol.
- 7:
y[n] – const double Input
On entry: $y$, the vector of observations on the dependent variable.
- 8:
wt[n] – const double Input
On entry: optionally, the weights to be used in the weighted regression.
If wt[i−1] = 0.0, the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weights are not provided then wt must be set to the null pointer, i.e., (double *)0, and the effective number of observations is n.
Constraint: if wt is not NULL, wt[i−1] ≥ 0.0, for i = 1, 2, …, n.
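The effective number of observations implied by wt can be computed as in the following C sketch. effective_nobs is a hypothetical helper, not a NAG function, and int is used in place of Integer. The value it returns must exceed the number of fixed effects reported in nff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of observations that enter the fit.
 * wt == NULL means no weights were supplied, so all n observations count;
 * otherwise observation i is dropped when wt[i] == 0.0. */
int effective_nobs(int n, const double *wt)
{
    if (wt == NULL)
        return n;
    int count = 0;
    for (int i = 0; i < n; i++) {
        assert(wt[i] >= 0.0);   /* weights must be non-negative */
        if (wt[i] != 0.0)
            count++;
    }
    return count;
}
```

For example, weights {1.0, 0.0, 2.5, 0.0, 1.0} leave three effective observations, so at most two fixed effects could be estimated.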
- 9:
fixed[lfixed] – const Integer Input
On entry: defines the structure of the fixed effects design matrix, $X$.
- fixed[0]: the number of variables, $N_F$, to include as fixed effects (not including the intercept if present).
- fixed[1]: the fixed intercept flag, which must contain 1 if a fixed intercept is to be included and 0 otherwise.
- fixed[2+i−1]: the column of DAT holding the $i$th fixed variable, for i = 1, 2, …, fixed[0].
See Section 9.1 for more details on the construction of $X$.
Constraints:
- fixed[0] ≥ 0;
- fixed[1] = 0 or 1;
- 1 ≤ fixed[2+i−1] ≤ ncol, for i = 1, 2, …, fixed[0].
- 10:
lfixed – Integer Input
On entry: length of the vector fixed.
Constraint: lfixed ≥ 2 + fixed[0].
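The layout of fixed and the constraints on fixed and lfixed can be checked with a short C sketch such as the one below. The function fixed_is_valid is hypothetical (it is not part of the NAG Library) and uses int in place of Integer. For example, a model with a fixed intercept and the variables held in columns 1 and 3 of DAT would use fixed = {2, 1, 1, 3} and lfixed = 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker for the layout of fixed:
 *   fixed[0]       number of fixed variables, N_F (>= 0)
 *   fixed[1]       fixed intercept flag (0 or 1)
 *   fixed[2 + i]   1-based DAT column of the (i+1)th fixed variable
 * Returns 1 if fixed satisfies the constraints for the given lfixed and
 * ncol, and 0 otherwise. */
int fixed_is_valid(const int *fixed, int lfixed, int ncol)
{
    assert(fixed != NULL);
    if (lfixed < 2 || fixed[0] < 0)
        return 0;
    if (fixed[1] != 0 && fixed[1] != 1)
        return 0;
    if (lfixed < 2 + fixed[0])       /* room for every column index */
        return 0;
    for (int i = 0; i < fixed[0]; i++)
        if (fixed[2 + i] < 1 || fixed[2 + i] > ncol)
            return 0;
    return 1;
}
```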
- 11:
nrndm – Integer Input
On entry: the second dimension of RNDM, i.e., the number of random statements (each of which defines a block of columns in the random effects design matrix, $Z$).
Constraint:
.
- 12:
rndm[] – const Integer Input
Note: where RNDM(i,j) appears in this document, it refers to the array element
- rndm[(j−1)×lrndm+i−1] when order=Nag_ColMajor;
- rndm[(i−1)×nrndm+j−1] when order=Nag_RowMajor.
On entry: RNDM defines the structure of the random effects design matrix, $Z$. The $j$th column of RNDM defines a block of columns in the design matrix $Z$.
- RNDM(1,j): the number of variables, $N_{Rj}$, to include as random effects in the $j$th block (not including the random intercept if present).
- RNDM(2,j): the random intercept flag, which must contain 1 if block $j$ includes a random intercept and 0 otherwise.
- RNDM(2+i,j): the column of DAT holding the $i$th random variable in the $j$th block, for i = 1, 2, …, $N_{Rj}$.
- RNDM(3+$N_{Rj}$,j): the number of subject variables, $N_{Sj}$, for the $j$th block. The subject variables define the nesting structure for this block.
- RNDM(3+$N_{Rj}$+i,j): the column of DAT holding the $i$th subject variable in the $j$th block, for i = 1, 2, …, $N_{Sj}$.
See Section 9.2 for more details on the construction of $Z$.
Constraints (for j = 1, 2, …, nrndm):
- RNDM(1,j) ≥ 0;
- RNDM(2,j) = 0 or 1;
- at least one random variable or random intercept must be specified in each block, i.e., RNDM(1,j) + RNDM(2,j) > 0;
- the column identifiers associated with the random variables must be in the range 1 to ncol, i.e., 1 ≤ RNDM(2+i,j) ≤ ncol, for i = 1, 2, …, $N_{Rj}$;
- RNDM(3+$N_{Rj}$,j) ≥ 0;
- the column identifiers associated with the subject variables must be in the range 1 to ncol, i.e., 1 ≤ RNDM(3+$N_{Rj}$+i,j) ≤ ncol, for i = 1, 2, …, $N_{Sj}$.
- 13:
lrndm – Integer Input
On entry: maximum number of entries in any column of RNDM.
Constraint: lrndm ≥ max_j (3 + RNDM(1,j) + RNDM(3+RNDM(1,j),j)), for j = 1, 2, …, nrndm.
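The following C sketch shows how the RNDM(i,j) mapping and the lrndm constraint fit together for column-major storage. The helpers rndm_get and rndm_min_lrndm are hypothetical (not NAG functions) and use int in place of Integer.

```c
#include <assert.h>

/* Sketch only: RNDM(i, j) (both 1-based) for a column-major rndm array
 * whose leading dimension is lrndm, i.e. rndm[(j-1)*lrndm + i-1]. */
static int rndm_get(const int *rndm, int lrndm, int i, int j)
{
    assert(i >= 1 && i <= lrndm && j >= 1);
    return rndm[(j - 1) * lrndm + (i - 1)];
}

/* Smallest lrndm able to hold every column of RNDM. Column j needs
 * 3 + N_Rj + N_Sj entries: N_Rj = RNDM(1,j), the intercept flag, N_Rj
 * column indices, N_Sj = RNDM(3+N_Rj, j) and N_Sj column indices. */
int rndm_min_lrndm(const int *rndm, int lrndm, int nrndm)
{
    int need = 0;
    for (int j = 1; j <= nrndm; j++) {
        int nr = rndm_get(rndm, lrndm, 1, j);
        int ns = rndm_get(rndm, lrndm, 3 + nr, j);
        if (3 + nr + ns > need)
            need = 3 + nr + ns;
    }
    return need;
}
```

With rndm = {1, 0, 1, 0, 0, 1, 0, 2, 1, 3} (two columns of length 5, stored column-major), the first column describes one random variable in DAT column 1 with no subject variables and needs 4 entries, while the second describes one random variable in column 2 nested within the subject variable in column 3 and needs 5, so rndm_min_lrndm returns 5.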
- 14:
nff – Integer * Output
On exit: $p$, the number of fixed effects estimated, i.e., the number of columns in the design matrix $X$.
- 15:
nlsv – Integer * Output
On exit: the number of levels for the overall subject variable (see Section 9.2 for a description of what this means). If there is no overall subject variable, nlsv = 1.
- 16:
nrf – Integer * Output
On exit: the number of random effects estimated in each of the overall subject blocks. The number of columns in the design matrix $Z$ is given by $q = \texttt{nrf} \times \texttt{nlsv}$.
- 17:
rcomm[lrcomm] – double Communication Array
On exit: communication array as required by the analysis functions
nag_reml_hier_mixed_regsn (g02jdc) and
nag_ml_hier_mixed_regsn (g02jec).
- 18:
lrcomm – Integer Input
On entry: the dimension of the array
rcomm.
Constraint:
.
- 19:
icomm[licomm] – Integer Communication Array
On exit: if licomm = 2, icomm[0] holds the minimum required value for licomm and icomm[1] holds the minimum required value for lrcomm; otherwise icomm is a communication array as required by the analysis functions
nag_reml_hier_mixed_regsn (g02jdc) and
nag_ml_hier_mixed_regsn (g02jec).
- 20:
licomm – Integer Input
On entry: the dimension of the array
icomm.
Constraint:
or
.
where
- ,
- ,
- ,
- ,
- , and
-
- 21:
fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
6 Error Indicators and Warnings
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument had an illegal value.
- NE_INT
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
- NE_INT_2
On entry, and .
Constraint: .
On entry, and .
Constraint: .
- NE_INT_ARRAY
On entry, index of fixed variable is less than 1 or greater than ncol: , index and .
On entry, index of random variable in random statement is less than 1 or greater than ncol: , , index and .
On entry, invalid value for fixed intercept flag: value .
On entry, invalid value for random intercept flag for random statement : , value .
On entry, .
Constraint: .
On entry, must be at least one parameter, or an intercept in each random statement : .
On entry, nesting variable in random statement has one level: , .
On entry, number of fixed parameters, is less than zero.
On entry, number of random parameters for random statement is less than 0: , number of parameters .
On entry, number of subject parameters for random statement is less than 0: , number of parameters .
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
- NE_REAL_ARRAY
On entry, no observations due to zero weights.
On entry, variable of observation is less than 1 or greater than the corresponding value in levels: , , value , .
On entry, .
Constraint: .
- NE_TOO_MANY
On entry, more fixed factors than observations, .
Constraint: .
7 Accuracy
Not applicable.
8 Parallelism and Performance
Not applicable.
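9.1 Construction of the fixed effects design matrix, X
Let
- $N_F$ denote the number of fixed variables, that is $N_F = \texttt{fixed}[0]$;
- $V_i$ denote the $i$th fixed variable, that is the vector of values held in the $k$th column of DAT when $k = \texttt{fixed}[2+i-1]$;
- $V_{ij}$ denote the $j$th element of $V_i$;
- $L(V_i)$ denote the number of levels for $V_i$, that is $L(V_i) = \texttt{levels}[\texttt{fixed}[2+i-1]-1]$;
- $I(V_i=\alpha)$ denote an indicator function that returns a vector of values whose $j$th element is 1 if $V_{ij} = \alpha$ and 0 otherwise.
The design matrix for the fixed effects, $X$, is constructed as follows:
- set $k$ to zero and the flag $F$ to false;
- if a fixed intercept is included, that is fixed[1] = 1,
  - set the first column of $X$ to a vector of 1s;
  - set $k = k + 1$;
  - set $F$ to true;
- loop over each fixed variable, so for each $i = 1, 2, \dots, N_F$,
  - if $L(V_i) = 1$,
    - set the $(k+1)$th column of $X$ to be $V_i$;
    - set $k = k + 1$;
  - else
    - if $F$ is false then
      - set the $L(V_i)$ columns, $k+1$ to $k+L(V_i)$, of $X$ to $I(V_i=\alpha)$, for $\alpha = 1, 2, \dots, L(V_i)$;
      - set $k = k + L(V_i)$;
      - set $F$ to true;
    - else
      - set the $L(V_i)-1$ columns, $k+1$ to $k+L(V_i)-1$, of $X$ to $I(V_i=\alpha)$, for $\alpha = 2, 3, \dots, L(V_i)$;
      - set $k = k + L(V_i) - 1$.
The number of columns in the design matrix, $p$, is therefore given by
$$p = \texttt{fixed}[1] + \sum_{i=1}^{N_F} c_i + \delta, \qquad c_i = \begin{cases} 1 & \text{if } L(V_i) = 1,\\ L(V_i) - 1 & \text{if } L(V_i) > 1,\end{cases}$$
where $\delta = 1$ if fixed[1] = 0 and $L(V_i) > 1$ for at least one $i$, and $\delta = 0$ otherwise. This quantity is returned in nff.
In summary, nag_hier_mixed_init (g02jcc) converts all non-binary categorical variables (i.e., where $L(V_i) > 1$) to dummy variables. If a fixed intercept is included in the model then the first level of all such variables is dropped. If a fixed intercept is not included in the model then the first level of all such variables, other than the first, is dropped. The variables are added into the model in the order they are specified in fixed.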
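The construction of $X$ can be sketched in C as follows. This illustrates the steps listed in this section and is not the code used inside nag_hier_mixed_init (g02jcc). It assumes column-major dat with leading dimension pddat, the layouts of levels and fixed described in Section 5, int in place of Integer, no weighting, and an output array x with room for n×p values stored column-major with leading dimension n; build_fixed_design is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the Section 9.1 construction (not NAG's code). dat is
 * column-major with leading dimension pddat; x receives X column-major
 * with leading dimension n. Returns p, the number of columns written
 * (the value that g02jcc reports in nff). */
int build_fixed_design(int n, const double *dat, int pddat,
                       const int *levels, const int *fixed, double *x)
{
    assert(fixed[1] == 0 || fixed[1] == 1);
    int k = 0;              /* columns of X filled so far */
    int first_done = 0;     /* the flag F */
    if (fixed[1] == 1) {    /* fixed intercept: a column of ones */
        for (int r = 0; r < n; r++)
            x[r] = 1.0;
        k = 1;
        first_done = 1;
    }
    for (int i = 0; i < fixed[0]; i++) {
        int col = fixed[2 + i];                  /* 1-based DAT column */
        const double *v = dat + (size_t)(col - 1) * pddat;
        int lv = levels[col - 1];
        if (lv == 1) {                           /* continuous or binary */
            for (int r = 0; r < n; r++)
                x[(size_t)k * n + r] = v[r];
            k++;
        } else {
            /* one dummy column per level; level 1 dropped once F is true */
            int a0 = first_done ? 2 : 1;
            for (int a = a0; a <= lv; a++, k++)
                for (int r = 0; r < n; r++)
                    x[(size_t)k * n + r] = (v[r] == (double)a) ? 1.0 : 0.0;
            first_done = 1;
        }
    }
    return k;
}
```

With a continuous variable in column 1 and a three-level variable in column 2, fixed = {2, 1, 1, 2} gives p = 4 (intercept, the continuous variable and dummies for levels 2 and 3), while fixed = {2, 0, 2, 1} also gives p = 4, but keeps all three levels of the categorical variable because no intercept is present.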
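9.2 Construction of random effects design matrix, Z
Let
- $N_{Rj}$ denote the number of random variables in the $j$th random statement, that is $N_{Rj} = \mathrm{RNDM}(1,j)$;
- $R_{ij}$ denote the $i$th random variable from the $j$th random statement, that is the vector of values held in the $k$th column of DAT when $k = \mathrm{RNDM}(2+i,j)$;
- $R_{ijl}$ denote the $l$th element of $R_{ij}$;
- $L(R_{ij})$ denote the number of levels for $R_{ij}$, that is $L(R_{ij}) = \texttt{levels}[\mathrm{RNDM}(2+i,j)-1]$;
- $I(R_{ij}=\alpha)$ denote an indicator function that returns a vector of values whose $l$th element is 1 if $R_{ijl} = \alpha$ and 0 otherwise;
- $N_{Sj}$ denote the number of subject variables in the $j$th random statement, that is $N_{Sj} = \mathrm{RNDM}(3+N_{Rj},j)$;
- $S_{ij}$ denote the $i$th subject variable from the $j$th random statement, that is the vector of values held in the $k$th column of DAT when $k = \mathrm{RNDM}(3+N_{Rj}+i,j)$;
- $S_{ijl}$ denote the $l$th element of $S_{ij}$;
- $L(S_{ij})$ denote the number of levels for $S_{ij}$, that is $L(S_{ij}) = \texttt{levels}[\mathrm{RNDM}(3+N_{Rj}+i,j)-1]$;
- $I(S_j=\gamma)$ denote an indicator function that returns a vector of values whose $l$th element is 1 if $S_{ijl} = \gamma_i$ for all $i = 1, 2, \dots, N_{Sj}$ and 0 otherwise.
The design matrix for the random effects, $Z$, is constructed as follows:
- set $k$ to zero;
- loop over each random statement, so for each $j = 1, 2, \dots, \texttt{nrndm}$,
  - loop over each level of the last subject variable, so for each $\gamma_{N_{Sj}} = 1, 2, \dots, L(S_{N_{Sj}j})$,
    - …
      - loop over each level of the second subject variable, so for each $\gamma_2 = 1, 2, \dots, L(S_{2j})$,
        - loop over each level of the first subject variable, so for each $\gamma_1 = 1, 2, \dots, L(S_{1j})$,
          - if a random intercept is included, that is $\mathrm{RNDM}(2,j) = 1$,
            - set the $(k+1)$th column of $Z$ to $I(S_j=\gamma)$;
            - set $k = k + 1$;
          - loop over each random variable in the $j$th random statement, so for each $i = 1, 2, \dots, N_{Rj}$,
            - if $L(R_{ij}) = 1$,
              - set the $(k+1)$th column of $Z$ to $R_{ij} \cdot I(S_j=\gamma)$, where $\cdot$ indicates an element-wise multiplication between the two vectors and $\gamma = (\gamma_1, \gamma_2, \dots, \gamma_{N_{Sj}})$;
              - set $k = k + 1$;
            - else
              - set the $L(R_{ij})$ columns, $k+1$ to $k+L(R_{ij})$, of $Z$ to $I(R_{ij}=\alpha) \cdot I(S_j=\gamma)$, for $\alpha = 1, 2, \dots, L(R_{ij})$. As before, $\cdot$ indicates an element-wise multiplication between the two vectors and $\gamma = (\gamma_1, \gamma_2, \dots, \gamma_{N_{Sj}})$;
              - set $k = k + L(R_{ij})$.
When a random statement has no subject variables ($N_{Sj} = 0$), the subject loops are executed once and $I(S_j=\gamma)$ is a vector of 1s.
In summary, each column of RNDM defines a block of consecutive columns in $Z$. nag_hier_mixed_init (g02jcc) converts all non-binary categorical variables (i.e., where $L(R_{ij}) > 1$ or $L(S_{ij}) > 1$) to dummy variables. All random variables defined within a column of RNDM are nested within all subject variables defined in the same column of RNDM. In addition each subject variable is nested within the next, starting with the first (i.e., each of the $R_{ij}$ is nested within $S_{1j}$, which in turn is nested within $S_{2j}$, which in turn is nested within $S_{3j}$, etc.).
If the last subject variable in each column of RNDM is the same (i.e., $S_{N_{S1}1} = S_{N_{S2}2} = \dots = S_{N_{Sm}m}$, where $m = \texttt{nrndm}$) then all random effects in the model are nested within this variable. In such instances the last subject variable ($S_{N_{S1}1}$) is called the overall subject variable. Because all of the random effects in the model are nested within the overall subject variable, $Z^{\mathrm{T}}Z$ is block diagonal in structure. This can be exploited to improve the efficiency of the underlying computation and reduce the amount of internal storage required. The number of levels in the overall subject variable is returned in nlsv, i.e., $\texttt{nlsv} = L(S_{N_{S1}1})$.
If the last $s$ subject variables in each column of RNDM are the same, for $s > 1$, then the overall subject variable is defined as the interaction of these $s$ variables and
$$\texttt{nlsv} = \prod_{i=0}^{s-1} L\bigl(S_{(N_{S1}-i)1}\bigr).$$
If there is no overall subject variable then nlsv = 1.
The number of columns in the design matrix $Z$ is given by $q = \texttt{nrf} \times \texttt{nlsv}$.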
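Counting the columns produced by the loops above gives a quick check on the size of $Z$. For each random statement, every combination of subject-variable levels contributes one column for the random intercept (if present), one for each continuous or binary random variable, and one per level for each categorical random variable; unlike $X$, no level is dropped. The C sketch below uses hypothetical helper names, int in place of Integer, and column-major rndm with leading dimension lrndm. It does not attempt to identify an overall subject variable, so it gives the total number of columns of $Z$ rather than nrf.

```c
#include <assert.h>

/* Sketch (not NAG code): number of columns that the Section 9.2 loops
 * add to Z for random statement j (1-based). rndm is column-major with
 * leading dimension lrndm; levels[] is as described in Section 5. */
int random_block_columns(const int *rndm, int lrndm, const int *levels,
                         int j)
{
    const int *c = rndm + (j - 1) * lrndm;   /* c[i-1] == RNDM(i, j) */
    int nr = c[0];                           /* N_Rj */
    int per_level = c[1];                    /* random intercept flag */
    for (int i = 1; i <= nr; i++) {
        int lv = levels[c[1 + i] - 1];       /* RNDM(2+i, j) is c[1+i] */
        per_level += (lv == 1) ? 1 : lv;     /* no level is dropped in Z */
    }
    int ns = c[2 + nr];                      /* N_Sj = RNDM(3+N_Rj, j) */
    int combos = 1;                          /* product of subject levels */
    for (int i = 1; i <= ns; i++)
        combos *= levels[c[2 + nr + i] - 1]; /* RNDM(3+N_Rj+i, j) */
    assert(per_level > 0);
    return combos * per_level;
}

/* Total number of columns of Z, summed over all nrndm statements. */
int random_design_columns(const int *rndm, int lrndm, const int *levels,
                          int nrndm)
{
    int q = 0;
    for (int j = 1; j <= nrndm; j++)
        q += random_block_columns(rndm, lrndm, levels, j);
    return q;
}
```

For example, with levels = {2, 4, 3}, a statement containing only the two-level variable in column 1 contributes 2 columns, and a statement containing the four-level variable in column 2 nested within the three-level subject variable in column 3 contributes 4 × 3 = 12, giving 14 columns in total.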
9.3 The rndm argument
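To illustrate some additional points about the rndm argument, we assume that we have a dataset with three discrete variables, $V_1$, $V_2$ and $V_3$, with 2, 4 and 3 levels respectively, and that $V_1$ is in the first column of DAT, $V_2$ in the second and $V_3$ in the third. Also assume that we wish to fit a model containing $V_1$ along with $V_2$ nested within $V_3$, as random effects. In order to do this the RNDM matrix requires two columns:
$$\mathrm{RNDM} = \begin{pmatrix} 1 & 1\\ 0 & 0\\ 1 & 2\\ 0 & 1\\ 0 & 3\end{pmatrix}.$$
The first column, $\mathrm{RNDM}(\cdot,1)$, indicates one random variable ($\mathrm{RNDM}(1,1) = 1$), no intercept ($\mathrm{RNDM}(2,1) = 0$), the random variable is in the first column of DAT ($\mathrm{RNDM}(3,1) = 1$), and there are no subject variables, as no nesting is required for $V_1$ ($\mathrm{RNDM}(4,1) = 0$). The last element in this column is ignored.
The second column, $\mathrm{RNDM}(\cdot,2)$, indicates one random variable ($\mathrm{RNDM}(1,2) = 1$), no intercept ($\mathrm{RNDM}(2,2) = 0$), the random variable is in the second column of DAT ($\mathrm{RNDM}(3,2) = 2$), there is one subject variable ($\mathrm{RNDM}(4,2) = 1$), and the subject variable is in the third column of DAT ($\mathrm{RNDM}(5,2) = 3$).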
The second column,
, indicates one random variable (
), no intercept (
), the random variable is in the second column of
DAT , there is one subject variable (
), and the subject variable is in the third column of
dat .
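The corresponding $Z$ matrix would have 14 columns, with 2 coming from $V_1$ and 12 ($= 4 \times 3$) from $V_2$ nested within $V_3$. The symmetric $Z^{\mathrm{T}}Z$ matrix has the block form
$$Z^{\mathrm{T}}Z = \begin{pmatrix} A & B_1 & B_2 & B_3\\ B_1^{\mathrm{T}} & D_1 & 0 & 0\\ B_2^{\mathrm{T}} & 0 & D_2 & 0\\ B_3^{\mathrm{T}} & 0 & 0 & D_3\end{pmatrix},$$
where $A$ is $2 \times 2$, each $B_s$ is $2 \times 4$, each $D_s$ is $4 \times 4$, and $0$ indicates a block of structural zeros, i.e., entries that always take the value 0, irrespective of the data; the remaining blocks are not structural zeros. The first two rows and columns of $Z^{\mathrm{T}}Z$ correspond to $V_1$. The block diagonal matrix in the 12 rows and columns in the bottom right corresponds to $V_2$ nested within $V_3$, with the $4 \times 4$ blocks $D_1$, $D_2$ and $D_3$ corresponding to the levels of $V_3$. There are three blocks as the subject variable ($V_3$) has three levels.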
The model fitting functions, nag_reml_hier_mixed_regsn (g02jdc) and nag_ml_hier_mixed_regsn (g02jec), use the sweep algorithm to calculate the log likelihood function for a given set of variance components. This algorithm consists of moving down the diagonal elements (called pivots) of a matrix which is similar in structure to $Z^{\mathrm{T}}Z$, and updating each element in that matrix. When using the $k$th diagonal element of a matrix $M$, an element $M_{ij}$ is adjusted by an amount equal to $M_{ik}M_{kj}/M_{kk}$. This process can be referred to as sweeping on the $k$th pivot. As there are no structural zeros in the first row or column of the above $Z^{\mathrm{T}}Z$, sweeping on the first pivot of $Z^{\mathrm{T}}Z$ would alter each element of the matrix and therefore destroy the structural zeros, i.e., we could no longer guarantee they would be zero.
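The element update described above can be sketched in C as follows. The sketch applies only the update $M_{ij} \leftarrow M_{ij} - M_{ik}M_{kj}/M_{kk}$ to elements off the pivot row and column; a full sweep also transforms the pivot row, column and diagonal element, which is omitted here. It is an illustration, not the implementation used by g02jdc or g02jec, and sweep_update is a hypothetical name. It also counts how many elements receive a non-zero adjustment, which shows directly whether structural zeros survive the sweep.

```c
#include <assert.h>
#include <stddef.h>

/* Off-pivot part of a sweep on pivot k (0-based) of the m-by-m matrix a,
 * stored row-major: a[i][j] -= a[i][k] * a[k][j] / a[k][k] for i, j != k.
 * Returns the number of elements that received a non-zero adjustment. */
int sweep_update(size_t m, double *a, size_t k)
{
    double piv = a[k * m + k];
    assert(piv != 0.0);
    int changed = 0;
    for (size_t i = 0; i < m; i++) {
        if (i == k)
            continue;
        for (size_t j = 0; j < m; j++) {
            if (j == k)
                continue;
            double adj = a[i * m + k] * a[k * m + j] / piv;
            if (adj != 0.0) {
                a[i * m + j] -= adj;
                changed++;
            }
        }
    }
    return changed;
}
```

Sweeping the 3 by 3 matrix with rows (2, 1, 1), (1, 3, 0) and (1, 0, 4) on its first pivot turns the zero in position (2,3) into −0.5, whereas sweeping on a pivot whose row and column are otherwise zero leaves every other element unchanged.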
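Reordering the RNDM matrix to
$$\mathrm{RNDM} = \begin{pmatrix} 1 & 1\\ 0 & 0\\ 2 & 1\\ 1 & 0\\ 3 & 0\end{pmatrix},$$
i.e., swapping the two columns, results in a $Z^{\mathrm{T}}Z$ matrix of the form
$$Z^{\mathrm{T}}Z = \begin{pmatrix} D_1 & 0 & 0 & B_1^{\mathrm{T}}\\ 0 & D_2 & 0 & B_2^{\mathrm{T}}\\ 0 & 0 & D_3 & B_3^{\mathrm{T}}\\ B_1 & B_2 & B_3 & A\end{pmatrix}.$$
This matrix is identical to the previous one, except that the first two rows and columns have become the last two rows and columns. Sweeping a matrix, $M$, of this form on the first pivot will only affect those elements $M_{ij}$ where $M_{i1} \neq 0$ and $M_{1j} \neq 0$, which are only the 13th and 14th rows and columns, and the top left hand block of 4 rows and columns. The block diagonal nature of the first 12 rows and columns therefore greatly reduces the amount of work the algorithm needs to perform.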
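The effect of the two orderings can be quantified with the following C sketch, which builds the structural pattern of $Z^{\mathrm{T}}Z$ for the 14-column example at block level and counts the off-pivot elements touched when sweeping on the first pivot. It is illustrative only: every entry of a block that is not a structural-zero block is treated as potentially non-zero (an assumption), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NZ 14  /* columns of Z in the example: 2 + 3 * 4 */

/* Block-level pattern of Z'Z: true where an entry may be non-zero.
 * v1_first selects the original RNDM ordering (V1 block first) rather
 * than the reordered one (the three V2-within-V3 blocks first). */
static void ztz_pattern(bool v1_first, bool p[NZ][NZ])
{
    int v1 = v1_first ? 0 : 12;    /* first index of the V1 block */
    int nest = v1_first ? 2 : 0;   /* first index of the nested blocks */
    for (int i = 0; i < NZ; i++)
        for (int j = 0; j < NZ; j++) {
            bool in_v1_i = i >= v1 && i < v1 + 2;
            bool in_v1_j = j >= v1 && j < v1 + 2;
            bool same_block = !in_v1_i && !in_v1_j &&
                              (i - nest) / 4 == (j - nest) / 4;
            p[i][j] = in_v1_i || in_v1_j || same_block;
        }
}

/* Off-pivot elements (i, j) touched when sweeping pivot k, i.e. those
 * where both p[i][k] and p[k][j] may be non-zero. */
static int touched_by_sweep(bool p[NZ][NZ], int k)
{
    int count = 0;
    for (int i = 0; i < NZ; i++)
        for (int j = 0; j < NZ; j++)
            if (i != k && j != k && p[i][k] && p[k][j])
                count++;
    return count;
}

int touched_first_pivot(bool v1_first)
{
    bool p[NZ][NZ];
    ztz_pattern(v1_first, p);
    return touched_by_sweep(p, 0);
}
```

With the original ordering all 13 × 13 = 169 off-pivot elements are touched, whereas after swapping the columns of RNDM only the 5 × 5 = 25 elements in rows and columns 2 to 4, 13 and 14 are touched.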
nag_hier_mixed_init (g02jcc) constructs $Z$ as specified by the RNDM matrix, and does not attempt to reorder it to improve performance. Therefore for best performance some thought is required on what ordering to use. In general it is more efficient to structure RNDM in such a way that the first column relates to the deepest level of nesting, the second to the next level, etc.
10 Example