nag_mv_discrim (g03dac) computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.
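Let a sample of $n$ observations on $p$ variables come from $n_g$ groups with $n_j$ observations in the $j$th group and $\sum_{j=1}^{n_g} n_j = n$. If the data are assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group $\Sigma_j$, then to test for equality of the variance-covariance matrices between groups, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g} = \Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
$$G = C\left\{(n - n_g)\log|S| - \sum_{j=1}^{n_g}(n_j - 1)\log|S_j|\right\},$$
where
$$C = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(n_g - 1)}\left(\sum_{j=1}^{n_g}\frac{1}{n_j - 1} - \frac{1}{n - n_g}\right),$$
and $S_j$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
$$S = \frac{\sum_{j=1}^{n_g}(n_j - 1)S_j}{n - n_g}.$$
For large $n$, $G$ is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}p(p+1)(n_g - 1)$ degrees of freedom; see Morrison (1967) for further comments. If weights are used, then $S$ and $S_j$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.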
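As an illustration of how the test statistic is assembled from the log-determinants, the following is a minimal sketch in plain C. It does not call the NAG library; the function name `equal_cov_lr_stat` and its argument list are hypothetical, and it assumes the unweighted case with the standard scaling factor and degrees of freedom for this likelihood-ratio test, as written in the comments.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of the NAG library.
 *   p          number of variables
 *   ng         number of groups
 *   nj[j]      number of observations in group j (unweighted case)
 *   logdet_s   log|S|, S being the pooled variance-covariance matrix
 *   logdet_sj  log|S_j| for each within-group matrix
 * Returns G = C * { (n - ng) log|S| - sum_j (nj - 1) log|S_j| } with
 *   C = 1 - (2p^2 + 3p - 1) / (6 (p+1) (ng-1))
 *         * ( sum_j 1/(nj - 1) - 1/(n - ng) ),
 * and writes the chi-square degrees of freedom p(p+1)(ng-1)/2 to *df. */
double equal_cov_lr_stat(int p, int ng, const int nj[],
                         double logdet_s, const double logdet_sj[],
                         double *df)
{
    int n = 0;
    double sum_inv = 0.0;     /* sum of 1/(nj - 1)          */
    double sum_logdet = 0.0;  /* sum of (nj - 1) * log|S_j| */

    assert(p >= 1 && ng >= 2 && df != NULL);
    for (int j = 0; j < ng; ++j) {
        assert(nj[j] > 1);
        n += nj[j];
        sum_inv += 1.0 / (nj[j] - 1);
        sum_logdet += (nj[j] - 1) * logdet_sj[j];
    }

    double c = 1.0 - (2.0 * p * p + 3.0 * p - 1.0)
                     / (6.0 * (p + 1) * (ng - 1))
                     * (sum_inv - 1.0 / (n - ng));

    *df = 0.5 * p * (p + 1) * (ng - 1);
    return c * ((n - ng) * logdet_s - sum_logdet);
}
```

For one variable and two groups of 11 observations with variances 1 and 3, the pooled variance is 2, so the function would be called with `logdet_s = log(2.0)` and `logdet_sj = {0.0, log(3.0)}`.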
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, nag_mv_discrim (g03dac) uses a QR decomposition. The group means are subtracted from the data and then, for each group, a QR decomposition is computed to give an upper triangular matrix $R_j^*$. This matrix can be scaled to give a matrix $R_j$ such that $S_j = R_j^T R_j$. The pooled $R$ matrix is then computed from the $R_j$ matrices. The values of $|S|$ and the $|S_j|$ can then be calculated from the diagonal elements of $R$ and the $R_j$.
This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as $z^T z$, where $R_j^T z = (x - \bar{x}_j)$, $\bar{x}_j$ being the vector of means of the $j$th group. These distances can be calculated by nag_mv_discrim_mahaldist (g03dbc). The distances are used in discriminant analysis, and nag_mv_discrim_group (g03dcc) uses the results of nag_mv_discrim (g03dac) to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
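To make the distance computation concrete, here is a minimal sketch of a Mahalanobis squared distance obtained from a triangular factor by forward substitution. It is not the implementation used by nag_mv_discrim_mahaldist (g03dbc); the function name, the dense row-major storage of the factor, and the caller-supplied workspace `z` are all assumptions made for illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper, not part of the NAG library.
 * r     p-by-p upper triangular factor R (dense, row-major), S = R^T R
 * x     observation vector of length p
 * mean  group mean vector of length p
 * z     workspace of length p; on return holds the solution of
 *       R^T z = x - mean
 * Returns z^T z, which equals (x - mean)^T S^{-1} (x - mean). */
double mahalanobis_sq(int p, const double r[], const double x[],
                      const double mean[], double z[])
{
    double dist = 0.0;

    assert(p >= 1);
    for (int i = 0; i < p; ++i) {
        /* R^T is lower triangular with (R^T)[i][k] = R[k][i], k <= i */
        double s = x[i] - mean[i];
        for (int k = 0; k < i; ++k)
            s -= r[k * p + i] * z[k];
        assert(r[i * p + i] != 0.0);
        z[i] = s / r[i * p + i];
        dist += z[i] * z[i];
    }
    return dist;
}
```

Solving the triangular system directly means the inverse of the variance-covariance matrix is never formed.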
- 1:
n – Integer Input
On entry: the number of observations, $n$.
Constraint: n ≥ 1.
- 2:
m – Integer Input
On entry: the number of variables in the data array x.
Constraint: m ≥ nvar.
- 3:
x[n × tdx] – const double Input
On entry: x[(k − 1) × tdx + l − 1] must contain the $k$th observation for the $l$th variable, for $k = 1, 2, \ldots, n$ and $l = 1, 2, \ldots, m$.
- 4:
tdx – Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
- 5:
isx[m] – const Integer Input
On entry: isx[l − 1] indicates whether or not the $l$th variable in x is to be included in the variance-covariance matrices.
If isx[l − 1] > 0, the $l$th variable is included, for $l = 1, 2, \ldots, m$; otherwise it is not referenced.
Constraint: isx[l − 1] > 0 for nvar values of $l$.
- 6:
nvar – Integer Input
On entry: the number of variables in the variance-covariance matrices, $p$.
Constraint: nvar ≥ 1.
- 7:
ing[n] – const Integer Input
On entry: ing[k − 1] indicates to which group the $k$th observation belongs, for $k = 1, 2, \ldots, n$.
Constraint: 1 ≤ ing[k − 1] ≤ ng, for $k = 1, 2, \ldots, n$.
The values of ing must be such that each group has at least nvar members.
- 8:
ng – Integer Input
On entry: the number of groups, $n_g$.
Constraint: ng ≥ 2.
- 9:
wt[n] – const double Input
On entry: the elements of wt must contain the weights to be used in the analysis. The effective number of observations for a group is the sum of the weights of the observations in that group. If wt[k − 1] = 0.0 then the $k$th observation is excluded from the calculations.
If weights are not provided then wt must be set to NULL, and the effective number of observations for a group is the number of observations in that group.
Constraints:
- if wt is not NULL, wt[k − 1] ≥ 0.0, for $k = 1, 2, \ldots, n$;
- the effective number of observations for each group must be greater than 1.
- 10:
nig[ng] – Integer Output
On exit: nig[j − 1] contains the number of observations in the $j$th group, for $j = 1, 2, \ldots, n_g$.
- 11:
gmean[ng × tdg] – double Output
Note: the $(i, j)$th element of the matrix is stored in gmean[(i − 1) × tdg + j − 1].
On exit: the $j$th row of gmean contains the means of the $p$ selected variables for the $j$th group, for $j = 1, 2, \ldots, n_g$.
- 12:
tdg – Integer Input
On entry: the stride separating matrix column elements in the array gmean.
Constraint: tdg ≥ nvar.
- 13:
det[ng] – double Output
On exit: the logarithms of the determinants of the within-group variance-covariance matrices.
- 14:
gc[dim] – double Output
Note: the dimension, dim, of the array gc must be at least (ng + 1) × nvar × (nvar + 1)/2.
On exit: the first $p(p+1)/2$ elements of gc contain $R$ and the remaining $n_g$ blocks of $p(p+1)/2$ elements contain the $R_j$ matrices. All are stored in packed form by columns.
- 15:
stat – double * Output
On exit: the likelihood-ratio test statistic, $G$.
- 16:
df – double * Output
On exit: the degrees of freedom for the distribution of $G$.
- 17:
sig – double * Output
On exit: the significance level for $G$.
- 18:
fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
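The storage conventions in the argument list above can be summarised with a few index helpers. This is a sketch only: the function names are hypothetical, plain `int` and `size_t` stand in for the NAG `Integer` type, and the packed layout assumed for gc is the usual column-packed upper triangle (r11, r12, r22, r13, ...), with the pooled matrix in block 0 followed by one block per group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of the NAG library. */

/* Position in x of observation k, variable l (both 1-based),
 * assuming observations are stored row by row with stride tdx. */
size_t x_index(int k, int l, int tdx)
{
    assert(k >= 1 && l >= 1 && l <= tdx);
    return (size_t)(k - 1) * tdx + (size_t)(l - 1);
}

/* Minimum length of gc: (ng + 1) packed triangles of order nvar. */
size_t gc_min_dim(int ng, int nvar)
{
    assert(ng >= 2 && nvar >= 1);
    return (size_t)(ng + 1) * nvar * (nvar + 1) / 2;
}

/* Offset of a block in gc: block 0 is the pooled matrix,
 * block j (1 <= j <= ng) is the matrix for group j. */
size_t gc_block_offset(int block, int nvar)
{
    assert(block >= 0 && nvar >= 1);
    return (size_t)block * nvar * (nvar + 1) / 2;
}

/* Position of element (i, j), 1 <= i <= j, inside one
 * upper triangle packed by columns. */
size_t packed_upper_index(int i, int j)
{
    assert(i >= 1 && i <= j);
    return (size_t)(j - 1) * j / 2 + (size_t)(i - 1);
}
```

Under these assumptions, element (i, j) of the factor for group g would be read as `gc[gc_block_offset(g, nvar) + packed_upper_index(i, j)]`.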
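- NE_2_INT_ARG_LT
On entry, m = ⟨value⟩ while nvar = ⟨value⟩. These arguments must satisfy m ≥ nvar.
On entry, tdg = ⟨value⟩ while nvar = ⟨value⟩. These arguments must satisfy tdg ≥ nvar.
On entry, tdx = ⟨value⟩ while m = ⟨value⟩. These arguments must satisfy tdx ≥ m.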
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_GROUP_OBSERV
On entry, group ⟨value⟩ has ⟨value⟩ effective observations.
Constraint: in each group the effective number of observations must be greater than 1.
- NE_GROUP_VAR
On entry, group ⟨value⟩ has ⟨value⟩ members, while nvar = ⟨value⟩.
Constraint: number of members in each group ≥ nvar.
- NE_GROUP_VAR_RANK
The variables in group ⟨value⟩ are not of full rank.
- NE_INT_ARG_LT
On entry, n = ⟨value⟩.
Constraint: n ≥ 1.
On entry, ng = ⟨value⟩.
Constraint: ng ≥ 2.
On entry, nvar = ⟨value⟩.
Constraint: nvar ≥ 1.
- NE_INTARR_INT
On entry, ing[⟨value⟩] = ⟨value⟩, ng = ⟨value⟩.
Constraint: 1 ≤ ing[i − 1] ≤ ng, for $i = 1, 2, \ldots, n$.
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_NEG_WEIGHT_ELEMENT
On entry, wt[⟨value⟩] = ⟨value⟩.
Constraint: when referenced, all elements of wt must be non-negative.
- NE_VAR_INCL_INDICATED
The number of variables in the analysis, nvar = ⟨value⟩, while the number of variables included in the analysis via array isx = ⟨value⟩.
Constraint: these two numbers must be the same.
- NE_VAR_RANK
The variables are not of full rank.
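Several of the error conditions above can be caught before the call. The following is a minimal pre-call check in plain C, written as a sketch: the function, the result codes, and the fixed `MAX_GROUPS` bound are hypothetical and not part of the NAG library. It assumes that a variable is selected when its isx entry is positive, that group labels run from 1 to ng, and that observations with zero weight do not count as group members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, not NAG error codes. */
enum check_result {
    CHECK_OK = 0,
    CHECK_ISX_COUNT,    /* selected variables != nvar            */
    CHECK_ING_RANGE,    /* a group label outside 1..ng           */
    CHECK_NEG_WEIGHT,   /* a negative weight                     */
    CHECK_GROUP_SIZE,   /* a group with fewer than nvar members  */
    CHECK_GROUP_WEIGHT  /* effective observations in a group <= 1 */
};

#define MAX_GROUPS 64   /* arbitrary bound chosen for this sketch */

/* wt may be NULL, in which case every observation has weight 1. */
int check_inputs(int n, int m, const int isx[], int nvar,
                 const int ing[], int ng, const double wt[])
{
    int members[MAX_GROUPS] = {0};
    double effective[MAX_GROUPS] = {0.0};
    int included = 0;

    assert(ng >= 2 && ng <= MAX_GROUPS);

    for (int l = 0; l < m; ++l)
        if (isx[l] > 0)
            ++included;
    if (included != nvar)
        return CHECK_ISX_COUNT;

    for (int k = 0; k < n; ++k) {
        double w = (wt != NULL) ? wt[k] : 1.0;
        if (ing[k] < 1 || ing[k] > ng)
            return CHECK_ING_RANGE;
        if (w < 0.0)
            return CHECK_NEG_WEIGHT;
        if (w > 0.0)
            ++members[ing[k] - 1];
        effective[ing[k] - 1] += w;
    }

    for (int j = 0; j < ng; ++j) {
        if (members[j] < nvar)
            return CHECK_GROUP_SIZE;
        if (effective[j] <= 1.0)
            return CHECK_GROUP_WEIGHT;
    }
    return CHECK_OK;
}
```

Rank deficiency (NE_VAR_RANK, NE_GROUP_VAR_RANK) cannot be detected this way; it only shows up once the decomposition has been computed.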
Not applicable.
The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24 hr) of two steroid metabolites. Observations for a total of 21 patients are input and the statistics computed by nag_mv_discrim (g03dac). The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.