NAG FL Interface
g03daf (discrim)
1
Purpose
g03daf computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.
2
Specification
Fortran Interface
Subroutine g03daf (weight, n, m, x, ldx, isx, nvar, ing, ng, wt, nig, gmn, ldgmn, det, gc, stat, df, sig, wk, iwk, ifail)
Integer, Intent (In) :: n, m, ldx, isx(m), nvar, ing(n), ng, ldgmn
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: nig(ng), iwk(ng)
Real (Kind=nag_wp), Intent (In) :: x(ldx,m), wt(*)
Real (Kind=nag_wp), Intent (Inout) :: gmn(ldgmn,nvar)
Real (Kind=nag_wp), Intent (Out) :: det(ng), gc((ng+1)*nvar*(nvar+1)/2), stat, df, sig, wk(n*(nvar+1))
Character (1), Intent (In) :: weight
C Header Interface
#include <nag.h>
void g03daf_ (const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const Integer isx[], const Integer *nvar, const Integer ing[], const Integer *ng, const double wt[], Integer nig[], double gmn[], const Integer *ldgmn, double det[], double gc[], double *stat, double *df, double *sig, double wk[], Integer iwk[], Integer *ifail, const Charlen length_weight)
C++ Header Interface
#include <nag.h>
extern "C" {
void g03daf_ (const char *weight, const Integer &n, const Integer &m, const double x[], const Integer &ldx, const Integer isx[], const Integer &nvar, const Integer ing[], const Integer &ng, const double wt[], Integer nig[], double gmn[], const Integer &ldgmn, double det[], double gc[], double &stat, double &df, double &sig, double wk[], Integer iwk[], Integer &ifail, const Charlen length_weight)
}
The routine may be called by the names g03daf or nagf_mv_discrim.
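The output and workspace array lengths in the Fortran declarations above can be evaluated before the call. The Python sketch below only evaluates those dimension expressions; the function name `g03daf_array_sizes` and the returned dictionary are illustrative assumptions, not part of the NAG Library.

```python
def g03daf_array_sizes(n, nvar, ng):
    """Lengths of the output/workspace arrays declared for g03daf.

    Hypothetical helper: it only evaluates the dimension expressions
    that appear in the Fortran specification.
    """
    tri = nvar * (nvar + 1) // 2      # one packed nvar-by-nvar triangle
    return {
        "nig": ng,                    # nig(ng)
        "det": ng,                    # det(ng)
        "gc": (ng + 1) * tri,         # gc((ng+1)*nvar*(nvar+1)/2)
        "wk": n * (nvar + 1),         # wk(n*(nvar+1))
        "iwk": ng,                    # iwk(ng)
    }
```

For instance, `g03daf_array_sizes(10, 2, 3)` gives the lengths needed for ten observations on two selected variables in three groups.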
3
Description
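Let a sample of $n$ observations on $p$ variables come from $n_g$ groups with $n_j$ observations in the $j$th group and $\sum_{j=1}^{n_g} n_j = n$. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group $\Sigma_j$, then to test for equality of the variance-covariance matrices between groups, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g} = \Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
$$G = C\left\{(n-n_g)\log|S| - \sum_{j=1}^{n_g}(n_j-1)\log|S_j|\right\},$$
where
$$C = 1 - \frac{2p^2+3p-1}{6(p+1)(n_g-1)}\left(\sum_{j=1}^{n_g}\frac{1}{n_j-1} - \frac{1}{n-n_g}\right),$$
and $S_j$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
$$S = \frac{\sum_{j=1}^{n_g}(n_j-1)S_j}{n-n_g}.$$
For large $n$, $G$ is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}p(p+1)(n_g-1)$ degrees of freedom, see Morrison (1967) for further comments. If weights are used, then $S$ and $S_j$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.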
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, g03daf uses a $QR$ decomposition. The group means are subtracted from the data and then, for each group, a $QR$ decomposition is computed to give an upper triangular matrix $R_j^*$. This matrix can be scaled to give a matrix $R_j$ such that $S_j = R_j^T R_j$. The pooled $R$ matrix is then computed from the $R_j$ matrices. The values of $|S|$ and the $|S_j|$ can then be calculated from the diagonal elements of $R$ and the $R_j$.
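The role of the triangular factors can be illustrated with a small, unweighted Python sketch. It is not the g03daf algorithm: it forms each covariance matrix explicitly and takes a Cholesky factor (an upper triangular $R$ with $S = R^T R$) instead of a $QR$ decomposition of the mean-centred data, and it assumes the usual Box form of the statistic and its scaling factor $C$. All function names are hypothetical.

```python
import math

def covariance(rows):
    """Unbiased covariance matrix (divisor n_j - 1) of a list of rows."""
    nj, p = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / nj for k in range(p)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (nj - 1)
             for b in range(p)] for a in range(p)]

def cholesky_upper(s):
    """Upper triangular R with s = R^T R (s must be positive definite)."""
    p = len(s)
    r = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            acc = s[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(acc) if i == j else acc / r[i][i]
    return r

def log_det(r):
    """log|R^T R|, read off the diagonal of the triangular factor R."""
    return 2.0 * sum(math.log(abs(r[i][i])) for i in range(len(r)))

def equality_test(groups):
    """Return (G, df) for a list of groups, each a list of observation rows."""
    ng, p = len(groups), len(groups[0][0])
    n = sum(len(g) for g in groups)
    covs = [covariance(g) for g in groups]
    # Pooled matrix: S = sum_j (n_j - 1) S_j / (n - n_g)
    pooled = [[sum((len(g) - 1) * c[a][b] for g, c in zip(groups, covs)) / (n - ng)
               for b in range(p)] for a in range(p)]
    log_s = log_det(cholesky_upper(pooled))
    log_sj = [log_det(cholesky_upper(c)) for c in covs]
    # Scaling factor C (assumed Box correction)
    c_factor = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (ng - 1)) * (
        sum(1.0 / (len(g) - 1) for g in groups) - 1.0 / (n - ng))
    g_stat = c_factor * ((n - ng) * log_s
                         - sum((len(g) - 1) * ld for g, ld in zip(groups, log_sj)))
    df = p * (p + 1) * (ng - 1) / 2.0
    return g_stat, df
```

When every group has the same sample covariance matrix the statistic is zero up to rounding, which gives a convenient sanity check; working from the $QR$ factors of the centred data, as the routine does, avoids forming the covariance matrices at all.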
This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as $z^T z$, where $R_j^T z = (x - \bar{x}_j)$, $\bar{x}_j$ being the vector of means of the $j$th group. These distances can be calculated by g03dbf. The distances are used in discriminant analysis and g03dcf uses the results of g03daf to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
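The distance calculation described above needs only one triangular solve. The Python sketch below assumes an upper triangular factor `r` with $S = R^T R$, so that $z$ is obtained from the lower triangular system $R^T z = x - \bar{x}$ by forward substitution; the function name `mahalanobis_sq` is hypothetical and this is not the g03dbf interface.

```python
def mahalanobis_sq(r, x, mean):
    """Squared Mahalanobis distance z^T z, where R^T z = x - mean.

    r is an upper triangular factor with S = R^T R, so R^T is lower
    triangular and z follows by forward substitution.
    """
    p = len(x)
    d = [x[i] - mean[i] for i in range(p)]
    z = [0.0] * p
    for i in range(p):
        # Row i of R^T is column i of R: entries r[k][i] for k <= i.
        z[i] = (d[i] - sum(r[k][i] * z[k] for k in range(i))) / r[i][i]
    return sum(v * v for v in z)
```

The result equals $(x - \bar{x})^T S^{-1} (x - \bar{x})$ without $S$ ever being inverted or even formed.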
4
References
Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill
5
Arguments
1: weight – Character(1) Input
On entry: indicates if weights are to be used.
weight = 'U': no weights are used.
weight = 'W': weights are to be used and must be supplied in wt.
Constraint: weight = 'U' or 'W'.
2: n – Integer Input
On entry: $n$, the number of observations.
Constraint: n ≥ 1.
3: m – Integer Input
On entry: the number of variables in the data array x.
Constraint: m ≥ nvar.
4: x(ldx,m) – Real (Kind=nag_wp) array Input
On entry: x(k,l) must contain the kth observation for the lth variable, for k = 1, 2, …, n and l = 1, 2, …, m.
5: ldx – Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g03daf is called.
Constraint: ldx ≥ n.
6: isx(m) – Integer array Input
On entry: isx(l) indicates whether or not the lth variable in x is to be included in the variance-covariance matrices.
If isx(l) > 0 the lth variable is included, for l = 1, 2, …, m; otherwise it is not referenced.
Constraint: isx(l) > 0 for nvar values of l.
7: nvar – Integer Input
On entry: $p$, the number of variables in the variance-covariance matrices.
Constraint: nvar ≥ 1.
8: ing(n) – Integer array Input
On entry: ing(k) indicates to which group the kth observation belongs, for k = 1, 2, …, n.
Constraint: 1 ≤ ing(k) ≤ ng, for k = 1, 2, …, n.
The values of ing must be such that each group has at least nvar members.
9: ng – Integer Input
On entry: the number of groups, $n_g$.
Constraint: ng ≥ 2.
10: wt(*) – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least n if weight = 'W', and at least 1 otherwise.
On entry: if weight = 'W' the first $n$ elements of wt must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If wt(k) = 0.0 the kth observation is excluded from the calculations.
If weight = 'U', wt is not referenced and the effective number of observations for a group is the number of observations in that group.
Constraint: if weight = 'W', wt(k) ≥ 0.0, for k = 1, 2, …, n.
11: nig(ng) – Integer array Output
On exit: nig(j) contains the number of observations in the jth group, for j = 1, 2, …, ng.
12: gmn(ldgmn,nvar) – Real (Kind=nag_wp) array Output
On exit: the jth row of gmn contains the means of the $p$ selected variables for the jth group, for j = 1, 2, …, ng.
13: ldgmn – Integer Input
On entry: the first dimension of the array gmn as declared in the (sub)program from which g03daf is called.
Constraint: ldgmn ≥ ng.
14: det(ng) – Real (Kind=nag_wp) array Output
On exit: the logarithm of the determinants of the within-group variance-covariance matrices.
15: gc((ng+1)*nvar*(nvar+1)/2) – Real (Kind=nag_wp) array Output
On exit: the first $p(p+1)/2$ elements of gc contain $R$ and the remaining $n_g$ blocks of $p(p+1)/2$ elements contain the $R_j$ matrices. All are stored in packed form by columns.
16: stat – Real (Kind=nag_wp) Output
On exit: the likelihood-ratio test statistic, $G$.
17: df – Real (Kind=nag_wp) Output
On exit: the degrees of freedom for the distribution of $G$.
18: sig – Real (Kind=nag_wp) Output
On exit: the significance level for $G$.
19: wk(n*(nvar+1)) – Real (Kind=nag_wp) array Workspace
20: iwk(ng) – Integer array Workspace
21: ifail – Integer Input/Output
On entry: ifail must be set to 0, −1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of −1 means that an error message is printed while a value of 1 means that it is not.
If halting is not appropriate, the value −1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value 0 is recommended.
When the value −1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
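The packed layout of gc described in the argument list can be made concrete with a small indexing helper. The Python sketch below assumes that each upper triangular matrix is packed column by column, that the pooled matrix occupies the first block and that the group matrices follow in group order; the function name `gc_element` and its argument convention are hypothetical.

```python
def gc_element(gc, nvar, block, i, j):
    """Return element (i, j), 1-based with i <= j, of a packed upper triangle.

    block = 0 selects the pooled matrix R; block = k (1..ng) selects R_k.
    Hypothetical helper assuming each triangle is packed by columns.
    """
    if not 1 <= i <= j <= nvar:
        raise ValueError("need 1 <= i <= j <= nvar")
    tri = nvar * (nvar + 1) // 2          # length of one packed triangle
    offset = block * tri                  # blocks follow one another in gc
    # Column j starts after the j*(j-1)/2 entries of columns 1..j-1.
    return gc[offset + (i - 1) + j * (j - 1) // 2]
```

With nvar = 2, for example, each block holds three values in the order (1,1), (1,2), (2,2).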
6
Error Indicators and Warnings
If on entry ifail = 0 or −1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail = 1
On entry, ldgmn = ⟨value⟩ and ng = ⟨value⟩.
Constraint: ldgmn ≥ ng.
On entry, ldx = ⟨value⟩ and n = ⟨value⟩.
Constraint: ldx ≥ n.
On entry, m = ⟨value⟩ and nvar = ⟨value⟩.
Constraint: m ≥ nvar.
On entry, n = ⟨value⟩.
Constraint: n ≥ 1.
On entry, ng = ⟨value⟩.
Constraint: ng ≥ 2.
On entry, nvar = ⟨value⟩.
Constraint: nvar ≥ 1.
On entry, weight = ⟨value⟩.
Constraint: weight = 'U' or 'W'.
ifail = 2
On entry, i = ⟨value⟩ and wt(i) < 0.0.
Constraint: wt(i) ≥ 0.0.
ifail = 3
On entry, i = ⟨value⟩, ing(i) = ⟨value⟩ and ng = ⟨value⟩.
Constraint: 1 ≤ ing(i) ≤ ng.
On entry, nvar = ⟨value⟩ and ⟨value⟩ values of isx > 0.
Constraint: exactly nvar elements of isx > 0.
The effective number of observations for group ⟨value⟩ is less than 1.
The number of observations for group ⟨value⟩ is less than nvar.
ifail = 4
$R$ is not of full rank.
$R_j$ is not of full rank for $j$ = ⟨value⟩.
ifail = −99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail = −399
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail = −999
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
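Several of the conditions on isx and ing reported in this section can be checked before the routine is called. The Python sketch below is a hypothetical pre-call check, not part of the NAG Library; it assumes that a variable is selected when its isx entry is positive and that group labels run from 1 to ng, and it covers the unweighted case only.

```python
def check_inputs(isx, nvar, ing, ng):
    """Return a list of problems with isx and ing; empty if none found.

    Hypothetical pre-call check mirroring the documented constraints.
    """
    problems = []
    selected = sum(1 for v in isx if v > 0)
    if selected != nvar:
        problems.append(f"{selected} elements of isx > 0, expected nvar = {nvar}")
    counts = [0] * ng
    for i, g in enumerate(ing, start=1):
        if 1 <= g <= ng:
            counts[g - 1] += 1
        else:
            problems.append(f"ing({i}) = {g} is outside 1..{ng}")
    for j, c in enumerate(counts, start=1):
        if c < nvar:
            problems.append(f"group {j} has {c} observations, fewer than nvar = {nvar}")
    return problems
```

An empty list means these particular checks passed; rank deficiency of the triangular factors can only be detected by the routine itself.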
7
Accuracy
The accuracy is dependent on the accuracy of the computation of the $QR$ decomposition. See f08aef for further details.
8
Parallelism and Performance
g03daf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g03daf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
The time taken will be approximately proportional to $np^2$.
10
Example
The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the statistics computed by g03daf. The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.
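As a quick check on the printed output, the degrees of freedom follow from the problem dimensions alone. Assuming the usual expression $\frac{1}{2}p(p+1)(n_g-1)$ for this test, with $p = 2$ variables and $n_g = 3$ groups as in this example:

$$\mathrm{df} = \tfrac{1}{2}\,p(p+1)(n_g-1) = \tfrac{1}{2}\times 2\times 3\times 2 = 6.$$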
10.1
Program Text
10.2
Program Data
10.3
Program Results