NAG Library Routine Document

g03daf (discrim)

1 Purpose

g03daf computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.

2 Specification

Fortran Interface

Subroutine g03daf (weight, n, m, x, ldx, isx, nvar, ing, ng, wt, nig, gmn, ldgmn, det, gc, stat, df, sig, wk, iwk, ifail)

Integer, Intent (In)	::	n, m, ldx, isx(m), nvar, ing(n), ng, ldgmn
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	nig(ng), iwk(ng)
Real (Kind=nag_wp), Intent (In)	::	x(ldx,m), wt(*)
Real (Kind=nag_wp), Intent (Inout)	::	gmn(ldgmn,nvar)
Real (Kind=nag_wp), Intent (Out)	::	det(ng), gc((ng+1)*nvar*(nvar+1)/2), stat, df, sig, wk(n*(nvar+1))
Character (1), Intent (In)	::	weight

C Header Interface

#include <nagmk26.h>

void g03daf_ (const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const Integer isx[], const Integer *nvar, const Integer ing[], const Integer *ng, const double wt[], Integer nig[], double gmn[], const Integer *ldgmn, double det[], double gc[], double *stat, double *df, double *sig, double wk[], Integer iwk[], Integer *ifail, const Charlen length_weight)

3 Description

Let a sample of $n$ observations on $p$ variables come from $n_{g}$ groups with $n_{j}$ observations in the $j$th group and $\sum n_{j} = n$. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group $\Sigma_{j}$, then to test for equality of the variance-covariance matrices between groups, that is, $\Sigma_{1} = \Sigma_{2} = \dots = \Sigma_{n_{g}} = \Sigma$, the following likelihood-ratio test statistic, $G$, can be used:

$$G = C \left\{(n - n_{g}) \log |S| - \sum_{j = 1}^{n_{g}} (n_{j} - 1) \log |S_{j}|\right\},$$

where

$$C = 1 - \frac{2 p^{2} + 3 p - 1}{6 (p + 1) (n_{g} - 1)} \left(\sum_{j = 1}^{n_{g}} \frac{1}{(n_{j} - 1)} - \frac{1}{(n - n_{g})}\right),$$

and $S_{j}$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by

$$S = \frac{\sum_{j = 1}^{n_{g}} (n_{j} - 1) S_{j}}{(n - n_{g})}.$$

For large $n$, $G$ is approximately distributed as a $\chi^{2}$ variable with $\frac{1}{2} p (p + 1) (n_{g} - 1)$ degrees of freedom; see Morrison (1967) for further comments. If weights are used, then $S$ and $S_{j}$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.
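The formulas for $G$, $C$ and the degrees of freedom can be checked on small data sets with a direct, unweighted calculation. The following Python sketch is illustrative only: the function names `det`, `covariance` and `equality_test` are hypothetical, it forms the covariance matrices and determinants explicitly, and it is not how g03daf itself proceeds.

```python
import math

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[piv][k] == 0.0:
            return 0.0
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return d

def covariance(rows):
    """Within-group covariance S_j (divisor n_j - 1) of a list of p-vectors."""
    nj, p = len(rows), len(rows[0])
    mean = [sum(r[l] for r in rows) / nj for l in range(p)]
    return [[sum((r[u] - mean[u]) * (r[v] - mean[v]) for r in rows) / (nj - 1)
             for v in range(p)] for u in range(p)]

def equality_test(groups):
    """Return (G, degrees of freedom) for a list of groups of p-vectors."""
    ng = len(groups)
    p = len(groups[0][0])
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    covs = [covariance(g) for g in groups]
    # Pooled matrix S = sum_j (n_j - 1) S_j / (n - n_g)
    pooled = [[sum((nj - 1) * s[u][v] for nj, s in zip(sizes, covs)) / (n - ng)
               for v in range(p)] for u in range(p)]
    # Scaling constant C
    c = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (ng - 1)) * (
        sum(1.0 / (nj - 1) for nj in sizes) - 1.0 / (n - ng))
    g = c * ((n - ng) * math.log(det(pooled))
             - sum((nj - 1) * math.log(det(s)) for nj, s in zip(sizes, covs)))
    df = p * (p + 1) * (ng - 1) / 2
    return g, df
```

When every group has the same sample covariance matrix, the two terms inside the braces cancel and $G$ is zero, which gives a quick sanity check for any implementation.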

Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, g03daf uses a $QR$ decomposition. The group means are subtracted from the data and then for each group, a $QR$ decomposition is computed to give an upper triangular matrix $R_{j}^{*}$. This matrix can be scaled to give a matrix $R_{j}$ such that $S_{j} = R_{j}^{T} R_{j}$. The pooled $R$ matrix is then computed from the $R_{j}$ matrices. The values of $|S|$ and the $|S_{j}|$ can then be calculated from the diagonal elements of $R$ and the $R_{j}$.

This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as $z^{T} z$, where $R_{j} z = (x - \bar{x}_{j})$, $\bar{x}_{j}$ being the vector of means of the $j$th group. These distances can be calculated by g03dbf. The distances are used in discriminant analysis and g03dcf uses the results of g03daf to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
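Because $S_{j} = R_{j}^{T} R_{j}$ with $R_{j}$ upper triangular, $|S_{j}|$ is the product of the squared diagonal elements of $R_{j}$, so $\log |S_{j}| = 2 \sum_{i} \log |r_{ii}|$. The Python sketch below illustrates this for one unweighted group. It is an assumption-laden illustration: the names `qr_r_factor` and `log_det_cov` are hypothetical, the factor is obtained by modified Gram-Schmidt rather than by whatever $QR$ algorithm g03daf uses internally, and the scaling by $\sqrt{n_{j} - 1}$ is the one implied by the unweighted definition of $S_{j}$.

```python
import math

def qr_r_factor(cols):
    """Upper triangular factor of a QR decomposition (modified Gram-Schmidt).

    cols is a list of p columns, each a list of n numbers; full column rank
    is assumed.
    """
    p = len(cols)
    q = [c[:] for c in cols]
    r = [[0.0] * p for _ in range(p)]
    for i in range(p):
        r[i][i] = math.sqrt(sum(v * v for v in q[i]))
        q[i] = [v / r[i][i] for v in q[i]]
        for j in range(i + 1, p):
            r[i][j] = sum(a * b for a, b in zip(q[i], q[j]))
            q[j] = [b - r[i][j] * a for a, b in zip(q[i], q[j])]
    return r

def log_det_cov(rows):
    """log|S_j| for one group, taken from the diagonal of the scaled R factor."""
    nj, p = len(rows), len(rows[0])
    means = [sum(r[l] for r in rows) / nj for l in range(p)]
    # Subtract the group means, then factorize the centred data column-wise.
    centred = [[r[l] - means[l] for r in rows] for l in range(p)]
    r_star = qr_r_factor(centred)
    scale = math.sqrt(nj - 1)  # R_j = R_j^* / sqrt(n_j - 1)
    return 2.0 * sum(math.log(abs(r_star[i][i] / scale)) for i in range(p))
```

Working with the diagonal of the triangular factor avoids forming $S_{j}$ and its determinant explicitly.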

4 References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge

Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin

Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

5 Arguments

1: $weight$ – Character(1) Input

On entry: indicates if weights are to be used.

$weight ='U'$: No weights are used.
$weight ='W'$: Weights are to be used and must be supplied in wt.

Constraint: $weight ='U'$ or $'W'$.

2: $n$ – Integer Input

On entry: $n$, the number of observations.

Constraint: $n \geq 1$.

3: $m$ – Integer Input

On entry: the number of variables in the data array x.

Constraint: $m \geq nvar$.

4: $x (ldx, m)$ – Real (Kind=nag_wp) array Input

On entry: $x (k, l)$ must contain the $k$th observation for the $l$th variable, for $k = 1, 2, \dots, n$ and $l = 1, 2, \dots, m$.

5: $ldx$ – Integer Input

On entry: the first dimension of the array x as declared in the (sub)program from which g03daf is called.

Constraint: $ldx \geq n$.

6: $isx (m)$ – Integer array Input

On entry: $isx (l)$ indicates whether or not the $l$th variable in x is to be included in the variance-covariance matrices.

If $isx (l) > 0$ the $l$th variable is included, for $l = 1, 2, \dots, m$; otherwise it is not referenced.

Constraint: $isx (l) > 0$ for nvar values of $l$.

7: $nvar$ – Integer Input

On entry: $p$, the number of variables in the variance-covariance matrices.

Constraint: $nvar \geq 1$.

8: $ing (n)$ – Integer array Input

On entry: $ing (k)$ indicates to which group the $k$th observation belongs, for $k = 1, 2, \dots, n$.

Constraint: $1 \leq ing (k) \leq ng$, for $k = 1, 2, \dots, n$.

The values of ing must be such that each group has at least nvar members.

9: $ng$ – Integer Input

On entry: the number of groups, $n_{g}$.

Constraint: $ng \geq 2$.

10: $wt (*)$ – Real (Kind=nag_wp) array Input

Note: the dimension of the array wt must be at least $n$ if $weight ='W'$, and at least $1$ otherwise.

On entry: if $weight ='W'$ the first $n$ elements of wt must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If $wt (k) = 0.0$ the $k$th observation is excluded from the calculations.

If $weight ='U'$, wt is not referenced and the effective number of observations for a group is the number of observations in that group.

Constraint: if $weight ='W'$, $wt (k) \geq 0.0$, for $k = 1, 2, \dots, n$.
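The effective number of observations per group described above can be sketched as follows. This Python fragment is illustrative only: `effective_group_sizes` is a hypothetical helper, with `wt=None` standing in for $weight ='U'$, and it mirrors the documented constraints on ing and wt rather than reproducing the routine's own checks.

```python
def effective_group_sizes(ing, ng, wt=None):
    """Sum of weights per group; ing holds 1-based group labels.

    wt=None plays the role of weight='U' (every observation counts once).
    """
    sizes = [0.0] * ng
    for k, g in enumerate(ing):
        if not 1 <= g <= ng:
            raise ValueError("group label outside 1..ng")
        w = 1.0 if wt is None else wt[k]
        if w < 0.0:
            raise ValueError("negative weight")
        sizes[g - 1] += w  # a zero weight leaves the group total unchanged
    return sizes
```

For example, with labels `[1, 1, 2, 2, 2]` and weights `[1.0, 0.0, 2.0, 0.5, 0.5]` the second observation is excluded and the effective sizes are 1.0 and 3.0.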

11: $nig (ng)$ – Integer array Output

On exit: $nig (j)$ contains the number of observations in the $j$th group, for $j = 1, 2, \dots, n_{g}$.

12: $gmn (ldgmn, nvar)$ – Real (Kind=nag_wp) array Output

On exit: the $j$th row of gmn contains the means of the $p$ selected variables for the $j$th group, for $j = 1, 2, \dots, n_{g}$.

13: $ldgmn$ – Integer Input

On entry: the first dimension of the array gmn as declared in the (sub)program from which g03daf is called.

Constraint: $ldgmn \geq ng$.

14: $det (ng)$ – Real (Kind=nag_wp) array Output

On exit: the logarithm of the determinants of the within-group variance-covariance matrices.

15: $gc ((ng + 1) \times nvar \times (nvar + 1) / 2)$ – Real (Kind=nag_wp) array Output

On exit: the first $p (p + 1) / 2$ elements of gc contain $R$ and the remaining $n_{g}$ blocks of $p (p + 1) / 2$ elements contain the $R_{j}$ matrices. All are stored in packed form by columns.
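The layout of gc can be illustrated with a short unpacking helper. The Python sketch below is an assumption-based illustration: `unpack_upper` is a hypothetical name, indices are 0-based, and it assumes the usual column-wise packing of an upper triangular matrix (order $r_{11}, r_{12}, r_{22}, r_{13}, \dots$), which is consistent with $R$ and the $R_{j}$ being upper triangular.

```python
def unpack_upper(gc, p, block):
    """Expand one packed upper triangular p-by-p matrix from gc.

    block = 0 selects the pooled R; block = j (1..ng) selects R_j.
    """
    size = p * (p + 1) // 2           # packed length of one matrix
    start = block * size              # offset of the requested block
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):                # column index (0-based)
        for i in range(j + 1):        # row index, on or above the diagonal
            r[i][j] = gc[start + j * (j + 1) // 2 + i]
    return r
```

With $p = 2$ and three groups, gc holds $4 \times 3 = 12$ values; `unpack_upper(gc, 2, 0)` would return the pooled $R$ and `unpack_upper(gc, 2, 3)` the factor for the third group.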

16: $stat$ – Real (Kind=nag_wp) Output

On exit: the likelihood-ratio test statistic, $G$.

17: $df$ – Real (Kind=nag_wp) Output

On exit: the degrees of freedom for the distribution of $G$.

18: $sig$ – Real (Kind=nag_wp) Output

On exit: the significance level for $G$.
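Assuming the significance level is the upper-tail probability of the $\chi^{2}$ approximation evaluated at $G$, the relationship between stat, df and sig can be sketched in Python as below. The function `chi2_upper_tail` is hypothetical, handles positive integer degrees of freedom only, and is not the algorithm used by the library.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with positive integer df."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Q(m, h) = exp(-h) * sum_{k=0}^{m-1} h^k / k!  with m = df / 2
        term = math.exp(-h)
        total = term
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return total
    # Odd df: start from Q(1/2, h) = erfc(sqrt(h)) and apply the recurrence
    # Q(a + 1, h) = Q(a, h) + h^a exp(-h) / Gamma(a + 1).
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (k + 0.5)
    return total
```

A small value indicates evidence against equality of the within-group variance-covariance matrices, subject to the large-sample approximation described in Section 3.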

19: $wk (n \times (nvar + 1))$ – Real (Kind=nag_wp) array Workspace

20: $iwk (ng)$ – Integer array Workspace

21: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.

For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is $0$. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$

On entry,	$nvar < 1$ ,
or	$n < 1$ ,
or	$ng < 2$ ,
or	$m < nvar$ ,
or	$ldx < n$ ,
or	$ldgmn < ng$ ,
or	$weight \neq'U'$ or $'W'$ .

$ifail = 2$

On entry, $weight ='W'$ and a value of $wt < 0.0$.

$ifail = 3$

On entry,	there are not exactly nvar elements of $isx > 0$ ,
or	a value of ing is not in the range $1$ to ng,
or	the effective number of observations for a group is less than $1$ ,
or	a group has fewer than nvar members.

$ifail = 4$: $R$ or one of the $R_{j}$ is not of full rank.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.

7 Accuracy

The accuracy is dependent on the accuracy of the computation of the $QR$ decomposition. See f08aef (dgeqrf) for further details.

8 Parallelism and Performance

g03daf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g03daf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The time taken will be approximately proportional to $n p^{2}$.

10 Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of $21$ patients are input and the statistics computed by g03daf. The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.
