G03DBF (PDF version)
G03 Chapter Contents
G03 Chapter Introduction
NAG Library Manual

NAG Library Routine Document

G03DBF

Note:  before using this routine, please read the Users' Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent details.

1  Purpose

G03DBF computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after G03DAF.

2  Specification

SUBROUTINE G03DBF ( EQUAL, MODE, NVAR, NG, GMN, LDGMN, GC, NOBS, M, ISX, X, LDX, D, LDD, WK, IFAIL)
INTEGER  NVAR, NG, LDGMN, NOBS, M, ISX(*), LDX, LDD, IFAIL
REAL (KIND=nag_wp)  GMN(LDGMN,NVAR), GC((NG+1)*NVAR*(NVAR+1)/2), X(LDX,*), D(LDD,NG), WK(2*NVAR)
CHARACTER(1)  EQUAL, MODE

3  Description

Consider $p$ variables observed on $n_g$ populations or groups. Let $\bar{x}_j$ be the sample mean and $S_j$ the within-group variance-covariance matrix for the $j$th group, and let $x_k$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, $D_{kj}$:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S_j^{-1} (x_k - \bar{x}_j).$$
If the pooled estimate of the variance-covariance matrix, $S$, is used rather than the within-group variance-covariance matrices, then the distance is:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S^{-1} (x_k - \bar{x}_j).$$
Instead of using the variance-covariance matrices $S$ and $S_j$, G03DBF uses the upper triangular matrices $R$ and $R_j$ supplied by G03DAF such that $S = R^T R$ and $S_j = R_j^T R_j$. Since $S^{-1} = R^{-1} R^{-T}$, $D_{kj}^2$ can then be calculated as $z^T z$ where $R_j^T z = x_k - \bar{x}_j$ or $R^T z = x_k - \bar{x}_j$ as appropriate.
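The triangular-solve formulation above can be sketched in a few lines of Python. This is an illustration only, not the routine's implementation: the function names `solve_transposed_upper` and `mahalanobis_sq` and the numerical values are made up for the example, and the upper triangular factor is held as a plain list of rows rather than in the packed form used by GC.

```python
def solve_transposed_upper(r, d):
    """Solve R^T z = d by forward substitution.

    r is an upper triangular matrix stored as a list of rows, so R^T is
    lower triangular and (R^T)[i][k] == r[k][i].
    """
    p = len(d)
    z = [0.0] * p
    for i in range(p):
        partial = sum(r[k][i] * z[k] for k in range(i))
        z[i] = (d[i] - partial) / r[i][i]  # requires a nonzero diagonal
    return z


def mahalanobis_sq(r, x, mean):
    """Squared Mahalanobis distance of x from mean, where S = R^T R."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve_transposed_upper(r, d)
    return sum(zi * zi for zi in z)


# Illustrative values: R = [[2, 1], [0, 1]] gives S = R^T R = [[4, 2], [2, 2]].
r = [[2.0, 1.0], [0.0, 1.0]]
dist = mahalanobis_sq(r, [3.0, 2.0], [1.0, 1.0])
print(dist)
```

Working from the factor avoids forming or inverting $S$ explicitly, and the division by the diagonal shows why a zero diagonal element has to be rejected.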
A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the $i$th and $j$th groups is:
$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S_j^{-1} (\bar{x}_i - \bar{x}_j)$$
or
$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j).$$
Note that $D_{jj}^2 = 0$ and that, when the pooled variance-covariance matrix is used, $D_{ij}^2 = D_{ji}^2$, so in this case only the lower triangular values of $D_{ij}^2$, $i > j$, are computed.
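Because of this symmetry, a full table of between-group distances for the pooled case can be recovered from the lower triangle alone. The following Python sketch shows one way a caller might do this; the function name `fill_symmetric` and the sample values are hypothetical, the table is modelled as a list of rows, and entries that the routine would leave unreferenced are marked `None`.

```python
def fill_symmetric(lower, ng):
    """Build the full ng-by-ng table from its strictly lower triangle.

    The diagonal is set to zero and each value at (i, j), i > j, is
    mirrored to (j, i).
    """
    full = [[0.0] * ng for _ in range(ng)]
    for i in range(ng):
        for j in range(i):
            full[i][j] = lower[i][j]
            full[j][i] = lower[i][j]
    return full


# Illustrative lower triangle for three groups (values are made up).
lower = [
    [None, None, None],
    [2.5, None, None],
    [7.0, 1.2, None],
]
full = fill_symmetric(lower, 3)
```

This mirroring is only valid for the pooled matrix; with within-group matrices the distance from group $i$ to group $j$ uses $S_j$ and need not equal the reverse.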

4  References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5  Parameters

1:     EQUAL – CHARACTER(1)     Input
On entry: indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
EQUAL='E'
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p(p+1)/2$ elements of GC is used.
EQUAL='U'
The within-group variance-covariance matrices are assumed to be unequal and the matrices $R_j$, for $j = 1, 2, \ldots, n_g$, stored in the remainder of GC are used.
Constraint: EQUAL='E' or 'U'.
2:     MODE – CHARACTER(1)     Input
On entry: indicates whether distances from sample points are to be calculated or distances between the group means.
MODE='S'
The distances between the sample points given in X and the group means are calculated.
MODE='M'
The distances between the group means will be calculated.
Constraint: MODE='M' or 'S'.
3:     NVAR – INTEGER     Input
On entry: $p$, the number of variables in the variance-covariance matrices as specified to G03DAF.
Constraint: NVAR ≥ 1.
4:     NG – INTEGER     Input
On entry: the number of groups, $n_g$.
Constraint: NG ≥ 2.
5:     GMN(LDGMN,NVAR) – REAL (KIND=nag_wp) array     Input
On entry: the $j$th row of GMN contains the means of the $p$ selected variables for the $j$th group, for $j = 1, 2, \ldots, n_g$. These are returned by G03DAF.
6:     LDGMN – INTEGER     Input
On entry: the first dimension of the array GMN as declared in the (sub)program from which G03DBF is called.
Constraint: LDGMN ≥ NG.
7:     GC((NG+1)×NVAR×(NVAR+1)/2) – REAL (KIND=nag_wp) array     Input
On entry: the first $p(p+1)/2$ elements of GC should contain the upper triangular matrix $R$ and the next $n_g$ blocks of $p(p+1)/2$ elements should contain the upper triangular matrices $R_j$. All matrices must be stored packed by column. These matrices are returned by G03DAF. If EQUAL='E', only the first $p(p+1)/2$ elements are referenced; if EQUAL='U', only the elements $p(p+1)/2+1$ to $(n_g+1)p(p+1)/2$ are referenced.
Constraints:
  • if EQUAL='E', the diagonal elements of $R$ ≠ 0.0;
  • if EQUAL='U', the diagonal elements of the $R_j$ ≠ 0.0, for $j = 1, 2, \ldots, \mathrm{NG}$.
8:     NOBS – INTEGER     Input
On entry: if MODE='S', the number of sample points in X for which distances are to be calculated.
If MODE='M', NOBS is not referenced.
Constraint: if MODE='S', NOBS ≥ 1.
9:     M – INTEGER     Input
On entry: if MODE='S', the number of variables in the data array X.
If MODE='M', M is not referenced.
Constraint: if MODE='S', M ≥ NVAR.
10:   ISX(*) – INTEGER array     Input
Note: the dimension of the array ISX must be at least max(1,M).
On entry: if MODE='S', ISX($l$) indicates if the $l$th variable in X is to be included in the distance calculations. If ISX($l$)>0 the $l$th variable is included, for $l = 1, 2, \ldots, \mathrm{M}$; otherwise the $l$th variable is not referenced.
If MODE='M', ISX is not referenced.
Constraint: if MODE='S', ISX($l$)>0 for NVAR values of $l$.
11:   X(LDX,*) – REAL (KIND=nag_wp) array     Input
Note: the second dimension of the array X must be at least max(1,M).
On entry: if MODE='S', the $k$th row of X must contain $x_k$. That is, X($k$,$l$) must contain the $k$th sample value for the $l$th variable, for $k = 1, 2, \ldots, \mathrm{NOBS}$ and $l = 1, 2, \ldots, \mathrm{M}$. Otherwise X is not referenced.
12:   LDX – INTEGER     Input
On entry: the first dimension of the array X as declared in the (sub)program from which G03DBF is called.
Constraints:
  • if MODE='S', LDX ≥ NOBS;
  • otherwise LDX ≥ 1.
13:   D(LDD,NG) – REAL (KIND=nag_wp) array     Output
On exit: the squared distances.
If MODE='S', D($k$,$j$) contains the squared distance of the $k$th sample point from the $j$th group mean, $D_{kj}^2$, for $k = 1, 2, \ldots, \mathrm{NOBS}$ and $j = 1, 2, \ldots, n_g$.
If MODE='M' and EQUAL='U', D($i$,$j$) contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^2$, for $i = 1, 2, \ldots, n_g$ and $j = 1, 2, \ldots, i-1, i+1, \ldots, n_g$. The elements D($i$,$i$) are not referenced, for $i = 1, 2, \ldots, n_g$.
If MODE='M' and EQUAL='E', D($i$,$j$) contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^2$, for $i = 1, 2, \ldots, n_g$ and $j = 1, 2, \ldots, i-1$. Since $D_{ij} = D_{ji}$ the elements D($i$,$j$) are not referenced, for $i = 1, 2, \ldots, n_g$ and $j = i+1, \ldots, n_g$.
14:   LDD – INTEGER     Input
On entry: the first dimension of the array D as declared in the (sub)program from which G03DBF is called.
Constraints:
  • if MODE='S', LDD ≥ NOBS;
  • if MODE='M', LDD ≥ NG.
15:   WK(2×NVAR) – REAL (KIND=nag_wp) array     Workspace
16:   IFAIL – INTEGER     Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).
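Two of the inputs described above, the packed array GC and the selection array ISX, can be illustrated with a short Python sketch. It is a reading aid under stated assumptions rather than part of the library: the helper names `packed_index`, `unpack_upper` and `select_variables` are hypothetical, Python lists are 0-based whereas the Fortran arrays are 1-based, and "packed by column" is taken to mean the conventional layout in which element $(i,j)$, $i \le j$, of a block sits at position $i + j(j-1)/2$ within that block.

```python
def packed_index(i, j, p, block=0):
    """0-based position in GC of element (i, j) (1-based, i <= j).

    block = 0 selects the pooled matrix R; block = g selects R_g.
    Assumes the conventional column-packed upper triangular layout.
    """
    if not 1 <= i <= j <= p:
        raise ValueError("need 1 <= i <= j <= p")
    block_len = p * (p + 1) // 2
    return block * block_len + (i - 1) + j * (j - 1) // 2


def unpack_upper(gc, p, block=0):
    """Return the p-by-p upper triangular matrix of one block as rows."""
    r = [[0.0] * p for _ in range(p)]
    for j in range(1, p + 1):
        for i in range(1, j + 1):
            r[i - 1][j - 1] = gc[packed_index(i, j, p, block)]
    return r


def select_variables(row, isx, nvar):
    """Keep the entries of one row of X whose ISX flag is positive."""
    if sum(1 for flag in isx if flag > 0) != nvar:
        raise ValueError("ISX must select exactly NVAR variables")
    return [value for value, flag in zip(row, isx) if flag > 0]


# Illustrative GC for p = 2 and n_g = 2: pooled R, then R_1, then R_2.
gc = [2.0, 1.0, 1.0, 3.0, 0.5, 2.0, 1.5, -1.0, 4.0]
r_pooled = unpack_upper(gc, 2)
r_2 = unpack_upper(gc, 2, block=2)

# Illustrative row of X with M = 3, of which variables 1 and 3 are used.
x_selected = select_variables([1.0, 9.0, 2.0], [1, 0, 1], 2)
```

The check inside `select_variables` mirrors the ISX constraint: exactly NVAR entries of ISX must be positive.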

6  Error Indicators and Warnings

If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
IFAIL=1
On entry, NVAR<1,
or NG<2,
or LDGMN<NG,
or MODE='S' and NOBS<1,
or MODE='S' and M<NVAR,
or MODE='S' and LDX<NOBS,
or MODE='S' and LDD<NOBS,
or MODE='M' and LDD<NG,
or EQUAL≠'E' or 'U',
or MODE≠'M' or 'S'.
IFAIL=2
On entry, MODE='S' and the number of variables indicated by ISX is not equal to NVAR,
or EQUAL='E' and a diagonal element of R is zero,
or EQUAL='U' and a diagonal element of Rj for some j is zero.
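A caller may find it convenient to check the IFAIL=1 conditions before making the call. The Python sketch below restates those conditions as a single function; the name `check_ifail1` is hypothetical, the order of the tests is arbitrary and need not match the routine, and only the upper-case option letters shown in this document are accepted.

```python
def check_ifail1(equal, mode, nvar, ng, ldgmn, nobs, m, ldx, ldd):
    """Return 1 if any documented IFAIL=1 condition holds, otherwise 0."""
    if nvar < 1 or ng < 2 or ldgmn < ng:
        return 1
    if equal not in ("E", "U") or mode not in ("M", "S"):
        return 1
    # NOBS, M and LDX are only constrained when sample points are supplied.
    if mode == "S" and (nobs < 1 or m < nvar or ldx < nobs or ldd < nobs):
        return 1
    if mode == "M" and ldd < ng:
        return 1
    return 0


# Illustrative argument sets (values are made up).
ok = check_ifail1("U", "S", nvar=2, ng=3, ldgmn=3, nobs=6, m=2, ldx=6, ldd=6)
bad = check_ifail1("U", "S", nvar=2, ng=3, ldgmn=3, nobs=6, m=1, ldx=6, ldd=6)
```

The IFAIL=2 conditions depend on the contents of ISX and GC rather than on the scalar arguments, so they are not covered here.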

7  Accuracy

The accuracy will depend upon the accuracy of the input R or Rj matrices.

8  Further Comments

If the distances are to be used for discrimination, see also G03DCF.

9  Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the group means and R matrices are computed by G03DAF. A further six observations of unknown type are input, and the distances from the group means of the 21 patients of known type are computed under the assumption that the within-group variance-covariance matrices are not equal. These results are printed and indicate that the first four are close to one of the groups while observations 5 and 6 are some distance from any group.

9.1  Program Text

Program Text (g03dbfe.f90)

9.2  Program Data

Program Data (g03dbfe.d)

9.3  Program Results

Program Results (g03dbfe.r)



© The Numerical Algorithms Group Ltd, Oxford, UK. 2012