NAG Library Routine Document

G03FAF

Contents

1 Purpose

2 Specification

3 Description

4 References

5 Parameters

6 Error Indicators and Warnings

7 Accuracy

8 Further Comments

9 Example

9.1 Program Text

9.2 Program Data

9.3 Program Results

1 Purpose

G03FAF performs a principal coordinate analysis, also known as classical metric scaling.

2 Specification

SUBROUTINE G03FAF (ROOTS, N, D, NDIM, X, LDX, EVAL, WK, IWK, IFAIL)

INTEGER	N, NDIM, LDX, IWK(5*N), IFAIL
REAL (KIND=nag_wp)	D(N*(N-1)/2), X(LDX,NDIM), EVAL(N), WK(N*(N+17)/2-1)
CHARACTER(1)	ROOTS

3 Description

For a set of $n$ objects a distance matrix $D$ can be calculated such that $d_{ij}$ is a measure of how ‘far apart’ objects $i$ and $j$ are (see G03EAF for example). Principal coordinate analysis, or metric scaling, starts with a distance matrix and finds points $X$ in Euclidean space such that those points have the same distance matrix. The aim is to find a small number of dimensions, $k \ll (n - 1)$, that provide an adequate representation of the distances.

The principal coordinates of the points are computed from the eigenvectors of the matrix $E$, where

$e_{ij} = -\frac{1}{2}\left(d_{ij}^{2} - d_{i\cdot}^{2} - d_{\cdot j}^{2} + d_{\cdot\cdot}^{2}\right)$

with $d_{i\cdot}^{2}$ denoting the average of $d_{ij}^{2}$ over the suffix $j$, $d_{\cdot j}^{2}$ the average over the suffix $i$, and $d_{\cdot\cdot}^{2}$ the average over both suffixes. The eigenvectors are then scaled by multiplying each one by the square root of the corresponding eigenvalue.
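The construction of $E$ and the scaling of its eigenvectors can be sketched in a few lines of Python. This is a minimal illustration, not the method G03FAF uses (Section 7 names the LAPACK-based eigensolvers it relies on): it assumes a full symmetric distance matrix held as a list of lists, and the hypothetical helpers `double_centre`, `jacobi_eigen` and `principal_coordinates` swap in a plain cyclic Jacobi eigensolver for brevity. The relative tolerance used to decide that an eigenvalue is "greater than zero" is also an assumption.

```python
import math

# Minimal sketch of classical scaling; hypothetical helpers, not NAG code.


def double_centre(d2):
    """Form E from squared distances d2: e_ij = -1/2 (d2_ij - row_i - col_j + grand)."""
    n = len(d2)
    row = [sum(r) / n for r in d2]                                 # d_{i.}^2
    col = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]  # d_{.j}^2
    grand = sum(row) / n                                           # d_{..}^2
    return [[-0.5 * (d2[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


def jacobi_eigen(a, tol=1e-24, max_sweeps=100):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix.

    Returns (values, v): column m of v is a unit eigenvector for values[m].
    The eigenvalues are returned unsorted.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation P that annihilates the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def principal_coordinates(d, k):
    """Return (x, values): x is n x k, values are all eigenvalues of E, largest first."""
    n = len(d)
    e = double_centre([[dij * dij for dij in row] for row in d])
    vals, vecs = jacobi_eigen(e)
    order = sorted(range(n), key=lambda m: vals[m], reverse=True)
    cutoff = 1e-10 * abs(vals[order[0]])  # assumed tolerance for "greater than zero"
    x = [[0.0] * k for _ in range(n)]
    for col, m in enumerate(order[:k]):
        if vals[m] <= cutoff:
            raise ValueError("fewer than k positive eigenvalues")
        root = math.sqrt(vals[m])  # scale the unit eigenvector by sqrt(eigenvalue)
        for i in range(n):
            x[i][col] = vecs[i][m] * root
    return x, [vals[m] for m in order]


pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]  # corners of a 3 x 4 rectangle
d = [[math.dist(p, q) for q in pts] for p in pts]
x, values = principal_coordinates(d, 2)
print(values)  # approximately 16, 9, 0, 0
```

For the rectangle, the two recovered coordinates reproduce all six distances up to rounding. The points themselves are recovered only up to a rotation or reflection, since the signs and orientation of the eigenvectors are arbitrary.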

Provided that the ordered eigenvalues, $\lambda_{i}$, of the matrix $E$ are all non-negative,

$\sum_{i = 1}^{k} \lambda_{i} / \sum_{i = 1}^{n - 1} \lambda_{i}$

shows how well the data are represented in $k$ dimensions. The sum in the denominator stops at $n - 1$ because every row of $E$ sums to zero, so $E$ always has a zero eigenvalue with eigenvector $(1, 1, \dots, 1)$. The eigenvalues are non-negative if and only if $E$ is positive semidefinite, which holds exactly when the distances can be reproduced by points in a Euclidean space. The triangle inequality

$d_{ij} \leq d_{ik} + d_{jk}$ for all $i, j, k$

is necessary for this but not sufficient. If the distances are not Euclidean, the size of the negative eigenvalues reflects the amount of deviation from this condition, and the results should be treated cautiously in the presence of large negative eigenvalues. See Krzanowski (1990) for further discussion. G03FAF provides the option for all eigenvalues to be computed so that the smallest eigenvalues can be checked.
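As a sketch of how the ratio and the check on the smallest eigenvalues might be applied to the full set of eigenvalues (as obtained with ROOTS = 'A'), the hypothetical Python helper `fit_summary` below sorts the eigenvalues, forms the ratio above and lists any eigenvalue that is negative beyond a small relative tolerance. The tolerance is an assumption, not a documented G03FAF setting. The second example uses the "distances" $d_{12} = d_{13} = 1$, $d_{23} = 3$, which violate the triangle inequality; working the definition of $E$ by hand gives eigenvalues $4.5$, $0$ and $-5/6$.

```python
# Hypothetical helper for illustration; not part of the NAG Library.


def fit_summary(eigenvalues, k, rel_tol=1e-8):
    """Return (ratio, negatives) for all n eigenvalues of E.

    ratio     = sum of the k largest / sum of the n - 1 largest eigenvalues
    negatives = eigenvalues below -rel_tol * |largest eigenvalue| (assumed tolerance)
    """
    lam = sorted(eigenvalues, reverse=True)
    n = len(lam)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    ratio = sum(lam[:k]) / sum(lam[: n - 1])
    cutoff = -rel_tol * abs(lam[0])
    negatives = [x for x in lam if x < cutoff]
    return ratio, negatives


# Euclidean case: corners of a 3 x 4 rectangle give eigenvalues 16, 9, 0, 0.
print(fit_summary([16.0, 9.0, 0.0, 0.0], 1))  # ratio 0.64, no negatives
# Non-Euclidean case: eigenvalues 4.5, 0, -5/6.
print(fit_summary([4.5, 0.0, -5.0 / 6.0], 1))  # ratio 1.0, but -5/6 is flagged
```

The second case shows why the ratio is only meaningful when all eigenvalues are non-negative: it reports a perfect fit even though a sizeable negative eigenvalue is present.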

4 References

Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall

Gower J C (1966) Some distance properties of latent root and vector methods used in multivariate analysis Biometrika 53 325–338

Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5 Parameters

1: ROOTS – CHARACTER(1) Input

On entry: indicates whether all the eigenvalues are to be computed or just the NDIM largest.

$ROOTS ='A'$: All the eigenvalues are computed.
$ROOTS ='L'$: Only the largest NDIM eigenvalues are computed.

Constraint: ROOTS = 'A' or 'L'.

2: N – INTEGER Input

On entry: $n$, the number of objects in the distance matrix.

Constraint: N > NDIM.

3: D($N \times (N - 1) / 2$) – REAL (KIND=nag_wp) array Input

On entry: the lower triangle of the distance matrix $D$ stored packed by rows. That is, $D((i - 1) \times (i - 2) / 2 + j)$ must contain $d_{ij}$ for $i = 2, 3, \dots, n$ and $j = 1, 2, \dots, i - 1$.

Constraint: $D(i) \geq 0.0$, for $i = 1, 2, \dots, n(n - 1)/2$.
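The packed layout can be illustrated with a short Python sketch. The helpers `packed_index`, `pack_lower` and `lookup` are hypothetical and not part of the NAG Library; because Python lists are 0-based, the 1-based position given by the formula above is shifted down by one whenever the list is accessed.

```python
# Hypothetical helpers for illustration; not part of the NAG Library.


def packed_index(i, j):
    """1-based position in D of d_ij, for 1 <= j < i (lower triangle by rows)."""
    if not 1 <= j < i:
        raise ValueError("need 1 <= j < i")
    return (i - 1) * (i - 2) // 2 + j


def pack_lower(dist):
    """Pack the strict lower triangle of a full n x n matrix, row by row."""
    n = len(dist)
    packed = [0.0] * (n * (n - 1) // 2)
    for i in range(2, n + 1):
        for j in range(1, i):
            packed[packed_index(i, j) - 1] = dist[i - 1][j - 1]  # list is 0-based
    return packed


def lookup(packed, i, j):
    """Return d_ij (1-based indices, either order) from the packed array; d_ii = 0."""
    if i == j:
        return 0.0
    if i < j:
        i, j = j, i  # the matrix is symmetric, so d_ij = d_ji
    return packed[packed_index(i, j) - 1]
```

For $n = 4$ the packed order is $d_{21}, d_{31}, d_{32}, d_{41}, d_{42}, d_{43}$, and the last position is $n(n-1)/2 = 6$.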

4: NDIM – INTEGER Input

On entry: $k$, the number of dimensions used to represent the data.

Constraint: $NDIM \geq 1$.

5: X(LDX,NDIM) – REAL (KIND=nag_wp) array Output

On exit: the $i$th row contains the $k$ coordinates for the $i$th point, $i = 1, 2, \dots, n$.

6: LDX – INTEGER Input

On entry: the first dimension of the array X as declared in the (sub)program from which G03FAF is called.

Constraint: $LDX \geq N$.

7: EVAL(N) – REAL (KIND=nag_wp) array Output

On exit: if ROOTS = 'A', EVAL contains the $n$ scaled eigenvalues of the matrix $E$. If ROOTS = 'L', EVAL contains the largest $k$ scaled eigenvalues of the matrix $E$. In both cases the eigenvalues are divided by the sum of the eigenvalues (that is, the trace of $E$).
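Because the distance matrix is symmetric, the row and column averages coincide and the diagonal of $E$ is $e_{ii} = d_{i\cdot}^{2} - d_{\cdot\cdot}^{2}/2$. Summing over $i$ gives the divisor used for EVAL directly from the packed distances:

$\mathrm{trace}(E) = \frac{n}{2} d_{\cdot\cdot}^{2} = \frac{1}{n} \sum_{i > j} d_{ij}^{2}$

The Python sketch below (hypothetical helper names, not the routine's code) uses this identity to rescale eigenvalues in the way EVAL is described.

```python
# Hypothetical helpers for illustration; not part of the NAG Library.


def trace_from_packed(d_packed, n):
    """trace(E) = (1/n) * sum over i > j of d_ij^2, with D packed as in Section 5."""
    if len(d_packed) != n * (n - 1) // 2:
        raise ValueError("D must hold n(n-1)/2 values")
    return sum(x * x for x in d_packed) / n


def scale_eigenvalues(eigenvalues, d_packed, n):
    """Divide eigenvalues of E by trace(E), as described for EVAL."""
    tr = trace_from_packed(d_packed, n)
    return [lam / tr for lam in eigenvalues]


# Corners of a 3 x 4 rectangle, packed as d21, d31, d32, d41, d42, d43.
d = [3.0, 4.0, 5.0, 5.0, 4.0, 3.0]
print(trace_from_packed(d, 4))               # 25.0, which equals 16 + 9
print(scale_eigenvalues([16.0, 9.0], d, 4))  # [0.64, 0.36]
```

When only the largest $k$ eigenvalues are returned, the scaled values need not sum to 1; if all eigenvalues of $E$ are non-negative, their sum equals the ratio given in Section 3.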

8: WK($N \times (N + 17) / 2 - 1$) – REAL (KIND=nag_wp) array Workspace

9: IWK($5 \times N$) – INTEGER array Workspace

10: IFAIL – INTEGER Input/Output

On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.

For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.

On exit: IFAIL = 0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry IFAIL = 0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).

Errors or warnings detected by the routine:

$IFAIL = 1$

On entry,	$NDIM < 1$ ,
or	$N \leq NDIM$ ,
or	$ROOTS \neq'A'$ or $'L'$ ,
or	$LDX < N$ .

$IFAIL = 2$

On entry,	$D (i) < 0.0$ for some $i$ , $i = 1, 2, \dots, n (n - 1) / 2$ ,
or	all elements of $D = 0.0$ .

$IFAIL = 3$: There are fewer than NDIM eigenvalues greater than zero. Try a smaller number of dimensions (NDIM) or use non-metric scaling (G03FCF).

$IFAIL = 4$: The computation of the eigenvalues or eigenvectors has failed. Seek expert help.
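The entry conditions behind IFAIL = 1 and IFAIL = 2 can be summarised in a small Python sketch. The function `entry_checks` is hypothetical and only mirrors the constraints listed in Section 5; the order in which the checks are made and the exact-match test on ROOTS are assumptions, not details taken from the NAG implementation.

```python
# Hypothetical helper for illustration; not part of the NAG Library.


def entry_checks(roots, n, ndim, ldx, d_packed):
    """Return 0, 1 or 2 according to the documented entry constraints.

    The checking order and the exact-match test on ROOTS are assumptions.
    """
    # IFAIL = 1: NDIM >= 1, N > NDIM (Section 5), ROOTS = 'A' or 'L', LDX >= N.
    if ndim < 1 or n <= ndim or roots not in ("A", "L") or ldx < n:
        return 1
    # IFAIL = 2: a negative distance, or every distance equal to zero.
    if any(x < 0.0 for x in d_packed) or all(x == 0.0 for x in d_packed):
        return 2
    return 0


d = [3.0, 4.0, 5.0, 5.0, 4.0, 3.0]  # packed distances for n = 4
print(entry_checks("A", 4, 2, 4, d))  # 0: all constraints satisfied
```

IFAIL = 3 and IFAIL = 4 depend on the eigenvalue computation itself and cannot be detected from the arguments alone.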

7 Accuracy

G03FAF uses F08JFF (DSTERF) or F08JJF (DSTEBZ) to compute the eigenvalues and F08JKF (DSTEIN) to compute the eigenvectors. These routines should be consulted for a discussion of the accuracy of the computations involved.

8 Further Comments

Alternative, non-metric, methods of scaling are provided by G03FCF.

The relationship between principal coordinates and principal components (see G03AAF) is discussed in Krzanowski (1990) and Gower (1966).

9 Example

The data, given by Krzanowski (1990), are dissimilarities between water vole populations in Europe. The first two principal coordinates are computed.
