Integer, Intent (In)	::	n, ndim, ldx, iter, iopt
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	iwk(n(n-1)/2+nndim+5)
Real (Kind=nag_wp), Intent (In)	::	d(n*(n-1)/2)
Real (Kind=nag_wp), Intent (Inout)	::	x(ldx,ndim)
Real (Kind=nag_wp), Intent (Out)	::	stress, dfit(2n(n-1)), wk(15nndim)
Character (1), Intent (In)	::	typ

C Header Interface

#include <nag.h>

void	g03fcf_ (const char typ, const Integer n, const Integer ndim, const double d[], double x[], const Integer ldx, double stress, double dfit[], const Integer iter, const Integer iopt, double wk[], Integer iwk[], Integer ifail, const Charlen length_typ)

The routine may be called by the names g03fcf or nagf_mv_multidimscal_ordinal.

3 Description

For a set of

n

objects, a distance or dissimilarity matrix

D

can be calculated such that

d_{i j}

is a measure of how ‘far apart’ the objects

i

and

j

are. If

p

variables

x_{k}

have been recorded for each observation this measure may be based on Euclidean distance,

d_{i j} = \sum_{k = 1}^{p} {(x_{k i} - x_{k j})}^{2}

, or some other calculation such as the number of variables for which

x_{k j} \neq x_{k i}

. Alternatively, the distances may be the result of a subjective assessment. For a given distance matrix, multidimensional scaling produces a configuration of

n

points in a chosen number of dimensions,

m

, such that the distance between the points in some way best matches the distance matrix. For some distance measures, such as Euclidean distance, the size of distance is meaningful, for other measures of distance all that can be said is that one distance is greater or smaller than another. For the former metric scaling can be used, see g03faf, for the latter, a non-metric scaling is more appropriate.

For non-metric multidimensional scaling, the criterion used to measure the closeness of the fitted distance matrix to the observed distance matrix is known as

STRESS

STRESS

is given by,

\sqrt{\frac{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} {(\hat{d_{i j}} - \tilde{d_{i j}})}^{2}}{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} {\hat{d_{i j}}}^{2}}}

where

{\hat{d_{i j}}}^{2}

is the Euclidean squared distance between points

i

and

j

and

\tilde{d_{i j}}

is the fitted distance obtained when

\hat{d_{i j}}

is monotonically regressed on

d_{i j}

, that is

\tilde{d_{i j}}

is monotonic relative to

d_{i j}

and is obtained from

\hat{d_{i j}}

with the smallest number of changes. So

STRESS

is a measure of by how much the set of points preserve the order of the distances in the original distance matrix. Non-metric multidimensional scaling seeks to find the set of points that minimize the

STRESS

An alternate measure is

SSTRESS

\sqrt{\frac{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} {({\hat{d_{i j}}}^{2} - {\tilde{d_{i j}}}^{2})}^{2}}{\sum_{i = 1}^{n} \sum_{j = 1}^{i - 1} {\hat{d_{i j}}}^{4}}}

in which the distances in

STRESS

are replaced by squared distances.

In order to perform a non-metric scaling, an initial configuration of points is required. This can be obtained from principal coordinate analysis, see g03faf. Given an initial configuration, g03fcf uses the optimization routine e04dgf/e04dga to find the configuration of points that minimizes

STRESS

SSTRESS

. The routine e04dgf/e04dga uses a conjugate gradient algorithm. g03fcf will find an optimum that may only be a local optimum, to be more sure of finding a global optimum several different initial configurations should be used; these can be obtained by randomly perturbing the original initial configuration using routines from Chapter G05.

4 References

Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall

Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5 Arguments

1: $typ$ – Character(1) Input

On entry: indicates whether

STRESS

SSTRESS

is to be used as the criterion.

$typ ='T'$: $STRESS$ is used.
$typ ='S'$: $SSTRESS$ is used.

Constraint:

typ ='S'

'T'

2: $n$ – Integer Input

On entry:

n

, the number of objects in the distance matrix.

Constraint:

n > ndim

3: $ndim$ – Integer Input

On entry:

m

, the number of dimensions used to represent the data.

Constraint:

ndim \geq 1

4: $d (n \times (n - 1) / 2)$ – Real (Kind=nag_wp) array Input

On entry: the lower triangle of the distance matrix

D

stored packed by rows. That is

d ((i - 1) \times (i - 2) / 2 + j)

must contain

d_{i j}

, for

i = 2, 3, \dots, n

and

j = 1, 2, \dots, i - 1

. If

d_{i j}

is missing then set

d_{i j} < 0

; for further comments on missing values see Section 9.

5: $x (ldx, ndim)$ – Real (Kind=nag_wp) array Input/Output

On entry: the

i

th row must contain an initial estimate of the coordinates for the

i

th point, for

i = 1, 2, \dots, n

. One method of computing these is to use g03faf.

On exit: the

i

th row contains

m

coordinates for the

i

th point, for

i = 1, 2, \dots, n

6: $ldx$ – Integer Input

On entry: the first dimension of the array x as declared in the (sub)program from which g03fcf is called.

Constraint:

ldx \geq n

7: $stress$ – Real (Kind=nag_wp) Output

On exit: the value of

STRESS

SSTRESS

at the final iteration.

8: $dfit (2 \times n \times (n - 1))$ – Real (Kind=nag_wp) array Output

On exit: auxiliary outputs.

typ ='T'

, the first

n (n - 1) / 2

elements contain the distances,

\hat{d_{i j}}

, for the points returned in x, the second set of

n (n - 1) / 2

contains the distances

\hat{d_{i j}}

ordered by the input distances,

d_{i j}

, the third set of

n (n - 1) / 2

elements contains the monotonic distances,

\tilde{d_{i j}}

, ordered by the input distances,

d_{i j}

and the final set of

n (n - 1) / 2

elements contains fitted monotonic distances,

\tilde{d_{i j}}

, for the points in x. The

\tilde{d_{i j}}

corresponding to distances which are input as missing are set to zero.

typ ='S'

, the results are as above except that the squared distances are returned.

Each distance matrix is stored in lower triangular packed form in the same way as the input matrix

D

9: $iter$ – Integer Input

On entry: the maximum number of iterations in the optimization process.

$iter = 0$: A default value of $50$ is used.
$iter < 0$: A default value of $\max (50, 5 n m)$ (the default for e04dgf/e04dga) is used.

10: $iopt$ – Integer Input

On entry: selects the options, other than the number of iterations, that control the optimization.

$iopt = 0$: The tolerance $ε$ is set to $0.00001$ (Section 7). All other values are set as described in Section 9.
$iopt > 0$: The tolerance $ε$ is set to $10^{- i}$ where $i = iopt$ . All other values are set as described in Section 9.
$iopt < 0$: No values are changed, therefore the default values of e04dgf/e04dga are used.

11: $wk (15 \times n \times ndim)$ – Real (Kind=nag_wp) array Workspace

12: $iwk (n \times (n - 1) / 2 + n \times ndim + 5)$ – Integer array Workspace

13: $ifail$ – Integer Input/Output

On entry: ifail must be set to

0

- 1

1

to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of

0

causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of

- 1

means that an error message is printed while a value of

1

means that it is not.

If halting is not appropriate, the value

- 1

1

is recommended. If message printing is undesirable, then the value

1

is recommended. Otherwise, the value

0

is recommended. When the value $- 1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit:

ifail = 0

unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry

ifail = 0

- 1

, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $ldx = 〈value〉$ and $n = 〈value〉$ .
Constraint: $ldx \geq n$ .

On entry, $n = 〈value〉$ and $ndim = 〈value〉$ .
Constraint: $n > ndim$ .

On entry, $ndim = 〈value〉$ .
Constraint: $ndim \geq 1$ .

On entry, $typ = 〈value〉$ .
Constraint: $typ ='S'$ or $'T'$ .

$ifail = 2$: On entry, all the elements of $d \leq 0.0$ .

$ifail = 3$: The optimization has failed to converge in iter function iterations. Try either increasing the number of iterations using iter or increasing the value of $ε$ , given by iopt, used to determine convergence. Alternatively try a different starting configuration.

$ifail = 4$: The conditions for an acceptable solution have not been met but a lower point could not be found. Try using a larger value of $ε$ , given by iopt.

$ifail = 5$: The optimization cannot begin from the initial configuration. Try a different set of points.

$ifail = 6$: The optimization has failed. This error is only likely if $iopt < 0$ . It corresponds to $ifail = 4$ , $7$ and $9$ in e04dgf/e04dga.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

After a successful optimization the relative accuracy of

STRESS

should be approximately

ε

, as specified by iopt.

8 Parallelism and Performance

g03fcf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The optimization routine e04dgf/e04dga used by g03fcf has a number of options to control the process. The options for the maximum number of iterations (Iteration Limit) and accuracy (Optimality Tolerance) can be controlled by iter and iopt respectively. The printing option (Print Level) is set to

- 1

to give no printing. The other option set is to stop the checking of derivatives (

Verify = NO

) for efficiency. All other options are left at their default values. If however

iopt < 0

is used, only the maximum number of iterations is set. All other options can be controlled by the option setting mechanism of e04dgf/e04dga with the defaults as given by that routine.

Missing values in the input distance matrix can be specified by a negative value and providing there are not more than about two thirds of the values missing the algorithm may still work. However the routine g03faf does not allow for missing values so an alternative method of obtaining an initial set of coordinates is required. It may be possible to estimate the missing values with some form of average and then use g03faf to give an initial set of coordinates.

10 Example

The data, given by Krzanowski (1990), are dissimilarities between water vole populations in Europe. Initial estimates are provided by the first two principal coordinates computed.

g03fc: FL CL CPP AD

NAG FL Interfaceg03fcf (multidimscal_​ordinal)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

NAG FL Interface
g03fcf (multidimscal_ordinal)