nag_mv_ordinal_multidimscale (g03fcc) : NAG C Library, Mark 26

Description

For a set of $n$ objects, a distance or dissimilarity matrix $D$ can be calculated such that $d_{ij}$ is a measure of how ‘far apart’ objects $i$ and $j$ are. If $p$ variables $x_{k}$ have been recorded for each observation, this measure may be based on Euclidean distance, $d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ki} - x_{kj})^{2}}$, or on some other calculation such as the number of variables for which $x_{kj} \neq x_{ki}$. Alternatively, the distances may be the result of a subjective assessment. For a given distance matrix, multidimensional scaling produces a configuration of $n$ points in a chosen number of dimensions, $m$, such that the distances between the points in some way best match the distance matrix. For some distance measures, such as Euclidean distance, the size of the distance is meaningful; for other measures of distance all that can be said is that one distance is greater or smaller than another. For the former, metric scaling can be used (see nag_mv_prin_coord_analysis (g03fac)); for the latter, non-metric scaling is more appropriate.

For non-metric multidimensional scaling, the criterion used to measure the closeness of the fitted distance matrix to the observed distance matrix is known as $stress$. $stress$ is given by

$$\sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{i-1} (\hat{d}_{ij} - \tilde{d}_{ij})^{2}}{\sum_{i=1}^{n} \sum_{j=1}^{i-1} \hat{d}_{ij}^{2}}}$$

where $\hat{d}_{ij}^{2}$ is the squared Euclidean distance between points $i$ and $j$, and $\tilde{d}_{ij}$ is the fitted distance obtained when $\hat{d}_{ij}$ is monotonically regressed on $d_{ij}$; that is, $\tilde{d}_{ij}$ is monotonic relative to $d_{ij}$ and is obtained from $\hat{d}_{ij}$ with the smallest number of changes. So $stress$ is a measure of how well the set of points preserves the order of the distances in the original distance matrix. Non-metric multidimensional scaling seeks the set of points that minimizes the $stress$.

An alternative measure is squared $stress$, $SSTRESS$,

$$\sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{i-1} (\hat{d}_{ij}^{2} - \tilde{d}_{ij}^{2})^{2}}{\sum_{i=1}^{n} \sum_{j=1}^{i-1} \hat{d}_{ij}^{4}}}$$

in which the distances in $stress$ are replaced by squared distances.

To perform a non-metric scaling, an initial configuration of points is required. This can be obtained from principal coordinate analysis; see nag_mv_prin_coord_analysis (g03fac). Given an initial configuration, nag_mv_ordinal_multidimscale (g03fcc) uses the optimization function nag_opt_conj_grad (e04dgc), which implements a conjugate gradient algorithm, to find the configuration of points that minimizes $stress$ or $SSTRESS$. The optimum found by nag_mv_ordinal_multidimscale (g03fcc) may be only a local optimum. To be more confident of finding a global optimum, several different initial configurations should be used; these can be obtained by randomly perturbing the original initial configuration using functions described in the g05 Chapter Introduction.
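The idea of restarting from perturbed configurations might be sketched as follows. The helper `perturb_configuration` is hypothetical; it uses the standard C `rand()` purely as a stand-in for the g05 generators, and it assumes the coordinates are stored row by row with stride tdx, as for the argument x.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not a NAG function.
 * Copies the n-by-ndim configuration x0 into x, adding independent
 * uniform noise in [-scale, scale] to every coordinate. Both arrays
 * use row stride tdx. The caller is expected to seed rand(). */
void perturb_configuration(const double *x0, double *x,
                           int n, int ndim, int tdx, double scale)
{
    assert(x0 != NULL && x != NULL);
    assert(ndim >= 1 && n > ndim && tdx >= ndim && scale >= 0.0);

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < ndim; ++j) {
            double u = (double)rand() / (double)RAND_MAX; /* in [0, 1] */
            x[i * tdx + j] = x0[i * tdx + j] + scale * (2.0 * u - 1.0);
        }
    }
}
```

Each perturbed copy would then serve as the initial x for a separate call, and the configuration with the smallest returned criterion value would be kept.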

Arguments

type – Nag_ScaleCriterion Input

On entry: indicates whether $stress$ or $SSTRESS$ is to be used as the criterion.

type = Nag_Stress: $stress$ is used.
type = Nag_SStress: $SSTRESS$ is used.

Constraint: type = Nag_Stress or Nag_SStress.

n – Integer Input

On entry: the number of objects in the distance matrix, $n$.

Constraint: $n > ndim$.

ndim – Integer Input

On entry: the number of dimensions used to represent the data, $m$.

Constraint: $ndim \geq 1$.

d[$n \times (n - 1) / 2$] – const double Input

On entry: the lower triangle of the distance matrix $D$, stored packed by rows. That is, d[$(i - 1) \times (i - 2) / 2 + j - 1$] must contain $d_{ij}$, for $i = 2, 3, \dots, n$ and $j = 1, 2, \dots, i - 1$. If $d_{ij}$ is missing, set $d_{ij} < 0$. For further comments on missing values see Section 9 (Further Comments).
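The packed storage scheme can be illustrated with a small C sketch. The helpers `packed_index` and `pack_lower_triangle` are hypothetical names, not NAG functions; the sketch assumes the full matrix is held row-major and that missing entries have already been flagged with a negative value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of d_ij in the packed array d,
 * using the 1-based indices of the text (i = 2..n, j = 1..i-1). */
size_t packed_index(size_t i, size_t j)
{
    assert(j >= 1 && i > j);
    return (i - 1) * (i - 2) / 2 + j - 1;
}

/* Hypothetical helper: copy the strict lower triangle of a full
 * n-by-n row-major matrix into d, which must have room for
 * n * (n - 1) / 2 elements. Negative (missing) entries are copied
 * unchanged. */
void pack_lower_triangle(const double *full, size_t n, double *d)
{
    assert(full != NULL && d != NULL && n >= 2);

    for (size_t i = 2; i <= n; ++i) {
        for (size_t j = 1; j < i; ++j) {
            d[packed_index(i, j)] = full[(i - 1) * n + (j - 1)];
        }
    }
}
```

For $n = 3$ the packed order is $d_{21}, d_{31}, d_{32}$.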

x[$n \times tdx$] – double Input/Output

Note: the $(i, j)$th element of the matrix $X$ is stored in x[$(i - 1) \times tdx + j - 1$].

On entry: the $i$th row must contain an initial estimate of the coordinates for the $i$th point, $i = 1, 2, \dots, n$. One method of computing these is to use nag_mv_prin_coord_analysis (g03fac).

On exit: the $i$th row contains $m$ coordinates for the $i$th point, $i = 1, 2, \dots, n$.

tdx – Integer Input

On entry: the stride separating matrix column elements in the array x.

Constraint: $tdx \geq ndim$.

stress – double * Output

On exit: the value of $stress$ or $SSTRESS$ at the final iteration.

dfit[$2 \times n \times (n - 1)$] – double Output

On exit: auxiliary outputs. If type = Nag_Stress, the first $n(n - 1)/2$ elements contain the distances, $\hat{d}_{ij}$, for the points returned in x; the second set of $n(n - 1)/2$ elements contains the distances $\hat{d}_{ij}$ ordered by the input distances, $d_{ij}$; the third set of $n(n - 1)/2$ elements contains the monotonic distances, $\tilde{d}_{ij}$, ordered by the input distances, $d_{ij}$; and the final set of $n(n - 1)/2$ elements contains the fitted monotonic distances, $\tilde{d}_{ij}$, for the points in x. The $\tilde{d}_{ij}$ corresponding to distances which are input as missing are set to zero. If type = Nag_SStress, the results are as above except that the squared distances are returned.

Each distance matrix is stored in lower triangular packed form in the same way as the input matrix $D$.
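Because dfit concatenates four sets of equal length, it can be convenient to address each set through its own pointer. The following C sketch shows one way to do this; the type `DfitView` and the function `split_dfit` are hypothetical names introduced only for illustration, and the field comments simply restate the layout described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read-only view of the four consecutive sets in dfit. */
typedef struct {
    const double *dhat;          /* d-hat for the points returned in x */
    const double *dhat_sorted;   /* d-hat ordered by the input distances */
    const double *dtilde_sorted; /* d-tilde ordered by the input distances */
    const double *dtilde;        /* fitted d-tilde for the points in x */
    size_t npairs;               /* n * (n - 1) / 2 elements per set */
} DfitView;

/* Hypothetical helper: dfit must hold 2 * n * (n - 1) elements. */
DfitView split_dfit(const double *dfit, size_t n)
{
    DfitView v;

    assert(dfit != NULL && n >= 2);

    v.npairs = n * (n - 1) / 2;
    v.dhat = dfit;
    v.dhat_sorted = dfit + v.npairs;
    v.dtilde_sorted = dfit + 2 * v.npairs;
    v.dtilde = dfit + 3 * v.npairs;
    return v;
}
```

With type = Nag_SStress the same offsets apply, but each set holds squared distances.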

options – Nag_E04_Opt * Input/Output

On entry/exit: a pointer to a structure of type Nag_E04_Opt whose members are optional parameters for nag_opt_conj_grad (e04dgc). These structure members offer the means of adjusting some of the argument values of the algorithm and on output will supply further details of the results. You are referred to the nag_opt_conj_grad (e04dgc) document for further details.

The default values used by nag_mv_ordinal_multidimscale (g03fcc) when the options argument is set to the NAG defined null pointer, E04_DEFAULT, are as follows:

optim_tol = 0.00001;
print_level = Nag_NoPrint;
list = Nag_FALSE;
verify_grad = Nag_FALSE;
max_iter = $\max(50, n \times ndim)$.

If a different value is required for any of these structure members, or if other options available in nag_opt_conj_grad (e04dgc) are to be used, then the structure options should be declared and initialized by a call to nag_opt_init (e04xxc) and supplied as an argument to nag_mv_ordinal_multidimscale (g03fcc). In this case, the structure members listed above, except for list, will have the default values specified above; list = Nag_TRUE in this case.

fail – NagError * Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

Error Indicators and Warnings

NE_2_INT_ARG_LE

On entry, n = 〈value〉 while ndim = 〈value〉. These arguments must satisfy $n > ndim$.

NE_2_INT_ARG_LT

On entry, tdx = 〈value〉 while ndim = 〈value〉. These arguments must satisfy $tdx \geq ndim$.

NE_ALLOC_FAIL

Dynamic memory allocation failed.

NE_BAD_PARAM

On entry, argument type had an illegal value.

NE_INT_ARG_LT

On entry, ndim = 〈value〉.
Constraint: $ndim \geq 1$.

NE_INTERNAL_ERROR

Additional error messages are output if the optimization fails to converge or if the options are set incorrectly. Details of these can be found in the nag_opt_conj_grad (e04dgc) document.

An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

NE_NEG_OR_ZERO_ARRAY

All elements of array d $\leq 0.0$.
Constraint: at least one element of d must be positive.

Further Comments

Missing values in the input distance matrix can be specified by a negative value. Provided that no more than about two thirds of the values are missing, the algorithm may still work. However, the function nag_mv_prin_coord_analysis (g03fac) does not allow for missing values, so an alternative method of obtaining an initial set of coordinates is required. It may be possible to estimate the missing values with some form of average and then use nag_mv_prin_coord_analysis (g03fac) to give an initial set of coordinates.
