naginterfaces.library.mv.distance_mat_2

naginterfaces.library.mv.distance_mat_2(update, scal, stype, p, x, isv, y, sx, sy=None, d=None)

distance_mat_2 computes a distance (dissimilarity) matrix between two sets of observations.

For full information please refer to the NAG Library document for g03eb:

https://support.nag.com/numeric/nl/nagdoc_30/flhtml/g03/g03ebf.html

Parameters
update : str, length 1

Indicates whether or not an existing matrix is to be updated.

The matrix d is updated and distances are added to d.

The matrix d is initialized to zero before the distances are added to d.

scal : str, length 1

Indicates the standardization of the variables to be used.

Standard deviation.

Range.

Standardizations given in array sx (and possibly sy).

Unscaled.

stype : str, length 1

Indicates how the standardization of the variables treats the two sets of observations.

Amalgamated.

Independent.

Standardization is based purely on observations in .

p : float

The order of the Minkowski distance metric.

x : float, array-like, shape (n_x, m)

x[j-1, i-1] must contain the value of the i-th variable for the j-th observation in the first set of observations, for i = 1, ..., m, for j = 1, ..., n_x.

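isv : int, array-like, shape (m)

isv[i-1] indicates whether or not the i-th variable in x and y is to be included in the distance computations.

If isv[i-1] is not positive, the i-th variable is not included, for i = 1, ..., m.

If isv[i-1] is positive, the i-th variable is included, for i = 1, ..., m.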

y : float, array-like, shape (n_y, m)

y[k-1, i-1] must contain the value of the i-th variable for the k-th observation in the second set of observations, for i = 1, ..., m, for k = 1, ..., n_y.

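sx : float, array-like, shape (m)

If user-supplied standardizations are selected through scal and the i-th variable is included (isv[i-1] is positive), then sx[i-1] must contain the scaling for variable i, for i = 1, ..., m.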

sy : None or float, array-like, shape (:), optional

Note: the required length for this argument is determined as follows: if : ; otherwise: .

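If the two sets are standardized independently, user-supplied standardizations are selected through scal, and the i-th variable is included (isv[i-1] is positive), then sy[i-1] must contain the scaling for variable i, for i = 1, ..., m.

If the two sets are not standardized independently, or user-supplied standardizations are not selected, then sy is not referenced and may be None.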

d : None or float, array-like, shape (n_x, n_y), optional

The distance matrix d.

If the matrix is to be initialized rather than updated, d need not be set.

Returns
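sx : float, ndarray, shape (m)

If standard deviation scaling is selected and the i-th variable is included, then sx[i-1] contains the standard deviation of the variable in the i-th column of x.

If range scaling is selected and the i-th variable is included, sx[i-1] contains the range of the variable in the i-th column of x.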

If and , .

If , is unchanged.

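sy : None or float, ndarray, shape (:)

If the two sets are standardized independently, standard deviation scaling is selected, and the i-th variable is included, then sy[i-1] contains the standard deviation of the variable in the i-th column of y.

If the two sets are standardized independently, range scaling is selected, and the i-th variable is included, sy[i-1] contains the range of the variable in the i-th column of y.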

If , is unchanged.

If , is unchanged.

d : float, ndarray, shape (n_x, n_y)

The (possibly updated) distance matrix d.

Raises
NagValueError
(errno )

On entry, .

Constraint: or .

(errno )

On entry, .

Constraint: , , or .

(errno )

On entry, .

Constraint: , or .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

Variable is constant.

(errno )

On entry, isv does not contain a positive element.

(errno )

On entry, at least one element of or .

(errno )

On entry, at least one element of .

(errno )

On entry, at least one element of .

Notes

Given two sets of observations on $m$ variables, a distance matrix $D$ is such that its $(j,k)$-th element represents how far apart or how dissimilar the $j$-th observation from the first set and the $k$-th observation from the second set are.

Let $X$ and $Y$ be $n_x \times m$ and $n_y \times m$ data matrices of $n_x$ and $n_y$ observations, respectively, on $m$ variables. The distance between observation $j$ from $X$ and observation $k$ from $Y$, $d_{jk}$, is most commonly defined in terms of the scaled Minkowski $p$-norm:

$$
d_{jk} = \left\{ \sum_{i=1}^{m} \left| \frac{x_{ji}}{s_{x_i}} - \frac{y_{ki}}{s_{y_i}} \right|^{p} \right\}^{1/p},
$$

where $x_{ji}$ and $y_{ki}$ are the $ji$-th and $ki$-th elements of $X$ and $Y$ respectively, $s_{x_i}$ is a standardization for the $i$-th variable in $X$, $s_{y_i}$ is a standardization for the $i$-th variable in $Y$, and $p$ is the order of the Minkowski norm.
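The scaled Minkowski distance described above can be illustrated with a short pure-Python sketch. This is not the NAG implementation: the function name `scaled_minkowski` and its argument names are hypothetical, and the sketch assumes that each value is divided by its own per-variable scaling before the difference is taken.

```python
def scaled_minkowski(x_row, y_row, sx, sy, p):
    """Scaled Minkowski distance of order p between two observations.

    Illustrative sketch only (hypothetical helper, not part of naginterfaces).
    x_row and y_row hold the variable values of one observation each;
    sx and sy hold the per-variable scalings, assumed to be positive.
    """
    total = 0.0
    for x_val, y_val, s_x, s_y in zip(x_row, y_row, sx, sy):
        total += abs(x_val / s_x - y_val / s_y) ** p
    return total ** (1.0 / p)


unit = [1.0, 1.0]  # unit scalings, i.e. effectively unscaled
d_euclid = scaled_minkowski([0.0, 0.0], [3.0, 4.0], unit, unit, p=2.0)
d_city = scaled_minkowski([0.0, 0.0], [3.0, 4.0], unit, unit, p=1.0)
print(d_euclid, d_city)
```

With p = 2 the measure reduces to the (scaled) Euclidean distance, and with p = 1 to the city-block distance.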

Three standardizations (scalings) for the variables are available.

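  1. Standard deviation: $s_{x_i} = \sqrt{\sum_{j=1}^{n_x} (x_{ji} - \bar{x}_i)^2 / (n_x - 1)}$

  2. Range: $s_{x_i} = \max(x_{1i}, \ldots, x_{n_x i}) - \min(x_{1i}, \ldots, x_{n_x i})$

  3. User-supplied values of $s_{x_i}$.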
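As a rough illustration of the first two standardizations, the following sketch computes per-variable scalings for a single data matrix stored as a list of rows. The helper names `std_scalings` and `range_scalings` are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption rather than something confirmed by this page.

```python
import statistics


def std_scalings(data):
    """Per-variable sample standard deviations (divisor n - 1, an assumption)."""
    columns = list(zip(*data))  # transpose rows (observations) into columns
    return [statistics.stdev(col) for col in columns]


def range_scalings(data):
    """Per-variable ranges: maximum minus minimum of each column."""
    columns = list(zip(*data))
    return [max(col) - min(col) for col in columns]


# Three observations on two variables (made-up illustrative values).
x = [[1.0, 10.0],
     [2.0, 30.0],
     [3.0, 20.0]]
sx_std = std_scalings(x)
sx_range = range_scalings(x)
print(sx_std, sx_range)
```

Either scaling is zero for a variable that takes the same value in every observation, so such a variable cannot be standardized this way.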

In addition to the Minkowski measure there are a large number of other dissimilarity measures, particularly for dichotomous variables (see Krzanowski (1990) and Everitt (1974)). For the dichotomous case these measures are simple to compute and can, if suitable scaling is used, be combined with the distances computed by distance_mat_2 using the updating option.

Dissimilarity measures for variables can be based on the correlation coefficient for continuous variables and on contingency table statistics for dichotomous data; see submodule correg and submodule contab respectively.

distance_mat_2 returns the full rectangular distance matrix.
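To make the shape of the result concrete, here is a minimal pure-Python sketch that builds a full rectangular matrix of unscaled Minkowski distances between two small sets of observations. The function `distance_matrix` is a hypothetical stand-in, not the library routine, and it assumes that a variable is included only when its `isv` entry is positive.

```python
def distance_matrix(x, y, isv, p):
    """Unscaled Minkowski distances between every row of x and every row of y.

    Hypothetical helper for illustration. Variables whose isv entry is not
    positive are skipped (assumed convention). The result has len(x) rows,
    each holding len(y) distances.
    """
    keep = [i for i, flag in enumerate(isv) if flag > 0]
    if not keep:
        raise ValueError("isv does not contain a positive element")
    return [
        [sum(abs(xr[i] - yr[i]) ** p for i in keep) ** (1.0 / p) for yr in y]
        for xr in x
    ]


# Two observations in the first set, three in the second, three variables.
x = [[0.0, 0.0, 9.0], [1.0, 1.0, 9.0]]
y = [[3.0, 4.0, -5.0], [1.0, 1.0, 0.0], [0.0, 2.0, 7.0]]
d = distance_matrix(x, y, isv=[1, 1, 0], p=2.0)  # third variable excluded
print(len(d), len(d[0]))
```

The result has one row per observation in the first set and one column per observation in the second set, so no symmetry is implied and every entry is stored.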

References

Everitt, B S, 1974, Cluster Analysis, Heinemann

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press