naginterfaces.library.mv.distance_mat

naginterfaces.library.mv.distance_mat(update, dist, scal, x, isx, s, d)

distance_mat computes a distance (dissimilarity) matrix.

For full information please refer to the NAG Library document for g03ea

https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g03/g03eaf.html

Parameters
update : str, length 1

Indicates whether or not an existing matrix is to be updated.

update = 'U'
The matrix $D$ is updated and distances are added to $D$.

update = 'I'
The matrix $D$ is initialized to zero before the distances are added to $D$.

dist : str, length 1

Indicates which type of distances are computed.

dist = 'A'
Absolute distances.

dist = 'E'
Euclidean distances.

dist = 'S'
Euclidean squared distances.

scal : str, length 1

Indicates the standardization of the variables to be used.

scal = 'S'
Standard deviation.

scal = 'R'
Range.

scal = 'G'
Standardizations given in array s.

scal = 'U'
Unscaled.

x : float, array-like, shape (n, m)

x[i-1, j-1] must contain the value of the $j$th variable for the $i$th object, for $i = 1, 2, \ldots, n$, for $j = 1, 2, \ldots, m$.

isx : int, array-like, shape (m)

isx[j-1] indicates whether or not the $j$th variable in x is to be included in the distance computations.

If isx[j-1] > 0 the $j$th variable is included, for $j = 1, 2, \ldots, m$; otherwise it is not referenced.

s : float, array-like, shape (m)

If scal = 'G' and isx[j-1] > 0 then s[j-1] must contain the scaling for variable $j$, for $j = 1, 2, \ldots, m$.

d : float, array-like, shape (n*(n-1)/2)

If update = 'U', d must contain the strictly lower triangle of the distance matrix $D$ to be updated. $D$ must be stored packed by rows, i.e., d[(i-1)*(i-2)/2 + j - 1], $i > j$, must contain $d_{ij}$.

If update = 'I', d need not be set.

Returns
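s : float, ndarray, shape (m)

If scal = 'S' and isx[j-1] > 0 then s[j-1] contains the standard deviation of the variable in the $j$th column of x.

If scal = 'R' and isx[j-1] > 0, s[j-1] contains the range of the variable in the $j$th column of x.

If scal = 'U' and isx[j-1] > 0, s[j-1] = 1.0.

If scal = 'G', s is unchanged.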

d : float, ndarray, shape (n*(n-1)/2)

The strictly lower triangle of the distance matrix $D$ stored packed by rows, i.e., $d_{ij}$ is contained in d[(i-1)*(i-2)/2 + j - 1], $i > j$.
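The packed-by-rows layout can be checked with a small index helper. This is a minimal sketch in plain Python, not part of naginterfaces; the names `packed_index` and `packed_length` are hypothetical, and the formula assumes 1-based object numbers $i > j$ mapped to a zero-based position.

```python
def packed_index(i, j):
    """Zero-based position of d_ij in the packed array (1-based i > j)."""
    if not i > j >= 1:
        raise ValueError("requires i > j >= 1")
    return (i - 1) * (i - 2) // 2 + j - 1


def packed_length(n):
    """Number of entries in the strictly lower triangle for n objects."""
    return n * (n - 1) // 2


n = 4
# Row-by-row order of the strictly lower triangle: (2,1), (3,1), (3,2), (4,1), ...
pairs = [(i, j) for i in range(2, n + 1) for j in range(1, i)]
positions = [packed_index(i, j) for i, j in pairs]
```

Under these assumptions the positions run consecutively from 0, so the array holds exactly one entry per pair of distinct objects.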

Raises
NagValueError
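(errno 1)

On entry, scal = ⟨value⟩.

Constraint: scal = 'S', 'R', 'G' or 'U'.

(errno 1)

On entry, update = ⟨value⟩.

Constraint: update = 'I' or 'U'.

(errno 1)

On entry, dist = ⟨value⟩.

Constraint: dist = 'A', 'E' or 'S'.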

(errno 1)

On entry, m = ⟨value⟩.

Constraint: m > 0.

(errno 1)

On entry, n = ⟨value⟩.

Constraint: n ≥ 2.

(errno 2)

On entry, at least one element of d < 0.0.

(errno 2)

Variable ⟨value⟩ is constant.

(errno 2)

On entry, at least one element of s ≤ 0.0.

(errno 2)

On entry, isx does not contain a positive element.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Given $n$ objects, a distance or dissimilarity matrix is a symmetric matrix with zero diagonal elements such that the $ij$th element represents how far apart or how dissimilar the $i$th and $j$th objects are.

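Let $X$ be an $n \times p$ data matrix of observations of $p$ variables on $n$ objects; then the distance between object $j$ and object $k$, $d_{jk}$, can be defined as:

$$d_{jk} = \left\{ \sum_{i=1}^{p} D\left(x_{ji}/s_i,\; x_{ki}/s_i\right) \right\}^{\alpha},$$

where $x_{ji}$ and $x_{ki}$ are the $(j,i)$th and $(k,i)$th elements of $X$, $s_i$ is a standardization for the $i$th variable and $D(u,v)$ is a suitable function. Three functions are provided in distance_mat.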

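  1. Euclidean distance: $D(u,v) = (u-v)^2$ and $\alpha = \frac{1}{2}$.

  2. Euclidean squared distance: $D(u,v) = (u-v)^2$ and $\alpha = 1$.

  3. Absolute distance (city block metric): $D(u,v) = |u-v|$ and $\alpha = 1$.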
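The three choices can be illustrated with a short pure-Python sketch. It is not the NAG implementation: the helper `distance` and the table `DIST_OPTIONS` are hypothetical names, and the sketch simply sums a per-variable function of the scaled values and raises the sum to a power, keyed by the same option letters as dist.

```python
# (per-variable function, power) pairs keyed by the dist option letters.
DIST_OPTIONS = {
    "E": (lambda u, v: (u - v) ** 2, 0.5),  # Euclidean
    "S": (lambda u, v: (u - v) ** 2, 1.0),  # Euclidean squared
    "A": (lambda u, v: abs(u - v), 1.0),    # absolute (city block)
}


def distance(row_j, row_k, scale, dist="E"):
    """Distance between two objects, each variable divided by its scaling."""
    func, power = DIST_OPTIONS[dist]
    total = sum(func(a / s, b / s) for a, b, s in zip(row_j, row_k, scale))
    return total ** power


obj1 = [0.0, 0.0]
obj2 = [3.0, 4.0]
unit = [1.0, 1.0]  # unscaled: every standardization equal to 1

euclid = distance(obj1, obj2, unit, "E")
squared = distance(obj1, obj2, unit, "S")
absolute = distance(obj1, obj2, unit, "A")
```

For these two points the sketch gives 5 for the Euclidean distance, 25 for the squared distance and 7 for the absolute distance.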

Three standardizations are available.

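  1. Standard deviation: $s_i = \sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2 / (n-1)}$

  2. Range: $s_i = \max(x_{1i}, x_{2i}, \ldots, x_{ni}) - \min(x_{1i}, x_{2i}, \ldots, x_{ni})$

  3. User-supplied values of $s_i$.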
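The first two standardizations can be computed per column as follows. This is a rough sketch in plain Python rather than the library's own code; `std_scaling` and `range_scaling` are hypothetical helpers, and the standard deviation is assumed to use the $n - 1$ divisor.

```python
import math


def std_scaling(x):
    """Sample standard deviation (divisor n - 1) of each column of x."""
    n = len(x)
    scales = []
    for col in zip(*x):
        mean = sum(col) / n
        scales.append(math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)))
    return scales


def range_scaling(x):
    """Maximum minus minimum of each column of x."""
    return [max(col) - min(col) for col in zip(*x)]


# Three objects (rows) measured on two variables (columns).
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
sd = std_scaling(x)
rg = range_scaling(x)
```

A column whose values are all equal yields a scaling of zero under either definition, which is consistent with the routine rejecting constant variables when one of these standardizations is requested.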

In addition to the above distances there are a large number of other dissimilarity measures, particularly for dichotomous variables (see Krzanowski (1990) and Everitt (1974)). For the dichotomous case these measures are simple to compute and can, if suitable scaling is used, be combined with the distances computed by distance_mat using the updating option.

Dissimilarity measures for variables can be based on the correlation coefficient for continuous variables and contingency table statistics for dichotomous data, see submodule correg and submodule contab respectively.

distance_mat returns the strictly lower triangle of the distance matrix.
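When a full square matrix is more convenient, the packed triangle can be expanded. The sketch below is plain Python with a hypothetical helper name, `unpack_lower`; it assumes the packed array lists the strictly lower triangle row by row and fills in the symmetric upper half and the zero diagonal.

```python
def unpack_lower(d, n):
    """Expand a packed strictly lower triangle into a full n-by-n matrix."""
    if len(d) != n * (n - 1) // 2:
        raise ValueError("d must have n*(n-1)/2 entries")
    full = [[0.0] * n for _ in range(n)]
    pos = 0
    for i in range(1, n):      # zero-based row index
        for j in range(i):     # columns strictly left of the diagonal
            full[i][j] = full[j][i] = d[pos]
            pos += 1
    return full


packed = [1.0, 2.0, 3.0]       # d_21, d_31, d_32 for three objects
full = unpack_lower(packed, 3)
```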

References

Everitt, B S, 1974, Cluster Analysis, Heinemann

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press