naginterfaces.library.mv.distance_mat_2
- naginterfaces.library.mv.distance_mat_2(update, scal, stype, p, x, isv, y, sx, sy=None, d=None)
distance_mat_2 computes a distance (dissimilarity) matrix between two sets of observations.
For full information please refer to the NAG Library document for g03eb:
https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g03/g03ebf.html
- Parameters
- update : str, length 1
Indicates whether or not an existing matrix is to be updated.
update = 'U': the matrix d is updated and distances are added to d.
update = 'I': the matrix d is initialized to zero before the distances are added to d.
- scal : str, length 1
Indicates the standardization of the variables to be used.
scal = 'S': standard deviation.
scal = 'R': range.
scal = 'G': standardizations given in array sx (and possibly sy).
scal = 'U': unscaled.
- stype : str, length 1
Indicates how the standardization of the variables treats the two sets of observations.
Amalgamated.
Independent.
Standardization is based purely on observations in x.
- p : float
The order of the Minkowski distance metric.
- x : float, array-like, shape
x must contain the value of the j-th variable for the i-th observation in the first set of observations, with one row per observation and one column per variable.
- isv : int, array-like, shape
isv indicates whether or not the j-th variable in x and y is to be included in the distance computations.
If the j-th element of isv is positive, the j-th variable is included; otherwise it is not included.
- y : float, array-like, shape
y must contain the value of the j-th variable for the i-th observation in the second set of observations, with one row per observation and one column per variable.
- sx : float, array-like, shape
If scal = 'G' and the j-th variable is included then the j-th element of sx must contain the scaling for variable j.
- sy : None or float, array-like, shape , optional
Note: the required length for this argument is determined as follows: if : ; otherwise: .
If and and then must contain the scaling for variable , for .
If , or then is not referenced and may be None.
- d : None or float, array-like, shape , optional
The distance matrix.
If update = 'I', d need not be set.
- Returns
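- sx : float, ndarray, shape
If scal = 'S' and the j-th variable is included then the j-th element of sx contains the standard deviation of the variable in the j-th column of x.
If scal = 'R' and the j-th variable is included, the j-th element of sx contains the range of the variable in the j-th column of x.
If scal = 'U' and the j-th variable is included, the j-th element of sx is 1.0.
If scal = 'G', sx is unchanged.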
- sy : None or float, ndarray, shape
If and and then contains the standard deviation of the variable in the th column of .
If and and , contains the range of the variable in the th column of .
If , is unchanged.
If , is unchanged.
- d : float, ndarray, shape
The (possibly updated) distance matrix.
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, .
Constraint: , , or .
- (errno )
On entry, .
Constraint: , or .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
Variable is constant.
- (errno )
On entry, isv does not contain a positive element.
- (errno )
On entry, at least one element of or .
- (errno )
On entry, at least one element of .
- (errno )
On entry, at least one element of .
- Notes
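Given two sets of observations on $m$ variables, a distance matrix $D$ is such that the $ij$th element represents how far apart or how dissimilar the $i$th observation from the first set and the $j$th observation from the second set are.
Let $X$ and $Y$ be $n_x \times m$ and $n_y \times m$ data matrices of $n_x$ and $n_y$ observations, respectively, on $m$ variables. The distance between observation $i$ from $X$ and observation $j$ from $Y$, $d_{ij}$, is most commonly defined in terms of the scaled Minkowski $p$-norm:
$$d_{ij} = \left( \sum_{k=1}^{m} \left| \frac{x_{ik}}{\sigma^x_k} - \frac{y_{jk}}{\sigma^y_k} \right|^p \right)^{1/p}$$
where $x_{ik}$ and $y_{jk}$ are the $ik$th and $jk$th elements of $X$ and $Y$ respectively, $\sigma^x_k$ is a standardization for the $k$th variable in $X$, $\sigma^y_k$ is a standardization for the $k$th variable in $Y$, and $p$ is the order of the Minkowski norm.
Three standardizations (scalings) for the variables are available, where $\sigma_k$ stands for $\sigma^x_k$ or $\sigma^y_k$ as appropriate.
Standard deviation: $\sigma_k$ is the standard deviation of the $k$th variable.
Range: $\sigma_k$ is the range (maximum minus minimum) of the $k$th variable.
User-supplied values of $\sigma_k$.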
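The scaled Minkowski distance described above can be sketched in plain NumPy. This is an illustration of the formula only and does not call distance_mat_2; the helper name scaled_minkowski, the sample arrays, and the choice to reuse sx when sy is omitted are assumptions made for this sketch, not documented library behaviour.

```python
import numpy as np


def scaled_minkowski(x, y, p, sx=None, sy=None, isv=None):
    """Rectangular matrix of scaled Minkowski distances (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[1]
    # Unit scalings correspond to the unscaled case.
    sx = np.ones(m) if sx is None else np.asarray(sx, dtype=float)
    # Assumption of this sketch: reuse the x scalings when sy is not given.
    sy = sx if sy is None else np.asarray(sy, dtype=float)
    # Keep only the variables flagged with a positive entry.
    keep = np.ones(m, dtype=bool) if isv is None else np.asarray(isv) > 0
    xs = x[:, keep] / sx[keep]
    ys = y[:, keep] / sy[keep]
    # Broadcast to shape (n_x, n_y, number of kept variables).
    diff = np.abs(xs[:, None, :] - ys[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)


# Hypothetical data: 2 observations in x, 3 in y, 2 variables each.
x = np.array([[0.0, 0.0], [3.0, 4.0]])
y = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])

# Unscaled Euclidean distances (p = 2).
d_euclid = scaled_minkowski(x, y, p=2.0)

# Range scaling computed from x alone: maximum minus minimum of each column.
range_x = x.max(axis=0) - x.min(axis=0)
d_range = scaled_minkowski(x, y, p=1.0, sx=range_x)
```

The result has one row per observation in x and one column per observation in y, so it is rectangular rather than symmetric.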
In addition to the Minkowski measure there are a large number of other dissimilarity measures, particularly for dichotomous variables (see Krzanowski (1990) and Everitt (1974)). For the dichotomous case these measures are simple to compute and can, if suitable scaling is used, be combined with the distances computed by distance_mat_2 using the updating option.
Dissimilarity measures for variables can be based on the correlation coefficient for continuous variables and contingency table statistics for dichotomous data, see submodule correg and submodule contab respectively.
distance_mat_2 returns the full rectangular distance matrix.
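As a small illustration of why the dichotomous case is simple to compute: for variables coded 0/1, the number of variables on which two observations disagree equals their unscaled Minkowski distance of order 1. The sketch below checks this in plain NumPy. It does not call distance_mat_2 or use its updating option, and the arrays a and b are made-up data.

```python
import numpy as np

# Two small sets of observations on three 0/1 (dichotomous) variables.
a = np.array([[1, 0, 1], [0, 0, 0]])
b = np.array([[1, 1, 1], [0, 0, 1], [1, 0, 1]])

# Count of variables on which each pair of observations disagrees.
mismatches = (a[:, None, :] != b[None, :, :]).sum(axis=2)

# Unscaled Minkowski distance of order 1 on the same data.
minkowski_1 = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
```

Both arrays have shape (2, 3): one row per observation in a and one column per observation in b.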
- References
Everitt, B S, 1974, Cluster Analysis, Heinemann
Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press