naginterfaces.library.mv.distance_mat
- naginterfaces.library.mv.distance_mat(update, dist, scal, x, isx, s, d)
distance_mat computes a distance (dissimilarity) matrix.
For full information please refer to the NAG Library document for g03ea
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g03/g03eaf.html
- Parameters
- update: str, length 1
Indicates whether or not an existing matrix is to be updated.
update = 'U': the matrix $D$ is updated and distances are added to $D$.
update = 'I': the matrix $D$ is initialized to zero before the distances are added to $D$.
- dist: str, length 1
Indicates which type of distances are computed.
dist = 'A': absolute distances.
dist = 'E': Euclidean distances.
dist = 'S': Euclidean squared distances.
- scal: str, length 1
Indicates the standardization of the variables to be used.
scal = 'S': standard deviation.
scal = 'R': range.
scal = 'G': standardizations given in array s.
scal = 'U': unscaled.
- x: float, array-like, shape (n, m)
x[i-1, j-1] must contain the value of the $j$th variable for the $i$th object, for $i = 1, \dots, n$, for $j = 1, \dots, m$.
- isx: int, array-like, shape (m)
isx[j-1] indicates whether or not the $j$th variable in x is to be included in the distance computations.
If isx[j-1] > 0 the $j$th variable is included, for $j = 1, \dots, m$; otherwise it is not referenced.
- s: float, array-like, shape (m)
If scal = 'G' and isx[j-1] > 0 then s[j-1] must contain the scaling for variable $j$, for $j = 1, \dots, m$.
- d: float, array-like, shape (n(n-1)/2)
If update = 'U', d must contain the strictly lower triangle of the distance matrix $D$ to be updated. $D$ must be stored packed by rows, i.e., d[(i-1)(i-2)/2 + j - 1], $i > j$, must contain $d_{ij}$.
If update = 'I', d need not be set.
- Returns
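- s: float, ndarray, shape (m)
If scal = 'S' and isx[j-1] > 0 then s[j-1] contains the standard deviation of the variable in the $j$th column of x.
If scal = 'R' and isx[j-1] > 0, s[j-1] contains the range of the variable in the $j$th column of x.
If scal = 'U' and isx[j-1] > 0, s[j-1] = 1.0.
If scal = 'G', s is unchanged.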
- d: float, ndarray, shape (n(n-1)/2)
The strictly lower triangle of the distance matrix $D$ stored packed by rows, i.e., $d_{ij}$ is contained in d[(i-1)(i-2)/2 + j - 1], $i > j$.
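The row-packed layout can be easier to follow with a small helper. The following is a minimal pure-Python sketch, not part of naginterfaces: the names `packed_index` and `unpack_distances` are hypothetical, and it assumes the layout in which element $d_{ij}$ (1-based, $i > j$) sits at 0-based position (i-1)(i-2)/2 + j - 1 of the packed array.

```python
def packed_index(i, j):
    """Return the 0-based position of d_ij (1-based, i > j) in the
    row-packed strictly lower triangle."""
    if not i > j >= 1:
        raise ValueError("requires i > j >= 1")
    return (i - 1) * (i - 2) // 2 + j - 1


def unpack_distances(d, n):
    """Expand a packed strictly lower triangle of length n(n-1)/2 into a
    full symmetric n-by-n matrix (list of lists) with a zero diagonal."""
    if len(d) != n * (n - 1) // 2:
        raise ValueError("d must have length n(n-1)/2")
    full = [[0.0] * n for _ in range(n)]
    for i in range(2, n + 1):      # 1-based row index
        for j in range(1, i):      # 1-based column index, j < i
            value = d[packed_index(i, j)]
            full[i - 1][j - 1] = value
            full[j - 1][i - 1] = value  # mirror into the upper triangle
    return full
```

For three objects the packed array holds $d_{21}$, $d_{31}$, $d_{32}$ in that order, so `unpack_distances([1.0, 2.0, 3.0], 3)` places 1.0 between objects 1 and 2, 2.0 between objects 1 and 3, and 3.0 between objects 2 and 3.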
- Raises
- NagValueError
- (errno )
On entry, scal = <value>.
Constraint: scal = 'S', 'R', 'G' or 'U'.
- (errno )
On entry, update = <value>.
Constraint: update = 'I' or 'U'.
- (errno )
On entry, dist = <value>.
Constraint: dist = 'A', 'E' or 'S'.
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, at least one element of .
- (errno )
Variable is constant.
- (errno )
On entry, at least one element of .
- (errno )
On entry, isx does not contain a positive element.
- Notes
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.
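Given $n$ objects, a distance or dissimilarity matrix is a symmetric matrix with zero diagonal elements such that the $ij$th element represents how far apart or how dissimilar the $i$th and $j$th objects are.
Let $X$ be an $n \times p$ data matrix of observations of $p$ variables on $n$ objects, then the distance between object $j$ and object $k$, $d_{jk}$, can be defined as:
$$d_{jk} = \left\{ \sum_{i=1}^{p} D\left(x_{ji}/s_i,\; x_{ki}/s_i\right) \right\}^{\alpha},$$
where $x_{ji}$ and $x_{ki}$ are the $ji$th and $ki$th elements of $X$, $s_i$ is a standardization for the $i$th variable and $D(u, v)$ is a suitable function. Three functions are provided in distance_mat.
Euclidean distance: $D(u, v) = (u - v)^2$ and $\alpha = 1/2$.
Euclidean squared distance: $D(u, v) = (u - v)^2$ and $\alpha = 1$.
Absolute distance (city block metric): $D(u, v) = |u - v|$ and $\alpha = 1$.
Three standardizations are available.
Standard deviation: $s_i = \sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2 / (n - 1)}$
Range: $s_i = \max(x_{1i}, x_{2i}, \dots, x_{ni}) - \min(x_{1i}, x_{2i}, \dots, x_{ni})$
User-supplied values of $s_i$.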
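To make the definition concrete, here is a minimal pure-Python sketch of the three distance functions and the standardizations described above. It is an illustration only, not the NAG implementation: the function names `column_scales` and `distance_sketch` are hypothetical, the one-letter codes ('A', 'E', 'S' for the distance and 'S', 'R', 'U' for the scaling) are assumptions of the sketch, and it ignores variable selection and the updating option.

```python
import math


def column_scales(x, scal):
    """Return one scale s_i per column of x (a list of rows).
    'S' = sample standard deviation, 'R' = range, 'U' = unscaled."""
    n = len(x)
    scales = []
    for i in range(len(x[0])):
        col = [row[i] for row in x]
        if scal == "S":
            mean = sum(col) / n
            scales.append(math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)))
        elif scal == "R":
            scales.append(max(col) - min(col))
        elif scal == "U":
            scales.append(1.0)
        else:
            raise ValueError("scal must be 'S', 'R' or 'U'")
    return scales


def distance_sketch(x, dist="E", scal="U", s=None):
    """Return the strictly lower triangle of the distance matrix, packed
    by rows. Pass s to use user-supplied scales instead of scal."""
    if dist not in ("A", "E", "S"):
        raise ValueError("dist must be 'A', 'E' or 'S'")
    if s is None:
        s = column_scales(x, scal)
    packed = []
    for j in range(1, len(x)):          # object j (0-based row)
        for k in range(j):              # object k < j
            diffs = [(a - b) / si for a, b, si in zip(x[j], x[k], s)]
            if dist == "A":
                total = sum(abs(u) for u in diffs)   # |u - v|, alpha = 1
            else:
                total = sum(u * u for u in diffs)    # (u - v)^2
                if dist == "E":
                    total = math.sqrt(total)         # alpha = 1/2
            packed.append(total)
    return packed
```

With the rows (0, 0), (3, 4) and (6, 8), the unscaled Euclidean distances between objects 2 and 1, 3 and 1, and 3 and 2 are 5, 10 and 5, and range scaling divides the two columns by 6 and 8 before the distances are formed.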
In addition to the above distances there are a large number of other dissimilarity measures, particularly for dichotomous variables (see Krzanowski (1990) and Everitt (1974)). For the dichotomous case these measures are simple to compute and can, if suitable scaling is used, be combined with the distances computed by distance_mat using the updating option.
Dissimilarity measures for variables can be based on the correlation coefficient for continuous variables and contingency table statistics for dichotomous data, see submodule correg and submodule contab respectively.
distance_mat returns the strictly lower triangle of the distance matrix.
- References
Everitt, B S, 1974, Cluster Analysis, Heinemann
Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press