naginterfaces.library.mv.cluster_hier¶
- naginterfaces.library.mv.cluster_hier(method, n, d)[source]¶
cluster_hier
performs hierarchical cluster analysis.For full information please refer to the NAG Library document for g03ec
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g03/g03ecf.html
- Parameters
- methodint
Indicates which clustering method is used.
Single link.
Complete link.
Group average.
Centroid.
Median.
Minimum variance.
- nint
, the number of objects.
- dfloat, array-like, shape
The strictly lower triangle of the distance matrix. must be stored packed by rows, i.e., , must contain .
- Returns
- dfloat, ndarray, shape
Is overwritten.
- ilcint, ndarray, shape
contains the number, , of the cluster merged with cluster (see ), , at step , for .
- iucint, ndarray, shape
contains the number, , of the cluster merged with cluster , , at step , for .
- cdfloat, ndarray, shape
contains the distance , between clusters and , , merged at step , for .
- iordint, ndarray, shape
The objects in dendrogram order.
- dordfloat, ndarray, shape
The clustering distances corresponding to the order in . contains the distance at which cluster and merge, for . contains the maximum distance.
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: , , , , or .
- (errno )
On entry, at least one element of is negative.
- (errno )
Minimum cluster distance not increasing, dendrogram invalid.
- Notes
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.
Given a distance or dissimilarity matrix for objects (see
distance_mat()
), cluster analysis aims to group the objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods, a hierarchical tree is produced by starting with clusters, each with a single object and then at each of stages, merging two clusters to form a larger cluster, until all objects are in a single cluster. This process may be represented by a dendrogram (seecluster_hier_dendrogram()
).At each stage, the clusters that are nearest are merged, methods differ as to how the distances between the new cluster and other clusters are computed. For three clusters , and let , and be the number of objects in each cluster and let , and be the distances between the clusters. Let clusters and be merged to give cluster , then the distance from cluster to cluster , can be computed in the following ways.
Single link or nearest neighbour : .
Complete link or furthest neighbour : .
Group average : .
Centroid : .
Median : .
Minimum variance : .
For further details see Everitt (1974) and Krzanowski (1990).
If the clusters are numbered then, for convenience, if clusters and , , merge then the new cluster will be referred to as cluster . Information on the clustering history is given by the values of , and for each of the clustering steps. In order to produce a dendrogram, the ordering of the objects such that the clusters that merge are adjacent is required. This ordering is computed so that the first element is . The associated distances with this ordering are also computed.
- References
Everitt, B S, 1974, Cluster Analysis, Heinemann
Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press