​hier, n, d)[source]

cluster_hier performs hierarchical cluster analysis.

For full information please refer to the NAG Library document for g03ec


Indicates which clustering method is used.

Single link.

Complete link.

Group average.



Minimum variance.


, the number of objects.

dfloat, array-like, shape

The strictly lower triangle of the distance matrix. must be stored packed by rows, i.e., , must contain .

dfloat, ndarray, shape

Is overwritten.

ilcint, ndarray, shape

contains the number, , of the cluster merged with cluster (see ), , at step , for .

iucint, ndarray, shape

contains the number, , of the cluster merged with cluster , , at step , for .

cdfloat, ndarray, shape

contains the distance , between clusters and , , merged at step , for .

iordint, ndarray, shape

The objects in dendrogram order.

dordfloat, ndarray, shape

The clustering distances corresponding to the order in . contains the distance at which cluster and merge, for . contains the maximum distance.

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: , , , , or .

(errno )

On entry, at least one element of is negative.

(errno )

Minimum cluster distance not increasing, dendrogram invalid.


In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Given a distance or dissimilarity matrix for objects (see distance_mat()), cluster analysis aims to group the objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods, a hierarchical tree is produced by starting with clusters, each with a single object and then at each of stages, merging two clusters to form a larger cluster, until all objects are in a single cluster. This process may be represented by a dendrogram (see cluster_hier_dendrogram()).

At each stage, the clusters that are nearest are merged, methods differ as to how the distances between the new cluster and other clusters are computed. For three clusters , and let , and be the number of objects in each cluster and let , and be the distances between the clusters. Let clusters and be merged to give cluster , then the distance from cluster to cluster , can be computed in the following ways.

  1. Single link or nearest neighbour : .

  2. Complete link or furthest neighbour : .

  3. Group average : .

  4. Centroid : .

  5. Median : .

  6. Minimum variance : .

For further details see Everitt (1974) and Krzanowski (1990).

If the clusters are numbered then, for convenience, if clusters and , , merge then the new cluster will be referred to as cluster . Information on the clustering history is given by the values of , and for each of the clustering steps. In order to produce a dendrogram, the ordering of the objects such that the clusters that merge are adjacent is required. This ordering is computed so that the first element is . The associated distances with this ordering are also computed.


Everitt, B S, 1974, Cluster Analysis, Heinemann

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press