NAG C Library Function Document

nag_mv_cluster_indicator (g03ejc)

1
Purpose

nag_mv_cluster_indicator (g03ejc) computes a cluster indicator variable from the results of nag_mv_hierar_cluster_analysis (g03ecc).

2
Specification

#include <nag.h>
#include <nagg03.h>
void nag_mv_cluster_indicator (Integer n, const double cd[], const Integer iord[], const double dord[], Integer *k, double *dlevel, Integer ic[], NagError *fail)

3
Description

Given a distance or dissimilarity matrix for n objects, cluster analysis aims to group the n objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see nag_mv_hierar_cluster_analysis (g03ecc)), a hierarchical tree is produced by starting with n clusters, each containing a single object, and then, at each of n-1 stages, merging two clusters to form a larger cluster until all objects are in a single cluster. nag_mv_cluster_indicator (g03ejc) takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see nag_mv_dendrogram (g03ehc)) and drawing a line across it at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and nag_mv_cluster_indicator (g03ejc) will compute the corresponding distance. However, it may not be possible to produce exactly the number of clusters required because of ties in the distance matrix.
If there are k clusters then the indicator variable assigns a value between 1 and k to each object to indicate the cluster to which it belongs. Object 1 always belongs to cluster 1.
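The idea of drawing a line across the dendrogram can be illustrated with a small self-contained sketch. It does not call the NAG Library and is not the algorithm used by nag_mv_cluster_indicator (g03ejc). The Merge record, the function names and the limit MAX_OBJECTS are hypothetical, and the tree is supplied as an explicit list of merges rather than in the iord/dord form returned by nag_mv_hierar_cluster_analysis (g03ecc). The sketch also assumes that merges at a distance less than or equal to the cut level are applied.

```c
#include <assert.h>

#define MAX_OBJECTS 64 /* hypothetical limit, for this sketch only */

/* Hypothetical merge record: the clusters containing objects a and b
   (0-based) are joined at distance dist. NOT the g03ecc output format. */
typedef struct { int a; int b; double dist; } Merge;

/* Return the representative of the cluster containing x (path halving). */
static int find_root(int parent[], int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Cut the tree at `level`: apply every merge with dist <= level (assumed
   convention), then number the clusters 1..k in order of first appearance,
   so that object 1 (index 0) always receives label 1. Returns k. */
static int cut_tree(int n, const Merge merges[], int nmerge, double level,
                    int ic[])
{
    int parent[MAX_OBJECTS], label[MAX_OBJECTS];
    int i, k = 0;

    assert(n >= 1 && n <= MAX_OBJECTS);
    for (i = 0; i < n; i++) {
        parent[i] = i;   /* every object starts in its own cluster */
        label[i] = 0;    /* 0 means "no label assigned yet" */
    }
    for (i = 0; i < nmerge; i++) {
        if (merges[i].dist <= level) {
            int ra = find_root(parent, merges[i].a);
            int rb = find_root(parent, merges[i].b);
            if (ra != rb) parent[rb] = ra;
        }
    }
    for (i = 0; i < n; i++) {
        int r = find_root(parent, i);
        if (label[r] == 0) label[r] = ++k;
        ic[i] = label[r];
    }
    return k;
}
```

Numbering the clusters in order of first appearance is one simple way to make object 1 fall in cluster 1, as the description requires; whether the library numbers the remaining clusters the same way is not stated here.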

4
References

Everitt B S (1974) Cluster Analysis Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5
Arguments

1:     n Integer Input
On entry: the number of objects, n.
Constraint: n ≥ 2.
2:     cd[n-1] const double Input
On entry: the clustering distances in increasing order as returned by nag_mv_hierar_cluster_analysis (g03ecc).
Constraint: cd[i] ≥ cd[i-1], for i=1,2,…,n-2.
3:     iord[n] const Integer Input
On entry: the objects in the dendrogram order as returned by nag_mv_hierar_cluster_analysis (g03ecc).
4:     dord[n] const double Input
On entry: the clustering distances corresponding to the order in iord.
5:     k Integer *Input/Output
On entry: indicates if a specified number of clusters is required.
k>0
nag_mv_cluster_indicator (g03ejc) will attempt to find k clusters.
k ≤ 0
nag_mv_cluster_indicator (g03ejc) will find the clusters based on the distance given in dlevel.
Constraint: k ≤ n.
On exit: the number of clusters produced, k.
6:     dlevel double *Input/Output
On entry: if k ≤ 0, then dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if k ≤ 0, dlevel>0.0.
On exit: if k>0 on entry, then dlevel contains the distance at which the required number of clusters are found. Otherwise dlevel remains unchanged.
7:     ic[n] Integer Output
On exit: ic[i-1] indicates to which of k clusters the ith object belongs, for i=1,2,…,n.
8:     fail NagError *Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
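The input constraints listed above can be collected into a single check, sketched below without any dependence on the NAG Library. The ArgStatus codes and the function check_cluster_args are hypothetical names introduced for illustration, int stands in for the NAG Integer type, and the order of the tests is an assumption. The compatibility of cd and dord is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; these are NOT NAG error codes. */
typedef enum {
    ARGS_OK = 0,
    ARGS_N_TOO_SMALL,          /* violates n >= 2                      */
    ARGS_CD_NOT_INCREASING,    /* violates cd[i] >= cd[i-1]            */
    ARGS_K_TOO_LARGE,          /* violates k <= n                      */
    ARGS_DLEVEL_NOT_POSITIVE   /* violates dlevel > 0.0 when k <= 0    */
} ArgStatus;

/* Check the documented input constraints. cd must hold n-1 values. */
static ArgStatus check_cluster_args(int n, const double cd[], int k,
                                    double dlevel)
{
    int i;

    if (n < 2) return ARGS_N_TOO_SMALL;
    assert(cd != NULL);
    for (i = 1; i <= n - 2; i++) {
        if (cd[i] < cd[i - 1]) return ARGS_CD_NOT_INCREASING;
    }
    if (k > n) return ARGS_K_TOO_LARGE;
    /* dlevel is only read when the clusters are defined by distance */
    if (k <= 0 && !(dlevel > 0.0)) return ARGS_DLEVEL_NOT_POSITIVE;
    return ARGS_OK;
}
```

For instance, with n = 5 and k = 0 a value of dlevel = 0.0 is rejected, whereas with k = 2 the same dlevel is accepted because it is not used on entry.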

6
Error Indicators and Warnings

NE_2_INT_ARG_GT
On entry, k=value while n=value. These arguments must satisfy k ≤ n.
NE_CLUSTER
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters produced is value.
NE_INCOMP_ARRAYS
Arrays cd and dord are not compatible.
NE_INT_ARG_LT
On entry, n=value.
Constraint: n ≥ 2.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_INCREASING
The sequence cd is not increasing:
cd[value] = value, cd[value] = value.
NE_REAL_INT
On entry, dlevel=value, k=value.
Constraint: if k ≤ 0, dlevel>0.0.
NW_2_INT
On exit, k=value, n=value.
Trivial solution returned.
NW_INT
On exit, k=1.
Trivial solution returned.
NW_REAL_REALARR
On entry, dlevel=value, cd[value] = value.
Trivial solution returned.
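The effect of tied clustering distances on the number of clusters (NE_CLUSTER) and the trivial solutions with k=1 or k=n can be seen by counting merges in cd. The sketch below is self-contained and makes two assumptions that this document does not confirm: that a cut at a given level applies every merge whose distance is less than or equal to that level, and that the level for k requested clusters is taken as cd[n-k-1]. The function names are hypothetical and int stands in for the NAG Integer type.

```c
#include <assert.h>

/* Number of clusters left when every merge with cd[i] <= level has been
   applied (assumed convention). cd holds n-1 non-decreasing distances. */
static int clusters_at_level(int n, const double cd[], double level)
{
    int merged = 0;

    while (merged < n - 1 && cd[merged] <= level) merged++;
    return n - merged;
}

/* For a request of k clusters (1 <= k < n), set *level to the distance of
   the (n-k)th merge and return the number of clusters actually produced
   at that level. A result smaller than k signals tied distances. */
static int clusters_for_request(int n, const double cd[], int k,
                                double *level)
{
    assert(n >= 2 && k >= 1 && k < n);
    *level = cd[n - k - 1];
    return clusters_at_level(n, cd, *level);
}
```

With n = 5 and cd = {1.0, 2.0, 2.0, 5.0}, a request for 3 clusters gives level 2.0, but two merges share that distance, so only 2 clusters result. A level below cd[0] leaves all 5 objects in separate clusters, and a level of at least cd[3] puts every object in one cluster.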

7
Accuracy

The accuracy will depend upon the accuracy of the distances in cd and dord (see nag_mv_hierar_cluster_analysis (g03ecc)).

8
Parallelism and Performance

nag_mv_cluster_indicator (g03ejc) is not threaded in any implementation.

9
Further Comments

A fixed number of clusters can be found using the non-hierarchical method implemented in nag_mv_kmeans_cluster_analysis (g03efc).

10
Example

Data consisting of three variables on five objects are input. Euclidean squared distances are computed using nag_mv_distance_mat (g03eac) and median clustering is performed using nag_mv_hierar_cluster_analysis (g03ecc). A dendrogram is produced by nag_mv_dendrogram (g03ehc) and printed. nag_mv_cluster_indicator (g03ejc) finds two clusters and the results are printed.

10.1
Program Text

Program Text (g03ejce.c)

10.2
Program Data

Program Data (g03ejce.d)

10.3
Program Results

Program Results (g03ejce.r)