PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_mv_cluster_hier_indicator (g03ej)
Purpose
nag_mv_cluster_hier_indicator (g03ej) computes a cluster indicator variable from the results of
nag_mv_cluster_hier (g03ec).
Syntax
[k, dlevel, ic, ifail] = g03ej(cd, iord, dord, k, dlevel, 'n', n)
[k, dlevel, ic, ifail] = nag_mv_cluster_hier_indicator(cd, iord, dord, k, dlevel, 'n', n)
Description
Given a distance or dissimilarity matrix for $n$ objects, cluster analysis aims to group the $n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see nag_mv_cluster_hier (g03ec)), a hierarchical tree is produced by starting with $n$ clusters, each with a single object, and then, at each of $n-1$ stages, merging two clusters to form a larger cluster until all objects are in a single cluster.
nag_mv_cluster_hier_indicator (g03ej) takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see
nag_mv_cluster_hier_dendrogram (g03eh)) and drawing a line across at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and nag_mv_cluster_hier_indicator (g03ej) will compute the corresponding distance. However, it may not be possible to obtain exactly the number of clusters requested because of ties in the distance matrix.
If there are $k$ clusters then the indicator variable will assign a value between $1$ and $k$ to each object to indicate to which cluster it belongs. Object $1$ always belongs to cluster $1$.
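The cut described above can be sketched in a few lines of plain MATLAB. This is a minimal illustration, not the NAG implementation: it assumes that dord(i) holds the distance at which the i-th and (i+1)-th objects in dendrogram order are merged, and the iord and dord values are hypothetical ones chosen to mimic a five-object tree.

```matlab
% Illustrative sketch only; this is not the NAG implementation.
% Assumption: dord(i) is the distance at which the i-th and (i+1)-th
% objects in dendrogram order (iord) are merged.
iord   = [1 3 5 2 4];                  % hypothetical dendrogram order
dord   = [2.0 6.5 14.125 1.0 14.125];  % hypothetical merge distances
dlevel = 6.5;                          % distance at which the tree is cut

n       = numel(iord);
label   = zeros(1, n);   % provisional label for each dendrogram position
current = 1;
for i = 1:n
    label(i) = current;
    % A merge above the cut separates position i from position i+1.
    if i < n && dord(i) > dlevel
        current = current + 1;
    end
end

% Map the labels from dendrogram order back to object order.
ic = zeros(1, n);
ic(iord) = label;

% Renumber clusters in order of first appearance, so object 1 is in cluster 1.
[~, ~, ic] = unique(ic, 'stable');
ic = ic(:)';
k  = max(ic);
```

With these inputs a new cluster starts only after the third dendrogram position, so objects 1, 3 and 5 share one cluster and objects 2 and 4 share the other.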
References
Everitt B S (1974) Cluster Analysis Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Parameters
Compulsory Input Parameters
- 1: cd($n-1$) – double array
The clustering distances in increasing order as returned by nag_mv_cluster_hier (g03ec).
Constraint: $\mathrm{cd}(i+1) \ge \mathrm{cd}(i)$, for $i = 1, 2, \ldots, n-2$.
- 2: iord($n$) – int64/int32/nag_int array
The objects in dendrogram order as returned by nag_mv_cluster_hier (g03ec).
- 3: dord($n$) – double array
The clustering distances corresponding to the order in iord.
- 4: k – int64/int32/nag_int scalar
Indicates if a specified number of clusters is required.
If $k > 0$ then nag_mv_cluster_hier_indicator (g03ej) will attempt to find k clusters.
If $k \le 0$ then nag_mv_cluster_hier_indicator (g03ej) will find the clusters based on the distance given in dlevel.
Constraint: $k \le n$.
- 5: dlevel – double scalar
If $k \le 0$, dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if $k \le 0$, $\mathrm{dlevel} > 0.0$.
Optional Input Parameters
- 1: n – int64/int32/nag_int scalar
Default: the dimension of the arrays iord, dord. (An error is raised if these dimensions are not equal.)
$n$, the number of objects.
Constraint: $n \ge 2$.
Output Parameters
- 1: k – int64/int32/nag_int scalar
The number of clusters produced, $k$.
- 2: dlevel – double scalar
If $k > 0$ on entry, dlevel contains the distance at which the required number of clusters is found. Otherwise dlevel remains unchanged.
- 3: ic($n$) – int64/int32/nag_int array
ic($i$) indicates to which of the $k$ clusters the $i$th object belongs, for $i = 1, 2, \ldots, n$.
- 4: ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
- ifail = 1
On entry, $k > n$,
or $k \le 0$ and $\mathrm{dlevel} \le 0.0$,
or $n < 2$.
- ifail = 2
On entry, cd is not in increasing order,
or dord is incompatible with cd.
- ifail = 3
On entry, $k = 1$,
or $k = n$,
or $\mathrm{dlevel} \ge \mathrm{cd}(n-1)$,
or $\mathrm{dlevel} < \mathrm{cd}(1)$.
Note: on exit with this value of ifail the trivial clustering solution is returned.
- W ifail = 4
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters, less than the number requested, is returned in k.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
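The tied-distance warning can be illustrated with a small sketch in plain MATLAB. It is not the NAG implementation; it assumes that a cut at distance dlevel applies every merge whose clustering distance does not exceed dlevel, and the cd values are hypothetical.

```matlab
% Illustrative sketch only; this is not the NAG implementation.
% Assumption: cutting at dlevel applies every merge with distance <= dlevel.
cd   = [1.0 2.0 2.0 5.0];   % hypothetical sorted clustering distances (n = 5)
n    = numel(cd) + 1;       % n objects give n-1 merges
kreq = 3;                   % number of clusters requested

% Reaching kreq clusters needs exactly the first n - kreq merges.
dlevel = cd(n - kreq);

% Tied distances pull in extra merges at the same level.
kfound = n - sum(cd <= dlevel);

if kfound < kreq
    fprintf('Requested %d clusters, found %d at distance %g\n', ...
            kreq, kfound, dlevel);
end
```

Here three clusters are requested, but the second and third merges share the distance 2.0, so cutting at that level leaves two clusters.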
Accuracy
The accuracy will depend upon the accuracy of the distances in
cd and
dord (see
nag_mv_cluster_hier (g03ec)).
Further Comments
A fixed number of clusters can be found using the non-hierarchical method used in
nag_mv_cluster_kmeans (g03ef).
Example
Data consisting of three variables on five objects are input. Euclidean squared distances are computed using nag_mv_distance_mat (g03ea) and median clustering is performed using nag_mv_cluster_hier (g03ec). The clustering distances and the clusters joined at each stage are printed. nag_mv_cluster_hier_indicator (g03ej) finds two clusters and the results are printed.
function g03ej_example

fprintf('g03ej example results\n\n');

% Data: five objects (rows) by three columns
x = [1, 5, 2;
     2, 1, 1;
     3, 4, 3;
     4, 1, 2;
     5, 5, 0];
[n,m] = size(x);

% isx(j) > 0 includes column j in the distances; column 1 is excluded
isx = ones(m,1,'int64');
isx(1) = int64(0);
s = ones(m,1);
ld = (n*(n-1))/2;
d = zeros(ld,1);

% Compute the distance matrix
update = 'I';   % initialise a new distance matrix
dist = 'S';     % Euclidean squared distances
scal = 'U';     % unscaled variables
[s, d, ifail] = g03ea( ...
                       update, dist, scal, x, isx, s, d);

% Hierarchical clustering; method 5 is median clustering
method = int64(5);
n = int64(n);
[d, ilc, iuc, cd, iord, dord, ifail] = ...
    g03ec(method, n, d);

% Print the distance and the clusters joined at each stage
row = {'A'; 'B'; 'C'; 'D'; 'E'};
fprintf(' Distance Clusters Joined\n\n');
for i = 1:n-1
    fprintf('%10.3f %s %s\n', cd(i), row{ilc(i)}, row{iuc(i)})
end

% Request two clusters; dlevel is returned as the corresponding distance
k = int64(2);
dlevel = 0;
[k, dlevel, ic, ifail] = g03ej( ...
                                cd, iord, dord, k, dlevel);

fprintf('\n Allocation to %2d clusters\n', k);
fprintf(' Clusters found at distance %6.3f\n\n', dlevel);
fprintf(' Object Cluster\n\n');
for i = 1:n
    fprintf('%6s %2d\n', row{i}, ic(i));
end
g03ej example results

 Distance Clusters Joined

     1.000 B D
     2.000 A C
     6.500 A E
    14.125 A B

 Allocation to  2 clusters
 Clusters found at distance  6.500

 Object Cluster

     A  1
     B  2
     C  1
     D  2
     E  1
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015