Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: weight was removed from the interface; wt was made optional
At Mark 22: n and k were made optional
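Given $n$ objects with $p$ variables measured on each object, $x_{ij}$, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$, nag_mv_cluster_kmeans (g03ef) allocates each object to one of $K$ groups or clusters to minimize the within-cluster sum of squares:
$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} \left( x_{ij} - \bar{x}_{kj} \right)^2,$$
where $S_k$ is the set of objects in the $k$th cluster and $\bar{x}_{kj}$ is the mean for the variable $j$ over cluster $k$. This is often known as $K$-means clustering.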
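The within-cluster sum of squares described above can be evaluated directly for any given allocation of objects to clusters. The following is a minimal sketch in plain MATLAB, not a call to g03ef; the toy data, the allocation vector, and the variable names (wss, total) are assumptions chosen purely for illustration.

```matlab
% Toy data: 4 objects (rows), 2 variables (columns). Illustrative values only.
x   = [1 2; 3 4; 10 10; 12 14];
inc = [1; 1; 2; 2];           % hypothetical allocation: cluster index of each object
K   = max(inc);               % number of clusters

wss = zeros(K, 1);            % within-cluster sum of squares, one entry per cluster
for k = 1:K
    members = x(inc == k, :);                    % objects allocated to cluster k
    centre  = mean(members, 1);                  % mean of each variable over cluster k
    dev     = members - repmat(centre, size(members, 1), 1);
    wss(k)  = sum(dev(:).^2);                    % summed over objects and variables
end
total = sum(wss);             % the quantity that K-means clustering seeks to minimize
```

For this allocation the cluster means are (2, 3) and (11, 12), giving wss = [4; 10] and total = 14. Any other allocation of these four points into two clusters gives a larger total.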
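Optionally, weights for each object, $w_i$, can be used so that the clustering is based on within-cluster weighted sums of squares:
$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} w_i \left( x_{ij} - \tilde{x}_{kj} \right)^2,$$
where $\tilde{x}_{kj}$ is the weighted mean for variable $j$ over cluster $k$.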
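The weighted criterion can be sketched in the same way. The example below is plain MATLAB with made-up data and weights. The names wss_w and sw are illustrative only, and nothing here is meant to reproduce the internals of g03ef. It computes the weighted mean of each variable within a cluster, then the weighted sum of squared deviations from that mean.

```matlab
% Toy data: 4 objects (rows), 2 variables (columns). Illustrative values only.
x   = [1 2; 3 4; 10 10; 12 14];
wt  = [1; 3; 2; 2];           % hypothetical weight for each object
inc = [1; 1; 2; 2];           % hypothetical allocation: cluster index of each object
K   = max(inc);               % number of clusters

wss_w = zeros(K, 1);          % within-cluster weighted sum of squares
sw    = zeros(K, 1);          % sum of weights in each cluster
for k = 1:K
    idx     = (inc == k);
    w       = wt(idx);                           % weights of the objects in cluster k
    members = x(idx, :);                         % objects allocated to cluster k
    sw(k)   = sum(w);
    centre  = (w' * members) / sw(k);            % weighted mean of each variable
    dev     = members - repmat(centre, size(members, 1), 1);
    wss_w(k) = w' * sum(dev.^2, 2);              % each object's squared distance times its weight
end
```

With these values the weighted mean of the first cluster is (2.5, 3.5) rather than (2, 3), because the second object carries three times the weight of the first. The results are wss_w = [6; 20] and sw = [4; 4].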
Hartigan J A and Wong M A (1979) Algorithm AS 136: A K-means clustering algorithm Appl. Statist. 28 100–108
The data consists of observations of five variables on twenty soils (see Hartigan and Wong (1979)). The data is read in, the $K$-means clustering performed and the results printed.
function g03ef_example
fprintf('g03ef example results\n\n');
x = [77.3, 13.0, 9.7, 1.5, 6.4;
82.5, 10.0, 7.5, 1.5, 6.5;
66.9, 20.6, 12.5, 2.3, 7.0;
47.2, 33.8, 19.0, 2.8, 5.8;
65.3, 20.5, 14.2, 1.9, 6.9;
83.3, 10.0, 6.7, 2.2, 7.0;
81.6, 12.7, 5.7, 2.9, 6.7;
47.8, 36.5, 15.7, 2.3, 7.2;
48.6, 37.1, 14.3, 2.1, 7.2;
61.6, 25.5, 12.9, 1.9, 7.3;
58.6, 26.5, 14.9, 2.4, 6.7;
69.3, 22.3, 8.4, 4.0, 7.0;
61.8, 30.8, 7.4, 2.7, 6.4;
67.7, 25.3, 7.0, 4.8, 7.3;
57.2, 31.2, 11.6, 2.4, 6.5;
67.2, 22.7, 10.1, 3.3, 6.2;
59.2, 31.2, 9.6, 2.4, 6.0;
80.2, 13.2, 6.6, 2.0, 5.8;
82.2, 11.1, 6.7, 2.2, 7.2;
69.7, 20.7, 9.6, 3.1, 5.9];
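% n is the number of variables (columns of x); include all of them in the analysis
[m,n] = size(x);
isx = ones(n,1,'int64');
% Initial cluster centres, one row per cluster (three clusters)
cmeans = [82.5, 10.0, 7.5, 1.5, 6.5;
          47.8, 36.5, 15.7, 2.3, 7.2;
          67.2, 22.7, 10.1, 3.3, 6.2];
% Perform the K-means clustering
[cmeans, inc, nic, css, csw, ifail] = ...
  g03ef(x, isx, cmeans);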
disp(' The cluster each point belongs to');
fprintf('%6d%6d%6d%6d%6d%6d%6d%6d%6d%6d\n',inc);
disp(' The number of points in each cluster');
disp(nic');
disp(' The within-cluster sum of weights of each cluster');
disp(csw');
disp(' The within-cluster sum of squares of each cluster');
disp(css')
% Print the final cluster centres as a general matrix
mtitle = 'The final cluster centres';
matrix = 'General';
diag = ' ';
[ifail] = x04ca( ...
  matrix, diag, cmeans, mtitle);