naginterfaces.library.mv.cluster_kmeans

naginterfaces.library.mv.cluster_kmeans(x, isx, cmeans, wt=None, maxit=10)

cluster_kmeans performs $K$-means cluster analysis.

For full information please refer to the NAG Library document for g03ef

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g03/g03eff.html

Parameters
x : float, array-like, shape $(n, m)$

$\mathrm{x}[i-1, j-1]$ must contain the value of the $j$th variable for the $i$th object, for $i = 1, 2, \ldots, n$, for $j = 1, 2, \ldots, m$.

isx : int, array-like, shape $(m)$

$\mathrm{isx}[j-1]$ indicates whether or not the $j$th variable is to be included in the analysis. If $\mathrm{isx}[j-1] > 0$, the variable contained in the $j$th column of $\mathrm{x}$ is included, for $j = 1, 2, \ldots, m$.

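cmeans : float, array-like, shape $(k, \mathrm{nvar})$

$\mathrm{cmeans}[i-1, j-1]$ must contain the value of the $j$th variable for the $i$th initial cluster centre, for $i = 1, 2, \ldots, K$, for $j = 1, 2, \ldots, \mathrm{nvar}$.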

wt : None or float, array-like, shape $(n)$, optional

If $\mathrm{wt}$ is not None, the first $n$ elements of $\mathrm{wt}$ must contain the weights to be used.

If $\mathrm{wt}[i-1] = 0.0$, the $i$th observation is not included in the analysis.

The effective number of observations is the sum of the weights.

If $\mathrm{wt}$ is None, $\mathrm{wt}$ is not referenced and the effective number of observations is $n$.

maxit : int, optional

The maximum number of iterations allowed in the analysis.
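The way isx selects variables can be pictured as a column filter. The sketch below is plain Python and not part of naginterfaces; `select_columns` is a hypothetical helper and the data are made up. It assumes that a positive entry of isx means "include this column" and that the number of included columns is the second dimension expected of cmeans.

```python
def select_columns(x, isx):
    """Keep only the columns of x whose isx entry is positive (assumed rule)."""
    keep = [j for j, flag in enumerate(isx) if flag > 0]
    return [[row[j] for j in keep] for row in x]


# Three objects measured on three variables (made-up values).
x = [[1.0, 10.0, 0.5],
     [2.0, 20.0, 0.7],
     [3.0, 30.0, 0.9]]
isx = [1, 0, 1]  # exclude the middle variable

x_used = select_columns(x, isx)
nvar = len(x_used[0])  # number of variables entering the analysis
```

Under these assumptions, the initial centres passed as cmeans would need `nvar` columns, one per included variable.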

Returns
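cmeans : float, ndarray, shape $(k, \mathrm{nvar})$

$\mathrm{cmeans}[i-1, j-1]$ contains the value of the $j$th variable for the $i$th computed cluster centre, for $i = 1, 2, \ldots, K$, for $j = 1, 2, \ldots, \mathrm{nvar}$.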

inc : int, ndarray, shape $(n)$

$\mathrm{inc}[i-1]$ contains the cluster to which the $i$th object has been allocated, for $i = 1, 2, \ldots, n$.

nic : int, ndarray, shape $(k)$

$\mathrm{nic}[i-1]$ contains the number of objects in the $i$th cluster, for $i = 1, 2, \ldots, K$.

css : float, ndarray, shape $(k)$

$\mathrm{css}[i-1]$ contains the within-cluster (weighted) sum of squares of the $i$th cluster, for $i = 1, 2, \ldots, K$.

csw : float, ndarray, shape $(k)$

$\mathrm{csw}[i-1]$ contains the within-cluster sum of weights of the $i$th cluster, for $i = 1, 2, \ldots, K$. If $\mathrm{wt}$ is None, the sum of weights is the number of objects in the cluster.
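A minimal calling sketch may help tie the parameters and return values together. It assumes naginterfaces and NumPy are installed and a valid NAG licence is available; the data, the starting centres and the helper name `run_kmeans_example` are made up for illustration. It also assumes the returned object exposes its fields by the names listed under Returns.

```python
# Six objects with two variables each (made-up values, two obvious groups).
DATA = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
        [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]]

# Two starting centres, one row per cluster, one column per included variable.
START = [[1.0, 1.0], [5.0, 5.0]]


def run_kmeans_example():
    """Call cluster_kmeans on DATA; hypothetical helper, not part of the library."""
    # Imported here so that the sketch can be loaded without naginterfaces.
    import numpy as np
    from naginterfaces.library import mv

    x = np.array(DATA)
    isx = [1, 1]  # include both variables
    cmeans = np.array(START)
    res = mv.cluster_kmeans(x, isx, cmeans, maxit=20)
    # Assumed: fields are accessible by the names given under Returns.
    return res.cmeans, res.inc, res.nic, res.css, res.csw
```

Calling `run_kmeans_example()` in a licensed environment should return the final centres, the allocation of each object, and the per-cluster counts, sums of squares and sums of weights.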

Raises
NagValueError
(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, and .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: or .

(errno )

On entry, has less than two positive values.

(errno )

On entry, and .

Constraint: .

(errno )

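On entry, $\mathrm{nvar} = \langle\mathit{value}\rangle$ and $\langle\mathit{value}\rangle$ values of $\mathrm{isx} > 0$.

Constraint: exactly $\mathrm{nvar}$ elements of $\mathrm{isx} > 0$.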

(errno )

At least one cluster is empty after the initial assignment.

(errno )

Convergence has not been achieved within the maximum number of iterations $\mathrm{maxit} = \langle\mathit{value}\rangle$.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

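Given $n$ objects with $p$ variables measured on each object, $x_{ij}$, for $i = 1, 2, \ldots, n$, for $j = 1, 2, \ldots, p$, cluster_kmeans allocates each object to one of $K$ groups or clusters to minimize the within-cluster sum of squares:

$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{kj}\right)^2,$$

where $S_k$ is the set of objects in the $k$th cluster and $\bar{x}_{kj}$ is the mean for the variable $j$ over cluster $k$. This is often known as $K$-means clustering.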
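To make the criterion concrete, the following plain-Python sketch evaluates the within-cluster sum of squares for a fixed allocation. It is an illustration only, not the library's implementation: `within_cluster_ss` is a hypothetical helper, the data are made up, clusters are labelled from 0 purely for convenience, and every cluster is assumed to be non-empty.

```python
def within_cluster_ss(x, labels, k):
    """Per-cluster sums of squared deviations from the cluster means."""
    p = len(x[0])
    css = []
    for c in range(k):
        members = [row for row, lab in zip(x, labels) if lab == c]
        # Mean of each variable over the members of cluster c.
        means = [sum(row[j] for row in members) / len(members) for j in range(p)]
        css.append(sum((row[j] - means[j]) ** 2
                       for row in members for j in range(p)))
    return css


x = [[1.0, 1.0], [3.0, 1.0], [10.0, 5.0], [10.0, 9.0]]
labels = [0, 0, 1, 1]  # sketch convention: clusters numbered from 0

css = within_cluster_ss(x, labels, k=2)
total = sum(css)  # the quantity the clustering tries to make small
```

Here the first cluster has mean (2, 1) and contributes 2, and the second has mean (10, 7) and contributes 8, so the total is 10.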

In addition to the data matrix, a $K \times p$ matrix giving the initial cluster centres for the $K$ clusters is required. The objects are then initially allocated to the cluster with the nearest cluster mean. Given the initial allocation, the procedure is to iteratively search for the $K$-partition with locally optimal within-cluster sum of squares by moving points from one cluster to another.
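The initial allocation step can be sketched as follows. This is only an illustration of "nearest cluster mean", assuming squared Euclidean distance and unweighted data; it does not reproduce the subsequent point-moving search, and the helper names and data are hypothetical.

```python
def nearest_centre(point, centres):
    """Index of the centre with the smallest squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centres]
    return dists.index(min(dists))


def initial_allocation(x, centres):
    """Allocate every object to its nearest starting centre."""
    return [nearest_centre(row, centres) for row in x]


x = [[1.0, 1.0], [3.0, 1.0], [10.0, 5.0], [10.0, 9.0]]
centres = [[0.0, 0.0], [9.0, 6.0]]  # made-up starting centres

labels = initial_allocation(x, centres)  # clusters numbered from 0 here
```

If a starting centre attracts no objects at this stage, the corresponding cluster is empty, which is the situation reported by the empty-cluster error listed under Raises.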

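Optionally, weights for each object, $w_i$, can be used so that the clustering is based on within-cluster weighted sums of squares:

$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} w_i \left(x_{ij} - \tilde{x}_{kj}\right)^2,$$

where $\tilde{x}_{kj}$ is the weighted mean for variable $j$ over cluster $k$.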
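The weighted criterion can be illustrated in the same spirit. The sketch below computes, for a fixed allocation, each cluster's sum of weights and weighted sum of squares about the weighted means. It is plain Python with made-up data and a hypothetical helper name, and it assumes every cluster has a positive total weight.

```python
def weighted_cluster_stats(x, w, labels, k):
    """Return (sum of weights, weighted sum of squares) for each cluster."""
    p = len(x[0])
    csw, css = [], []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        sw = sum(w[i] for i in idx)
        # Weighted mean of each variable over cluster c.
        means = [sum(w[i] * x[i][j] for i in idx) / sw for j in range(p)]
        ss = sum(w[i] * (x[i][j] - means[j]) ** 2
                 for i in idx for j in range(p))
        csw.append(sw)
        css.append(ss)
    return csw, css


x = [[0.0], [4.0], [10.0], [20.0]]
w = [1.0, 3.0, 2.0, 0.0]  # the last object has zero weight
labels = [0, 0, 1, 1]     # sketch convention: clusters numbered from 0

csw, css = weighted_cluster_stats(x, w, labels, k=2)
```

The zero-weight object changes neither the weighted mean nor the sum of squares of its cluster, which matches the statement that such observations are not included in the analysis.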

The function is based on the algorithm of Hartigan and Wong (1979).

References

Everitt, B S, 1974, Cluster Analysis, Heinemann

Hartigan, J A and Wong, M A, 1979, Algorithm AS 136: A K-means clustering algorithm, Appl. Statist. (28), 100–108

Kendall, M G and Stuart, A, 1976, The Advanced Theory of Statistics (Volume 3), (3rd Edition), Griffin

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press