naginterfaces.library.mv.cluster_kmeans
- naginterfaces.library.mv.cluster_kmeans(x, isx, cmeans, wt=None, maxit=10)
cluster_kmeans performs $K$-means cluster analysis.
For full information please refer to the NAG Library document for g03ef:
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g03/g03eff.html
- Parameters
- x : float, array-like, shape (n, m)
x[i-1, j-1] must contain the value of the $j$th variable for the $i$th object, for $i = 1, 2, \ldots, n$, for $j = 1, 2, \ldots, m$.
- isx : int, array-like, shape (m)
isx[j-1] indicates whether or not the $j$th variable is to be included in the analysis. If isx[j-1] > 0, the variable contained in the $j$th column of x is included, for $j = 1, 2, \ldots, m$.
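- cmeans : float, array-like, shape (k, nvar)
cmeans[i-1, j-1] must contain the value of the $j$th included variable for the $i$th initial cluster centre, for $i = 1, 2, \ldots, k$, for $j = 1, 2, \ldots, \mathrm{nvar}$, where nvar is the number of variables selected by isx.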
- wt : None or float, array-like, shape (n), optional
If wt is not None, the first $n$ elements of wt must contain the weights to be used.
If wt[i-1] = 0.0, the $i$th observation is not included in the analysis.
The effective number of observations is the sum of the weights.
If wt is None, wt is not referenced and the effective number of observations is $n$.
- maxit : int, optional
The maximum number of iterations allowed in the analysis.
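The input conventions above can be checked before calling the routine. The following is a minimal pure-Python sketch; prepare_inputs is a hypothetical helper, not part of naginterfaces, and it only mirrors the documented rules: columns with isx > 0 are used, their count must equal the number of columns of cmeans, and the effective number of observations is the sum of the weights (or $n$ when wt is None). It also rejects weights with fewer than two positive values, mirroring one of the library's error conditions.

```python
def prepare_inputs(x, isx, cmeans, wt=None):
    """Hypothetical pre-check mirroring the documented cluster_kmeans inputs.

    Returns the columns of x selected by isx and the effective number of
    observations. This is not part of naginterfaces.
    """
    n, m = len(x), len(x[0])
    if len(isx) != m:
        raise ValueError("isx must have one entry per column of x")
    # Column j takes part in the analysis when isx[j] > 0.
    cols = [j for j in range(m) if isx[j] > 0]
    nvar = len(cmeans[0])
    # Exactly nvar elements of isx must be positive.
    if len(cols) != nvar:
        raise ValueError(f"{len(cols)} values of isx > 0 but cmeans has {nvar} columns")
    selected = [[row[j] for j in cols] for row in x]
    if wt is None:
        # Unweighted: the effective number of observations is n.
        n_eff = float(n)
    else:
        w = [float(v) for v in wt[:n]]
        # At least two observations must carry a positive weight.
        if sum(1 for v in w if v > 0.0) < 2:
            raise ValueError("wt has less than two positive values")
        # Weighted: the effective number of observations is the sum of the weights.
        n_eff = sum(w)
    return selected, n_eff


x = [[1.0, 10.0, 0.5], [2.0, 20.0, 0.7], [3.0, 30.0, 0.9]]
sel, n_eff = prepare_inputs(x, [1, 0, 1], [[1.0, 0.5], [3.0, 0.9]], wt=[1.0, 0.0, 2.5])
print(sel, n_eff)  # [[1.0, 0.5], [2.0, 0.7], [3.0, 0.9]] 3.5
```

Here the second column is excluded by isx, and the zero-weighted second object contributes nothing to the effective number of observations.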
- Returns
- cmeans : float, ndarray, shape (k, nvar)
cmeans[i-1, j-1] contains the value of the $j$th variable for the $i$th computed cluster centre, for $i = 1, 2, \ldots, k$, for $j = 1, 2, \ldots, \mathrm{nvar}$.
- inc : int, ndarray, shape (n)
inc[i-1] contains the cluster to which the $i$th object has been allocated, for $i = 1, 2, \ldots, n$.
- nic : int, ndarray, shape (k)
nic[i-1] contains the number of objects in the $i$th cluster, for $i = 1, 2, \ldots, k$.
- css : float, ndarray, shape (k)
css[i-1] contains the within-cluster (weighted) sum of squares of the $i$th cluster, for $i = 1, 2, \ldots, k$.
- csw : float, ndarray, shape (k)
csw[i-1] contains the within-cluster sum of weights of the $i$th cluster, for $i = 1, 2, \ldots, k$. If wt is None, the sum of weights is the number of objects in the cluster.
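The outputs nic, css and csw summarize the final allocation inc. As a hedged illustration, they can be recomputed from inc, the computed centres and the weights. cluster_summary is a hypothetical helper; the sketch assumes x holds only the included variables and that every object has a positive weight, since how zero-weight objects are reported in inc is not described here.

```python
def cluster_summary(x, inc, centres, wt=None):
    """Hypothetical helper: recompute (nic, css, csw) for a given allocation.

    x: n rows containing only the included variables
    inc: 1-based cluster number for each row, as returned by cluster_kmeans
    centres: k rows of cluster centres
    wt: optional weights; None means every object has weight 1
    """
    k = len(centres)
    nic = [0] * k
    css = [0.0] * k
    csw = [0.0] * k
    for i, row in enumerate(x):
        c = inc[i] - 1  # convert the 1-based cluster number to a list index
        w = 1.0 if wt is None else wt[i]
        nic[c] += 1
        csw[c] += w
        # Weighted squared distance from the object to its cluster centre.
        css[c] += w * sum((a - b) ** 2 for a, b in zip(row, centres[c]))
    return nic, css, csw


x = [[0.0], [2.0], [10.0], [12.0]]
print(cluster_summary(x, [1, 1, 2, 2], [[1.0], [11.0]]))
# ([2, 2], [2.0, 2.0], [2.0, 2.0])
```

In the unweighted call, csw equals nic, matching the documented behaviour when wt is None.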
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, and .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, wt has less than two positive values.
- (errno )
On entry, and .
Constraint: .
- (errno )
On entry, the number of positive values in isx does not equal nvar, the number of columns of cmeans.
Constraint: exactly nvar elements of isx must be > 0.
- (errno )
At least one cluster is empty after the initial assignment.
- (errno )
Convergence has not been achieved within the maximum number of iterations, maxit.
- Notes
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.
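Given $n$ objects with $p$ variables measured on each object, $x_{ij}$, for $i = 1, 2, \ldots, n$, for $j = 1, 2, \ldots, p$, cluster_kmeans allocates each object to one of $K$ groups or clusters to minimize the within-cluster sum of squares:

$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{kj}\right)^2,$$

where $S_k$ is the set of objects in the $k$th cluster and $\bar{x}_{kj}$ is the mean for variable $j$ over cluster $k$. This is often known as $K$-means clustering.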
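A minimal sketch of the objective above, assuming 0-based cluster labels (unlike the 1-based inc output). within_cluster_ss is a hypothetical name: it first computes the cluster means $\bar{x}_{kj}$ from the partition and then sums the squared deviations over clusters, objects and variables.

```python
def within_cluster_ss(x, labels, k):
    """Hypothetical helper: sum over k, i in S_k and j of (x_ij - xbar_kj)^2.

    labels[i] is the 0-based cluster of object i.
    """
    p = len(x[0])
    sums = [[0.0] * p for _ in range(k)]
    counts = [0] * k
    for row, c in zip(x, labels):
        counts[c] += 1
        for j in range(p):
            sums[c][j] += row[j]
    # Cluster means xbar_kj; an empty cluster gets a dummy mean and contributes nothing.
    means = [[s / counts[c] if counts[c] else 0.0 for s in sums[c]] for c in range(k)]
    return sum((row[j] - means[c][j]) ** 2 for row, c in zip(x, labels) for j in range(p))


x = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
print(within_cluster_ss(x, [0, 0, 1, 1], 2))  # 4.0
print(within_cluster_ss(x, [0, 1, 0, 1], 2))  # 200.0
```

The partition that keeps the two nearby pairs together has a much smaller within-cluster sum of squares, which is what the criterion rewards.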
In addition to the data matrix, a $K \times p$ matrix giving the initial cluster centres for the $K$ clusters is required. The objects are then initially allocated to the cluster with the nearest cluster mean. Starting from this allocation, the procedure iteratively searches for the $K$-partition with locally optimal within-cluster sum of squares by moving points from one cluster to another.
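The allocate-then-transfer procedure can be sketched in pure Python. This is a simplified, unweighted illustration, not the AS 136 implementation used by the library, and kmeans_transfer is a hypothetical name. It performs the nearest-centre start, replaces the centres by the initial cluster means, and then moves single objects using the transfer test on which Hartigan and Wong's algorithm is built: moving object $i$ from cluster $A$ to cluster $B$ lowers the sum of squares exactly when $\frac{n_B}{n_B+1}\lVert x_i - \bar{x}_B\rVert^2 < \frac{n_A}{n_A-1}\lVert x_i - \bar{x}_A\rVert^2$. The optimal-transfer and quick-transfer stages and live sets of AS 136 are omitted.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_transfer(x, centres, maxit=10):
    """Hypothetical sketch: nearest-centre start, then single-object transfers."""
    n, k, p = len(x), len(centres), len(x[0])
    # Initial allocation: each object goes to the nearest initial centre.
    labels = [min(range(k), key=lambda c: sqdist(row, centres[c])) for row in x]
    counts = [labels.count(c) for c in range(k)]
    if min(counts) == 0:
        raise ValueError("at least one cluster is empty after the initial assignment")
    # Replace the initial centres by the means of the initial clusters.
    means = [[sum(x[i][j] for i in range(n) if labels[i] == c) / counts[c]
              for j in range(p)] for c in range(k)]
    for _ in range(maxit):
        moved = False
        for i, row in enumerate(x):
            a = labels[i]
            if counts[a] == 1:
                continue  # never empty a cluster
            # Decrease in the sum of squares from removing object i from cluster a.
            best, best_add = a, counts[a] / (counts[a] - 1) * sqdist(row, means[a])
            for b in range(k):
                if b != a:
                    # Increase in the sum of squares from adding object i to cluster b.
                    add = counts[b] / (counts[b] + 1) * sqdist(row, means[b])
                    if add < best_add:
                        best, best_add = b, add
            if best != a:
                na, nb = counts[a], counts[best]
                # Update the two affected means incrementally.
                means[a] = [(m * na - v) / (na - 1) for m, v in zip(means[a], row)]
                means[best] = [(m * nb + v) / (nb + 1) for m, v in zip(means[best], row)]
                counts[a] -= 1
                counts[best] += 1
                labels[i] = best
                moved = True
        if not moved:
            return labels, means
    raise RuntimeError("convergence not achieved within maxit iterations")


labels, means = kmeans_transfer([[0.0], [1.0], [2.0], [3.0]], [[0.0], [1.9]])
print(labels, means)  # [0, 0, 1, 1] [[0.5], [2.5]]
```

In this example the nearest-centre start gives clusters {0} and {1, 2, 3}; one transfer then moves the object at 1.0 into the first cluster. The two exceptions loosely mirror the empty-cluster and non-convergence errors listed under Raises.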
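Optionally, weights for each object, $w_i$, can be used so that the clustering is based on within-cluster weighted sums of squares:

$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} w_i \left(x_{ij} - \tilde{x}_{kj}\right)^2,$$

where $\tilde{x}_{kj}$ is the weighted mean for variable $j$ over cluster $k$.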
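The weighted mean here is $\tilde{x}_{kj} = \sum_{i \in S_k} w_i x_{ij} \big/ \sum_{i \in S_k} w_i$. The sketch below, with the hypothetical name weighted_within_cluster_ss and 0-based labels, evaluates the weighted criterion. With integer weights it equals the unweighted sum of squares of data in which object $i$ is repeated $w_i$ times, which is one way to read what a weight means.

```python
def weighted_within_cluster_ss(x, labels, k, w):
    """Hypothetical helper: weighted within-cluster sum of squares.

    labels[i] is the 0-based cluster of object i and w[i] its weight.
    """
    p = len(x[0])
    wsum = [0.0] * k
    wx = [[0.0] * p for _ in range(k)]
    for row, c, wi in zip(x, labels, w):
        wsum[c] += wi
        for j in range(p):
            wx[c][j] += wi * row[j]
    # Weighted means x~_kj = sum(w_i * x_ij) / sum(w_i) over the objects in cluster k.
    means = [[v / wsum[c] if wsum[c] > 0 else 0.0 for v in wx[c]] for c in range(k)]
    return sum(wi * (row[j] - means[c][j]) ** 2
               for row, c, wi in zip(x, labels, w) for j in range(p))


# Weight 2 on the object at 0.0 behaves like listing that object twice.
print(weighted_within_cluster_ss([[0.0], [3.0], [10.0]], [0, 0, 1], 2, [2.0, 1.0, 5.0]))  # 6.0
```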
The function is based on the algorithm of Hartigan and Wong (1979).
- References
Everitt, B S, 1974, Cluster Analysis, Heinemann
Hartigan, J A and Wong, M A, 1979, Algorithm AS 136: A K-means clustering algorithm, Appl. Statist. (28), 100–108
Kendall, M G and Stuart, A, 1976, The Advanced Theory of Statistics (Volume 3), (3rd Edition), Griffin
Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press