Module Summary

Interfaces for the NAG Mark 29.2 mv Chapter.

mv - Multivariate Methods

This module is concerned with methods for studying multivariate data. A multivariate dataset consists of several variables recorded on a number of objects or individuals. Multivariate methods can be classified as those that seek to examine the relationships between the variables (e.g., principal components), known as variable-directed methods, and those that seek to examine the relationships between the objects (e.g., cluster analysis), known as individual-directed methods.
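The distinction between the two families can be illustrated with a small NumPy sketch. It does not use any routine from this module; the data matrix `x` is made up for illustration, and the choice of correlation and Euclidean distance as the two summaries is an assumption made only to show which axis of the data each family works on.

```python
import numpy as np

# Made-up data matrix: 4 objects (rows) measured on 3 variables (columns).
x = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.9],
    [4.0, 8.2, 0.7],
])

# Variable-directed view: correlations between the columns (3 x 3).
var_corr = np.corrcoef(x, rowvar=False)

# Individual-directed view: Euclidean distances between the rows (4 x 4).
diff = x[:, None, :] - x[None, :, :]
obj_dist = np.sqrt((diff ** 2).sum(axis=-1))

print(var_corr.shape)
print(obj_dist.shape)
```

A variable-directed method such as principal components starts from a summary like `var_corr`, while an individual-directed method such as cluster analysis starts from a summary like `obj_dist`.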

Multiple regression is not included in this module as it involves the relationship of a single variable, known as the response variable, to the other variables in the dataset, the explanatory variables. Routines for multiple regression are provided in submodule correg.

See Also:

This subpackage contains examples for the mv module. See also the Examples subsection.

Functionality Index

Canonical correlation analysis: canon_corr()

Canonical variate analysis: canon_var()

Cluster Analysis

compute distance matrix: distance_mat()

construct clusters following cluster_hier(): cluster_hier_indicator()

construct dendrogram following cluster_hier(): cluster_hier_dendrogram()

Gaussian mixture model: gaussian_mixture()

hierarchical: cluster_hier()

K-means: cluster_kmeans()

Discriminant Analysis

allocation of observations to groups, following discrim(): discrim_group()

Mahalanobis squared distances, following discrim(): discrim_mahal()

test for equality of within-group covariance matrices: discrim()

Factor Analysis

factor score coefficients, following factor(): factor_score()

maximum likelihood estimates of parameters: factor()

Principal component analysis: prin_comp()

Rotations

orthogonal rotations for loading matrix: rot_orthomax()

Procrustes rotations: rot_procrustes()

ProMax rotations: rot_promax()

Scaling Methods

multidimensional scaling: multidimscal_ordinal()

principal coordinate analysis: multidimscal_metric()

Standardize values of a data matrix: z_scores()

For full information please refer to the NAG Library document


Example for

K-means cluster analysis.

>>> main()
Python Example Results.
K-means cluster analysis.
The cluster to which each point belongs:
[1, 1, 3, 2, 3, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 3]
The number of points in each cluster:
[6, 3, 11]
The within-cluster sum of weights of each cluster:
[6.00, 3.00, 11.00]
The within-cluster sum of squares of each cluster:
[46.57, 20.38, 468.90]
The final cluster centres:
  8.1183e+01, 1.1667e+01, 7.1500e+00, 2.0500e+00, 6.6000e+00
  4.7867e+01, 3.5800e+01, 1.6333e+01, 2.4000e+00, 6.7333e+00
  6.4045e+01, 2.5209e+01, 1.0745e+01, 2.8364e+00, 6.6545e+00
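The quantities reported above can be related to one another with a short NumPy sketch. This is not the module's implementation and does not call cluster_kmeans(); the helper `cluster_summary` and the demo data are hypothetical, and unit weights are assumed, in which case the sum of weights in a cluster equals its number of points (as in the output above).

```python
import numpy as np

def cluster_summary(x, labels, k):
    """Return per-cluster sizes, centres and within-cluster sums of squares.

    Hypothetical helper: labels are 1-based, as in the example output,
    and every observation is assumed to have weight 1.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    sizes = np.zeros(k, dtype=int)
    centres = np.zeros((k, x.shape[1]))
    wss = np.zeros(k)
    for c in range(1, k + 1):
        members = x[labels == c]
        sizes[c - 1] = len(members)
        if len(members):
            # Centre is the mean of the cluster's points.
            centres[c - 1] = members.mean(axis=0)
            # Sum of squared deviations from that centre.
            wss[c - 1] = ((members - centres[c - 1]) ** 2).sum()
    return sizes, centres, wss

# Made-up data, not the dataset used in the example above.
x_demo = np.array([[0.0, 0.0], [2.0, 0.0],
                   [10.0, 10.0], [10.0, 12.0], [10.0, 14.0]])
labels_demo = [1, 1, 2, 2, 2]

sizes, centres, wss = cluster_summary(x_demo, labels_demo, k=2)
print(sizes)    # points per cluster
print(centres)  # one row per cluster centre
print(wss)      # within-cluster sum of squares per cluster
```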

Example for

Discriminant analysis.

>>> main()
Python Example Results.
Discriminant analysis for diagnosis of Cushing's syndrome.
   Obs       Posterior        Allocated     Atypicality
             probabilities    to group      index
     1      0.094 0.905 0.002     2      0.596 0.254 0.975
     2      0.005 0.168 0.827     3      0.952 0.836 0.018
     3      0.019 0.920 0.062     2      0.954 0.797 0.912
     4      0.697 0.303 0.000     1      0.207 0.860 0.993
     5      0.317 0.013 0.670     3      0.991 1.000 0.984
     6      0.032 0.366 0.601     3      0.981 0.978 0.887
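In the table above, each observation is allocated to the group with the largest posterior probability. The following sketch reproduces the "Allocated to group" column from the printed posterior probabilities using NumPy only; it does not call discrim_group(), and the variable names are illustrative.

```python
import numpy as np

# Posterior probabilities copied from the example output above
# (one row per observation, one column per group).
posterior = np.array([
    [0.094, 0.905, 0.002],
    [0.005, 0.168, 0.827],
    [0.019, 0.920, 0.062],
    [0.697, 0.303, 0.000],
    [0.317, 0.013, 0.670],
    [0.032, 0.366, 0.601],
])

# Allocate each observation to the most probable group (1-based numbering).
allocated = posterior.argmax(axis=1) + 1
print(allocated.tolist())  # [2, 3, 2, 1, 3, 3]
```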

Example for

Fits a mixture of Gaussians for a given (co)variance structure.

>>> main()
Python Example Results.
Fits a Gaussian mixture model with pooled covariance structure to
New Haven schools test data.
The final membership probabilities are:
  9.50176891e-01, 4.98231095e-02
  3.32590884e-06, 9.99996674e-01
  9.99613355e-01, 3.86644659e-04
  9.99920087e-01, 7.99127116e-05
  3.89990173e-02, 9.61000983e-01
  9.32704894e-01, 6.72951064e-02
  9.88809712e-01, 1.11902877e-02
  4.12521422e-03, 9.95874786e-01
  9.72521486e-01, 2.74785140e-02
  9.99691952e-01, 3.08048285e-04
  2.17221867e-01, 7.82778133e-01
  7.69380852e-01, 2.30619148e-01
  9.99973063e-01, 2.69370297e-05
  6.11334389e-03, 9.93886656e-01
  4.41893305e-02, 9.55810670e-01
  3.50057883e-04, 9.99649942e-01
  9.99902971e-01, 9.70286734e-05
  4.02698414e-05, 9.99959730e-01
  9.73798317e-01, 2.62016830e-02
  3.02036785e-04, 9.99697963e-01
  6.94705604e-02, 9.30529440e-01
  4.16030240e-03, 9.95839698e-01
  3.08391490e-02, 9.69160851e-01
  9.91157909e-01, 8.84209120e-03
  4.15339034e-04, 9.99584661e-01
The log-likelihood is -29.683060270297.
There were 14 function evaluations required.

Example for

Perform a principal component analysis on a data matrix; both the principal component loadings and the principal component scores are returned.

>>> main()
Python Example Results.
Perform an unweighted principal component analysis on a dataset
from Cooley and Lohnes (1971). The statistics of the principal
component analysis are:
 Eigenvalues  Percentage  Cumulative       Chisq          DF         Sig
               variation   variation
      8.2739,     0.6515,     0.6515,     8.6127,     5.0000,     0.1255
      3.6761,     0.2895,     0.9410,     4.1183,     2.0000,     0.1276
      0.7499,     0.0590,     1.0000,     0.0000,     0.0000,     0.0000
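The "Percentage variation" and "Cumulative variation" columns follow directly from the eigenvalues: each eigenvalue is divided by the sum of all eigenvalues, and the results are accumulated. A minimal NumPy check using the eigenvalues printed above (it does not call prin_comp(), and the variable names are illustrative):

```python
import numpy as np

# Eigenvalues copied from the example output above.
eigenvalues = np.array([8.2739, 3.6761, 0.7499])

# Proportion of total variation explained by each component.
proportion = eigenvalues / eigenvalues.sum()

# Running total of the proportions.
cumulative = np.cumsum(proportion)

for ev, p, c in zip(eigenvalues, proportion, cumulative):
    print(f"{ev:10.4f} {p:10.4f} {c:10.4f}")
```

Rounded to four decimal places, the proportions are 0.6515, 0.2895 and 0.0590, matching the table.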