naginterfaces.library.mv.canon_var

naginterfaces.library.mv.canon_var(weight, x, isx, ing, ng, wt, tol)

canon_var performs a canonical variate (canonical discrimination) analysis.

For full information please refer to the NAG Library document for g03ac

https://support.nag.com/numeric/nl/nagdoc_31/flhtml/g03/g03acf.html

Parameters

weight : str, length 1

Indicates if weights are to be used.

$\mathrm{weight} = \text{'U'}$

No weights are used.

$\mathrm{weight} = \text{'W'}$ or $\text{'V'}$

Weights are used and must be supplied in $\mathrm{wt}$.

If $\mathrm{weight} = \text{'W'}$, the weights are treated as frequencies and the effective number of observations is the sum of the weights.

If $\mathrm{weight} = \text{'V'}$, the weights are treated as being inversely proportional to the variance of the observations and the effective number of observations is the number of observations with nonzero weights.
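The three weighting modes differ only in how the effective number of observations is counted. The following is a minimal sketch of that rule in plain Python; the helper name `effective_n_obs` is hypothetical and is not part of naginterfaces.

```python
def effective_n_obs(weight, wt, n):
    """Effective number of observations for each weighting mode.

    Hypothetical helper that mirrors the description of the weight
    argument; it does not call the NAG Library.
    """
    if weight == "U":
        # Unweighted: every one of the n observations counts once.
        return float(n)
    if weight == "W":
        # Frequencies: the effective count is the sum of the weights.
        return float(sum(wt[:n]))
    if weight == "V":
        # Inverse-variance weights: count the observations with nonzero weight.
        return float(sum(1 for w in wt[:n] if w != 0.0))
    raise ValueError("weight must be 'U', 'W' or 'V'")


wt = [2.0, 0.0, 1.0, 3.0]
for mode in ("U", "W", "V"):
    print(mode, effective_n_obs(mode, wt, 4))
```

With these illustrative weights the second observation is dropped in both weighted modes, but only the frequency mode lets the remaining weights change the count.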

x : float, array-like, shape $(n, m)$

$x[i-1, j-1]$ must contain the $i$th observation for the $j$th variable, for $j = 1, 2, \dots, m$, for $i = 1, 2, \dots, n$.

isx : int, array-like, shape $(m)$

$\mathrm{isx}[j-1]$ indicates whether or not the $j$th variable is to be included in the analysis.

If $\mathrm{isx}[j-1] > 0$, the variable contained in the $j$th column of $x$ is included in the canonical variate analysis, for $j = 1, 2, \dots, m$.

ing : int, array-like, shape $(n)$

$\mathrm{ing}[i-1]$ indicates which group the $i$th observation is in, for $i = 1, 2, \dots, n$. The effective number of groups is the number of groups with nonzero membership.

ng : int

The number of groups, $n_g$.
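As an illustration of how group labels relate to the number of groups, the sketch below tallies group membership from 1-based labels and counts the groups with nonzero membership. Both helper names are hypothetical, and the sketch ignores weights, so it only covers the unweighted case.

```python
def group_counts(ing, ng):
    """Count observations per group from 1-based group labels (unweighted)."""
    counts = [0] * ng
    for i, g in enumerate(ing):
        if not 1 <= g <= ng:
            raise ValueError(f"ing[{i}] = {g} is outside 1..{ng}")
        counts[g - 1] += 1
    return counts


def effective_groups(counts):
    """Number of groups that have at least one member."""
    return sum(1 for c in counts if c > 0)


ing = [1, 1, 3, 3, 3, 1]
counts = group_counts(ing, 3)
print(counts, effective_groups(counts))  # [3, 0, 3] 2
```

Here three groups are declared but group 2 is empty, so the effective number of groups is 2.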

wt : float, array-like, shape $(:)$

Note: the required length for this argument is determined as follows: if $\mathrm{weight} \in (\text{'W'}, \text{'V'})$: $n$; otherwise: $1$.

If $\mathrm{weight} = \text{'W'}$ or $\text{'V'}$, the first $n$ elements of $\mathrm{wt}$ must contain the weights to be used in the analysis.

If $\mathrm{wt}[i-1] = 0.0$, the $i$th observation is not included in the analysis.

If $\mathrm{weight} = \text{'U'}$, $\mathrm{wt}$ is not referenced.

tol : float

The value of $\mathrm{tol}$ is used to decide whether the variables are of full rank and, if not, what the rank of the variables is. The smaller the value of $\mathrm{tol}$, the stricter the criterion for selecting the singular value decomposition. If a non-negative value of $\mathrm{tol}$ less than machine precision is entered, the square root of machine precision is used instead.
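The fallback rule for very small tolerances can be sketched as follows. The helper `effective_tol` is hypothetical, and it assumes that machine precision can be approximated by `sys.float_info.epsilon`; the NAG Library's own definition of machine precision may differ by a small constant factor.

```python
import math
import sys


def effective_tol(tol, eps=sys.float_info.epsilon):
    """Tolerance actually applied, following the documented fallback rule.

    Assumption: eps stands in for the NAG machine precision.
    """
    if tol < 0.0:
        raise ValueError("tol must be >= 0.0")
    # Values below machine precision are replaced by its square root.
    return math.sqrt(eps) if tol < eps else tol


print(effective_tol(0.0))
print(effective_tol(1e-6))
```

Passing `0.0` is therefore a convenient way to request the default tolerance.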

Returns

nig : int, ndarray, shape $(ng)$

$\mathrm{nig}[j-1]$ gives the number of observations in group $j$, for $j = 1, 2, \dots, n_g$.

cvm : float, ndarray, shape $(ng, nx)$

$\mathrm{cvm}[i-1, j-1]$ contains the mean of the $j$th canonical variate for the $i$th group, for $j = 1, 2, \dots, l$, for $i = 1, 2, \dots, n_g$; the remaining columns, if any, are used as workspace.

e : float, ndarray, shape $(\min(nx, ng - 1), 6)$

The statistics of the canonical variate analysis.

$e[i-1, 0]$

The canonical correlations, $δ_i$, for $i = 1, 2, \dots, l$.

$e[i-1, 1]$

The eigenvalues of the within-group sums of squares matrix, $λ_i^2$, for $i = 1, 2, \dots, l$.

$e[i-1, 2]$

The proportion of variation explained by the $i$th canonical variate, for $i = 1, 2, \dots, l$.

$e[i-1, 3]$

The $χ^2$ statistic for the $i$th canonical variate, for $i = 1, 2, \dots, l$.

$e[i-1, 4]$

The degrees of freedom for the $χ^2$ statistic for the $i$th canonical variate, for $i = 1, 2, \dots, l$.

$e[i-1, 5]$

The significance level for the $χ^2$ statistic for the $i$th canonical variate, for $i = 1, 2, \dots, l$.

ncv : int

The number of canonical variates, $l$. This will be the minimum of $n_g - 1$ and the rank of $x$.

cvx : float, ndarray, shape $(nx, ng - 1)$

The canonical variate loadings. $\mathrm{cvx}[i-1, j-1]$ contains the loading coefficient for the $i$th variable on the $j$th canonical variate, for $j = 1, 2, \dots, l$, for $i = 1, 2, \dots, n_x$; the remaining columns, if any, are used as workspace.

irankx : int

The rank of the dependent variables.

If the variables are of full rank then $\mathrm{irankx} = nx$.

If the variables are not of full rank then $\mathrm{irankx}$ is an estimate of the rank of the dependent variables. $\mathrm{irankx}$ is calculated as the number of singular values greater than $\mathrm{tol} \times (\text{largest singular value})$.
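The rank rule stated above can be written directly as a count over singular values. This is a minimal sketch; the helper name `estimate_rank` and the sample singular values are illustrative only.

```python
def estimate_rank(singular_values, tol):
    """Count the singular values greater than tol * (largest singular value)."""
    if not singular_values:
        return 0
    threshold = tol * max(singular_values)
    return sum(1 for s in singular_values if s > threshold)


# The third singular value is negligible relative to the largest one.
print(estimate_rank([5.0, 2.0, 1e-12], 1e-8))
```

Because the threshold is relative to the largest singular value, rescaling the data does not change the estimated rank.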

Raises

NagValueError

(errno $1$)

On entry, $\mathrm{tol} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{tol} \geq 0.0$.

(errno $1$)

On entry, $\mathrm{weight} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{weight} = \text{'U'}$, $\text{'W'}$ or $\text{'V'}$.

(errno $1$)

On entry, $lde = ⟨\mathit{value}⟩$ and $\min(nx, ng - 1) = ⟨\mathit{value}⟩$.

Constraint: $lde \geq \min(nx, ng - 1)$.

(errno $1$)

On entry, $n = ⟨\mathit{value}⟩$ and $nx + ng = ⟨\mathit{value}⟩$.

Constraint: $n \geq nx + ng$.

(errno $1$)

On entry, $m = ⟨\mathit{value}⟩$ and $nx = ⟨\mathit{value}⟩$.

Constraint: $m \geq nx$.

(errno $1$)

On entry, $ng = ⟨\mathit{value}⟩$.

Constraint: $ng \geq 2$.

(errno $1$)

On entry, $nx = ⟨\mathit{value}⟩$.

Constraint: $nx \geq 1$.

(errno $2$)

On entry, $i = ⟨\mathit{value}⟩$ and $\mathrm{wt}[i-1] < 0.0$.

Constraint: $\mathrm{wt}[i-1] \geq 0.0$.

(errno $3$)

On entry, $i = ⟨\mathit{value}⟩$, $\mathrm{ing}[i-1] = ⟨\mathit{value}⟩$ and $ng = ⟨\mathit{value}⟩$.

Constraint: $1 \leq \mathrm{ing}[i-1] \leq ng$.

(errno $4$)

On entry, $nx = ⟨\mathit{value}⟩$, expected $\mathit{value} = ⟨\mathit{value}⟩$.

Constraint: $nx$ must be consistent with $\mathrm{isx}$.

(errno $5$ )

The singular value decomposition has failed to converge.

(errno $7$ )

The effective number of observations is less than the effective number of groups plus the number of variables.

(errno $7$)

Fewer than $2$ groups have nonzero membership.
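Several of the constraints above can be checked in plain Python before calling the routine, which gives earlier and more readable failures. The sketch below is hypothetical and not part of naginterfaces. It assumes that $nx$ equals the number of positive entries of isx, and it covers only the errno 1 to 3 constraints that depend on the Python arguments.

```python
def check_inputs(weight, x, isx, ing, ng, wt, tol):
    """Raise ValueError if a documented constraint fails; return (n, m, nx)."""
    n = len(x)
    m = len(isx)
    # Assumption: nx is the number of variables flagged with isx[j] > 0.
    nx = sum(1 for flag in isx if flag > 0)
    if tol < 0.0:
        raise ValueError("errno 1: tol >= 0.0 required")
    if weight not in ("U", "W", "V"):
        raise ValueError("errno 1: weight must be 'U', 'W' or 'V'")
    if ng < 2:
        raise ValueError("errno 1: ng >= 2 required")
    if nx < 1:
        raise ValueError("errno 1: nx >= 1 required")
    if n < nx + ng:
        raise ValueError("errno 1: n >= nx + ng required")
    if weight in ("W", "V"):
        # wt is only referenced when weights are in use.
        for i, w in enumerate(wt[:n]):
            if w < 0.0:
                raise ValueError(f"errno 2: wt[{i}] >= 0.0 required")
    for i, g in enumerate(ing):
        if not 1 <= g <= ng:
            raise ValueError(f"errno 3: ing[{i}] must lie in 1..{ng}")
    return n, m, nx


x = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.7]]
print(check_inputs("U", x, [1, 1], [1, 1, 1, 2, 2, 2], 2, [0.0], 1e-6))
```

The checks on effective counts (errno 7) and on SVD convergence (errno 5) depend on the computation itself and are left to the routine.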

Warns

NagAlgorithmicWarning

(errno $6$ ): A canonical correlation is equal to $1.0$ .
(errno $8$ ): The rank of $x$ is $0$ .

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Let a sample of $n$ observations on $n_x$ variables in a data matrix come from $n_g$ groups with $n_1, n_2, \dots, n_{n_g}$ observations in each group, $\sum n_i = n$. Canonical variate analysis finds the linear combination of the $n_x$ variables that maximizes the ratio of between-group to within-group variation. The variables formed, the canonical variates, can then be used to discriminate between groups.

The canonical variates can be calculated from the eigenvectors of the within-group sums of squares and cross-products matrix. However, canon_var calculates the canonical variates by means of a singular value decomposition (SVD) of a matrix $V$. Let the data matrix with variable (column) means subtracted be $X$, and let its rank be $k$; then the $k \times (n_g - 1)$ matrix $V$ is given by:

$$V = Q_X^T Q_g,$$

where $Q_g$ is an $n \times (n_g - 1)$ orthogonal matrix that defines the groups and $Q_X$ is the first $k$ columns of the orthogonal matrix $Q$ (so that $Q_X^T Q_g$ is $k \times (n_g - 1)$), either from the $QR$ decomposition of $X$:

$$X = QR$$

if $X$ is of full column rank, i.e., $k = n_x$, else from the SVD of $X$:

$$X = Q D P^T.$$

Let the SVD of $V$ be:

$$V = U_x Δ U_g^T;$$

then the nonzero elements of the diagonal matrix $Δ$, $δ_i$, for $i = 1, 2, \dots, l$, are the $l$ canonical correlations associated with the $l = \min(k, n_g - 1)$ canonical variates.

The eigenvalues, $λ_i^2$, of the within-group sums of squares matrix are given by:

$$λ_i^2 = \frac{δ_i^2}{1 - δ_i^2}$$

and the value of $π_i = λ_i^2 / \sum λ_i^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $π_i$ give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
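The relationship between canonical correlations, eigenvalues and explained proportions can be checked numerically. The sketch below is illustrative only: the helper name and the sample correlations are made up, and a correlation equal to 1.0 is rejected because the formula would divide by zero.

```python
def eigenvalues_and_proportions(deltas):
    """Map canonical correlations to eigenvalues and explained proportions."""
    lam2 = []
    for d in deltas:
        if not 0.0 <= d < 1.0:
            raise ValueError("each canonical correlation must lie in [0, 1)")
        # lambda_i^2 = delta_i^2 / (1 - delta_i^2)
        lam2.append(d * d / (1.0 - d * d))
    total = sum(lam2)
    # pi_i = lambda_i^2 / sum_j lambda_j^2
    props = [v / total for v in lam2]
    return lam2, props


lam2, props = eigenvalues_and_proportions([0.8, 0.6])
print(lam2)
print(props)
```

In this made-up case the first canonical variate accounts for roughly three quarters of the variation, which would suggest that one dimension captures most of the group separation.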

To test for a significant dimensionality greater than $i$, the $χ^2$ statistic:

$$\left(n - 1 - n_g - \frac{1}{2}(k - n_g)\right) \sum_{j=i+1}^{l} \log(1 + λ_j^2)$$

can be used. This is asymptotically distributed as a $χ^2$-distribution with $(k - i)(n_g - 1 - i)$ degrees of freedom. If the test for $i = h$ is not significant, then the remaining tests for $i > h$ should be ignored.
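A direct transcription of the test statistic and its degrees of freedom is sketched below for $i = 0, 1, \dots, l - 1$. The helper name and the input values are hypothetical. The significance level is not computed, since that needs a $χ^2$ distribution function that the Python standard library does not provide.

```python
import math


def dimensionality_tests(lam2, n, ng, k):
    """Return (i, statistic, degrees of freedom) for i = 0, ..., l - 1.

    lam2 holds the eigenvalues lambda_1^2, ..., lambda_l^2 in order.
    """
    l = len(lam2)
    factor = n - 1 - ng - 0.5 * (k - ng)
    results = []
    for i in range(l):
        # Sum over j = i + 1, ..., l (1-based), i.e. lam2[i:] in 0-based terms.
        stat = factor * sum(math.log(1.0 + v) for v in lam2[i:])
        df = (k - i) * (ng - 1 - i)
        results.append((i, stat, df))
    return results


for i, stat, df in dimensionality_tests([1.0, 0.5], n=20, ng=3, k=2):
    print(i, round(stat, 3), df)
```

Each statistic drops the leading eigenvalues already accounted for, so the sequence of statistics is non-increasing in $i$.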

The loadings for the canonical variates are calculated from the matrix $U_{x}$ . This matrix is scaled so that the canonical variates have unit within-group variance.

In addition to the canonical variate loadings, the means for each canonical variate are calculated for each group.

Weights can be used with the analysis, in which case the weighted means are subtracted from each column and then each row is scaled by an amount $\sqrt{w_{i}}$ , where $w_{i}$ is the weight for the $i$ th observation (row).
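The weighted preprocessing step can be sketched in a few lines: subtract the weighted column means, then scale each row by the square root of its weight. The helper name and data are illustrative, and the sketch assumes at least one nonzero weight.

```python
import math


def weighted_center_and_scale(x, wt):
    """Subtract weighted column means, then scale row i by sqrt(wt[i])."""
    total = sum(wt)
    m = len(x[0])
    # Weighted mean of each column.
    means = [sum(w * row[j] for w, row in zip(wt, x)) / total for j in range(m)]
    scaled = [
        [math.sqrt(w) * (row[j] - means[j]) for j in range(m)]
        for w, row in zip(wt, x)
    ]
    return means, scaled


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
wt = [1.0, 1.0, 2.0]
means, scaled = weighted_center_and_scale(x, wt)
print(means)  # [3.5, 6.0]
print(scaled)
```

A row with zero weight is scaled to zero and so contributes nothing to subsequent sums of squares, which matches the rule that zero-weight observations are excluded.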

References

Chatfield, C and Collins, A J, 1980, Introduction to Multivariate Analysis, Chapman and Hall

Gnanadesikan, R, 1977, Methods for Statistical Data Analysis of Multivariate Observations, Wiley

Hammarling, S, 1985, The singular value decomposition in multivariate statistics, SIGNUM Newsl. 20(3), 2–25

Kendall, M G and Stuart, A, 1969, The Advanced Theory of Statistics (Volume 1), (3rd Edition), Griffin
