NAG FL Interface
g03acf (canon_var)

1 Purpose

g03acf performs a canonical variate (canonical discrimination) analysis.

2 Specification

Fortran Interface
Subroutine g03acf ( weight, n, m, x, ldx, isx, nx, ing, ng, wt, nig, cvm, ldcvm, e, lde, ncv, cvx, ldcvx, tol, irankx, wk, iwk, ifail)
Integer, Intent (In) :: n, m, ldx, isx(m), nx, ing(n), ng, ldcvm, lde, ldcvx, iwk
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: nig(ng), ncv, irankx
Real (Kind=nag_wp), Intent (In) :: x(ldx,m), wt(*), tol
Real (Kind=nag_wp), Intent (Inout) :: cvm(ldcvm,nx), e(lde,6), cvx(ldcvx,ng-1)
Real (Kind=nag_wp), Intent (Out) :: wk(iwk)
Character (1), Intent (In) :: weight
C Header Interface
#include <nag.h>
void  g03acf_ (const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const Integer isx[], const Integer *nx, const Integer ing[], const Integer *ng, const double wt[], Integer nig[], double cvm[], const Integer *ldcvm, double e[], const Integer *lde, Integer *ncv, double cvx[], const Integer *ldcvx, const double *tol, Integer *irankx, double wk[], const Integer *iwk, Integer *ifail, const Charlen length_weight)
The routine may be called by the names g03acf or nagf_mv_canon_var.

3 Description

Let a sample of $n$ observations on $n_x$ variables in a data matrix come from $n_g$ groups with $n_1, n_2, \ldots, n_{n_g}$ observations in each group, $\sum n_i = n$. Canonical variate analysis finds the linear combination of the $n_x$ variables that maximizes the ratio of between-group to within-group variation. The variables formed, the canonical variates, can then be used to discriminate between groups.
The canonical variates can be calculated from the eigenvectors of the within-group sums of squares and cross-products matrix. However, g03acf calculates the canonical variates by means of a singular value decomposition (SVD) of a matrix $V$. Let the data matrix with variable (column) means subtracted be $X$, and let its rank be $k$; then the $k$ by $(n_g-1)$ matrix $V$ is given by:
$$V = Q_X^{\mathrm{T}} Q_g,$$
where $Q_g$ is an $n$ by $(n_g-1)$ orthogonal matrix that defines the groups and $Q_X$ is the first $k$ columns of the orthogonal matrix $Q$ either from the $QR$ decomposition of $X$:
$$X = QR$$
if $X$ is of full column rank, i.e., $k = n_x$, else from the SVD of $X$:
$$X = Q D P^{\mathrm{T}}.$$
Let the SVD of $V$ be:
$$V = U_x \Delta U_g^{\mathrm{T}};$$
then the nonzero elements of the diagonal matrix $\Delta$, $\delta_i$, for $i = 1, 2, \ldots, l$, are the $l$ canonical correlations associated with the $l = \min(k, n_g-1)$ canonical variates.
The eigenvalues, $\lambda_i^2$, of the within-group sums of squares matrix are given by:
$$\lambda_i^2 = \frac{\delta_i^2}{1 - \delta_i^2}$$
and the value of $\pi_i = \lambda_i^2 / \sum_{j=1}^{l} \lambda_j^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $\pi_i$'s give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
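As an illustration of the two formulas above, the following Python sketch converts a list of canonical correlations into the eigenvalues and the proportions of variation explained. The function name `eigenvalues_and_proportions` is hypothetical and is not part of the NAG Library; the sketch assumes every correlation lies in the interval [0, 1) and that at least one is nonzero.

```python
def eigenvalues_and_proportions(correlations):
    """Return (lambda_sq, proportions) for canonical correlations delta_i.

    lambda_sq[i]   = delta_i**2 / (1 - delta_i**2)
    proportions[i] = lambda_sq[i] / sum(lambda_sq)
    """
    lambda_sq = []
    for delta in correlations:
        # A correlation of exactly 1 would make the eigenvalue infinite.
        if not 0.0 <= delta < 1.0:
            raise ValueError("each canonical correlation must lie in [0, 1)")
        lambda_sq.append(delta ** 2 / (1.0 - delta ** 2))

    total = sum(lambda_sq)
    if total == 0.0:
        raise ValueError("at least one canonical correlation must be nonzero")

    proportions = [value / total for value in lambda_sq]
    return lambda_sq, proportions
```

The proportions always sum to 1, so inspecting them in decreasing order shows how quickly the explained variation accumulates.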
To test for a significant dimensionality greater than $i$ the $\chi^2$ statistic:
$$\left(n - 1 - n_g - \tfrac{1}{2}(k - n_g)\right) \sum_{j=i+1}^{l} \log\left(1 + \lambda_j^2\right)$$
can be used. This is asymptotically distributed as a $\chi^2$-distribution with $(k-i)(n_g-1-i)$ degrees of freedom. If the test for $i = h$ is not significant, then the remaining tests for $i > h$ should be ignored.
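The test statistic and its degrees of freedom can be sketched in Python as follows, reading the multiplying factor as (n - 1 - ng) - (k - ng)/2 and the degrees of freedom as (k - i)(ng - 1 - i). The function name `dimensionality_tests` is hypothetical; the sketch evaluates the test for i = 0, 1, ..., l-1 and does not compute the significance level, which would need a chi-square distribution function that the Python standard library does not provide.

```python
import math


def dimensionality_tests(lambda_sq, n, ng, k):
    """Return a list of (statistic, degrees_of_freedom), one per i = 0..l-1.

    lambda_sq : eigenvalues lambda_1^2, ..., lambda_l^2 (in that order)
    n         : number of observations
    ng        : number of groups
    k         : rank of the mean-centred data matrix
    """
    l = len(lambda_sq)
    factor = (n - 1 - ng) - 0.5 * (k - ng)
    results = []
    for i in range(l):
        # The sum runs over j = i+1, ..., l, i.e. the slice lambda_sq[i:].
        stat = factor * sum(math.log(1.0 + value) for value in lambda_sq[i:])
        df = (k - i) * (ng - 1 - i)
        results.append((stat, df))
    return results
```

Following the rule stated above, a caller would scan the results in order and stop at the first test that is not significant.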
The loadings for the canonical variates are calculated from the matrix $U_x$. This matrix is scaled so that the canonical variates have unit within-group variance.
In addition to the canonical variate loadings, the means for each canonical variate are calculated for each group.
Weights can be used with the analysis, in which case the weighted means are subtracted from each column and then each row is scaled by an amount $\sqrt{w_i}$, where $w_i$ is the weight for the $i$th observation (row).
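The weighted preprocessing step can be sketched in Python as below. This is an illustration only: the function name `centre_and_scale` is hypothetical, and the sketch assumes the row scaling factor is the square root of the weight, the usual convention for weighted least squares problems.

```python
import math


def centre_and_scale(x, wt=None):
    """Subtract weighted column means, then scale row i by sqrt(wt[i]).

    x  : list of n rows, each a list of m values
    wt : list of n non-negative weights, or None for unit weights
    """
    n = len(x)
    m = len(x[0])
    if wt is None:
        wt = [1.0] * n
    if any(w < 0.0 for w in wt):
        raise ValueError("weights must be non-negative")

    total = sum(wt)
    # Weighted mean of each column.
    means = [sum(wt[i] * x[i][j] for i in range(n)) / total for j in range(m)]
    # Centre each column, then scale each row (assumed factor: sqrt of weight).
    return [
        [math.sqrt(wt[i]) * (x[i][j] - means[j]) for j in range(m)]
        for i in range(n)
    ]
```

A row with zero weight is mapped to a row of zeros, so it contributes nothing to any sum of squares formed from the result.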

4 References

Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall
Gnanadesikan R (1977) Methods for Statistical Data Analysis of Multivariate Observations Wiley
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25
Kendall M G and Stuart A (1969) The Advanced Theory of Statistics (Volume 1) (3rd Edition) Griffin

5 Arguments

1: weight Character(1) Input
On entry: indicates if weights are to be used.
weight='U'
No weights are used.
weight='W' or 'V'
Weights are used and must be supplied in wt.
If weight='W', the weights are treated as frequencies and the effective number of observations is the sum of the weights.
If weight='V', the weights are treated as being inversely proportional to the variance of the observations and the effective number of observations is the number of observations with nonzero weights.
Constraint: weight='U', 'W' or 'V'.
2: n Integer Input
On entry: n, the number of observations.
Constraint: n ≥ nx+ng.
3: m Integer Input
On entry: m, the total number of variables.
Constraint: m ≥ nx.
4: x(ldx,m) Real (Kind=nag_wp) array Input
On entry: x(i,j) must contain the ith observation for the jth variable, for i=1,2,…,n and j=1,2,…,m.
5: ldx Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g03acf is called.
Constraint: ldx ≥ n.
6: isx(m) Integer array Input
On entry: isx(j) indicates whether or not the jth variable is to be included in the analysis.
If isx(j)>0, the variable contained in the jth column of x is included in the canonical variate analysis, for j=1,2,…,m.
Constraint: isx(j)>0 for nx values of j.
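The relationship between isx and nx can be sketched in Python as follows. The helper name `selected_columns` is hypothetical; it returns the 1-based column indices that would enter the analysis and rejects an nx that does not match the number of positive flags, which is the inconsistency reported by ifail=4.

```python
def selected_columns(isx, nx):
    """Return the 1-based indices j for which isx(j) > 0.

    Raises ValueError when the number of selected columns differs from nx.
    """
    columns = [j for j, flag in enumerate(isx, start=1) if flag > 0]
    if len(columns) != nx:
        raise ValueError(
            "nx=%d is not consistent with isx (expected %d)" % (nx, len(columns))
        )
    return columns
```

For example, with four variables of which the second is to be excluded, isx could be set to 1, 0, 1, 1 and nx to 3.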
7: nx Integer Input
On entry: the number of variables in the analysis, nx.
Constraint: nx ≥ 1.
8: ing(n) Integer array Input
On entry: ing(i) indicates which group the ith observation is in, for i=1,2,…,n. The effective number of groups is the number of groups with nonzero membership.
Constraint: 1 ≤ ing(i) ≤ ng, for i=1,2,…,n.
9: ng Integer Input
On entry: the number of groups, ng.
Constraint: ng ≥ 2.
10: wt(*) Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least n if weight='W' or 'V', and at least 1 otherwise.
On entry: if weight='W' or 'V', the first n elements of wt must contain the weights to be used in the analysis.
If wt(i)=0.0, the ith observation is not included in the analysis.
If weight='U', wt is not referenced.
Constraints:
  • wt(i) ≥ 0.0, for i=1,2,…,n;
  • wt(1)+wt(2)+…+wt(n) ≥ nx+effective number of groups.
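The two weighting conventions selected by weight, together with the non-negativity constraint on wt, can be sketched in Python as follows. The function name `effective_observations` is hypothetical, and the sketch covers only the effective number of observations, not the effective number of groups.

```python
def effective_observations(weight, n, wt=None):
    """Effective number of observations for weight = 'U', 'W' or 'V'.

    'U' : no weights, so the result is n
    'W' : weights are frequencies, so the result is their sum
    'V' : weights are inverse variances, so the result is the number
          of observations with nonzero weight
    """
    if weight == "U":
        return n  # wt is not referenced in this case
    if weight not in ("W", "V"):
        raise ValueError("weight must be 'U', 'W' or 'V'")

    weights = wt[:n]  # only the first n elements are used
    if any(w < 0.0 for w in weights):
        raise ValueError("weights must be non-negative")

    if weight == "W":
        return sum(weights)
    return sum(1 for w in weights if w != 0.0)
```

With weights 2.0, 0.0 and 3.0 the frequency convention gives 5.0 effective observations, whereas the inverse-variance convention gives 2.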
11: nig(ng) Integer array Output
On exit: nig(j) gives the number of observations in group j, for j=1,2,…,ng.
12: cvm(ldcvm,nx) Real (Kind=nag_wp) array Output
On exit: cvm(i,j) contains the mean of the jth canonical variate for the ith group, for i=1,2,…,ng and j=1,2,…,l; the remaining columns, if any, are used as workspace.
13: ldcvm Integer Input
On entry: the first dimension of the array cvm as declared in the (sub)program from which g03acf is called.
Constraint: ldcvm ≥ ng.
14: e(lde,6) Real (Kind=nag_wp) array Output
On exit: the statistics of the canonical variate analysis.
e(i,1)
The canonical correlations, $\delta_i$, for i=1,2,…,l.
e(i,2)
The eigenvalues of the within-group sum of squares matrix, $\lambda_i^2$, for i=1,2,…,l.
e(i,3)
The proportion of variation explained by the ith canonical variate, for i=1,2,…,l.
e(i,4)
The $\chi^2$ statistic for the ith canonical variate, for i=1,2,…,l.
e(i,5)
The degrees of freedom for the $\chi^2$ statistic for the ith canonical variate, for i=1,2,…,l.
e(i,6)
The significance level for the $\chi^2$ statistic for the ith canonical variate, for i=1,2,…,l.
15: lde Integer Input
On entry: the first dimension of the array e as declared in the (sub)program from which g03acf is called.
Constraint: lde ≥ min(nx,ng-1).
16: ncv Integer Output
On exit: the number of canonical variates, l. This will be the minimum of ng-1 and the rank of x.
17: cvx(ldcvx,ng-1) Real (Kind=nag_wp) array Output
On exit: the canonical variate loadings. cvx(i,j) contains the loading coefficient for the ith variable on the jth canonical variate, for i=1,2,…,nx and j=1,2,…,l; the remaining columns, if any, are used as workspace.
18: ldcvx Integer Input
On entry: the first dimension of the array cvx as declared in the (sub)program from which g03acf is called.
Constraint: ldcvx ≥ nx.
19: tol Real (Kind=nag_wp) Input
On entry: the value of tol is used to decide if the variables are of full rank and, if not, what is the rank of the variables. The smaller the value of tol the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, the square root of machine precision is used instead.
Constraint: tol ≥ 0.0.
20: irankx Integer Output
On exit: the rank of the dependent variables.
If the variables are of full rank then irankx=nx.
If the variables are not of full rank then irankx is an estimate of the rank of the dependent variables. irankx is calculated as the number of singular values greater than tol×(largest singular value).
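The rank estimate described for irankx, combined with the replacement rule for small values of tol, can be sketched in Python as follows. The function names are hypothetical, and `sys.float_info.epsilon` is used only as a stand-in for the machine precision of the NAG Library, whose exact value may differ.

```python
import math
import sys


def effective_tol(tol):
    """Return the tolerance actually applied for a supplied value of tol."""
    if tol < 0.0:
        raise ValueError("tol must be non-negative")
    eps = sys.float_info.epsilon  # assumption: stand-in for NAG machine precision
    # Values below machine precision are replaced by its square root.
    return math.sqrt(eps) if tol < eps else tol


def estimate_rank(singular_values, tol):
    """Count the singular values greater than tol * (largest singular value)."""
    if not singular_values:
        return 0
    threshold = effective_tol(tol) * max(singular_values)
    return sum(1 for s in singular_values if s > threshold)
```

Because the threshold is relative to the largest singular value, rescaling all the variables by a common factor leaves the estimated rank unchanged.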
21: wk(iwk) Real (Kind=nag_wp) array Workspace
22: iwk Integer Input
On entry: the dimension of the array wk as declared in the (sub)program from which g03acf is called.
Constraints:
  • if nx ≥ ng-1, iwk ≥ n×nx+max(5×(nx-1)+(nx+1)×nx,n)+1;
  • if nx<ng-1, iwk ≥ n×nx+max(5×(nx-1)+(ng-1)×nx,n)+1.
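A minimal Python sketch of the workspace calculation is given below. The function name `min_iwk` is hypothetical. The sketch places the trailing +1 outside the max, a reading that is never smaller than placing it inside, so the returned length should be sufficient under either interpretation of the constraint.

```python
def min_iwk(n, nx, ng):
    """Smallest workspace length iwk for given n, nx and ng.

    The second term inside the max depends on whether nx >= ng-1.
    """
    if nx >= ng - 1:
        inner = 5 * (nx - 1) + (nx + 1) * nx
    else:
        inner = 5 * (nx - 1) + (ng - 1) * nx
    # Assumption: the trailing +1 sits outside the max (the conservative reading).
    return n * nx + max(inner, n) + 1
```

For a problem with n=9, nx=3 and ng=3 the sketch gives 9×3 + max(22, 9) + 1 = 50.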
23: ifail Integer Input/Output
On entry: ifail must be set to 0, -1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of -1 means that an error message is printed while a value of 1 means that it is not.
If halting is not appropriate, the value -1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value 0 is recommended. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
On entry, iwk=value.
Constraint: iwk ≥ value.
On entry, ldcvm=value and ng=value.
Constraint: ldcvm ≥ ng.
On entry, ldcvx=value and nx=value.
Constraint: ldcvx ≥ nx.
On entry, lde=value and min(nx,ng-1)=value.
Constraint: lde ≥ min(nx,ng-1).
On entry, ldx=value and n=value.
Constraint: ldx ≥ n.
On entry, m=value and nx=value.
Constraint: m ≥ nx.
On entry, n=value and nx+ng=value.
Constraint: n ≥ nx+ng.
On entry, ng=value.
Constraint: ng ≥ 2.
On entry, nx=value.
Constraint: nx ≥ 1.
On entry, tol=value.
Constraint: tol ≥ 0.0.
On entry, weight=value.
Constraint: weight='U', 'W' or 'V'.
ifail=2
On entry, i=value and wt(i)<0.0.
Constraint: wt(i) ≥ 0.0.
ifail=3
On entry, i=value, ing(i)=value and ng=value.
Constraint: 1 ≤ ing(i) ≤ ng.
ifail=4
On entry, nx=value, expected value=value.
Constraint: nx must be consistent with isx.
ifail=5
The singular value decomposition has failed to converge. This is an unlikely error exit.
ifail=6
A canonical correlation is equal to 1.0. This will happen if the variables provide an exact indication as to which group every observation is allocated.
ifail=7
Less than 2 groups have nonzero membership.
The effective number of observations is less than the effective number of groups plus number of variables.
ifail=8
The rank of x is 0. This will happen if all the variables are constants.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

As the computation involves the use of orthogonal matrices and a singular value decomposition rather than the traditional computing of a sum of squares matrix and the use of an eigenvalue decomposition, g03acf should be less affected by ill-conditioned problems.

8 Parallelism and Performance

g03acf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g03acf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

None.

10 Example

This example uses a sample of nine observations, each consisting of three variables plus a group indicator. There are three groups. An unweighted canonical variate analysis is performed and the results printed.

10.1 Program Text

Program Text (g03acfe.f90)

10.2 Program Data

Program Data (g03acfe.d)

10.3 Program Results

Program Results (g03acfe.r)