NAG Toolbox: nag_mv_canon_corr (g03ad)
Purpose
nag_mv_canon_corr (g03ad) performs canonical correlation analysis upon input data matrices.
Syntax
[e, ncv, cvx, cvy, ifail] = g03ad(z, isz, nx, ny, mcv, tol, 'n', n, 'm', m, 'wt', wt)
[e, ncv, cvx, cvy, ifail] = nag_mv_canon_corr(z, isz, nx, ny, mcv, tol, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: weight was removed from the interface; wt was made optional
At Mark 22: n was made optional
Description
Let there be two sets of variables, $x$ and $y$. For a sample of $n$ observations on $n_x$ variables in a data matrix $X$ and $n_y$ variables in a data matrix $Y$, canonical correlation analysis seeks to find a small number of linear combinations of each set of variables in order to explain or summarise the relationships between them. The variables thus formed are known as canonical variates.
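Let the variance-covariance matrix of the two datasets be
$$\begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}$$
and let
$$\Sigma = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy},$$
then the canonical correlations can be calculated from the eigenvalues of the matrix $\Sigma$. However, nag_mv_canon_corr (g03ad) calculates the canonical correlations by means of a singular value decomposition (SVD) of a matrix $V$. If the rank of the data matrix $X$ is $k_x$ and the rank of the data matrix $Y$ is $k_y$, and both $X$ and $Y$ have had variable (column) means subtracted, then the $k_x$ by $k_y$ matrix $V$ is given by:
$$V = Q_x^{\mathrm{T}} Q_y,$$
where $Q_x$ is the first $k_x$ columns of the orthogonal matrix $Q$ either from the $QR$ decomposition of $X$ if $X$ is of full column rank, i.e., $k_x = n_x$:
$$X = QR$$
or from the SVD of $X$ if $k_x < n_x$:
$$X = Q D P^{\mathrm{T}}.$$
Similarly $Q_y$ is the first $k_y$ columns of the orthogonal matrix $Q$ either from the $QR$ decomposition of $Y$ if $Y$ is of full column rank, i.e., $k_y = n_y$:
$$Y = QR$$
or from the SVD of $Y$ if $k_y < n_y$:
$$Y = Q D P^{\mathrm{T}}.$$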
Let the SVD of $V$ be:
$$V = U_x \Delta U_y^{\mathrm{T}}$$
then the nonzero elements of the diagonal matrix $\Delta$, $\delta_i$, for $i = 1, 2, \ldots, l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l = \min(k_x, k_y)$.
The eigenvalues, $\lambda_i^2$, of the matrix $\Sigma$ are given by:
$$\lambda_i^2 = \delta_i^2.$$
The value of $\pi_i = \lambda_i^2 / \sum_{j} \lambda_j^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $\pi_i$'s give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
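As a small worked check, the eigenvalues and proportions can be recomputed from the canonical correlations. The sketch below takes the two correlations printed in the Example section as input (rounded to four decimals, so the results agree only to about that precision); the variable names are illustrative.

```matlab
% Canonical correlations as printed in the Example section.
delta = [0.9570; 0.3624];

% Eigenvalues are the squared canonical correlations.
lambda2 = delta .^ 2;

% Proportion of variation explained by each canonical variate.
prop = lambda2 / sum(lambda2);

fprintf('%11.4f%12.4f%12.4f\n', [delta, lambda2, prop]');
```

The second and third printed columns correspond to the "Eigenvalues" and "Percentage variation" columns of the example results.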
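To test for a significant dimensionality greater than $i$ the $\chi^2$ statistic:
$$-\left(n - \tfrac{1}{2}\left(k_x + k_y + 3\right)\right) \sum_{j=i+1}^{l} \log\left(1 - \delta_j^2\right)$$
can be used. This is asymptotically distributed as a $\chi^2$-distribution with $(k_x - i)(k_y - i)$ degrees of freedom. If the test for $i = k_{\min}$ is not significant, then the remaining tests for $i > k_{\min}$ should be ignored.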
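The test can be sketched in MATLAB as follows, again starting from the rounded canonical correlations in the Example section ($n = 9$, $k_x = k_y = 2$). The upper-tail probability is obtained from the built-in `gammainc`, which avoids any toolbox dependency; this is an illustrative calculation, not the routine's code, and the statistic is written in terms of $\delta_j^2$, the squared canonical correlations.

```matlab
delta = [0.9570; 0.3624];   % canonical correlations from the Example output
n  = 9;                     % number of observations
kx = 2;                     % rank of the x data matrix
ky = 2;                     % rank of the y data matrix
l  = numel(delta);

chisq = zeros(l, 1);
df    = zeros(l, 1);
for i = 0:l-1
    % Statistic for "dimensionality greater than i".
    chisq(i+1) = -(n - 0.5 * (kx + ky + 3)) * sum(log(1 - delta(i+1:l) .^ 2));
    df(i+1)    = (kx - i) * (ky - i);
end

% Upper-tail chi-square probability via the regularised incomplete gamma function.
sig = gammainc(chisq / 2, df / 2, 'upper');

fprintf('%10.4f%8.1f%8.4f\n', [chisq, df, sig]');
```

Because the inputs are rounded, the statistics match the "Chisq" column of the example results only to two or three significant decimals.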
The loadings for the canonical variates are calculated from the matrices $U_x$ and $U_y$ respectively. These matrices are scaled so that the canonical variates have unit variance.
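One way to obtain loadings with this property is sketched below for the full-rank case. It assumes the sample variance uses the divisor $n - 1$ and maps the singular vectors back through the triangular QR factors; both choices are assumptions made for illustration, and the signs of the columns may differ from those returned by the routine.

```matlab
z = [80, 58.4, 14.0, 21;
     75, 59.2, 15.0, 27;
     78, 60.3, 15.0, 27;
     75, 57.4, 13.0, 22;
     79, 59.5, 14.0, 26;
     78, 58.1, 14.5, 26;
     75, 58.0, 12.5, 23;
     64, 55.5, 11.0, 22;
     80, 59.2, 12.5, 22];
n  = size(z, 1);
X  = z(:, [2 3]);
Y  = z(:, [1 4]);
Xc = X - repmat(mean(X, 1), n, 1);
Yc = Y - repmat(mean(Y, 1), n, 1);

[Qx, Rx] = qr(Xc, 0);
[Qy, Ry] = qr(Yc, 0);
[Ux, Delta, Uy] = svd(Qx' * Qy);

% Loadings on the original variables, scaled for unit variance
% (assumed divisor n - 1).
cvx = sqrt(n - 1) * (Rx \ Ux);
cvy = sqrt(n - 1) * (Ry \ Uy);

% Canonical variates for each set.
u = Xc * cvx;
v = Yc * cvy;
disp(var(u));   % each entry should be 1 up to rounding
disp(var(v));
```

With this scaling the cross-covariance `u' * v / (n - 1)` reduces to the diagonal matrix of canonical correlations.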
Parameters
Compulsory Input Parameters
- 1: z(ldz, m) – double array
ldz, the first dimension of the array, must satisfy the constraint ldz ≥ n.
z(i, j) must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.
Both $x$ and $y$ variables are to be included in z, the indicator array, isz, being used to assign the variables in z to the $x$ or $y$ sets as appropriate.
- 2: isz(m) – int64/int32/nag_int array
isz(j) indicates whether or not the $j$th variable is included in the analysis and to which set of variables it belongs.
- isz(j) > 0: The variable contained in the $j$th column of z is included as an $x$ variable in the analysis.
- isz(j) < 0: The variable contained in the $j$th column of z is included as a $y$ variable in the analysis.
- isz(j) = 0: The variable contained in the $j$th column of z is not included in the analysis.
Constraint: only nx elements of isz can be > 0 and only ny elements of isz can be < 0.
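The role of isz can be illustrated with plain logical indexing. The sketch below uses the first three rows of the Example data and the same indicator vector; `X` and `Y` are illustrative names, and the sign convention (positive for $x$, negative for $y$) follows the Example section, where the second and third columns are the $x$ variables.

```matlab
z = [80, 58.4, 14.0, 21;
     75, 59.2, 15.0, 27;
     78, 60.3, 15.0, 27];
isz = [-1; 1; 1; -1];

% Columns flagged positive form the x set, negative the y set,
% and zero entries would be left out of the analysis.
X = z(:, isz > 0);
Y = z(:, isz < 0);

% These counts must match the nx and ny arguments.
nx = nnz(isz > 0);
ny = nnz(isz < 0);
fprintf('nx = %d, ny = %d\n', nx, ny);
```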
- 3: nx – int64/int32/nag_int scalar
The number of $x$ variables in the analysis, $n_x$.
Constraint: nx ≥ 1.
- 4: ny – int64/int32/nag_int scalar
The number of $y$ variables in the analysis, $n_y$.
Constraint: ny ≥ 1.
- 5: mcv – int64/int32/nag_int scalar
An upper limit to the number of canonical variates.
Constraint: mcv ≥ min(nx, ny).
- 6: tol – double scalar
The value of tol is used to decide if the variables are of full rank and, if not, what is the rank of the variables. The smaller the value of tol the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, the square root of machine precision is used instead.
Constraint: tol ≥ 0.0.
Optional Input Parameters
- 1: n – int64/int32/nag_int scalar
Default: the dimension of the array wt and the first dimension of the array z. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: n > nx + ny.
- 2: m – int64/int32/nag_int scalar
Default: the dimension of the array isz and the second dimension of the array z. (An error is raised if these dimensions are not equal.)
$m$, the total number of variables.
Constraint: m ≥ nx + ny.
- 3: wt(:) – double array
The dimension of the array wt must be at least n if weight = 'W', and at least 1 otherwise.
If weight = 'W', the first $n$ elements of wt must contain the weights to be used in the analysis.
If wt(i) = 0.0, the $i$th observation is not included in the analysis. The effective number of observations is the sum of weights.
If weight = 'U', wt is not referenced and the effective number of observations is $n$.
Constraints:
- wt(i) ≥ 0.0, for $i = 1, 2, \ldots, n$;
- the sum of the weights ≥ nx + ny + 1.
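The bookkeeping for weights can be sketched as follows. The weights and set sizes are made-up values for illustration only, and the lower bound of nx + ny + 1 on the effective number of observations is taken from the corresponding error condition in Error Indicators and Warnings.

```matlab
wt = [1; 2; 0; 1; 1.5];   % hypothetical weights for five observations
nx = 1;                   % hypothetical number of x variables
ny = 1;                   % hypothetical number of y variables

% Observations with zero weight take no part in the analysis.
used = wt > 0;

% The effective number of observations is the sum of the weights.
n_eff = sum(wt);

% Both constraints on wt: non-negative entries and a large enough total.
ok = all(wt >= 0) && (n_eff >= nx + ny + 1);
fprintf('used = %d, n_eff = %.1f, ok = %d\n', nnz(used), n_eff, ok);
```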
Output Parameters
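- 1: e(lde, 6) – double array
The statistics of the canonical variate analysis.
- e(i, 1): The canonical correlations, $\delta_i$, for $i = 1, 2, \ldots, \mathrm{ncv}$.
- e(i, 2): The eigenvalues of $\Sigma$, $\lambda_i^2$, for $i = 1, 2, \ldots, \mathrm{ncv}$.
- e(i, 3): The proportion of variation explained by the $i$th canonical variate, for $i = 1, 2, \ldots, \mathrm{ncv}$.
- e(i, 4): The $\chi^2$ statistic for the $i$th canonical variate, for $i = 1, 2, \ldots, \mathrm{ncv}$.
- e(i, 5): The degrees of freedom for the $\chi^2$ statistic for the $i$th canonical variate, for $i = 1, 2, \ldots, \mathrm{ncv}$.
- e(i, 6): The significance level for the $\chi^2$ statistic for the $i$th canonical variate, for $i = 1, 2, \ldots, \mathrm{ncv}$.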
- 2: ncv – int64/int32/nag_int scalar
The number of canonical correlations, $l$. This will be the minimum of the rank of $X$ and the rank of $Y$.
- 3: cvx(ldcvx, mcv) – double array
The canonical variate loadings for the $x$ variables. cvx(i, j) contains the loading coefficient for the $i$th $x$ variable on the $j$th canonical variate.
- 4: cvy(ldcvy, mcv) – double array
The canonical variate loadings for the $y$ variables. cvy(i, j) contains the loading coefficient for the $i$th $y$ variable on the $j$th canonical variate.
- 5: ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
- ifail = 1
On entry, nx < 1, or ny < 1, or m < nx + ny, or n ≤ nx + ny, or mcv < min(nx, ny), or ldz < n, or tol < 0.0.
- ifail = 2
On entry, a value of wt < 0.0.
- ifail = 3
On entry, the number of $x$ variables to be included in the analysis as indicated by isz is not equal to nx, or the number of $y$ variables to be included in the analysis as indicated by isz is not equal to ny.
- ifail = 4
On entry, the effective number of observations is less than nx + ny + 1.
- ifail = 5
A singular value decomposition has failed to converge. See nag_eigen_real_triang_svd (f02wu). This is an unlikely error exit.
- W ifail = 6
A canonical correlation is equal to 1. This will happen if the $x$ and $y$ variables are perfectly correlated.
- W ifail = 7
On entry, the rank of the $X$ matrix or the rank of the $Y$ matrix is 0. This will happen if all the $x$ or $y$ variables are constants.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
Accuracy
As the computation involves the use of orthogonal matrices and a singular value decomposition rather than the traditional computing of a sum of squares matrix and the use of an eigenvalue decomposition, nag_mv_canon_corr (g03ad) should be less affected by ill-conditioned problems.
Further Comments
None.
Example
This example has nine observations and two variables in each set. Of the four variables read in, the second and third are $x$ variables while the first and last are $y$ variables. Canonical variate analysis is performed and the results printed.
function g03ad_example
fprintf('g03ad example results\n\n');
z = [80, 58.4, 14.0, 21;
75, 59.2, 15.0, 27;
78, 60.3, 15.0, 27;
75, 57.4, 13.0, 22;
79, 59.5, 14.0, 26;
78, 58.1, 14.5, 26;
75, 58.0, 12.5, 23;
64, 55.5, 11.0, 22;
80, 59.2, 12.5, 22];
isz = [int64(-1);1;1;-1];
nx = int64(2);
ny = nx;
mcv = nx;
tol = 1e-06;
[e, ncv, cvx, cvy, ifail] = ...
g03ad( ...
z, isz, nx, ny, mcv, tol);
fprintf('Rank of x = %d, Rank of y = %d\n\n', nx, ny);
fprintf('Canonical Eigenvalues Percentage Chisq DF Sig\n');
fprintf('correlations variation\n');
fprintf('%11.4f%12.4f%12.4f%10.4f%8.1f%8.4f\n',e');
fprintf('\n');
mtitle = 'Canonical Coefficients for x';
matrix = 'General';
diag = ' ';
[ifail] = x04ca( ...
matrix, diag, cvx, mtitle);
fprintf('\n');
mtitle = 'Canonical Coefficients for y';
[ifail] = x04ca( ...
matrix, diag, cvy, mtitle);
g03ad example results
Rank of x = 2, Rank of y = 2
Canonical Eigenvalues Percentage Chisq DF Sig
correlations variation
0.9570 0.9159 0.8746 14.3914 4.0 0.0061
0.3624 0.1313 0.1254 0.7744 1.0 0.3789
Canonical Coefficients for x
1 2
1 -0.4261 1.0337
2 -0.3444 -1.1136
Canonical Coefficients for y
1 2
1 -0.1415 0.1504
2 -0.2384 -0.3424
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015