# NAG FL Interface g03adf (canon_corr)

## 1 Purpose

g03adf performs canonical correlation analysis upon input data matrices.

## 2 Specification

Fortran Interface
 Subroutine g03adf (weight, n, m, z, ldz, isz, nx, ny, wt, e, lde, ncv, cvx, ldcvx, mcv, cvy, ldcvy, tol, wk, iwk, ifail)
 Integer, Intent (In) :: n, m, ldz, isz(m), nx, ny, lde, ldcvx, mcv, ldcvy, iwk
 Integer, Intent (Inout) :: ifail
 Integer, Intent (Out) :: ncv
 Real (Kind=nag_wp), Intent (In) :: z(ldz,m), wt(*), tol
 Real (Kind=nag_wp), Intent (Inout) :: e(lde,6), cvx(ldcvx,mcv), cvy(ldcvy,mcv)
 Real (Kind=nag_wp), Intent (Out) :: wk(iwk)
 Character (1), Intent (In) :: weight
C Header Interface
#include <nag.h>
 void g03adf_ (const char *weight, const Integer *n, const Integer *m, const double z[], const Integer *ldz, const Integer isz[], const Integer *nx, const Integer *ny, const double wt[], double e[], const Integer *lde, Integer *ncv, double cvx[], const Integer *ldcvx, const Integer *mcv, double cvy[], const Integer *ldcvy, const double *tol, double wk[], const Integer *iwk, Integer *ifail, const Charlen length_weight)
The routine may be called by the names g03adf or nagf_mv_canon_corr.

## 3 Description

Let there be two sets of variables, $x$ and $y$. For a sample of $n$ observations on ${n}_{x}$ variables in a data matrix $X$ and ${n}_{y}$ variables in a data matrix $Y$, canonical correlation analysis seeks to find a small number of linear combinations of each set of variables in order to explain or summarise the relationships between them. The variables thus formed are known as canonical variates.
Let the variance-covariance matrix of the two datasets be

$$\begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}$$

and let

$$\Sigma=S_{yy}^{-1}S_{yx}S_{xx}^{-1}S_{xy},$$
then the canonical correlations can be calculated from the eigenvalues of the matrix $\Sigma$. However, g03adf calculates the canonical correlations by means of a singular value decomposition (SVD) of a matrix $V$. If the rank of the data matrix $X$ is ${k}_{x}$ and the rank of the data matrix $Y$ is ${k}_{y}$, and both $X$ and $Y$ have had variable (column) means subtracted, then the ${k}_{x}$ by ${k}_{y}$ matrix $V$ is given by:

$$V=Q_x^{\mathrm{T}}Q_y,$$

where ${Q}_{x}$ consists of the first ${k}_{x}$ columns of the orthogonal matrix $Q$ obtained either from the $QR$ decomposition of $X$, if $X$ is of full column rank (i.e., ${k}_{x}={n}_{x}$):

$$X=Q_xR_x,$$

or from the SVD of $X$, if ${k}_{x}<{n}_{x}$:

$$X=Q_xD_xP_x^{\mathrm{T}}.$$

Similarly, ${Q}_{y}$ consists of the first ${k}_{y}$ columns of the orthogonal matrix $Q$ obtained either from the $QR$ decomposition of $Y$, if $Y$ is of full column rank (i.e., ${k}_{y}={n}_{y}$):

$$Y=Q_yR_y,$$

or from the SVD of $Y$, if ${k}_{y}<{n}_{y}$:

$$Y=Q_yD_yP_y^{\mathrm{T}}.$$
Let the SVD of $V$ be:
$$V=U_x\Delta U_y^{\mathrm{T}},$$

then the nonzero elements of the diagonal matrix $\Delta$, ${\delta }_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l=\min({k}_{x},{k}_{y})$.
The eigenvalues, ${\lambda }_{i}^{2}$, of the matrix $\Sigma$ are given by:

$$\lambda_i^2=\delta_i^2.$$
The value of ${\pi }_{i}={\lambda }_{i}^{2}/\sum_{j=1}^{l}{\lambda }_{j}^{2}$ gives the proportion of variation explained by the $i$th canonical variate. The values of the ${\pi }_{i}$ give an indication of how many canonical variates are needed to describe the data adequately, i.e., the dimensionality of the problem.
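The eigenvalues and proportions follow directly from the canonical correlations. The C sketch below (the name `cca_proportions` is hypothetical) fills arrays corresponding to columns 2 and 3 of e, given the correlations in column 1, using 0-based indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): given the l canonical
 * correlations delta[0..l-1], compute the eigenvalues
 * lambda2[i] = delta[i]^2 and the proportions
 * pi[i] = lambda2[i] / (sum over j of lambda2[j]). */
void cca_proportions(const double *delta, size_t l,
                     double *lambda2, double *pi)
{
    assert(l >= 1);
    double total = 0.0;
    for (size_t i = 0; i < l; i++) {
        lambda2[i] = delta[i] * delta[i];
        total += lambda2[i];
    }
    assert(total > 0.0);   /* requires at least one nonzero correlation */
    for (size_t i = 0; i < l; i++)
        pi[i] = lambda2[i] / total;
}
```

With correlations $0.8$ and $0.6$, the eigenvalues are $0.64$ and $0.36$; they happen to sum to one, so the proportions equal the eigenvalues.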
To test for a significant dimensionality greater than $i$, the ${\chi }^{2}$ statistic

$$-\left(n-\tfrac{1}{2}\left(k_x+k_y+3\right)\right)\sum_{j=i+1}^{l}\log\left(1-\delta_j^2\right)$$

can be used. This is asymptotically distributed as a ${\chi }^{2}$-distribution with $\left({k}_{x}-i\right)\left({k}_{y}-i\right)$ degrees of freedom. If the test for a given value of $i$ is not significant, then the remaining tests for larger values of $i$ should be ignored.
The loadings for the canonical variates are calculated from the matrices ${U}_{x}$ and ${U}_{y}$ respectively. These matrices are scaled so that the canonical variates have unit variance.


## 5 Arguments

1: $\mathbf{weight}$ – Character(1) Input
On entry: indicates if weights are to be used.
${\mathbf{weight}}=\text{'U'}$
No weights are used.
${\mathbf{weight}}=\text{'W'}$
Weights are used and must be supplied in wt.
Constraint: ${\mathbf{weight}}=\text{'U'}$ or $\text{'W'}$.
2: $\mathbf{n}$ – Integer Input
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}>{\mathbf{nx}}+{\mathbf{ny}}$.
3: $\mathbf{m}$ – Integer Input
On entry: $m$, the total number of variables.
Constraint: ${\mathbf{m}}\ge {\mathbf{nx}}+{\mathbf{ny}}$.
4: $\mathbf{z}\left({\mathbf{ldz}},{\mathbf{m}}\right)$ – Real (Kind=nag_wp) array Input
On entry: ${\mathbf{z}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
Both the $x$ and $y$ variables are to be included in z. The indicator array isz is used to assign each variable in z to the $x$ set or the $y$ set, as appropriate.
5: $\mathbf{ldz}$ – Integer Input
On entry: the first dimension of the array z as declared in the (sub)program from which g03adf is called.
Constraint: ${\mathbf{ldz}}\ge {\mathbf{n}}$.
6: $\mathbf{isz}\left({\mathbf{m}}\right)$ – Integer array Input
On entry: ${\mathbf{isz}}\left(j\right)$ indicates whether or not the $j$th variable is included in the analysis and to which set of variables it belongs.
${\mathbf{isz}}\left(j\right)>0$
The variable contained in the $j$th column of z is included as an $x$ variable in the analysis.
${\mathbf{isz}}\left(j\right)<0$
The variable contained in the $j$th column of z is included as a $y$ variable in the analysis.
${\mathbf{isz}}\left(j\right)=0$
The variable contained in the $j$th column of z is not included in the analysis.
Constraint: exactly nx elements of isz must be $>0$ and exactly ny elements of isz must be $<0$.
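Before calling the routine, a caller can derive nx and ny from isz so that they are consistent with it (compare ifail = 3). The C sketch below does this. The name `count_isz` is hypothetical. Take the layout of the example in Section 10: four variables, with the second and third as $x$ variables and the first and last as $y$ variables. That layout corresponds to isz $=(-1,1,1,-1)$.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): count how many columns of z the
 * indicator array isz assigns to the x set (isz[j] > 0) and to the y set
 * (isz[j] < 0). Columns with isz[j] == 0 are excluded from the analysis. */
void count_isz(const int *isz, size_t m, int *nx, int *ny)
{
    assert(isz != NULL || m == 0);
    *nx = 0;
    *ny = 0;
    for (size_t j = 0; j < m; j++) {
        if (isz[j] > 0)
            (*nx)++;          /* x variable */
        else if (isz[j] < 0)
            (*ny)++;          /* y variable */
    }
}
```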
7: $\mathbf{nx}$ – Integer Input
On entry: the number of $x$ variables in the analysis, ${n}_{x}$.
Constraint: ${\mathbf{nx}}\ge 1$.
8: $\mathbf{ny}$ – Integer Input
On entry: the number of $y$ variables in the analysis, ${n}_{y}$.
Constraint: ${\mathbf{ny}}\ge 1$.
9: $\mathbf{wt}\left(*\right)$ – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least ${\mathbf{n}}$ if ${\mathbf{weight}}=\text{'W'}$, and at least $1$ otherwise.
On entry: if ${\mathbf{weight}}=\text{'W'}$, the first $n$ elements of wt must contain the weights to be used in the analysis.
If ${\mathbf{wt}}\left(i\right)=0.0$, the $i$th observation is not included in the analysis. The effective number of observations is the sum of weights.
If ${\mathbf{weight}}=\text{'U'}$, wt is not referenced and the effective number of observations is $n$.
Constraints:
• ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$;
• the $\text{sum of weights}\ge {\mathbf{nx}}+{\mathbf{ny}}+1$.
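The weight constraints and the effective number of observations can be checked before the call. The C sketch below uses a hypothetical function `check_weights` that returns `false` on a violation. It does not reproduce the routine's own ifail values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): check the wt constraints listed
 * above and report the effective number of observations (the sum of the
 * weights) through *neff. Returns false if any weight is negative (in which
 * case *neff is not set) or if the sum of weights is less than nx + ny + 1. */
bool check_weights(const double *wt, size_t n, int nx, int ny, double *neff)
{
    assert(wt != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (wt[i] < 0.0)
            return false;   /* violates wt(i) >= 0.0 */
        sum += wt[i];       /* a zero weight drops observation i */
    }
    *neff = sum;
    return sum >= (double)(nx + ny + 1);
}
```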
10: $\mathbf{e}\left({\mathbf{lde}},6\right)$ – Real (Kind=nag_wp) array Output
On exit: the statistics of the canonical variate analysis.
${\mathbf{e}}\left(i,1\right)$
The canonical correlations, ${\delta }_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,l$.
${\mathbf{e}}\left(i,2\right)$
The eigenvalues of $\Sigma$, ${\lambda }_{\mathit{i}}^{2}$, for $\mathit{i}=1,2,\dots ,l$.
${\mathbf{e}}\left(i,3\right)$
The proportion of variation explained by the $\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
${\mathbf{e}}\left(i,4\right)$
The ${\chi }^{2}$ statistic for the $\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
${\mathbf{e}}\left(i,5\right)$
The degrees of freedom for the ${\chi }^{2}$ statistic for the $\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
${\mathbf{e}}\left(i,6\right)$
The significance level for the ${\chi }^{2}$ statistic for the $\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
11: $\mathbf{lde}$ – Integer Input
On entry: the first dimension of the array e as declared in the (sub)program from which g03adf is called.
Constraint: ${\mathbf{lde}}\ge \mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ny}}\right)$.
12: $\mathbf{ncv}$ – Integer Output
On exit: the number of canonical correlations, $l$. This will be the minimum of the rank of $X$ and the rank of $Y$.
13: $\mathbf{cvx}\left({\mathbf{ldcvx}},{\mathbf{mcv}}\right)$ – Real (Kind=nag_wp) array Output
On exit: the canonical variate loadings for the $x$ variables. ${\mathbf{cvx}}\left(i,j\right)$ contains the loading coefficient for the $i$th $x$ variable on the $j$th canonical variate.
14: $\mathbf{ldcvx}$ – Integer Input
On entry: the first dimension of the array cvx as declared in the (sub)program from which g03adf is called.
Constraint: ${\mathbf{ldcvx}}\ge {\mathbf{nx}}$.
15: $\mathbf{mcv}$ – Integer Input
On entry: an upper limit to the number of canonical variates.
Constraint: ${\mathbf{mcv}}\ge \mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ny}}\right)$.
16: $\mathbf{cvy}\left({\mathbf{ldcvy}},{\mathbf{mcv}}\right)$ – Real (Kind=nag_wp) array Output
On exit: the canonical variate loadings for the $y$ variables. ${\mathbf{cvy}}\left(i,j\right)$ contains the loading coefficient for the $i$th $y$ variable on the $j$th canonical variate.
17: $\mathbf{ldcvy}$ – Integer Input
On entry: the first dimension of the array cvy as declared in the (sub)program from which g03adf is called.
Constraint: ${\mathbf{ldcvy}}\ge {\mathbf{ny}}$.
18: $\mathbf{tol}$ – Real (Kind=nag_wp) Input
On entry: the value of tol is used to decide whether the variables are of full rank and, if not, what the rank of the variables is. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, the square root of machine precision is used instead.
Constraint: ${\mathbf{tol}}\ge 0.0$.
19: $\mathbf{wk}\left({\mathbf{iwk}}\right)$ – Real (Kind=nag_wp) array Workspace
20: $\mathbf{iwk}$ – Integer Input
On entry: the dimension of the array wk as declared in the (sub)program from which g03adf is called.
Constraints:
• if ${\mathbf{nx}}\ge {\mathbf{ny}}$, ${\mathbf{iwk}}\ge {\mathbf{n}}×{\mathbf{nx}}+{\mathbf{nx}}+{\mathbf{ny}}+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(\left(5×\left({\mathbf{nx}}-1\right)+{\mathbf{nx}}×{\mathbf{nx}}\right),{\mathbf{n}}×{\mathbf{ny}}\right)+1$;
• if ${\mathbf{nx}}<{\mathbf{ny}}$, ${\mathbf{iwk}}\ge {\mathbf{n}}×{\mathbf{ny}}+{\mathbf{nx}}+{\mathbf{ny}}+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(\left(5×\left({\mathbf{ny}}-1\right)+{\mathbf{ny}}×{\mathbf{ny}}\right),{\mathbf{n}}×{\mathbf{nx}}\right)+1$.
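The two workspace constraints can be folded into one function for sizing wk. The C sketch below uses the hypothetical name `min_iwk` and transcribes the formulas above directly. For the example in Section 10 ($n=9$, ${n}_{x}={n}_{y}=2$), it gives $9\times 2+2+2+\max(9,18)+1=41$.

```c
#include <assert.h>

/* Hypothetical helper (not a NAG routine): the smallest iwk allowed by the
 * constraints listed above. */
static int max_int(int a, int b) { return a > b ? a : b; }

int min_iwk(int n, int nx, int ny)
{
    assert(n >= 1 && nx >= 1 && ny >= 1);
    if (nx >= ny)
        return n * nx + nx + ny + max_int(5 * (nx - 1) + nx * nx, n * ny) + 1;
    return n * ny + nx + ny + max_int(5 * (ny - 1) + ny * ny, n * nx) + 1;
}
```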
21: $\mathbf{ifail}$ – Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes an error message to be printed and program execution to be halted; otherwise, program execution continues. A value of $-1$ means that an error message is printed, while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used, it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{iwk}}=〈\mathit{\text{value}}〉$
Constraint: ${\mathbf{iwk}}\ge 〈\mathit{\text{value}}〉$.
On entry, ${\mathbf{ldcvx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{nx}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldcvx}}\ge {\mathbf{nx}}$.
On entry, ${\mathbf{ldcvy}}=〈\mathit{\text{value}}〉$ and ${\mathbf{ny}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldcvy}}\ge {\mathbf{ny}}$.
On entry, ${\mathbf{lde}}=〈\mathit{\text{value}}〉$ and $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ny}}\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{lde}}\ge \mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ny}}\right)$.
On entry, ${\mathbf{ldz}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldz}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$ and ${\mathbf{nx}}+{\mathbf{ny}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge {\mathbf{nx}}+{\mathbf{ny}}$.
On entry, ${\mathbf{mcv}}=〈\mathit{\text{value}}〉$ and $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ny}}\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{mcv}}\ge \mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ny}}\right)$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$ and ${\mathbf{nx}}+{\mathbf{ny}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}>{\mathbf{nx}}+{\mathbf{ny}}$.
On entry, ${\mathbf{nx}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nx}}\ge 1$.
On entry, ${\mathbf{ny}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ny}}\ge 1$.
On entry, ${\mathbf{tol}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{tol}}\ge 0.0$.
On entry, ${\mathbf{weight}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{weight}}=\text{'U'}$ or $\text{'W'}$.
${\mathbf{ifail}}=2$
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{wt}}\left(i\right)<0.0$.
Constraint: ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{nx}}=〈\mathit{\text{value}}〉$, expected $\text{value}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nx}}$ must be consistent with isz.
On entry, ${\mathbf{ny}}=〈\mathit{\text{value}}〉$, expected $\text{value}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ny}}$ must be consistent with isz.
${\mathbf{ifail}}=4$
On entry, the effective number of observations is less than ${\mathbf{nx}}+{\mathbf{ny}}+1$.
${\mathbf{ifail}}=5$
The singular value decomposition has failed to converge. This is an unlikely error exit.
${\mathbf{ifail}}=6$
A canonical correlation is equal to $1.0$. This will happen if the $x$ and $y$ variables are perfectly correlated.
${\mathbf{ifail}}=7$
On entry, the rank of the $X$ matrix is $0$.
On entry, the rank of the $Y$ matrix is $0$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

## 7 Accuracy

Because the computation uses orthogonal matrices and a singular value decomposition, rather than the traditional approach of forming a sum of squares and cross-products matrix and computing its eigenvalue decomposition, g03adf should be less affected by ill-conditioned problems.

## 8 Parallelism and Performance

g03adf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g03adf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

## 9 Further Comments

None.

## 10 Example

This example has nine observations. Of the four variables read in, two belong to each set: the second and third are $x$ variables, while the first and last are $y$ variables. Canonical variate analysis is performed and the results are printed.
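The C sketch below collects the integer-argument constraints from Section 5 that do not depend on the contents of isz, wt or iwk. The name `dims_ok` is hypothetical. For this example, it assumes every array is sized exactly: ldz $=n=9$, $m=4$, ${n}_{x}={n}_{y}=2$, and lde $=$ mcv $=$ ldcvx $=$ ldcvy $=2$. Those choices are illustrative, not taken from the NAG example program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a NAG routine): integer-argument constraints
 * from Section 5, excluding those that involve isz, wt or iwk. */
static int min_int(int a, int b) { return a < b ? a : b; }

bool dims_ok(int n, int m, int ldz, int nx, int ny, int lde,
             int ldcvx, int mcv, int ldcvy)
{
    int l = min_int(nx, ny);       /* min(nx, ny) */
    return nx >= 1 && ny >= 1
        && n > nx + ny             /* n > nx + ny */
        && m >= nx + ny            /* m >= nx + ny */
        && ldz >= n
        && lde >= l
        && mcv >= l
        && ldcvx >= nx
        && ldcvy >= ny;
}
```

With these values all checks pass. Reducing $n$ to $4$ would violate ${\mathbf{n}}>{\mathbf{nx}}+{\mathbf{ny}}$ (an ifail $=1$ condition).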