NAG CL Interface
g03acc (canon_var)
1
Purpose
g03acc performs a canonical variate (canonical discrimination) analysis.
2
Specification
void g03acc (Nag_Weightstype weight,
Integer n,
Integer m,
const double x[],
Integer tdx,
const Integer isx[],
Integer nx,
const Integer ing[],
Integer ng,
const double wt[],
Integer nig[],
double cvm[],
Integer tdcvm,
double e[],
Integer tde,
Integer *ncv,
double cvx[],
Integer tdcvx,
double tol,
Integer *irankx,
NagError *fail)
The function may be called by the names: g03acc or nag_mv_canon_var.
3
Description
Let a sample of $n$ observations on $n_x$ variables in a data matrix come from $n_g$ groups with $n_1, n_2, \dots, n_{n_g}$ observations in each group, $\sum n_i = n$. Canonical variate analysis finds the linear combination of the $n_x$ variables that maximizes the ratio of between-group to within-group variation. The variables formed, the canonical variates, can then be used to discriminate between groups.
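For the special case of a single variable ($n_x = 1$) the ratio being maximized can be computed directly: the squared canonical correlation equals the between-group sum of squares divided by the total sum of squares (the correlation ratio), and dividing it by one minus itself gives the between-to-within ratio. The C sketch below illustrates only that special case; it is not how g03acc works internally, and the function name `corr_ratio_sq` and its 0-based group labels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library): for ONE variable
   y[0..n-1] with 0-based group labels g[i] in [0, ng), return
   SS_between / SS_total. For a single variable this equals the squared
   canonical correlation between y and the group indicators.
   Assumes ng <= 64 and that y is not constant (SS_total > 0). */
double corr_ratio_sq(const double *y, const int *g, size_t n, int ng)
{
    double sum[64] = {0.0};
    size_t cnt[64] = {0};
    double grand = 0.0;

    assert(ng >= 1 && ng <= 64);
    for (size_t i = 0; i < n; i++) {
        assert(g[i] >= 0 && g[i] < ng);
        sum[g[i]] += y[i];
        cnt[g[i]]++;
        grand += y[i];
    }
    grand /= (double)n;

    /* Total sum of squares about the grand mean. */
    double ss_total = 0.0;
    for (size_t i = 0; i < n; i++)
        ss_total += (y[i] - grand) * (y[i] - grand);

    /* Between-group sum of squares: sum_j n_j * (mean_j - grand)^2. */
    double ss_between = 0.0;
    for (int j = 0; j < ng; j++) {
        if (cnt[j] == 0)
            continue;                     /* empty group contributes nothing */
        double d = sum[j] / (double)cnt[j] - grand;
        ss_between += (double)cnt[j] * d * d;
    }
    assert(ss_total > 0.0);
    return ss_between / ss_total;
}
```

For example, with y = {1, 2, 3, 7, 8, 9} split into two groups of three, the between-group sum of squares is 54 and the total is 58, so the function returns 54/58 and the between-to-within ratio is 54/4 = 13.5.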
The canonical variates can be calculated from the eigenvectors of the within-group sums of squares and cross-products matrix. However, g03acc calculates the canonical variates by means of a singular value decomposition (SVD) of a matrix $V$. Let the data matrix with variable (column) means subtracted be $X$, and let its rank be $k$; then the $k$ by $(n_g - 1)$ matrix $V$ is given by:
$$V = Q_X^{\mathrm{T}} Q_g,$$
where $Q_g$ is an $n$ by $(n_g - 1)$ orthogonal matrix that defines the groups and $Q_X$ is the first $k$ columns of the orthogonal matrix $Q$, obtained either from the $QR$ decomposition of $X$:
$$X = QR$$
if $X$ is of full column rank, i.e., $k = n_x$, or else from the SVD of $X$:
$$X = QDP^{\mathrm{T}}.$$
Let the SVD of $V$ be:
$$V = U_x \Delta U_g^{\mathrm{T}};$$
then the nonzero elements of the diagonal matrix $\Delta$, $\delta_i$, for $i = 1,2,\dots,l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l = \min(k, n_g - 1)$.
The eigenvalues, $\lambda_i^2$, of the within-group sums of squares matrix are given by:
$$\lambda_i^2 = \frac{\delta_i^2}{1 - \delta_i^2},$$
and the value of $\pi_i = \lambda_i^2 / \sum_{j=1}^{l} \lambda_j^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $\pi_i$'s give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
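The two formulas above can be applied directly to a vector of canonical correlations. The following C sketch is an illustration under the stated formulas, not NAG's code; the name `cv_eigen_props` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the l canonical correlations delta[0..l-1]
   (each with |delta| < 1), compute
     lambda2[i] = delta_i^2 / (1 - delta_i^2)
     prop[i]    = lambda2[i] / sum_j lambda2[j]   */
void cv_eigen_props(const double *delta, size_t l, double *lambda2, double *prop)
{
    double total = 0.0;
    for (size_t i = 0; i < l; i++) {
        double d2 = delta[i] * delta[i];
        assert(d2 < 1.0);   /* a correlation of one makes lambda^2 infinite */
        lambda2[i] = d2 / (1.0 - d2);
        total += lambda2[i];
    }
    /* Proportions are only defined when some correlation is nonzero. */
    for (size_t i = 0; i < l; i++)
        prop[i] = (total > 0.0) ? lambda2[i] / total : 0.0;
}
```

The assertion mirrors why a canonical correlation of exactly one is treated as an error condition: $\lambda_i^2$ is then unbounded.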
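To test for a significant dimensionality greater than $i$ the $\chi^2$ statistic:
$$\left(n - 1 - n_g - \tfrac{1}{2}(k - n_g)\right) \sum_{j=i+1}^{l} \log\left(1 + \lambda_j^2\right)$$
can be used. This is asymptotically distributed as a $\chi^2$ distribution with $(k - i)(n_g - 1 - i)$ degrees of freedom. If the test for $i = h$ is not significant, then the remaining tests for $i > h$ should be ignored.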
The loadings for the canonical variates are calculated from the matrix $U_x$. This matrix is scaled so that the canonical variates have unit within-group variance.
In addition to the canonical variate loadings, the means for each canonical variate are calculated for each group.
Weights can be used with the analysis, in which case the weighted means are subtracted from each column and then each row is scaled by an amount $\sqrt{w_i}$, where $w_i$ is the weight for the $i$th observation (row).
4
References
Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall
Gnanadesikan R (1977) Methods for Statistical Data Analysis of Multivariate Observations Wiley
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25
Kendall M G and Stuart A (1979) The Advanced Theory of Statistics (3 Volumes) (4th Edition) Griffin
5
Arguments
1: weight – Nag_Weightstype Input
On entry: indicates the type of weights to be used in the analysis.
- weight = Nag_NoWeights: no weights are used.
- weight = Nag_Weightsfreq: the weights are treated as frequencies and the effective number of observations is the sum of the weights.
- weight = Nag_Weightsvar: the weights are treated as being inversely proportional to the variance of the observations and the effective number of observations is the number of observations with nonzero weights.
Constraint: weight = Nag_NoWeights, Nag_Weightsfreq or Nag_Weightsvar.
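The effective number of observations under each weighting option can be sketched as below. The enum `weight_kind` and its values are hypothetical stand-ins that mirror the three cases above; they are not the NAG `Nag_Weightstype` values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum mirroring the three options; NOT Nag_Weightstype. */
typedef enum { WT_NONE, WT_FREQ, WT_VAR } weight_kind;

/* Effective number of observations. wt may be NULL when kind == WT_NONE. */
double effective_n(weight_kind kind, const double *wt, size_t n)
{
    double eff = 0.0;
    switch (kind) {
    case WT_NONE:                       /* every observation counts once */
        return (double)n;
    case WT_FREQ:                       /* weights are frequencies: sum them */
        for (size_t i = 0; i < n; i++)
            eff += wt[i];
        return eff;
    case WT_VAR:                        /* count observations with nonzero weight */
        for (size_t i = 0; i < n; i++)
            if (wt[i] != 0.0)
                eff += 1.0;
        return eff;
    }
    assert(0 && "unknown weight kind");
    return 0.0;
}
```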
2: n – Integer Input
On entry: the number of observations, $n$.
Constraint: n ≥ nx + ng.
3: m – Integer Input
On entry: the total number of variables, m.
Constraint: m ≥ nx.
4: x[n×tdx] – const double Input
On entry: x[(i−1)×tdx+j−1] must contain the ith observation for the jth variable, for i = 1,2,…,n and j = 1,2,…,m.
5: tdx – Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
6: isx[m] – const Integer Input
On entry: isx[j−1] indicates whether or not the jth variable is to be included in the analysis.
If isx[j−1] > 0, then the variable contained in the jth column of x is included in the canonical variate analysis, for j = 1,2,…,m.
Constraint: isx[j−1] > 0 for nx values of j.
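How `isx` selects columns from the row-major array `x` can be sketched as follows. The helper `select_columns` is hypothetical (it uses `int` rather than NAG's `Integer`), and comparing its return value with `nx` only mirrors, as an illustration, the consistency check reported by NE_VAR_INCL_INDICATED.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the columns j with isx[j] > 0 from the n-by-m
   row-major array x (row stride tdx) into xs (row stride nx) and return the
   number of selected columns. If that number differs from nx, xs is left
   untouched so the caller can report the mismatch. */
size_t select_columns(const double *x, size_t n, size_t m, size_t tdx,
                      const int *isx, double *xs, size_t nx)
{
    assert(tdx >= m);
    size_t count = 0;
    for (size_t j = 0; j < m; j++)
        if (isx[j] > 0)
            count++;
    if (count != nx)
        return count;

    for (size_t i = 0; i < n; i++) {
        size_t c = 0;                       /* next output column in row i */
        for (size_t j = 0; j < m; j++)
            if (isx[j] > 0)
                xs[i * nx + c++] = x[i * tdx + j];
    }
    return count;
}
```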
7: nx – Integer Input
On entry: the number of variables in the analysis, $n_x$.
Constraint: nx ≥ 1.
8: ing[n] – const Integer Input
On entry: ing[i−1] indicates which group the ith observation is in, for i = 1,2,…,n. The effective number of groups is the number of groups with nonzero membership.
Constraint: 1 ≤ ing[i−1] ≤ ng, for i = 1,2,…,n.
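Counting group membership from `ing` and deriving the effective number of groups can be sketched as below. The name `group_counts` is hypothetical, and treating zero-weight observations as excluded from the counts is an assumption based on the description of `wt`, not a statement of exactly how the library fills `nig`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count observations per group from 1-based labels
   ing[0..n-1] (1 <= ing[i] <= ng) into nig[0..ng-1] and return the effective
   number of groups (groups with nonzero membership).
   ASSUMPTION: if wt is non-NULL, observations with zero weight are skipped. */
int group_counts(const int *ing, const double *wt, size_t n, int ng, int *nig)
{
    for (int j = 0; j < ng; j++)
        nig[j] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ing[i] >= 1 && ing[i] <= ng);
        if (wt != NULL && wt[i] == 0.0)
            continue;                       /* observation excluded */
        nig[ing[i] - 1]++;
    }
    int effective = 0;
    for (int j = 0; j < ng; j++)
        if (nig[j] > 0)
            effective++;
    return effective;
}
```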
9: ng – Integer Input
On entry: the number of groups, $n_g$.
Constraint: ng ≥ 2.
10: wt[n] – const double Input
On entry: if weight = Nag_Weightsfreq or Nag_Weightsvar then the elements of wt must contain the weights to be used in the analysis.
If wt[i−1] = 0.0 then the ith observation is not included in the analysis.
Constraints:
- wt[i−1] ≥ 0.0, for i = 1,2,…,n;
- ∑ wt[i−1] ≥ nx + effective number of groups.
Note: if weight = Nag_NoWeights then wt is not referenced and may be NULL.
11: nig[ng] – Integer Output
On exit: nig[j−1] gives the number of observations in group j, for j = 1,2,…,ng.
12: cvm[ng×tdcvm] – double Output
On exit: cvm[(i−1)×tdcvm+j−1] contains the mean of the jth canonical variate for the ith group, for i = 1,2,…,ng and j = 1,2,…,l; the remaining columns, if any, are used as workspace.
13: tdcvm – Integer Input
On entry: the stride separating matrix column elements in the array cvm.
Constraint: tdcvm ≥ nx.
14: e[min(nx,ng−1)×tde] – double Output
On exit: the statistics of the canonical variate analysis.
- e[(i−1)×tde], the canonical correlations, $\delta_i$, for i = 1,2,…,l.
- e[(i−1)×tde+1], the eigenvalues of the within-group sum of squares matrix, $\lambda_i^2$, for i = 1,2,…,l.
- e[(i−1)×tde+2], the proportion of variation explained by the ith canonical variate, for i = 1,2,…,l.
- e[(i−1)×tde+3], the $\chi^2$ statistic for the ith canonical variate, for i = 1,2,…,l.
- e[(i−1)×tde+4], the degrees of freedom for the $\chi^2$ statistic for the ith canonical variate, for i = 1,2,…,l.
- e[(i−1)×tde+5], the significance level for the $\chi^2$ statistic for the ith canonical variate, for i = 1,2,…,l.
15: tde – Integer Input
On entry: the stride separating matrix column elements in the array e.
Constraint: tde ≥ 6.
16: ncv – Integer * Output
On exit: the number of canonical variates, $l$. This will be the minimum of $n_g - 1$ and the rank of x.
17: cvx[nx×tdcvx] – double Output
On exit: the canonical variate loadings. cvx[(i−1)×tdcvx+j−1] contains the loading coefficient for the ith variable on the jth canonical variate, for i = 1,2,…,nx and j = 1,2,…,l; the remaining columns, if any, are used as workspace.
18: tdcvx – Integer Input
On entry: the stride separating matrix column elements in the array cvx.
Constraint: tdcvx ≥ ng − 1.
19: tol – double Input
On entry: the value of tol is used to decide whether the variables are of full rank and, if not, what their rank is. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, then the square root of machine precision is used instead.
Constraint: tol ≥ 0.0.
20: irankx – Integer * Output
On exit: the rank of the dependent variables.
If the variables are of full rank then irankx = nx.
If the variables are not of full rank then irankx is an estimate of the rank of the dependent variables. irankx is calculated as the number of singular values greater than tol × (largest singular value).
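The rank rule for irankx, together with the replacement of a too-small tol, can be sketched in C. The name `estimate_rank` is hypothetical, and using `DBL_EPSILON` as "machine precision" is an assumption about the exact constant the library uses.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: count singular values sv[0..p-1] greater than
   tol * (largest singular value). A non-negative tol below machine precision
   is replaced by sqrt(machine precision); DBL_EPSILON stands in for machine
   precision here (an assumption). All-zero input gives rank 0. */
size_t estimate_rank(const double *sv, size_t p, double tol)
{
    assert(tol >= 0.0);
    if (tol < DBL_EPSILON)
        tol = sqrt(DBL_EPSILON);

    double smax = 0.0;
    for (size_t i = 0; i < p; i++)
        if (sv[i] > smax)
            smax = sv[i];

    size_t rank = 0;
    for (size_t i = 0; i < p; i++)
        if (sv[i] > tol * smax)
            rank++;
    return rank;
}
```

With singular values {10, 1, 1e-10}, a tol of 0.0 is raised to about 1.5e-8, so the smallest value is treated as zero and the estimated rank is 2; a rank of 0 corresponds to the all-constant case reported by NE_RANK_ZERO.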
21: fail – NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
6
Error Indicators and Warnings
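- NE_2_INT_ARG_LT
On entry, m = ⟨value⟩ while nx = ⟨value⟩. These arguments must satisfy m ≥ nx.
On entry, tdx = ⟨value⟩ while m = ⟨value⟩. These arguments must satisfy tdx ≥ m.
On entry, tdcvm = ⟨value⟩ while nx = ⟨value⟩. These arguments must satisfy tdcvm ≥ nx.
On entry, tdcvx = ⟨value⟩ while ng = ⟨value⟩. These arguments must satisfy tdcvx ≥ ng − 1.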
- NE_3_INT_ARG_CONS
On entry, n = ⟨value⟩, nx = ⟨value⟩ and ng = ⟨value⟩. These arguments must satisfy n ≥ nx + ng.
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument weight had an illegal value.
- NE_CANON_CORR_1
A canonical correlation is equal to one. This will happen if the variables identify exactly which group every observation belongs to.
- NE_GROUPS
Either the effective number of groups is less than two, or the effective number of groups plus the number of variables, nx, is greater than the effective number of observations.
- NE_INT_ARG_LT
On entry, nx = ⟨value⟩.
Constraint: nx ≥ 1.
On entry, ng = ⟨value⟩.
Constraint: ng ≥ 2.
On entry, tde = ⟨value⟩.
Constraint: tde ≥ 6.
- NE_INTARR_INT
On entry, ing[⟨value⟩] = ⟨value⟩, ng = ⟨value⟩. Constraint: 1 ≤ ing[i−1] ≤ ng, for i = 1,2,…,n.
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_NEG_WEIGHT_ELEMENT
On entry, wt[⟨value⟩] = ⟨value⟩.
Constraint: when referenced, all elements of wt must be non-negative.
- NE_RANK_ZERO
The rank of the variables is zero. This will happen if all the variables are constants.
- NE_REAL_ARG_LT
On entry, tol must not be less than 0.0: tol = ⟨value⟩.
- NE_SVD_NOT_CONV
The singular value decomposition has failed to converge. This is an unlikely error exit.
- NE_VAR_INCL_INDICATED
The number of variables in the analysis, nx = ⟨value⟩, while the number of variables included in the analysis via array isx = ⟨value⟩.
Constraint: these two numbers must be the same.
- NE_WT_ARGS
The wt array argument must not be NULL when the weight argument indicates weights.
7
Accuracy
As the computation involves the use of orthogonal matrices and a singular value decomposition rather than the traditional computation of a sum of squares matrix and the use of an eigenvalue decomposition, g03acc should be less affected by ill-conditioned problems.
8
Parallelism and Performance
g03acc is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g03acc makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
None.
10
Example
A sample of nine observations, each consisting of three variables plus a group indicator, is read in. There are three groups. An unweighted canonical variate analysis is performed and the results are printed.
10.1
Program Text
10.2
Program Data
10.3
Program Results