NAG CL Interface
g02bxc (corrmat)

1 Purpose

g02bxc calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

2 Specification

#include <nag.h>
void g02bxc (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)
The function may be called by the names: g02bxc, nag_correg_corrmat or nag_corr_cov.

3 Description

For n observations on m variables, the one-pass algorithm of West (1979), as implemented in g02buc, is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for p selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1, \ldots, p$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i \left(x_{ij} - \bar{x}_j\right)\left(x_{ik} - \bar{x}_k\right)}{\sum_{i=1}^{n} w_i - 1}, \quad j, k = 1, \ldots, p$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \quad j = 1, \ldots, p$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \ldots, p$$
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_i$ is the weight for the $i$th observation, which is 1 in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.
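As an illustration of definitions (a) and (b), the following is a minimal sketch that evaluates them directly with a simple two-pass scheme. It is not the one-pass method used by g02bxc, and the function name weighted_mean_cov and its argument layout are hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library.
 * Evaluates the weighted means (a) and the variance-covariance matrix (b)
 * directly (two passes over the data) for all m variables.
 *   x    : n-by-m data, row-major, row stride tdx (tdx >= m)
 *   wt   : n weights, or NULL for unit weights
 *   mean : output, length m
 *   c    : output, m-by-m, row-major, row stride m
 * Returns 0 on success, -1 if the sum of weights is not greater than 1. */
int weighted_mean_cov(size_t n, size_t m, const double *x, size_t tdx,
                      const double *wt, double *mean, double *c)
{
    double sw = 0.0;
    for (size_t i = 0; i < n; ++i)
        sw += wt ? wt[i] : 1.0;
    if (sw <= 1.0)
        return -1;

    /* (a) weighted means */
    for (size_t j = 0; j < m; ++j) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            s += (wt ? wt[i] : 1.0) * x[i * tdx + j];
        mean[j] = s / sw;
    }

    /* (b) weighted cross-products about the means, divided by sum(w) - 1 */
    for (size_t j = 0; j < m; ++j) {
        for (size_t k = 0; k < m; ++k) {
            double s = 0.0;
            for (size_t i = 0; i < n; ++i) {
                double w = wt ? wt[i] : 1.0;
                s += w * (x[i * tdx + j] - mean[j]) * (x[i * tdx + k] - mean[k]);
            }
            c[j * m + k] = s / (sw - 1.0);
        }
    }
    return 0;
}
```

The standard deviations (c) and correlations (d) then follow from the diagonal and off-diagonal elements of c. A direct evaluation like this is mainly useful for checking results on small data sets.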

4 References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–555

5 Arguments

1: n Integer Input
On entry: the number of observations in the dataset, n.
Constraint: n > 1.
2: m Integer Input
On entry: the total number of variables, m.
Constraint: m ≥ 1.
3: x[n×tdx] const double Input
On entry: the data. x[(i-1)×tdx+j-1] must contain the $i$th observation on the $j$th variable, $x_{ij}$, for i = 1, 2, …, n and j = 1, 2, …, m.
4: tdx Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
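The storage convention for x can be sketched as follows. The helper names obs_value and copy_variable are hypothetical and only illustrate how the 1-based indices i and j map to the row-major array with stride tdx.

```c
#include <stddef.h>

/* Hypothetical accessor, not part of the NAG Library: returns x_ij, the
 * i-th observation on the j-th variable (both 1-based), from an array
 * stored with stride tdx between consecutive observations. */
double obs_value(const double *x, size_t tdx, size_t i, size_t j)
{
    return x[(i - 1) * tdx + (j - 1)];
}

/* Copies all n observations of variable j (1-based) into out[0..n-1]. */
void copy_variable(const double *x, size_t n, size_t tdx, size_t j,
                   double *out)
{
    for (size_t i = 1; i <= n; ++i)
        out[i - 1] = obs_value(x, tdx, i, j);
}
```

When tdx is larger than m, the trailing tdx - m entries of each row are padding and are never read by these helpers.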
5: sx[m] const Integer Input
On entry: indicates which p variables to include in the analysis.
sx[j-1] > 0
The $j$th variable is to be included.
sx[j-1] = 0
The $j$th variable is not to be included.
sx is set to NULL
All variables are included in the analysis, i.e., p = m.
Constraint: sx[j-1] ≥ 0, for j = 1, 2, …, m.
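The selection rules for sx can be sketched as below. The function count_selected is hypothetical, and plain int is used here as a stand-in for the NAG Integer type; the return codes are invented for the example and merely mirror the NE_NEG_SX and NE_POS_SX conditions listed in Section 6.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library.
 * Counts the p variables selected by sx (int stands in for Integer).
 * Returns  0 on success, with the count stored in *p;
 *         -1 if any element of sx is negative;
 *         -2 if no element of sx is positive. */
int count_selected(size_t m, const int *sx, size_t *p)
{
    if (sx == NULL) {          /* NULL selects every variable: p = m */
        *p = m;
        return 0;
    }
    size_t count = 0;
    for (size_t j = 0; j < m; ++j) {
        if (sx[j] < 0)
            return -1;
        if (sx[j] > 0)
            ++count;
    }
    if (count == 0)
        return -2;
    *p = count;
    return 0;
}
```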
6: wt[n] const double Input
On entry: w, the optional frequency weighting for each observation, with wt[i-1] = $w_i$. Usually $w_i$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then $w_i$ is set to 1 for all i.
Constraints:
if wt is not NULL,
  • wt[i-1] ≥ 0.0, for i = 1, 2, …, n;
  • $\sum_{i=1}^{n}$ wt[i-1] > 1.0.
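A minimal sketch of these weight checks follows. The function check_weights and its return codes are hypothetical; it applies the constraints exactly as stated above and computes the value that would be reported in sw.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library.
 * Validates the weights and stores their sum in *sw.
 * With wt == NULL every weight is 1, so the sum is n.
 * Returns  0 on success;
 *         -1 if any weight is negative;
 *         -2 if the sum of weights is not greater than 1.0. */
int check_weights(size_t n, const double *wt, double *sw)
{
    double sum = 0.0;
    if (wt == NULL) {
        sum = (double)n;
    } else {
        for (size_t i = 0; i < n; ++i) {
            if (wt[i] < 0.0)
                return -1;
            sum += wt[i];   /* zero weights drop the observation */
        }
    }
    if (sum <= 1.0)
        return -2;
    *sw = sum;
    return 0;
}
```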
7: sw double * Output
On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, n.
8: wmean[m] double Output
On exit: the sample means. wmean[j-1] contains the mean for the $j$th variable.
9: std[m] double Output
On exit: the standard deviations. std[j-1] contains the standard deviation for the $j$th variable.
10: r[m×tdr] double Output
On exit: the matrix of Pearson product-moment correlation coefficients. r[(j-1)×tdr+k-1] contains the correlation between variables j and k, for j, k = 1, …, p.
11: tdr Integer Input
On entry: the stride separating matrix column elements in the array r.
Constraint: tdr ≥ m.
12: v[m×tdv] double Output
On exit: the variance-covariance matrix. v[(j-1)×tdv+k-1] contains the covariance between variables j and k, for j, k = 1, …, p.
13: tdv Integer Input
On entry: the stride separating matrix column elements in the array v.
Constraint: tdv ≥ m.
14: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, tdr = value while m = value. These arguments must satisfy tdr ≥ m.
On entry, tdv = value while m = value. These arguments must satisfy tdv ≥ m.
On entry, tdx = value while m = value. These arguments must satisfy tdx ≥ m.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: n = value.
NE_INT_ARG_LT
On entry, m = value.
Constraint: m ≥ 1.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
A variable has zero variance.
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.

7 Accuracy

For a discussion of the accuracy of the one-pass algorithm, see Chan et al. (1982) and West (1979).
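To make the idea of a one-pass update concrete, here is a minimal single-variable sketch in the style of the weighted updating formula of West (1979). The type west_acc and the function names are hypothetical, and the code is an assumption about the general form of the update, not the code used in g02buc or g02bxc.

```c
#include <stddef.h>

/* Hypothetical accumulator for one variable, not part of the NAG Library. */
typedef struct {
    double sumw;  /* running sum of weights                              */
    double mean;  /* running weighted mean                               */
    double t;     /* running weighted sum of squares about the mean      */
} west_acc;

void west_init(west_acc *a)
{
    a->sumw = 0.0;
    a->mean = 0.0;
    a->t = 0.0;
}

/* Folds in one observation x with weight w, without revisiting old data. */
void west_add(west_acc *a, double x, double w)
{
    if (w <= 0.0)
        return;                    /* zero weight: observation is ignored */
    double q = x - a->mean;        /* deviation from the current mean     */
    double new_sumw = a->sumw + w;
    double r = q * w / new_sumw;   /* shift applied to the mean           */
    a->mean += r;
    a->t += r * a->sumw * q;       /* uses the sum of weights before x    */
    a->sumw = new_sumw;
}

/* Variance with denominator sum(w) - 1; requires sumw > 1. */
double west_variance(const west_acc *a)
{
    return a->t / (a->sumw - 1.0);
}
```

Because the sum of squares is accumulated about the running mean rather than formed from raw sums of squares, this kind of update avoids the cancellation that affects the textbook one-pass formula.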

8 Parallelism and Performance

g02bxc is not threaded in any implementation.

9 Further Comments

Correlation coefficients based on ranks can be computed using g02brc.

10 Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.

10.1 Program Text

Program Text (g02bxce.c)

10.2 Program Data

Program Data (g02bxce.d)

10.3 Program Results

Program Results (g02bxce.r)