This manual relates to an old release of the Library.
The documentation for the current release is also available on this site.

NAG CL Interface
g02bxc (corrmat)

1 Purpose

g02bxc calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

2 Specification

#include <nag.h>
void  g02bxc (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)
The function may be called by the names: g02bxc, nag_correg_corrmat or nag_corr_cov.

3 Description

For n observations on m variables, the one-pass algorithm of West (1979), as implemented in g02buc, is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for p selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1, \ldots, p$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sum_{i=1}^{n} w_i - 1}, \quad j, k = 1, \ldots, p$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \quad j = 1, \ldots, p$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \ldots, p$$
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_i$ is the weight for the $i$th observation, which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.

4 References

Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: an improved method. Comm. ACM 22 532–535

5 Arguments

1: n Integer Input
On entry: the number of observations in the dataset, n.
Constraint: n > 1.
2: m Integer Input
On entry: the total number of variables, m.
Constraint: m ≥ 1.
3: x[n×tdx] const double Input
On entry: the data. x[(i-1)×tdx+j-1] must contain the ith observation on the jth variable, $x_{ij}$, for i = 1,2,…,n and j = 1,2,…,m.
4: tdx Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
5: sx[m] const Integer Input
On entry: indicates which p variables to include in the analysis.
sx[j-1] > 0
The j th variable is to be included.
sx[j-1] = 0
The j th variable is not to be included.
sx is set to NULL
All variables are included in the analysis, i.e., p = m.
Constraint: sx[i-1] ≥ 0, for i = 1,2,…,m.
6: wt[n] const double Input
On entry: w, the optional frequency weighting for each observation, with wt[i-1] = $w_i$. Usually $w_i$ will be an integral value corresponding to the number of observations associated with the ith data value, or zero if the ith data value is to be ignored. If wt is NULL then $w_i$ is set to 1 for all i.
Constraints:
if wt is not NULL,
  • wt[i-1] ≥ 0.0, for i = 1,2,…,n;
  • $\sum_{i=1}^{n}$ wt[i-1] > 1.0.
7: sw double * Output
On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, n.
8: wmean[m] double Output
On exit: the sample means. wmean[j-1] contains the mean for the jth variable.
9: std[m] double Output
On exit: the standard deviations. std[j-1] contains the standard deviation for the jth variable.
10: r[m×tdr] double Output
On exit: the matrix of Pearson product-moment correlation coefficients. r[(j-1)×tdr+k-1] contains the correlation between variables j and k, for j, k = 1,…,p.
11: tdr Integer Input
On entry: the stride separating matrix column elements in the array r.
Constraint: tdr ≥ m.
12: v[m×tdv] double Output
On exit: the variance-covariance matrix. v[(j-1)×tdv+k-1] contains the covariance between variables j and k, for j, k = 1,…,p.
13: tdv Integer Input
On entry: the stride separating matrix column elements in the array v.
Constraint: tdv ≥ m.
14: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
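
The storage convention and the input constraints listed above can be illustrated without calling the library. The sketch below is an illustration under stated assumptions, not NAG code: `element` and `check_inputs` are hypothetical names, and `long` stands in for the NAG `Integer` type, whose actual width depends on the implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for the NAG Integer type (an assumption). */
typedef long Integer;

/* Element x_ij (1-based i and j) of a row-major array with stride tdx. */
double element(const double x[], Integer tdx, Integer i, Integer j)
{
    return x[(i - 1) * tdx + (j - 1)];
}

/* Checks the documented constraints on n, m, tdx, sx and wt.
 * On success returns p, the number of selected variables (p = m when
 * sx is NULL); returns -1 if any constraint is violated. */
Integer check_inputs(Integer n, Integer m, Integer tdx,
                     const Integer sx[], const double wt[])
{
    if (n <= 1 || m < 1 || tdx < m)
        return -1;

    Integer p = m;
    if (sx != NULL) {
        p = 0;
        for (Integer j = 0; j < m; ++j) {
            if (sx[j] < 0)
                return -1;          /* negative selector */
            if (sx[j] > 0)
                ++p;
        }
        if (p == 0)
            return -1;              /* no variable selected */
    }

    if (wt != NULL) {
        double sw = 0.0;
        for (Integer i = 0; i < n; ++i) {
            if (wt[i] < 0.0)
                return -1;          /* negative weight */
            sw += wt[i];
        }
        if (sw <= 1.0)
            return -1;              /* sum of weights must exceed 1 */
    }
    return p;
}
```

A stride tdx larger than m leaves unused padding at the end of each row, which is why the constraint is tdx ≥ m rather than tdx = m.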

6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, tdr=value while m=value. These arguments must satisfy tdr ≥ m.
On entry, tdv=value while m=value. These arguments must satisfy tdv ≥ m.
On entry, tdx=value while m=value. These arguments must satisfy tdx ≥ m.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: n=value.
NE_INT_ARG_LT
On entry, m=value.
Constraint: m ≥ 1.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
A variable has zero variance.
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.

7 Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
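
To give a feel for the kind of one-pass update these references analyse, the sketch below accumulates a weighted mean and a weighted sum of squared deviations for a single variable, following the recurrence given by West (1979). It is a simplified sketch for one variable, not the code of g02buc or g02bxc; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical running state for one variable; initialise all to 0. */
typedef struct {
    double sw;    /* sum of weights so far */
    double mean;  /* weighted mean so far */
    double t;     /* weighted sum of squared deviations about the mean */
} RunningStats;

/* Folds one observation x with weight w into the running state. */
void running_update(RunningStats *s, double x, double w)
{
    if (w <= 0.0)
        return;                  /* zero weight: observation is ignored */
    double q = x - s->mean;      /* deviation from the current mean */
    double sw_new = s->sw + w;
    double r = q * w / sw_new;   /* shift applied to the mean */
    s->mean += r;
    s->t += r * s->sw * q;
    s->sw = sw_new;
}

/* Variance with denominator (sum of weights - 1), as in Section 3. */
double running_variance(const RunningStats *s)
{
    return s->t / (s->sw - 1.0);
}
```

Because each update works with deviations from the current mean rather than raw sums of squares, the data are read only once and the subtraction of two large, nearly equal totals is avoided.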

8 Parallelism and Performance

g02bxc is not threaded in any implementation.

9 Further Comments

Correlation coefficients based on ranks can be computed using g02brc.

10 Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.

10.1 Program Text

Program Text (g02bxce.c)

10.2 Program Data

Program Data (g02bxce.d)

10.3 Program Results

Program Results (g02bxce.r)