void g02bxc (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)

The function may be called by the names: g02bxc, nag_correg_corrmat or nag_corr_cov.

3 Description

For $n$ observations on $m$ variables the one-pass algorithm of West (1979) as implemented in g02buc is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:

(a) The means

$$\bar{x}_{j} = \frac{\sum_{i=1}^{n} w_{i} x_{ij}}{\sum_{i=1}^{n} w_{i}}, \quad j = 1, \dots, p$$

(b) The variance-covariance matrix

$$C_{jk} = \frac{\sum_{i=1}^{n} w_{i} (x_{ij} - \bar{x}_{j}) (x_{ik} - \bar{x}_{k})}{\sum_{i=1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p$$

(c) The standard deviations

$$s_{j} = \sqrt{C_{jj}}, \quad j = 1, \dots, p$$

(d) The Pearson product-moment correlation coefficients

$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \dots, p$$

where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_{i}$ is the weight for the $i$th observation, which will be 1 in the unweighted case.

Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_{i} - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.

4 References

Chan T F, Golub G H and LeVeque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances. Compstat, Physica-Verlag

West D H D (1979) Updating mean and variance estimates: An improved method. Comm. ACM 22 532–535

5 Arguments

1: $n$ – Integer Input

On entry: the number of observations in the dataset, $n$.

Constraint: $n > 1$.

2: $m$ – Integer Input

On entry: the total number of variables, $m$.

Constraint: $m \geq 1$.

3: $x [n \times tdx]$ – const double Input

On entry: the data. $x[(i - 1) \times tdx + j - 1]$ must contain the $i$th observation on the $j$th variable, $x_{ij}$, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

4: $tdx$ – Integer Input

On entry: the stride separating matrix column elements in the array x.

Constraint: $tdx \geq m$.
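The indexing rule for x and tdx describes a row-major layout in which consecutive observations are tdx elements apart. As a small illustration, the accessor below applies that rule directly; `x_at` is a hypothetical name, and `size_t` stands in for the library's Integer type.

```c
#include <stddef.h>

/* Hypothetical accessor (not part of the NAG Library): value of the
 * i-th observation on the j-th variable, both 1-based as in the text,
 * from a row-major array x whose rows are tdx elements apart. */
static double x_at(const double *x, size_t tdx, size_t i, size_t j)
{
    return x[(i - 1) * tdx + (j - 1)];
}
```

When tdx is larger than m, the trailing elements of each row are padding and are never addressed by this rule.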

5: $sx [m]$ – const Integer Input

On entry: indicates which $p$ variables to include in the analysis.

$sx [j - 1] > 0$: The $j$th variable is to be included.
$sx [j - 1] = 0$: The $j$th variable is not to be included.
sx is set to NULL: All variables are included in the analysis, i.e., $p = m$.

Constraint: $sx [i - 1] \geq 0$, for $i = 1, 2, \dots, m$, and at least one element of sx must be positive.
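The selection rule for sx determines $p$. The sketch below counts the selected variables and flags the two invalid cases listed under the error indicators; `count_selected` is a hypothetical helper, and `long` stands in for the library's Integer type.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library): returns p, the
 * number of selected variables, or -1 if any element of sx is
 * negative. A NULL sx selects all m variables. */
static long count_selected(long m, const long *sx)
{
    if (sx == NULL)
        return m;
    long p = 0;
    for (long j = 0; j < m; j++) {
        if (sx[j] < 0)
            return -1;  /* the case reported as NE_NEG_SX */
        if (sx[j] > 0)
            p++;
    }
    return p;           /* p == 0 is the case reported as NE_POS_SX */
}
```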

6: $wt [n]$ – const double Input

On entry: $w$, the optional frequency weighting for each observation, with $wt [i - 1] = w_{i}$. Usually $w_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then $w_{i}$ is set to $1$ for all $i$.

Constraints: if wt is not NULL,

$wt [i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$;
$\sum_{i = 1}^{n} wt [i - 1] > 1.0$.
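The constraints on wt can be checked before calling the function. The following is a sketch of such a pre-check under the rules stated above; `check_weights` and its return codes are hypothetical and do not correspond to library error codes.

```c
#include <stddef.h>

/* Hypothetical pre-check (not part of the NAG Library) mirroring the
 * constraints on wt. Returns 0 and stores the sum of weights in *sw
 * when the constraints hold, 1 if a weight is negative (*sw is left
 * untouched), and 2 if the sum of weights does not exceed 1.0. */
static int check_weights(size_t n, const double *wt, double *sw)
{
    double sum = 0.0;
    if (wt == NULL) {           /* unweighted: every w_i is 1 */
        *sw = (double) n;
        return n > 1 ? 0 : 2;
    }
    for (size_t i = 0; i < n; i++) {
        if (wt[i] < 0.0)
            return 1;
        sum += wt[i];
    }
    *sw = sum;
    return sum > 1.0 ? 0 : 2;
}
```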

7: $sw$ – double * Output

On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, $n$.

8: $wmean [m]$ – double Output

On exit: the sample means. $wmean [j - 1]$ contains the mean for the $j$th variable.

9: $std [m]$ – double Output

On exit: the standard deviations. $std [j - 1]$ contains the standard deviation for the $j$th variable.

10: $r [m \times tdr]$ – double Output

On exit: the matrix of Pearson product-moment correlation coefficients. $r [(j - 1) \times tdr + k - 1]$ contains the correlation between variables $j$ and $k$, for $j, k = 1, \dots, p$.

11: $tdr$ – Integer Input

On entry: the stride separating matrix column elements in the array r.

Constraint: $tdr \geq m$.

12: $v [m \times tdv]$ – double Output

On exit: the variance-covariance matrix. $v [(j - 1) \times tdv + k - 1]$ contains the covariance between variables $j$ and $k$, for $j, k = 1, \dots, p$.

13: $tdv$ – Integer Input

On entry: the stride separating matrix column elements in the array v.

Constraint: $tdv \geq m$.

14: $fail$ – NagError * Input/Output

The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_2_INT_ARG_LT: On entry, $tdr = ⟨ value ⟩$ while $m = ⟨ value ⟩$ . These arguments must satisfy $tdr \geq m$ .

On entry, $tdv = ⟨ value ⟩$ while $m = ⟨ value ⟩$ . These arguments must satisfy $tdv \geq m$ .

On entry, $tdx = ⟨ value ⟩$ while $m = ⟨ value ⟩$ . These arguments must satisfy $tdx \geq m$ .
NE_ALLOC_FAIL: Dynamic memory allocation failed.
NE_INT_ARG_LE: On entry, n must be greater than 1: $n = ⟨ value ⟩$ .
NE_INT_ARG_LT: On entry, $m = ⟨ value ⟩$ .
Constraint: $m \geq 1$ .
NE_NEG_SX: On entry, at least one element of sx is negative.
NE_NEG_WEIGHT: On entry, at least one of the weights is negative.
NE_POS_SX: On entry, no element of sx is positive.
NE_SW_LT_ONE: On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO: A variable has zero variance.
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.

7 Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
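To make the one-pass idea concrete, the following is a sketch of a West-style weighted update for a single variable, assuming the commonly quoted form of the recurrence. The implementation in g02buc may differ in detail, and `west_variance` is a hypothetical name.

```c
#include <stddef.h>

/* Sketch of a West-style weighted one-pass update for one variable
 * (hypothetical helper, not part of the NAG Library). Returns the
 * variance with denominator sum(w) - 1, and stores the weighted mean
 * in *mean and the sum of weights in *sw. wt may be NULL, in which
 * case every weight is 1. */
static double west_variance(size_t n, const double *x, const double *wt,
                            double *mean, double *sw)
{
    double sumw = 0.0, m = 0.0, s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        if (w == 0.0)
            continue;                  /* zero weight: observation ignored */
        double new_sumw = sumw + w;
        double delta = x[i] - m;       /* deviation from the running mean */
        double incr = delta * w / new_sumw;
        m += incr;                     /* updated weighted mean */
        s += sumw * delta * incr;      /* updated weighted sum of squares */
        sumw = new_sumw;
    }
    *mean = m;
    *sw = sumw;
    return s / (sumw - 1.0);
}
```

Each observation is read once and only the running mean, running sum of squares and running sum of weights are kept, which avoids forming the raw sum of squares that causes cancellation in the textbook formula.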

8 Parallelism and Performance

g02bxc is not threaded in any implementation.

9 Further Comments

Correlation coefficients based on ranks can be computed using g02brc.

10 Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.
