NAG Library Routine Document

g02bxf (corrmat)

1 Purpose

g02bxf calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.

2 Specification

Fortran Interface

Subroutine g02bxf (weight, n, m, x, ldx, wt, xbar, std, v, ldv, r, ifail)

Integer, Intent (In)	::	n, m, ldx, ldv
Integer, Intent (Inout)	::	ifail
Real (Kind=nag_wp), Intent (In)	::	x(ldx,m), wt(*)
Real (Kind=nag_wp), Intent (Inout)	::	v(ldv,m), r(ldv,m)
Real (Kind=nag_wp), Intent (Out)	::	xbar(m), std(m)
Character (1), Intent (In)	::	weight

C Header Interface

#include <nagmk26.h>

void	g02bxf_ (const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const double wt[], double xbar[], double std[], double v[], const Integer *ldv, double r[], Integer *ifail, const Charlen length_weight)

3 Description

For $n$ observations on $m$ variables the one-pass algorithm of West (1979) as implemented in g02buf is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:

(a) The means

{\bar{x}}_{j} = \frac{\sum_{i = 1}^{n} w_{i} x_{i j}}{\sum_{i = 1}^{n} w_{i}}, \quad j = 1, \dots, p

(b) The variance-covariance matrix

C_{j k} = \frac{\sum_{i = 1}^{n} w_{i} (x_{i j} - {\bar{x}}_{j}) (x_{i k} - {\bar{x}}_{k})}{\sum_{i = 1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p

(c) The standard deviations

s_{j} = \sqrt{C_{j j}}, \quad j = 1, \dots, p

(d) The Pearson product-moment correlation coefficients

R_{j k} = \frac{C_{j k}}{\sqrt{C_{j j} C_{k k}}}, \quad j, k = 1, \dots, p

where $x_{i j}$ is the value of the $i$th observation on the $j$th variable and $w_{i}$ is the weight for the $i$th observation, which will be $1$ in the unweighted case.

Note that the denominator for the variance-covariance matrix is $\sum_{i = 1}^{n} w_{i} - 1$, so the weights should be scaled so that the sum of the weights reflects the true sample size.
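The definitions above can be evaluated in a single pass without storing deviations. The C sketch below is a hypothetical helper, west_cov (its name and argument order are assumptions of this example, not part of the NAG Library), showing West-style weighted updating: when an observation with weight $w_i > 0$ arrives, the running total $W$ becomes $W + w_i$, every cross-product sum gains $w_i W_{\mathrm{old}} / W_{\mathrm{new}}$ times the product of the deviations from the current means, and each mean then moves by $w_i / W_{\mathrm{new}}$ times its deviation. The final divisor is $\sum w_i - 1$, as in formula (b). It illustrates the technique under these assumptions and is not the library's own code.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not the NAG implementation).
   Weighted one-pass updating of means and the variance-covariance matrix,
   in the spirit of West (1979). Indices are 0-based here, with the same
   column-major layout as the Fortran arrays: x(i, j) is x[i + j * ldx] and
   v(j, k) is v[j + k * ldv]. Pass wt == NULL for unit weights.
   Returns the sum of the weights W; v is only meaningful if W > 1. */
double west_cov(size_t n, size_t m, const double *x, size_t ldx,
                const double *wt, double *xbar, double *v, size_t ldv)
{
    double wsum = 0.0;
    for (size_t k = 0; k < m; ++k) {
        xbar[k] = 0.0;
        for (size_t j = 0; j < m; ++j)
            v[j + k * ldv] = 0.0;
    }
    for (size_t i = 0; i < n; ++i) {
        double w = (wt != NULL) ? wt[i] : 1.0;
        if (w == 0.0)
            continue;                 /* zero weight: observation ignored */
        double wold = wsum;
        wsum += w;
        double g = w * wold / wsum;   /* factor w * W_old / W_new */
        /* Cross-product sums use deviations from the means before this
           observation is included. */
        for (size_t k = 0; k < m; ++k) {
            double dk = x[i + k * ldx] - xbar[k];
            for (size_t j = 0; j < m; ++j)
                v[j + k * ldv] += g * (x[i + j * ldx] - xbar[j]) * dk;
        }
        /* Then each mean moves a fraction w / W_new towards the observation. */
        for (size_t j = 0; j < m; ++j)
            xbar[j] += (w / wsum) * (x[i + j * ldx] - xbar[j]);
    }
    /* Divide the cross-product sums by sum(w) - 1, as in formula (b). */
    for (size_t k = 0; k < m; ++k)
        for (size_t j = 0; j < m; ++j)
            v[j + k * ldv] /= wsum - 1.0;
    return wsum;
}
```

With wt = NULL, $x_1 = (1, 2, 3, 4)$ and $x_2 = (2, 4, 6, 9)$ it should return means 2.5 and 5.25, variances 5/3 and 26.75/3, and covariance 11.5/3. With weights (2, 1, 0) on the single variable (1, 4, 100) it gives mean 2 and variance 3, the same as the unweighted sample (1, 1, 4), which is the frequency interpretation of the weights.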

4 References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag

West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

5 Arguments

1: $weight$ – Character(1) Input

On entry: indicates whether weights are to be used.

$weight ='U'$: Weights are not used and unit weights are assumed.
$weight ='W'$ or $'V'$: Weights are used and must be supplied in wt. The only difference between $weight ='W'$ and $weight ='V'$ is in computing the variance. If $weight ='W'$ the divisor for the variance is the sum of the weights minus one, and if $weight ='V'$ the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.

Constraint: $weight ='U'$, $'V'$ or $'W'$.
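As a small sketch based only on the description above, the hypothetical helper variance_divisor (not a NAG routine) returns the divisor selected by each option; 'U' gives $n - 1$, which is formula (b) with every $w_i = 1$. The -1.0 return for an unrecognized option is a convention of this sketch, not library behaviour.

```c
#include <stddef.h>

/* Hypothetical helper: the variance divisor implied by the weight option,
   following the argument description. wt is not read when weight == 'U'.
   Returns -1.0 for an unrecognized option (illustrative sentinel only). */
double variance_divisor(char weight, size_t n, const double *wt)
{
    if (weight == 'U')
        return (double)n - 1.0;              /* unit weights: n - 1 */
    if (weight != 'W' && weight != 'V')
        return -1.0;
    double sum = 0.0;
    size_t nonzero = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += wt[i];
        if (wt[i] != 0.0)
            ++nonzero;
    }
    return (weight == 'W') ? sum - 1.0               /* sum of weights - 1 */
                           : (double)nonzero - 1.0;  /* nonzero weights - 1 */
}
```

For weights (2, 0, 1, 3) it gives 5 for 'W' but 2 for 'V', since only three observations carry a nonzero weight.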

2: $n$ – Integer Input

On entry: the number of data observations in the sample.

Constraint: $n > 1$.

3: $m$ – Integer Input

On entry: the number of variables.

Constraint: $m \geq 1$.

4: $x (ldx, m)$ – Real (Kind=nag_wp) array Input

On entry: $x (i, j)$ must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

5: $ldx$ – Integer Input

On entry: the first dimension of the array x as declared in the (sub)program from which g02bxf is called.

Constraint: $ldx \geq n$.
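The C header interface passes x as a flat const double x[] array, while Fortran stores an array declared x(ldx, m) column by column. Assuming the C array is laid out exactly as that Fortran declaration (the standard convention), element $x(i, j)$ sits at offset $(i - 1) + (j - 1) \times ldx$, and rows $n + 1$ to $ldx$ of each column are padding rather than data. The helper name fortran_elem is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: read element x(i, j) of a Fortran array declared
   x(ldx, m) from C. i and j are 1-based, as in this document; Fortran
   stores arrays column by column, so consecutive i within a column are
   adjacent in memory and columns are ldx elements apart. */
double fortran_elem(const double *x, size_t ldx, size_t i, size_t j)
{
    return x[(i - 1) + (j - 1) * ldx];
}
```

For example, with $n = 3$, $m = 2$ and $ldx = 4$, the array {1, 2, 3, pad, 4, 5, 6, pad} holds variable 1 as (1, 2, 3) and variable 2 as (4, 5, 6), so $x(3, 2)$ is 6.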

6: $wt (*)$ – Real (Kind=nag_wp) array Input

Note: the dimension of the array wt must be at least $n$ if $weight ='W'$ or $'V'$.

On entry: $w$, the optional frequency weighting for each observation, with $wt (i) = w_{i}$. Usually $w_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If $weight ='U'$, $w_{i}$ is set to $1$ for all $i$ and wt is not referenced.

Constraint: if $weight ='W'$ or $'V'$, $\sum_{i = 1}^{n} wt (i) > 1.0$ and $wt (i) \geq 0.0$, for $i = 1, 2, \dots, n$.
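When weight = 'W' or 'V', the weight constraints can be checked in user code before the call. The C function below is a hypothetical sketch (check_weights is not a NAG routine); its return values reuse the ifail codes that Section 6 lists for these conditions, including the requirement stated there that at least two observations have a nonzero weight.

```c
#include <stddef.h>

/* Hypothetical check of the wt constraints for weight = 'W' or 'V'.
   Return values follow the ifail codes of Section 6:
   3 if some wt(i) < 0,
   4 if fewer than two observations have nonzero weight or the weights
     sum to 1 or less,
   0 otherwise. */
int check_weights(size_t n, const double *wt)
{
    double sum = 0.0;
    size_t nonzero = 0;
    for (size_t i = 0; i < n; ++i) {
        if (wt[i] < 0.0)
            return 3;
        sum += wt[i];
        if (wt[i] > 0.0)
            ++nonzero;
    }
    if (nonzero < 2 || sum <= 1.0)
        return 4;
    return 0;
}
```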

7: $xbar (m)$ – Real (Kind=nag_wp) array Output

On exit: the sample means. $xbar (j)$ contains the mean of the $j$th variable.

8: $std (m)$ – Real (Kind=nag_wp) array Output

On exit: the standard deviations. $std (j)$ contains the standard deviation for the $j$th variable.

9: $v (ldv, m)$ – Real (Kind=nag_wp) array Output

On exit: the variance-covariance matrix. $v (j, k)$ contains the covariance between variables $j$ and $k$, for $j = 1, 2, \dots, m$ and $k = 1, 2, \dots, m$.

10: $ldv$ – Integer Input

On entry: the first dimension of the arrays r and v as declared in the (sub)program from which g02bxf is called.

Constraint: $ldv \geq m$.

11: $r (ldv, m)$ – Real (Kind=nag_wp) array Output

On exit: the matrix of Pearson product-moment correlation coefficients. $r (j, k)$ contains the correlation coefficient between variables $j$ and $k$.

12: $ifail$ – Integer Input/Output

On entry: $ifail$ must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.

For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, because for this routine the values of the output arguments may be useful even if $ifail \neq 0$ on exit, the recommended value is $-1$. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Note: g02bxf may return useful information for one or more of the following detected errors or warnings.

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $ldv = 〈value〉$ and $m = 〈value〉$ .
Constraint: $ldv \geq m$ .

On entry, $ldx = 〈value〉$ and $n = 〈value〉$ .
Constraint: $ldx \geq n$ .

On entry, $m = 〈value〉$ .
Constraint: $m \geq 1$ .

On entry, $n = 〈value〉$ .
Constraint: $n > 1$ .

$ifail = 2$: On entry, $weight = 〈value〉$ .
Constraint: $weight ='U'$ , $'V'$ or $'W'$ .

$ifail = 3$: On entry, at least one value of wt is negative.
Constraint: $wt (i) \geq 0$ , for $i = 1, 2, \dots, n$ .

$ifail = 4$: On entry, $〈value〉$ observations have nonzero weight.
Constraint: at least two observations must have a nonzero weight.

On entry, the sum of the weights is $〈value〉$ .
Constraint: the sum of the weights must be greater than $1$ .

$ifail = 5$: A variable has a zero variance. In this case v and std are returned as calculated but r will contain zero for any correlation involving a variable with zero variance.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.
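The ifail = 1 and ifail = 2 conditions involve only scalar arguments, so they can be screened before the call. The C function below is a hypothetical sketch (check_scalar_args is not part of the NAG Library); the order in which it tests the constraints is an assumption, and it does not cover the weight checks (ifail = 3, 4) or the zero-variance warning (ifail = 5), which needs the computed variances.

```c
/* Hypothetical pre-call screen of the scalar arguments of g02bxf.
   Returns the ifail value Section 6 associates with the first violated
   constraint found, or 0. The order of the checks is an assumption. */
int check_scalar_args(char weight, long n, long m, long ldx, long ldv)
{
    /* ifail = 1: problem size and array dimensions */
    if (n <= 1 || m < 1 || ldx < n || ldv < m)
        return 1;
    /* ifail = 2: unrecognized weighting option */
    if (weight != 'U' && weight != 'V' && weight != 'W')
        return 2;
    return 0;
}
```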

7 Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
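Why an updating algorithm matters can be seen with a single unweighted variable. The C sketch below (both function names are hypothetical) contrasts the textbook one-pass formula, which accumulates $\sum x_i$ and $\sum x_i^2$, with a West-style update of the running mean and sum of squared deviations. It illustrates the cancellation problem discussed in the references; it is not a statement of the error bounds of g02bxf.

```c
#include <stddef.h>

/* Hypothetical helpers, unit weights, n >= 2 assumed. */

/* Textbook one-pass formula (sum x^2 - (sum x)^2 / n) / (n - 1):
   subtracts two large, nearly equal numbers when the mean is large
   compared with the spread, so significant digits cancel. */
double var_naive(size_t n, const double *x)
{
    double s = 0.0, q = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x[i];
        q += x[i] * x[i];
    }
    return (q - s * s / (double)n) / (double)(n - 1);
}

/* West-style updating: the running mean and the sum of squared deviations
   are updated together, so only deviations from the mean are accumulated. */
double var_updating(size_t n, const double *x)
{
    double mean = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = x[i] - mean;          /* deviation from the old mean */
        mean += d / (double)(i + 1);     /* new mean */
        ss += d * (x[i] - mean);         /* old deviation times new deviation */
    }
    return ss / (double)(n - 1);
}
```

For the data $10^9 + 4$, $10^9 + 7$, $10^9 + 13$, $10^9 + 16$, whose variance is 30, the updating version returns 30 in IEEE double precision, while the textbook formula loses every significant digit and returns a negative value.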

8 Parallelism and Performance

g02bxf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

None.

10 Example

The data are some of the results from the 1988 Olympic Decathlon. They are the times (in seconds) for the 100m and 400m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input and the correlation matrix is computed and printed.
