g02bx calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.

Syntax

C#
public static void g02bx( string weight, int n, int m, double[,] x, double[] wt, double[] xbar, double[] std, double[,] v, double[,] r, out int ifail )

Visual Basic

Public Shared Sub g02bx ( _
	weight As String, _
	n As Integer, _
	m As Integer, _
	x As Double(,), _
	wt As Double(), _
	xbar As Double(), _
	std As Double(), _
	v As Double(,), _
	r As Double(,), _
	<OutAttribute> ByRef ifail As Integer _
)

Visual C++

public:
static void g02bx(
	String^ weight, 
	int n, 
	int m, 
	array<double,2>^ x, 
	array<double>^ wt, 
	array<double>^ xbar, 
	array<double>^ std, 
	array<double,2>^ v, 
	array<double,2>^ r, 
	[OutAttribute] int% ifail
)

F#
static member g02bx : weight : string * n : int * m : int * x : float[,] * wt : float[] * xbar : float[] * std : float[] * v : float[,] * r : float[,] * ifail : int byref -> unit

Parameters

weight

Type: System.String

On entry: indicates whether weights are to be used.

$weight = "U"$: Weights are not used and unit weights are assumed.
$weight = "W"$ or $"V"$: Weights are used and must be supplied in wt. The only difference between $weight = "W"$ and $weight = "V"$ is in computing the variance. If $weight = "W"$ the divisor for the variance is the sum of the weights minus one, and if $weight = "V"$ the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.

Constraint: $weight = "U"$, $"V"$ or $"W"$.
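The three settings differ only in the divisor applied to the variance. The following C# fragment is a minimal sketch of that rule as described above; the class and method names (`WeightModes`, `VarianceDivisor`) are hypothetical and are not part of the library.

```csharp
using System;

public static class WeightModes
{
    // Hypothetical helper: returns the variance divisor implied by the
    // weight option, following the description of "U", "W" and "V".
    public static double VarianceDivisor(string weight, int n, double[] wt)
    {
        switch (weight)
        {
            case "U":
                // Unit weights are assumed, so the sum of weights is n.
                return n - 1.0;
            case "W":
            {
                // Sum of the weights minus one.
                double sum = 0.0;
                for (int i = 0; i < n; i++) sum += wt[i];
                return sum - 1.0;
            }
            case "V":
            {
                // Number of observations with nonzero weight, minus one.
                int nonzero = 0;
                for (int i = 0; i < n; i++)
                    if (wt[i] != 0.0) nonzero++;
                return nonzero - 1.0;
            }
            default:
                throw new ArgumentException(
                    "weight must be \"U\", \"V\" or \"W\"", nameof(weight));
        }
    }

    public static void Main()
    {
        double[] wt = { 2.0, 0.0, 3.0, 1.0 }; // illustrative weights only
        Console.WriteLine(VarianceDivisor("U", 4, wt)); // 4 - 1 = 3
        Console.WriteLine(VarianceDivisor("W", 4, wt)); // 6 - 1 = 5
        Console.WriteLine(VarianceDivisor("V", 4, wt)); // 3 - 1 = 2
    }
}
```

With frequency weights, "W" treats the second weight of 3 as three repeated observations, whereas "V" counts it as a single observation.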

n: Type: System.Int32
On entry: the number of data observations in the sample.

Constraint: $n > 1$.

m: Type: System.Int32
On entry: the number of variables.

Constraint: $m \geq 1$.

x: Type: System.Double[,]
An array of size [dim1, m]
Note: dim1 must satisfy the constraint: $dim1 \geq n$
On entry: $x [i - 1, j - 1]$ must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

wt: Type: System.Double[]
An array of size [dim1]
Note: the dimension of the array wt must be at least $n$ if $weight = "W"$ or $"V"$, and at least $1$ otherwise.
On entry: $w$, the optional frequency weighting for each observation, with $wt [i - 1] = w_{i}$. Usually $w_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If $weight = "U"$, $w_{i}$ is set to $1$ for all $i$ and wt is not referenced.

Constraint: if $weight = "W"$ or $"V"$, $\sum_{i = 0}^{n - 1} wt [i] > 1.0$, $wt [i] \geq 0.0$, for $i = 0, 1, \dots, n - 1$.
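The conditions on wt can be checked before the call. The C# fragment below is a hypothetical pre-check, not the library's own validation: it returns the ifail value that the Error Indicators and Warnings section associates with each violation (3 for a negative weight, 4 for too little total weight under "W" or fewer than two nonzero weights under "V"), or 0 if none is found. The names `WeightCheck` and `Check` are illustrative.

```csharp
using System;

public static class WeightCheck
{
    // Hypothetical pre-check mirroring the documented conditions for
    // ifail = 3 and ifail = 4. Returns 0 when no problem is found.
    public static int Check(string weight, int n, double[] wt)
    {
        if (weight == "U") return 0; // wt is not referenced

        double sum = 0.0;
        int nonzero = 0;
        for (int i = 0; i < n; i++)
        {
            if (wt[i] < 0.0) return 3;      // negative weight
            if (wt[i] != 0.0) nonzero++;
            sum += wt[i];
        }

        if (weight == "W" && !(sum > 1.0)) return 4; // sum of weights not > 1.0
        if (weight == "V" && nonzero < 2) return 4;  // fewer than 2 nonzero weights
        return 0;
    }

    public static void Main()
    {
        // Illustrative weights only.
        Console.WriteLine(Check("W", 4, new[] { 2.0, 0.0, 3.0, 1.0 })); // 0
        Console.WriteLine(Check("W", 2, new[] { -1.0, 2.0 }));          // 3
        Console.WriteLine(Check("V", 2, new[] { 0.0, 5.0 }));           // 4
    }
}
```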

xbar: Type: System.Double[]
An array of size [m]
On exit: the sample means. $xbar [j - 1]$ contains the mean of the $j$th variable.

std: Type: System.Double[]
An array of size [m]
On exit: the standard deviations. $std [j - 1]$ contains the standard deviation for the $j$th variable.

v: Type: System.Double[,]
An array of size [dim1, m]
Note: dim1 must satisfy the constraint: $dim1 \geq m$
On exit: the variance-covariance matrix. $v [j - 1, k - 1]$ contains the covariance between variables $j$ and $k$, for $j = 1, 2, \dots, m$ and $k = 1, 2, \dots, m$.

r: Type: System.Double[,]
An array of size [dim1, m]
Note: dim1 must satisfy the constraint: $dim1 \geq m$
On exit: the matrix of Pearson product-moment correlation coefficients. $r [j - 1, k - 1]$ contains the correlation coefficient between variables $j$ and $k$.

ifail: Type: System.Int32%
On exit: $ifail = 0$ unless the method detects an error or a warning has been flagged (see [Error Indicators and Warnings]).

Description

For $n$ observations on $m$ variables the one-pass algorithm of West (1979) as implemented in g02bu is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:

(a) The means

$${\bar{x}}_{j} = \frac{\sum_{i = 1}^{n} w_{i} x_{i j}}{\sum_{i = 1}^{n} w_{i}}, \quad j = 1, \dots, p$$

(b) The variance-covariance matrix

$$C_{j k} = \frac{\sum_{i = 1}^{n} w_{i} (x_{i j} - {\bar{x}}_{j}) (x_{i k} - {\bar{x}}_{k})}{\sum_{i = 1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p$$

(c) The standard deviations

$$s_{j} = \sqrt{C_{j j}}, \quad j = 1, \dots, p$$

(d) The Pearson product-moment correlation coefficients

$$R_{j k} = \frac{C_{j k}}{\sqrt{C_{j j} C_{k k}}}, \quad j, k = 1, \dots, p$$

where $x_{i j}$ is the value of the $i$th observation on the $j$th variable and $w_{i}$ is the weight for the $i$th observation, which will be 1 in the unweighted case.

Note that the denominator for the variance-covariance is $\sum_{i = 1}^{n} w_{i} - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.
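As a cross-check on the definitions, the quantities (means, variance-covariance matrix, standard deviations and correlations) can be evaluated directly. The C# sketch below does this with a plain two-pass loop and the divisor $\sum w_{i} - 1$; it is not the one-pass algorithm the method uses, and the names `WeightedMomentsSketch` and `Compute` are hypothetical. Setting a correlation to zero when a variance is zero follows the behaviour described for $ifail = 5$.

```csharp
using System;

public static class WeightedMomentsSketch
{
    // Two-pass evaluation of the definitions, for illustration only.
    public static void Compute(double[,] x, double[] w,
        out double[] xbar, out double[] std, out double[,] c, out double[,] r)
    {
        int n = x.GetLength(0);
        int p = x.GetLength(1);

        double sumW = 0.0;
        for (int i = 0; i < n; i++) sumW += w[i];

        // Weighted means.
        xbar = new double[p];
        for (int j = 0; j < p; j++)
        {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += w[i] * x[i, j];
            xbar[j] = s / sumW;
        }

        // Variance-covariance matrix with divisor sum(w) - 1.
        c = new double[p, p];
        for (int j = 0; j < p; j++)
            for (int k = 0; k < p; k++)
            {
                double s = 0.0;
                for (int i = 0; i < n; i++)
                    s += w[i] * (x[i, j] - xbar[j]) * (x[i, k] - xbar[k]);
                c[j, k] = s / (sumW - 1.0);
            }

        // Standard deviations.
        std = new double[p];
        for (int j = 0; j < p; j++) std[j] = Math.Sqrt(c[j, j]);

        // Correlations; zero if either variance is zero.
        r = new double[p, p];
        for (int j = 0; j < p; j++)
            for (int k = 0; k < p; k++)
            {
                double d = Math.Sqrt(c[j, j] * c[k, k]);
                r[j, k] = d > 0.0 ? c[j, k] / d : 0.0;
            }
    }

    public static void Main()
    {
        // Made-up data: the second variable is exactly twice the first.
        double[,] x = { { 1, 2 }, { 2, 4 }, { 3, 6 } };
        double[] w = { 1, 1, 1 };
        Compute(x, w, out var xbar, out var std, out var c, out var r);
        Console.WriteLine(xbar[0]);  // 2
        Console.WriteLine(c[0, 1]);  // 2
        Console.WriteLine(r[0, 1]);  // 1
    }
}
```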

References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag

West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

Error Indicators and Warnings

Note: g02bx may return useful information for one or more of the following detected errors or warnings.

Errors or warnings detected by the method:

Some error messages may refer to parameters that are dropped from this interface (LDX, LDV). In these cases, an error in another parameter has usually caused an incorrect value to be inferred.

$ifail = 1$: On entry, $m < 1$, or $n \leq 1$.

$ifail = 2$: On entry, $weight \neq "U"$, $"V"$ or $"W"$.

$ifail = 3$: On entry, $weight = "W"$ or $"V"$ and a value of $wt < 0.0$.

$ifail = 4$: $weight = "W"$ and the sum of weights is not greater than $1.0$, or $weight = "V"$ and fewer than $2$ observations have nonzero weights.

$ifail = 5$: A variable has a zero variance. In this case v and std are returned as calculated but r will contain zero for any correlation involving a variable with zero variance.

$ifail = -9000$: An error occurred, see message report.
$ifail = -6000$: Invalid Parameters $〈value〉$
$ifail = -4000$: Invalid dimension for array $〈value〉$
$ifail = -8000$: Negative dimension for array $〈value〉$

Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
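To make the idea of a one-pass update concrete, the following C# fragment sketches a weighted running mean and sum of squared deviations for a single variable, in the style of the updating method of West (1979). It is an illustration only and is not the implementation in g02bu; the names `OnePassSketch` and `MeanAndVariance` are hypothetical, and the divisor $\sum w_{i} - 1$ corresponds to frequency weights.

```csharp
using System;

public static class OnePassSketch
{
    // One-pass weighted mean and variance for a single variable.
    // Observations with zero weight are skipped.
    public static void MeanAndVariance(double[] x, double[] w,
        out double mean, out double variance)
    {
        double sumW = 0.0; // running sum of weights
        double t = 0.0;    // running weighted sum of squared deviations
        mean = 0.0;

        for (int i = 0; i < x.Length; i++)
        {
            if (w[i] == 0.0) continue;

            double q = x[i] - mean;         // deviation from the current mean
            double newSumW = sumW + w[i];
            double step = q * w[i] / newSumW;

            mean += step;                   // update the mean
            t += step * sumW * q;           // update the squared deviations
            sumW = newSumW;
        }

        variance = t / (sumW - 1.0);
    }

    public static void Main()
    {
        // Made-up data: weights {2, 0, 1} represent the sample 1, 1, 4.
        double[] x = { 1.0, 5.0, 4.0 };
        double[] w = { 2.0, 0.0, 1.0 };
        MeanAndVariance(x, w, out double mean, out double variance);
        Console.WriteLine(mean);     // 2
        Console.WriteLine(variance); // 3
    }
}
```

Each observation is visited once and the mean is never subtracted from a large accumulated sum, which is the property the accuracy discussions in the cited references are concerned with.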

Parallelism and Performance

None.

Further Comments

None.

Example

The data are some of the results from the 1988 Olympic Decathlon. They are the times (in seconds) for the 100m and 400m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input and the correlation matrix is computed and printed.

Example program (C#): g02bxe.cs

Example program data: g02bxe.d

Example program results: g02bxe.r