NAG Library Function Document

nag_sum_sqs (g02buc)

Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

1 Purpose

nag_sum_sqs (g02buc) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

2 Specification

#include <nag.h>

#include <nagg02.h>

void nag_sum_sqs (Nag_OrderType order, Nag_SumSquare mean, Integer n, Integer m, const double x[], Integer pdx, const double wt[], double *sw, double wmean[], double c[], NagError *fail)

3 Description

nag_sum_sqs (g02buc) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products, or sums of squares and cross-products of deviations from the (weighted) mean, for a sample of $n$ observations on $m$ variables $X_{j}$, for $j = 1, 2, \dots, m$. The algorithm makes a single pass through the data.

For the first $i - 1$ observations let the mean of the $j$th variable be $\bar{x}_{j}(i - 1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i - 1)$ and the sum of weights be $W_{i - 1}$. These are updated by the $i$th observation, $x_{ij}$, for $j = 1, 2, \dots, m$, with weight $w_{i}$ as follows:

$$W_{i} = W_{i - 1} + w_{i}$$

$$\bar{x}_{j}(i) = \bar{x}_{j}(i - 1) + \frac{w_{i}}{W_{i}} \left( x_{ij} - \bar{x}_{j}(i - 1) \right), \quad j = 1, 2, \dots, m$$

and

$$c_{jk}(i) = c_{jk}(i - 1) + \frac{w_{i}}{W_{i}} \left( x_{ij} - \bar{x}_{j}(i - 1) \right) \left( x_{ik} - \bar{x}_{k}(i - 1) \right) W_{i - 1}, \quad j = 1, 2, \dots, m \text{ and } k = j, j + 1, \dots, m.$$

The algorithm is initialized by taking $\bar{x}_{j}(1) = x_{1j}$, the first observation, and $c_{jk}(1) = 0.0$.

For the unweighted case $w_{i} = 1$ and $W_{i} = i$ for all $i$.

Note that only the upper triangle of the matrix is calculated and returned packed by column.
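The update formulae above can be sketched in plain C as follows. This is an illustrative sketch only, not the NAG implementation: the function name west_sum_sqs is hypothetical, the data are assumed to be stored row-major with stride m, and only the deviations-about-the-mean case is covered. Starting from zero means and a zero weight total gives the same result as the initialization described above, because the first observation has $W_{0} = 0$ and so adds nothing to the cross-products.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library: single-pass weighted
 * means and cross-products of deviations about the mean.
 * x is n-by-m, row-major with stride m (an assumption of this sketch);
 * wt may be NULL for unit weights.
 * c receives the upper triangle packed by column, length m*(m+1)/2. */
static void west_sum_sqs(size_t n, size_t m, const double *x,
                         const double *wt, double *sw,
                         double *wmean, double *c)
{
    assert(n >= 1 && m >= 1);
    double w_total = 0.0;
    for (size_t j = 0; j < m; ++j) wmean[j] = 0.0;
    for (size_t p = 0; p < m * (m + 1) / 2; ++p) c[p] = 0.0;

    for (size_t i = 0; i < n; ++i) {
        const double w = wt ? wt[i] : 1.0;
        assert(w >= 0.0);
        if (w == 0.0) continue;            /* zero weight adds nothing */
        const double w_prev = w_total;     /* W_{i-1} */
        w_total += w;                      /* W_i = W_{i-1} + w_i */
        const double ratio = w / w_total;  /* w_i / W_i */
        const double *row = x + i * m;

        /* Update cross-products first: they use the previous means. */
        for (size_t k = 0; k < m; ++k) {
            const double dk = row[k] - wmean[k];
            for (size_t j = 0; j <= k; ++j) {
                const double dj = row[j] - wmean[j];
                c[k * (k + 1) / 2 + j] += ratio * dj * dk * w_prev;
            }
        }
        /* Then update the means. */
        for (size_t j = 0; j < m; ++j)
            wmean[j] += ratio * (row[j] - wmean[j]);
    }
    *sw = w_total;
}
```

For the three unweighted observations (1, 2), (3, 6), (5, 10) this sketch gives means 3 and 6 and packed cross-products 8, 16, 32, which agree with the two-pass definitions.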

4 References

Chan T F, Golub G H and LeVeque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag

West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

5 Arguments

1: order – Nag_OrderType Input

On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order = Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.

Constraint: order = Nag_RowMajor or Nag_ColMajor.

2: mean – Nag_SumSquare Input

On entry: indicates whether nag_sum_sqs (g02buc) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.

mean = Nag_AboutMean: the sums of squares and cross-products of deviations about the mean are calculated.
mean = Nag_AboutZero: the sums of squares and cross-products are calculated.

Constraint: mean = Nag_AboutMean or Nag_AboutZero.

3: n – Integer Input

On entry: $n$, the number of observations in the dataset.

Constraint: $n \geq 1$.

4: m – Integer Input

On entry: $m$, the number of variables.

Constraint: $m \geq 1$.

5: x[dim] – const double Input

Note: the dimension, dim, of the array x must be at least

$\max(1, pdx \times m)$ when order = Nag_ColMajor;
$\max(1, n \times pdx)$ when order = Nag_RowMajor.

Where $X(i, j)$ appears in this document, it refers to the array element

$x[(j - 1) \times pdx + i - 1]$ when order = Nag_ColMajor;
$x[(i - 1) \times pdx + j - 1]$ when order = Nag_RowMajor.

On entry: $X(i, j)$ must contain the $i$th observation on the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

6: pdx – Integer Input

On entry: the stride separating row or column elements (depending on the value of order) in the array x.

Constraints:

if order = Nag_ColMajor, $pdx \geq n$;
if order = Nag_RowMajor, $pdx \geq m$.
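The mapping from $X(i, j)$ to an offset in x, using the stride pdx, can be sketched as a small helper. The enumeration sketch_order and the function x_index are hypothetical names introduced for illustration; they stand in for Nag_OrderType and are not part of the NAG Library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType, for illustration only. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_order;

/* Offset of X(i, j) in x, with 1-based i (observation) and j (variable).
 * pdx is the stride: at least n for column-major, at least m for
 * row-major storage. */
static size_t x_index(sketch_order order, size_t i, size_t j, size_t pdx)
{
    assert(i >= 1 && j >= 1 && pdx >= 1);
    return order == SKETCH_COL_MAJOR ? (j - 1) * pdx + (i - 1)
                                     : (i - 1) * pdx + (j - 1);
}
```

With pdx = 3, the element $X(2, 3)$ sits at offset 5 in row-major storage and at offset 7 in column-major storage.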

7: wt[dim] – const double Input

Note: the dimension, dim, of the array wt must be at least $n$.

On entry: the optional weights of each observation. If weights are not provided then wt must be set to NULL, otherwise $wt[i - 1]$ must contain the weight for the $i$th observation.

Constraint: if wt is not NULL, $wt[i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.

8: sw – double * Output

On exit: the sum of weights. If wt is NULL, sw contains the number of observations, $n$.

9: wmean[m] – double Output

On exit: the sample means. $wmean[j - 1]$ contains the mean for the $j$th variable.

10: c[$(m \times m + m)/2$] – double Output

On exit: the cross-products.

If mean = Nag_AboutMean, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.

If mean = Nag_AboutZero, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.

These are stored packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k \geq j$, is stored in $c[k \times (k - 1)/2 + j - 1]$.
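As an illustration of the packed storage, the index formula above can be wrapped in a helper, and the packed array expanded into a full symmetric matrix. Both functions are hypothetical names introduced for this sketch; they are not NAG Library routines, and the row-major layout of the expanded matrix is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 0-based offset into c of the cross-product between
 * the j-th and k-th variables (1-based, j <= k), packed by columns. */
static size_t packed_index(size_t j, size_t k)
{
    assert(j >= 1 && k >= j);
    return k * (k - 1) / 2 + j - 1;
}

/* Hypothetical helper: expand the packed upper triangle c into a full
 * symmetric m-by-m matrix stored row-major in full[]. */
static void unpack_symmetric(size_t m, const double *c, double *full)
{
    for (size_t k = 1; k <= m; ++k) {
        for (size_t j = 1; j <= k; ++j) {
            const double v = c[packed_index(j, k)];
            full[(j - 1) * m + (k - 1)] = v;  /* upper triangle */
            full[(k - 1) * m + (j - 1)] = v;  /* mirrored lower triangle */
        }
    }
}
```

For $m = 3$ the packed order is $(1,1), (1,2), (2,2), (1,3), (2,3), (3,3)$, at offsets 0 to 5.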

11: fail – NagError * Input/Output

The NAG error argument (see Section 3.6 in the Essential Introduction).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.

NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.

NE_INT
On entry, m = ⟨value⟩.
Constraint: $m \geq 1$.
On entry, n = ⟨value⟩.
Constraint: $n \geq 1$.
On entry, pdx = ⟨value⟩.
Constraint: $pdx > 0$.

NE_INT_2
On entry, pdx = ⟨value⟩ and m = ⟨value⟩.
Constraint: $pdx \geq m$.
On entry, pdx = ⟨value⟩ and n = ⟨value⟩.
Constraint: $pdx \geq n$.

NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

NE_REAL_ARRAY_ELEM_CONS
On entry, wt[⟨value⟩] < 0.0.

7 Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

8 Parallelism and Performance

Not applicable.

9 Further Comments

nag_cov_to_corr (g02bwc) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
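The scaling mentioned above can be sketched as follows. The divisor is an assumption of this sketch: it uses sw − 1, the usual unbiased choice when observations are unweighted or the weights are frequencies. Other weighting schemes may call for a different divisor. The function name scale_to_covariance is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale packed cross-products of deviations about the
 * mean (as returned in c with mean = Nag_AboutMean) into a packed
 * variance-covariance matrix v, using the divisor sw - 1 (an assumption;
 * see the text). Returns 0 on success, -1 if the divisor is not positive. */
static int scale_to_covariance(size_t m, double sw, const double *c, double *v)
{
    assert(m >= 1);
    if (sw <= 1.0)
        return -1;  /* divisor sw - 1 would be zero or negative */
    const double scale = 1.0 / (sw - 1.0);
    for (size_t p = 0; p < m * (m + 1) / 2; ++p)
        v[p] = scale * c[p];
    return 0;
}
```

For example, packed cross-products 8, 16, 32 with sw = 3 scale to variances and covariance 4, 8, 16.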

The means and cross-products produced by nag_sum_sqs (g02buc) may be updated by adding or removing observations using nag_sum_sqs_update (g02btc).

10 Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of 3 observations of 3 variables.
