NAG Library Function Document
nag_sum_sqs (g02buc)
1 Purpose
nag_sum_sqs (g02buc) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.
2 Specification
#include <nag.h>
#include <nagg02.h>

void nag_sum_sqs (Nag_OrderType order,
                  Nag_SumSquare mean,
                  Integer n,
                  Integer m,
                  const double x[],
                  Integer pdx,
                  const double wt[],
                  double *sw,
                  double wmean[],
                  double c[],
                  NagError *fail)
3 Description
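nag_sum_sqs (g02buc) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products, or sums of squares and cross-products of deviations from the (weighted) mean, for a sample of $n$ observations on $m$ variables $X_j$, for $j = 1, 2, \ldots, m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be $\bar{x}_j(i-1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i-1)$ and the sum of weights be $W_{i-1}$. These are updated by the $i$th observation, $x_{ij}$, for $j = 1, 2, \ldots, m$, with weight $w_i$ as follows:
$$W_i = W_{i-1} + w_i, \qquad \bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right), \quad j = 1, 2, \ldots, m$$
and
$$c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right)\left(x_{ik} - \bar{x}_k(i-1)\right) W_{i-1}, \quad j = 1, 2, \ldots, m; \; k = j, j+1, \ldots, m.$$
The algorithm is initialized by taking $\bar{x}_j(1) = x_{1j}$, the first observation, and $c_{jk}(1) = 0.0$.
For the unweighted case $w_i = 1$ and $W_i = i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.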
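The single-pass update described above can be sketched in plain C. This is an illustrative sketch only, not the library's implementation: the function name west_sum_sqs, the row-major input layout without padding, and the choice to skip zero-weight observations are all assumptions made for the example. Starting from a zero sum of weights reproduces the stated initialization, because the first observation with positive weight sets the mean to that observation and adds nothing to the cross-products.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library): single-pass weighted
 * means and cross-products of deviations about the mean.
 *   x     : n rows by m columns, row-major, no padding (assumption)
 *   wt    : n weights, or NULL for unit weights
 *   sw    : receives the sum of weights
 *   wmean : receives the m means
 *   c     : receives the upper triangle packed by column, m*(m+1)/2 values */
void west_sum_sqs(size_t n, size_t m, const double *x, const double *wt,
                  double *sw, double *wmean, double *c)
{
    size_t i, j, k;

    assert(n >= 1 && m >= 1);

    for (j = 0; j < m; ++j) wmean[j] = 0.0;
    for (k = 0; k < m * (m + 1) / 2; ++k) c[k] = 0.0;
    *sw = 0.0;

    for (i = 0; i < n; ++i) {
        const double *row = x + i * m;
        double w = wt ? wt[i] : 1.0;
        double w_prev, ratio;

        if (w == 0.0) continue;        /* zero weight changes nothing */

        w_prev = *sw;                  /* W_{i-1} */
        *sw = w_prev + w;              /* W_i */
        ratio = w / *sw;               /* w_i / W_i */

        /* Cross-products first: they use the means before this update. */
        for (k = 0; k < m; ++k) {
            double dk = row[k] - wmean[k];
            for (j = 0; j <= k; ++j) {
                double dj = row[j] - wmean[j];
                c[k * (k + 1) / 2 + j] += ratio * dj * dk * w_prev;
            }
        }
        for (j = 0; j < m; ++j)
            wmean[j] += ratio * (row[j] - wmean[j]);
    }
}
```

For the three unweighted observations (1, 2), (2, 4) and (3, 9), a call such as west_sum_sqs(3, 2, x, NULL, &sw, wmean, c) should give a sum of weights of 3, means 2 and 5, and packed cross-products 2, 7 and 26.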
4 References
Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method. Comm. ACM 22 532–535
5 Arguments
- 1:
order – Nag_OrderType (Input)
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order = Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order = Nag_RowMajor or Nag_ColMajor.
- 2:
mean – Nag_SumSquare (Input)
On entry: indicates whether nag_sum_sqs (g02buc) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
- mean = Nag_AboutMean: the sums of squares and cross-products of deviations about the mean are calculated.
- mean = Nag_AboutZero: the sums of squares and cross-products are calculated.
Constraint: mean = Nag_AboutMean or Nag_AboutZero.
- 3:
n – Integer (Input)
On entry: n, the number of observations in the dataset.
Constraint: n ≥ 1.
- 4:
m – Integer (Input)
On entry: m, the number of variables.
Constraint: m ≥ 1.
- 5:
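x[] – const double (Input)
Note: the dimension, dim, of the array x must be at least
- max(1, pdx × m) when order = Nag_ColMajor;
- max(1, n × pdx) when order = Nag_RowMajor.
Where X(i,j) appears in this document, it refers to the array element
- x[(j−1) × pdx + i − 1] when order = Nag_ColMajor;
- x[(i−1) × pdx + j − 1] when order = Nag_RowMajor.
On entry: X(i,j) must contain the ith observation on the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.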
- 6:
pdx – Integer (Input)
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
- if order = Nag_ColMajor, pdx ≥ n;
- if order = Nag_RowMajor, pdx ≥ m.
- 7:
wt[] – const double (Input)
Note: the dimension, dim, of the array wt must be at least n.
On entry: the optional weights of each observation. If weights are not provided then wt must be set to NULL; otherwise wt[i−1] must contain the weight for the ith observation.
Constraint: if wt is not NULL, wt[i−1] ≥ 0.0, for i = 1, 2, …, n.
- 8:
sw – double * (Output)
On exit: the sum of weights.
If wt is NULL, sw contains the number of observations, n.
- 9:
wmean[m] – double (Output)
On exit: the sample means. wmean[j−1] contains the mean for the jth variable.
- 10:
c[] – double (Output)
On exit: the cross-products.
If mean = Nag_AboutMean, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If mean = Nag_AboutZero, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the jth and kth variable, k ≥ j, is stored in c[k × (k−1)/2 + j − 1].
- 11:
fail – NagError * (Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).
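The two storage conventions used by the arguments above (the strided layout of x and the column-packed upper triangle in c) can be sketched as small index helpers. This is a hedged illustration under stated assumptions: the enum storage_order and the functions x_offset, packed_offset and unpack_symmetric are hypothetical names introduced here, not library identifiers, and the index formulas assume 1-based i, j, k with element X(i,j) at offset (j−1)×pdx + i − 1 for column-major storage, (i−1)×pdx + j − 1 for row-major storage, and the (j,k) cross-product with k ≥ j at offset k×(k−1)/2 + j − 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's storage-order type. */
typedef enum { ROW_MAJOR, COL_MAJOR } storage_order;

/* 0-based offset in x of X(i,j); i and j are 1-based, pdx is the stride. */
size_t x_offset(storage_order order, size_t i, size_t j, size_t pdx)
{
    return (order == COL_MAJOR) ? (j - 1) * pdx + (i - 1)
                                : (i - 1) * pdx + (j - 1);
}

/* 0-based offset in c of the (j,k) cross-product; j and k are 1-based.
 * Only the upper triangle is stored, so (j,k) and (k,j) share a slot. */
size_t packed_offset(size_t j, size_t k)
{
    if (j > k) { size_t t = j; j = k; k = t; }
    return k * (k - 1) / 2 + j - 1;
}

/* Expand the packed upper triangle into a full m-by-m row-major matrix. */
void unpack_symmetric(size_t m, const double *c, double *full)
{
    size_t j, k;
    for (j = 1; j <= m; ++j)
        for (k = 1; k <= m; ++k)
            full[(j - 1) * m + (k - 1)] = c[packed_offset(j, k)];
}
```

With m = 3 the packed array holds six values in the order (1,1), (1,2), (2,2), (1,3), (2,3), (3,3), so the array c needs m×(m+1)/2 elements.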
6 Error Indicators and Warnings
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.
- NE_INT
On entry, m = ⟨value⟩.
Constraint: m ≥ 1.
On entry, n = ⟨value⟩.
Constraint: n ≥ 1.
On entry, pdx = ⟨value⟩.
Constraint: pdx > 0.
- NE_INT_2
On entry, pdx = ⟨value⟩ and m = ⟨value⟩.
Constraint: pdx ≥ m.
On entry, pdx = ⟨value⟩ and n = ⟨value⟩.
Constraint: pdx ≥ n.
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_REAL_ARRAY_ELEM_CONS
On entry, wt[⟨value⟩] < 0.0.
7 Accuracy
For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).
8 Parallelism and Performance
Not applicable.
9 Further Comments
nag_cov_to_corr (g02bwc) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
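As a rough sketch of the scaling step, the packed cross-products about the mean can be multiplied by a constant in place. The divisor used here, sw − 1, is an assumption: it is the usual unbiased choice for unweighted data, and other divisors may be more appropriate for weighted data. The function name scale_to_covariance is hypothetical and not part of the library.

```c
#include <stddef.h>

/* Hypothetical helper: scale the packed upper triangle of cross-products
 * about the mean (m*(m+1)/2 values) to covariances, using the assumed
 * divisor sw - 1. Returns 0 on success, -1 if sw <= 1 (no valid divisor). */
int scale_to_covariance(size_t m, double sw, double *c)
{
    size_t t, len = m * (m + 1) / 2;
    double alpha;

    if (sw <= 1.0) return -1;

    alpha = 1.0 / (sw - 1.0);
    for (t = 0; t < len; ++t)
        c[t] *= alpha;
    return 0;
}
```

For example, packed cross-products 2, 7 and 26 with sw = 3 scale to 1, 3.5 and 13.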
The means and cross-products produced by nag_sum_sqs (g02buc) may be updated by adding or removing observations using nag_sum_sqs_update (g02btc).
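To illustrate what removing an observation involves, the single-pass update can be run in reverse. The sketch below is derived algebraically from the update formulas in Section 3 and is not the implementation of nag_sum_sqs_update (g02btc); the name remove_observation and its interface are hypothetical. It assumes the statistics are means and cross-products about the mean, with c packed by column, and that the observation being removed was previously included with the same weight.

```c
#include <stddef.h>

/* Hypothetical helper: remove one observation (row of length m, weight w)
 * from running statistics sw, wmean and packed c.
 * Returns 0 on success, -1 if w <= 0 or no positive weight would remain. */
int remove_observation(size_t m, const double *row, double w,
                       double *sw, double *wmean, double *c)
{
    size_t j, k;
    double w_total = *sw;          /* sum of weights including this row */
    double w_rest = w_total - w;   /* sum of weights after removal */
    double factor;

    if (w <= 0.0 || w_rest <= 0.0) return -1;

    /* Deviations are taken from the current means (row still included). */
    factor = w * w_total / w_rest;
    for (k = 0; k < m; ++k) {
        double dk = row[k] - wmean[k];
        for (j = 0; j <= k; ++j)
            c[k * (k + 1) / 2 + j] -= factor * (row[j] - wmean[j]) * dk;
    }
    for (j = 0; j < m; ++j)
        wmean[j] -= (w / w_rest) * (row[j] - wmean[j]);
    *sw = w_rest;
    return 0;
}
```

Removing the observation (3, 9) with unit weight from statistics with sw = 3, means 2 and 5, and packed cross-products 2, 7 and 26 should leave sw = 2, means 1.5 and 3, and cross-products 0.5, 1 and 2, which are the values for the remaining observations (1, 2) and (2, 4).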
10 Example
A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of observations of variables.
10.1 Program Text
Program Text (g02buce.c)
10.2 Program Data
Program Data (g02buce.d)
10.3 Program Results
Program Results (g02buce.r)