NAG Toolbox: nag_correg_ssqmat (g02bu)
Purpose
nag_correg_ssqmat (g02bu) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.
Syntax
[sw, wmean, c, ifail] = g02bu(x, 'mean_p', mean_p, 'n', n, 'm', m, 'wt', wt)
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'mean_p', mean_p, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: mean_p was made optional
At Mark 22: n was made optional
Description
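nag_correg_ssqmat (g02bu) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products, or sums of squares and cross-products of deviations from the (weighted) mean, for a sample of $n$ observations on $m$ variables $X_j$, for $j = 1, 2, \dots, m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be $\bar{x}_j(i-1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i-1)$ and the sum of weights be $W_{i-1}$. These are updated by the $i$th observation, $x_{ij}$, for $j = 1, 2, \dots, m$, with weight $w_i$ as follows:
$$W_i = W_{i-1} + w_i$$
$$\bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right), \quad j = 1, 2, \dots, m$$
and
$$c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right)\left(x_{ik} - \bar{x}_k(i-1)\right) W_{i-1}, \quad j = 1, 2, \dots, m \text{ and } k = j, j+1, \dots, m.$$
The algorithm is initialized by taking $\bar{x}_j(1) = x_{1j}$, the first observation, and $c_{jk}(1) = 0.0$.
For the unweighted case $w_i = 1$ and $W_i = i$ for all $i$.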
Note that only the upper triangle of the matrix is calculated and returned packed by column.
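The single-pass update described above can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the code of nag_correg_ssqmat (g02bu): the function name `west_update` is hypothetical, only the deviations-about-the-mean case is handled, and starting from a zero sum of weights (rather than from the first observation) is a choice of this sketch that gives the same result. The data are taken from the Example section of this page.

```python
def west_update(x, wt=None):
    """Single-pass weighted means and packed cross-products of deviations.

    x  : list of observations, each a list of m values
    wt : optional list of non-negative weights (defaults to all ones)
    Returns (sw, wmean, c), with c the upper triangle packed by column.
    Hypothetical helper for illustration only.
    """
    m = len(x[0])
    if wt is None:
        wt = [1.0] * len(x)
    sw = 0.0                          # running sum of weights, W_{i-1}
    wmean = [0.0] * m                 # running means
    c = [0.0] * (m * (m + 1) // 2)    # packed upper triangle
    for row, w in zip(x, wt):
        if w < 0.0:
            raise ValueError("weights must be non-negative")
        if w == 0.0:
            continue                  # a zero weight contributes nothing
        sw_new = sw + w               # W_i = W_{i-1} + w_i
        ratio = w / sw_new            # w_i / W_i
        # Deviations from the means of the previous observations.
        d = [row[j] - wmean[j] for j in range(m)]
        for k in range(m):
            for j in range(k + 1):
                c[k * (k + 1) // 2 + j] += ratio * d[j] * d[k] * sw
        for j in range(m):
            wmean[j] += ratio * d[j]
        sw = sw_new
    return sw, wmean, c


# Data from the Example section: three observations (rows) on three variables.
x = [[9.1231, 3.7011, 4.5230],
     [0.9310, 0.0900, 0.8870],
     [0.0009, 0.0099, 0.0999]]
wt = [0.1300, 1.3070, 0.3700]
sw, wmean, c = west_update(x, wt)
print(sw, wmean, c)
```

Because the sum of weights starts at zero, the first observation with a positive weight sets the means and leaves the cross-products at zero, which matches the initialization given in the text.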
References
Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method. Comm. ACM 22 532–535
Parameters
Compulsory Input Parameters
- 1: x(ldx, m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i, j) must contain the ith observation on the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
Optional Input Parameters
- 1: mean_p – string (length ≥ 1)
Default: 'M'
Indicates whether nag_correg_ssqmat (g02bu) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
- mean_p = 'M': The sums of squares and cross-products of deviations about the mean are calculated.
- mean_p = 'Z': The sums of squares and cross-products are calculated.
Constraint: mean_p = 'M' or 'Z'.
- 2: n – int64/int32/nag_int scalar
Default: the first dimension of the array x.
n, the number of observations in the dataset.
Constraint: n ≥ 1.
- 3: m – int64/int32/nag_int scalar
Default: the second dimension of the array x.
m, the number of variables.
Constraint: m ≥ 1.
- 4: wt(:) – double array
The dimension of the array wt must be at least n in the weighted case.
The optional weights of each observation.
In the unweighted case, wt is not referenced.
In the weighted case, wt(i) must contain the weight for the ith observation.
Constraint: in the weighted case, wt(i) ≥ 0.0, for i = 1, 2, …, n.
Output Parameters
- 1: sw – double scalar
The sum of weights.
In the unweighted case, sw contains the number of observations, n.
- 2: wmean(m) – double array
The sample means. wmean(j) contains the mean for the jth variable.
- 3: c((m×m+m)/2) – double array
The cross-products.
If mean_p = 'M', c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If mean_p = 'Z', c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the jth and kth variable, k ≥ j, is stored in c(k×(k−1)/2+j).
- 4: ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
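The packed storage of c can be illustrated with a short Python sketch. It assumes the usual column-packed convention for an upper triangle, in which the (j, k) element with k ≥ j sits at 1-based position k(k−1)/2 + j; the helper names `packed_index` and `unpack_upper` are hypothetical and not part of the toolbox. The values of c are copied from the Example results on this page.

```python
def packed_index(j, k):
    """1-based position in c of the (j, k) cross-product, for j, k in any order."""
    if j > k:
        j, k = k, j          # the matrix is symmetric; only k >= j is stored
    return k * (k - 1) // 2 + j


def unpack_upper(c, m):
    """Expand the packed upper triangle c into a full m-by-m symmetric matrix."""
    full = [[0.0] * m for _ in range(m)]
    for j in range(1, m + 1):
        for k in range(1, m + 1):
            full[j - 1][k - 1] = c[packed_index(j, k) - 1]
    return full


# Packed order for m = 3: (1,1), (1,2), (2,2), (1,3), (2,3), (3,3).
c = [8.7569, 3.6978, 1.5905, 4.0707, 1.6861, 1.9297]
full = unpack_upper(c, 3)
for row in full:
    print(row)
```

Expanding to a full matrix is convenient for inspection; for large m the packed form uses roughly half the storage.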
Error Indicators and Warnings
Errors or warnings detected by the function:
ifail = 1
On entry, m < 1, or n < 1, or ldx < n.
ifail = 2
On entry, mean_p ≠ 'M' or 'Z'.
ifail = 3
On entry, the weighting option is invalid.
ifail = 4
On entry, in the weighted case, a value of wt < 0.0.
ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
ifail = -399
Your licence key may have expired or may not have been installed correctly.
ifail = -999
Dynamic memory allocation failed.
Accuracy
For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).
Further Comments
nag_correg_ssqmat_to_corrmat (g02bw) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix; the example program on this page does so by dividing c by (sw − 1).
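As a rough illustration of these two post-processing steps, the Python sketch below scales a packed cross-product array into variances and covariances and also forms correlation coefficients. The divisor (sw − 1) follows the Example program on this page. The correlation formula r_jk = c_jk / sqrt(c_jj c_kk) is the standard Pearson definition and is an assumption here, not a description of how nag_correg_ssqmat_to_corrmat (g02bw) is implemented. The function names are hypothetical, and the input values are copied from the Example results.

```python
import math


def covariance_from_ssq(c, sw):
    """Scale packed cross-products of deviations by 1 / (sw - 1)."""
    return [value / (sw - 1.0) for value in c]


def correlation_from_ssq(c, m):
    """Packed correlation coefficients r_jk = c_jk / sqrt(c_jj * c_kk)."""
    # Diagonal entries c_kk of the column-packed upper triangle (0-based k).
    diag = [c[k * (k + 1) // 2 + k] for k in range(m)]
    r = []
    for k in range(m):
        for j in range(k + 1):
            r.append(c[k * (k + 1) // 2 + j] / math.sqrt(diag[j] * diag[k]))
    return r


# Values copied from the Example results (packed by column, m = 3).
c = [8.7569, 3.6978, 1.5905, 4.0707, 1.6861, 1.9297]
sw = 0.1300 + 1.3070 + 0.3700
v = covariance_from_ssq(c, sw)
r = correlation_from_ssq(c, 3)
print(v)
print(r)
```

Both results keep the column-packed layout of c, so the same indexing applies to them.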
The means and cross-products produced by nag_correg_ssqmat (g02bu) may be updated by adding or removing observations using nag_correg_ssqmat_update (g02bt).
Two sets of means and cross-products, as produced by nag_correg_ssqmat (g02bu), can be combined using nag_correg_ssqmat_combine (g02bz).
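To show what combining two sets of means and cross-products involves, here is a minimal Python sketch of the standard pairwise merge formula of the kind discussed by Chan et al. (1982). It is not the implementation of nag_correg_ssqmat_combine (g02bz). The helper names `summarize` and `combine` are hypothetical, the data are made up for illustration, and the two summaries are assumed to come from disjoint sets of observations with positive total weight.

```python
def summarize(x, wt):
    """Two-pass weighted means and packed cross-products of deviations."""
    m = len(x[0])
    sw = sum(wt)
    mean = [sum(w * row[j] for row, w in zip(x, wt)) / sw for j in range(m)]
    c = []
    for k in range(m):
        for j in range(k + 1):
            c.append(sum(w * (row[j] - mean[j]) * (row[k] - mean[k])
                         for row, w in zip(x, wt)))
    return sw, mean, c


def combine(a, b):
    """Merge two (sw, mean, c) summaries of disjoint sets of observations."""
    sw_a, mean_a, c_a = a
    sw_b, mean_b, c_b = b
    sw = sw_a + sw_b
    m = len(mean_a)
    delta = [mean_b[j] - mean_a[j] for j in range(m)]
    mean = [mean_a[j] + (sw_b / sw) * delta[j] for j in range(m)]
    c = []
    for k in range(m):
        for j in range(k + 1):
            idx = k * (k + 1) // 2 + j
            # The correction term accounts for the shift between the two means.
            c.append(c_a[idx] + c_b[idx]
                     + (sw_a * sw_b / sw) * delta[j] * delta[k])
    return sw, mean, c


# Made-up data: five observations on two variables, split into two blocks.
x = [[1.0, 2.0], [2.0, 1.0], [4.0, 3.0], [3.0, 5.0], [5.0, 4.0]]
wt = [1.0, 2.0, 1.0, 0.5, 1.5]
merged = combine(summarize(x[:2], wt[:2]), summarize(x[2:], wt[2:]))
direct = summarize(x, wt)
print(merged)
print(direct)
```

The merged summary should agree with the one computed directly from all five observations, up to rounding error.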
Example
A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of 3 observations of 3 variables.
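function g02bu_example
fprintf('g02bu example results\n\n');
% Observation weights
wt = [0.1300 1.3070 0.3700];
% Data: each row of x holds one variable, each column one observation
x = [9.1231 0.9310 0.0009;
     3.7011 0.0900 0.0099;
     4.5230 0.8870 0.0999];
[m,n] = size(x);      % m variables, n observations
cn = (m*(m+1))/2;     % length of the packed upper triangle (not used below)
m = int64(m);         % integer type for the matrix order passed to x04cc
% g02bu expects one observation per row, hence the transpose
[sw, wmean, c, ifail] = g02bu(x', 'wt', wt);
disp('Means');
disp(wmean');
disp('Weights');
disp(wt);
% Print the packed upper triangle of cross-products of deviations
mtitle = 'Sums of squares and cross-products:';
uplo = 'Upper';
diag = 'Non-unit';
[ifail] = x04cc( ...
  uplo, diag, m, c, mtitle);
% Scale by 1/(sw-1) to obtain the variance matrix
v = c/(sw-1);
fprintf('\n');
mtitle = 'Variance matrix:';
[ifail] = x04cc( ...
  uplo, diag, m, v, mtitle);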
g02bu example results
Means
1.3299 0.3334 0.9874
Weights
0.1300 1.3070 0.3700
Sums of squares and cross-products:
1 2 3
1 8.7569 3.6978 4.0707
2 1.5905 1.6861
3 1.9297
Variance matrix:
1 2 3
1 10.8512 4.5822 5.0443
2 1.9709 2.0893
3 2.3912
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015