NAG Toolbox: nag_correg_corrmat (g02bx)
Purpose
nag_correg_corrmat (g02bx) calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.
Syntax
[xbar, std, v, r, ifail] = g02bx(x, 'nonzwt', nonzwt, 'n', n, 'm', m, 'wt', wt)
[xbar, std, v, r, ifail] = nag_correg_corrmat(x, 'nonzwt', nonzwt, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: nonzwt was added to the interface; weight was removed from the interface; wt was made optional
At Mark 22: n was made optional
Description
For n observations on m variables, the one-pass algorithm of West (1979), as implemented in nag_correg_ssqmat (g02bu), is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for the selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
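(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1, 2, \ldots, m.$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sum_{i=1}^{n} w_i - 1}, \quad j, k = 1, 2, \ldots, m.$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \quad j = 1, 2, \ldots, m.$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, 2, \ldots, m,$$
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_i$ is the weight for the $i$th observation, which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.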
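As a cross-check on the definitions above, the four quantities can be computed directly in base MATLAB. This is a minimal sketch, not the one-pass algorithm used by the routine; the data, the weights and the variable names (`sw`, `xc`, `sd`) are hypothetical and chosen only for illustration.

```matlab
% Hypothetical data: 4 observations on 2 variables, with frequency weights.
x  = [1 2; 2 4; 3 7; 4 9];
wt = [1; 2; 0; 1];            % weight 2 counts a row twice, weight 0 drops it

[n, m] = size(x);
sw   = sum(wt);                               % sum of weights
xbar = (wt' * x) / sw;                        % (a) weighted means, 1-by-m
xc   = x - ones(n, 1) * xbar;                 % data centred on the means
v    = (xc' * (xc .* (wt * ones(1, m)))) / (sw - 1);   % (b) covariance matrix
sd   = sqrt(diag(v))';                        % (c) standard deviations
r    = v ./ (sd' * sd);                       % (d) correlation matrix
```

With integer frequency weights like these, the results should agree with applying MATLAB's `mean`, `cov` and `corrcoef` to the data with the second row repeated and the third row removed.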
References
Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: an improved method. Comm. ACM 22 532–535
Parameters
Compulsory Input Parameters
- 1: x(ldx, m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i, j) must contain the ith observation for the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
Optional Input Parameters
- 1: nonzwt – string (length ≥ 1)
Default:
The variance calculation uses a divisor which is either the number of weights or the number of nonzero weights.
Constraint:
or .
- 2: n – int64 / int32 / nag_int scalar
Default: the first dimension of the array x.
The number of data observations in the sample.
Constraint: n > 1.
- 3: m – int64 / int32 / nag_int scalar
Default: the second dimension of the array x.
The number of variables.
Constraint: m ≥ 1.
- 4: wt – double array
The dimension of the array wt must be at least n if weights are used.
w, the optional frequency weighting for each observation, with wt(i) = w_i. Usually w_i will be an integral value corresponding to the number of observations associated with the ith data value, or zero if the ith data value is to be ignored. If weights are not used, w_i is set to 1 for all i and wt is not referenced.
Constraint:
if or , , , for .
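The effect of the divisor choice can be illustrated in base MATLAB. The sketch below assumes that the two alternatives are the sum of the weights minus one and the number of nonzero weights minus one, as the error conditions listed later suggest; which value of nonzwt selects which divisor is not shown here. The data, weights and variable names are hypothetical.

```matlab
% Hypothetical single variable with non-integer weights.
x  = [2; 4; 4; 9];
wt = [0.5; 1.5; 0; 2];

sw   = sum(wt);                 % sum of weights (here 4)
nnzw = nnz(wt);                 % number of nonzero weights (here 3)
xbar = (wt' * x) / sw;          % weighted mean
ss   = wt' * (x - xbar).^2;     % weighted sum of squares about the mean

var_sumw = ss / (sw - 1);       % divisor based on the sum of weights
var_nnzw = ss / (nnzw - 1);     % divisor based on the count of nonzero weights
```

The first divisor suits weights that act as frequencies; the second treats the weights as relative importance and counts only the observations that actually contribute.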
Output Parameters
- 1: xbar – double array
The sample means. xbar(j) contains the mean of the jth variable.
- 2: std – double array
The standard deviations. std(j) contains the standard deviation for the jth variable.
- 3: v – double array
The variance-covariance matrix. v(j, k) contains the covariance between variables j and k, for j = 1, 2, …, m and k = 1, 2, …, m.
- 4: r – double array
The matrix of Pearson product-moment correlation coefficients. r(j, k) contains the correlation coefficient between variables j and k.
- 5: ifail – int64 / int32 / nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Note: nag_correg_corrmat (g02bx) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.
- ifail = 1
On entry, | , |
or | , |
or | , |
or | . |
- ifail = 2
On entry, | , or . |
- ifail = 3
On entry, | or and a value of . |
- ifail = 4
and the sum of weights is not greater than , or and fewer than observations have nonzero weights.
- W ifail = 5
A variable has a zero variance. In this case v and std are returned as calculated but r will contain zero for any correlation involving a variable with zero variance.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
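The zero-variance warning describes output in which the affected correlations are set to zero. The following base MATLAB sketch mimics that documented behaviour for a small hypothetical data set; it is not the routine's own code, and the names `denom`, `ok` and `has_zero_var` are illustrative only.

```matlab
% Hypothetical data: the second variable is constant, so its variance is zero.
x = [1 5 2; 2 5 4; 3 5 7];

v     = cov(x);                  % covariance matrix, returned as calculated
sd    = sqrt(diag(v))';          % standard deviations, returned as calculated
denom = sd' * sd;                % products s_j * s_k
ok    = denom > 0;               % pairs where both variances are nonzero

r     = zeros(size(v));          % correlations default to zero
r(ok) = v(ok) ./ denom(ok);      % fill in only the well-defined entries

has_zero_var = any(sd == 0);     % condition that corresponds to the warning
```

A caller can treat this condition as informational: the means, standard deviations and covariances remain usable, and only the zeroed correlation entries need care.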
Accuracy
For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
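The one-pass idea can be sketched as follows. This is an illustrative weighted updating scheme in the spirit of West (1979), written in base MATLAB; it is not the NAG implementation, and the data, weights and variable names are hypothetical. Each observation updates the running means and the running sums of squares and cross-products about the mean, so the data are read only once and no large sums of raw squares are formed.

```matlab
% Hypothetical data and frequency weights.
x  = [1 2; 2 4; 3 7; 4 9];
wt = [1; 2; 0; 1];

[n, m] = size(x);
sw   = 0;                 % running sum of weights
xbar = zeros(1, m);       % running weighted means
c    = zeros(m, m);       % running weighted cross-products about the mean

for i = 1:n
    if wt(i) == 0
        continue          % zero weight: the observation is ignored
    end
    sw_new = sw + wt(i);
    d      = x(i, :) - xbar;                        % deviation from current mean
    c      = c + (wt(i) * sw / sw_new) * (d' * d);  % update cross-products
    xbar   = xbar + (wt(i) / sw_new) * d;           % update means
    sw     = sw_new;
end

v = c / (sw - 1);         % variance-covariance matrix, divisor sum(wt) - 1
```

Because the update works with deviations from the current mean, it avoids the cancellation that affects the textbook formula based on sums of squares minus squared sums.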
Further Comments
None.
Example
The data are some of the results from the 1988 Olympic Decathlon. They are the times (in seconds) for the 100m and 400m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input and the correlation matrix is computed and printed.
function g02bx_example
fprintf('g02bx example results\n\n');
x = [11.25 48.9 7.43 2.270 15.48;
10.87 47.7 7.45 1.971 14.97;
11.18 48.2 7.44 1.979 14.20;
10.62 49.0 7.38 2.026 15.02;
11.02 47.4 7.43 1.974 12.92;
10.83 48.3 7.72 2.124 13.58;
11.18 49.3 7.05 2.064 14.12;
11.05 48.2 6.95 2.001 15.34;
11.15 49.1 7.12 2.035 14.52;
11.23 48.6 7.28 1.970 15.25;
10.94 49.9 7.45 1.974 15.34;
11.18 49.0 7.34 1.942 14.48;
11.02 48.2 7.29 2.063 12.92;
10.99 47.8 7.37 1.973 13.61;
11.03 48.9 7.45 1.974 14.20;
11.09 48.8 7.08 2.039 14.51;
11.46 51.2 6.75 2.008 16.07;
11.57 49.8 7.00 1.944 16.60;
11.07 47.9 7.04 1.947 13.41;
10.89 49.6 7.07 1.798 15.84];
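% Compute means, standard deviations, covariances and correlations.
% No weights are supplied, so every observation has weight 1.
[xbar, std, v, r, ifail] = g02bx(x);
disp(' Means');
disp(xbar');
disp(' Standard deviations');
disp(std');
% Print the upper triangle of the correlation matrix.
mtitle = ' Correlation matrix:';
matrix = 'Upper';
diag = 'Non-unit';
[ifail] = x04ca(matrix, diag, r, mtitle);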
g02bx example results

 Means
   11.0810   48.7900    7.2545    2.0038   14.6190

 Standard deviations
    0.2132    0.9002    0.2349    0.0902    1.0249

 Correlation matrix:
             1          2          3          4          5
 1      1.0000     0.4416    -0.5427     0.0696     0.3912
 2                 1.0000    -0.5058    -0.0678     0.7057
 3                            1.0000     0.2768    -0.4352
 4                                       1.0000    -0.1494
 5                                                  1.0000
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015