NAG Library Routine Document
G02BXF
1 Purpose
G02BXF calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.
2 Specification
SUBROUTINE G02BXF (WEIGHT, N, M, X, LDX, WT, XBAR, STD, V, LDV, R, IFAIL)
INTEGER            N, M, LDX, LDV, IFAIL
REAL (KIND=nag_wp) X(LDX,M), WT(*), XBAR(M), STD(M), V(LDV,M), R(LDV,M)
CHARACTER(1)       WEIGHT
3 Description
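For $n$ observations on $m$ variables the one-pass algorithm of West (1979), as implemented in G02BUF, is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1, \dots, p.$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sum_{i=1}^{n} w_i - 1}, \quad j, k = 1, \dots, p.$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \quad j = 1, \dots, p.$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \dots, p,$$
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_i$ is the weight for the $i$th observation, which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.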
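As an illustration of the definitions in this section, the following is a minimal Python sketch that computes the weighted means, the variance-covariance matrix (with divisor equal to the sum of weights minus one), the standard deviations, and the correlation matrix directly from their formulas. It is a plain two-pass reference computation, not the one-pass algorithm used by the routine; the function name `weighted_summary` and the data values are assumptions made for this example only.

```python
import math

def weighted_summary(x, w=None):
    """Two-pass reference computation of means, standard deviations,
    covariance matrix and correlation matrix.

    x : list of n observations, each a list of m values
    w : list of n non-negative weights (unit weights when omitted)
    """
    n, m = len(x), len(x[0])
    if w is None:
        w = [1.0] * n                      # unweighted case: every w_i is 1
    sw = sum(w)                            # sum of weights
    mean = [sum(w[i] * x[i][j] for i in range(n)) / sw for j in range(m)]
    cov = [[sum(w[i] * (x[i][j] - mean[j]) * (x[i][k] - mean[k])
                for i in range(n)) / (sw - 1.0)
            for k in range(m)]
           for j in range(m)]
    std = [math.sqrt(cov[j][j]) for j in range(m)]
    corr = [[cov[j][k] / (std[j] * std[k]) for k in range(m)]
            for j in range(m)]
    return mean, std, cov, corr

# Illustrative data only: three observations on two variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
mean, std, cov, corr = weighted_summary(data)
print(mean[0], cov[0][1])   # 2.0 2.5
```

The sketch assumes every variable has a nonzero variance; a zero variance would make the correlation undefined.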
4 References
Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–555
5 Parameters
- 1: WEIGHT – CHARACTER(1) Input
On entry: indicates whether weights are to be used.
- WEIGHT = 'U'
Weights are not used and unit weights are assumed.
- WEIGHT = 'W' or 'V'
Weights are used and must be supplied in WT. The only difference between WEIGHT = 'W' and WEIGHT = 'V' is in computing the variance. If WEIGHT = 'W', the divisor for the variance is the sum of the weights minus one; if WEIGHT = 'V', the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.
Constraint: WEIGHT = 'U', 'V' or 'W'.
- 2: N – INTEGER Input
On entry: the number of data observations in the sample.
Constraint: N > 1.
- 3: M – INTEGER Input
On entry: the number of variables.
Constraint: M ≥ 1.
- 4: X(LDX,M) – REAL (KIND=nag_wp) array Input
On entry: X($i$,$j$) must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \dots,$ N and $j = 1, 2, \dots,$ M.
- 5: LDX – INTEGER Input
On entry: the first dimension of the array X as declared in the (sub)program from which G02BXF is called.
Constraint: LDX ≥ N.
- 6: WT(*) – REAL (KIND=nag_wp) array Input
Note: the dimension of the array WT must be at least N if WEIGHT = 'V' or 'W', and at least 1 otherwise.
On entry: $w$, the optional frequency weighting for each observation, with WT($i$) = $w_i$. Usually $w_i$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If WEIGHT = 'U', $w_i$ is set to 1 for all $i$ and WT is not referenced.
Constraint: if WEIGHT = 'V' or 'W', $\sum_{i=1}^{n}$ WT($i$) > 1.0 and WT($i$) ≥ 0.0, for $i = 1, 2, \dots,$ N.
- 7: XBAR(M) – REAL (KIND=nag_wp) array Output
On exit: the sample means. XBAR($j$) contains the mean of the $j$th variable.
- 8: STD(M) – REAL (KIND=nag_wp) array Output
On exit: the standard deviations. STD($j$) contains the standard deviation for the $j$th variable.
- 9: V(LDV,M) – REAL (KIND=nag_wp) array Output
On exit: the variance-covariance matrix. V($j$,$k$) contains the covariance between variables $j$ and $k$, for $j = 1, 2, \dots,$ M and $k = 1, 2, \dots,$ M.
- 10: LDV – INTEGER Input
On entry: the first dimension of the arrays R and V as declared in the (sub)program from which G02BXF is called.
Constraint: LDV ≥ M.
- 11: R(LDV,M) – REAL (KIND=nag_wp) array Output
On exit: the matrix of Pearson product-moment correlation coefficients. R($j$,$k$) contains the correlation coefficient between variables $j$ and $k$.
- 12: IFAIL – INTEGER Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, because for this routine the values of the output parameters may be useful even if IFAIL ≠ 0 on exit, the recommended value is -1. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
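To make the effect of the weighting parameters concrete, here is a small Python sketch for a single variable. It assumes, as an interpretation of the WEIGHT description above, that 'W' denotes the divisor "sum of weights minus one" and 'V' the divisor "number of nonzero weights minus one". The helper `weighted_mean_var` and the data are hypothetical and are not part of the NAG interface. The sketch shows that integer frequency weights with the 'W' divisor reproduce the result of repeating each observation, and that a zero weight drops an observation.

```python
def weighted_mean_var(x, wt, mode="W"):
    """Weighted mean and variance of one variable.

    mode "W": divisor is the sum of the weights minus one.
    mode "V": divisor is the number of nonzero weights minus one.
    """
    sw = sum(wt)
    mean = sum(w * xi for xi, w in zip(x, wt)) / sw
    ss = sum(w * (xi - mean) ** 2 for xi, w in zip(x, wt))
    if mode == "W":
        divisor = sw - 1.0
    elif mode == "V":
        divisor = sum(1 for w in wt if w != 0.0) - 1.0
    else:
        raise ValueError("mode must be 'W' or 'V'")
    return mean, ss / divisor

values = [1.0, 4.0, 9.0, 5.0]
weights = [2.0, 1.0, 0.0, 3.0]             # 9.0 is ignored; 1.0 counts twice, 5.0 three times
expanded = [1.0, 1.0, 4.0, 5.0, 5.0, 5.0]  # the same sample written out in full

freq = weighted_mean_var(values, weights, "W")
repl = weighted_mean_var(expanded, [1.0] * len(expanded), "W")
alt = weighted_mean_var(values, weights, "V")

print(freq)  # mean 3.5, variance close to 3.9
print(repl)  # matches freq
print(alt)   # mean 3.5, variance 9.75 (divisor 3 - 1 = 2)
```

The mean is the same under both options; only the variance divisor changes, which is why the 'W' form suits frequency counts.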
6 Error Indicators and Warnings
If on entry IFAIL = 0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Note: G02BXF may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
- IFAIL = 1
On entry, M < 1,
or N ≤ 1,
or LDX < N,
or LDV < M.
- IFAIL = 2
On entry, WEIGHT ≠ 'U', 'V' or 'W'.
- IFAIL = 3
On entry, WEIGHT = 'V' or 'W' and a value of WT < 0.0.
- IFAIL = 4
WEIGHT = 'W' and the sum of weights is not greater than 1.0, or WEIGHT = 'V' and fewer than 2 observations have nonzero weights.
- IFAIL = 5
A variable has a zero variance. In this case V and STD are returned as calculated but R will contain zero for any correlation involving a variable with zero variance.
- IFAIL = -99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.8 in the Essential Introduction for further information.
- IFAIL = -399
Your licence key may have expired or may not have been installed correctly.
See Section 3.7 in the Essential Introduction for further information.
- IFAIL = -999
Dynamic memory allocation failed.
See Section 3.6 in the Essential Introduction for further information.
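The zero-variance warning in this section can be pictured with a short Python sketch. It converts a covariance matrix to a correlation matrix and leaves a zero wherever a variable with zero variance is involved, which is one reading of the behaviour described for R. The function `correlation_from_covariance` is hypothetical, and treating the diagonal entry of a zero-variance variable as zero is an assumption of this sketch.

```python
import math

def correlation_from_covariance(cov):
    """Correlation matrix from a covariance matrix, with zeros for any
    entry that involves a variable whose variance is zero."""
    m = len(cov)
    corr = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            d = cov[j][j] * cov[k][k]
            if d > 0.0:                    # both variances are nonzero
                corr[j][k] = cov[j][k] / math.sqrt(d)
    return corr

# Illustrative covariance matrix: the second variable is constant.
cov = [[4.0, 0.0],
       [0.0, 0.0]]
corr = correlation_from_covariance(cov)
print(corr)  # [[1.0, 0.0], [0.0, 0.0]]
```

A caller that needs to distinguish "uncorrelated" from "undefined" would therefore have to inspect the variances or the warning flag rather than the correlation value alone.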
7 Accuracy
For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
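To give a feel for why a one-pass updating method is preferred over the textbook sum-of-squares formula, the sketch below implements a weighted updating recurrence of the kind commonly attributed to West (1979) for a single variable. It is an illustrative reconstruction and not the NAG implementation; the names `west_mean_var` and `naive_var` and the data are assumptions for this example.

```python
def west_mean_var(x, w):
    """One-pass weighted mean and variance using an updating recurrence.
    The variance divisor is the sum of the weights minus one."""
    sum_w = 0.0   # running sum of weights
    mean = 0.0    # running weighted mean
    s = 0.0       # running weighted sum of squared deviations
    for xi, wi in zip(x, w):
        if wi == 0.0:
            continue                       # zero weight: observation ignored
        new_sum_w = sum_w + wi
        delta = xi - mean
        r = delta * wi / new_sum_w
        mean += r
        s += sum_w * delta * r
        sum_w = new_sum_w
    return mean, s / (sum_w - 1.0)

def naive_var(x):
    """Textbook unweighted formula (sum of squares minus squared sum / n);
    prone to cancellation when the mean is large relative to the spread."""
    n = len(x)
    return (sum(xi * xi for xi in x) - sum(x) ** 2 / n) / (n - 1)

# Illustrative data with a large offset; the exact variance is 30.
x = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
mean, var = west_mean_var(x, [1.0] * len(x))
print(mean, var)       # 1000000010.0 30.0
print(naive_var(x))    # typically far from 30 because of cancellation
```

The updating form only ever works with deviations from the current mean, so the large common offset never enters a squared term.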
8 Parallelism and Performance
G02BXF is not threaded by NAG in any implementation.
G02BXF makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9 Further Comments
None.
10 Example
The data are some of the results from the 1988 Olympic decathlon. They are the times (in seconds) for the 100 m and 400 m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input and the correlation matrix is computed and printed.
10.1 Program Text
Program Text (g02bxfe.f90)
10.2 Program Data
Program Data (g02bxfe.d)
10.3 Program Results
Program Results (g02bxfe.r)