g02bxf calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.
The routine may be called by the names g02bxf or nagf_correg_corrmat.
3 Description
For $n$ observations on $m$ variables the one-pass algorithm of West (1979) as implemented in g02buf is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
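(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1,\dots,m.$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sum_{i=1}^{n} w_i - 1}, \quad j,k = 1,\dots,m.$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \quad j = 1,\dots,m.$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j,k = 1,\dots,m,$$
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable, for $i = 1,\dots,n$ and $j = 1,\dots,m$, and $w_i$ is the weight for the $i$th observation, which will be $1$ in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.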
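The definitions above can be illustrated with a short Python sketch of a one-pass, West-style updating scheme. This is not the routine's source code: the function name `one_pass_moments` and the list-of-rows data layout are assumptions made for illustration, and the divisor used is the sum of the weights minus one, as in the note above.

```python
def one_pass_moments(rows, weights=None):
    """Weighted means and variance-covariance matrix in a single pass.

    rows    : list of observations, each a list of m values
    weights : optional list of non-negative weights (unit weights if None)
    """
    m = len(rows[0])
    if weights is None:
        weights = [1.0] * len(rows)

    sum_w = 0.0                          # running sum of weights
    mean = [0.0] * m                     # running weighted means
    css = [[0.0] * m for _ in range(m)]  # running corrected sums of squares and cross-products

    for row, w in zip(rows, weights):
        if w == 0.0:
            continue                     # a zero weight drops the observation
        new_sum_w = sum_w + w
        # Deviations of the new observation from the means so far.
        delta = [row[j] - mean[j] for j in range(m)]
        scale = sum_w * w / new_sum_w
        for j in range(m):
            for k in range(m):
                css[j][k] += scale * delta[j] * delta[k]
        for j in range(m):
            mean[j] += delta[j] * w / new_sum_w
        sum_w = new_sum_w

    divisor = sum_w - 1.0                # sum of weights minus one
    cov = [[css[j][k] / divisor for k in range(m)] for j in range(m)]
    return mean, cov


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
    means, cov = one_pass_moments(data)
    print(means)
    print(cov)
```

The standard deviations and correlation coefficients then follow from the diagonal and off-diagonal entries of the returned covariance matrix. Updating the means and corrected sums together avoids a second pass over the data and the cancellation that can affect the textbook sum-of-squares formula.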
4 References
Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: an improved method Comm. ACM 22 532–535
5 Arguments
1: weight – Character(1) Input
On entry: indicates whether weights are to be used.
weight = 'U'
Weights are not used and unit weights are assumed.
weight = 'W' or 'V'
Weights are used and must be supplied in wt. The only difference between weight = 'W' and weight = 'V' is in computing the variance. If weight = 'W', the divisor for the variance is the sum of the weights minus one; if weight = 'V', the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.
Constraint: weight = 'U', 'V' or 'W'.
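To make the difference between the two variance divisors concrete, here is a minimal Python sketch. The function name `weighted_variance` and the option labels `"sum_of_weights"` and `"nonzero_count"` are hypothetical names chosen for this illustration; they are not values accepted by the routine.

```python
def weighted_variance(values, weights, divisor="sum_of_weights"):
    """Weighted variance of one variable under either divisor convention."""
    sum_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sum_w
    # Weighted corrected sum of squares about the weighted mean.
    css = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    if divisor == "sum_of_weights":
        d = sum_w - 1.0                                # frequency-style weights
    elif divisor == "nonzero_count":
        d = sum(1 for w in weights if w != 0.0) - 1.0  # count of nonzero weights
    else:
        raise ValueError("unknown divisor option")
    return css / d


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0]
    weights = [2.0, 1.0, 1.0]
    print(weighted_variance(values, weights, "sum_of_weights"))
    print(weighted_variance(values, weights, "nonzero_count"))
```

With these inputs the weighted sum of squares is the same in both calls; only the divisor changes (3 for the sum of weights minus one, 2 for the number of nonzero weights minus one), so the second estimate is larger.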
2: n – Integer Input
On entry: the number of data observations in the sample.
Constraint: n > 1.
3: m – Integer Input
On entry: the number of variables.
Constraint: m ≥ 1.
4: x(ldx, m) – Real (Kind=nag_wp) array Input
On entry: x(i, j) must contain the ith observation for the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
5: ldx – Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g02bxf is called.
Constraint: ldx ≥ n.
6: wt(*) – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least n if weight = 'W' or 'V'.
On entry: w, the optional frequency weighting for each observation, with wt(i) = w_i. Usually w_i will be an integral value corresponding to the number of observations associated with the ith data value, or zero if the ith data value is to be ignored. If weight = 'U', w_i is set to 1 for all i and wt is not referenced.
Constraints: if weight = 'W' or 'V',
wt(i) ≥ 0.0, for i = 1, 2, …, n;
the sum of wt(i), for i = 1, 2, …, n, must be greater than 1.0.
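The frequency interpretation of the weights can be checked with a small Python sketch: an integral weight behaves like repeating the observation that many times, and a zero weight removes it. The helper name `weighted_mean_and_variance` is an assumption for illustration, and the variance divisor used here is the sum of the weights minus one.

```python
import statistics


def weighted_mean_and_variance(values, weights):
    """Weighted mean and variance, dividing by the sum of weights minus one."""
    sum_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sum_w
    css = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return mean, css / (sum_w - 1.0)


if __name__ == "__main__":
    values = [4.0, 7.0, 9.0]
    weights = [3.0, 0.0, 2.0]             # 7.0 is ignored, 4.0 counts three times
    expanded = [4.0, 4.0, 4.0, 9.0, 9.0]  # the same sample written out in full

    print(weighted_mean_and_variance(values, weights))
    print(statistics.mean(expanded), statistics.variance(expanded))
```

Both lines print the same mean and variance, which is why the weights should sum to the true sample size when this divisor is used.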
7: xbar(m) – Real (Kind=nag_wp) array Output
On exit: the sample means. xbar(j) contains the mean of the jth variable.
8: std(m) – Real (Kind=nag_wp) array Output
On exit: the standard deviations. std(j) contains the standard deviation for the jth variable.
9: v(ldv, m) – Real (Kind=nag_wp) array Output
On exit: the variance-covariance matrix.
v(j, k) contains the covariance between variables j and k, for j = 1, 2, …, m and k = 1, 2, …, m.
10: ldv – Integer Input
On entry: the first dimension of the array v and the first dimension of the array r as declared in the (sub)program from which g02bxf is called.
Constraint: ldv ≥ m.
11: r(ldv, m) – Real (Kind=nag_wp) array Output
On exit: the matrix of Pearson product-moment correlation coefficients. r(j, k) contains the correlation coefficient between variables j and k.
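The relationship between the variance-covariance matrix, the standard deviations and the correlation matrix can be sketched in Python as follows. The function name `correlation_from_covariance` is hypothetical, and storing zero for every entry that involves a zero-variance variable (including the diagonal entry) is an assumption based on the zero-variance warning described in Section 6.

```python
import math


def correlation_from_covariance(v):
    """Standard deviations and correlation matrix from a covariance matrix."""
    m = len(v)
    std = [math.sqrt(v[j][j]) for j in range(m)]
    r = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            # Leave zero in place when either variable has zero variance.
            if std[j] > 0.0 and std[k] > 0.0:
                r[j][k] = v[j][k] / (std[j] * std[k])
    return std, r


if __name__ == "__main__":
    v = [[4.0, 2.0, 0.0],
         [2.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]   # the third variable is constant
    std, r = correlation_from_covariance(v)
    print(std)
    for row in r:
        print(row)
```

Guarding the division keeps the result finite when a variable is constant, at the cost of returning a correlation that is a placeholder rather than a meaningful estimate.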
12: ifail – Integer Input/Output
On entry: ifail must be set to 0, -1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of -1 means that an error message is printed while a value of 1 means that it is not.
If halting is not appropriate, the value -1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value -1 is recommended since useful values can be provided in some output arguments even when ifail ≠ 0 on exit. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry ifail = 0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
Note: in some cases g02bxf may return useful information.
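On entry, ldv = ⟨value⟩ and m = ⟨value⟩.
Constraint: ldv ≥ m.
On entry, ldx = ⟨value⟩ and n = ⟨value⟩.
Constraint: ldx ≥ n.
On entry, m = ⟨value⟩.
Constraint: m ≥ 1.
On entry, n = ⟨value⟩.
Constraint: n > 1.
On entry, weight = ⟨value⟩.
Constraint: weight = 'U', 'V' or 'W'.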
On entry, at least one value of wt is negative.
Constraint: wt(i) ≥ 0.0, for i = 1, 2, …, n.
On entry, ⟨value⟩ observations have nonzero weight.
Constraint: at least two observations must have a nonzero weight.
On entry, Sum of the weights is ⟨value⟩.
Constraint: Sum of the weights must be greater than 1.0.
A variable has a zero variance. In this case v and std are returned as calculated but r will contain zero for any correlation involving a variable with zero variance.
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g02bxf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9 Further Comments
None.
10 Example
The data are some of the results from the 1988 Olympic Decathlon. They are the times (in seconds) for the 100m and 400m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input and the correlation matrix is computed and printed.