naginterfaces.library.correg.corrmat

naginterfaces.library.correg.corrmat(x, nonzwt='W', wt=None)

corrmat calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.

For full information, please refer to the NAG Library document for g02bx:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02bxf.html

Parameters

x (float, array-like, shape $(n, m)$): $x[i - 1, j - 1]$ must contain the $i$th observation for the $j$th variable, for $j = 1, 2, \dots, m$ and $i = 1, 2, \dots, n$.
nonzwt (str, length 1, optional): The variance calculation uses a divisor which is either the number of weights or the number of nonzero weights.
wt (None or float, array-like, shape $(n)$, optional): $w$, the optional frequency weighting for each observation, with $\mathrm{wt}[i - 1] = w_{i}$. Usually $w_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If $\mathrm{wt}$ is None, $w_{i}$ is set to $1$ for all $i$.
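The frequency interpretation of the weights can be illustrated without calling the NAG routine. The sketch below uses plain NumPy (not naginterfaces) and hypothetical data to show that an integer weight of $k$ acts like $k$ copies of that observation and that a zero weight drops the observation. It assumes the divisor is the sum of the weights minus one.

```python
import numpy as np

# Hypothetical data: 4 observations on 2 variables.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [4.0, 3.0],
              [7.0, 5.0]])
# Integer frequency weights; the zero weight ignores the second observation.
wt = np.array([2, 0, 1, 3])

# Weighted mean and covariance (divisor: sum of weights minus one).
mean_weighted = np.average(x, axis=0, weights=wt)
cov_weighted = np.cov(x, rowvar=False, fweights=wt)

# The same quantities from data in which row i is repeated wt[i] times.
x_expanded = np.repeat(x, wt, axis=0)
mean_expanded = x_expanded.mean(axis=0)
cov_expanded = np.cov(x_expanded, rowvar=False)

print(np.allclose(mean_weighted, mean_expanded))
print(np.allclose(cov_weighted, cov_expanded))
```

Both comparisons should agree up to rounding, which is why integral weights are the usual choice for grouped data.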

Returns

xbar (float, ndarray, shape $(m)$): The sample means. $\mathrm{xbar}[j - 1]$ contains the mean of the $j$th variable.
std (float, ndarray, shape $(m)$): The standard deviations. $\mathrm{std}[j - 1]$ contains the standard deviation for the $j$th variable.
v (float, ndarray, shape $(m, m)$): The variance-covariance matrix. $v[j - 1, k - 1]$ contains the covariance between variables $j$ and $k$, for $j = 1, 2, \dots, m$ and $k = 1, 2, \dots, m$.
r (float, ndarray, shape $(m, m)$): The matrix of Pearson product-moment correlation coefficients. $r[j - 1, k - 1]$ contains the correlation coefficient between variables $j$ and $k$.

Raises

NagValueError

(errno $1$ )

On entry, $m = \langle \mathit{value} \rangle$.

Constraint: $m \geq 1$ .

(errno $1$ )

On entry, $n = \langle \mathit{value} \rangle$.

Constraint: $n > 1$ .

(errno $2$ )

On entry, $\mathrm{weight} = \langle \mathit{value} \rangle$.

Constraint: $\mathrm{weight} =$ 'U', 'V' or 'W'.

(errno $3$ )

On entry, at least one value of $\mathrm{wt}$ is negative.

Constraint: $\mathrm{wt}[i - 1] \geq 0$, for $i = 1, 2, \dots, n$.

(errno $4$ )

On entry, $\langle \mathit{value} \rangle$ observations have nonzero weight.

Constraint: at least two observations must have a nonzero weight.

(errno $4$ )

On entry, the sum of the weights is $\langle \mathit{value} \rangle$.

Constraint: the sum of the weights must be greater than $1$.
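The constraints listed above can be checked before calling the routine. The following is a minimal sketch of such a pre-check in plain NumPy. The function name `check_corrmat_inputs` is hypothetical, and it raises the built-in `ValueError` rather than `NagValueError`. It applies both errno $4$ conditions at once, which may be stricter than the routine itself if only one of them applies for a given `nonzwt`.

```python
import numpy as np

def check_corrmat_inputs(x, wt=None):
    """Hypothetical pre-check mirroring the documented input constraints."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 2:
        raise ValueError("x must have shape (n, m)")
    n, m = x.shape
    if m < 1:
        raise ValueError(f"m = {m}; constraint: m >= 1")
    if n <= 1:
        raise ValueError(f"n = {n}; constraint: n > 1")
    if wt is None:
        return  # unit weights: nothing more to check
    wt = np.asarray(wt, dtype=float)
    if wt.shape != (n,):
        raise ValueError("wt must have shape (n,)")
    if np.any(wt < 0.0):
        raise ValueError("at least one value of wt is negative")
    # Assumption: both errno 4 conditions are enforced together here.
    nonzero = int(np.count_nonzero(wt))
    if nonzero < 2:
        raise ValueError(f"{nonzero} observations have nonzero weight")
    if wt.sum() <= 1.0:
        raise ValueError(f"sum of the weights is {wt.sum()}")

def is_rejected(x, wt=None):
    """Return True if the pre-check raises ValueError for these inputs."""
    try:
        check_corrmat_inputs(x, wt)
    except ValueError:
        return True
    return False

print(is_rejected(np.ones((3, 2)), [1.0, 0.0, 2.0]))   # valid input
print(is_rejected(np.ones((3, 2)), [-1.0, 1.0, 1.0]))  # negative weight
```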

Warns

NagAlgorithmicWarning

(errno $5$ ): A variable has a zero variance. In this case $v$ and $\mathrm{std}$ are returned as calculated but $r$ will contain zero for any correlation involving a variable with zero variance.
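The zero-variance behaviour described above can be mimicked in NumPy. This is an illustrative sketch, not the routine's implementation: the helper `correlation_with_zero_guard` is a hypothetical name, and it assumes that every entry of $r$ involving a zero-variance variable, including the diagonal entry, is set to zero.

```python
import numpy as np

def correlation_with_zero_guard(v):
    """Correlations from a covariance matrix, zero where a variance is zero."""
    v = np.asarray(v, dtype=float)
    std = np.sqrt(np.diag(v))
    denom = np.outer(std, std)
    r = np.zeros_like(v)
    usable = denom > 0.0  # False for any pair involving a zero variance
    r[usable] = v[usable] / denom[usable]
    return std, r

# Hypothetical data: the second variable is constant.
x = np.array([[1.0, 5.0, 2.0],
              [2.0, 5.0, 1.0],
              [3.0, 5.0, 4.0],
              [4.0, 5.0, 3.0]])
v = np.cov(x, rowvar=False)
std, r = correlation_with_zero_guard(v)
print(std)
print(r)
```

A plain division would instead produce NaN in the affected rows and columns, so the guard is what keeps the result finite.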

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

For $n$ observations on $m$ variables, the one-pass algorithm of West (1979), as implemented in ssqmat(), is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:

The means

$$\bar{x}_{j} = \frac{\sum_{i = 1}^{n} w_{i} x_{i j}}{\sum_{i = 1}^{n} w_{i}}, \quad j = 1, \dots, p$$

The variance-covariance matrix

$$C_{j k} = \frac{\sum_{i = 1}^{n} w_{i} (x_{i j} - \bar{x}_{j}) (x_{i k} - \bar{x}_{k})}{\sum_{i = 1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p$$

The standard deviations

$$s_{j} = \sqrt{C_{j j}}, \quad j = 1, \dots, p$$

The Pearson product-moment correlation coefficients

$$R_{j k} = \frac{C_{j k}}{\sqrt{C_{j j} C_{k k}}}, \quad j, k = 1, \dots, p$$

where $x_{i j}$ is the value of the $i$ th observation on the $j$ th variable and $w_{i}$ is the weight for the $i$ th observation which will be $1$ in the unweighted case.
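The four definitions above translate almost line by line into NumPy. The sketch below evaluates them directly in two passes over the data. It is not the one-pass method used by the routine. The function name `weighted_moments` and the data are hypothetical, and the code assumes that no variable has zero variance.

```python
import numpy as np

def weighted_moments(x, w=None):
    """Evaluate the means, covariances, standard deviations and correlations."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    sum_w = w.sum()

    xbar = (w[:, None] * x).sum(axis=0) / sum_w      # means
    dev = x - xbar                                   # x_ij - xbar_j
    c = (w[:, None] * dev).T @ dev / (sum_w - 1.0)   # variance-covariance matrix
    s = np.sqrt(np.diag(c))                          # standard deviations
    r = c / np.outer(s, s)                           # correlations (needs s > 0)
    return xbar, s, c, r

# Hypothetical data: 5 observations on 3 variables, with frequency weights.
x = np.array([[1.0, 2.0, 9.0],
              [2.0, 1.0, 7.0],
              [4.0, 3.0, 8.0],
              [7.0, 5.0, 2.0],
              [8.0, 4.0, 1.0]])
w = np.array([1.0, 2.0, 0.0, 1.0, 3.0])
xbar, s, c, r = weighted_moments(x, w)
print(xbar)
print(r)
```

With integral weights, `c` should match `np.cov(x, rowvar=False, fweights=w.astype(int))`, which gives a convenient cross-check.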

Note that the denominator for the variance-covariance is $\sum_{i = 1}^{n} w_{i} - 1$ , so the weights should be scaled so that the sum of weights reflects the true sample size.
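The one-pass updating idea referred to in these notes can be sketched as follows. This is a generic weighted updating recurrence in the style of West (1979), written in NumPy for illustration. It is an assumption about the general shape of the method, not a transcription of ssqmat(), and the name `one_pass_moments` is hypothetical. Each observation updates the running mean and the running sums of squares and cross-products about the mean, so the data are read only once.

```python
import numpy as np

def one_pass_moments(x, w=None):
    """Weighted mean and covariance from a single pass over the rows of x."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)

    sum_w = 0.0              # running sum of weights
    mean = np.zeros(p)       # running weighted mean
    ssq = np.zeros((p, p))   # running sums of squares and cross-products

    for xi, wi in zip(x, w):
        if wi == 0.0:
            continue         # zero weight: observation is ignored
        new_sum_w = sum_w + wi
        delta = xi - mean    # deviation from the mean so far
        ssq += (sum_w * wi / new_sum_w) * np.outer(delta, delta)
        mean += (wi / new_sum_w) * delta
        sum_w = new_sum_w

    cov = ssq / (sum_w - 1.0)  # divisor: sum of weights minus one
    return mean, cov

# Hypothetical data: 4 observations on 2 variables, with frequency weights.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [4.0, 3.0],
              [7.0, 5.0]])
w = np.array([2.0, 0.0, 1.0, 3.0])
mean, cov = one_pass_moments(x, w)
print(mean)
print(cov)
```

Because the divisor is the sum of the weights minus one, multiplying every weight by a constant changes `cov`, which is the reason for scaling the weights to the true sample size.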

References

Chan, T F, Golub, G H and Leveque, R J, 1982, Updating Formulae and a Pairwise Algorithm for Computing Sample Variances, Compstat, Physica-Verlag

West, D H D, 1979, Updating mean and variance estimates: An improved method, Comm. ACM (22), 532–555
