nag_correg_coeffs_pearson_subset (g02bg) computes means and standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for selected variables.

Syntax

[xbar, std, ssp, r, ifail] = g02bg(x, kvar, 'n', n, 'm', m, 'nvars', nvars)

[xbar, std, ssp, r, ifail] = nag_correg_coeffs_pearson_subset(x, kvar, 'n', n, 'm', m, 'nvars', nvars)

Note: the interface to this routine has changed since earlier releases of the toolbox:

At Mark 22: n was made optional.

Description

The input data consist of $n$ observations for each of $m$ variables, given as an array

[x_{i j}], i = 1, 2, \dots, n (n \geq 2), j = 1, 2, \dots, m (m \geq 2),

where $x_{i j}$ is the $i$th observation on the $j$th variable, together with the subset of these variables, $v_{1}, v_{2}, \dots, v_{p}$, for which information is required.

The quantities calculated are:

(a) Means:

{\bar{x}}_{j} = \frac{1}{n} \sum_{i = 1}^{n} x_{i j}, j = v_{1}, v_{2}, \dots, v_{p} .

(b) Standard deviations:

s_{j} = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(x_{i j} - {\bar{x}}_{j})}^{2}}, j = v_{1}, v_{2}, \dots, v_{p} .

(c) Sums of squares and cross-products of deviations from means:

S_{j k} = \sum_{i = 1}^{n} (x_{i j} - {\bar{x}}_{j}) (x_{i k} - {\bar{x}}_{k}), j, k = v_{1}, v_{2}, \dots, v_{p} .

(d) Pearson product-moment correlation coefficients:

R_{j k} = \frac{S_{j k}}{\sqrt{S_{j j} S_{k k}}}, j, k = v_{1}, v_{2}, \dots, v_{p} .

If $S_{j j}$ or $S_{k k}$ is zero, $R_{j k}$ is set to zero.
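
The quantities (a) to (d) can be cross-checked with a few lines of base MATLAB. The following is a minimal sketch under stated assumptions, not the routine's implementation: it uses the data from the Example section, selects columns with ordinary indexing, and uses the name sd (rather than std) for the standard deviations, which is a choice made here to avoid shadowing the built-in function.

```matlab
% Sketch only: reproduces quantities (a)-(d) with base MATLAB, not the NAG routine.
x = [ 3,  3,  1,  2;
      6,  4, -1,  4;
      9,  0,  5,  9;
     12,  2,  0,  0;
     -1,  5,  4, 12];
kvar = [4 1 2];                      % columns v_1, ..., v_p to analyse

xs = x(:, kvar);                     % n-by-p subset of the data
n  = size(xs, 1);

xbar = sum(xs, 1) / n;               % (a) means
d    = xs - repmat(xbar, n, 1);      % deviations from the means
ssp  = d' * d;                       % (c) sums of squares and cross-products
sd   = sqrt(diag(ssp)' / (n - 1));   % (b) standard deviations

% (d) correlations; entries where S_jj or S_kk is zero are left at zero
denom = sqrt(diag(ssp) * diag(ssp)');
r     = zeros(size(ssp));
ok    = denom > 0;
r(ok) = ssp(ok) ./ denom(ok);
```

Up to rounding, xbar, sd, ssp and r should agree with the values printed in the Example section for the fourth, first and second variables.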

References

None.

Parameters

Compulsory Input Parameters

1: $x (ldx, m)$ – double array: ldx, the first dimension of the array, must satisfy the constraint $ldx \geq n$.
$x (i, j)$ must be set to $x_{i j}$, the value of the $i$th observation on the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.
2: $kvar (nvars)$ – int64/int32/nag_int array: $kvar (j)$ must be set to the column number in x of the $j$th variable for which information is required, for $j = 1, 2, \dots, p$.

Constraint: $1 \leq kvar (j) \leq m$, for $j = 1, 2, \dots, p$.

Optional Input Parameters

1: $n$ – int64/int32/nag_int scalar: Default: the first dimension of the array x.
$n$, the number of observations or cases.

Constraint: $n \geq 2$.
2: $m$ – int64/int32/nag_int scalar: Default: the second dimension of the array x.
$m$, the number of variables.

Constraint: $m \geq 2$.
3: $nvars$ – int64/int32/nag_int scalar: Default: the dimension of the array kvar.
$p$, the number of variables for which information is required.

Constraint: $2 \leq nvars \leq m$.

Output Parameters

1: $xbar (nvars)$ – double array: The mean value, ${\bar{x}}_{j}$, of the variable specified in $kvar (j)$, for $j = 1, 2, \dots, p$.
2: $std (nvars)$ – double array: The standard deviation, $s_{j}$, of the variable specified in $kvar (j)$, for $j = 1, 2, \dots, p$.
3: $ssp (ldssp, nvars)$ – double array: $ssp (j, k)$ is the cross-product of deviations, $S_{j k}$, for the variables specified in $kvar (j)$ and $kvar (k)$, for $j = 1, 2, \dots, p$ and $k = 1, 2, \dots, p$.
4: $r (ldr, nvars)$ – double array: $r (j, k)$ is the product-moment correlation coefficient, $R_{j k}$, between the variables specified in $kvar (j)$ and $kvar (k)$, for $j = 1, 2, \dots, p$ and $k = 1, 2, \dots, p$.
5: $ifail$ – int64/int32/nag_int scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

$ifail = 1$

On entry, $n < 2$.

$ifail = 2$

On entry,	$nvars < 2$ ,
or	$nvars > m$ .

$ifail = 3$

On entry,	$ldx < n$ ,
or	$ldssp < nvars$ ,
or	$ldr < nvars$ .

$ifail = 4$

On entry,	$kvar (j) < 1$ ,
or	$kvar (j) > m$ for some $j = 1, 2, \dots, nvars$ .

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.
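
The input constraints behind ifail = 1, 2 and 4 can also be checked before the call. The sketch below is a hypothetical pre-check, not part of the NAG Toolbox: the variable name code and the order of the tests are assumptions, and ifail = 3 is omitted because it concerns array leading dimensions rather than the values passed in. The deliberately invalid kvar entry is made up for illustration.

```matlab
% Hypothetical pre-call check (not a NAG function): sets code to the ifail
% value implied by the documented constraints, or to 0 if none is violated.
x    = zeros(5, 4);                  % placeholder data: n = 5, m = 4
kvar = [4 1 5];                      % 5 is deliberately out of range

[n, m] = size(x);
nvars  = numel(kvar);

code = 0;
if n < 2
    code = 1;                        % ifail = 1: n < 2
elseif nvars < 2 || nvars > m
    code = 2;                        % ifail = 2: nvars < 2 or nvars > m
elseif any(kvar < 1 | kvar > m)
    code = 4;                        % ifail = 4: some kvar(j) outside 1..m
end
```

With these inputs the third test fires, since kvar(3) = 5 exceeds m = 4.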

Accuracy

nag_correg_coeffs_pearson_subset (g02bg) does not use additional precision arithmetic for the accumulation of scalar products, so there may be a loss of significant figures for large $n$.

Further Comments

The time taken by nag_correg_coeffs_pearson_subset (g02bg) depends on $n$ and $p$.

The function uses a two-pass algorithm.
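
To illustrate why a two-pass scheme is a sensible choice, the sketch below compares a generic two-pass sum of squared deviations with the one-pass textbook formula on made-up data with a large mean and a small spread. This is an illustration of the general technique only. The source does not describe the routine's internals beyond "two pass", and the data, variable names and the one-pass comparison are assumptions introduced here.

```matlab
% Sketch only: generic two-pass sum of squares versus the one-pass formula.
n    = 1000;
base = [1; 2; 3; 4];
xj   = 1e9 + repmat(base, n/4, 1);   % made-up data: large mean, small spread

% Two-pass: first pass computes the mean, second pass sums squared deviations
xbar  = sum(xj) / n;
S_two = sum((xj - xbar).^2);

% One-pass textbook formula: sum(x.^2) - n*xbar^2, prone to cancellation
S_one = sum(xj.^2) - n * xbar^2;

% Exact value: 250 copies of the deviations -1.5, -0.5, 0.5, 1.5
S_exact = 250 * (2.25 + 0.25 + 0.25 + 2.25);
```

Here every intermediate value in the two-pass computation is exactly representable in double precision, so S_two equals S_exact. The one-pass result is the difference of two numbers near 1e21, where adjacent doubles are far more than 1250 apart, so it cannot recover the true value.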

Example

This example reads in a set of data consisting of five observations on each of four variables. The means, standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for the fourth, first and second variables are then calculated and printed.

function g02bg_example


fprintf('g02bg example results\n\n');

x = [ 3,  3,  1,  2;
      6,  4, -1,  4;
      9,  0,  5,  9;
     12,  2,  0,  0;
     -1,  5,  4, 12];
[n,m] = size(x);
fprintf('Number of variables (columns) = %d\n', m);
fprintf('Number of cases     (rows)    = %d\n\n', n);
disp('Data matrix is:-');
disp(x);

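% Columns of x to analyse: the fourth, first and second variables.
% kvar must be an integer array; concatenating with int64(4) makes the whole vector int64.
kvar = [int64(4); 1; 2];
nvar = size(kvar,1);

% n, m and nvars are optional and default to the dimensions of x and kvar.
[xbar, std, ssp, r, ifail] = g02bg(x, kvar);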

fprintf('Variable   Mean     St. dev.\n');
fprintf('%5d%11.4f%11.4f\n',[double(kvar) xbar(1:nvar) std(1:nvar)]');
fprintf('\nSums of squares and cross-products of deviations\n');
disp(ssp(1:nvar,1:nvar))
fprintf('Correlation coefficients\n');
disp(r(1:nvar,1:nvar));

g02bg example results

Number of variables (columns) = 4
Number of cases     (rows)    = 5

Data matrix is:-
     3     3     1     2
     6     4    -1     4
     9     0     5     9
    12     2     0     0
    -1     5     4    12

Variable   Mean     St. dev.
    4     5.4000     4.9800
    1     5.8000     5.0695
    2     2.8000     1.9235

Sums of squares and cross-products of deviations
   99.2000  -57.6000    6.4000
  -57.6000  102.8000  -29.2000
    6.4000  -29.2000   14.8000

Correlation coefficients
    1.0000   -0.5704    0.1670
   -0.5704    1.0000   -0.7486
    0.1670   -0.7486    1.0000
