
# NAG Toolbox: nag_correg_coeffs_pearson_subset (g02bg)

## Purpose

nag_correg_coeffs_pearson_subset (g02bg) computes means and standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for selected variables.

## Syntax

[xbar, std, ssp, r, ifail] = g02bg(x, kvar, 'n', n, 'm', m, 'nvars', nvars)
[xbar, std, ssp, r, ifail] = nag_correg_coeffs_pearson_subset(x, kvar, 'n', n, 'm', m, 'nvars', nvars)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 22: n was made optional

## Description

The input data consist of $n$ observations for each of $m$ variables, given as an array
 $x_{ij},\quad i=1,2,\dots,n\ (n\ge 2),\quad j=1,2,\dots,m\ (m\ge 2),$
where ${x}_{ij}$ is the $i$th observation on the $j$th variable, together with the subset of these variables, ${v}_{1},{v}_{2},\dots ,{v}_{p}$, for which information is required.
The quantities calculated are:
(a) Means:
 $\bar{x}_j=\frac{1}{n}\sum_{i=1}^{n}x_{ij},\quad j=v_1,v_2,\dots,v_p.$
(b) Standard deviations:
 $s_j=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^2},\quad j=v_1,v_2,\dots,v_p.$
(c) Sums of squares and cross-products of deviations from means:
 $S_{jk}=\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right),\quad j,k=v_1,v_2,\dots,v_p.$
(d) Pearson product-moment correlation coefficients:
 $R_{jk}=\frac{S_{jk}}{\sqrt{S_{jj}S_{kk}}},\quad j,k=v_1,v_2,\dots,v_p.$
If ${S}_{jj}$ or ${S}_{kk}$ is zero, ${R}_{jk}$ is set to zero.
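Formulas (a) to (d) can be evaluated directly with base MATLAB, which makes it easy to see what each output holds. The function below is a minimal sketch, not NAG's implementation, and its name `pearson_subset_sketch` is hypothetical. It assumes the same conventions stated above: divisor $n-1$ for the standard deviations, and ${R}_{jk}$ set to zero when ${S}_{jj}$ or ${S}_{kk}$ is zero.

```matlab
function [xbar, s, ssp, r] = pearson_subset_sketch(x, kvar)
% Hypothetical helper (not part of the NAG Toolbox): evaluates formulas
% (a)-(d) with base MATLAB for the columns of x listed in kvar.
xs = x(:, kvar);                 % n-by-p matrix of the selected variables
n = size(xs, 1);
xbar = mean(xs, 1)';             % (a) means, returned as a p-by-1 column
d = xs - repmat(xbar', n, 1);    % deviations from the means
ssp = d' * d;                    % (c) S_jk = sum over i of d_ij * d_ik
s = sqrt(diag(ssp) / (n - 1));   % (b) standard deviations, divisor n-1
sd = sqrt(diag(ssp));            % sqrt(S_jj) for each selected variable
denom = sd * sd';                % sqrt(S_jj * S_kk) for every pair (j,k)
r = zeros(size(ssp));            % R_jk stays zero where S_jj or S_kk is zero
nz = denom > 0;
r(nz) = ssp(nz) ./ denom(nz);    % (d) Pearson correlation coefficients
end
```

With the data and `kvar = [4; 1; 2]` used in the Example section, this sketch should reproduce the printed means, standard deviations, sums of squares and cross-products, and correlations, up to rounding.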


## Parameters

### Compulsory Input Parameters

1:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{m}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must be set to ${x}_{\mathit{i}\mathit{j}}$, the value of the $\mathit{i}$th observation on the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
2:     $\mathrm{kvar}\left({\mathbf{nvars}}\right)$ – int64/int32/nag_int array
${\mathbf{kvar}}\left(\mathit{j}\right)$ must be set to the column number in x of the $\mathit{j}$th variable for which information is required, for $\mathit{j}=1,2,\dots ,p$.
Constraint: $1\le {\mathbf{kvar}}\left(\mathit{j}\right)\le {\mathbf{m}}$, for $\mathit{j}=1,2,\dots ,p$.

### Optional Input Parameters

1:     $\mathrm{n}$ – int64/int32/nag_int scalar
Default: the first dimension of the array x.
$n$, the number of observations or cases.
Constraint: ${\mathbf{n}}\ge 2$.
2:     $\mathrm{m}$ – int64/int32/nag_int scalar
Default: the second dimension of the array x.
$m$, the number of variables.
Constraint: ${\mathbf{m}}\ge 2$.
3:     $\mathrm{nvars}$ – int64/int32/nag_int scalar
Default: the dimension of the array kvar.
$p$, the number of variables for which information is required.
Constraint: $2\le {\mathbf{nvars}}\le {\mathbf{m}}$.

### Output Parameters

1:     $\mathrm{xbar}\left({\mathbf{nvars}}\right)$ – double array
The mean value, ${\stackrel{-}{x}}_{\mathit{j}}$, of the variable specified in ${\mathbf{kvar}}\left(\mathit{j}\right)$, for $\mathit{j}=1,2,\dots ,p$.
2:     $\mathrm{std}\left({\mathbf{nvars}}\right)$ – double array
The standard deviation, ${s}_{\mathit{j}}$, of the variable specified in ${\mathbf{kvar}}\left(\mathit{j}\right)$, for $\mathit{j}=1,2,\dots ,p$.
3:     $\mathrm{ssp}\left(\mathit{ldssp},{\mathbf{nvars}}\right)$ – double array
${\mathbf{ssp}}\left(\mathit{j},\mathit{k}\right)$ is the cross-product of deviations, ${S}_{\mathit{j}\mathit{k}}$, for the variables specified in ${\mathbf{kvar}}\left(\mathit{j}\right)$ and ${\mathbf{kvar}}\left(\mathit{k}\right)$, for $\mathit{j}=1,2,\dots ,p$ and $\mathit{k}=1,2,\dots ,p$.
4:     $\mathrm{r}\left(\mathit{ldr},{\mathbf{nvars}}\right)$ – double array
${\mathbf{r}}\left(\mathit{j},\mathit{k}\right)$ is the product-moment correlation coefficient, ${R}_{\mathit{j}\mathit{k}}$, between the variables specified in ${\mathbf{kvar}}\left(\mathit{j}\right)$ and ${\mathbf{kvar}}\left(\mathit{k}\right)$, for $\mathit{j}=1,2,\dots ,p$ and $\mathit{k}=1,2,\dots ,p$.
5:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{n}}<2$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{nvars}}<2$, or ${\mathbf{nvars}}>{\mathbf{m}}$.
${\mathbf{ifail}}=3$
 On entry, $\mathit{ldx}<{\mathbf{n}}$, or $\mathit{ldssp}<{\mathbf{nvars}}$, or $\mathit{ldr}<{\mathbf{nvars}}$.
${\mathbf{ifail}}=4$
 On entry, ${\mathbf{kvar}}\left(j\right)<1$, or ${\mathbf{kvar}}\left(j\right)>{\mathbf{m}}$ for some $j=1,2,\dots ,{\mathbf{nvars}}$.
${\mathbf{ifail}}=-99$
 An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
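The argument checks behind ifail $=1$, $2$ and $4$ can be mirrored in MATLAB before the call. The helper below, `g02bg_precheck_sketch`, is hypothetical and is written only from the constraints listed above. It assumes n, m and nvars take their default values (the dimensions of x and kvar), so ldx equals n and the ifail $=3$ condition on x cannot arise; it also tests the conditions in numerical order, which may differ from the order the routine itself uses when several are violated.

```matlab
function code = g02bg_precheck_sketch(x, kvar)
% Hypothetical pre-call check (not part of the NAG Toolbox): returns the
% ifail value that the documented constraints would trigger, or 0 if none.
% Assumes n, m and nvars are left at their defaults.
[n, m] = size(x);
nvars = numel(kvar);
if n < 2
    code = 1;                              % ifail = 1: n < 2
elseif nvars < 2 || nvars > m
    code = 2;                              % ifail = 2: nvars < 2 or nvars > m
elseif any(kvar < 1) || any(kvar > m)
    code = 4;                              % ifail = 4: kvar(j) outside 1..m
else
    code = 0;                              % no documented constraint violated
end
end
```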

## Accuracy

nag_correg_coeffs_pearson_subset (g02bg) does not use additional precision arithmetic for the accumulation of scalar products, so there may be a loss of significant figures for large $n$.

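The time taken by nag_correg_coeffs_pearson_subset (g02bg) depends on $n$ and $p$.
The function uses a two-pass algorithm: the means are computed in a first pass through the data, and the deviations from those means are accumulated in a second.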
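Why a two-pass scheme is preferred can be seen on a small example. The script below is a general numerical illustration, not a description of g02bg's internals: it compares the one-pass formula $\sum_i x_i^2-n\bar{x}^2$ with the two-pass formula $\sum_i (x_i-\bar{x})^2$ on data with a large common offset, for which the exact sum of squared deviations is 90.

```matlab
% Illustration: one-pass vs two-pass sum of squared deviations.
% The data share a large offset; the exact deviations are -6, -3, 3, 6,
% so the exact sum of squared deviations is 90.
x = 1e8 + [4; 7; 13; 16];
n = numel(x);

% One-pass formula: both terms are about 4e16, so the subtraction
% cancels almost all of their significant figures.
S_onepass = sum(x.^2) - n * (sum(x) / n)^2;

% Two-pass formula: first compute the mean, then sum squared deviations.
xbar = sum(x) / n;
S_twopass = sum((x - xbar).^2);

fprintf('one-pass: %.6f\ntwo-pass: %.6f\n', S_onepass, S_twopass);
```

Here every intermediate value in the two-pass version is exactly representable, so it returns exactly 90. The one-pass version subtracts two numbers near $4\times10^{16}$, where adjacent doubles are 8 apart, so its rounding error can be comparable to the answer itself.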

## Example

This example reads in a set of data consisting of five observations on each of four variables. The means, standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for the fourth, first and second variables are then calculated and printed.
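```matlab
function g02bg_example

fprintf('g02bg example results\n\n');

% Data: five observations (rows) on four variables (columns)
x = [ 3,  3,  1,  2;
      6,  4, -1,  4;
      9,  0,  5,  9;
     12,  2,  0,  0;
     -1,  5,  4, 12];
[n,m] = size(x);
fprintf('Number of variables (columns) = %d\n', m);
fprintf('Number of cases     (rows)    = %d\n\n', n);
disp('Data matrix is:-');
disp(x);

% Columns of x to analyse (fourth, first, second), as an integer array
kvar = [int64(4); 1; 2];
nvar = size(kvar,1);

% n, m and nvars are omitted, so they default to the array dimensions.
% Note: the output name std shadows MATLAB's std function in this example.
[xbar, std, ssp, r, ifail] = g02bg( ...
    x, kvar);

% Means and standard deviations, one row per selected column
fprintf('Variable   Mean     St. dev.\n');
fprintf('%5d%11.4f%11.4f\n',[double(kvar) xbar(1:nvar) std(1:nvar)]');
fprintf('\nSums of squares and cross-products of deviations\n');
disp(ssp(1:nvar,1:nvar))
fprintf('Correlation coefficients\n');
disp(r(1:nvar,1:nvar));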

```
```
g02bg example results

Number of variables (columns) = 4
Number of cases     (rows)    = 5

Data matrix is:-
     3     3     1     2
     6     4    -1     4
     9     0     5     9
    12     2     0     0
    -1     5     4    12

Variable   Mean     St. dev.
    4     5.4000     4.9800
    1     5.8000     5.0695
    2     2.8000     1.9235

Sums of squares and cross-products of deviations
   99.2000  -57.6000    6.4000
  -57.6000  102.8000  -29.2000
    6.4000  -29.2000   14.8000

Correlation coefficients
    1.0000   -0.5704    0.1670
   -0.5704    1.0000   -0.7486
    0.1670   -0.7486    1.0000

```
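As an independent cross-check of these results, the same quantities can be obtained from base MATLAB functions alone, without the NAG Toolbox. This sketch relies on `std` and `cov` using the $n-1$ divisor by default, so `(n-1)*cov` gives the matrix of ${S}_{jk}$, and on `corrcoef` returning Pearson correlations. One difference to keep in mind: for a variable with zero variance, `corrcoef` produces NaN entries, whereas g02bg sets ${R}_{jk}$ to zero.

```matlab
% Cross-check of the example results using base MATLAB functions only.
x = [ 3,  3,  1,  2;
      6,  4, -1,  4;
      9,  0,  5,  9;
     12,  2,  0,  0;
     -1,  5,  4, 12];
kvar = [4; 1; 2];
xs = x(:, kvar);               % selected variables, in kvar order
n = size(xs, 1);

xbar_chk = mean(xs)';          % means
std_chk  = std(xs)';           % std divides by n-1 by default, as in (b)
ssp_chk  = (n - 1) * cov(xs);  % cov also divides by n-1, so this is S_jk
r_chk    = corrcoef(xs);       % Pearson product-moment correlations

disp([kvar xbar_chk std_chk]);
disp(ssp_chk);
disp(r_chk);
```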