are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association. These values are the expected cell frequencies and are given by

f_{i j} = \frac{n_{i .} \, n_{. j}}{n} .
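As a worked instance, take the brain tumour data used in the Example section. There the first row total is $n_{1 .} = 38$, the first column total is $n_{. 1} = 78$ and $n = 141$:

```latex
f_{1 1} = \frac{n_{1 .} \, n_{. 1}}{n} = \frac{38 \times 78}{141} = \frac{2964}{141} \approx 21.02
```

Because each $f_{i j}$ is proportional to its row and column totals, the fitted values reproduce the observed margins. For example, $\sum_{j} f_{1 j} = 38 \times 141 / 141 = 38$.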

Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a $χ^{2}$-distribution with $(c - 1) (r - 1)$ degrees of freedom. The approximation assumes that the expected cell frequencies, $f_{i j}$, are not too small. For a discussion of this point see Everitt (1977), who concludes: ‘... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of 0.5 in the smallest cell’.
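The computations described so far can be sketched in a few lines of Python. This is an illustrative, standard-library-only sketch, not the NAG implementation, and the name `contingency_statistics` is hypothetical. It assumes the usual definitions of the two statistics: Pearson's $\sum (n_{i j} - f_{i j})^{2} / f_{i j}$ and the likelihood ratio statistic $G = 2 \sum n_{i j} \log (n_{i j} / f_{i j})$, with empty cells contributing zero to $G$. It also assumes that every row and column total is positive, so that no $f_{i j}$ is zero. The function reports the smallest expected frequency so that it can be compared with Everitt's 0.5 guideline.

```python
import math
from typing import List, Tuple


def contingency_statistics(
    table: List[List[int]],
) -> Tuple[List[List[float]], float, float, int, float]:
    """Return (expected, pearson, g, df, smallest_expected) for an r x c table.

    Assumes every row and column total is positive, so no f_ij is zero.
    """
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]                         # n_{i.}
    col_totals = [sum(row[j] for row in table) for j in range(c)]    # n_{.j}
    n = sum(row_totals)                                              # n

    # Expected frequencies under no association: f_ij = n_{i.} n_{.j} / n
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)]
                for i in range(r)]

    pearson = 0.0
    g = 0.0
    for i in range(r):
        for j in range(c):
            n_ij, f_ij = table[i][j], expected[i][j]
            pearson += (n_ij - f_ij) ** 2 / f_ij       # one chi-square contribution
            if n_ij > 0:                               # empty cells contribute 0 to G
                g += n_ij * math.log(n_ij / f_ij)
    g *= 2.0                                           # G = 2 sum n_ij log(n_ij / f_ij)

    df = (r - 1) * (c - 1)
    smallest = min(min(row) for row in expected)
    return expected, pearson, g, df, smallest


if __name__ == "__main__":
    # Brain tumour data from the Example section (totals excluded).
    tumours = [[23, 9, 6], [21, 4, 3], [34, 24, 17]]
    _, pearson, g, df, smallest = contingency_statistics(tumours)
    print(f"Pearson = {pearson:.3f}, G = {g:.3f}, df = {df}, "
          f"smallest f = {smallest:.3f}")
```

Run on the tumour data, this sketch gives Pearson = 7.844, G = 8.096 and df = 4, in agreement with the Example results. The smallest expected frequency is 5.163, well above 0.5.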

In the case of the $2 \times 2$ table, i.e., $c = 2$ and $r = 2$, the $χ^{2}$ approximation can be improved by using Yates' continuity correction, which decreases the absolute value of $(n_{i j} - f_{i j})$ by $\frac{1}{2}$. For $2 \times 2$ tables with a small value of $n$ ($n \leq 40$; see prob under Output Parameters), the exact probabilities from Fisher's test are computed. These are based on the hypergeometric distribution and are computed using nag_stat_prob_hypergeom (g01bl). A two-tail probability is computed as

\min (1, 2 p_{u}, 2 p_{l})

where $p_{u}$ and $p_{l}$ are the upper and lower one-tail probabilities from the hypergeometric distribution.
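The $2 \times 2$ handling can be sketched as follows. This is a hypothetical Python illustration, not the NAG code, which obtains the hypergeometric probabilities from nag_stat_prob_hypergeom (g01bl). The function names are illustrative. The sketch assumes that $p_{u} = P(X \geq n_{1 1})$ and $p_{l} = P(X \leq n_{1 1})$, where $X$ is the hypergeometric count in cell $(1, 1)$ given the observed margins, so that both tails include the observed table. The Yates function applies the correction literally and assumes no zero row or column, so no $f_{i j}$ is zero.

```python
from math import comb
from typing import List


def yates_chi2(table: List[List[int]]) -> float:
    """Yates-corrected Pearson statistic for a 2 x 2 table: each |n_ij - f_ij|
    is reduced by 1/2 before squaring (no clamping at zero)."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            f_ij = row_totals[i] * col_totals[j] / n
            total += (abs(table[i][j] - f_ij) - 0.5) ** 2 / f_ij
    return total


def fisher_two_tail(table: List[List[int]]) -> float:
    """min(1, 2 p_u, 2 p_l), where p_u = P(X >= n_11) and p_l = P(X <= n_11)
    for X hypergeometric given the observed row and column totals."""
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, c1 = a + b, a + c                        # first row and column totals
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)    # possible values of X
    denom = comb(n, c1)

    def pmf(k: int) -> float:
        # P(X = k) = C(r1, k) C(n - r1, c1 - k) / C(n, c1)
        return comb(r1, k) * comb(n - r1, c1 - k) / denom

    p_upper = sum(pmf(k) for k in range(a, hi + 1))
    p_lower = sum(pmf(k) for k in range(lo, a + 1))
    return min(1.0, 2.0 * p_upper, 2.0 * p_lower)
```

Consider the table [[3, 1], [1, 3]]. Every expected frequency is 2, so the Yates-corrected statistic is 0.5, against an uncorrected Pearson value of 2. For Fisher's test, $p_{u} = 17/70$ and $p_{l} = 69/70$, giving a two-tail probability of $34/70 \approx 0.486$. Some implementations clamp $|n_{i j} - f_{i j}| - \frac{1}{2}$ at zero; this sketch does not.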

References

Everitt B S (1977) The Analysis of Contingency Tables Chapman and Hall

Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin

Parameters

Compulsory Input Parameters

1: $nrow$ – int64, int32 or nag_int scalar: $r$, the number of rows in the contingency table.

Constraint: $nrow \geq 2$.
2: $nobs (ldnobs, ncol)$ – int64, int32 or nag_int array: ldnobs, the first dimension of the array, must satisfy the constraint $ldnobs \geq nrow$.
The contingency table $nobs (i, j)$ must contain $n_{i j}$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.

Constraint: $nobs (i, j) \geq 0$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.

Optional Input Parameters

1: $ncol$ – int64, int32 or nag_int scalar: $c$, the number of columns in the contingency table.
Default: the second dimension of the array nobs.

Constraint: $ncol \geq 2$.

Output Parameters

1: $expt (ldnobs, ncol)$ – double array: The table of expected values. $expt (i, j)$ contains $f_{i j}$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.
2: $chist (ldnobs, ncol)$ – double array: The table of $χ^{2}$ contributions. $chist (i, j)$ contains $\frac{(n_{i j} - f_{i j})^{2}}{f_{i j}}$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.
3: $prob$ – double scalar: If $c = 2$, $r = 2$ and $n \leq 40$, prob contains the two-tail significance level for Fisher's exact test; otherwise prob contains the significance level from the Pearson $χ^{2}$ statistic.
4: $chi$ – double scalar: The Pearson $χ^{2}$ statistic.
5: $g$ – double scalar: The likelihood ratio test statistic.
6: $df$ – double scalar: The degrees of freedom for the statistics.
7: $ifail$ – int64, int32 or nag_int scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).
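Except for small $2 \times 2$ tables, prob is the upper-tail probability of the $χ^{2}$ distribution with df degrees of freedom, evaluated at the Pearson statistic chi. The sketch below computes that tail using only the standard library. It relies on the recurrence $Q(x; \nu + 2) = Q(x; \nu) + (x/2)^{\nu/2} e^{-x/2} / \Gamma(\nu/2 + 1)$, started from $Q(x; 1) = \operatorname{erfc}(\sqrt{x/2})$ or $Q(x; 2) = e^{-x/2}$. It is an illustrative stand-in, not the routine NAG uses. The name `chi2_upper_tail` is hypothetical, and the function assumes a positive integer number of degrees of freedom.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square with df >= 1 (integer) degrees of freedom."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 1 (odd) or df = 2 (even).
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # Step up two degrees of freedom at a time:
    # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half
                      - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q
```

With the Example's chi = 7.844 and df = 4, this returns about 0.09746, which prints as the 0.0975 shown in the Example results. For four degrees of freedom the recurrence reduces to $e^{-x/2}(1 + x/2)$.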

Error Indicators and Warnings

Note: nag_contab_chisq (g11aa) may return useful information for one or more of the following detected errors or warnings.

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

$ifail = 1$

On entry, $nrow < 2$,
or $ncol < 2$,
or $ldnobs < nrow$.

$ifail = 2$

On entry, a value in nobs is $< 0$, or all values in nobs are zero.

$ifail = 3$

On entry, the $2 \times 2$ table has a row or column with both values $0$.

W $ifail = 4$: At least one cell has expected frequency $f_{i j} \leq 0.5$. The $χ^{2}$ approximation may be poor.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.
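The input checks behind $ifail = 1$ to $4$ can be mirrored before calling the function. The sketch below is a hypothetical Python helper, not part of the NAG Toolbox. The order in which it applies the checks is an assumption. It skips the $ldnobs < nrow$ test because a Python list of lists carries its own dimensions.

```python
from typing import List


def check_table(table: List[List[int]]) -> int:
    """Return a hypothetical status code mirroring the ifail values:
    0 (ok), 1, 2, 3 (errors) or 4 (warning)."""
    r = len(table)
    c = len(table[0]) if r else 0
    if r < 2 or c < 2:
        return 1                     # fewer than 2 rows or columns
    cells = [x for row in table for x in row]
    if any(x < 0 for x in cells) or all(x == 0 for x in cells):
        return 2                     # a negative count, or every count zero
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    if r == 2 and c == 2 and (0 in row_totals or 0 in col_totals):
        return 3                     # 2 x 2 table with an all-zero row or column
    n = sum(row_totals)
    smallest = min(ri * cj / n for ri in row_totals for cj in col_totals)
    if smallest <= 0.5:
        return 4                     # warning: chi-square approximation may be poor
    return 0
```

With nonnegative counts, a zero row or column total in a $2 \times 2$ table means both of its values are 0. That is why the sketch tests the margins for the $ifail = 3$ case.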

Accuracy

For the accuracy of the probabilities for Fisher's exact test see nag_stat_prob_hypergeom (g01bl).

Further Comments

The function nag_stat_contingency_table (g01af) allows for the automatic amalgamation of rows and columns. In most circumstances this is not recommended; see Everitt (1977).

Multidimensional contingency tables can be analysed using log-linear models fitted by nag_correg_glm_binomial (g02gb).

Example

The data below, taken from Everitt (1977), is from 141 patients with brain tumours. The row classification variable is the site of the tumour: frontal lobes, temporal lobes and other cerebral areas. The column classification variable is the type of tumour: benign, malignant and other cerebral tumours.

\begin{array}{rrr|r} 23 & 9 & 6 & 38 \\ 21 & 4 & 3 & 28 \\ 34 & 24 & 17 & 75 \\ \hline 78 & 37 & 26 & 141 \end{array}

The data is read in and the statistics computed and printed.


function g11aa_example


fprintf('g11aa example results\n\n');

% Observed counts: rows are tumour sites, columns are tumour types
nrow  = int64(3);
nobst = [int64(23),  9,  6;
               21,  4,  3;
               34, 24, 17];

[expt, chist, prob, chi, g, df, ifail] = ...
  g11aa(nrow, nobst);

% Display results
fprintf('Probability                     = %9.4f\n', prob);
fprintf('Pearson Chi-square statistic    = %8.3f\n', chi);
fprintf('Likelihood ratio test statistic = %8.3f\n', g);
fprintf('Degrees of freedom              = %4.0f\n', df);

g11aa example results

Probability                     =    0.0975
Pearson Chi-square statistic    =    7.844
Likelihood ratio test statistic =    8.096
Degrees of freedom              =    4
