Integer, Intent (In)	::	nrow, ncol, nobs(ldnobs,ncol), ldnobs
Integer, Intent (Inout)	::	ifail
Real (Kind=nag_wp), Intent (Inout)	::	expt(ldnobs,ncol), chist(ldnobs,ncol)
Real (Kind=nag_wp), Intent (Out)	::	prob, chi, g, df

C Header Interface

#include <nag.h>

void	g11aaf_ (const Integer *nrow, const Integer *ncol, const Integer nobs[], const Integer *ldnobs, double expt[], double chist[], double *prob, double *chi, double *g, double *df, Integer *ifail)

The routine may be called by the names g11aaf or nagf_contab_chisq.

3 Description

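For a set of $n$ observations classified by two variables, with $r$ and $c$ levels respectively, a two-way table of frequencies with $r$ rows and $c$ columns can be computed. Here $n_{i j}$ is the number of observations at level $i$ of the row variable and level $j$ of the column variable, and $n_{i .}$ and $n_{. j}$ denote the row and column totals: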

\begin{matrix} n_{11} & n_{12} & \dots & n_{1 c} & n_{1 .} \\ n_{21} & n_{22} & \dots & n_{2 c} & n_{2 .} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ n_{r 1} & n_{r 2} & \dots & n_{r c} & n_{r .} \\ n_{. 1} & n_{. 2} & \dots & n_{. c} & n \end{matrix}

Two statistics that can be used to measure the association between the two classification variables are the Pearson $χ^{2}$ statistic,

\chi^{2} = \sum_{i = 1}^{r} \sum_{j = 1}^{c} \frac{{(n_{i j} - f_{i j})}^{2}}{f_{i j}} ,

and the likelihood ratio test statistic,

G = 2 \sum_{i = 1}^{r} \sum_{j = 1}^{c} n_{i j} \log (n_{i j} / f_{i j}) ,

where $f_{i j}$ are the fitted values from the model which assumes that the effects due to the classification variables are additive, i.e., that there is no association. These values are the expected cell frequencies and are given by

f_{i j} = n_{i .} n_{. j} / n .

Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a $χ^{2}$-distribution with $(c - 1) (r - 1)$ degrees of freedom. This distribution is arrived at under the assumption that the expected cell frequencies, $f_{i j}$, are not too small. For a discussion of this point see Everitt (1977). He concludes by saying, ‘... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of $0.5$ in the smallest cell’.

In the case of the $2 \times 2$ table, i.e., $c = 2$ and $r = 2$, the $χ^{2}$ approximation can be improved by using Yates' continuity correction factor. This decreases the absolute value of $(n_{i j} - f_{i j})$ by $\frac{1}{2}$ in each term of the Pearson statistic before it is squared.
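For a $2 \times 2$ table the Yates-corrected Pearson statistic can be sketched in C as follows. The function name yates_chisq_2x2 and its four-integer interface are hypothetical. The text does not state how a difference smaller than $\frac{1}{2}$ in absolute value is treated, so the sketch assumes that the reduced difference is clamped at zero.

```c
#include <math.h>

/* Illustrative sketch: Pearson chi-square with Yates' continuity correction
 * for the 2 x 2 table
 *     | n11  n12 |
 *     | n21  n22 |
 * Each |n_ij - f_ij| is reduced by 1/2 before squaring. Assumption: the
 * reduced value is clamped at zero if |n_ij - f_ij| < 1/2.
 * Returns -1.0 if a row or column total is zero. */
double yates_chisq_2x2(int n11, int n12, int n21, int n22)
{
    double obs[2][2] = {{n11, n12}, {n21, n22}};
    double rows[2] = {n11 + n12, n21 + n22};
    double cols[2] = {n11 + n21, n12 + n22};
    double n = rows[0] + rows[1];
    if (rows[0] == 0.0 || rows[1] == 0.0 || cols[0] == 0.0 || cols[1] == 0.0)
        return -1.0;

    double chi = 0.0;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            double f = rows[i] * cols[j] / n;          /* expected f_ij */
            double d = fabs(obs[i][j] - f) - 0.5;      /* corrected |n_ij - f_ij| */
            if (d < 0.0)
                d = 0.0;
            chi += d * d / f;
        }
    return chi;
}
```

Because $|n_{i j} - f_{i j}|$ takes the same value in all four cells of a $2 \times 2$ table, the result equals the closed form $n (|n_{11} n_{22} - n_{12} n_{21}| - n / 2)^{2} / (n_{1 .} n_{2 .} n_{. 1} n_{. 2})$ whenever $|n_{11} n_{22} - n_{12} n_{21}| \geq n / 2$.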

For $2 \times 2$ tables with a small value of $n$ ($n \leq 40$; see the description of prob in Section 5), the exact probabilities from Fisher's test are computed. These are based on the hypergeometric distribution and are computed using g01blf. A two-tail probability is computed as $\min (1, 2 p_{u}, 2 p_{l})$, where $p_{u}$ and $p_{l}$ are the upper and lower one-tail probabilities from the hypergeometric distribution.

4 References

Everitt B S (1977) The Analysis of Contingency Tables Chapman and Hall

Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin

5 Arguments

1: $nrow$ – Integer Input: On entry: $r$, the number of rows in the contingency table.

Constraint: $nrow \geq 2$.
2: $ncol$ – Integer Input: On entry: $c$, the number of columns in the contingency table.

Constraint: $ncol \geq 2$.
3: $nobs (ldnobs, ncol)$ – Integer array Input: On entry: the contingency table; $nobs (i, j)$ must contain $n_{i j}$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.

Constraint: $nobs (i, j) \geq 0$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.
4: $ldnobs$ – Integer Input: On entry: the first dimension of the arrays nobs, expt and chist as declared in the (sub)program from which g11aaf is called.

Constraint: $ldnobs \geq nrow$.
5: $expt (ldnobs, ncol)$ – Real (Kind=nag_wp) array Output: On exit: the table of expected values. $expt (i, j)$ contains $f_{i j}$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.
6: $chist (ldnobs, ncol)$ – Real (Kind=nag_wp) array Output: On exit: the table of $χ^{2}$ contributions. $chist (i, j)$ contains $\frac{{(n_{i j} - f_{i j})}^{2}}{f_{i j}}$, for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$.
7: $prob$ – Real (Kind=nag_wp) Output: On exit: if $c = 2$, $r = 2$ and $n \leq 40$, then prob contains the two-tail significance level for Fisher's exact test; otherwise prob contains the significance level from the Pearson $χ^{2}$ statistic.
8: $chi$ – Real (Kind=nag_wp) Output: On exit: the Pearson $χ^{2}$ statistic.
9: $g$ – Real (Kind=nag_wp) Output: On exit: the likelihood ratio test statistic.
10: $df$ – Real (Kind=nag_wp) Output: On exit: the degrees of freedom for the statistics.
11: $ifail$ – Integer Input/Output: On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes an error message to be printed and program execution to be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed, while a value of $1$ means that it is not.

If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $-1$ is recommended since useful values can be provided in some output arguments even when $ifail \neq 0$ on exit. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

Note: in some cases g11aaf may return useful information.

$ifail = 1$: On entry, $ldnobs = ⟨ value ⟩$ and $nrow = ⟨ value ⟩$.
Constraint: $ldnobs \geq nrow$.

On entry, $ncol = ⟨ value ⟩$.
Constraint: $ncol \geq 2$.

On entry, $nrow = ⟨ value ⟩$.
Constraint: $nrow \geq 2$.

$ifail = 2$: On entry, all elements of $nobs$ are zero.

On entry, $i = ⟨ value ⟩$, $j = ⟨ value ⟩$ and $nobs (i, j) = ⟨ value ⟩$.
Constraint: $nobs (i, j) \geq 0$.

$ifail = 3$: On entry, a $2 \times 2$ table has a row or column with both values zero.

$ifail = 4$: At least one cell has an expected frequency $f_{i j} \leq 0.5$. The $χ^{2}$ approximation may be poor.

$ifail = -99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = -399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = -999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
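The conditions behind ifail values 1 to 4 can be summarised as a sequence of checks. The C sketch below is hypothetical: g11aaf_like_check is not a NAG routine. It assumes the Fortran column-major layout, in which $nobs (i, j)$ is element $(i - 1) + (j - 1) \times ldnobs$ of a flat array, and the order in which it performs the checks is an assumption, so it may not report the same value as g11aaf when several conditions hold at once.

```c
/* Hypothetical helper (not part of the NAG Library): returns the ifail value
 * that the checks documented for g11aaf would give, or 0 if none applies.
 * nobs is a flat column-major array: nobs(i, j) is nobs[(i-1) + (j-1)*ldnobs].
 * The order of the checks is an assumption. */
int g11aaf_like_check(int nrow, int ncol, const int *nobs, int ldnobs)
{
    /* ifail = 1: invalid dimensions */
    if (nrow < 2 || ncol < 2 || ldnobs < nrow)
        return 1;

    /* ifail = 2: a negative count, or all counts zero */
    long total = 0;
    for (int j = 0; j < ncol; j++)
        for (int i = 0; i < nrow; i++) {
            int v = nobs[i + j * ldnobs];
            if (v < 0)
                return 2;
            total += v;
        }
    if (total == 0)
        return 2;

    /* ifail = 3: a 2 x 2 table with a row or column whose values are both zero */
    if (nrow == 2 && ncol == 2) {
        int n11 = nobs[0], n21 = nobs[1];
        int n12 = nobs[ldnobs], n22 = nobs[ldnobs + 1];
        if (n11 + n12 == 0 || n21 + n22 == 0 || n11 + n21 == 0 || n12 + n22 == 0)
            return 3;
    }

    /* ifail = 4 (warning): some expected frequency f_ij = n_i. n_.j / n <= 0.5 */
    for (int i = 0; i < nrow; i++) {
        long row = 0;
        for (int j = 0; j < ncol; j++)
            row += nobs[i + j * ldnobs];
        for (int j = 0; j < ncol; j++) {
            long col = 0;
            for (int k = 0; k < nrow; k++)
                col += nobs[k + j * ldnobs];
            if ((double)row * (double)col / (double)total <= 0.5)
                return 4;
        }
    }
    return 0;
}
```

For the brain tumour data of Section 10 none of these conditions applies; the smallest expected frequency is about 5.16.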

7 Accuracy

For the accuracy of the probabilities for Fisher's exact test see g01blf.

8 Parallelism and Performance

g11aaf is not threaded in any implementation.

9 Further Comments

The routine g01aff allows for the automatic amalgamation of rows and columns. In most circumstances this is not recommended; see Everitt (1977).

Multidimensional contingency tables can be analysed using log-linear models fitted by g02gbf.

10 Example

The data below, taken from Everitt (1977), is from 141 patients with brain tumours. The row classification variable is the site of the tumour: frontal lobes, temporal lobes and other cerebral areas. The column classification variable is the type of tumour: benign, malignant and other cerebral tumours. The final column and final row of the table give the row and column totals.

\begin{array}{rrr|r} 23 & 9 & 6 & 38 \\ 21 & 4 & 3 & 28 \\ 34 & 24 & 17 & 75 \\ \hline 78 & 37 & 26 & 141 \end{array}

The data is read in and the statistics computed and printed.


NAG FL Interface g11aaf (chisq)
