naginterfaces.library.contab.chisq

naginterfaces.library.contab.chisq(nobs)

chisq computes $χ^{2}$ statistics for a two-way contingency table. For a $2 \times 2$ table with a small number of observations, exact probabilities are computed.

For full information please refer to the NAG Library document for g11aa:

https://support.nag.com/numeric/nl/nagdoc_31.1/flhtml/g11/g11aaf.html

Parameters

nobs (int, array-like, shape $(nrow, ncol)$): The contingency table. $nobs[i - 1, j - 1]$ must contain $n_{ij}$, for $j = 1, 2, \dots, c$, for $i = 1, 2, \dots, r$.

Returns

expt (float, ndarray, shape $(nrow, ncol)$): The table of expected values. $expt[i - 1, j - 1]$ contains $f_{ij}$, for $j = 1, 2, \dots, c$, for $i = 1, 2, \dots, r$.
chist (float, ndarray, shape $(nrow, ncol)$): The table of $χ^{2}$ contributions. $chist[i - 1, j - 1]$ contains $\frac{(n_{ij} - f_{ij})^{2}}{f_{ij}}$, for $j = 1, 2, \dots, c$, for $i = 1, 2, \dots, r$.
prob (float): If $c = 2$, $r = 2$ and $n \leq 40$ then $prob$ contains the two-tail significance level for Fisher’s exact test, otherwise $prob$ contains the significance level from the Pearson $χ^{2}$ statistic.
chi (float): The Pearson $χ^{2}$ statistic.
g (float): The likelihood ratio test statistic.
df (float): The degrees of freedom for the statistics.

Raises

NagValueError

(errno $1$)

On entry, $ncol = ⟨value⟩$.

Constraint: $ncol \geq 2$.

(errno $1$)

On entry, $nrow = ⟨value⟩$.

Constraint: $nrow \geq 2$.

(errno $2$)

On entry, all elements of $nobs$ are $0$.

(errno $2$)

On entry, $i = ⟨value⟩$, $j = ⟨value⟩$ and $nobs[i - 1, j - 1] = ⟨value⟩$.

Constraint: $nobs[i - 1, j - 1] \geq 0$.

(errno $3$)

On entry, a $2 \times 2$ table has a row or column with both values zero.
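
The error conditions listed above can be checked before calling the routine. The following is a minimal sketch in plain Python; `table_errno` is a hypothetical helper, not part of naginterfaces, and it only mirrors the documented constraints by returning the matching errno (or `0`) rather than raising NagValueError. It assumes `nobs` is a rectangular list of lists.

```python
def table_errno(nobs):
    """Return the documented errno that `nobs` would trigger, or 0 if none.

    Hypothetical pre-check; the library performs its own validation.
    """
    nrow = len(nobs)
    ncol = len(nobs[0]) if nrow else 0

    # errno 1: the table needs at least two rows and two columns.
    if nrow < 2 or ncol < 2:
        return 1

    # errno 2: a negative count, or a table consisting only of zeros.
    cells = [n_ij for row in nobs for n_ij in row]
    if any(n_ij < 0 for n_ij in cells) or all(n_ij == 0 for n_ij in cells):
        return 2

    # errno 3: a 2 x 2 table with an all-zero row or column.
    if nrow == 2 and ncol == 2:
        row_totals = [sum(row) for row in nobs]
        col_totals = [sum(col) for col in zip(*nobs)]
        if 0 in row_totals or 0 in col_totals:
            return 3

    return 0


print(table_errno([[10, 5], [3, 12]]))  # valid table
print(table_errno([[0, 3], [0, 4]]))    # 2 x 2 table with a zero column
```

Returning a code instead of raising keeps the sketch independent of the library's exception classes.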

Warns

NagAlgorithmicWarning

(errno $4$): At least one cell has an expected frequency, $f_{ij} \leq 0.5$.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

For a set of $n$ observations classified by two variables, with $r$ and $c$ levels respectively, a two-way table of frequencies with $r$ rows and $c$ columns can be computed.

$$
\begin{matrix}
n_{11} & n_{12} & \dots & n_{1c} & n_{1.} \\
n_{21} & n_{22} & \dots & n_{2c} & n_{2.} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
n_{r1} & n_{r2} & \dots & n_{rc} & n_{r.} \\
n_{.1} & n_{.2} & \dots & n_{.c} & n
\end{matrix}
$$

To measure the association between the two classification variables, two statistics can be used: the Pearson $χ^{2}$ statistic, $\sum_{i = 1}^{r} \sum_{j = 1}^{c} \frac{(n_{ij} - f_{ij})^{2}}{f_{ij}}$, and the likelihood ratio test statistic, $2 \sum_{i = 1}^{r} \sum_{j = 1}^{c} n_{ij} \log(n_{ij} / f_{ij})$, where $f_{ij}$ are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association. These values are the expected cell frequencies and are given by

$$f_{ij} = n_{i.} n_{.j} / n.$$

Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a $χ^{2}$-distribution with $(c - 1)(r - 1)$ degrees of freedom. This distribution is arrived at under the assumption that the expected cell frequencies, $f_{ij}$, are not too small. For a discussion of this point see Everitt (1977). He concludes by saying, ‘… in the majority of cases the chi-square criterion may be used for tables with expectations in excess of $0.5$ in the smallest cell’.
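
The expected frequencies and the two statistics defined above can be reproduced by hand. The following is a minimal sketch in plain Python, not the library's implementation; `contingency_stats` is a hypothetical name. It uses the uncorrected formulas, so it is illustrated on a $2 \times 3$ table, and it assumes that all row and column totals are positive and that cells with $n_{ij} = 0$ contribute zero to the likelihood ratio sum.

```python
import math


def contingency_stats(nobs):
    """Uncorrected Pearson and likelihood ratio statistics for a two-way table."""
    row_totals = [sum(row) for row in nobs]        # n_i.
    col_totals = [sum(col) for col in zip(*nobs)]  # n_.j
    n = sum(row_totals)

    # Expected frequencies f_ij = n_i. * n_.j / n
    expt = [[r_tot * c_tot / n for c_tot in col_totals] for r_tot in row_totals]

    # Per-cell contributions (n_ij - f_ij)^2 / f_ij and their sum
    chist = [
        [(n_ij - f_ij) ** 2 / f_ij for n_ij, f_ij in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(nobs, expt)
    ]
    chi = sum(sum(row) for row in chist)

    # Likelihood ratio statistic; zero cells are skipped (assumed convention)
    g = 2.0 * sum(
        n_ij * math.log(n_ij / f_ij)
        for obs_row, exp_row in zip(nobs, expt)
        for n_ij, f_ij in zip(obs_row, exp_row)
        if n_ij > 0
    )

    df = (len(nobs) - 1) * (len(nobs[0]) - 1)

    # Mirrors the documented warning condition f_ij <= 0.5
    small_cells = any(f_ij <= 0.5 for row in expt for f_ij in row)
    return expt, chist, chi, g, df, small_cells


expt, chist, chi, g, df, small_cells = contingency_stats([[10, 20, 30], [20, 20, 20]])
print(expt[0], chi, g, df, small_cells)
```

For this table every row total is 60, so both rows share the expected values 15, 20 and 25, and the statistics have $(3 - 1)(2 - 1) = 2$ degrees of freedom.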

In the case of the $2 \times 2$ table, i.e., $c = 2$ and $r = 2$, the $χ^{2}$ approximation can be improved by using Yates’ continuity correction factor. This decreases the absolute value of $(n_{ij} - f_{ij})$ by $\frac{1}{2}$. For $2 \times 2$ tables with a small value of $n$ the exact probabilities from Fisher’s test are computed. These are based on the hypergeometric distribution and are computed using stat.prob_hypergeom. A two-tail probability is computed as $\min(1, 2 p_{u}, 2 p_{l})$, where $p_{u}$ and $p_{l}$ are the upper and lower one-tail probabilities from the hypergeometric distribution.
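
Both $2 \times 2$ refinements described above can be illustrated with the standard library. This is a sketch under stated assumptions rather than the routine's own code: `yates_chi2` and `fisher_two_tail` are hypothetical names, the hypergeometric probabilities are computed directly with `math.comb` instead of stat.prob_hypergeom, and the one-tail probabilities are taken as $P(X \leq n_{11})$ and $P(X \geq n_{11})$ with all margins held fixed.

```python
from math import comb


def yates_chi2(table):
    """Pearson statistic with |n_ij - f_ij| reduced by 1/2 (2 x 2 tables)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            f_ij = row_totals[i] * col_totals[j] / n
            total += (abs(table[i][j] - f_ij) - 0.5) ** 2 / f_ij
    return total


def fisher_two_tail(table):
    """Two-tail probability min(1, 2*p_u, 2*p_l) for a 2 x 2 table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    # Hypergeometric pmf of the top-left cell given fixed margins.
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    denom = comb(n, c1)
    pmf = {k: comb(r1, k) * comb(r2, c1 - k) / denom
           for k in range(lowest, highest + 1)}

    p_l = sum(p for k, p in pmf.items() if k <= a)  # lower one-tail (assumed)
    p_u = sum(p for k, p in pmf.items() if k >= a)  # upper one-tail (assumed)
    return min(1.0, 2.0 * p_u, 2.0 * p_l)


table = [[3, 1], [1, 3]]
print(yates_chi2(table), fisher_two_tail(table))
```

For this table all expected frequencies equal 2, so each cell contributes $(1 - \frac{1}{2})^{2} / 2 = 0.125$ to the corrected statistic, and the upper tail is $p_{u} = 17/70$, giving a two-tail probability of $34/70$.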

References

Everitt, B S, 1977, The Analysis of Contingency Tables, Chapman and Hall

Kendall, M G and Stuart, A, 1973, The Advanced Theory of Statistics (Volume 2), (3rd Edition), Griffin
