NAG FL Interface
g01aff (contingency_table)

1 Purpose

g01aff performs the analysis of a two-way r×c contingency table or classification. If r=c=2 and the total number of objects classified is 40 or fewer, the probabilities for Fisher's exact test are computed. Otherwise, a test statistic is computed (with Yates' correction when r=c=2) which, under the assumption of no association between the classifications, has approximately a chi-square distribution with (r-1)×(c-1) degrees of freedom.

2 Specification

Fortran Interface
Subroutine g01aff ( ldnob, ldpred, m, n, nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail)
Integer, Intent (In) :: ldnob, ldpred, m, n
Integer, Intent (Inout) :: nobs(ldnob,n), num, ifail
Integer, Intent (Out) :: npos, ndf, m1, n1
Real (Kind=nag_wp), Intent (Inout) :: pred(ldpred,n)
Real (Kind=nag_wp), Intent (Out) :: chis, p(21)
C Header Interface
#include <nag.h>
void  g01aff_ (const Integer *ldnob, const Integer *ldpred, const Integer *m, const Integer *n, Integer nobs[], Integer *num, double pred[], double *chis, double p[], Integer *npos, Integer *ndf, Integer *m1, Integer *n1, Integer *ifail)
The routine may be called by the names g01aff or nagf_stat_contingency_table.

3 Description

The data consist of the frequencies for the two-way classification, denoted by n_ij, for i=1,2,…,m and j=1,2,…,n, with m,n>1.
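A check is made to see whether any row or column of the matrix of frequencies consists entirely of zeros; if so, the matrix of frequencies is reduced by omitting that row or column. Suppose the final size of the matrix is m1 by n1 (m1,n1>1), and let
$$R_i=\sum_{j=1}^{n_1}n_{ij},\quad i=1,2,\dots,m_1,\qquad C_j=\sum_{i=1}^{m_1}n_{ij},\quad j=1,2,\dots,n_1,\qquad T=\sum_{i=1}^{m_1}R_i=\sum_{j=1}^{n_1}C_j$$
denote the row totals, the column totals and the total frequency, respectively.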
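The reduction step and the margins R_i, C_j and T can be illustrated with a short C sketch (C being the language of the header interface above). This is not NAG's implementation: the function reduce_table, the struct reduced_table and the size limit MAXDIM are hypothetical, and the table is held row-major with 0-based indices.

```c
#include <assert.h>

#define MAXDIM 16   /* hypothetical size limit for this sketch */

/* Reduced table: m1-by-n1 frequencies with no all-zero rows or columns,
   plus row totals R, column totals C and total frequency T. */
typedef struct {
    int m1, n1;
    int n[MAXDIM][MAXDIM];
    int R[MAXDIM], C[MAXDIM];
    int T;
} reduced_table;

/* Drop every all-zero row and column of the m-by-n table freq (row-major,
   0-based), then compute R_i, C_j and T for what remains. */
static reduced_table reduce_table(int m, int n, int freq[MAXDIM][MAXDIM])
{
    reduced_table out = {0};
    int keep_row[MAXDIM] = {0}, keep_col[MAXDIM] = {0};
    assert(m >= 1 && n >= 1 && m <= MAXDIM && n <= MAXDIM);

    /* A row (column) is kept if it has at least one nonzero entry. */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            if (freq[i][j] != 0) {
                keep_row[i] = 1;
                keep_col[j] = 1;
            }

    /* Copy the kept cells and accumulate the margins. */
    for (int i = 0; i < m; i++) {
        if (!keep_row[i]) continue;
        int jj = 0;
        for (int j = 0; j < n; j++) {
            if (!keep_col[j]) continue;
            out.n[out.m1][jj] = freq[i][j];
            out.R[out.m1] += freq[i][j];
            out.C[jj] += freq[i][j];
            jj++;
        }
        out.n1 = jj;
        out.m1++;
    }
    for (int i = 0; i < out.m1; i++)
        out.T += out.R[i];
    return out;
}
```

A single pass suffices: removing an all-zero row cannot create a new all-zero column, because the removed row contributed nothing to any column.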
There are two situations:
  (i) If m1>2 or n1>2 (or both), or if m1=n1=2 and T>40, the matrix of expected frequencies, denoted by r_ij, for i=1,2,…,m1 and j=1,2,…,n1, and the test statistic, χ², are computed, where
    $$r_{ij}=\frac{R_i C_j}{T},\quad i=1,2,\dots,m_1;\; j=1,2,\dots,n_1,$$
    and
    $$\chi^2=\sum_{i=1}^{m_1}\sum_{j=1}^{n_1}\frac{\left(\lvert r_{ij}-n_{ij}\rvert-Y\right)^2}{r_{ij}},$$
    where
    $$Y=\begin{cases}\tfrac{1}{2} & \text{if } m_1=n_1=2,\\ 0 & \text{otherwise,}\end{cases}$$
    is Yates' correction for continuity.
    Under the assumption that there is no association between the two classifications, χ² has approximately a chi-square distribution with (m1-1)×(n1-1) degrees of freedom.
    An option exists that allows further ‘shrinkage’ of the matrix of frequencies when r_ij<1 for the (i,j)th cell. In that case, row i or column j is combined with the adjacent row or column that has the smaller total. Row i is selected for combination if R_i×m1 ≤ C_j×n1; otherwise column j is selected. This ‘shrinking’ process continues until r_ij≥1 for all cells (i,j).
  (ii) If m1=n1=2 and T≤40, the probabilities needed for Fisher's exact test are computed.
    The matrix of frequencies may be rearranged so that R1 is the smallest marginal (i.e., row or column) total and C2≥C1. Under the assumption of no association between the classifications, the probability of obtaining r entries in cell (1,1) is computed, where
    $$P_{r+1}=\frac{R_1!\,R_2!\,C_1!\,C_2!}{T!\,r!\,(R_1-r)!\,(C_1-r)!\,(T-C_1-R_1+r)!},\quad r=0,1,\dots,R_1.$$
    The probability of obtaining the table of given frequencies is also returned. A test of the assumption against some alternative may then be made by summing the relevant values of P_{r+1}.
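Case (i) can be illustrated with a short C sketch that computes the expected frequencies r_ij and the statistic χ², applying Yates' correction only to a 2×2 table. The function chi_square_stat and the limit MAX_CAT are hypothetical; the sketch assumes that all-zero rows and columns have already been removed (so every r_ij>0) and that any requested shrinkage has already been applied.

```c
#include <assert.h>

enum { MAX_CAT = 32 };  /* hypothetical size limit for this sketch */

/* Case (i): fill expected[] with r_ij = R_i C_j / T and return
   chi2 = sum over cells of (|r_ij - n_ij| - Y)^2 / r_ij, where Y = 1/2
   (Yates' correction) for a 2-by-2 table and 0 otherwise.
   freq and expected are m1-by-n1, row-major. Assumes no all-zero rows or
   columns remain, so every r_ij > 0. */
static double chi_square_stat(int m1, int n1, const int *freq, double *expected)
{
    double R[MAX_CAT] = {0}, C[MAX_CAT] = {0}, T = 0.0;
    assert(m1 >= 2 && n1 >= 2 && m1 <= MAX_CAT && n1 <= MAX_CAT);

    /* Row totals R_i, column totals C_j and total frequency T. */
    for (int i = 0; i < m1; i++)
        for (int j = 0; j < n1; j++) {
            R[i] += freq[i * n1 + j];
            C[j] += freq[i * n1 + j];
            T += freq[i * n1 + j];
        }
    assert(T > 0.0);

    const double Y = (m1 == 2 && n1 == 2) ? 0.5 : 0.0;
    double chi2 = 0.0;
    for (int i = 0; i < m1; i++)
        for (int j = 0; j < n1; j++) {
            double r = R[i] * C[j] / T;        /* expected frequency r_ij */
            double d = r - freq[i * n1 + j];
            if (d < 0.0)
                d = -d;                        /* |r_ij - n_ij| */
            expected[i * n1 + j] = r;
            chi2 += (d - Y) * (d - Y) / r;
        }
    return chi2;
}
```

For the 2×3 table with rows (10, 20, 30) and (20, 20, 20), the expected frequencies in each row are 15, 20 and 25, and χ² = 2×(25/15 + 25/25) = 16/3 on (2-1)×(3-1) = 2 degrees of freedom.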

4 References

None.

5 Arguments

1: ldnob Integer Input
On entry: the first dimension of the array nobs as declared in the (sub)program from which g01aff is called.
Constraint: ldnob≥m.
2: ldpred Integer Input
On entry: the first dimension of the array pred as declared in the (sub)program from which g01aff is called.
Constraint: ldpred≥m.
3: m Integer Input
On entry: m+1, one more than the number of rows of the frequency matrix.
Constraint: m>2.
4: n Integer Input
On entry: n+1, one more than the number of columns of the frequency matrix.
Constraint: n>2.
5: nobs(ldnob,n) Integer array Input/Output
On entry: the elements nobs(i,j), for i=1,2,…,m-1 and j=1,2,…,n-1, must contain the frequencies for the two-way classification (here m and n denote the argument values, one more than the numbers of rows and columns). Row m and column n of nobs need not be set.
On exit: contains the following information:
  • nobs(i,j), for i=1,2,…,m1 and j=1,2,…,n1, contain the frequencies for the two-way classification after ‘shrinkage’ has taken place (see Section 3).
  • nobs(i,n), for i=1,2,…,m1, contain the total frequencies in the remaining rows, R_i.
  • nobs(m,j), for j=1,2,…,n1, contain the total frequencies in the remaining columns, C_j.
  • nobs(m,n) contains the total frequency, T.
If any ‘shrinkage’ has occurred, all other cells contain no useful information.
Constraint: nobs(i,j)≥0, for i=1,2,…,m-1 and j=1,2,…,n-1.
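Because nobs is a Fortran array, a C caller using the header interface in Section 2 must store it column by column, with consecutive columns ldnob elements apart. The helpers below (nobs_index and pack_nobs) are hypothetical and not part of the NAG Library; they sketch this layout, taking m and n as the argument values (one more than the table's numbers of rows and columns). The sketch uses plain int where the header uses NAG's Integer type, which should be matched to the integer size of your library build.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the Fortran element nobs(i,j) (1-based) in the flat array that
   a C caller passes: Fortran arrays are stored column by column, so the
   leading dimension ldnob separates consecutive columns. */
static size_t nobs_index(int ldnob, int i, int j)
{
    return (size_t)(j - 1) * (size_t)ldnob + (size_t)(i - 1);
}

/* Copy a row-major table with m-1 rows and n-1 columns into nobs, where m
   and n are the argument values (one more than the table dimensions).
   Row m and column n are not written; g01aff uses them for the totals. */
static void pack_nobs(int ldnob, int m, int n, const int *table, int *nobs)
{
    assert(m > 2 && n > 2 && ldnob >= m);
    for (int i = 1; i <= m - 1; i++)
        for (int j = 1; j <= n - 1; j++)
            nobs[nobs_index(ldnob, i, j)] = table[(i - 1) * (n - 1) + (j - 1)];
}
```

With the same indexing, the total frequency on exit would be read as nobs[nobs_index(ldnob, m, n)], and the row totals as nobs[nobs_index(ldnob, i, n)].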
6: num Integer Input/Output
On entry: the value of num determines whether automatic ‘shrinkage’ is performed when any r_ij<1, as outlined in Section 3(i).
If num=1, shrinkage is performed; otherwise it is not.
On exit: when Fisher's exact test for a 2×2 classification is used, num contains the number of elements used in the array p; otherwise num is set to zero.
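The shrinkage rule of Section 3(i), enabled by num=1, is sketched below in C. Several details are not specified in the description and are assumptions of this sketch, not necessarily what g01aff does: the first cell with r_ij<1 is found by scanning row by row, a tie between the two neighbouring totals goes to the lower-indexed neighbour, and shrinking stops rather than reducing a dimension below 2. All names (table_t, shrink_table and the helpers) are hypothetical.

```c
#include <assert.h>

enum { MAXC = 32 };   /* hypothetical size limit for this sketch */

typedef struct {
    int m1, n1;              /* current number of rows and columns */
    int n[MAXC][MAXC];       /* frequencies, row-major, 0-based */
} table_t;

static int row_total(const table_t *t, int i)
{
    int s = 0;
    for (int j = 0; j < t->n1; j++) s += t->n[i][j];
    return s;
}

static int col_total(const table_t *t, int j)
{
    int s = 0;
    for (int i = 0; i < t->m1; i++) s += t->n[i][j];
    return s;
}

/* Add row src into row dst (dst < src), then delete row src. */
static void merge_rows(table_t *t, int dst, int src)
{
    for (int j = 0; j < t->n1; j++) t->n[dst][j] += t->n[src][j];
    for (int i = src; i < t->m1 - 1; i++)
        for (int j = 0; j < t->n1; j++) t->n[i][j] = t->n[i + 1][j];
    t->m1--;
}

/* Add column src into column dst (dst < src), then delete column src. */
static void merge_cols(table_t *t, int dst, int src)
{
    for (int i = 0; i < t->m1; i++) {
        t->n[i][dst] += t->n[i][src];
        for (int j = src; j < t->n1 - 1; j++) t->n[i][j] = t->n[i][j + 1];
    }
    t->n1--;
}

/* Adjacent index of k in 0..len-1 (len >= 3) with the smaller total;
   ties go to the lower neighbour (an assumption of this sketch). */
static int neighbour(int k, int len, int prev_total, int next_total)
{
    if (k == 0) return 1;
    if (k == len - 1) return k - 1;
    return next_total < prev_total ? k + 1 : k - 1;
}

/* Merge rows or columns until every r_ij = R_i C_j / T is >= 1. */
static void shrink_table(table_t *t)
{
    assert(t->m1 <= MAXC && t->n1 <= MAXC);
    for (;;) {
        long long T = 0;
        for (int i = 0; i < t->m1; i++) T += row_total(t, i);
        assert(T > 0);

        /* r_ij < 1 is equivalent to R_i * C_j < T (integer arithmetic). */
        int bi = -1, bj = -1;
        for (int i = 0; i < t->m1 && bi < 0; i++)
            for (int j = 0; j < t->n1; j++)
                if ((long long)row_total(t, i) * col_total(t, j) < T) {
                    bi = i; bj = j; break;
                }
        if (bi < 0) return;                       /* every r_ij >= 1 */

        /* Row i is combined if R_i*m1 <= C_j*n1, otherwise column j. */
        if ((long long)row_total(t, bi) * t->m1 <= (long long)col_total(t, bj) * t->n1) {
            if (t->m1 <= 2) return;   /* hypothetical: keep at least 2 rows */
            int k = neighbour(bi, t->m1, bi > 0 ? row_total(t, bi - 1) : 0,
                              bi + 1 < t->m1 ? row_total(t, bi + 1) : 0);
            merge_rows(t, bi < k ? bi : k, bi < k ? k : bi);
        } else {
            if (t->n1 <= 2) return;   /* hypothetical: keep at least 2 columns */
            int k = neighbour(bj, t->n1, bj > 0 ? col_total(t, bj - 1) : 0,
                              bj + 1 < t->n1 ? col_total(t, bj + 1) : 0);
            merge_cols(t, bj < k ? bj : k, bj < k ? k : bj);
        }
    }
}
```

For the 3×3 table with rows (10, 10, 1), (10, 10, 1) and (0, 0, 1), the cell (3,1) has R_3C_1 = 20 < T = 43, and since R_3×3 ≤ C_1×3 the third row is merged into the second, leaving a 2×3 table with all r_ij ≥ 1.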
7: pred(ldpred,n) Real (Kind=nag_wp) array Output
On exit: the elements pred(i,j), for i=1,2,…,m1 and j=1,2,…,n1, contain the expected frequencies, r_ij, corresponding to the observed frequencies nobs(i,j), except when Fisher's exact test for a 2×2 classification is used, in which case pred is not used. No other elements are used.
8: chis Real (Kind=nag_wp) Output
On exit: the value of the test statistic, χ², except when Fisher's exact test for a 2×2 classification is used, in which case it is unspecified.
9: p(21) Real (Kind=nag_wp) array Output
p is used only when Fisher's exact test for a 2×2 classification is used.
On exit: the first num elements contain the probabilities associated with the various possible frequency tables, P_{r+1}, for r=0,1,…,R1; the remainder are unspecified.
10: npos Integer Output
npos is used only when Fisher's exact test for a 2×2 classification is to be used.
On exit: p(npos) holds the probability associated with the given table of frequencies.
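The contents of p and npos follow from the formula in Section 3(ii). The C sketch below rearranges a 2×2 table as described there and fills a 0-based array so that p[r] holds P_{r+1}, mirroring the 1-based Fortran element p(r+1); its return value plays the role of num and *npos that of npos. The names fisher_probs and factorial are hypothetical, and the particular sequence of swaps is one valid way to perform the rearrangement, not necessarily NAG's. Because R1 is the smallest marginal total, R1≤T/2≤20, so at most 21 probabilities are needed, which is consistent with the declared length of p.

```c
#include <assert.h>

/* n! as a double; 40! is about 8.2e47, well within double range. */
static double factorial(int n)
{
    double f = 1.0;
    for (int k = 2; k <= n; k++)
        f *= k;
    return f;
}

/* Case (ii): rearrange the 2-by-2 table a in place so that R1 is the
   smallest marginal total and C1 <= C2, then set p[r] = P_{r+1} for
   r = 0..R1. Returns R1 + 1 (the analogue of num) and sets *npos to the
   1-based position of the observed table, whose (1,1) entry is a[0][0]. */
static int fisher_probs(int a[2][2], double p[21], int *npos)
{
    int t;
    int minR = a[0][0] + a[0][1], minC = a[0][0] + a[1][0];
    if (a[1][0] + a[1][1] < minR) minR = a[1][0] + a[1][1];
    if (a[0][1] + a[1][1] < minC) minC = a[0][1] + a[1][1];

    if (minC < minR) {        /* transpose: smallest total becomes a row total */
        t = a[0][1]; a[0][1] = a[1][0]; a[1][0] = t;
    }
    if (a[1][0] + a[1][1] < a[0][0] + a[0][1]) {   /* smaller row first */
        t = a[0][0]; a[0][0] = a[1][0]; a[1][0] = t;
        t = a[0][1]; a[0][1] = a[1][1]; a[1][1] = t;
    }
    if (a[0][0] + a[1][0] > a[0][1] + a[1][1]) {   /* ensure C1 <= C2 */
        t = a[0][0]; a[0][0] = a[0][1]; a[0][1] = t;
        t = a[1][0]; a[1][0] = a[1][1]; a[1][1] = t;
    }

    int R1 = a[0][0] + a[0][1], R2 = a[1][0] + a[1][1];
    int C1 = a[0][0] + a[1][0], C2 = a[0][1] + a[1][1];
    int T = R1 + R2;
    assert(T > 0 && T <= 40);   /* Fisher's test is used only for T <= 40 */

    double numer = factorial(R1) * factorial(R2) * factorial(C1) * factorial(C2);
    for (int r = 0; r <= R1; r++)
        p[r] = numer / (factorial(T) * factorial(r) * factorial(R1 - r) *
                        factorial(C1 - r) * factorial(T - C1 - R1 + r));
    *npos = a[0][0] + 1;
    return R1 + 1;
}
```

For the table with rows (1, 3) and (2, 4), the smallest marginal total is the first column total, 3, so the table is transposed to rows (1, 2) and (3, 4); then R1=3, C1=4, C2=6, T=10 and the four probabilities are 1/6, 1/2, 3/10 and 1/30, with the observed table at position 2.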
11: ndf Integer Output
On exit: the number of degrees of freedom of the chi-square distribution, (m1-1)×(n1-1); when Fisher's exact test is used, ndf=1.
12: m1 Integer Output
On exit: the number of rows of the two-way classification, after any ‘shrinkage’, m1.
13: n1 Integer Output
On exit: the number of columns of the two-way classification, after any ‘shrinkage’, n1.
14: ifail Integer Input/Output
On entry: ifail must be set to 0, -1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes an error message to be printed and program execution to be halted; otherwise program execution continues. A value of -1 means that an error message is printed, while a value of 1 means that it is not.
If halting is not appropriate, the value -1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value 0 is recommended. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
On entry, m=⟨value⟩.
Constraint: m>2.
On entry, n=⟨value⟩.
Constraint: n>2.
The number of rows or columns of nobs is less than 2.
ifail=2
At least one frequency is negative, or all frequencies are zero.
ifail=4
On entry, ldnob=⟨value⟩ and m=⟨value⟩.
Constraint: ldnob≥m.
On entry, ldpred=⟨value⟩ and m=⟨value⟩.
Constraint: ldpred≥m.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

The method used is believed to be stable.

8 Parallelism and Performance

g01aff is not threaded in any implementation.

9 Further Comments

The time taken by g01aff increases with m and n, except when Fisher's exact test is used, in which case it increases with the size of the marginal and total frequencies.
If, on exit, num>0, or alternatively ndf=1 and nobs(m,n)≤40, the probabilities for Fisher's exact test for a 2×2 classification have been calculated, rather than the test statistic with approximately a chi-square distribution.
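The rule above can be written as a small C predicate over the values returned by g01aff. The helper used_fisher is hypothetical; total stands for nobs(m,n), the total frequency T on exit.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if g01aff returned Fisher's exact-test
   probabilities (in p) rather than the chi-square statistic (in chis).
   num and ndf are the values on exit; total is nobs(m,n), i.e. T. */
static int used_fisher(int num, int ndf, int total)
{
    assert(total >= 0);
    return num > 0 || (ndf == 1 && total <= 40);
}
```

Checking ndf alone is not enough: a 2×2 table with T>40 also gives ndf=1 but is analysed with the Yates-corrected chi-square statistic.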

10 Example

In the example program, NPROB determines the number of two-way classifications to be analysed. For each classification the frequencies are read, g01aff called, and information given on how much ‘shrinkage’ has taken place. If Fisher's exact test is to be used, the given frequencies and the array of probabilities associated with the possible frequency tables are printed. Otherwise, if the chi-square test is to be used, the given and expected frequencies, and the test statistic with its degrees of freedom are printed. In the example, there is one 2×3 classification, with shrinkage not requested.

10.1 Program Text

Program Text (g01affe.f90)

10.2 Program Data

Program Data (g01affe.d)

10.3 Program Results

Program Results (g01affe.r)