NAG Toolbox: nag_stat_contingency_table (g01af)
Purpose
nag_stat_contingency_table (g01af) performs the analysis of a two-way $r \times c$ contingency table or classification. If $r = c = 2$, and the total number of objects classified is $40$ or fewer, then the probabilities for Fisher's exact test are computed. Otherwise, a test statistic is computed (with Yates' correction when $r = c = 2$), which under the assumption of no association between the classifications has approximately a chi-square distribution with $(r-1) \times (c-1)$ degrees of freedom.
Syntax
[nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail] = g01af(nobs, 'm', m, 'n', n, 'num', num)
[nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail] = nag_stat_contingency_table(nobs, 'm', m, 'n', n, 'num', num)
Note: the interface to this routine has changed since earlier releases of the toolbox:
- At Mark 23: num was made optional (default 0)
- At Mark 22: m was made optional
Description
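The data consist of the frequencies for the two-way classification, denoted by $n_{ij}$, for $i = 1,2,\ldots,m$ and $j = 1,2,\ldots,n$ with $m, n > 1$.
A check is made to see whether any row or column of the matrix of frequencies consists entirely of zeros, and if so, the matrix of frequencies is reduced by omitting that row or column. Suppose the final size of the matrix is $m_1$ by $n_1$ ($m_1, n_1 > 1$), and let
- $R_i = \sum_{j=1}^{n_1} n_{ij}$, the total frequency for the $i$th row, for $i = 1,2,\ldots,m_1$,
- $C_j = \sum_{i=1}^{m_1} n_{ij}$, the total frequency for the $j$th column, for $j = 1,2,\ldots,n_1$, and
- $T = \sum_{i=1}^{m_1} R_i = \sum_{j=1}^{n_1} C_j$, the total frequency.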
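The reduction step and the marginal totals can be illustrated in plain MATLAB. This is a minimal sketch that does not call the toolbox; the table values and the variable names `freq`, `R`, `C` and `T` are chosen here for illustration only.

```matlab
% Hypothetical 3-by-4 table with one all-zero row and one all-zero column.
freq = [ 86  0  51 13;
          0  0   0  0;
        130  0 115 41];

% Omit rows and columns that consist entirely of zeros.
freq = freq(any(freq, 2), any(freq, 1));

[m1, n1] = size(freq);   % dimensions after the reduction
R = sum(freq, 2);        % row totals (m1-by-1 column vector)
C = sum(freq, 1);        % column totals (1-by-n1 row vector)
T = sum(R);              % total frequency
```

After the reduction the table is 2 by 3, which is the table used in the Example section.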
There are two situations:
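(i) If $m_1 > 2$ and/or $n_1 > 2$, or $m_1 = n_1 = 2$ and $T > 40$, then the matrix of expected frequencies, denoted by $r_{ij}$, for $i = 1,2,\ldots,m_1$ and $j = 1,2,\ldots,n_1$, and the test statistic, $\chi^2$, are computed, where
$$r_{ij} = \frac{R_i C_j}{T}$$
and
$$\chi^2 = \sum_{i=1}^{m_1} \sum_{j=1}^{n_1} \frac{\left( \left| r_{ij} - n_{ij} \right| - Y \right)^2}{r_{ij}}$$
where
$$Y = \begin{cases} \tfrac{1}{2} & \text{if } m_1 = n_1 = 2, \\ 0 & \text{otherwise} \end{cases}$$
is Yates' correction for continuity.
Under the assumption that there is no association between the two classifications, $\chi^2$ will have approximately a chi-square distribution with $(m_1-1) \times (n_1-1)$ degrees of freedom.
An option exists which allows for further ‘shrinkage’ of the matrix of frequencies in the case where $r_{ij} < 1$ for the $(i,j)$th cell. If this is the case, then row $i$ or column $j$ will be combined with the adjacent row or column with smaller total. Row $i$ is selected for combination if $R_i \times n_1 \le C_j \times m_1$. This ‘shrinking’ process is continued until $r_{ij} \ge 1$ for all cells $(i,j)$.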
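The expected frequencies and the test statistic of situation (i) can be reproduced in a few lines of plain MATLAB. The sketch below is an illustration rather than the routine's implementation: it assumes the standard forms (expected frequency equal to row total times column total over the grand total, with Yates' correction of one half for a 2-by-2 table), ignores the optional shrinkage, and uses the table from the Example section. The variable names are assumptions.

```matlab
% Observed frequencies taken from the Example section of this document.
freq = [ 86  51 13;
        130 115 41];

[m1, n1] = size(freq);
R = sum(freq, 2);        % row totals
C = sum(freq, 1);        % column totals
T = sum(freq(:));        % total frequency

r = (R * C) / T;         % expected frequencies: outer product of the margins over T

% Yates' correction applies only to a 2-by-2 table.
if m1 == 2 && n1 == 2
    Y = 0.5;
else
    Y = 0;
end

chi2 = sum(sum((abs(r - freq) - Y).^2 ./ r));
ndf  = (m1 - 1) * (n1 - 1);
fprintf('chi2 = %.3f, ndf = %d\n', chi2, ndf);
```

For this table the sketch gives a statistic of 6.352 on 2 degrees of freedom, in agreement with the example results.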
(ii) If $m_1 = n_1 = 2$ and $T \le 40$, the probabilities to enable Fisher's exact test to be made are computed.
The matrix of frequencies may be rearranged so that $R_1$ is the smallest marginal (i.e., column and row) total, and $C_2 \ge C_1$. Under the assumption of no association between the classifications, the probability of obtaining $r$ entries in cell $(1,1)$ is computed where
$$P_{r+1} = \frac{R_1! \, R_2! \, C_1! \, C_2!}{T! \, r! \, (R_1-r)! \, (C_1-r)! \, (T-C_1-R_1+r)!}, \quad r = 0,1,\ldots,R_1.$$
The probability of obtaining the table of given frequencies is returned. A test of the assumption against some alternative may then be made by summing the relevant values of $P_r$.
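The probabilities of situation (ii) are those of the hypergeometric distribution, so they can be sketched with binomial coefficients in plain MATLAB. The table below is hypothetical, it is assumed to be arranged already so that the first row total is the smallest marginal total, and the names `P` and `pobs` are illustrative; this is not the routine's own code.

```matlab
% Hypothetical 2-by-2 table; the first row total (4) is the smallest margin.
freq = [3 1;
        2 6];

R1 = sum(freq(1, :));    % first row total
C1 = sum(freq(:, 1));    % first column total
C2 = sum(freq(:, 2));    % second column total
T  = sum(freq(:));       % total frequency

% P(r+1) is the probability of r entries in cell (1,1), for r = 0,...,R1.
% The ratio of binomial coefficients is the hypergeometric probability.
P = zeros(1, R1 + 1);
for r = 0:R1
    P(r + 1) = nchoosek(C1, r) * nchoosek(C2, R1 - r) / nchoosek(T, R1);
end

pobs = P(freq(1, 1) + 1);   % probability of the observed table
```

A one-sided test against positive association would then sum `P(freq(1,1)+1:end)`; which entries count as relevant depends on the alternative being considered.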
References
None.
Parameters
Compulsory Input Parameters
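- 1: nobs(ldnob, n) – int64/int32 (nag_int) array
  ldnob, the first dimension of the array, must satisfy the constraint $ldnob \ge m$.
  The elements nobs$(i,j)$, for $i = 1,2,\ldots,m-1$ and $j = 1,2,\ldots,n-1$, must contain the frequencies for the two-way classification. The $m$th row and the $n$th column of nobs need not be set.
  Constraint: nobs$(i,j) \ge 0$, for $i = 1,2,\ldots,m-1$ and $j = 1,2,\ldots,n-1$.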
Optional Input Parameters
- 1: m – int64/int32 (nag_int) scalar
  Default: the first dimension of the array nobs.
  One more than the number of rows of the frequency matrix.
  Constraint: m $> 2$.
- 2: n – int64/int32 (nag_int) scalar
  Default: the second dimension of the array nobs.
  One more than the number of columns of the frequency matrix.
  Constraint: n $> 2$.
- 3: num – int64/int32 (nag_int) scalar
  Default: $0$
  The value assigned to num determines whether automatic ‘shrinkage’ is required when any $r_{ij} < 1$, as outlined in Description (i).
  If num $= 1$, shrinkage is required, otherwise shrinkage is not required.
Output Parameters
- 1: nobs(ldnob, n) – int64/int32 (nag_int) array
  Contains the following information:
  - nobs$(i,j)$, for $i = 1,2,\ldots,m_1$ and $j = 1,2,\ldots,n_1$, contain the frequencies for the two-way classification after ‘shrinkage’ has taken place (see Description).
  - nobs$(i,n)$, for $i = 1,2,\ldots,m_1$, contain the total frequencies in the remaining rows, $R_i$.
  - nobs$(m,j)$, for $j = 1,2,\ldots,n_1$, contain the total frequencies in the remaining columns, $C_j$.
  - nobs$(m,n)$ contains the total frequency, $T$.
  If any ‘shrinkage’ has occurred, then all other cells contain no useful information.
- 2: num – int64/int32 (nag_int) scalar
  When Fisher's exact test for a $2 \times 2$ classification is used then num contains the number of elements used in the array p, otherwise num is set to zero.
- 3: pred(ldpred, n) – double array
  The elements pred$(i,j)$, where $i = 1,2,\ldots,m_1$ and $j = 1,2,\ldots,n_1$, contain the expected frequencies, $r_{ij}$, corresponding to the observed frequencies nobs$(i,j)$, except in the case when Fisher's exact test for a $2 \times 2$ classification is to be used, when pred is not used. No other elements are utilized.
- 4: chis – double scalar
  The value of the test statistic, $\chi^2$, except when Fisher's exact test for a $2 \times 2$ classification is used, in which case it is unspecified.
- 5: p(21) – double array
  The first num elements contain the probabilities associated with the various possible frequency tables, $P_r$, for $r = 0,1,\ldots,R_1$; the remainder are unspecified.
- 6: npos – int64/int32 (nag_int) scalar
  p(npos) holds the probability associated with the given table of frequencies.
- 7: ndf – int64/int32 (nag_int) scalar
  The value of ndf gives the number of degrees of freedom for the chi-square distribution, $(m_1-1) \times (n_1-1)$; when Fisher's exact test is used ndf $= 1$.
- 8: m1 – int64/int32 (nag_int) scalar
  The number of rows of the two-way classification, after any ‘shrinkage’, $m_1$.
- 9: n1 – int64/int32 (nag_int) scalar
  The number of columns of the two-way classification, after any ‘shrinkage’, $n_1$.
- 10: ifail – int64/int32 (nag_int) scalar
  ifail $= 0$ unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
- ifail $= 1$
  The number of rows or columns of nobs is less than $2$, possibly after shrinkage.
- ifail $= 2$
  At least one frequency is negative, or all frequencies are zero.
- ifail $= 4$
  On entry, ldpred $<$ m, or ldnob $<$ m.
- ifail $= -99$
  An unexpected error has been triggered by this routine. Please contact NAG.
- ifail $= -399$
  Your licence key may have expired or may not have been installed correctly.
- ifail $= -999$
  Dynamic memory allocation failed.
Accuracy
The method used is believed to be stable.
Further Comments
The time taken by nag_stat_contingency_table (g01af) will increase with m and n, except when Fisher's exact test is to be used, in which case it increases with the size of the marginal and total frequencies.
If, on exit, num $> 0$, or alternatively ndf is $1$ and nobs$(m,n) \le 40$, the probabilities for use in Fisher's exact test for a $2 \times 2$ classification will be calculated, and not the test statistic with approximately a chi-square distribution.
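The on-exit check described above can be written as a single logical expression. The sketch below uses hypothetical output values instead of calling the toolbox, and it assumes a threshold of 40 on the total frequency; the names `total` and `fisherUsed` are illustrative.

```matlab
% Hypothetical output values for a small 2-by-2 table (not produced by a
% toolbox call here).
num   = int64(5);     % number of probabilities stored in p
ndf   = int64(1);     % degrees of freedom
total = int64(12);    % total frequency held in nobs on exit

% Either condition indicates that Fisher's exact probabilities were computed.
% The threshold of 40 is an assumption of this sketch.
fisherUsed = (num > 0) || (ndf == 1 && total <= 40);

if fisherUsed
    disp('Use p(1:num) and p(npos); chis is unspecified.');
else
    disp('Use chis with ndf degrees of freedom.');
end
```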
Example
In the example program, NPROB determines the number of two-way classifications to be analysed. For each classification the frequencies are read, nag_stat_contingency_table (g01af) called, and information given on how much ‘shrinkage’ has taken place. If Fisher's exact test is to be used, the given frequencies and the array of probabilities associated with the possible frequency tables are printed. Otherwise, if the chi-square test is to be used, the given and expected frequencies, and the test statistic with its degrees of freedom are printed. In the example, there is one classification, with shrinkage not requested.
function g01af_example
% Example for g01af: analyse a 2-by-3 contingency table.

fprintf('g01af example results\n\n');

% Size of the frequency table.
nr = 2;
nc = 3;

% nobs has one extra row and one extra column, which hold the totals on exit.
nobs = zeros(nr+1,nc+1,'int64');
nobs(1:nr,1:nc) = [ 86,  51, 13;
                   130, 115, 41];

[nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail] = ...
  g01af(nobs);

% Report any reduction in the size of the table.
if (m1~=nr)
  fprintf('Number of rows reduced from %2d to %2d\n', nr, m1);
end
if (n1~=nc)
  fprintf('Number of columns reduced from %2d to %2d\n', nc, n1);
end

fprintf('\nTable of observed frequencies\n\n');
g01af_table(m1,n1,nobs);

% Expected frequencies, rounded to the nearest integer for display.
fprintf('\n\nTable of expected frequencies\n\n');
for j = 1:m1
  fprintf('%8s',' ');
  fprintf('%5d',int64(pred(j,1:n1)));
  fprintf('\n');
end

fprintf('\nChi-squared = %7.3f\n', chis);
fprintf('Degrees of freedom = %4d\n', ndf);

function g01af_table(m1,n1,obs)
% Print the m1-by-n1 table in obs with its row, column and grand totals.
% The heading is right-aligned over the totals column.
fprintf('%31s\n','total');
for j = 1:m1
  fprintf('%8s',' ');
  fprintf('%5d',obs(j,1:n1));
  fprintf('%8d\n',obs(j,n1+1));
end
fprintf('\n%8s','total');
fprintf('%5d',obs(m1+1,1:n1));
fprintf('%8d\n',obs(m1+1,n1+1));
g01af example results


Table of observed frequencies

                          total
           86   51   13     150
          130  115   41     286

   total  216  166   54     436


Table of expected frequencies

           74   57   19
          142  109   35

Chi-squared =   6.352
Degrees of freedom =    2
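The routine's outputs include the statistic and its degrees of freedom but no tail probability. As a hedged sketch, the upper-tail probability of the chi-square distribution can be obtained in base MATLAB from the regularized incomplete gamma function `gammainc`. The variable name `pval` is introduced here for illustration, and the input values are copied from the results above.

```matlab
chis = 6.352;   % test statistic from the example results
ndf  = 2;       % degrees of freedom from the example results

% Upper tail of the chi-square distribution with ndf degrees of freedom,
% expressed through the regularized upper incomplete gamma function.
pval = gammainc(chis/2, ndf/2, 'upper');
fprintf('Upper-tail probability = %.4f\n', pval);
```

With 2 degrees of freedom the tail probability reduces to `exp(-chis/2)`, which gives a quick check on the result.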
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015