NAG Toolbox: nag_nonpar_test_chisq (g08cg)

 Contents

    1  Purpose
    2  Syntax
    3  Description
    4  References
    5  Parameters
    6  Error Indicators and Warnings
    7  Accuracy
    8  Further Comments
    9  Example

Purpose

nag_nonpar_test_chisq (g08cg) computes the test statistic for the χ² goodness-of-fit test for data with a chosen number of class intervals.

Syntax

[chisq, p, ndf, eval, chisqi, ifail] = g08cg(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)
[chisq, p, ndf, eval, chisqi, ifail] = nag_nonpar_test_chisq(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)

Description

The χ² goodness-of-fit test performed by nag_nonpar_test_chisq (g08cg) is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$, and given that the data has been grouped into $k$ classes,
$$x \le c_1, \quad c_{i-1} < x \le c_i, \ i = 2, 3, \ldots, k-1, \quad x > c_{k-1},$$
then the χ² goodness-of-fit test statistic is defined by
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ is the observed frequency of the $i$th class, and $E_i$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
$$E_i = p_i \times n,$$
where $p_i$ is the probability that $X$ lies in the $i$th class, that is
$$p_1 = P(X \le c_1), \quad p_i = P(c_{i-1} < X \le c_i), \ i = 2, 3, \ldots, k-1, \quad p_k = P(X > c_{k-1}).$$
These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
  • Normal distribution;
  • uniform distribution;
  • exponential distribution;
  • χ²-distribution;
  • gamma distribution.
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using nag_stat_frequency_table (g01ae).
nag_nonpar_test_chisq (g08cg) returns the χ² test statistic, $X^2$, together with its degrees of freedom and the upper tail probability from the χ²-distribution associated with the test statistic. Note that the use of the χ²-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
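
As a rough cross-check of the definitions above, the statistic can be recomputed outside the toolbox. The following is a minimal Python sketch, not the NAG implementation. The function name chisq_gof is hypothetical. The observed frequencies are not printed in this document; they are an assumption, tallied by hand from the data in the Example section (five equal classes on a uniform distribution, n = 100). The upper tail probability uses a closed form that holds only when the number of degrees of freedom is even.

```python
import math


def chisq_gof(observed, probs, npest=0):
    """Chi-squared goodness-of-fit statistic for grouped data.

    Hypothetical helper (not a NAG routine). Returns the statistic, the
    upper tail probability, the degrees of freedom, the expected
    frequencies and the per-class contributions.
    """
    n = sum(observed)
    expected = [p * n for p in probs]                 # E_i = p_i * n
    contrib = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chisq = sum(contrib)                              # X^2 = sum of contributions
    ndf = len(observed) - 1 - npest                   # nclass - 1 - npest

    # Assumption: closed-form upper tail, valid only for even ndf = 2m:
    #   Q(x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    if ndf <= 0 or ndf % 2 != 0:
        raise ValueError("this sketch only handles a positive, even ndf")
    half = chisq / 2.0
    p = math.exp(-half) * sum(half ** j / math.factorial(j)
                              for j in range(ndf // 2))
    return chisq, p, ndf, expected, contrib


# Assumed frequencies, hand-tallied from the Example data (not printed there).
observed = [12, 31, 23, 11, 23]
probs = [0.2] * 5                                     # five equal classes

chisq, p, ndf, expected, contrib = chisq_gof(observed, probs)
print(f"chisq = {chisq:.4f}, ndf = {ndf}, p = {p:.4f}")
```

Under these assumptions the sketch gives a statistic of 14.2 on 4 degrees of freedom with an upper tail probability of about 0.0067, in line with the results listed in the Example section. For an odd number of degrees of freedom a general χ² tail routine would be needed instead of the closed form.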

References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

Parameters

Compulsory Input Parameters

1:     ifreq(nclass) – int64/int32/nag_int array
ifreq(i) must specify the frequency of the $i$th class, $O_i$, for $i = 1, 2, \ldots, k$.
Constraint: ifreq(i) ≥ 0, for $i = 1, 2, \ldots, k$.
2:     cb(nclass-1) – double array
cb(i) must specify the upper boundary value for the $i$th class, for $i = 1, 2, \ldots, k-1$.
Constraint: cb(1) < cb(2) < ⋯ < cb(nclass-1). For the exponential, gamma and χ²-distributions cb(1) ≥ 0.0.
3:     dist – string (length ≥ 1)
Indicates for which distribution the test is to be carried out.
dist='N'
The Normal distribution is used.
dist='U'
The uniform distribution is used.
dist='E'
The exponential distribution is used.
dist='C'
The χ²-distribution is used.
dist='G'
The gamma distribution is used.
dist='A'
You must supply the class probabilities in the array prob.
Constraint: dist='N', 'U', 'E', 'C', 'G' or 'A'.
4:     par(2) – double array
Must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., dist='A') the array par is not referenced.
If a Normal distribution is used then par(1) and par(2) must contain the mean, μ, and the variance, σ², respectively.
If a uniform distribution is used then par(1) and par(2) must contain the boundaries a and b respectively.
If an exponential distribution is used then par(1) must contain the parameter λ. par(2) is not used.
If a χ²-distribution is used then par(1) must contain the number of degrees of freedom. par(2) is not used.
If a gamma distribution is used then par(1) and par(2) must contain the parameters α and β respectively.
Constraints:
  • if dist='N', par(2) > 0.0;
  • if dist='U', par(1) < par(2) and par(1) ≤ cb(1) and par(2) ≥ cb(nclass-1);
  • if dist='E', par(1) > 0.0;
  • if dist='C', par(1) > 0.0;
  • if dist='G', par(1) > 0.0 and par(2) > 0.0.
5:     npest – int64/int32/nag_int scalar
The number of estimated parameters of the distribution.
Constraint: 0 ≤ npest < nclass-1.
6:     prob(nclass) – double array
If you are supplying the probability distribution (i.e., dist='A') then prob(i) must contain the probability that $X$ lies in the $i$th class.
If dist ≠ 'A', prob is not referenced.
Constraint: if dist='A', $\sum_{i=1}^{k}$ prob(i) = 1.0 and prob(i) > 0.0, for $i = 1, 2, \ldots, k$.
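
When dist='A' is used, the class probabilities can be obtained from the class definitions in the Description as differences of a cumulative distribution function at the class boundaries. The following Python sketch illustrates this under stated assumptions: the helper name class_probabilities is hypothetical, and the logistic distribution and the boundaries are chosen purely for illustration. It is not part of the toolbox.

```python
import math


def class_probabilities(cdf, cb):
    """Class probabilities for k = len(cb) + 1 classes (hypothetical helper).

    p_1 = F(c_1), p_i = F(c_i) - F(c_{i-1}), p_k = 1 - F(c_{k-1}),
    where F is the cumulative distribution function `cdf`.
    """
    if any(hi <= lo for lo, hi in zip(cb, cb[1:])):
        raise ValueError("class boundaries must be strictly increasing")
    cum = [cdf(c) for c in cb]
    probs = [cum[0]]                                        # first class: x <= c_1
    probs += [hi - lo for lo, hi in zip(cum, cum[1:])]      # interior classes
    probs.append(1.0 - cum[-1])                             # last class: x > c_{k-1}
    return probs


def logistic_cdf(x):
    """CDF of the standard logistic distribution (illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-x))


cb = [-1.0, 0.0, 1.0]                 # nclass - 1 = 3 boundaries, so nclass = 4
prob = class_probabilities(logistic_cdf, cb)
print(prob, sum(prob))
```

The resulting list has nclass entries, each strictly positive, and sums to 1.0 up to rounding error, which is what the constraint on prob requires.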

Optional Input Parameters

1:     nclass – int64/int32/nag_int scalar
Default: the dimension of the arrays ifreq, prob. (An error is raised if these dimensions are not equal.)
$k$, the number of classes into which the data is divided.
Constraint: nclass ≥ 2.

Output Parameters

1:     chisq – double scalar
The test statistic, $X^2$, for the χ² goodness-of-fit test.
2:     p – double scalar
The upper tail probability from the χ²-distribution associated with the test statistic, $X^2$, and the number of degrees of freedom.
3:     ndf – int64/int32/nag_int scalar
Contains nclass-1-npest, the degrees of freedom associated with the test.
4:     eval(nclass) – double array
eval(i) contains the expected frequency for the $i$th class, $E_i$, for $i = 1, 2, \ldots, k$.
5:     chisqi(nclass) – double array
chisqi(i) contains the contribution from the $i$th class to the test statistic, that is, $(O_i - E_i)^2 / E_i$, for $i = 1, 2, \ldots, k$.
6:     ifail – int64/int32/nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Note: nag_nonpar_test_chisq (g08cg) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

   ifail=1
On entry, nclass < 2.
   ifail=2
On entry, dist is invalid.
   ifail=3
On entry, npest < 0,
or npest ≥ nclass-1.
   ifail=4
On entry, ifreq(i) < 0 for some $i$, for $i = 1, 2, \ldots, k$.
   ifail=5
On entry, the elements of cb are not in ascending order. That is, cb(i) ≤ cb(i-1) for some $i$, for $i = 2, 3, \ldots, k-1$.
   ifail=6
On entry, dist='E', 'C' or 'G' and cb(1) < 0.0. No negative class boundary values are valid for the exponential, gamma or χ²-distributions.
   ifail=7
On entry, the values provided in par are invalid.
   ifail=8
On entry, with dist='A', prob(i) ≤ 0.0 for some $i$, for $i = 1, 2, \ldots, k$,
or $\sum_{i=1}^{k}$ prob(i) ≠ 1.0.
   ifail=9
An expected frequency is equal to zero when the observed frequency was not.
W  ifail=10
This is a warning that expected values for certain classes are less than 1.0. This implies that we cannot be confident that the χ²-distribution is a good approximation to the distribution of the test statistic.
W  ifail=11
The solution obtained when calculating the probability for a certain class for the gamma or χ²-distribution did not converge in 600 iterations. The solution may be an adequate approximation.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

The computations are believed to be stable.

Further Comments

The time taken by nag_nonpar_test_chisq (g08cg) is dependent both on the distribution chosen and on the number of classes, k.

Example

This example applies the χ² goodness-of-fit test to test whether there is evidence to suggest that a sample of 100 randomly generated observations does not arise from a uniform distribution U(0,1). The class intervals are calculated such that the interval (0,1) is divided into five equal classes. The frequencies for each class are calculated using nag_stat_frequency_table (g01ae).
function g08cg_example


fprintf('g08cg example results\n\n');

x = [ 0.59 0.23 0.76 0.96 0.20 0.91 0.29 0.22 0.36 0.81 ...
      0.91 0.80 0.17 0.82 0.07 0.74 0.15 0.91 0.26 0.98 ...
      0.59 0.34 0.28 0.95 0.33 0.42 0.72 0.35 0.86 0.22 ...
      0.15 0.39 0.32 0.82 0.13 0.48 0.46 0.74 0.99 0.26 ...
      0.04 0.21 0.04 0.24 0.56 0.36 0.48 0.53 1.00 0.58 ...
      0.50 0.41 0.03 0.38 0.89 0.40 0.66 0.79 0.34 0.94 ...
      0.49 0.12 0.24 0.05 1.00 0.29 0.67 0.29 0.75 0.81 ...
      0.45 0.21 0.51 0.68 0.78 0.20 0.23 0.57 0.25 0.48 ...
      0.96 0.33 0.48 0.55 0.04 0.48 0.42 0.11 0.38 0.73 ...
      0.91 0.45 0.59 0.97 0.27 0.27 0.25 0.99 0.99 0.80];

cb     = [0.2;     0.4;     0.6;     0.8;    1.0 ];
nclass = int64(5);

% Produce frequency table
[~, ifreq, ~, ~, ifail] = ...
  g01ae( ...
         nclass, x, 'cb', cb);

% Test parameters
dist   = 'Uniform';
npest  = int64(0);
par    = [0;  1];
prob   = zeros(nclass,1);

% Perform Chi^2 test
[chisq, p, ndf, eval, chisqi, ifail] = ...
  g08cg( ...
         ifreq, cb, dist, par, npest, prob, 'nclass', nclass);

fprintf('Chi-squared test statistic   = %10.4f\n', chisq);
fprintf('Degrees of freedom.          = %5d\n', ndf);
fprintf('Significance level           = %10.4f\n\n', p);
fprintf('The contributions to the test statistic are :-\n');
disp(chisqi');


g08cg example results

Chi-squared test statistic   =    14.2000
Degrees of freedom.          =     4
Significance level           =     0.0067

The contributions to the test statistic are :-
    3.2000    6.0500    0.4500    4.0500    0.4500


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015