# NAG FL Interface g08cgf (test_chisq)

## 1 Purpose

g08cgf computes the test statistic for the ${\chi }^{2}$ goodness-of-fit test for data with a chosen number of class intervals.

## 2 Specification

Fortran Interface
 Subroutine g08cgf (nclass, ifreq, cb, dist, par, npest, prob, chisq, p, ndf, eval, chisqi, ifail)
 Integer, Intent (In) :: nclass, ifreq(nclass), npest
 Integer, Intent (Inout) :: ifail
 Integer, Intent (Out) :: ndf
 Real (Kind=nag_wp), Intent (In) :: cb(nclass-1), par(2), prob(nclass)
 Real (Kind=nag_wp), Intent (Out) :: chisq, p, eval(nclass), chisqi(nclass)
 Character (1), Intent (In) :: dist
C Header Interface
 #include <nag.h>
 void g08cgf_ (const Integer *nclass, const Integer ifreq[], const double cb[], const char *dist, const double par[], const Integer *npest, const double prob[], double *chisq, double *p, Integer *ndf, double eval[], double chisqi[], Integer *ifail, const Charlen length_dist)
The routine may be called by the names g08cgf or nagf_nonpar_test_chisq.

## 3 Description

The ${\chi }^{2}$ goodness-of-fit test performed by g08cgf is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$, and that the data has been grouped into $k$ classes,
 $x\le {c}_{1},\quad {c}_{i-1}<x\le {c}_{i},\ i=2,3,\dots ,k-1,\quad x>{c}_{k-1},$
then the ${\chi }^{2}$ goodness-of-fit test statistic is defined by
 ${X}^{2}=\sum _{i=1}^{k}\frac{{\left({O}_{i}-{E}_{i}\right)}^{2}}{{E}_{i}},$
where ${O}_{i}$ is the observed frequency of the $i$th class, and ${E}_{i}$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
 ${E}_{i}={p}_{i}\times n,$
where ${p}_{i}$ is the probability that $X$ lies in the $i$th class, that is
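 ${p}_{1}=P\left(X\le {c}_{1}\right),\quad {p}_{i}=P\left({c}_{i-1}<X\le {c}_{i}\right),\ i=2,3,\dots ,k-1,\quad {p}_{k}=P\left(X>{c}_{k-1}\right).$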
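
The definitions above translate directly into a few lines of code. The following Python sketch is not part of the NAG Library: the function name `chisq_statistic` is hypothetical, the counts are made up for illustration, and the class probabilities are assumed to be known already.

```python
def chisq_statistic(ifreq, prob):
    """Return (X2, expected, contributions) for observed counts and class probabilities.

    Illustrative only: applies E_i = p_i * n and X^2 = sum (O_i - E_i)^2 / E_i.
    Every expected frequency is assumed to be nonzero.
    """
    n = sum(ifreq)                      # sample size
    expected = [p * n for p in prob]    # E_i = p_i * n
    contrib = [(o - e) ** 2 / e for o, e in zip(ifreq, expected)]
    return sum(contrib), expected, contrib


# Hypothetical data: 100 observations in five equiprobable classes.
x2, expected, contrib = chisq_statistic([18, 22, 25, 15, 20], [0.2] * 5)
print(x2)
```

For these made-up counts every expected frequency is 20, so the statistic is $(4+4+25+25+0)/20=2.9$ up to rounding.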
These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this routine are:
• Normal distribution with mean $\mu$, variance ${\sigma }^{2}$;
• uniform distribution on the interval $\left[a,b\right]$;
• exponential distribution with probability density function (pdf) $\lambda {e}^{-\lambda x}$;
• ${\chi }^{2}$-distribution with $f$ degrees of freedom; and
• gamma distribution with $\mathrm{pdf}=\frac{{x}^{\alpha -1}{e}^{-x/\beta }}{\Gamma \left(\alpha \right){\beta }^{\alpha }}$.
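
For a distribution with cumulative distribution function $F$, the class probabilities follow from the class boundaries as ${p}_{1}=F\left({c}_{1}\right)$, ${p}_{i}=F\left({c}_{i}\right)-F\left({c}_{i-1}\right)$ and ${p}_{k}=1-F\left({c}_{k-1}\right)$. The Python sketch below shows this for the Normal, uniform and exponential cases using only the standard library. All function names are hypothetical and this is not necessarily how g08cgf evaluates the probabilities; the ${\chi }^{2}$ and gamma cases need an incomplete gamma function and are omitted.

```python
import math


def normal_cdf(x, mean, variance):
    """CDF of the Normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def exponential_cdf(x, lam):
    """CDF of the exponential distribution with pdf lam * exp(-lam * x)."""
    return 1.0 - math.exp(-lam * x) if x > 0.0 else 0.0


def uniform_cdf(x, a, b):
    """CDF of the uniform distribution on [a, b]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def class_probabilities(cb, cdf):
    """Return p_1..p_k for upper boundaries cb = [c_1, ..., c_{k-1}]."""
    f = [cdf(c) for c in cb]
    inner = [f[i] - f[i - 1] for i in range(1, len(f))]
    return [f[0]] + inner + [1.0 - f[-1]]


# Uniform on [0, 1] with boundaries 0.2, 0.4, 0.6, 0.8: five classes of probability 0.2.
probs = class_probabilities([0.2, 0.4, 0.6, 0.8], lambda x: uniform_cdf(x, 0.0, 1.0))
print(probs)
```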
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using g01aef.
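
As an illustration of what the frequencies represent, the following Python sketch counts observations per class from the upper class boundaries. It is a stand-in written for this note, not a description of g01aef; the function name `class_frequencies` and the data are hypothetical.

```python
from bisect import bisect_left


def class_frequencies(data, cb):
    """Count observations per class for upper boundaries cb = [c_1, ..., c_{k-1}].

    Class 1 is x <= c_1, class i is c_{i-1} < x <= c_i, class k is x > c_{k-1}.
    cb must be sorted in increasing order.
    """
    ifreq = [0] * (len(cb) + 1)
    for x in data:
        # bisect_left returns the number of boundaries strictly less than x,
        # which is the zero-based index of the class containing x.
        ifreq[bisect_left(cb, x)] += 1
    return ifreq


counts = class_frequencies([0.1, 0.2, 0.25, 0.9, 0.5], [0.2, 0.4, 0.6, 0.8])
print(counts)
```

A value equal to a boundary, such as 0.2 here, is counted in the class for which that boundary is the upper limit.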
g08cgf returns the ${\chi }^{2}$ test statistic, ${X}^{2}$, together with its degrees of freedom and the upper tail probability from the ${\chi }^{2}$-distribution associated with the test statistic. Note that the use of the ${\chi }^{2}$-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
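
The degrees of freedom and the upper tail probability can be sketched in Python as follows. The helper `chisq_upper_tail` is hypothetical and is only intended for modest integer degrees of freedom; it uses the recurrence $Q\left(a+1,y\right)=Q\left(a,y\right)+{y}^{a}{e}^{-y}/\Gamma \left(a+1\right)$ for the regularized upper incomplete gamma function, which is an assumption of this sketch and not necessarily the method used by g08cgf.

```python
import math


def chisq_upper_tail(x2, ndf):
    """Upper tail probability P(chi^2_ndf > x2) for integer ndf >= 1.

    With y = x2 / 2, starts from Q(1/2, y) = erfc(sqrt(y)) for odd ndf or
    Q(1, y) = exp(-y) for even ndf, then steps a up to ndf / 2.
    """
    if x2 <= 0.0:
        return 1.0
    y = 0.5 * x2
    if ndf % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    for _ in range((ndf - 1) // 2):
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


nclass, npest = 5, 0            # hypothetical: five classes, no estimated parameters
ndf = nclass - 1 - npest        # degrees of freedom
p = chisq_upper_tail(2.9, ndf)  # 2.9 is a made-up value of the test statistic
print(ndf, p)
```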

## 4 References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

## 5 Arguments

1: $\mathbf{nclass}$ Integer Input
On entry: $k$, the number of classes into which the data is divided.
Constraint: ${\mathbf{nclass}}\ge 2$.
2: $\mathbf{ifreq}\left({\mathbf{nclass}}\right)$ Integer array Input
On entry: ${\mathbf{ifreq}}\left(\mathit{i}\right)$ must specify the frequency of the $\mathit{i}$th class, ${O}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
Constraint: ${\mathbf{ifreq}}\left(\mathit{i}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,k$.
3: $\mathbf{cb}\left({\mathbf{nclass}}-1\right)$ Real (Kind=nag_wp) array Input
On entry: ${\mathbf{cb}}\left(\mathit{i}\right)$ must specify the upper boundary value for the $\mathit{i}$th class, for $\mathit{i}=1,2,\dots ,k-1$.
Constraint: ${\mathbf{cb}}\left(1\right)<{\mathbf{cb}}\left(2\right)<\cdots <{\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$. For the exponential, gamma and ${\chi }^{2}$-distributions ${\mathbf{cb}}\left(1\right)\ge 0.0$.
4: $\mathbf{dist}$ Character(1) Input
On entry: indicates for which distribution the test is to be carried out.
${\mathbf{dist}}=\text{'N'}$
The Normal distribution is used.
${\mathbf{dist}}=\text{'U'}$
The uniform distribution is used.
${\mathbf{dist}}=\text{'E'}$
The exponential distribution is used.
${\mathbf{dist}}=\text{'C'}$
The ${\chi }^{2}$-distribution is used.
${\mathbf{dist}}=\text{'G'}$
The gamma distribution is used.
${\mathbf{dist}}=\text{'A'}$
You must supply the class probabilities in the array prob.
Constraint: ${\mathbf{dist}}=\text{'N'}$, $\text{'U'}$, $\text{'E'}$, $\text{'C'}$, $\text{'G'}$ or $\text{'A'}$.
5: $\mathbf{par}\left(2\right)$ Real (Kind=nag_wp) array Input
On entry: must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., ${\mathbf{dist}}=\text{'A'}$) the array par is not referenced.
If a Normal distribution is used then ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the mean, $\mu$, and the variance, ${\sigma }^{2}$, respectively.
If a uniform distribution is used then ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then ${\mathbf{par}}\left(1\right)$ must contain the parameter $\lambda$. ${\mathbf{par}}\left(2\right)$ is not used.
If a ${\chi }^{2}$-distribution is used then ${\mathbf{par}}\left(1\right)$ must contain the number of degrees of freedom. ${\mathbf{par}}\left(2\right)$ is not used.
If a gamma distribution is used ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
• if ${\mathbf{dist}}=\text{'N'}$, ${\mathbf{par}}\left(2\right)>0.0$;
• if ${\mathbf{dist}}=\text{'U'}$, ${\mathbf{par}}\left(1\right)<{\mathbf{par}}\left(2\right)$ and ${\mathbf{par}}\left(1\right)\le {\mathbf{cb}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)\ge {\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$;
• if ${\mathbf{dist}}=\text{'E'}$, ${\mathbf{par}}\left(1\right)>0.0$;
• if ${\mathbf{dist}}=\text{'C'}$, ${\mathbf{par}}\left(1\right)>0.0$;
• if ${\mathbf{dist}}=\text{'G'}$, ${\mathbf{par}}\left(1\right)>0.0$ and ${\mathbf{par}}\left(2\right)>0.0$.
6: $\mathbf{npest}$ Integer Input
On entry: the number of estimated parameters of the distribution.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
7: $\mathbf{prob}\left({\mathbf{nclass}}\right)$ Real (Kind=nag_wp) array Input
On entry: if you are supplying the probability distribution (i.e., ${\mathbf{dist}}=\text{'A'}$) then ${\mathbf{prob}}\left(i\right)$ must contain the probability that $X$ lies in the $i$th class.
If ${\mathbf{dist}}\ne \text{'A'}$, prob is not referenced.
Constraints:
if ${\mathbf{dist}}=\text{'A'}$,
• ${\mathbf{prob}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,k$;
• $\sum _{i=1}^{k}{\mathbf{prob}}\left(i\right)=1.0$.
8: $\mathbf{chisq}$ Real (Kind=nag_wp) Output
On exit: the test statistic, ${X}^{2}$, for the ${\chi }^{2}$ goodness-of-fit test.
9: $\mathbf{p}$ Real (Kind=nag_wp) Output
On exit: the upper tail probability from the ${\chi }^{2}$-distribution associated with the test statistic, ${X}^{2}$, and the number of degrees of freedom.
10: $\mathbf{ndf}$ Integer Output
On exit: contains $\left({\mathbf{nclass}}-1-{\mathbf{npest}}\right)$, the degrees of freedom associated with the test.
11: $\mathbf{eval}\left({\mathbf{nclass}}\right)$ Real (Kind=nag_wp) array Output
On exit: ${\mathbf{eval}}\left(\mathit{i}\right)$ contains the expected frequency for the $\mathit{i}$th class, ${E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
12: $\mathbf{chisqi}\left({\mathbf{nclass}}\right)$ Real (Kind=nag_wp) array Output
On exit: ${\mathbf{chisqi}}\left(\mathit{i}\right)$ contains the contribution from the $\mathit{i}$th class to the test statistic, that is, ${\left({O}_{\mathit{i}}-{E}_{\mathit{i}}\right)}^{2}/{E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
13: $\mathbf{ifail}$ Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $-1$ is recommended since useful values can be provided in some output arguments even when ${\mathbf{ifail}}\ne {\mathbf{0}}$ on exit. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
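
The input constraints listed for the arguments above can be collected into a single pre-check. The Python sketch below is a hypothetical helper, not part of the NAG Library: the name `check_inputs`, the returned messages and the tolerance `tol` on the sum of prob are all assumptions, since this document does not state what tolerance, if any, the routine applies. Arrays are zero-based here, so `cb[0]` corresponds to cb(1).

```python
import math


def check_inputs(nclass, ifreq, cb, dist, par, npest, prob, tol=1e-12):
    """Return a list of violated constraints (empty if none were found).

    cb is assumed to hold nclass - 1 boundaries; par and prob are only
    inspected for the distributions that reference them.
    """
    errors = []
    if nclass < 2:
        errors.append("nclass >= 2")
        return errors
    if not 0 <= npest < nclass - 1:
        errors.append("0 <= npest < nclass - 1")
    if any(f < 0 for f in ifreq):
        errors.append("ifreq(i) >= 0")
    if any(cb[i - 1] >= cb[i] for i in range(1, nclass - 1)):
        errors.append("cb strictly increasing")
    if dist in ("E", "G", "C") and cb[0] < 0.0:
        errors.append("cb(1) >= 0.0")
    if dist == "N" and not par[1] > 0.0:
        errors.append("par(2) > 0.0")
    elif dist == "U" and not (par[0] < par[1] and par[0] <= cb[0] and par[1] >= cb[-1]):
        errors.append("par(1) < par(2), par(1) <= cb(1), par(2) >= cb(nclass-1)")
    elif dist in ("E", "C") and not par[0] > 0.0:
        errors.append("par(1) > 0.0")
    elif dist == "G" and not (par[0] > 0.0 and par[1] > 0.0):
        errors.append("par(1) > 0.0 and par(2) > 0.0")
    elif dist == "A":
        if any(p <= 0.0 for p in prob):
            errors.append("prob(i) > 0.0")
        if abs(math.fsum(prob) - 1.0) > tol:
            errors.append("sum of prob = 1.0")
    elif dist not in ("N", "U", "E", "C", "G"):
        errors.append("dist in 'N', 'U', 'E', 'C', 'G', 'A'")
    return errors


# Hypothetical call: uniform distribution on [0, 1] with five classes.
print(check_inputs(5, [18, 22, 25, 15, 20], [0.2, 0.4, 0.6, 0.8], "U", [0.0, 1.0], 0, None))
```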

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
Note: in some cases g08cgf may return useful information.
${\mathbf{ifail}}=1$
On entry, ${\mathbf{nclass}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nclass}}\ge 2$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{dist}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{dist}}=\text{'N'}$, $\text{'U'}$, $\text{'E'}$, $\text{'C'}$, $\text{'G'}$ or $\text{'A'}$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{npest}}=〈\mathit{\text{value}}〉$.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
${\mathbf{ifail}}=4$
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{ifreq}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ifreq}}\left(i\right)\ge 0$.
${\mathbf{ifail}}=5$
On entry, $i=〈\mathit{\text{value}}〉$, ${\mathbf{cb}}\left(i-1\right)=〈\mathit{\text{value}}〉$ and ${\mathbf{cb}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{cb}}\left(i-1\right)<{\mathbf{cb}}\left(i\right)$.
${\mathbf{ifail}}=6$
On entry, ${\mathbf{cb}}\left(1\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{cb}}\left(1\right)\ge 0.0$.
${\mathbf{ifail}}=7$
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$.
Constraint: for the exponential distribution, ${\mathbf{par}}\left(1\right)>0.0$.
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$.
Constraint: for the ${\chi }^{2}$ distribution, ${\mathbf{par}}\left(1\right)>0.0$.
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$ and ${\mathbf{par}}\left(2\right)=〈\mathit{\text{value}}〉$.
Constraint: for the gamma distribution, ${\mathbf{par}}\left(1\right)>0.0$ and ${\mathbf{par}}\left(2\right)>0.0$.
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$ and ${\mathbf{par}}\left(2\right)=〈\mathit{\text{value}}〉$.
Constraint: for the uniform distribution, ${\mathbf{par}}\left(1\right)<{\mathbf{par}}\left(2\right)$, ${\mathbf{par}}\left(1\right)\le {\mathbf{cb}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)\ge {\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$.
On entry, ${\mathbf{par}}\left(2\right)=〈\mathit{\text{value}}〉$.
Constraint: for the Normal distribution, ${\mathbf{par}}\left(2\right)>0.0$.
${\mathbf{ifail}}=8$
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{prob}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{prob}}\left(i\right)>0.0$.
On entry, ${\sum }_{i}{\mathbf{prob}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\sum }_{i}{\mathbf{prob}}\left(i\right)=1.0$.
${\mathbf{ifail}}=9$
An expected frequency equals zero when the corresponding observed frequency is nonzero.
${\mathbf{ifail}}=10$
At least one class has an expected frequency less than $1$. The ${\chi }^{2}$ distribution may not be a good approximation to the distribution of the test statistic.
${\mathbf{ifail}}=11$
The solution has failed to converge whilst computing the expected values. The returned solution may be an adequate approximation.
${\mathbf{ifail}}=-99$
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

## 7 Accuracy

The computations are believed to be stable.

## 8 Parallelism and Performance

g08cgf is not threaded in any implementation.

## 9 Further Comments

The time taken by g08cgf depends on both the distribution chosen and the number of classes, $k$.

## 10 Example

This example applies the ${\chi }^{2}$ goodness-of-fit test to test whether there is evidence to suggest that a sample of $100$ randomly generated observations does not arise from a uniform distribution $U\left(0,1\right)$. The class intervals are calculated such that the interval $\left(0,1\right)$ is divided into five equal classes. The frequencies for each class are calculated using g01aef.
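
For orientation, the same setup can be mimicked in a few lines of standard-library Python. This sketch is independent of the NAG example program: it generates its own pseudo-random data with an arbitrary seed, so its numbers will not match the program results, and it does not call g08cgf or g01aef.

```python
import math
import random
from bisect import bisect_left

random.seed(0)                                   # arbitrary seed, for repeatability only
data = [random.random() for _ in range(100)]     # 100 pseudo-random U(0,1) values

cb = [0.2, 0.4, 0.6, 0.8]                        # upper boundaries of classes 1 to 4
ifreq = [0] * (len(cb) + 1)
for x in data:
    ifreq[bisect_left(cb, x)] += 1               # zero-based class index of x

n = sum(ifreq)
prob = [0.2] * 5                                 # U(0,1) probability of each class
expected = [p * n for p in prob]                 # E_i = p_i * n
chisqi = [(o - e) ** 2 / e for o, e in zip(ifreq, expected)]
chisq = sum(chisqi)                              # test statistic X^2

ndf = len(ifreq) - 1                             # no estimated parameters (npest = 0)
y = 0.5 * chisq
p = math.exp(-y) * (1.0 + y)                     # upper tail of chi-square with 4 d.f.

print(ifreq, chisq, ndf, p)
```

The closed form used for `p` is specific to four degrees of freedom; a different number of classes or estimated parameters needs a general ${\chi }^{2}$ tail probability.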

### 10.1 Program Text

Program Text (g08cgfe.f90)

### 10.2 Program Data

Program Data (g08cgfe.d)

### 10.3 Program Results

Program Results (g08cgfe.r)