NAG Toolbox: nag_nonpar_test_ks_1sample (g08cb)
Purpose
nag_nonpar_test_ks_1sample (g08cb) performs the one sample Kolmogorov–Smirnov test, using one of the distributions provided.
Syntax
[par, d, z, p, sx, ifail] = g08cb(x, dist, par, estima, ntype, 'n', n)
[par, d, z, p, sx, ifail] = nag_nonpar_test_ks_1sample(x, dist, par, estima, ntype, 'n', n)
Description
The data consist of a single sample of $n$ observations denoted by $x_1, x_2, \ldots, x_n$. Let $S_n(x_{(i)})$ and $F_0(x_{(i)})$ represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point $x_{(i)}$, where $x_{(i)}$ is the $i$th smallest sample observation.
The Kolmogorov–Smirnov test provides a test of the null hypothesis
$H_0$: the data are a random sample of observations from a theoretical distribution specified by you
against one of the following alternative hypotheses:
(i) $H_1$: the data cannot be considered to be a random sample from the specified null distribution.
(ii) $H_2$: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function $S_n(x)$ tended to exceed the corresponding values of the theoretical cumulative distribution function $F_0(x)$.
(iii) $H_3$: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function $F_0(x)$ tended to exceed the corresponding values of the sample cumulative distribution function $S_n(x)$.
One of the following test statistics is computed, depending on the alternative hypothesis specified (see the description of the argument ntype in Parameters).
For the alternative hypothesis $H_1$:
- $D_n$ – the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally $D_n = \max\{D_n^+, D_n^-\}$.
For the alternative hypothesis $H_2$:
- $D_n^+$ – the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally $D_n^+ = \max_i\{S_n(x_{(i)}) - F_0(x_{(i)}), 0\}$ for both discrete and continuous null distributions.
For the alternative hypothesis $H_3$:
- $D_n^-$ – the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally, if the null distribution is discrete then $D_n^- = \max_i\{F_0(x_{(i)}) - S_n(x_{(i)}), 0\}$, and if the null distribution is continuous then $D_n^- = \max_i\{F_0(x_{(i)}) - S_n(x_{(i-1)}), 0\}$.
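The three statistics can be computed directly from the sorted sample. The following is a minimal sketch in plain MATLAB (no NAG Toolbox required), not the routine's own implementation. It assumes a continuous Normal null distribution and takes the mean 0.6967 and variance 0.2564 from the output printed in the Example section; the variable names `dplus`, `dminus` and `F0` are illustrative only.

```matlab
% Sample from the Example section (30 observations).
x = [0.01 0.30 0.20 0.90 1.20 0.09 1.30 0.18 0.90 0.48 ...
     1.98 0.03 0.50 0.07 0.70 0.60 0.95 1.00 0.31 1.45 ...
     1.04 1.25 0.15 0.75 0.85 0.22 1.56 0.81 0.57 0.55];
n  = numel(x);
sx = sort(x(:));                 % order statistics x_(1) <= ... <= x_(n)

% Assumed null distribution: Normal, parameters as printed in the Example.
mu     = 0.6967;
sigma2 = 0.2564;
F0 = 0.5 * erfc(-(sx - mu) ./ sqrt(2 * sigma2));   % F_0 at each x_(i)

i = (1:n)';
dplus  = max(max(i / n - F0), 0);        % D_n^+ : S_n(x_(i)) - F_0(x_(i))
dminus = max(max(F0 - (i - 1) / n), 0);  % D_n^- : F_0(x_(i)) - S_n(x_(i-1)), continuous case
d      = max(dplus, dminus);             % D_n

fprintf('D+ = %.4f, D- = %.4f, D = %.4f\n', dplus, dminus, d);
```

With these inputs the largest deviation occurs at the tenth order statistic (0.31), and `d` agrees to four decimal places with the value 0.1108 printed in the Example section.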
The standardized statistic $Z = D \times \sqrt{n}$ is also computed, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$ depending on the choice of the alternative hypothesis. This is the standardized value of $D$ with no correction for continuity applied, and the distribution of $Z$ converges asymptotically to a limiting distribution, first derived by Kolmogorov (1933) and then tabulated by Smirnov (1948). The asymptotic distributions for the one-sided statistics were obtained by Smirnov (1933).
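As a worked check, the standardization can be applied to the values printed in the Example section ($n = 30$, $D = 0.1108$), taking the standardized statistic to be $\sqrt{n}$ times the test statistic:

```latex
Z = \sqrt{n}\, D = \sqrt{30} \times 0.1108 \approx 5.4772 \times 0.1108 \approx 0.6069
```

This agrees with the printed value 0.6068 to within the rounding of $D$ to four decimal places.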
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed. If $n \le 100$, an exact method given by Conover (1980) is used. Note that the method used is only exact for continuous theoretical distributions and does not include Conover's modification for discrete distributions. This method computes the one-sided probabilities. The two-sided probabilities are estimated by doubling the one-sided probability. This is a good estimate for small $p$, that is $p \le 0.10$, but it becomes very poor for larger $p$. If $n > 100$ then $p$ is computed using the Kolmogorov–Smirnov limiting distributions; see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933), Smirnov (1933) and Smirnov (1948).
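For orientation, the classical limiting distributions can be evaluated in a few lines. This is a sketch of the textbook asymptotic formulas only: the two-sided tail $2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2 z^2}$ and the one-sided tail $e^{-2z^2}$. It is not claimed to match the routine's internal computation, and the value of `z` and the truncation at 100 terms are arbitrary choices made for illustration.

```matlab
z = 1.36;                      % illustrative standardized statistic
k = 1:100;                     % truncation of the alternating series (assumed sufficient here)

% Two-sided limiting tail probability (Kolmogorov distribution).
p_two = 2 * sum((-1).^(k - 1) .* exp(-2 * k.^2 * z^2));
p_two = min(max(p_two, 0), 1); % guard against round-off outside [0, 1]

% One-sided limiting tail probability (Smirnov).
p_one = exp(-2 * z^2);

fprintf('two-sided p = %.4f, one-sided p = %.4f\n', p_two, p_one);
```

For `z = 1.36` the two-sided value is just under 0.05, in line with the commonly quoted 5% critical value of about 1.36. Note that the series converges slowly for very small `z`.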
References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kolmogorov A N (1933) Sulla determinazione empirica di una legge di distribuzione Giornale dell' Istituto Italiano degli Attuari 4 83–91
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent samples Bull. Moscow Univ. 2(2) 3–16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281
Parameters
Compulsory Input Parameters
- 1: x(n) – double array
The sample observations $x_1, x_2, \ldots, x_n$.
Constraint: the sample observations supplied must be consistent, in the usual manner, with the null distribution chosen, as specified by the arguments dist and par. For further details see Further Comments.
- 2: dist – string
The theoretical (null) distribution from which it is suspected the data may arise.
- dist = 'U': The uniform distribution over $(a, b)$.
- dist = 'N': The Normal distribution with mean $\mu$ and variance $\sigma^2$.
- dist = 'G': The gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$, where the mean $= \alpha\beta$.
- dist = 'BE': The beta distribution with shape parameters $\alpha$ and $\beta$, where the mean $= \alpha/(\alpha+\beta)$.
- dist = 'BI': The binomial distribution with the number of trials, $m$, and the probability of a success, $p$.
- dist = 'E': The exponential distribution with parameter $\lambda$, where the mean $= 1/\lambda$.
- dist = 'P': The Poisson distribution with parameter $\mu$, where the mean $= \mu$.
- dist = 'NB': The negative binomial distribution with the number of trials, $m$, and the probability of success, $p$.
- dist = 'GP': The generalized Pareto distribution with shape parameter $\xi$ and scale $\beta$.
Any number of characters may be supplied as the actual argument; however, only the characters, maximum 2, required to uniquely identify the distribution are referenced.
Constraint: dist = 'U', 'N', 'G', 'BE', 'BI', 'E', 'P', 'NB' or 'GP'.
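- 3: par(2) – double array
If estima = 'S', par must contain the known values of the parameter(s) of the null distribution as follows.
If a uniform distribution is used, then par(1) and par(2) must contain the boundaries $a$ and $b$ respectively.
If a Normal distribution is used, then par(1) and par(2) must contain the mean, $\mu$, and the variance, $\sigma^2$, respectively.
If a gamma distribution is used, then par(1) and par(2) must contain the parameters $\alpha$ and $\beta$ respectively.
If a beta distribution is used, then par(1) and par(2) must contain the parameters $\alpha$ and $\beta$ respectively.
If a binomial distribution is used, then par(1) and par(2) must contain the parameters $m$ and $p$ respectively.
If an exponential distribution is used, then par(1) must contain the parameter $\lambda$.
If a Poisson distribution is used, then par(1) must contain the parameter $\mu$.
If a negative binomial distribution is used, par(1) and par(2) must contain the parameters $m$ and $p$ respectively.
If a generalized Pareto distribution is used, par(1) and par(2) must contain the parameters $\xi$ and $\beta$ respectively.
If estima = 'E', par need not be set except when the null distribution requested is either the binomial or the negative binomial distribution, in which case par(1) must contain the parameter $m$.
Constraints:
- if dist = 'U', par(1) < par(2);
- if dist = 'N', par(2) > 0.0;
- if dist = 'G', par(1) > 0.0 and par(2) > 0.0;
- if dist = 'BE', par(1) > 0.0 and par(2) > 0.0 and par(1) ≤ $10^6$ and par(2) ≤ $10^6$;
- if dist = 'BI', par(1) ≥ 1.0 and 0.0 < par(2) < 1.0 and par(1) × par(2) × (1.0 − par(2)) ≤ $10^6$ and par(1) < $1/\epsilon$, where $\epsilon$ is the machine precision, see nag_machine_precision (x02aj);
- if dist = 'E', par(1) > 0.0;
- if dist = 'P', par(1) > 0.0 and par(1) ≤ $10^6$;
- if dist = 'NB', par(1) ≥ 1.0 and 0.0 < par(2) < 1.0 and par(1) × (1.0 − par(2)) / (par(2) × par(2)) ≤ $10^6$ and par(1) < $1/\epsilon$, where $\epsilon$ is the machine precision, see nag_machine_precision (x02aj);
- if dist = 'GP', par(2) > 0.0.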
- 4: estima – string (length ≥ 1)
estima must specify whether values of the parameters of the null distribution are known or are to be estimated from the data.
- estima = 'S': Values of the parameters will be supplied in the array par described above.
- estima = 'E': Parameters are to be estimated from the data except when the null distribution requested is the binomial distribution or the negative binomial distribution, in which case the first parameter, $m$, must be supplied in par(1) and only the second parameter, $p$, is estimated from the data.
Constraint: estima = 'S' or 'E'.
- 5: ntype – int64/int32/nag_int scalar
The test statistic to be calculated, i.e., the choice of alternative hypothesis.
- ntype = 1: Computes $D_n$, to test $H_0$ against $H_1$,
- ntype = 2: Computes $D_n^+$, to test $H_0$ against $H_2$,
- ntype = 3: Computes $D_n^-$, to test $H_0$ against $H_3$.
Constraint: ntype = 1, 2 or 3.
Optional Input Parameters
- 1: n – int64/int32/nag_int scalar
Default: the dimension of the array x.
$n$, the number of observations in the sample.
Constraint: n ≥ 3.
Output Parameters
- 1: par(2) – double array
If estima = 'S', par is unchanged; if estima = 'E' and dist = 'BI' or dist = 'NB', then par(2) is estimated from the data; otherwise par(1) and par(2) are estimated from the data.
- 2: d – double scalar
The Kolmogorov–Smirnov test statistic ($D_n$, $D_n^+$ or $D_n^-$ according to the value of ntype).
- 3: z – double scalar
A standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.
- 4: p – double scalar
The probability, $p$, associated with the observed value of $D$, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$ depending on the value of ntype (see Description).
- 5: sx(n) – double array
The sample observations, $x_1, x_2, \ldots, x_n$, sorted in ascending order.
- 6: ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
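For the Normal case, the estimates returned in par in the Example section are consistent with the sample mean and the unbiased (divisor $n-1$) sample variance. The sketch below checks this in plain MATLAB. It is an observation about that one example, not a statement of how the routine estimates parameters for other distributions.

```matlab
% Sample from the Example section (30 observations).
x = [0.01; 0.30; 0.20; 0.90; 1.20; 0.09; 1.30; 0.18; 0.90; 0.48; ...
     1.98; 0.03; 0.50; 0.07; 0.70; 0.60; 0.95; 1.00; 0.31; 1.45; ...
     1.04; 1.25; 0.15; 0.75; 0.85; 0.22; 1.56; 0.81; 0.57; 0.55];

% Assumed estimators: sample mean and unbiased sample variance.
par_est = [mean(x); var(x)];

fprintf('Estimated parameters: %7.4f %7.4f\n', par_est);
```

The two values agree with the `Parameters : 0.6967 0.2564` line of the example output.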
Error Indicators and Warnings
Errors or warnings detected by the function:
ifail = 1
Constraint: n ≥ 3.
ifail = 2
On entry, dist was an illegal value.
ifail = 3
Constraint: ntype = 1, 2 or 3.
ifail = 4
On entry, estima was an illegal value.
ifail = 5
Constraint: for the beta distribution, 0 < par(1) ≤ 1000000 and 0 < par(2) ≤ 1000000.
Constraint: for the binomial distribution, 0 < par(2) < 1.
Constraint: for the binomial distribution, 1 ≤ par(1) < $1/\epsilon$, where $\epsilon$ is the machine precision, see nag_machine_precision (x02aj).
Constraint: for the exponential distribution, par(1) > 0.
Constraint: for the gamma distribution, par(1) > 0 and par(2) > 0.
Constraint: for the generalized Pareto distribution, par(2) > 0.
Constraint: for the generalized Pareto distribution with par(1) < 0, 0 ≤ x(i) ≤ −par(2)/par(1), for i = 1, 2, …, n.
Constraint: for the negative binomial distribution, 0 < par(2) < 1.
Constraint: for the negative binomial distribution, 1 ≤ par(1) < $1/\epsilon$, where $\epsilon$ is the machine precision, see nag_machine_precision (x02aj).
Constraint: for the Normal distribution, par(2) > 0.
Constraint: for the Poisson distribution, 0 < par(1) ≤ 1000000.
Constraint: for the uniform distribution, par(1) < par(2).
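ifail = 6
On entry, dist = 'BE' and at least one observation is illegal.
Constraint: 0 ≤ x(i) ≤ 1, for i = 1, 2, …, n.
On entry, dist = 'BI' and all observations are zero or par(1).
Constraint: at least one 0 < x(i) < par(1), for i = 1, 2, …, n.
On entry, dist = 'BI' and at least one observation is illegal.
Constraint: 0 ≤ x(i) ≤ par(1), for i = 1, 2, …, n.
On entry, dist = 'E' or 'P' and all observations are zero.
Constraint: at least one x(i) > 0, for i = 1, 2, …, n.
On entry, dist = 'G', 'E', 'P', 'NB' or 'GP' and at least one observation is negative.
Constraint: x(i) ≥ 0, for i = 1, 2, …, n.
On entry, dist = 'GP' and estima = 'E'.
The parameter estimates are invalid; the data may not be from the generalized Pareto distribution.
On entry, dist = 'U' and at least one observation is illegal.
Constraint: par(1) ≤ x(i) ≤ par(2), for i = 1, 2, …, n.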
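ifail = 7
On entry, dist = 'U', 'N', 'G', 'BE' or 'GP', and the whole sample is constant. Thus the variance is zero.
ifail = 8
On entry, dist = 'BI', with the supplied values of par(1) and par(2).
The variance par(1) × par(2) × (1 − par(2)) exceeds 1000000.
On entry, dist = 'NB', with the supplied values of par(1) and par(2).
The variance par(1) × (1 − par(2)) / (par(2) × par(2)) exceeds 1000000.
ifail = 9
On entry, dist = 'G' and in the computation of the incomplete gamma function by nag_specfun_gamma_incomplete (s14ba) the convergence of the Taylor series or Legendre continued fraction fails within the iteration limit.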
ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
ifail = -399
Your licence key may have expired or may not have been installed correctly.
ifail = -999
Dynamic memory allocation failed.
Accuracy
The approximation for $p$, given when $n > 100$, has a relative error of at most 2.5% for most cases. The two-sided probability is approximated by doubling the one-sided probability. This is only good for small $p$, i.e., $p < 0.10$, but very poor for large $p$. The error is always on the conservative side; that is, the tail probability, $p$, is overestimated.
Further Comments
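The time taken by nag_nonpar_test_ks_1sample (g08cb) increases with $n$ until $n > 100$, at which point it drops and then increases slowly with $n$. The time may also depend on the choice of null distribution and on whether or not the parameters are to be estimated.
The data supplied in the argument x must be consistent with the chosen null distribution as follows:
- when dist = 'U', then par(1) ≤ $x_i$ ≤ par(2), for $i = 1, 2, \ldots, n$;
- when dist = 'N', then there are no constraints on the $x_i$'s;
- when dist = 'G', then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
- when dist = 'BE', then $0.0 \le x_i \le 1.0$, for $i = 1, 2, \ldots, n$;
- when dist = 'BI', then $0.0 \le x_i \le$ par(1), for $i = 1, 2, \ldots, n$;
- when dist = 'E', then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
- when dist = 'P', then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
- when dist = 'NB', then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
- when dist = 'GP' and par(1) ≥ 0.0, then $x_i \ge 0.0$, for $i = 1, 2, \ldots, n$;
- when dist = 'GP' and par(1) < 0.0, then $0.0 \le x_i \le -$par(2)/par(1), for $i = 1, 2, \ldots, n$.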
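A pre-check of this kind can be run before calling the routine. The sketch below is illustrative only: it tests each observation against the usual support of the chosen distribution, and it assumes the one- and two-character codes 'U', 'N', 'G', 'BE', 'BI', 'E', 'P', 'NB' and 'GP' for dist. The sample, the parameters and the variable `ok` are hypothetical.

```matlab
x    = [0.2; 0.5; 0.9];   % hypothetical sample
dist = 'BE';              % assumed distribution code (beta)
par  = [2; 3];            % hypothetical parameter values

switch upper(dist)
  case 'U'                                   % uniform on (par(1), par(2))
    ok = all(x >= par(1) & x <= par(2));
  case 'N'                                   % Normal: whole real line
    ok = true;
  case {'G', 'E', 'P', 'NB'}                 % supported on non-negative values
    ok = all(x >= 0);
  case 'BE'                                  % beta: unit interval
    ok = all(x >= 0 & x <= 1);
  case 'BI'                                  % binomial: 0 .. number of trials
    ok = all(x >= 0 & x <= par(1));
  case 'GP'                                  % generalized Pareto: depends on shape sign
    if par(1) >= 0
      ok = all(x >= 0);
    else
      ok = all(x >= 0 & x <= -par(2) / par(1));
    end
  otherwise
    error('Unknown distribution code: %s', dist);
end

fprintf('Sample consistent with dist = %s: %d\n', dist, ok);
```

Running a check like this first turns a failure reported through ifail into an explicit, earlier diagnostic in the calling code.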
Example
The following example program reads in a set of data consisting of 30 observations. The Kolmogorov–Smirnov test is then applied twice, firstly to test whether the sample is taken from a uniform distribution, $U(0,2)$, and secondly to test whether the sample is taken from a Normal distribution where the mean and variance are estimated from the data. In both cases we are testing against $H_1$; that is, we are doing a two-tailed test. The values of d, z and p are printed for each case.
function g08cb_example
fprintf('g08cb example results\n\n');
x = [0.01; 0.30; 0.20; 0.90; 1.20; 0.09; 1.30; 0.18; 0.90; 0.48;
1.98; 0.03; 0.50; 0.07; 0.70; 0.60; 0.95; 1.00; 0.31; 1.45;
1.04; 1.25; 0.15; 0.75; 0.85; 0.22; 1.56; 0.81; 0.57; 0.55];
dist = 'Normal';
estima = 'Estimate';
ntype = int64(1);
par = [0; 0];
[par, d, z, p, sx, ifail] = g08cb( ...
x, dist, par, estima, ntype);
fprintf('K-S Test\n');
fprintf('Distribution: %s\n',dist);
fprintf('Parameters :');
fprintf('%7.4f', par);
fprintf('\n\nTest statistic D = %8.4f\n', d);
fprintf('Z statistic = %8.4f\n', z);
fprintf('Tail probability = %8.4f\n', p);
g08cb example results
K-S Test
Distribution: Normal
Parameters : 0.6967 0.2564
Test statistic D = 0.1108
Z statistic = 0.6068
Tail probability = 0.8925
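The listing above covers only the Normal case. A call for the uniform case might be set up as sketched below, using the same g08cb signature. The bounds (0, 2) and the value 'S' for estima (parameters supplied rather than estimated) are assumptions made for illustration, and no output is shown because none is given here. The call is guarded so that the script still runs when the NAG Toolbox is not installed.

```matlab
x = [0.01; 0.30; 0.20; 0.90; 1.20; 0.09; 1.30; 0.18; 0.90; 0.48; ...
     1.98; 0.03; 0.50; 0.07; 0.70; 0.60; 0.95; 1.00; 0.31; 1.45; ...
     1.04; 1.25; 0.15; 0.75; 0.85; 0.22; 1.56; 0.81; 0.57; 0.55];
dist   = 'Uniform';
par    = [0; 2];          % assumed bounds of the uniform null distribution
estima = 'S';             % assumed code: parameter values are supplied in par
ntype  = int64(1);        % two-sided alternative, as in the Normal case

if exist('g08cb', 'file') % only attempt the call when the NAG Toolbox is available
  [par, d, z, p, sx, ifail] = g08cb(x, dist, par, estima, ntype);
  fprintf('Distribution: %s\n', dist);
  fprintf('Test statistic D = %8.4f\n', d);
  fprintf('Z statistic      = %8.4f\n', z);
  fprintf('Tail probability = %8.4f\n', p);
end
```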
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015