NAG Toolbox: nag_nonpar_test_ks_2sample (g08cd)
Purpose
nag_nonpar_test_ks_2sample (g08cd) performs the two sample Kolmogorov–Smirnov distribution test.
Syntax
[d, z, p, sx, sy, ifail] = g08cd(x, y, ntype, 'n1', n1, 'n2', n2)
[d, z, p, sx, sy, ifail] = nag_nonpar_test_ks_2sample(x, y, ntype, 'n1', n1, 'n2', n2)
Description
The data consists of two independent samples, one of size $n_1$, denoted by $x_1, x_2, \ldots, x_{n_1}$, and the other of size $n_2$, denoted by $y_1, y_2, \ldots, y_{n_2}$. Let $F(x)$ and $G(x)$ represent their respective, unknown, distribution functions. Also let $S_1(x)$ and $S_2(x)$ denote the values of the sample cumulative distribution functions at the point $x$ for the two samples respectively.
The Kolmogorov–Smirnov test provides a test of the null hypothesis
$H_0$: $F(x) = G(x)$
against one of the following alternative hypotheses:
(i) $H_1$: $F(x) \ne G(x)$.
(ii) $H_2$: $F(x) > G(x)$. This alternative hypothesis is sometimes stated as, ‘The $x$'s tend to be smaller than the $y$'s’, i.e., it would be demonstrated in practical terms if the values of $S_1(x)$ tended to exceed the corresponding values of $S_2(x)$.
(iii) $H_3$: $F(x) < G(x)$. This alternative hypothesis is sometimes stated as, ‘The $x$'s tend to be larger than the $y$'s’, i.e., it would be demonstrated in practical terms if the values of $S_2(x)$ tended to exceed the corresponding values of $S_1(x)$.
One of the following test statistics is computed, depending on the particular alternative hypothesis specified (see the description of the argument ntype in Parameters).
For the alternative hypothesis $H_1$:
- $D_{n_1,n_2}$ – the largest absolute deviation between the two sample cumulative distribution functions.
For the alternative hypothesis $H_2$:
- $D^{+}_{n_1,n_2}$ – the largest positive deviation between the sample cumulative distribution function of the first sample, $S_1(x)$, and the sample cumulative distribution function of the second sample, $S_2(x)$. Formally $D^{+}_{n_1,n_2} = \max\{S_1(x) - S_2(x), 0\}$.
For the alternative hypothesis $H_3$:
- $D^{-}_{n_1,n_2}$ – the largest positive deviation between the sample cumulative distribution function of the second sample, $S_2(x)$, and the sample cumulative distribution function of the first sample, $S_1(x)$. Formally $D^{-}_{n_1,n_2} = \max\{S_2(x) - S_1(x), 0\}$.
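Because $S_1(x)$ and $S_2(x)$ are step functions that only jump at observed values, all three statistics can be evaluated exactly by comparing the two sample cumulative distribution functions at every point of the pooled sample. The following is a minimal plain-MATLAB sketch of that idea (no Toolbox call, and not necessarily how g08cd works internally); the data x and y are hypothetical, and implicit expansion (MATLAB R2016b or later, or Octave) is assumed.

```matlab
% Sketch (plain MATLAB, no NAG call): the three two-sample statistics
% evaluated from the sample cumulative distribution functions.
x = [0.1 0.4 0.7 0.9];       % hypothetical first sample, n1 = 4
y = [0.3 0.5 0.6 1.2 1.5];   % hypothetical second sample, n2 = 5
n1 = numel(x);
n2 = numel(y);
% S1 and S2 only jump at observed values, so the maxima are
% attained at points of the pooled sample.
t  = sort([x(:); y(:)]);          % pooled sample as a column vector
S1 = sum(x(:).' <= t, 2) / n1;    % S1(t(k)) for every pooled point
S2 = sum(y(:).' <= t, 2) / n2;    % S2(t(k)) for every pooled point
Dplus  = max(S1 - S2);            % D+: largest positive S1 - S2 (>= 0, since both reach 1)
Dminus = max(S2 - S1);            % D-: largest positive S2 - S1
D      = max(abs(S1 - S2));       % D: largest absolute deviation
```

For these data $D^{+}_{n_1,n_2} = 0.4$ and $D^{-}_{n_1,n_2} = 0.1$, so $D_{n_1,n_2} = 0.4$.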
nag_nonpar_test_ks_2sample (g08cd) also returns the standardized statistic $Z = D \times \sqrt{(n_1 + n_2)/(n_1 n_2)}$, where $D$ may be $D_{n_1,n_2}$, $D^{+}_{n_1,n_2}$ or $D^{-}_{n_1,n_2}$ depending on the choice of the alternative hypothesis. The distribution of this statistic converges asymptotically to a distribution given by Smirnov as $n_1$ and $n_2$ increase; see Feller (1948), Kendall and Stuart (1973), Kim and Jenrich (1973), Smirnov (1933) or Smirnov (1948).
The probability $p$, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed is computed. If $\max(n_1, n_2) \le 2500$ and $n_1 n_2 \le 10000$ then an exact method given by Kim and Jenrich (see Kim and Jenrich (1973)) is used. Otherwise $p$ is computed using the approximations suggested by Kim and Jenrich (1973). Note that the method used is only exact for continuous theoretical distributions. This method computes the two-sided probability; the one-sided probabilities are estimated by halving the two-sided probability. This is a good estimate for small $p$, that is $p \le 0.10$, but it becomes very poor for larger $p$.
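For continuous data every ordering of the pooled sample is equally likely under $H_0$, so an exact two-sided probability can be obtained by counting monotone lattice paths from $(0,0)$ to $(n_1,n_2)$ that stay strictly inside the band $|i/n_1 - j/n_2| < D$. The sketch below is the textbook lattice-path recursion in plain MATLAB; it is not necessarily the exact method used by g08cd, and the sample sizes and observed $D$ are hypothetical.

```matlab
% Sketch (plain MATLAB, no NAG call): exact two-sided P(D >= d) under H0
% for continuous data, by counting lattice paths that avoid |i/n1 - j/n2| >= d.
n1 = 4;        % hypothetical size of the first sample
n2 = 5;        % hypothetical size of the second sample
d  = 0.4;      % hypothetical observed value of D_{n1,n2}
tol = 1e-9;    % treats |i/n1 - j/n2| equal to d up to rounding as reaching d
% A(i+1, j+1) counts paths from (0,0) to (i,j) that never reach |S1 - S2| >= d;
% a step in i means the next pooled value is an x, a step in j means a y.
A = zeros(n1 + 1, n2 + 1);
for i = 0:n1
  for j = 0:n2
    if abs(i / n1 - j / n2) >= d - tol
      A(i + 1, j + 1) = 0;           % this point already gives |S1 - S2| >= d
    elseif i == 0 && j == 0
      A(1, 1) = 1;                   % start of every path
    else
      fromX = 0;
      fromY = 0;
      if i > 0
        fromX = A(i, j + 1);         % last observation was an x
      end
      if j > 0
        fromY = A(i + 1, j);         % last observation was a y
      end
      A(i + 1, j + 1) = fromX + fromY;
    end
  end
end
% All nchoosek(n1 + n2, n1) orderings are equally likely under H0
p = 1 - A(n1 + 1, n2 + 1) / nchoosek(n1 + n2, n1);
```

With these values 32 of the $\binom{9}{4} = 126$ equally likely paths stay inside the band, giving $p = 94/126 \approx 0.746$. A two-sided probability this large is exactly the case in which halving it to estimate a one-sided probability is unreliable.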
References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kim P J and Jenrich R I (1973) Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion Selected Tables in Mathematical Statistics 1 80–129 American Mathematical Society
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent samples Bull. Moscow Univ. 2(2) 3–16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281
Parameters
Compulsory Input Parameters
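- 1: x(n1) – double array
  The observations from the first sample, $x_1, x_2, \ldots, x_{n_1}$.
- 2: y(n2) – double array
  The observations from the second sample, $y_1, y_2, \ldots, y_{n_2}$.
- 3: ntype – int64, int32 or nag_int scalar
  The statistic to be computed, i.e., the choice of alternative hypothesis.
  ntype = 1: Computes $D_{n_1,n_2}$, to test $H_0$ against $H_1$.
  ntype = 2: Computes $D^{+}_{n_1,n_2}$, to test $H_0$ against $H_2$.
  ntype = 3: Computes $D^{-}_{n_1,n_2}$, to test $H_0$ against $H_3$.
  Constraint: ntype = 1, 2 or 3.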
Optional Input Parameters
- 1: n1 – int64, int32 or nag_int scalar
  Default: the dimension of the array x.
  The number of observations in the first sample, $n_1$.
  Constraint: $n_1 \ge 1$.
- 2: n2 – int64, int32 or nag_int scalar
  Default: the dimension of the array y.
  The number of observations in the second sample, $n_2$.
  Constraint: $n_2 \ge 1$.
Output Parameters
- 1: d – double scalar
  The Kolmogorov–Smirnov test statistic ($D_{n_1,n_2}$, $D^{+}_{n_1,n_2}$ or $D^{-}_{n_1,n_2}$ according to the value of ntype).
- 2: z – double scalar
  A standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.
- 3: p – double scalar
  The tail probability associated with the observed value of $D$, where $D$ may be $D_{n_1,n_2}$, $D^{+}_{n_1,n_2}$ or $D^{-}_{n_1,n_2}$ depending on the value of ntype (see Description).
- 4: sx(n1) – double array
  The observations from the first sample sorted in ascending order.
- 5: sy(n2) – double array
  The observations from the second sample sorted in ascending order.
- 6: ifail – int64, int32 or nag_int scalar
  ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
- ifail = 1
  On entry, $n_1 < 1$, or $n_2 < 1$.
- ifail = 2
  On entry, ntype ≠ 1, 2 or 3.
- ifail = 3
  The iterative procedure used in the approximation of the probability for large $n_1$ and $n_2$ did not converge. For the two-sided test, $p = 1.0$ is returned. For the one-sided test, $p = 0.5$ is returned.
- ifail = −99
  An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = −399
  Your licence key may have expired or may not have been installed correctly.
- ifail = −999
  Dynamic memory allocation failed.
Accuracy
The large sample distributions used as approximations to the exact distribution should have a relative error of less than 5% for most cases.
Further Comments
The time taken by nag_nonpar_test_ks_2sample (g08cd) increases with $n_1$ and $n_2$, until $n_1 n_2 > 10000$ or $\max(n_1, n_2) > 2500$. At this point one of the approximations is used and the time decreases significantly. The time then increases again modestly with $n_1$ and $n_2$.
Example
This example computes the two-sided Kolmogorov–Smirnov test statistic for two independent samples of size 100 and 50 respectively. The first sample is from a uniform distribution $U(0,2)$. The second sample is from a uniform distribution $U(0.25,2.25)$. The test statistic, $D$, the standardized test statistic, $Z$, and the tail probability, $p$, are computed and printed.
function g08cd_example
fprintf('g08cd example results\n\n');
x = [ 1.160 1.785 0.322 1.437 1.695 1.770 1.209 0.479 1.122 0.974 ...
0.290 1.155 0.218 1.595 1.053 1.058 1.282 1.278 1.066 0.725 ...
0.113 1.516 1.329 1.907 0.101 0.387 1.392 0.613 0.692 1.397 ...
1.627 0.417 1.079 0.607 0.899 0.493 0.381 1.660 0.233 0.718 ...
1.376 1.395 1.557 1.610 1.632 0.851 1.824 0.921 0.139 0.618 ...
0.050 0.956 0.669 1.109 1.882 1.462 1.465 0.201 1.036 1.127 ...
0.907 0.876 1.199 1.667 1.141 0.820 0.488 0.732 0.725 0.753 ...
0.760 1.833 0.074 1.101 0.620 1.858 0.681 0.705 0.876 1.096 ...
1.870 1.597 0.990 0.430 0.410 0.399 1.693 0.492 1.318 0.883 ...
1.291 1.051 1.934 1.314 1.496 0.391 1.079 0.881 0.983 1.306];
y = [ 1.695 1.452 0.997 1.771 1.114 1.624 2.005 0.782 1.870 0.954 ...
1.606 2.059 0.774 0.741 1.040 0.521 2.163 0.818 1.781 1.420 ...
0.558 1.437 2.004 1.325 0.398 0.582 2.047 0.332 1.186 0.890 ...
1.825 1.324 1.334 0.261 0.299 1.733 1.172 1.000 1.663 1.093 ...
1.045 2.022 1.174 0.670 1.143 1.189 0.494 1.275 1.122 1.823];
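% ntype = 1 selects the two-sided statistic D_{n1,n2}; the optional
% arguments 'n1' and 'n2' default to the lengths of x and y
ntype = int64(1);
[d, z, p, sx, sy, ifail] = g08cd(...
                            x, y, ntype);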
fprintf('Test statistic D = %8.4f\n', d);
fprintf('Z statistic = %8.4f\n', z);
fprintf('Tail probability = %8.4f\n', p);
The example produces the following output:

g08cd example results
Test statistic D = 0.1800
Z statistic = 0.0312
Tail probability = 0.2222
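The printed values can be cross-checked with a few lines of plain MATLAB (no Toolbox call). This sketch assumes the standardization $Z = D \times \sqrt{(n_1+n_2)/(n_1 n_2)}$, which reproduces the printed Z statistic. For comparison it also evaluates the classical large-sample result: the scaled statistic $\lambda = D\sqrt{n_1 n_2/(n_1+n_2)}$ and the limiting two-sided tail $2\sum_{k \ge 1}(-1)^{k-1}e^{-2k^2\lambda^2}$ (see Smirnov (1948) or Feller (1948)). This series is not necessarily the approximation g08cd uses.

```matlab
% Sketch (plain MATLAB, no NAG call): cross-check of the printed example values.
d  = 0.18;   % test statistic D printed by the example
n1 = 100;    % number of elements in x in the example
n2 = 50;     % number of elements in y in the example
% Standardization that reproduces the printed Z statistic
z = d * sqrt((n1 + n2) / (n1 * n2));
% Classical scaling used in the Kolmogorov-Smirnov limit theorem
lambda = d * sqrt(n1 * n2 / (n1 + n2));
% Asymptotic two-sided tail: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2)
k = 1:100;
p_asym = 2 * sum((-1).^(k - 1) .* exp(-2 * k.^2 * lambda^2));
% One-sided estimate obtained by halving, as described for ntype = 2 or 3
p_one_sided = p_asym / 2;
```

Here z is about 0.0312 and lambda about 1.0392. The asymptotic series gives about 0.230, compared with the printed tail probability of 0.2222, a relative difference of roughly 4%.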
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015