NAG CL Interface
g08cdc (test_ks_2sample)

1 Purpose

g08cdc performs the two sample Kolmogorov–Smirnov distribution test.

2 Specification

#include <nag.h>
void  g08cdc (Integer n1, const double x[], Integer n2, const double y[], Nag_TestStatistics dtype, double *d, double *z, double *p, NagError *fail)
The function may be called by the names: g08cdc, nag_nonpar_test_ks_2sample or nag_2_sample_ks_test.

3 Description

The data consist of two independent samples, one of size $n_1$, denoted by $x_1, x_2, \ldots, x_{n_1}$, and the other of size $n_2$, denoted by $y_1, y_2, \ldots, y_{n_2}$. Let $F(x)$ and $G(x)$ represent their respective, unknown, distribution functions. Also let $S_1(x)$ and $S_2(x)$ denote the values of the sample cumulative distribution functions at the point $x$ for the two samples respectively.
The Kolmogorov–Smirnov test provides a test of the null hypothesis $H_0: F(x) = G(x)$ against one of the following alternative hypotheses:
  (i) $H_1: F(x) \neq G(x)$.
  (ii) $H_2: F(x) > G(x)$. This alternative hypothesis is sometimes stated as, ‘The $x$'s tend to be smaller than the $y$'s’, i.e., it would be demonstrated in practical terms if the values of $S_1(x)$ tended to exceed the corresponding values of $S_2(x)$.
  (iii) $H_3: F(x) < G(x)$. This alternative hypothesis is sometimes stated as, ‘The $x$'s tend to be larger than the $y$'s’, i.e., it would be demonstrated in practical terms if the values of $S_2(x)$ tended to exceed the corresponding values of $S_1(x)$.
One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the argument dtype in Section 5).
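For the alternative hypothesis $H_1$: $D_{n_1,n_2}$, the largest absolute deviation between the two sample cumulative distribution functions, $\max_x |S_1(x) - S_2(x)|$.
For the alternative hypothesis $H_2$: $D_{n_1,n_2}^{+}$, the largest positive deviation of $S_1(x)$ above $S_2(x)$, $\max_x \{S_1(x) - S_2(x), 0\}$.
For the alternative hypothesis $H_3$: $D_{n_1,n_2}^{-}$, the largest positive deviation of $S_2(x)$ above $S_1(x)$, $\max_x \{S_2(x) - S_1(x), 0\}$.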
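The three statistics can be illustrated with a short, self-contained C sketch. It is not the library's implementation: the helper name ks2_statistics and its return convention are hypothetical, and it assumes the usual definitions, namely the largest absolute difference between the two sample cumulative distribution functions for $H_1$, the largest excess of $S_1(x)$ over $S_2(x)$ for $H_2$, and the largest excess of $S_2(x)$ over $S_1(x)$ for $H_3$. Both samples are sorted and the pooled values are walked once, stepping past tied observations together.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Hypothetical helper, NOT part of the NAG Library.
 * On success returns 0 and sets
 *   *d_pos = max(S1 - S2, 0), *d_neg = max(S2 - S1, 0),
 *   *d_abs = max |S1 - S2|.
 * Returns -1 if a sample is empty or memory cannot be allocated. */
static int ks2_statistics(const double x[], int n1, const double y[], int n2,
                          double *d_abs, double *d_pos, double *d_neg)
{
    assert(d_abs != NULL && d_pos != NULL && d_neg != NULL);
    if (n1 < 1 || n2 < 1)
        return -1;

    /* Work on sorted copies so the caller's arrays are left untouched. */
    double *xs = malloc((size_t)n1 * sizeof *xs);
    double *ys = malloc((size_t)n2 * sizeof *ys);
    if (xs == NULL || ys == NULL) {
        free(xs);
        free(ys);
        return -1;
    }
    for (int k = 0; k < n1; k++) xs[k] = x[k];
    for (int k = 0; k < n2; k++) ys[k] = y[k];
    qsort(xs, (size_t)n1, sizeof *xs, cmp_double);
    qsort(ys, (size_t)n2, sizeof *ys, cmp_double);

    int i = 0, j = 0;
    *d_pos = 0.0;
    *d_neg = 0.0;
    while (i < n1 || j < n2) {
        /* Smallest pooled value not yet processed. */
        double v;
        if (j >= n2 || (i < n1 && xs[i] <= ys[j]))
            v = xs[i];
        else
            v = ys[j];
        /* Step both sample CDFs past every observation equal to v. */
        while (i < n1 && xs[i] <= v) i++;
        while (j < n2 && ys[j] <= v) j++;
        double diff = (double)i / n1 - (double)j / n2; /* S1(v) - S2(v) */
        if (diff > *d_pos) *d_pos = diff;
        if (-diff > *d_neg) *d_neg = -diff;
    }
    *d_abs = (*d_pos > *d_neg) ? *d_pos : *d_neg;

    free(xs);
    free(ys);
    return 0;
}
```

For example, with x = {1, 2, 3, 4} and y = {3, 4, 5, 6} the first sample lies to the left of the second, so $S_1$ is never below $S_2$: the sketch gives a positive one-sided statistic of 0.5, a negative one-sided statistic of 0 and a two-sided statistic of 0.5.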
g08cdc also returns the standardized statistic $Z = \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \times D$, where $D$ may be $D_{n_1,n_2}$, $D_{n_1,n_2}^{+}$ or $D_{n_1,n_2}^{-}$ depending on the choice of the alternative hypothesis. The distribution of this statistic converges asymptotically to a distribution given by Smirnov as $n_1$ and $n_2$ increase (see Feller (1948), Kendall and Stuart (1973), Kim and Jenrich (1973), Smirnov (1933) or Smirnov (1948)).
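As a worked illustration of the standardization, take the sample sizes used in Section 10, $n_1 = 100$ and $n_2 = 50$, together with a purely hypothetical statistic $D = 0.18$. The arithmetic assumes that the standardizing factor is the square root of $(n_1 + n_2)/(n_1 n_2)$; that reading is an assumption and is worth checking against the value of z returned by the function.

\[
Z = \sqrt{\frac{100 + 50}{100 \times 50}} \times 0.18 = \sqrt{0.03} \times 0.18 \approx 0.1732 \times 0.18 \approx 0.0312 .
\]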
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed is computed. If $\max(n_1, n_2) \le 2500$ and $n_1 n_2 \le 10000$ then an exact method given by Kim and Jenrich is used. Otherwise $p$ is computed using the approximations suggested by Kim and Jenrich (see Kim and Jenrich (1973)). Note that the method used is only exact for continuous theoretical distributions. This method computes the two-sided probability. The one-sided probabilities are estimated by halving the two-sided probability. This is a good estimate for small $p$, that is $p \le 0.10$, but it becomes very poor for larger $p$.
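The choice between the exact method and the approximations, and the halving rule for one-sided tests, can be sketched in C as follows. The helper names are hypothetical and are not library functions. The thresholds are read as "the larger sample size is at most 2500 and the product of the sample sizes is at most 10000". The sketch only mirrors the decision described above; it does not compute any probability itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds quoted in the Description (Section 3). */
#define KS2_EXACT_MAX_N       2500LL
#define KS2_EXACT_MAX_PRODUCT 10000LL

/* Hypothetical helper: true when the sample sizes fall in the range for
 * which the Description says the exact method is used. */
static bool ks2_uses_exact_method(long long n1, long long n2)
{
    assert(n1 >= 1 && n2 >= 1);
    long long larger = (n1 > n2) ? n1 : n2;
    /* The size check comes first, so the product below cannot overflow. */
    return larger <= KS2_EXACT_MAX_N && n1 * n2 <= KS2_EXACT_MAX_PRODUCT;
}

/* Hypothetical helper: one-sided tail probability estimated by halving
 * the two-sided value, as described above. The Description warns that
 * this estimate becomes very poor for larger p. */
static double ks2_one_sided_estimate(double p_two_sided)
{
    return 0.5 * p_two_sided;
}
```

With the sample sizes of the example in Section 10 (100 and 50) the product is 5000, so the exact branch applies. Sizes such as 3000 and 3 fall back to the approximations even though their product is below 10000, because the larger sample exceeds 2500.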

4 References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kim P J and Jenrich R I (1973) Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion $D_{mn}$ $(m < n)$ Selected Tables in Mathematical Statistics 1 80–129 American Mathematical Society
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent samples Bull. Moscow Univ. 2(2) 3–16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281

5 Arguments

1: n1 Integer Input
On entry: the number of observations in the first sample, $n_1$.
Constraint: n1 $\geq$ 1.
2: x[n1] const double Input
On entry: the observations from the first sample, $x_1, x_2, \ldots, x_{n_1}$.
3: n2 Integer Input
On entry: the number of observations in the second sample, $n_2$.
Constraint: n2 $\geq$ 1.
4: y[n2] const double Input
On entry: the observations from the second sample, $y_1, y_2, \ldots, y_{n_2}$.
5: dtype Nag_TestStatistics Input
On entry: the statistic to be computed, i.e., the choice of alternative hypothesis.
dtype=Nag_TestStatisticsDAbs
Computes $D_{n_1,n_2}$, to test against $H_1$.
dtype=Nag_TestStatisticsDPos
Computes $D_{n_1,n_2}^{+}$, to test against $H_2$.
dtype=Nag_TestStatisticsDNeg
Computes $D_{n_1,n_2}^{-}$, to test against $H_3$.
Constraint: dtype=Nag_TestStatisticsDAbs, Nag_TestStatisticsDPos or Nag_TestStatisticsDNeg.
6: d double * Output
On exit: the Kolmogorov–Smirnov test statistic ($D_{n_1,n_2}$, $D_{n_1,n_2}^{+}$ or $D_{n_1,n_2}^{-}$ according to the value of dtype).
7: z double * Output
On exit: a standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.
8: p double * Output
On exit: the tail probability associated with the observed value of $D$, where $D$ may be $D_{n_1,n_2}$, $D_{n_1,n_2}^{+}$ or $D_{n_1,n_2}^{-}$ depending on the value of dtype (see Section 3).
9: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument dtype had an illegal value.
NE_G08CD_CONV
The iterative procedure used in the approximation of the probability for large n1 and n2 did not converge. For the two-sided test, p=1.0 is returned. For the one-sided test, p=0.5 is returned.
NE_INT_ARG_LT
On entry, n1 must not be less than 1: n1 = <value>.
On entry, n2 must not be less than 1: n2 = <value>.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

7 Accuracy

The large sample distributions used as approximations to the exact distribution should have a relative error of less than 5% for most cases.

8 Parallelism and Performance

g08cdc is not threaded in any implementation.

9 Further Comments

The time taken by g08cdc increases with $n_1$ and $n_2$ until $n_1 n_2 > 10000$ or $\max(n_1, n_2) > 2500$. At this point one of the approximations is used and the time decreases significantly. The time then increases again modestly with $n_1$ and $n_2$.

10 Example

The following example computes the two-sided Kolmogorov–Smirnov test statistic for two independent samples of size 100 and 50 respectively. The first sample is from a uniform distribution $U(0, 2)$. The second sample is from a uniform distribution $U(0.25, 2.25)$. The test statistic, $D_{n_1,n_2}$, the standardized test statistic, $Z$, and the tail probability, $p$, are computed and printed.

10.1 Program Text

Program Text (g08cdce.c)

10.2 Program Data

Program Data (g08cdce.d)

10.3 Program Results

Program Results (g08cdce.r)