naginterfaces.library.nonpar.test_ks_2sample

naginterfaces.library.nonpar.test_ks_2sample(x, y, ntype)

test_ks_2sample performs the two-sample Kolmogorov–Smirnov distribution test.

For full information please refer to the NAG Library document for g08cd

https://support.nag.com/numeric/nl/nagdoc_30/flhtml/g08/g08cdf.html

Parameters
x : float, array-like, shape (n)

The observations from the first sample, x_1, x_2, ..., x_n.

y : float, array-like, shape (m)

The observations from the second sample, y_1, y_2, ..., y_m.

ntype : int

The statistic to be computed, i.e., the choice of alternative hypothesis.

ntype = 1: computes D_{nm}, to test H_0 against H_1.

ntype = 2: computes D_{nm}^+, to test H_0 against H_2.

ntype = 3: computes D_{nm}^-, to test H_0 against H_3.

Returns
d : float

The Kolmogorov–Smirnov test statistic (D_{nm}, D_{nm}^+ or D_{nm}^- according to the value of ntype).

z : float

A standardized value, Z, of the test statistic, D, without any correction for continuity.

p : float

The tail probability associated with the observed value of D, where D may be D_{nm}, D_{nm}^+ or D_{nm}^- depending on the value of ntype (see Notes).

sx : float, ndarray, shape (n)

The observations from the first sample sorted in ascending order.

sy : float, ndarray, shape (m)

The observations from the second sample sorted in ascending order.

Raises
NagValueError
(errno 1)

On entry, n = ⟨value⟩.

Constraint: n ≥ 1.

(errno 2)

On entry, m = ⟨value⟩.

Constraint: m ≥ 1.

(errno 3)

On entry, ntype = ⟨value⟩.

Constraint: ntype = 1, 2 or 3.

Warns
NagAlgorithmicWarning
(errno 4)

The iterative process used in the approximation of the probability for large n and m did not converge. For the two-sided test, p = 1 is returned. For the one-sided test, p = 0.5 is returned.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

The data consists of two independent samples, one of size n, denoted by x_1, x_2, ..., x_n, and the other of size m, denoted by y_1, y_2, ..., y_m. Let F(x) and G(x) represent their respective, unknown, distribution functions. Also let S_1(x) and S_2(x) denote the values of the sample cumulative distribution functions at the point x for the two samples respectively.

The Kolmogorov–Smirnov test provides a test of the null hypothesis H_0: F(x) = G(x) against one of the following alternative hypotheses:

  1. H_1: F(x) ≠ G(x).

  2. H_2: F(x) > G(x). This alternative hypothesis is sometimes stated as, ‘The x’s tend to be smaller than the y’s’, i.e., it would be demonstrated in practical terms if the values of S_1(x) tended to exceed the corresponding values of S_2(x).

  3. H_3: F(x) < G(x). This alternative hypothesis is sometimes stated as, ‘The x’s tend to be larger than the y’s’, i.e., it would be demonstrated in practical terms if the values of S_2(x) tended to exceed the corresponding values of S_1(x).

One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the argument ntype in Parameters).

For the alternative hypothesis H_1: F(x) ≠ G(x).

D_{nm} – the largest absolute deviation between the two sample cumulative distribution functions.

For the alternative hypothesis H_2: F(x) > G(x).

D_{nm}^+ – the largest positive deviation between the sample cumulative distribution function of the first sample, S_1(x), and the sample cumulative distribution function of the second sample, S_2(x). Formally, D_{nm}^+ = max{S_1(x) − S_2(x), 0} over all x.

For the alternative hypothesis H_3: F(x) < G(x).

D_{nm}^- – the largest positive deviation between the sample cumulative distribution function of the second sample, S_2(x), and the sample cumulative distribution function of the first sample, S_1(x). Formally, D_{nm}^- = max{S_2(x) − S_1(x), 0} over all x.

test_ks_2sample also returns the standardized statistic Z = sqrt(nm / (n + m)) × D, where D may be D_{nm}, D_{nm}^+ or D_{nm}^- depending on the choice of the alternative hypothesis. The distribution of this statistic converges asymptotically to a distribution given by Smirnov as n and m increase; see Feller (1948), Kendall and Stuart (1973), Kim and Jenrich (1973), Smirnov (1933) and Smirnov (1948).
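As an illustration of the definitions above, the deviation statistics and the standardized value Z can be computed directly from the two sample cumulative distribution functions. The sketch below is a plain-Python reimplementation of those formulas, not the NAG routine; the function names are purely illustrative, and the tail probability is omitted:

```python
import math

def ecdf(sample, t):
    """Sample cumulative distribution function of `sample`, evaluated at t."""
    return sum(1 for v in sample if v <= t) / len(sample)

def ks_2sample_stats(x, y):
    """Compute D_{nm}, D_{nm}^+, D_{nm}^- and Z from their definitions.

    Illustrative sketch only; the NAG routine additionally returns the
    tail probability p and the sorted samples sx, sy."""
    n, m = len(x), len(y)
    # The extreme deviations between two step functions occur at observed
    # points, so it suffices to examine the union of the two samples.
    points = sorted(set(x) | set(y))
    diffs = [ecdf(x, t) - ecdf(y, t) for t in points]
    d_plus = max(max(diffs), 0.0)                  # largest positive S_1(x) - S_2(x)
    d_minus = max(max(-dv for dv in diffs), 0.0)   # largest positive S_2(x) - S_1(x)
    d = max(d_plus, d_minus)                       # largest absolute deviation
    z = math.sqrt(n * m / (n + m)) * d             # standardized statistic Z
    return d, d_plus, d_minus, z
```

For example, with x = [1, 2, 3, 4] and y = [3, 4, 5, 6] the first sample's step function lies above the second's, so D_{nm}^+ = D_{nm} = 0.5 and D_{nm}^- = 0.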

The probability, p, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed. If max(n, m) ≤ 2500 and nm ≤ 10000 then an exact method given by Kim and Jenrich (see Kim and Jenrich (1973)) is used. Otherwise p is computed using the approximations suggested by Kim and Jenrich (1973). Note that the method used is only exact for continuous theoretical distributions. This method computes the two-sided probability. The one-sided probabilities are estimated by halving the two-sided probability. This is a good estimate for small p, that is p ≤ 0.10, but it becomes very poor for larger p.
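For large n and m the two-sided probability can also be approximated from Smirnov's limiting distribution of Z, using the classical series Q(z) = 2 Σ_{k≥1} (−1)^{k−1} exp(−2 k² z²). The sketch below evaluates that series and halves it for a one-sided estimate, as described above; it illustrates the limiting behaviour only and is not the Kim and Jenrich method the routine actually uses:

```python
import math

def smirnov_two_sided_p(z, terms=100):
    # Asymptotic (large-sample) two-sided tail probability for the
    # standardized statistic Z:
    #   Q(z) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * z^2)
    if z <= 0.0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

def smirnov_one_sided_p(z, terms=100):
    # One-sided estimate obtained by halving the two-sided probability;
    # a good approximation only when the result is small (<= 0.10).
    return smirnov_two_sided_p(z, terms) / 2.0
```

At the familiar 5% critical point z ≈ 1.358 this series gives a two-sided probability close to 0.05, which is why that value appears in standard Kolmogorov–Smirnov tables.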

References

Conover, W J, 1980, Practical Nonparametric Statistics, Wiley

Feller, W, 1948, On the Kolmogorov–Smirnov limit theorems for empirical distributions, Ann. Math. Statist. (19), 179–181

Kendall, M G and Stuart, A, 1973, The Advanced Theory of Statistics (Volume 2), (3rd Edition), Griffin

Kim, P J and Jenrich, R I, 1973, Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion , Selected Tables in Mathematical Statistics (1), 80–129, American Mathematical Society

Siegel, S, 1956, Non-parametric Statistics for the Behavioral Sciences, McGraw–Hill

Smirnov, N, 1933, Estimate of deviation between empirical distribution functions in two independent samples, Bull. Moscow Univ. (2(2)), 3–16

Smirnov, N, 1948, Table for estimating the goodness of fit of empirical distributions, Ann. Math. Statist. (19), 279–281