naginterfaces.library.nonpar.test_ks_1sample_user
- naginterfaces.library.nonpar.test_ks_1sample_user(x, cdf, ntype, data=None)
test_ks_1sample_user performs the one sample Kolmogorov–Smirnov distribution test, using a user-specified distribution.
For full information please refer to the NAG Library document for g08cc
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g08/g08ccf.html
- Parameters
- x : float, array-like, shape (n)
The sample observations, x_1, x_2, …, x_n.
- cdf : callable retval = cdf(x, data=None)
cdf must return the value of the theoretical (null) cumulative distribution function F_0(x) for a given value of its argument x.
- Parameters
- x : float
The argument x for which F_0(x) must be evaluated.
- data : arbitrary, optional, modifiable in place
User-communication data for callback functions.
- Returns
- retval : float
The value of the theoretical (null) cumulative distribution function F_0(x) evaluated at x.
- ntype : int
The statistic to be calculated, i.e., the choice of alternative hypothesis.
ntype = 1: computes D_n, to test H_0 against H_1.
ntype = 2: computes D_n^+, to test H_0 against H_2.
ntype = 3: computes D_n^-, to test H_0 against H_3.
- data : arbitrary, optional
User-communication data for callback functions.
- Returns
- d : float
The Kolmogorov–Smirnov test statistic (D_n, D_n^+ or D_n^-, according to the value of ntype).
- z : float
A standardized value, Z, of the test statistic, D, without the continuity correction applied.
- p : float
The probability, p, associated with the observed value of D, where D may be D_n, D_n^+ or D_n^- depending on the value of ntype (see Notes).
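The following minimal usage sketch (not taken from the NAG example program) tests a small sample against a user-written exponential null distribution, with the rate passed to the callback through data; the sample values, the choice of null distribution and the dict passed via data are assumptions made purely for the illustration.

import math

from naginterfaces.library import nonpar

def cdf(x, data=None):
    """Theoretical (null) CDF: exponential with rate taken from 'data'."""
    rate = data['rate']  # illustrative user-communication data
    return 0.0 if x <= 0.0 else 1.0 - math.exp(-rate * x)

# Illustrative sample observations x_1, ..., x_n.
x = [0.01, 0.13, 0.23, 0.33, 0.46, 0.52, 0.81, 0.97, 1.43, 2.04]

# ntype = 1 requests the two-sided statistic D_n (H_0 against H_1).
d, z, p = nonpar.test_ks_1sample_user(x, cdf, 1, data={'rate': 2.0})
print('D = {:.4f}, Z = {:.4f}, p = {:.4f}'.format(d, z, p))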
- Raises
- NagValueError
- (errno 1)
On entry, n = ⟨value⟩.
Constraint: n ≥ 1.
- (errno 2)
On entry, ntype = ⟨value⟩.
Constraint: ntype = 1, 2 or 3.
- (errno 3)
On entry, at x = ⟨value⟩, cdf(x) = ⟨value⟩.
Constraint: 0.0 ≤ cdf(x) ≤ 1.0.
- (errno 4)
On entry, at x = ⟨value⟩, cdf(x) = ⟨value⟩ and at y = ⟨value⟩, cdf(y) = ⟨value⟩.
Constraint: when x < y, cdf(x) ≤ cdf(y).
- Notes
No equivalent traditional C interface for this routine exists in the NAG Library.
The data consists of a single sample of n observations, denoted by x_1, x_2, …, x_n. Let S_n(x_(i)) and F_0(x_(i)) represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point x_(i), where x_(i) is the ith smallest sample observation.
The Kolmogorov–Smirnov test provides a test of the null hypothesis H_0: the data are a random sample of observations from a theoretical distribution specified by you (in cdf), against one of the following alternative hypotheses.
H_1: the data cannot be considered to be a random sample from the specified null distribution.
H_2: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function S_n(x) tended to exceed the corresponding values of the theoretical cumulative distribution function F_0(x).
H_3: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function F_0(x) tended to exceed the corresponding values of the sample cumulative distribution function S_n(x).
One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the argument ntype in Parameters); an illustrative computation of these statistics is sketched after their definitions below.
For the alternative hypothesis H_1:
D_n – the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally D_n = max{D_n^+, D_n^-}.
For the alternative hypothesis H_2:
D_n^+ – the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally D_n^+ = max{S_n(x) − F_0(x), 0}.
For the alternative hypothesis H_3:
D_n^- – the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally D_n^- = max{F_0(x) − S_n(x), 0}. This is only true for continuous distributions. See Further Comments for comments on discrete distributions.
The standardized statistic, Z = √n × D, is also computed, where D may be D_n, D_n^+ or D_n^- depending on the choice of the alternative hypothesis. This is the standardized value of D with no continuity correction applied and the distribution of Z converges asymptotically to a limiting distribution, first derived by Kolmogorov (1933), and then tabulated by Smirnov (1948). The asymptotic distributions for the one-sided statistics were obtained by Smirnov (1933).
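To make the definitions above concrete, the sketch below evaluates S_n at the order statistics and forms D_n^+, D_n^-, D_n and Z = √n × D in plain Python for a continuous F_0. It illustrates the formulas only and is not how the library computes the statistics internally; the standard-normal null CDF and the sample values are assumptions made for the illustration.

import math

def ks_statistics(x, f0):
    """Illustrative computation of D_n^+, D_n^-, D_n and Z for a continuous F_0."""
    xs = sorted(x)
    n = len(xs)
    # S_n steps from (i - 1)/n to i/n at the ith order statistic x_(i).
    d_plus = max(max((i + 1) / n - f0(v) for i, v in enumerate(xs)), 0.0)
    d_minus = max(max(f0(v) - i / n for i, v in enumerate(xs)), 0.0)
    d_n = max(d_plus, d_minus)
    z = math.sqrt(n) * d_n  # standardized statistic for the two-sided case
    return d_plus, d_minus, d_n, z

def std_normal_cdf(v):
    """Standard normal CDF via the error function (illustrative null CDF)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

print(ks_statistics([-0.3, 0.1, 0.4, 1.2, 2.1], std_normal_cdf))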
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed. If n ≤ 100, an exact method given by Conover (1980) is used. Note that the method used is only exact for continuous theoretical distributions and does not include Conover’s modification for discrete distributions. This method computes the one-sided probabilities. The two-sided probabilities are estimated by doubling the one-sided probability. This is a good estimate for small p, that is p ≤ 0.10, but it becomes very poor for larger p. If n > 100 then p is computed using the Kolmogorov–Smirnov limiting distributions; see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933), Smirnov (1933) and Smirnov (1948).
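As an illustration of the limiting distributions referred to above (and not of the exact Conover method applied for small samples), the following sketch evaluates the standard asymptotic tail probabilities for the standardized statistic Z; the truncation of the series at a fixed number of terms is an assumption of this sketch.

import math

def ks_asymptotic_p(z, two_sided=True, terms=100):
    """Asymptotic tail probability of the standardized statistic Z = sqrt(n) * D.

    Two-sided: p ~ 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * z**2)
    One-sided: p ~ exp(-2 * z**2)
    """
    if z <= 0.0:
        return 1.0
    if not two_sided:
        return math.exp(-2.0 * z * z)
    total = sum((-1.0) ** (k - 1) * math.exp(-2.0 * (k * z) ** 2)
                for k in range(1, terms + 1))
    return min(max(2.0 * total, 0.0), 1.0)

# e.g. D_n = 0.15 observed for n = 200 (illustrative numbers only)
print(ks_asymptotic_p(math.sqrt(200) * 0.15))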
- References
Conover, W J, 1980, Practical Nonparametric Statistics, Wiley
Feller, W, 1948, On the Kolmogorov–Smirnov limit theorems for empirical distributions, Ann. Math. Statist. (19), 179–181
Kendall, M G and Stuart, A, 1973, The Advanced Theory of Statistics (Volume 2), (3rd Edition), Griffin
Kolmogorov, A N, 1933, Sulla determinazione empirica di una legge di distribuzione, Giornale dell’ Istituto Italiano degli Attuari (4), 83–91
Siegel, S, 1956, Non-parametric Statistics for the Behavioral Sciences, McGraw–Hill
Smirnov, N, 1933, Estimate of deviation between empirical distribution functions in two independent samples, Bull. Moscow Univ. (2(2)), 3–16
Smirnov, N, 1948, Table for estimating the goodness of fit of empirical distributions, Ann. Math. Statist. (19), 279–281