naginterfaces.library.nonpar.test_ks_1sample

naginterfaces.library.nonpar.test_ks_1sample(x, dist, par, estima, ntype)

test_ks_1sample performs the one-sample Kolmogorov–Smirnov test, using one of the distributions provided.

For full information please refer to the NAG Library document for g08cb:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g08/g08cbf.html

Parameters

x : float, array-like, shape $(n)$

The sample observations $x_{1}, x_{2}, \dots, x_{n}$.

dist : str

The theoretical (null) distribution from which it is suspected the data may arise.

$\mathrm{dist} = \text{'U'}$

The uniform distribution over $(a, b)$.

$\mathrm{dist} = \text{'N'}$

The Normal distribution with mean $μ$ and variance $σ^{2}$.

$\mathrm{dist} = \text{'G'}$

The gamma distribution with shape parameter $α$ and scale parameter $β$, where the mean $= α β$.

$\mathrm{dist} = \text{'BE'}$

The beta distribution with shape parameters $α$ and $β$, where the mean $= α / (α + β)$.

$\mathrm{dist} = \text{'BI'}$

The binomial distribution with the number of trials, $m$, and the probability of a success, $p$.

$\mathrm{dist} = \text{'E'}$

The exponential distribution with parameter $λ$, where the mean $= 1 / λ$.

$\mathrm{dist} = \text{'P'}$

The Poisson distribution with parameter $μ$, where the mean $= μ$.

$\mathrm{dist} = \text{'NB'}$

The negative binomial distribution with the number of trials, $m$, and the probability of success, $p$.

$\mathrm{dist} = \text{'GP'}$

The generalized Pareto distribution with shape parameter $ξ$ and scale $β$.

Any number of characters may be supplied as the actual parameter; however, only the characters (at most $2$) required to uniquely identify the distribution are referenced.

par : float, array-like, shape $(2)$

If $\mathrm{estima} = \text{'S'}$, $\mathrm{par}$ must contain the known values of the parameter(s) of the null distribution as follows.

If a uniform distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the boundaries $a$ and $b$ respectively.

If a Normal distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the mean, $μ$, and the variance, $σ^{2}$, respectively.

If a gamma distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the parameters $α$ and $β$ respectively.

If a beta distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the parameters $α$ and $β$ respectively.

If a binomial distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the parameters $m$ and $p$ respectively.

If an exponential distribution is used, $\mathrm{par}[0]$ must contain the parameter $λ$.

If a Poisson distribution is used, $\mathrm{par}[0]$ must contain the parameter $μ$.

If a negative binomial distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the parameters $m$ and $p$ respectively.

If a generalized Pareto distribution is used, $\mathrm{par}[0]$ and $\mathrm{par}[1]$ must contain the parameters $ξ$ and $β$ respectively.

If $\mathrm{estima} = \text{'E'}$, $\mathrm{par}$ need not be set, except when the null distribution requested is either the binomial or the negative binomial distribution, in which case $\mathrm{par}[0]$ must contain the parameter $m$.

estima : str, length 1

$\mathrm{estima}$ must specify whether values of the parameters of the null distribution are known or are to be estimated from the data.

$\mathrm{estima} = \text{'S'}$

Values of the parameters will be supplied in the array $\mathrm{par}$ described above.

$\mathrm{estima} = \text{'E'}$

Parameters are to be estimated from the data, except when the null distribution requested is the binomial distribution or the negative binomial distribution, in which case the first parameter, $m$, must be supplied in $\mathrm{par}[0]$ and only the second parameter, $p$, is estimated from the data.

ntype : int

The test statistic to be calculated, i.e., the choice of alternative hypothesis.

$\mathrm{ntype} = 1$

Computes $D_{n}$, to test $H_{0}$ against $H_{1}$.

$\mathrm{ntype} = 2$

Computes $D_{n}^{+}$, to test $H_{0}$ against $H_{2}$.

$\mathrm{ntype} = 3$

Computes $D_{n}^{-}$, to test $H_{0}$ against $H_{3}$.

Returns

par : float, ndarray, shape $(2)$: If $\mathrm{estima} = \text{'S'}$, $\mathrm{par}$ is unchanged; if $\mathrm{estima} = \text{'E'}$ and $\mathrm{dist} = \text{'BI'}$ or $\mathrm{dist} = \text{'NB'}$, then $\mathrm{par}[1]$ is estimated from the data; otherwise $\mathrm{par}[0]$ and $\mathrm{par}[1]$ are estimated from the data.
d : float: The Kolmogorov–Smirnov test statistic ($D_{n}$, $D_{n}^{+}$ or $D_{n}^{-}$ according to the value of $\mathrm{ntype}$).
z : float: A standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.
p : float: The probability, $p$, associated with the observed value of $D$, where $D$ may be $D_{n}$, $D_{n}^{+}$ or $D_{n}^{-}$ depending on the value of $\mathrm{ntype}$ (see Notes).
sx : float, ndarray, shape $(n)$: The sample observations, $x_{1}, x_{2}, \dots, x_{n}$, sorted in ascending order.
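
As a rough illustration of how the arguments and return values above fit together, the following sketch wraps a call for a Normal null distribution with known parameters. The wrapper name `ks_test_normal_known` is hypothetical, and the sketch assumes that the five results can be unpacked in the order documented under Returns. The import is done inside the function so that the definition loads even where the NAG Library is not installed or licensed.

```python
def ks_test_normal_known(x, mean, variance, ntype=1):
    """Hypothetical wrapper: one-sample KS test against N(mean, variance).

    Uses dist='N' with estima='S' (parameters supplied, not estimated).
    """
    # Lazy import: requires an installed and licensed naginterfaces package.
    from naginterfaces.library import nonpar

    # For dist='N', par[0] is the mean and par[1] is the variance.
    par = [float(mean), float(variance)]

    # Assumption: results unpack in the documented order par, d, z, p, sx.
    par_out, d, z, p, sx = nonpar.test_ks_1sample(x, 'N', par, 'S', ntype)
    return {'par': par_out, 'd': d, 'z': z, 'p': p, 'sx': sx}
```

A call such as `ks_test_normal_known(sample, 0.0, 1.0, ntype=1)` would then request the two-sided statistic $D_{n}$; the sample must contain at least three observations (see errno $1$ under Raises).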

Raises

NagValueError

(errno $1$)

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 3$.

(errno $2$)

On entry, $\mathrm{dist} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{dist} = \text{'U'}, \text{'N'}, \text{'G'}, \text{'BE'}, \text{'BI'}, \text{'E'}, \text{'P'}, \text{'NB'}$ or $\text{'GP'}$.

(errno $3$)

On entry, $\mathrm{ntype} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{ntype} = 1$, $2$ or $3$.

(errno $4$)

On entry, $\mathrm{estima} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{estima} = \text{'E'}$ or $\text{'S'}$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[0] = \langle\mathit{value}\rangle$; $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the generalized Pareto distribution with $\mathrm{par}[0] < 0$, $0 \leq x[i - 1] \leq -\mathrm{par}[1] / \mathrm{par}[0]$, for $i = 1, 2, \dots, n$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the generalized Pareto distribution, $\mathrm{par}[1] > 0$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the negative binomial distribution, $0 < \mathrm{par}[1] < 1$.

(errno $5$)

On entry, $\mathrm{dist} = \text{'NB'}$ and $m = \mathrm{par}[0] = \langle\mathit{value}\rangle$.

Note that $m$ must always be supplied.

Constraint: for the negative binomial distribution, $1 \leq \mathrm{par}[0] < 1 / \mathrm{eps}$, where $\mathrm{eps}$ is the machine precision; see machine.precision.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[0] = \langle\mathit{value}\rangle$.

Constraint: for the Poisson distribution, $0 < \mathrm{par}[0] < 1000000$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[0] = \langle\mathit{value}\rangle$.

Constraint: for the exponential distribution, $\mathrm{par}[0] > 0$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the binomial distribution, $0 < \mathrm{par}[1] < 1$.

(errno $5$)

On entry, $\mathrm{dist} = \text{'BI'}$ and $m = \mathrm{par}[0] = \langle\mathit{value}\rangle$.

Note that $m$ must always be supplied.

Constraint: for the binomial distribution, $1 \leq \mathrm{par}[0] < 1 / \mathrm{eps}$, where $\mathrm{eps}$ is the machine precision; see machine.precision.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[0] = \langle\mathit{value}\rangle$; $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the beta distribution, $0 < \mathrm{par}[0]$ and $\mathrm{par}[1] \leq 1000000$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[0] = \langle\mathit{value}\rangle$; $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the gamma distribution, $\mathrm{par}[0]$ and $\mathrm{par}[1] > 0$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the Normal distribution, $\mathrm{par}[1] > 0$.

(errno $5$)

On entry, $\mathrm{estima} = \text{'S'}$ and $\mathrm{par}[0] = \langle\mathit{value}\rangle$; $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

Constraint: for the uniform distribution, $\mathrm{par}[0] < \mathrm{par}[1]$.

(errno $6$)

On entry, $\mathrm{dist} = \text{'GP'}$ and $\mathrm{estima} = \text{'E'}$.

The parameter estimates are invalid; the data may not be from the generalized Pareto distribution.

(errno $6$)

On entry, $\mathrm{dist} = \text{'E'}$ or $\text{'P'}$ and all observations are zero.

Constraint: at least one $x[i - 1] > 0$, for $i = 1, 2, \dots, n$.

(errno $6$)

On entry, $\mathrm{dist} = \text{'BI'}$ and all observations are zero or $m$.

Constraint: at least one $0.0 < x[i - 1] < \mathrm{par}[0]$, for $i = 1, 2, \dots, n$.

(errno $6$)

On entry, $\mathrm{dist} = \text{'G'}, \text{'E'}, \text{'P'}, \text{'NB'}$ or $\text{'GP'}$ and at least one observation is negative.

Constraint: $x[i - 1] \geq 0$, for $i = 1, 2, \dots, n$.

(errno $6$)

On entry, $\mathrm{dist} = \text{'BI'}$ and at least one observation is illegal.

Constraint: $0 \leq x[i - 1] \leq \mathrm{par}[0]$, for $i = 1, 2, \dots, n$.

(errno $6$)

On entry, $\mathrm{dist} = \text{'BE'}$ and at least one observation is illegal.

Constraint: $0 \leq x[i - 1] \leq 1$, for $i = 1, 2, \dots, n$.

(errno $6$)

On entry, $\mathrm{dist} = \text{'U'}$ and at least one observation is illegal.

Constraint: $\mathrm{par}[0] \leq x[i - 1] \leq \mathrm{par}[1]$, for $i = 1, 2, \dots, n$.

(errno $7$)

On entry, $\mathrm{dist} = \text{'U'}, \text{'N'}, \text{'G'}, \text{'BE'}$ or $\text{'GP'}$, $\mathrm{estima} = \text{'E'}$ and the whole sample is constant. Thus the variance is zero.

(errno $8$)

On entry, $\mathrm{dist} = \text{'NB'}$, $\mathrm{par}[0] = \langle\mathit{value}\rangle$, $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

The variance $\mathrm{par}[0] \times (1 - \mathrm{par}[1]) / (\mathrm{par}[1] \times \mathrm{par}[1])$ exceeds $1000000$.

(errno $8$)

On entry, $\mathrm{dist} = \text{'BI'}$, $\mathrm{par}[0] = \langle\mathit{value}\rangle$, $\mathrm{par}[1] = \langle\mathit{value}\rangle$.

The variance $\mathrm{par}[0] \times \mathrm{par}[1] \times (1 - \mathrm{par}[1])$ exceeds $1000000$.

(errno $9$)

On entry, $\mathrm{dist} = \text{'G'}$ and, in the computation of the incomplete gamma function by specfun.gamma_incomplete, the convergence of the Taylor series or Legendre continued fraction fails within $600$ iterations.

Notes

The data consist of a single sample of $n$ observations denoted by $x_{1}, x_{2}, \dots, x_{n}$. Let $S_{n}(x_{(i)})$ and $F_{0}(x_{(i)})$ represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point $x_{(i)}$, where $x_{(i)}$ is the $i$th smallest sample observation.

The Kolmogorov–Smirnov test provides a test of the null hypothesis $H_{0}$ : the data are a random sample of observations from a theoretical distribution specified by you against one of the following alternative hypotheses:

$H_{1}$ : the data cannot be considered to be a random sample from the specified null distribution.
$H_{2}$: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function $S_{n}(x)$ tended to exceed the corresponding values of the theoretical cumulative distribution function $F_{0}(x)$.
$H_{3}$: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function $F_{0}(x)$ tended to exceed the corresponding values of the sample cumulative distribution function $S_{n}(x)$.

One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the argument $\mathrm{ntype}$ in Parameters).

For the alternative hypothesis $H_{1}$:

$D_{n}$ – the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally $D_{n} = \max \{D_{n}^{+}, D_{n}^{-}\}$.

For the alternative hypothesis $H_{2}$:

$D_{n}^{+}$ – the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally $D_{n}^{+} = \max \{S_{n}(x_{(i)}) - F_{0}(x_{(i)}), 0\}$ for both discrete and continuous null distributions.

For the alternative hypothesis $H_{3}$:

$D_{n}^{-}$ – the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally, if the null distribution is discrete then $D_{n}^{-} = \max \{F_{0}(x_{(i)}) - S_{n}(x_{(i)}), 0\}$, and if the null distribution is continuous then $D_{n}^{-} = \max \{F_{0}(x_{(i)}) - S_{n}(x_{(i - 1)}), 0\}$.

The standardized statistic $Z = D \times \sqrt{n}$ is also computed where $D$ may be $D_{n}, D_{n}^{+}$ or $D_{n}^{-}$ depending on the choice of the alternative hypothesis. This is the standardized value of $D$ with no correction for continuity applied and the distribution of $Z$ converges asymptotically to a limiting distribution, first derived by Kolmogorov (1933), and then tabulated by Smirnov (1948). The asymptotic distributions for the one-sided statistics were obtained by Smirnov (1933).
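
The definitions above can be sketched in a few lines for a continuous null distribution. This is an illustrative pure-Python sketch, not the library's implementation; the names `ks_statistics` and `uniform01_cdf` are hypothetical, and the null cumulative distribution function is passed in as an ordinary callable.

```python
import math


def uniform01_cdf(x):
    """CDF of the uniform distribution over (0, 1), used only as an example."""
    return min(1.0, max(0.0, x))


def ks_statistics(sample, cdf):
    """Return (d_n, d_plus, d_minus, z) for a continuous null CDF.

    z is the standardized two-sided statistic D_n * sqrt(n).
    """
    xs = sorted(sample)
    n = len(xs)
    d_plus = 0.0
    d_minus = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # S_n(x_(i)) = i / n, so this is S_n(x_(i)) - F_0(x_(i)).
        d_plus = max(d_plus, i / n - f0)
        # S_n(x_(i-1)) = (i - 1) / n, so this is F_0(x_(i)) - S_n(x_(i-1)).
        d_minus = max(d_minus, f0 - (i - 1) / n)
    d_n = max(d_plus, d_minus)
    return d_n, d_plus, d_minus, d_n * math.sqrt(n)


d_n, d_plus, d_minus, z = ks_statistics([0.7, 0.1, 0.4], uniform01_cdf)
```

For this three-point sample the largest gap is at the top observation, where $S_{n} = 1$ and $F_{0} = 0.7$, so $D_{n}^{+} = D_{n} = 0.3$ and $D_{n}^{-} = 0.1$.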

The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed is computed. If $n \leq 100$, an exact method given by Conover (1980) is used. Note that the method used is only exact for continuous theoretical distributions and does not include Conover’s modification for discrete distributions. This method computes the one-sided probabilities. The two-sided probabilities are estimated by doubling the one-sided probability. This is a good estimate for small $p$, that is $p \leq 0.10$, but it becomes very poor for larger $p$. If $n > 100$ then $p$ is computed using the Kolmogorov–Smirnov limiting distributions; see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933), Smirnov (1933) and Smirnov (1948).
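
For the large-sample case, a minimal sketch of the classical limiting tail probabilities is given below. It assumes the standard asymptotic results: $\exp(-2 z^{2})$ for the one-sided statistics and the alternating series $2 \sum_{k \geq 1} (-1)^{k-1} \exp(-2 k^{2} z^{2})$ for the two-sided statistic. Whether the library applies any further refinement is not stated here, so values may differ from those returned in `p`; the function name `ks_asymptotic_pvalue` is hypothetical.

```python
import math


def ks_asymptotic_pvalue(z, two_sided=True, max_terms=100, tol=1e-16):
    """Limiting tail probability for the standardized KS statistic z."""
    if z <= 0.0:
        return 1.0
    if not two_sided:
        # One-sided limit: P(sqrt(n) * D_n^+ > z) -> exp(-2 z^2).
        return math.exp(-2.0 * z * z)
    if z < 0.2:
        # The series converges slowly here; the tail differs from 1
        # by far less than 1e-10 in this range.
        return 1.0
    total = 0.0
    sign = 1.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * (k * z) ** 2)
        total += sign * term
        if term < tol:
            break
        sign = -sign
    # Clamp to [0, 1] to guard against rounding.
    return min(1.0, max(0.0, 2.0 * total))


p_two = ks_asymptotic_pvalue(1.358)
p_one = ks_asymptotic_pvalue(1.358, two_sided=False)
```

At $z = 1.358$ the two-sided value is close to $0.05$ and is almost exactly twice the one-sided value, which is consistent with the remark above that doubling works well for small $p$.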

References

Conover, W J, 1980, Practical Nonparametric Statistics, Wiley

Feller, W, 1948, On the Kolmogorov–Smirnov limit theorems for empirical distributions, Ann. Math. Statist. (19), 179–181

Kendall, M G and Stuart, A, 1973, The Advanced Theory of Statistics (Volume 2), (3rd Edition), Griffin

Kolmogorov, A N, 1933, Sulla determinazione empirica di una legge di distribuzione, Giornale dell’ Istituto Italiano degli Attuari (4), 83–91

Siegel, S, 1956, Non-parametric Statistics for the Behavioral Sciences, McGraw–Hill

Smirnov, N, 1933, Estimate of deviation between empirical distribution functions in two independent samples, Bull. Moscow Univ. (2(2)), 3–16

Smirnov, N, 1948, Table for estimating the goodness of fit of empirical distributions, Ann. Math. Statist. (19), 279–281
