# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 30.1 `nonpar` Chapter.
``nonpar`` - Nonparametric Statistics
The functions in this module perform nonparametric statistical tests which are based on distribution-free methods of analysis.
For convenience, the module contents are divided into five types of test: tests of location, tests of dispersion, tests of distribution, tests of association and correlation, and tests of randomness.
There are also functions to fit linear regression models using the ranks of the observations.
The emphasis in this module is on testing; if you wish to compute nonparametric correlations you are referred to submodule :mod:`~naginterfaces.library.correg`, which contains several functions for that purpose.
There are a large number of nonparametric tests available.
A selection of some of the more commonly used tests are included in this module.
Functionality Index
-------------------
**Regression using ranks**
right-censored data: :meth:`rank_regsn_censored`
uncensored data: :meth:`rank_regsn`
**Tests of association and correlation**
Kendall's coefficient of concordance: :meth:`concordance_kendall`
**Tests of dispersion**
Mood's and David's tests on two samples of unequal size: :meth:`test_mooddavid`
**Tests of fit**
:math:`\chi^2` goodness-of-fit test: :meth:`test_chisq`
:math:`A^2` and its probability of for a fully-unspecified Normal distribution: :meth:`gofstat_anddar_normal`
:math:`A^2` and its probability of for an unspecified exponential distribution: :meth:`gofstat_anddar_exp`
:math:`A^2` and its probability of for uniformly distributed data: :meth:`gofstat_anddar_unif`
Anderson--Darling test statistic :math:`A^2`: :meth:`gofstat_anddar`
Kolmogorov--Smirnov one-sample distribution test
for a user-supplied distribution: :meth:`test_ks_1sample_user`
for standard distributions: :meth:`test_ks_1sample`
Kolmogorov--Smirnov two-sample distribution test: :meth:`test_ks_2sample`
**Tests of location**
Cochran :math:`Q` test on cross-classified binary data: :meth:`test_cochranq`
exact probabilities for Mann--Whitney :math:`U` statistic
no ties in pooled sample: :meth:`prob_mwu_noties`
ties in pooled sample: :meth:`prob_mwu_ties`
Friedman two-way analysis of variance on :math:`k` matched samples: :meth:`test_friedman`
Kruskal--Wallis one-way analysis of variance on :math:`k` samples of unequal size: :meth:`test_kruskal`
Mann--Whitney :math:`U` test on two samples of possibly unequal size: :meth:`test_mwu`
Median test on two samples of unequal size: :meth:`test_median`
sign test on two paired samples: :meth:`test_sign`
Wilcoxon one sample signed rank test: :meth:`test_wilcoxon`
**Tests of randomness**
Gaps test: :meth:`randtest_gaps`
pairs (serial) test: :meth:`randtest_pairs`
runs up or runs down test: :meth:`randtest_runs`
triplets test: :meth:`randtest_triplets`
For full information please refer to the NAG Library document
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08intro.html
"""
# NAG Copyright 2017-2024.
[docs]def test_sign(x, y):
r"""
``test_sign`` performs the Sign test on two related samples of size :math:`n`.
.. _g08aa-py2-py-doc:
For full information please refer to the NAG Library document for g08aa
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08aaf.html
.. _g08aa-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(n\right)`
:math:`\mathrm{x}[\textit{i}-1]` and :math:`\mathrm{y}[\textit{i}-1]` must be set to the :math:`\textit{i}`\ th pair of data values, :math:`\left\{x_{\textit{i}},y_{\textit{i}}\right\}`, for :math:`\textit{i} = 1,2,\ldots,n`.
**y** : float, array-like, shape :math:`\left(n\right)`
:math:`\mathrm{x}[\textit{i}-1]` and :math:`\mathrm{y}[\textit{i}-1]` must be set to the :math:`\textit{i}`\ th pair of data values, :math:`\left\{x_{\textit{i}},y_{\textit{i}}\right\}`, for :math:`\textit{i} = 1,2,\ldots,n`.
**Returns**
**isgn** : int
The Sign test statistic, :math:`S`.
**n1** : int
The number of non-tied pairs, :math:`n_1`.
**p** : float
The lower tail probability, :math:`p`, corresponding to :math:`S`.
.. _g08aa-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 1`.
(`errno` :math:`2`)
On entry, the samples are identical, i.e., :math:`n_1 = 0`.
.. _g08aa-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
The Sign test investigates the median difference between pairs of scores from two matched samples of size :math:`n`, denoted by :math:`\left\{x_{\textit{i}},y_{\textit{i}}\right\}`, for :math:`\textit{i} = 1,2,\ldots,n`.
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the medians are the same, and this is to be tested against a one - or two-sided alternative :math:`H_1` (see below).
``test_sign`` computes:
(a) the test statistic :math:`S`, which is the number of pairs for which :math:`x_i < y_i`;
(#) the number :math:`n_1` of non-tied pairs :math:`\left(x_i\neq y_i\right)`;
(#) the lower tail probability :math:`p` corresponding to :math:`S` (adjusted to allow the complement :math:`\left(1-p\right)` to be used in an upper one tailed or a two tailed test). :math:`p` is the probability of observing a value :math:`\text{}\leq S` if :math:`S < \frac{1}{2}n_1`, or of observing a value :math:`\text{} < S` if :math:`S > \frac{1}{2}n_1`, given that :math:`H_0` is true. If :math:`S = \frac{1}{2}n_1`, :math:`p` is set to :math:`0.5`.
Suppose that a significance test of a chosen size :math:`\alpha` is to be performed (i.e., :math:`\alpha` is the probability of rejecting :math:`H_0` when :math:`H_0` is true; typically :math:`\alpha` is a small quantity such as :math:`0.05` or :math:`0.01`).
The returned value of :math:`p` can be used to perform a significance test on the median difference, against various alternative hypotheses :math:`H_1`, as follows
(i) :math:`H_1`: median of :math:`x\neq \text{}` median of :math:`y`. :math:`H_0` is rejected if :math:`2\times \mathrm{min}\left(p, {1-p}\right) < \alpha`.
(#) :math:`H_1`: median of :math:`x > \text{}` median of :math:`y`. :math:`H_0` is rejected if :math:`p < \alpha`.
(#) :math:`H_1`: median of :math:`x < \text{}` median of :math:`y`. :math:`H_0` is rejected if :math:`1-p < \alpha`.
.. _g08aa-py2-py-references:
**References**
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def test_friedman(x):
r"""
``test_friedman`` performs the Friedman two-way analysis of variance by ranks on :math:`k` related samples of size :math:`n`.
.. _g08ae-py2-py-doc:
For full information please refer to the NAG Library document for g08ae
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08aef.html
.. _g08ae-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(k, n\right)`
:math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to the value, :math:`x_{{\textit{i}\textit{j}}}`, of observation :math:`\textit{j}` in sample :math:`\textit{i}`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,k`.
**Returns**
**fr** : float
The value of the Friedman test statistic, :math:`F`.
**p** : float
The approximate significance, :math:`p`, of the Friedman test statistic.
.. _g08ae-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 1`.
(`errno` :math:`3`)
On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`k \geq 2`.
.. _g08ae-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
The Friedman test investigates the score differences between :math:`k` matched samples of size :math:`n`, the scores in the :math:`i`\ th sample being denoted by
.. math::
x_{{i1}},x_{{i2}},\ldots,x_{{in}}\text{.}
(Thus the sample scores may be regarded as a two-way table with :math:`k` rows and :math:`n` columns.) The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the samples come from the same population, and this is to be tested against the alternative hypothesis :math:`H_1` that they come from different populations.
The test is based on the observed distribution of score rankings between the matched observations in different samples.
The test proceeds as follows
(a) The scores in each column are ranked, :math:`r_{{ij}}` denoting the rank within column :math:`j` of the observation in row :math:`i`. Average ranks are assigned to tied scores.
(#) The ranks are summed over each row to give rank sums :math:`t_{\textit{i}} = \sum_{{j = 1}}^nr_{{\textit{i}j}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
(#) The Friedman test statistic :math:`F` is computed, where
.. math::
F = \frac{12}{{nk\left(k+1\right)}}\sum_{{i = 1}}^k{\left\{t_i-\frac{1}{2}n\left(k+1\right)\right\}}^2\text{.}
``test_friedman`` returns the value of :math:`F`, and also an approximation, :math:`p`, to the significance of this value. (:math:`F` approximately follows a :math:`\chi_{{k-1}}^2` distribution, so large values of :math:`F` imply rejection of :math:`H_0`). :math:`H_0` is rejected by a test of chosen size :math:`\alpha` if :math:`p < \alpha`.
The approximation :math:`p` is acceptable unless :math:`k = 4` and :math:`n < 5`, or :math:`k = 3` and :math:`n < 10`, or :math:`k = 2` and :math:`n < 20`; for :math:`k = 3\text{ or }4`, tables should be consulted (e.g., Siegel (1956)); for :math:`k = 2` the Sign test (see :meth:`test_sign`) or Wilcoxon test (see :meth:`test_wilcoxon`) is in any case more appropriate.
.. _g08ae-py2-py-references:
**References**
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def test_kruskal(x, l):
r"""
``test_kruskal`` performs the Kruskal--Wallis one-way analysis of variance by ranks on :math:`k` independent samples of possibly unequal sizes.
.. _g08af-py2-py-doc:
For full information please refer to the NAG Library document for g08af
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08aff.html
.. _g08af-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(\textit{lx}\right)`
The elements of :math:`\mathrm{x}` must contain the observations in the :math:`\textit{k}` samples. The first :math:`l_1` elements must contain the scores in the first sample, the next :math:`l_2` those in the second sample, and so on.
**l** : int, array-like, shape :math:`\left(k\right)`
:math:`\mathrm{l}[\textit{i}]` must contain the number of observations :math:`l_{\textit{i}}` in sample :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**Returns**
**h** : float
The value of the Kruskal--Wallis test statistic, :math:`H`.
**p** : float
The approximate significance, :math:`p`, of the Kruskal--Wallis test statistic.
.. _g08af-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`k \geq 2`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{l}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{l}[i-1] > 0`.
(`errno` :math:`3`)
On entry, :math:`\sum_i\left(\mathrm{l}[i-1]\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{lx} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\sum_i\left(\mathrm{l}[i-1]\right) = \textit{lx}`.
(`errno` :math:`4`)
On entry, all the observations were equal.
.. _g08af-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
The Kruskal--Wallis test investigates the differences between scores from :math:`k` independent samples of unequal sizes, the :math:`i`\ th sample containing :math:`l_i` observations.
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the samples come from the same population, and this is to be tested against the alternative hypothesis :math:`H_1` that they come from different populations.
The test proceeds as follows:
(a) The pooled sample of all the observations is ranked. Average ranks are assigned to tied scores.
(#) The ranks of the observations in each sample are summed, to give the rank sums :math:`R_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
(#) The Kruskal--Wallis' test statistic :math:`H` is computed as:
.. math::
H = \frac{12}{{N\left(N+1\right)}}\sum_{{i = 1}}^k\frac{R_i^2}{l_i}-3\left(N+1\right)\text{, where }N = \sum_{{i = 1}}^kl_i\text{,}
i.e., :math:`N` is the total number of observations. If there are tied scores, :math:`H` is corrected by dividing by:
.. math::
1-\frac{{\sum \left(t^3-t\right)}}{{N^3-N}}
where :math:`t` is the number of tied scores in a sample and the summation is over all tied samples.
``test_kruskal`` returns the value of :math:`H`, and also an approximation, :math:`p`, to the probability of a value of at least :math:`H` being observed, :math:`H_0` is true. (:math:`H` approximately follows a :math:`\chi_{{k-1}}^2` distribution). :math:`H_0` is rejected by a test of chosen size :math:`\alpha` if :math:`p < \alpha \text{.}` The approximation :math:`p` is acceptable unless :math:`k = 3` and :math:`l_1`, :math:`l_2` or :math:`l_3\leq 5` in which case tables should be consulted (e.g., O of Siegel (1956)) or :math:`k = 2` (in which case the Median test (see :meth:`test_median`) or the Mann--Whitney :math:`U` test (see :meth:`test_mwu`) is more appropriate).
.. _g08af-py2-py-references:
**References**
Moore, P G, Shirley, E A and Edwards, D E, 1972, `Standard Statistical Calculations`, Pitman
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def test_wilcoxon(x, xme, tail, zer):
r"""
``test_wilcoxon`` performs the Wilcoxon signed rank test on a single sample of size :math:`n`.
.. _g08ag-py2-py-doc:
For full information please refer to the NAG Library document for g08ag
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08agf.html
.. _g08ag-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(n\right)`
The sample observations, :math:`x_1,x_2,\ldots,x_n`.
**xme** : float
The median test value, :math:`X_{\mathrm{med}}`.
**tail** : str, length 1
Indicates the choice of tail probability, and hence the alternative hypothesis.
:math:`\mathrm{tail} = \texttt{'T'}`
A two tailed probability is calculated and the alternative hypothesis is :math:`\mathrm{H}_1`: population median :math:`\text{}\neq X_{\mathrm{med}}`.
:math:`\mathrm{tail} = \texttt{'U'}`
An upper tailed probability is calculated and the alternative hypothesis is :math:`\mathrm{H}_1`: population median :math:`\text{} > X_{\mathrm{med}}`.
:math:`\mathrm{tail} = \texttt{'L'}`
A lower tailed probability is calculated and the alternative hypothesis is :math:`\mathrm{H}_1`: population median :math:`\text{} < X_{\mathrm{med}}`.
**zer** : str, length 1
Indicates whether or not to include the cases where :math:`d_i = 0.0` in the ranking of the :math:`d_i`\ s.
:math:`\mathrm{zer} = \texttt{'Y'}`
All :math:`d_i = 0.0` are included when ranking.
:math:`\mathrm{zer} = \texttt{'N'}`
All :math:`d_i = 0.0`, are ignored, that is all cases where :math:`d_i = 0.0` are removed before ranking.
**Returns**
**w** : float
The Wilcoxon rank sum statistic, :math:`W`, being the sum of the positive ranks.
**wnor** : float
The approximate Normal test statistic, :math:`z`, as described in :ref:`Notes <g08ag-py2-py-notes>`.
**p** : float
The tail probability, :math:`p`, as specified by the argument :math:`\mathrm{tail}`.
**n1** : int
The number of nonzero :math:`d_i`'s, :math:`n_1`.
.. _g08ag-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{zer} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{zer} = \texttt{'Y'}` or :math:`\texttt{'N'}`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{tail} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tail} = \texttt{'T'}`, :math:`\texttt{'U'}` or :math:`\texttt{'L'}`.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 1`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`3`)
All elements of the sample are equal to :math:`\mathrm{xme}`, i.e., :math:`\text{variance} = 0`.
.. _g08ag-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
The Wilcoxon one-sample signed rank test may be used to test whether a particular sample came from a population with a specified median.
It is assumed that the population distribution is symmetric.
The data consists of a single sample of :math:`n` observations denoted by :math:`x_1,x_2,\ldots,x_n`.
This sample may arise from the difference between pairs of observations from two matched samples of equal size taken from two populations, in which case the test may be used to test whether the median of the first population is the same as that of the second population.
The hypothesis under test, :math:`\mathrm{H}_0`, often called the null hypothesis, is that the median is equal to some given value :math:`\left(X_{\mathrm{med}}\right)`, and this is to be tested against an alternative hypothesis :math:`H_1` which is
:math:`H_1`: population median :math:`\text{}\neq X_{\mathrm{med}}`; or
:math:`H_1`: population median :math:`\text{} > X_{\mathrm{med}}`; or
:math:`H_1`: population median :math:`\text{} < X_{\mathrm{med}}`,
using a two tailed, upper tailed or lower tailed probability respectively.
You select the alternative hypothesis by choosing the appropriate tail probability to be computed (see the description of argument :math:`\mathrm{tail}` in :ref:`Parameters <g08ag-py2-py-parameters>`).
The Wilcoxon test differs from the Sign test (see :meth:`test_sign`) in that the magnitude of the scores is taken into account, rather than simply the direction of such scores.
The test procedure is as follows
(a) For each :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, the signed difference :math:`d_i = x_i-X_{\mathrm{med}}` is found, where :math:`X_{\mathrm{med}}` is a given test value for the median of the sample.
(#) The absolute differences :math:`\left\lvert d_i\right\rvert` are ranked with rank :math:`r_i` and any tied values of :math:`\left\lvert d_i\right\rvert` are assigned the average of the tied ranks. You may choose whether or not to ignore any cases where :math:`d_i = 0` by removing them before or after ranking (see the description of the argument :math:`\mathrm{zer}` in :ref:`Parameters <g08ag-py2-py-parameters>`).
(#) The number of nonzero :math:`d_i` is found.
(#) To each rank is affixed the sign of the :math:`d_i` to which it corresponds. Let :math:`s_i = \mathrm{sign}\left(d_i\right)r_i`.
(#) The sum of the positive-signed ranks, :math:`W = \sum_{{s_i > 0}}s_i = \sum_{{i = 1}}^n\mathrm{max}\left(s_i, 0.0\right)`, is calculated.
``test_wilcoxon`` returns
(a) the test statistic :math:`W`;
(#) the number :math:`n_1` of nonzero :math:`d_i`;
(#) the approximate Normal test statistic :math:`z`, where
.. math::
z = \frac{{\left(W-\frac{{n_1\left(n_1+1\right)}}{4}\right)-\mathrm{sign}\left(W-\frac{{n_1\left(n_1+1\right)}}{4}\right)\times \frac{1}{2}}}{{\sqrt{\frac{1}{4}\sum_{{i = 1}}^ns_i^2}}}\text{;}
(#) the tail probability, :math:`p`, corresponding to :math:`W`, depending on the choice of the alternative hypothesis, :math:`H_1`.
If :math:`n_1\leq 80`, :math:`p` is computed exactly; otherwise, an approximation to :math:`p` is returned based on an approximate Normal statistic corrected for continuity according to the tail specified.
The value of :math:`p` can be used to perform a significance test on the median against the alternative hypothesis.
Let :math:`\alpha` be the size of the significance test (that is, :math:`\alpha` is the probability of rejecting :math:`H_0` when :math:`H_0` is true).
If :math:`p < \alpha` then the null hypothesis is rejected.
Typically :math:`\alpha` might be :math:`0.05` or :math:`0.01`.
.. _g08ag-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Neumann, N, 1988, `Some procedures for calculating the distributions of elementary nonparametric teststatistics`, Statistical Software Newsletter (14(3)), 120--126
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def test_mwu(x, y, tail):
r"""
``test_mwu`` performs the Mann--Whitney :math:`U` test on two independent samples of possibly unequal size.
.. _g08ah-py2-py-doc:
For full information please refer to the NAG Library document for g08ah
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ahf.html
.. _g08ah-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(\textit{n1}\right)`
The first vector of observations, :math:`x_1,x_2,\ldots,x_{n_1}`.
**y** : float, array-like, shape :math:`\left(\textit{n2}\right)`
The second vector of observations. :math:`y_1,y_2,\ldots,y_{n_2}`.
**tail** : str, length 1
Indicates the choice of tail probability, and hence the alternative hypothesis.
:math:`\mathrm{tail} = \texttt{'T'}`
A two tailed probability is calculated and the alternative hypothesis is :math:`H_1:F\left(x\right)\neq G\left(y\right)`.
:math:`\mathrm{tail} = \texttt{'U'}`
An upper tailed probability is calculated and the alternative hypothesis :math:`H_1:F\left(x\right) < G\left(y\right)`, i.e., the :math:`x`'s tend to be greater than the :math:`y`'s.
:math:`\mathrm{tail} = \texttt{'L'}`
A lower tailed probability is calculated and the alternative hypothesis :math:`H_1:F\left(x\right) > G\left(y\right)`, i.e., the :math:`x`'s tend to be less than the :math:`y`'s.
**Returns**
**u** : float
The Mann--Whitney rank sum statistic, :math:`U`.
**unor** : float
The approximate Normal test statistic, :math:`z`, as described in :ref:`Notes <g08ah-py2-py-notes>`.
**p** : float
The tail probability, :math:`p`, as specified by the argument :math:`\mathrm{tail}`.
**ties** : bool
Indicates whether the pooled sample contained ties or not. This will be useful in checking which function to use should one wish to calculate an exact tail probability.
:math:`\mathrm{ties} = \mathbf{False}`, no ties were present (use :meth:`prob_mwu_noties` for an exact probability).
:math:`\mathrm{ties} = \mathbf{True}`, ties were present (use :meth:`prob_mwu_ties` for an exact probability).
**ranks** : float, ndarray, shape :math:`\left(\textit{n1}+\textit{n2}\right)`
Contains the ranks of the pooled sample. The ranks of the first sample are contained in the first :math:`\textit{n1}` elements and those of the second sample are contained in the next :math:`\textit{n2}` elements.
.. _g08ah-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\textit{n2} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{n2}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\textit{n1} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{n1}\geq 1`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{tail} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tail} = \texttt{'T'}`, :math:`\texttt{'U'}` or :math:`\texttt{'L'}`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`3`)
The pooled sample values are all the same, i.e., the variance of :math:`\mathrm{u} = 0.0`.
.. _g08ah-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
The Mann--Whitney :math:`U` test investigates the difference between two populations defined by the distribution functions :math:`F\left(x\right)` and :math:`G\left(y\right)` respectively.
The data consist of two independent samples of size :math:`n_1` and :math:`n_2`, denoted by :math:`x_1,x_2,\ldots,x_{n_1}` and :math:`y_1,y_2,\ldots,y_{n_2}`, taken from the two populations.
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the two distributions are the same, that is :math:`F\left(x\right) = G\left(x\right)`, and this is to be tested against an alternative hypothesis :math:`H_1` which is
:math:`H_1`: :math:`F\left(x\right)\neq G\left(y\right)`; or
:math:`H_1`: :math:`F\left(x\right) < G\left(y\right)`, i.e., the :math:`x`'s tend to be greater than the :math:`y`'s; or
:math:`H_1`: :math:`F\left(x\right) > G\left(y\right)`, i.e., the :math:`x`'s tend to be less than the :math:`y`'s,
using a two tailed, upper tailed or lower tailed probability respectively.
You select the alternative hypothesis by choosing the appropriate tail probability to be computed (see the description of argument :math:`\mathrm{tail}` in :ref:`Parameters <g08ah-py2-py-parameters>`).
Note that when using this test to test for differences in the distributions one is primarily detecting differences in the location of the two distributions.
That is to say, if we reject the null hypothesis :math:`H_0` in favour of the alternative hypothesis :math:`H_1`: :math:`F\left(x\right) > G\left(y\right)` we have evidence to suggest that the location, of the distribution defined by :math:`F\left(x\right)`, is less than the location, of the distribution defined by :math:`G\left(y\right)`.
The Mann--Whitney :math:`U` test differs from the Median test (see :meth:`test_median`) in that the ranking of the individual scores within the pooled sample is taken into account, rather than simply the position of a score relative to the median of the pooled sample.
It is, therefore, a more powerful test if score differences are meaningful.
The test procedure involves ranking the pooled sample, average ranks being used for ties.
Let :math:`r_{{1i}}` be the rank assigned to :math:`x_i`, :math:`i = 1,2,\ldots,n_1` and :math:`r_{{2j}}` the rank assigned to :math:`y_j`, :math:`j = 1,2,\ldots,n_2`.
Then the test statistic :math:`U` is defined as follows;
.. math::
U = \sum_{{i = 1}}^{n_1}r_{{1i}}-\frac{{n_1\left(n_1+1\right)}}{2}
:math:`U` is also the number of times a score in the second sample precedes a score in the first sample (where we only count a half if a score in the second sample actually equals a score in the first sample).
``test_mwu`` returns:
(a) The test statistic :math:`U`.
(#) The approximate Normal test statistic,
.. math::
z = \frac{{U-\mathrm{mean}\left(U\right)\pm \frac{1}{2}}}{{\sqrt{\mathrm{var}\left(U\right)}}}
where
.. math::
\mathrm{mean}\left(U\right) = \frac{{n_1n_2}}{2}
and
.. math::
\mathrm{var}\left(U\right) = \frac{{n_1n_2\left(n_1+n_2+1\right)}}{12}-\frac{{n_1n_2}}{{\left(n_1+n_2\right)\left(n_1+n_2-1\right)}}\times TS
where
.. math::
TS = \sum_{{j = 1}}^{{\tau }}\frac{{\left(t_j\right)\left(t_j-1\right)\left(t_j+1\right)}}{12}
:math:`\tau` is the number of groups of ties in the sample and :math:`t_j` is the number of ties in the :math:`j`\ th group.
Note that if no ties are present the variance of :math:`U` reduces to :math:`\frac{{n_1n_2}}{12}\left(n_1+n_2+1\right)`.
(#) An indicator as to whether ties were present in the pooled sample or not.
(#) The tail probability, :math:`p`, corresponding to :math:`U` (adjusted to allow the complement to be used in an upper one tailed or a two tailed test), depending on the choice of :math:`\mathrm{tail}`, i.e., the choice of alternative hypothesis, :math:`H_1`. The tail probability returned is an approximation of :math:`p` is based on an approximate Normal statistic corrected for continuity according to the tail specified. If :math:`n_1` and :math:`n_2` are not very large an exact probability may be desired. For the calculation of the exact probability see :meth:`prob_mwu_noties` (no ties in the pooled sample) or :meth:`prob_mwu_ties` (ties in the pooled sample).
The value of :math:`p` can be used to perform a significance test on the null hypothesis :math:`H_0` against the alternative hypothesis :math:`H_1`.
Let :math:`\alpha` be the size of the significance test (that is, :math:`\alpha` is the probability of rejecting :math:`H_0` when :math:`H_0` is true).
If :math:`p < \alpha` then the null hypothesis is rejected.
Typically :math:`\alpha` might be :math:`0.05` or :math:`0.01`.
.. _g08ah-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Neumann, N, 1988, `Some procedures for calculating the distributions of elementary nonparametric teststatistics`, Statistical Software Newsletter (14(3)), 120--126
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def prob_mwu_noties(n1, n2, tail, u):
r"""
``prob_mwu_noties`` calculates the exact tail probability for the Mann--Whitney rank sum test statistic for the case where there are no ties in the two samples pooled together.
.. _g08aj-py2-py-doc:
For full information please refer to the NAG Library document for g08aj
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ajf.html
.. _g08aj-py2-py-parameters:
**Parameters**
**n1** : int
The number of non-tied pairs, :math:`\textit{n}_1`.
**n2** : int
The size of the second sample, :math:`\textit{n}_2`.
**tail** : str, length 1
Indicates the choice of tail probability, and hence the alternative hypothesis.
:math:`\mathrm{tail} = \texttt{'T'}`
A two tailed probability is calculated and the alternative hypothesis is :math:`H_1:F\left(x\right)\neq G\left(y\right)`.
:math:`\mathrm{tail} = \texttt{'U'}`
An upper tailed probability is calculated and the alternative hypothesis :math:`H_1:F\left(x\right) < G\left(y\right)`, i.e., the :math:`x`'s tend to be greater than the :math:`y`'s.
:math:`\mathrm{tail} = \texttt{'L'}`
A lower tailed probability is calculated and the alternative hypothesis :math:`H_1:F\left(x\right) > G\left(y\right)`, i.e., the :math:`x`'s tend to be less than the :math:`y`'s.
**u** : float
:math:`U`, the value of the Mann--Whitney rank sum test statistic. This is the statistic returned through the argument :math:`\mathrm{u}` by :meth:`test_mwu`.
**Returns**
**p** : float
The exact tail probability, :math:`p`, as specified by the argument :math:`\mathrm{tail}`.
.. _g08aj-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{n2} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{n2}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{n1} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{n1}\geq 1`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{tail} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tail} = \texttt{'T'}`, :math:`\texttt{'U'}` or :math:`\texttt{'L'}`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{u} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{u}\geq 0.0`.
(`errno` :math:`4`)
On entry, :math:`\textit{lwrk} = \langle\mathit{\boldsymbol{value}}\rangle` and the minimum workspace is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{lwrk}\geq \left(\mathrm{n1}\times \mathrm{n2}\right)/2+1`.
.. _g08aj-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
``prob_mwu_noties`` computes the exact tail probability for the Mann--Whitney :math:`U` test statistic (calculated by :meth:`test_mwu` and returned through the argument :math:`\mathrm{u}`) using a method based on an algorithm developed by Harding (1983), and presented by Neumann (1988), for the case where there are no ties in the pooled sample.
The Mann--Whitney :math:`U` test investigates the difference between two populations defined by the distribution functions :math:`F\left(x\right)` and :math:`G\left(y\right)` respectively.
The data consist of two independent samples of size :math:`n_1` and :math:`n_2`, denoted by :math:`x_1,x_2,\ldots,x_{n_1}` and :math:`y_1,y_2,\ldots,y_{n_2}`, taken from the two populations.
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the two distributions are the same, that is :math:`F\left(x\right) = G\left(x\right)`, and this is to be tested against an alternative hypothesis :math:`H_1` which is
:math:`H_1`: :math:`F\left(x\right)\neq G\left(y\right)`; or
:math:`H_1`: :math:`F\left(x\right) < G\left(y\right)`, i.e., the :math:`x`'s tend to be greater than the :math:`y`'s; or
:math:`H_1`: :math:`F\left(x\right) > G\left(y\right)`, i.e., the :math:`x`'s tend to be less than the :math:`y`'s,
using a two tailed, upper tailed or lower tailed probability respectively.
You select the alternative hypothesis by choosing the appropriate tail probability to be computed (see the description of argument :math:`\mathrm{tail}` in :ref:`Parameters <g08aj-py2-py-parameters>`).
Note that when using this test to test for differences in the distributions one is primarily detecting differences in the location of the two distributions.
That is to say, if we reject the null hypothesis :math:`H_0` in favour of the alternative hypothesis :math:`H_1`: :math:`F\left(x\right) > G\left(y\right)` we have evidence to suggest that the location, of the distribution defined by :math:`F\left(x\right)`, is less than the location, of the distribution defined by :math:`G\left(y\right)`.
``prob_mwu_noties`` returns the exact tail probability, :math:`p`, corresponding to :math:`U`, depending on the choice of alternative hypothesis, :math:`H_1`.
The value of :math:`p` can be used to perform a significance test on the null hypothesis :math:`H_0` against the alternative hypothesis :math:`H_1`.
Let :math:`\alpha` be the size of the significance test (that is, :math:`\alpha` is the probability of rejecting :math:`H_0` when :math:`H_0` is true).
If :math:`p < \alpha` then the null hypothesis is rejected.
Typically :math:`\alpha` might be :math:`0.05` or :math:`0.01`.
.. _g08aj-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Harding, E F, 1983, `An efficient minimal-storage procedure for calculating the Mann--Whitney U, generalised U and similar distributions`, Appl. Statist. (33), 1--6
Neumann, N, 1988, `Some procedures for calculating the distributions of elementary nonparametric teststatistics`, Statistical Software Newsletter (14(3)), 120--126
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def prob_mwu_ties(n1, n2, tail, ranks, u):
r"""
``prob_mwu_ties`` calculates the exact tail probability for the Mann--Whitney rank sum test statistic for the case where there are ties in the two samples pooled together.
.. _g08ak-py2-py-doc:
For full information please refer to the NAG Library document for g08ak
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08akf.html
.. _g08ak-py2-py-parameters:
**Parameters**
**n1** : int
The number of non-tied pairs, :math:`\textit{n}_1`.
**n2** : int
The size of the second sample, :math:`\textit{n}_2`.
**tail** : str, length 1
Indicates the choice of tail probability, and hence the alternative hypothesis.
:math:`\mathrm{tail} = \texttt{'T'}`
A two tailed probability is calculated and the alternative hypothesis is :math:`H_1:F\left(x\right)\neq G\left(y\right)`.
:math:`\mathrm{tail} = \texttt{'U'}`
An upper tailed probability is calculated and the alternative hypothesis :math:`H_1:F\left(x\right) < G\left(y\right)`, i.e., the :math:`x`'s tend to be greater than the :math:`y`'s.
:math:`\mathrm{tail} = \texttt{'L'}`
A lower tailed probability is calculated and the alternative hypothesis :math:`H_1:F\left(x\right) > G\left(y\right)`, i.e., the :math:`x`'s tend to be less than the :math:`y`'s.
**ranks** : float, array-like, shape :math:`\left(\mathrm{n1}+\mathrm{n2}\right)`
The ranks of the pooled sample. These ranks are output in the array :math:`\mathrm{ranks}` by :meth:`test_mwu` and should not be altered in any way if you are using the same :math:`\textit{n}_1`, :math:`\textit{n}_2` and :math:`\mathrm{u}` as used in :meth:`test_mwu`.
**u** : float
:math:`U`, the value of the Mann--Whitney rank sum test statistic. This is the statistic returned through the argument :math:`\mathrm{u}` by :meth:`test_mwu`.
**Returns**
**p** : float
The tail probability, :math:`p`, as specified by the argument :math:`\mathrm{tail}`.
.. _g08ak-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{n2} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{n2}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{n1} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{n1}\geq 1`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{tail} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tail} = \texttt{'T'}`, :math:`\texttt{'U'}` or :math:`\texttt{'L'}`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{u} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{u}\geq 0.0`.
(`errno` :math:`5`)
On entry, at least one rank, given in :math:`\mathrm{ranks}`, was outside the expected range.
.. _g08ak-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
``prob_mwu_ties`` computes the exact tail probability for the Mann--Whitney :math:`U` test statistic (calculated by :meth:`test_mwu` and returned through the argument :math:`\mathrm{u}`) using a method based on an algorithm developed by Neumann (1988), for the case where there are ties in the pooled sample.
The Mann--Whitney :math:`U` test investigates the difference between two populations defined by the distribution functions :math:`F\left(x\right)` and :math:`G\left(y\right)` respectively.
The data consist of two independent samples of size :math:`\textit{n}_1` and :math:`\textit{n}_2`, denoted by :math:`x_1,x_2,\ldots,x_{\textit{n}_1}` and :math:`y_1,y_2,\ldots,y_{\textit{n}_2}`, taken from the two populations.
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the two distributions are the same, that is :math:`F\left(x\right) = G\left(x\right)`, and this is to be tested against an alternative hypothesis :math:`H_1` which is
:math:`H_1`: :math:`F\left(x\right)\neq G\left(y\right)`; or
:math:`H_1`: :math:`F\left(x\right) < G\left(y\right)`, i.e., the :math:`x`'s tend to be greater than the :math:`y`'s; or
:math:`H_1`: :math:`F\left(x\right) > G\left(y\right)`, i.e., the :math:`x`'s tend to be less than the :math:`y`'s,
using a two tailed, upper tailed or lower tailed probability respectively.
You select the alternative hypothesis by choosing the appropriate tail probability to be computed (see the description of argument :math:`\mathrm{tail}` in :ref:`Parameters <g08ak-py2-py-parameters>`).
Note that when using this test to test for differences in the distributions one is primarily detecting differences in the location of the two distributions.
That is to say, if we reject the null hypothesis :math:`H_0` in favour of the alternative hypothesis :math:`H_1`: :math:`F\left(x\right) > G\left(y\right)` we have evidence to suggest that the location, of the distribution defined by :math:`F\left(x\right)`, is less than the location of the distribution defined by :math:`G\left(y\right)`.
``prob_mwu_ties`` returns the exact tail probability, :math:`p`, corresponding to :math:`U`, depending on the choice of alternative hypothesis, :math:`H_1`.
The value of :math:`p` can be used to perform a significance test on the null hypothesis :math:`H_0` against the alternative hypothesis :math:`H_1`.
Let :math:`\alpha` be the size of the significance test (that is :math:`\alpha` is the probability of rejecting :math:`H_0` when :math:`H_0` is true).
If :math:`p < \alpha` then the null hypothesis is rejected.
Typically :math:`\alpha` might be :math:`0.05` or :math:`0.01`.
.. _g08ak-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Neumann, N, 1988, `Some procedures for calculating the distributions of elementary nonparametric teststatistics`, Statistical Software Newsletter (14(3)), 120--126
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def test_cochranq(x):
r"""
``test_cochranq`` performs the Cochran :math:`Q`-test on cross-classified binary data.
.. _g08al-py2-py-doc:
For full information please refer to the NAG Library document for g08al
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08alf.html
.. _g08al-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(n, k\right)`
The matrix of observed zero-one data. :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the value :math:`x_{{\textit{i}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,k`, for :math:`\textit{i} = 1,2,\ldots,n`.
**Returns**
**q** : float
The value of the Cochran :math:`Q`-test statistic.
**prob** : float
The upper tail probability, :math:`p`, associated with the Cochran :math:`Q`-test statistic, that is the probability of obtaining a value greater than or equal to the observed value (the output value of :math:`\mathrm{q}`).
.. _g08al-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`k \geq 2`.
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 2`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{x}[i-1,j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{x}[i-1,j-1] = 0.0` or :math:`1.0`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`3`)
The solution has failed to converge while calculating the tail probability.
.. _g08al-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
Cochran's :math:`Q`-test may be used to test for differences between :math:`k` treatments applied independently to :math:`n` individuals or blocks (:math:`k` related samples of equal size :math:`n`), where the observed response can take only one of two possible values; for example a treatment may result in a 'success' or 'failure'.
The data is recorded as either :math:`1` or :math:`0` to represent this dichotomization.
The use of this 'randomized block design' allows the effect of differences between the blocks to be separated from the differences between the treatments.
The test assumes that the blocks were randomly selected from all possible blocks and that the result may be one of two possible outcomes common to all treatments within blocks.
The null and alternative hypotheses to be tested may be stated as follows.
.. rst-class:: nag-rules-none nag-align-left
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|:math:`H_0`:|the treatments are equally effective, that is the probability of obtaining a :math:`1` within a block is the same for each treatment. |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|:math:`H_1`:|there is a difference between the treatments, that is the probability of obtaining a :math:`1` is not the same for different treatments within blocks.|
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
The data is often represented in the form of a table with the :math:`n` rows representing the blocks and the :math:`k` columns the treatments.
Let :math:`R_{\textit{i}}` represent the row totals, for :math:`\textit{i} = 1,2,\ldots,n`, and :math:`C_{\textit{j}}` represent the column totals, for :math:`\textit{j} = 1,2,\ldots,k`.
Let :math:`x_{{ij}}` represent the response or result where :math:`x_{{ij}} = 0` or :math:`1`.
[table omitted]
If :math:`p_{{ij}} = \mathrm{Pr}\left(x_{{ij}} = 1\right)`, for :math:`i = 1,2,\ldots,n` and :math:`j = 1,2,\ldots,k`, then the hypotheses may be restated as follows
.. rst-class:: nag-rules-none nag-align-left
+------------+----------------------------------------------------------------------------------------+
|:math:`H_0`:|:math:`p_{{i1}} = p_{{i2}} = \cdots = p_{{ik}}`, for each :math:`i = 1,2,\ldots,n`. |
+------------+----------------------------------------------------------------------------------------+
|:math:`H_1`:|:math:`p_{{ij}}\neq p_{{ik}}`, for some :math:`j` and :math:`k`, and for some :math:`i`.|
+------------+----------------------------------------------------------------------------------------+
The test statistic is defined as
.. math::
Q = \frac{{k\left(k-1\right)\sum_{{j = 1}}^k\left(C_j-\frac{N}{k}\right)^2}}{{\sum_{{i = 1}}^nR_i\left(k-R_i\right)}}\text{.}
When the number of blocks, :math:`n`, is large relative to the number of treatments, :math:`k`, :math:`Q` has an approximate :math:`\chi^2`-distribution with :math:`k-1` degrees of freedom.
This is used to find the probability, :math:`p`, of obtaining a statistic greater than or equal to the computed value of :math:`Q`.
Thus :math:`p` is the upper tail probability associated with the computed value of :math:`Q`, where the :math:`\chi^2`-distribution is used to approximate the true distribution of :math:`Q`.
.. _g08al-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def test_mooddavid(x, n1, itest):
r"""
``test_mooddavid`` performs Mood's and David's tests for dispersion differences between two independent samples of possibly unequal size.
.. _g08ba-py2-py-doc:
For full information please refer to the NAG Library document for g08ba
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08baf.html
.. _g08ba-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(n\right)`
The first :math:`n_1` elements of :math:`\mathrm{x}` must be set to the data values in the first sample, and the next :math:`n_2` (:math:`\text{} = n-n_1`) elements to the data values in the second sample.
**n1** : int
The size of the first sample, :math:`n_1`.
**itest** : int
The test(s) to be carried out.
:math:`\mathrm{itest} = 0`
Both Mood's and David's tests.
:math:`\mathrm{itest} = 1`
David's test only.
:math:`\mathrm{itest} = 2`
Mood's test only.
**Returns**
**r** : float, ndarray, shape :math:`\left(n\right)`
The ranks :math:`r_{\textit{i}}`, assigned to the data values :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`.
**w** : float
Mood's test statistic, :math:`W`, if requested.
**v** : float
David's test statistic, :math:`V`, if requested.
**pw** : float
The lower tail probability, :math:`p_w`, corresponding to the value of :math:`W`, if Mood's test was requested.
**pv** : float
The lower tail probability, :math:`p_v`, corresponding to the value of :math:`V`, if David's test was requested.
.. _g08ba-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n > 2`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{n1} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`1 < \mathrm{n1} < n`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{itest} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{itest} = 0`, :math:`1` or :math:`2`.
.. _g08ba-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
Mood's and David's tests investigate the difference between the dispersions of two independent samples of sizes :math:`n_1` and :math:`n_2`, denoted by
.. math::
x_1,x_2,\ldots,x_{n_1}
and
.. math::
x_{{n_1+1}},x_{{n_1+2}},\ldots,x_n\text{, }\quad n = n_1+n_2\text{.}
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that the dispersion difference is zero, and this is to be tested against a one - or two-sided alternative hypothesis :math:`H_1` (see below).
Both tests are based on the rankings of the sample members within the pooled sample formed by combining both samples.
If there is some difference in dispersion, more of the extreme ranks will tend to be found in one sample than in the other.
Let the rank of :math:`x_{\textit{i}}` be denoted by :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`.
(a) Mood's test.
The test statistic :math:`W = {\sum_{{i = 1}}^{n_1}}\left(r_i-\frac{{n+1}}{2}\right)^2` is found.
:math:`W` is the sum of squared deviations from the average rank in the pooled sample.
For large :math:`n`, :math:`W` approaches normality, and so an approximation, :math:`p_w`, to the probability of observing :math:`W` not greater than the computed value, may be found.
``test_mooddavid`` returns :math:`W` and :math:`p_w` if Mood's test is selected.
(#) David's test.
The disadvantage of Mood's test is that it assumes that the means of the two samples are equal.
If this assumption is unjustified a high value of :math:`W` could merely reflect the difference in means.
David's test reduces this effect by using the variance of the ranks of the first sample about their mean rank, rather than the overall mean rank.
The test statistic for David's test is
.. math::
V = \frac{1}{{n_1-1}}\sum_{{i = 1}}^{n_1}\left(r_i-\bar{r}\right)^2
where
.. math::
\bar{r} = \frac{{\sum_{{i = 1}}^{n_1}r_i}}{n_1}\text{.}
For large :math:`n`, :math:`V` approaches normality, enabling an approximate probability :math:`p_v` to be computed, similarly to :math:`p_w`.
``test_mooddavid`` returns :math:`V` and :math:`p_v` if David's test is selected.
Suppose that a significance test of a chosen size :math:`\alpha` is to be performed (i.e., :math:`\alpha` is the probability of rejecting :math:`H_0` when :math:`H_0` is true; typically :math:`\alpha` is a small quantity such as :math:`0.05` or :math:`0.01`).
The returned value :math:`p` (:math:`{} = p_v` or :math:`p_w`) can be used to perform a significance test, against various alternative hypotheses :math:`H_1`, as follows.
(i) :math:`H_1`: dispersions are unequal. :math:`H_0` is rejected if :math:`2\times \mathrm{min}\left(p, {1-p}\right) < \alpha`.
(#) :math:`H_1`: dispersion of sample :math:`1 > \text{}` dispersion of sample :math:`2`. :math:`H_0` is rejected if :math:`1-p < \alpha`.
(#) :math:`H_1`: dispersion of sample :math:`2 > \text{}` dispersion of sample :math:`1`. :math:`H_0` is rejected if :math:`p < \alpha`.
.. _g08ba-py2-py-references:
**References**
Cooper, B E, 1975, `Statistics for Experimentalists`, Pergamon Press
"""
raise NotImplementedError
[docs]def test_ks_1sample(x, dist, par, estima, ntype):
r"""
``test_ks_1sample`` performs the one sample Kolmogorov--Smirnov test, using one of the distributions provided.
.. _g08cb-py2-py-doc:
For full information please refer to the NAG Library document for g08cb
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08cbf.html
.. _g08cb-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(n\right)`
The sample observations :math:`x_1,x_2,\ldots,x_n`.
**dist** : str
The theoretical (null) distribution from which it is suspected the data may arise.
:math:`\mathrm{dist} = \text{‘U'}`
The uniform distribution over :math:`\left(a, b\right)`.
:math:`\mathrm{dist} = \text{‘N'}`
The Normal distribution with mean :math:`\mu` and variance :math:`\sigma^2`.
:math:`\mathrm{dist} = \text{‘G'}`
The gamma distribution with shape parameter\ :math:`\alpha` and scale parameter :math:`\beta`, where the mean :math:`\text{} = \alpha \beta`.
:math:`\mathrm{dist} = \text{‘BE'}`
The beta distribution with shape parameters :math:`\alpha` and :math:`\beta`, where the mean :math:`\text{} = \alpha /\left(\alpha +\beta \right)`.
:math:`\mathrm{dist} = \text{‘BI'}`
The binomial distribution with the number of trials, :math:`m`, and the probability of a success, :math:`p`.
:math:`\mathrm{dist} = \text{‘E'}`
The exponential distribution with parameter :math:`\lambda`, where the mean :math:`\text{} = 1/\lambda`.
:math:`\mathrm{dist} = \text{‘P'}`
The Poisson distribution with parameter :math:`\mu`, where the mean :math:`\text{} = \mu`.
:math:`\mathrm{dist} = \text{‘NB'}`
The negative binomial distribution with the number of trials, :math:`m`, and the probability of success, :math:`p`.
:math:`\mathrm{dist} = \text{‘GP'}`
The generalized Pareto distribution with shape parameter :math:`\xi` and scale :math:`\beta`.
Any number of characters may be supplied as the actual parameter, however only the characters, maximum :math:`2`, required to uniquely identify the distribution are referenced.
**par** : float, array-like, shape :math:`\left(2\right)`
If :math:`\mathrm{estima} = \text{‘S'}`, :math:`\mathrm{par}` must contain the known values of the parameter(s) of the null distribution as follows.
If a uniform distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the boundaries :math:`a` and :math:`b` respectively.
If a Normal distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the mean, :math:`\mu`, and the variance, :math:`\sigma^2`, respectively.
If a gamma distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the parameters :math:`\alpha` and :math:`\beta` respectively.
If a beta distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the parameters :math:`\alpha` and :math:`\beta` respectively.
If a binomial distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the parameters :math:`m` and :math:`p` respectively.
If an exponential distribution is used, :math:`\mathrm{par}[0]` must contain the parameter :math:`\lambda`.
If a Poisson distribution is used, :math:`\mathrm{par}[0]` must contain the parameter :math:`\mu`.
If a negative binomial distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the parameters :math:`m` and :math:`p` respectively.
If a generalized Pareto distribution is used, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the parameters :math:`\xi` and :math:`\beta` respectively.
If :math:`\mathrm{estima} = \text{‘E'}`, :math:`\mathrm{par}` need not be set except when the null distribution requested is either the binomial or the negative binomial distribution in which case :math:`\mathrm{par}[0]` must contain the parameter :math:`m`.
**estima** : str, length 1
:math:`\mathrm{estima}` must specify whether values of the parameters of the null distribution are known or are to be estimated from the data.
:math:`\mathrm{estima} = \text{‘S'}`
Values of the parameters will be supplied in the array :math:`\mathrm{par}` described above.
:math:`\mathrm{estima} = \text{‘E'}`
Parameters are to be estimated from the data except when the null distribution requested is the binomial distribution or the negative binomial distribution in which case the first parameter, :math:`m`, must be supplied in :math:`\mathrm{par}[0]` and only the second parameter, :math:`p`, is estimated from the data.
**ntype** : int
The test statistic to be calculated, i.e., the choice of alternative hypothesis.
:math:`\mathrm{ntype} = 1`
Computes :math:`D_n`, to test :math:`H_0` against :math:`H_1`,
:math:`\mathrm{ntype} = 2`
Computes :math:`D_n^+`, to test :math:`H_0` against :math:`H_2`,
:math:`\mathrm{ntype} = 3`
Computes :math:`D_n^-`, to test :math:`H_0` against :math:`H_3`.
**Returns**
**par** : float, ndarray, shape :math:`\left(2\right)`
If :math:`\mathrm{estima} = \text{‘S'}`, :math:`\mathrm{par}` is unchanged; if :math:`\mathrm{estima} = \text{‘E'}`, and :math:`\mathrm{dist} = \text{‘BI'}` or :math:`\mathrm{dist} = \text{‘NB'}` then :math:`\mathrm{par}[1]` is estimated from the data; otherwise :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` are estimated from the data.
**d** : float
The Kolmogorov--Smirnov test statistic (:math:`D_n`, :math:`D_n^+` or :math:`D_n^-` according to the value of :math:`\mathrm{ntype}`).
**z** : float
A standardized value, :math:`Z`, of the test statistic, :math:`D`, without any correction for continuity.
**p** : float
The probability, :math:`p`, associated with the observed value of :math:`D` where :math:`D` may be :math:`D_n,D_n^+` or :math:`D_n^-` depending on the value of :math:`\mathrm{ntype}` (see :ref:`Notes <g08cb-py2-py-notes>`).
**sx** : float, ndarray, shape :math:`\left(n\right)`
The sample observations, :math:`x_1,x_2,\ldots,x_n`, sorted in ascending order.
.. _g08cb-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 3`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{dist} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{dist} = \text{‘U'}, \text{‘N'}, \text{‘G'}, \text{‘BE'}, \text{‘BI'}, \text{‘E'}, \text{‘P'}, \text{‘NB'} \text{ or } \text{‘GP'}`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{ntype} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{ntype} = 1`, :math:`2` or :math:`3`.
(`errno` :math:`4`)
On entry, :math:`\mathrm{estima} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{estima} = \text{‘E'} \text{ or } \text{‘S'}`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`; :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the generalized Pareto distribution with :math:`\mathrm{par}[0] < 0`, :math:`0\leq \mathrm{x}[\textit{i}-1]\leq -\mathrm{par}[1]/\mathrm{par}[0]`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the generalized Pareto distribution, :math:`\mathrm{par}[1] > 0`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the negative binomial distribution, :math:`0 < \mathrm{par}[1] < 1`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{dist} = \text{‘NB'}` and :math:`m = \mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Note that :math:`m` must always be supplied.
Constraint: for the negative binomial distribution, :math:`1\leq \mathrm{par}[0] < 1/\mathrm{eps}`, where :math:`\mathrm{eps} = \text{machine precision}`, see :meth:`machine.precision <naginterfaces.library.machine.precision>`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the Poisson distribution, :math:`0 < \mathrm{par}[0] < 1000000`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the exponential distribution, :math:`\mathrm{par}[0] > 0`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the binomial distribution, :math:`0 < \mathrm{par}[1] < 1`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{dist} = \text{‘BI'}` and :math:`m = \mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Note that :math:`m` must always be supplied.
Constraint: for the binomial distribution, :math:`1\leq \mathrm{par}[0] < 1/\mathrm{eps}`, where :math:`\mathrm{eps} = \text{machine precision}`, see :meth:`machine.precision <naginterfaces.library.machine.precision>`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`; :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the beta distribution, :math:`0 < \mathrm{par}[0]` and :math:`\mathrm{par}[1]\leq 1000000`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`; :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the gamma distribution, :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1] > 0`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the Normal distribution, :math:`\mathrm{par}[1] > 0`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{estima} = \text{‘S'}` and :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`; :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the uniform distribution, :math:`\mathrm{par}[0] < \mathrm{par}[1]`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘GP'}` and :math:`\mathrm{estima} = \text{‘E'}`.
The parameter estimates are invalid; the data may not be from the generalized Pareto distribution.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘E'} \text{ or } \text{‘P'}` and all observations are zero.
Constraint: at least one :math:`\mathrm{x}[\textit{i}-1] > 0`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘BI'}` and all observations are zero or :math:`m`.
Constraint: at least one :math:`0.0 < \mathrm{x}[\textit{i}-1] < \mathrm{par}[0]`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘G'}, \text{‘E'}, \text{‘P'}, \text{‘NB'} \text{ or } \text{‘GP'}` and at least one observation is negative.
Constraint: :math:`\mathrm{x}[\textit{i}-1]\geq 0`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘BI'}` and at least one observation is illegal.
Constraint: :math:`0\leq \mathrm{x}[\textit{i}-1]\leq \mathrm{par}[0]`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘BE'}` and at least one observation is illegal.
Constraint: :math:`0\leq \mathrm{x}[\textit{i}-1]\leq 1`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{dist} = \text{‘U'}` and at least one observation is illegal.
Constraint: :math:`\mathrm{par}[0]\leq \mathrm{x}[\textit{i}-1]\leq \mathrm{par}[1]`, for :math:`\textit{i} = 1,2,\ldots,n`.
(`errno` :math:`7`)
On entry, :math:`\mathrm{dist} = \text{‘U'}, \text{‘N'}, \text{‘G'}, \text{‘BE'} \text{ or } \text{‘GP'}`, :math:`\mathrm{estima} = \text{‘E'}` and the whole sample is constant. Thus the variance is zero.
(`errno` :math:`8`)
On entry, :math:`\mathrm{dist} = \text{‘NB'}`, :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
The variance :math:`\mathrm{par}[0]\times \left(1-\mathrm{par}[1]\right)/\left(\mathrm{par}[1]\times \mathrm{par}[1]\right)` exceeds 1000000.
(`errno` :math:`8`)
On entry, :math:`\mathrm{dist} = \text{‘BI'}`, :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
The variance :math:`\mathrm{par}[0]\times \mathrm{par}[1]\times \left(1-\mathrm{par}[1]\right)` exceeds 1000000.
(`errno` :math:`9`)
On entry, :math:`\mathrm{dist} = \text{‘G'}` and in the computation of the incomplete gamma function by :meth:`specfun.gamma_incomplete <naginterfaces.library.specfun.gamma_incomplete>` the convergence of the Taylor series or Legendre continued fraction fails within :math:`600` iterations.
.. _g08cb-py2-py-notes:
**Notes**
The data consist of a single sample of :math:`n` observations denoted by :math:`x_1,x_2,\ldots,x_n`.
Let :math:`S_n\left(x_{\left(i\right)}\right)` and :math:`F_0\left(x_{\left(i\right)}\right)` represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point :math:`x_{\left(i\right)}` where :math:`x_{\left(i\right)}` is the :math:`i`\ th smallest sample observation.
The Kolmogorov--Smirnov test provides a test of the null hypothesis :math:`H_0`: the data are a random sample of observations from a theoretical distribution specified by you against one of the following alternative hypotheses:
(i) :math:`H_1`: the data cannot be considered to be a random sample from the specified null distribution.
(#) :math:`H_2`: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function :math:`S_n\left(x\right)` tended to exceed the corresponding values of the theoretical cumulative distribution function :math:`F_0\left(x\right)`.
(#) :math:`H_3`: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function :math:`F_0\left(x\right)` tended to exceed the corresponding values of the sample cumulative distribution function :math:`S_n\left(x\right)`.
One of the following test statistics is computed depending on the particular alternative null hypothesis specified (see the description of the argument :math:`\mathrm{ntype}` in :ref:`Parameters <g08cb-py2-py-parameters>`).
For the alternative hypothesis :math:`H_1`.
:math:`D_n` -- the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally :math:`D_n = \mathrm{max}\left\{D_n^+, D_n^-\right\}`.
For the alternative hypothesis :math:`H_2`.
:math:`D_n^+` -- the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally :math:`D_n^+ = \mathrm{max}\left\{{S_n\left(x_{\left(i\right)}\right)-F_0\left(x_{\left(i\right)}\right)}, 0\right\}` for both discrete and continuous null distributions.
For the alternative hypothesis :math:`H_3`.
:math:`D_n^-` -- the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally if the null distribution is discrete then :math:`D_n^- = \mathrm{max}\left\{{F_0\left(x_{\left(i\right)}\right)-S_n\left(x_{\left(i\right)}\right)}, 0\right\}` and if the null distribution is continuous then :math:`D_n^- = \mathrm{max}\left\{{F_0\left(x_{\left(i\right)}\right)-S_n\left(x_{\left(i-1\right)}\right)}, 0\right\}`.
The standardized statistic :math:`Z = D\times \sqrt{n}` is also computed where :math:`D` may be :math:`D_n,D_n^+` or :math:`D_n^-` depending on the choice of the alternative hypothesis.
This is the standardized value of :math:`D` with no correction for continuity applied and the distribution of :math:`Z` converges asymptotically to a limiting distribution, first derived by Kolmogorov (1933), and then tabulated by Smirnov (1948).
The asymptotic distributions for the one-sided statistics were obtained by Smirnov (1933).
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed.
If :math:`n\leq 100` an exact method given by Conover (1980), is used.
Note that the method used is only exact for continuous theoretical distributions and does not include Conover's modification for discrete distributions.
This method computes the one-sided probabilities.
The two-sided probabilities are estimated by doubling the one-sided probability.
This is a good estimate for small :math:`p`, that is :math:`p\leq 0.10`, but it becomes very poor for larger :math:`p`.
If :math:`n > 100` then :math:`p` is computed using the Kolmogorov--Smirnov limiting distributions, see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933), Smirnov (1933) and Smirnov (1948).
.. _g08cb-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Feller, W, 1948, `On the Kolmogorov--Smirnov limit theorems for empirical distributions`, Ann. Math. Statist. (19), 179--181
Kendall, M G and Stuart, A, 1973, `The Advanced Theory of Statistics (Volume 2)`, (3rd Edition), Griffin
Kolmogorov, A N, 1933, `Sulla determinazione empirica di una legge di distribuzione`, Giornale dell' Istituto Italiano degli Attuari (4), 83--91
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
Smirnov, N, 1933, `Estimate of deviation between empirical distribution functions in two independent samples`, Bull. Moscow Univ. (2(2)), 3--16
Smirnov, N, 1948, `Table for estimating the goodness of fit of empirical distributions`, Ann. Math. Statist. (19), 279--281
"""
raise NotImplementedError
[docs]def test_ks_1sample_user(x, cdf, ntype, data=None):
r"""
``test_ks_1sample_user`` performs the one sample Kolmogorov--Smirnov distribution test, using a user-specified distribution.
.. _g08cc-py2-py-doc:
For full information please refer to the NAG Library document for g08cc
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ccf.html
.. _g08cc-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(n\right)`
The sample observations, :math:`x_1,x_2,\ldots,x_n`.
**cdf** : callable retval = cdf(x, data=None)
:math:`\mathrm{cdf}` must return the value of the theoretical (null) cumulative distribution function for a given value of its argument.
**Parameters**
**x** : float
The argument for which :math:`\mathrm{cdf}` must be evaluated.
**data** : arbitrary, optional, modifiable in place
User-communication data for callback functions.
**Returns**
**retval** : float
The value of the theoretical (null) cumulative distribution function evaluated at :math:`\mathrm{x}`.
**ntype** : int
The statistic to be calculated, i.e., the choice of alternative hypothesis.
:math:`\mathrm{ntype} = 1`
Computes :math:`D_n`, to test :math:`H_0` against :math:`H_1`.
:math:`\mathrm{ntype} = 2`
Computes :math:`D_n^+`, to test :math:`H_0` against :math:`H_2`.
:math:`\mathrm{ntype} = 3`
Computes :math:`D_n^-`, to test :math:`H_0` against :math:`H_3`.
**data** : arbitrary, optional
User-communication data for callback functions.
**Returns**
**d** : float
The Kolmogorov--Smirnov test statistic (:math:`D_n`, :math:`D_n^+` or :math:`D_n^-` according to the value of :math:`\mathrm{ntype}`).
**z** : float
A standardized value, :math:`Z`, of the test statistic, :math:`D`, without the continuity correction applied.
**p** : float
The probability, :math:`p`, associated with the observed value of :math:`D`, where :math:`D` may :math:`D_n`, :math:`D_n^+` or :math:`D_n^-` depending on the value of :math:`\mathrm{ntype}` (see :ref:`Notes <g08cc-py2-py-notes>`).
.. _g08cc-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 1`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{ntype} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{ntype} = 1`, :math:`2` or :math:`3`.
(`errno` :math:`3`)
On entry, at :math:`x = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`F_0\left(x\right) = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`0.0\leq F_0\left(x\right)\leq 1`.
(`errno` :math:`4`)
On entry, at :math:`x = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`F_0\left(x\right) = \langle\mathit{\boldsymbol{value}}\rangle` and at :math:`y = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`F_0\left(y\right) = \langle\mathit{\boldsymbol{value}}\rangle`
Constraint: when :math:`x < y`, :math:`F_0\left(x\right)\leq F_0\left(y\right)`.
.. _g08cc-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
The data consists of a single sample of :math:`n` observations, denoted by :math:`x_1,x_2,\ldots,x_n`.
Let :math:`S_n\left(x_{\left(i\right)}\right)` and :math:`F_0\left(x_{\left(i\right)}\right)` represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point :math:`x_{\left(i\right)}`, where :math:`x_{\left(i\right)}` is the :math:`i`\ th smallest sample observation.
The Kolmogorov--Smirnov test provides a test of the null hypothesis :math:`H_0`: the data are a random sample of observations from a theoretical distribution specified by you (in :math:`\mathrm{cdf}`) against one of the following alternative hypotheses.
(i) :math:`H_1`: the data cannot be considered to be a random sample from the specified null distribution.
(#) :math:`H_2`: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function :math:`S_n\left(x\right)` tended to exceed the corresponding values of the theoretical cumulative distribution function :math:`F_{{0\left(x\right)}}`.
(#) :math:`H_3`: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function :math:`F_0\left(x\right)` tended to exceed the corresponding values of the sample cumulative distribution function :math:`S_n\left(x\right)`.
One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the argument :math:`\mathrm{ntype}` in :ref:`Parameters <g08cc-py2-py-parameters>`).
For the alternative hypothesis :math:`H_1`:
:math:`D_n` -- the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally :math:`D_n = \mathrm{max}\left\{D_n^+,D_n^-\right\}`.
For the alternative hypothesis :math:`H_2`:
:math:`D_n^+` -- the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally :math:`D_n^+ = \mathrm{max}\left\{S_n\left(x_{\left(i\right)}\right)-F_0\left(x_{\left(i\right)}\right),0\right\}`.
For the alternative hypothesis :math:`H_3`:
:math:`D_n^-` -- the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally :math:`D_n^- = \mathrm{max}\left\{F_0\left(x_{\left(i\right)}\right)-S_n\left(x_{\left(i-1\right)}\right),0\right\}`. This is only true for continuous distributions. See `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ccf.html#fcomments>`__ for comments on discrete distributions.
The standardized statistic, :math:`Z = D\times \sqrt{n}`, is also computed, where :math:`D` may be :math:`D_n,D_n^+` or :math:`D_n^-` depending on the choice of the alternative hypothesis.
This is the standardized value of :math:`D` with no continuity correction applied and the distribution of :math:`Z` converges asymptotically to a limiting distribution, first derived by Kolmogorov (1933), and then tabulated by Smirnov (1948).
The asymptotic distributions for the one-sided statistics were obtained by Smirnov (1933).
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed.
If :math:`n\leq 100`, an exact method given by Conover (1980) is used.
Note that the method used is only exact for continuous theoretical distributions and does not include Conover's modification for discrete distributions.
This method computes the one-sided probabilities.
The two-sided probabilities are estimated by doubling the one-sided probability.
This is a good estimate for small :math:`p`, that is :math:`p\leq 0.10`, but it becomes very poor for larger :math:`p`.
If :math:`n > 100` then :math:`p` is computed using the Kolmogorov--Smirnov limiting distributions; see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933), Smirnov (1933) and Smirnov (1948).
.. _g08cc-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Feller, W, 1948, `On the Kolmogorov--Smirnov limit theorems for empirical distributions`, Ann. Math. Statist. (19), 179--181
Kendall, M G and Stuart, A, 1973, `The Advanced Theory of Statistics (Volume 2)`, (3rd Edition), Griffin
Kolmogorov, A N, 1933, `Sulla determinazione empirica di una legge di distribuzione`, Giornale dell' Istituto Italiano degli Attuari (4), 83--91
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
Smirnov, N, 1933, `Estimate of deviation between empirical distribution functions in two independent samples`, Bull. Moscow Univ. (2(2)), 3--16
Smirnov, N, 1948, `Table for estimating the goodness of fit of empirical distributions`, Ann. Math. Statist. (19), 279--281
"""
raise NotImplementedError
[docs]def test_ks_2sample(x, y, ntype):
r"""
``test_ks_2sample`` performs the two sample Kolmogorov--Smirnov distribution test.
.. _g08cd-py2-py-doc:
For full information please refer to the NAG Library document for g08cd
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08cdf.html
.. _g08cd-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(\textit{n1}\right)`
The observations from the first sample, :math:`x_1,x_2,\ldots,x_{n_1}`.
**y** : float, array-like, shape :math:`\left(\textit{n2}\right)`
The observations from the second sample, :math:`y_1,y_2,\ldots,y_{n_2}`.
**ntype** : int
The statistic to be computed, i.e., the choice of alternative hypothesis.
:math:`\mathrm{ntype} = 1`
Computes :math:`D_{{n_1n_2}}`, to test against :math:`H_1`.
:math:`\mathrm{ntype} = 2`
Computes :math:`D_{{n_1n_2}}^+`, to test against :math:`H_2`.
:math:`\mathrm{ntype} = 3`
Computes :math:`D_{{n_1n_2}}^-`, to test against :math:`H_3`.
**Returns**
**d** : float
The Kolmogorov--Smirnov test statistic (:math:`D_{{n_1n_2}}`, :math:`D_{{n_1n_2}}^+` or :math:`D_{{n_1n_2}}^-` according to the value of :math:`\mathrm{ntype}`).
**z** : float
A standardized value, :math:`Z`, of the test statistic, :math:`D`, without any correction for continuity.
**p** : float
The tail probability associated with the observed value of :math:`D`, where :math:`D` may be :math:`D_{{n_1,n_2}},D_{{n_1,n_2}}^+` or :math:`D_{{n_1,n_2}}^-` depending on the value of :math:`\mathrm{ntype}` (see :ref:`Notes <g08cd-py2-py-notes>`).
**sx** : float, ndarray, shape :math:`\left(\textit{n1}\right)`
The observations from the first sample sorted in ascending order.
**sy** : float, ndarray, shape :math:`\left(\textit{n2}\right)`
The observations from the second sample sorted in ascending order.
.. _g08cd-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\textit{n2} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{n2}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\textit{n1} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{n1}\geq 1`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{ntype} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{ntype} = 1`, :math:`2` or :math:`3`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`3`)
The iterative process used in the approximation of the probability for large :math:`n_1` and :math:`n_2` did not converge. For the two sided test :math:`p = 1` is returned. For the one-sided test :math:`p = 0.5` is returned.
.. _g08cd-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
The data consists of two independent samples, one of size :math:`n_1`, denoted by :math:`x_1,x_2,\ldots,x_{n_1}`, and the other of size :math:`n_2` denoted by :math:`y_1,y_2,\ldots,y_{n_2}`.
Let :math:`F\left(x\right)` and :math:`G\left(x\right)` represent their respective, unknown, distribution functions.
Also let :math:`S_1\left(x\right)` and :math:`S_2\left(x\right)` denote the values of the sample cumulative distribution functions at the point :math:`x` for the two samples respectively.
The Kolmogorov--Smirnov test provides a test of the null hypothesis :math:`H_0`: :math:`F\left(x\right) = G\left(x\right)` against one of the following alternative hypotheses:
(i) :math:`H_1`: :math:`F\left(x\right)\neq G\left(x\right)`.
(#) :math:`H_2`: :math:`F\left(x\right) > G\left(x\right)`. This alternative hypothesis is sometimes stated as, 'The :math:`x`'s tend to be smaller than the :math:`y`'s', i.e., it would be demonstrated in practical terms if the values of :math:`S_1\left(x\right)` tended to exceed the corresponding values of :math:`S_2\left(x\right)`.
(#) :math:`H_3`: :math:`F\left(x\right) < G\left(x\right)`. This alternative hypothesis is sometimes stated as, 'The :math:`x`'s tend to be larger than the :math:`y`'s', i.e., it would be demonstrated in practical terms if the values of :math:`S_2\left(x\right)` tended to exceed the corresponding values of :math:`S_1\left(x\right)`.
One of the following test statistics is computed depending on the particular alternative null hypothesis specified (see the description of the argument :math:`\mathrm{ntype}` in :ref:`Parameters <g08cd-py2-py-parameters>`).
For the alternative hypothesis :math:`H_1`.
:math:`D_{{n_1,n_2}}` -- the largest absolute deviation between the two sample cumulative distribution functions.
For the alternative hypothesis :math:`H_2`.
:math:`D_{{n_1,n_2}}^+` -- the largest positive deviation between the sample cumulative distribution function of the first sample, :math:`S_1\left(x\right)`, and the sample cumulative distribution function of the second sample, :math:`S_2\left(x\right)`. Formally :math:`D_{{n_1,n_2}}^+ = \mathrm{max}\left\{S_1\left(x\right)-S_2\left(x\right),0\right\}`.
For the alternative hypothesis :math:`H_3`.
:math:`D_{{n_1,n_2}}^-` -- the largest positive deviation between the sample cumulative distribution function of the second sample, :math:`S_2\left(x\right)`, and the sample cumulative distribution function of the first sample, :math:`S_1\left(x\right)`. Formally :math:`D_{{n_1,n_2}}^- = \mathrm{max}\left\{S_2\left(x\right)-S_1\left(x\right),0\right\}`.
``test_ks_2sample`` also returns the standardized statistic :math:`Z = \sqrt{\frac{{n_1+n_2}}{{n_1n_2}}}\times D`, where :math:`D` may be :math:`D_{{n_1,n_2}}`, :math:`D_{{n_1,n_2}}^+` or :math:`D_{{n_1,n_2}}^-` depending on the choice of the alternative hypothesis.
The distribution of this statistic converges asymptotically to a distribution given by Smirnov as :math:`n_1` and :math:`n_2` increase; see Feller (1948), Kendall and Stuart (1973), Kim and Jenrich (1973), Smirnov (1933) and Smirnov (1948).
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed.
If :math:`\mathrm{max}\left(n_1, n_2\right)\leq 2500` and :math:`n_1n_2\leq 10000` then an exact method given by Kim and Jenrich (see Kim and Jenrich (1973)) is used.
Otherwise :math:`p` is computed using the approximations suggested by Kim and Jenrich (1973).
Note that the method used is only exact for continuous theoretical distributions.
This method computes the two-sided probability.
The one-sided probabilities are estimated by halving the two-sided probability.
This is a good estimate for small :math:`p`, that is :math:`p\leq 0.10`, but it becomes very poor for larger :math:`p`.
.. _g08cd-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Feller, W, 1948, `On the Kolmogorov--Smirnov limit theorems for empirical distributions`, Ann. Math. Statist. (19), 179--181
Kendall, M G and Stuart, A, 1973, `The Advanced Theory of Statistics (Volume 2)`, (3rd Edition), Griffin
Kim, P J and Jenrich, R I, 1973, `Tables of exact sampling distribution of the two sample Kolmogorov--Smirnov criterion` :math:`D_{{mn}}\left(m < n\right)`, Selected Tables in Mathematical Statistics (1), 80--129, American Mathematical Society
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
Smirnov, N, 1933, `Estimate of deviation between empirical distribution functions in two independent samples`, Bull. Moscow Univ. (2(2)), 3--16
Smirnov, N, 1948, `Table for estimating the goodness of fit of empirical distributions`, Ann. Math. Statist. (19), 279--281
"""
raise NotImplementedError
[docs]def test_chisq(ifreq, cb, dist, par, npest, prob):
r"""
``test_chisq`` computes the test statistic for the :math:`\chi^2` goodness-of-fit test for data with a chosen number of class intervals.
.. _g08cg-py2-py-doc:
For full information please refer to the NAG Library document for g08cg
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08cgf.html
.. _g08cg-py2-py-parameters:
**Parameters**
**ifreq** : int, array-like, shape :math:`\left(\textit{nclass}\right)`
:math:`\mathrm{ifreq}[\textit{i}]` must specify the frequency of the :math:`\textit{i}`\ th class, :math:`O_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**cb** : float, array-like, shape :math:`\left(\textit{nclass}-1\right)`
:math:`\mathrm{cb}[\textit{i}-1]` must specify the upper boundary value for the :math:`\textit{i}`\ th class, for :math:`\textit{i} = 1,2,\ldots,k-1`.
**dist** : str, length 1
Indicates for which distribution the test is to be carried out.
:math:`\mathrm{dist} = \texttt{'N'}`
The Normal distribution is used.
:math:`\mathrm{dist} = \texttt{'U'}`
The uniform distribution is used.
:math:`\mathrm{dist} = \texttt{'E'}`
The exponential distribution is used.
:math:`\mathrm{dist} = \texttt{'C'}`
The :math:`\chi^2`-distribution is used.
:math:`\mathrm{dist} = \texttt{'G'}`
The gamma distribution is used.
:math:`\mathrm{dist} = \texttt{'A'}`
You must supply the class probabilities in the array :math:`\mathrm{prob}`.
**par** : float, array-like, shape :math:`\left(2\right)`
Must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., :math:`\mathrm{dist} = \texttt{'A'}`) the array :math:`\mathrm{par}` is not referenced.
If a Normal distribution is used then :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the mean, :math:`\mu`, and the variance, :math:`\sigma^2`, respectively.
If a uniform distribution is used then :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the boundaries :math:`a` and :math:`b` respectively.
If an exponential distribution is used then :math:`\mathrm{par}[0]` must contain the parameter :math:`\lambda`. :math:`\mathrm{par}[1]` is not used.
If a :math:`\chi^2`-distribution is used then :math:`\mathrm{par}[0]` must contain the number of degrees of freedom. :math:`\mathrm{par}[1]` is not used.
If a gamma distribution is used :math:`\mathrm{par}[0]` and :math:`\mathrm{par}[1]` must contain the parameters :math:`\alpha` and :math:`\beta` respectively.
**npest** : int
The number of estimated parameters of the distribution.
**prob** : float, array-like, shape :math:`\left(\textit{nclass}\right)`
If you are supplying the probability distribution (i.e., :math:`\mathrm{dist} = \texttt{'A'}`) then :math:`\mathrm{prob}[i]` must contain the probability that :math:`X` lies in the :math:`i`\ th class.
If :math:`\mathrm{dist}\neq \texttt{'A'}`, :math:`\mathrm{prob}` is not referenced.
**Returns**
**chisq** : float
The test statistic, :math:`X^2`, for the :math:`\chi^2` goodness-of-fit test.
**p** : float
The upper tail probability from the :math:`\chi^2`-distribution associated with the test statistic, :math:`X^2`, and the number of degrees of freedom.
**ndf** : int
Contains :math:`\left(\textit{nclass}-1-\mathrm{npest}\right)`, the degrees of freedom associated with the test.
**efreq** : float, ndarray, shape :math:`\left(\textit{nclass}\right)`
:math:`\mathrm{efreq}[\textit{i}-1]` contains the expected frequency for the :math:`\textit{i}`\ th class, :math:`E_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**chisqi** : float, ndarray, shape :math:`\left(\textit{nclass}\right)`
:math:`\mathrm{chisqi}[\textit{i}-1]` contains the contribution from the :math:`\textit{i}`\ th class to the test statistic, that is, :math:`\left(O_{\textit{i}}-E_{\textit{i}}\right)^2/E_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
.. _g08cg-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\textit{nclass} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nclass} \geq 2`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{dist} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{dist} = \texttt{'N'}`, :math:`\texttt{'U'}`, :math:`\texttt{'E'}`, :math:`\texttt{'C'}`, :math:`\texttt{'G'}` or :math:`\texttt{'A'}`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{npest} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`0\leq \mathrm{npest} < \textit{nclass}-1`.
(`errno` :math:`4`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ifreq}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{ifreq}[i-1]\geq 0`.
(`errno` :math:`5`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{cb}[i-2] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{cb}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{cb}[i-2] < \mathrm{cb}[i-1]`.
(`errno` :math:`6`)
On entry, :math:`\mathrm{cb}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{cb}[0]\geq 0.0`.
(`errno` :math:`7`)
On entry, :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the gamma distribution, :math:`\mathrm{par}[0] > 0.0` and :math:`\mathrm{par}[1] > 0.0`.
(`errno` :math:`7`)
On entry, :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the :math:`\chi^2` distribution, :math:`\mathrm{par}[0] > 0.0`.
(`errno` :math:`7`)
On entry, :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the exponential distribution, :math:`\mathrm{par}[0] > 0.0`.
(`errno` :math:`7`)
On entry, :math:`\mathrm{par}[0] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the uniform distribution, :math:`\mathrm{par}[0] < \mathrm{par}[1]`, :math:`\mathrm{par}[0]\leq \mathrm{cb}[0]` and :math:`\mathrm{par}[1]\geq \mathrm{cb}[\textit{nclass}-2]`.
(`errno` :math:`7`)
On entry, :math:`\mathrm{par}[1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: for the Normal distribution, :math:`\mathrm{par}[1] > 0.0`.
(`errno` :math:`8`)
On entry, :math:`\sum_i\left(\mathrm{prob}[i]\right) = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\sum_i\left(\mathrm{prob}[i]\right) = 1.0`.
(`errno` :math:`8`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{prob}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{prob} > 0.0`
(`errno` :math:`9`)
An expected frequency equals zero, when the observed frequency was not.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`10`)
At least one class has an expected frequency less than :math:`1`.
(`errno` :math:`11`)
The solution has failed to converge whilst computing the expected values. The returned solution may be an adequate approximation.
.. _g08cg-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
The :math:`\chi^2` goodness-of-fit test performed by ``test_chisq`` is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size :math:`n`, denoted by :math:`x_1,x_2,\ldots,x_n`, drawn from a random variable :math:`X`, and that the data has been grouped into :math:`k` classes,
.. math::
\begin{array}{cc}x\leq c_1\text{,}&\\c_{{i-1}} < x\leq c_i\text{,}&i = 2,3,\ldots,k-1\text{,}\\x > c_{{k-1}}\text{,}&\end{array}
then the :math:`\chi^2` goodness-of-fit test statistic is defined by
.. math::
X^2 = \sum_{{i = 1}}^k\frac{\left(O_i-E_i\right)^2}{E_i}\text{,}
where :math:`O_i` is the observed frequency of the :math:`i`\ th class, and :math:`E_i` is the expected frequency of the :math:`i`\ th class.
The expected frequencies are computed as
.. math::
E_i = p_i\times n\text{,}
where :math:`p_i` is the probability that :math:`X` lies in the :math:`i`\ th class, that is
.. math::
\begin{array}{cc}p_1 = P\left(X\leq c_1\right)\text{,}&\\p_i = P\left(c_{{i-1}} < X\leq c_i\right)\text{,}&i = 2,3,\ldots,k-1\text{,}\\p_k = P\left(X > c_{{k-1}}\right)\text{.}&\end{array}
These probabilities are either taken from a common probability distribution or are supplied by you.
The available probability distributions within this function are:
Normal distribution with mean :math:`\mu`, variance :math:`\sigma^2`;
uniform distribution on the interval :math:`\left[a, b\right]`;
exponential distribution with probability density function :math:`\left(\mathrm{pdf}\right) = \lambda e^{{-\lambda x}}`;
:math:`\chi^2`-distribution with :math:`f` degrees of freedom; and
gamma distribution with :math:`\mathrm{pdf} = \frac{{x^{{\alpha -1}}e^{{-x/\beta }}}}{{\Gamma \left(\alpha \right)\beta^{\alpha }}}`.
You must supply the frequencies and classes.
Given a set of data and classes the frequencies may be calculated using :meth:`stat.frequency_table <naginterfaces.library.stat.frequency_table>`.
``test_chisq`` returns the :math:`\chi^2` test statistic, :math:`X^2`, together with its degrees of freedom and the upper tail probability from the :math:`\chi^2`-distribution associated with the test statistic.
Note that the use of the :math:`\chi^2`-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
.. _g08cg-py2-py-references:
**References**
Conover, W J, 1980, `Practical Nonparametric Statistics`, Wiley
Kendall, M G and Stuart, A, 1973, `The Advanced Theory of Statistics (Volume 2)`, (3rd Edition), Griffin
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def gofstat_anddar(issort, y):
r"""
``gofstat_anddar`` calculates the Anderson--Darling goodness-of-fit test statistic.
.. _g08ch-py2-py-doc:
For full information please refer to the NAG Library document for g08ch
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08chf.html
.. _g08ch-py2-py-parameters:
**Parameters**
**issort** : bool
Set :math:`\mathrm{issort} = \mathbf{True}` if the observations are sorted in ascending order; otherwise the function will sort the observations.
**y** : float, array-like, shape :math:`\left(n\right)`
:math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, the :math:`n` observations.
**Returns**
**gofs** : float
The Anderson--Darling goodness-of-fit test statistic.
**y** : float, ndarray, shape :math:`\left(n\right)`
If :math:`\mathrm{issort} = \mathbf{False}`, the data sorted in ascending order; otherwise the array is unchanged.
.. _g08ch-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n > 1`.
(`errno` :math:`3`)
:math:`\mathrm{issort} = \mathbf{True}` and the data in :math:`\mathrm{y}` is not sorted in ascending order.
(`errno` :math:`9`)
The data in :math:`\mathrm{y}` must lie in the interval :math:`\left(0, 1\right)`.
.. _g08ch-py2-py-notes:
**Notes**
Denote by :math:`A^2` the Anderson--Darling test statistic for :math:`n` observations :math:`y_1,y_2,\ldots,y_n` of a variable :math:`Y` assumed to be standard uniform and sorted in ascending order, then:
.. math::
A^2 = {-n}-S\text{;}
where:
.. math::
S = \sum_{1}^{n}{\frac{{2i-1}}{n}}\left[\mathrm{ln}\left(y_i\right)+\mathrm{ln}\left(1-y_{{n-i+1}}\right)\right]\text{.}
When observations of a random variable :math:`X` are non-uniformly distributed, the probability integral transformation (PIT):
.. math::
Y = F\left(X\right)\text{,}
where :math:`F` is the cumulative distribution function of the distribution of interest, yields a uniformly distributed random variable :math:`Y`.
The PIT is true only if all parameters of a distribution are known as opposed to estimated; otherwise it is an approximation.
.. _g08ch-py2-py-references:
**References**
Anderson, T W and Darling, D A, 1952, `Asymptotic theory of certain 'goodness-of-fit' criteria based on stochastic processes`, Annals of Mathematical Statistics (23), 193--212
"""
raise NotImplementedError
[docs]def gofstat_anddar_unif(issort, y):
r"""
``gofstat_anddar_unif`` calculates the Anderson--Darling goodness-of-fit test statistic and its probability for the case of standard uniformly distributed data.
.. _g08cj-py2-py-doc:
For full information please refer to the NAG Library document for g08cj
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08cjf.html
.. _g08cj-py2-py-parameters:
**Parameters**
**issort** : bool
Set :math:`\mathrm{issort} = \mathbf{True}` if the observations are sorted in ascending order; otherwise the function will sort the observations.
**y** : float, array-like, shape :math:`\left(n\right)`
:math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, the :math:`n` observations.
**Returns**
**y** : float, ndarray, shape :math:`\left(n\right)`
If :math:`\mathrm{issort} = \mathbf{False}`, the data sorted in ascending order; otherwise the array is unchanged.
**a2** : float
:math:`A^2`, the Anderson--Darling test statistic.
**p** : float
:math:`p`, the upper tail probability for :math:`A^2`.
.. _g08cj-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n > 1`.
(`errno` :math:`3`)
:math:`\mathrm{issort} = \mathbf{True}` and the data in :math:`\mathrm{y}` is not sorted in ascending order.
(`errno` :math:`9`)
The data in :math:`\mathrm{y}` must lie in the interval :math:`\left(0, 1\right)`.
.. _g08cj-py2-py-notes:
**Notes**
Calculates the Anderson--Darling test statistic :math:`A^2` (see :meth:`gofstat_anddar`) and its upper tail probability by using the approximation method of Marsaglia and Marsaglia (2004) for the case of uniformly distributed data.
.. _g08cj-py2-py-references:
**References**
Anderson, T W and Darling, D A, 1952, `Asymptotic theory of certain 'goodness-of-fit' criteria based on stochastic processes`, Annals of Mathematical Statistics (23), 193--212
Marsaglia, G and Marsaglia, J, 2004, `Evaluating the Anderson--Darling distribution`, J. Statist. Software (9(2))
"""
raise NotImplementedError
[docs]def gofstat_anddar_normal(issort, y):
r"""
``gofstat_anddar_normal`` calculates the Anderson--Darling goodness-of-fit test statistic and its probability for the case of a fully-unspecified Normal distribution.
.. _g08ck-py2-py-doc:
For full information please refer to the NAG Library document for g08ck
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ckf.html
.. _g08ck-py2-py-parameters:
**Parameters**
**issort** : bool
Set :math:`\mathrm{issort} = \mathbf{True}` if the observations are sorted in ascending order; otherwise the function will sort the observations.
**y** : float, array-like, shape :math:`\left(n\right)`
:math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, the :math:`n` observations.
**Returns**
**ybar** : float
The maximum likelihood estimate of mean.
**yvar** : float
The maximum likelihood estimate of variance.
**a2** : float
:math:`A^2`, the Anderson--Darling test statistic.
**aa2** : float
The adjusted :math:`A^2`.
**p** : float
:math:`p`, the upper tail probability for the adjusted :math:`A^2`.
.. _g08ck-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n > 1`.
(`errno` :math:`3`)
:math:`\mathrm{issort} = \mathbf{True}` and the data in :math:`\mathrm{y}` is not sorted in ascending order.
.. _g08ck-py2-py-notes:
**Notes**
Calculates the Anderson--Darling test statistic :math:`A^2` (see :meth:`gofstat_anddar`) and its upper tail probability for the small sample correction:
.. math::
\text{Adjusted }A^2 = A^2\left(1+0.75/n+{2.25/n^2}\right)\text{,}
for :math:`n` observations.
.. _g08ck-py2-py-references:
**References**
Anderson, T W and Darling, D A, 1952, `Asymptotic theory of certain 'goodness-of-fit' criteria based on stochastic processes`, Annals of Mathematical Statistics (23), 193--212
Stephens, M A and D'Agostino, R B, 1986, `Goodness-of-Fit Techniques`, Marcel Dekker, New York
"""
raise NotImplementedError
[docs]def gofstat_anddar_exp(issort, y):
r"""
``gofstat_anddar_exp`` calculates the Anderson--Darling goodness-of-fit test statistic and its probability for the case of an unspecified exponential distribution.
.. _g08cl-py2-py-doc:
For full information please refer to the NAG Library document for g08cl
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08clf.html
.. _g08cl-py2-py-parameters:
**Parameters**
**issort** : bool
Set :math:`\mathrm{issort} = \mathbf{True}` if the observations are sorted in ascending order; otherwise the function will sort the observations.
**y** : float, array-like, shape :math:`\left(n\right)`
:math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, the :math:`n` observations.
**Returns**
**ybar** : float
The maximum likelihood estimate of mean.
**a2** : float
:math:`A^2`, the Anderson--Darling test statistic.
**aa2** : float
The adjusted :math:`A^2`.
**p** : float
:math:`p`, the upper tail probability for the adjusted :math:`A^2`.
.. _g08cl-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n > 1`.
(`errno` :math:`3`)
:math:`\mathrm{issort} = \mathbf{True}` and the data in :math:`\mathrm{y}` is not sorted in ascending order.
(`errno` :math:`9`)
The data in :math:`\mathrm{y}` must be greater than zero.
.. _g08cl-py2-py-notes:
**Notes**
Calculates the Anderson--Darling test statistic :math:`A^2` (see :meth:`gofstat_anddar`) and its upper tail probability for the small sample correction:
.. math::
\text{Adjusted }A^2 = A^2\left(1+0.6/n\right)\text{,}
for :math:`n` observations.
.. _g08cl-py2-py-references:
**References**
Anderson, T W and Darling, D A, 1952, `Asymptotic theory of certain 'goodness-of-fit' criteria based on stochastic processes`, Annals of Mathematical Statistics (23), 193--212
Stephens, M A and D'Agostino, R B, 1986, `Goodness-of-Fit Techniques`, Marcel Dekker, New York
"""
raise NotImplementedError
[docs]def concordance_kendall(x):
r"""
``concordance_kendall`` calculates Kendall's coefficient of concordance on :math:`k` independent rankings of :math:`n` objects or individuals.
.. _g08da-py2-py-doc:
For full information please refer to the NAG Library document for g08da
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08daf.html
.. _g08da-py2-py-parameters:
**Parameters**
**x** : float, array-like, shape :math:`\left(k, n\right)`
:math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to the value :math:`x_{{\textit{i}\textit{j}}}` of object :math:`\textit{j}` in comparison :math:`\textit{i}`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,k`.
**Returns**
**w** : float
The value of Kendall's coefficient of concordance, :math:`W`.
**p** : float
The approximate significance, :math:`p`, of :math:`W`.
.. _g08da-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 2`.
(`errno` :math:`3`)
On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`k \geq 2`.
.. _g08da-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
Kendall's coefficient of concordance measures the degree of agreement between :math:`k` comparisons of :math:`n` objects, the scores in the :math:`i`\ th comparison being denoted by
.. math::
x_{{i1}},x_{{i2}},\ldots,x_{{in}}\text{.}
The hypothesis under test, :math:`H_0`, often called the null hypothesis, is that there is no agreement between the comparisons, and this is to be tested against the alternative hypothesis, :math:`H_1`, that there is some agreement.
The :math:`n` scores for each comparison are ranked, the rank :math:`r_{{ij}}` denoting the rank of object :math:`j` in comparison :math:`i`, and all ranks lying between :math:`1` and :math:`n`.
Average ranks are assigned to tied scores.
For each of the :math:`n` objects, the :math:`k` ranks are totalled, giving rank sums :math:`R_j`, for :math:`j = 1,2,\ldots,n`.
Under :math:`H_0`, all the :math:`R_j` would be approximately equal to the average rank sum :math:`k\left(n+1\right)/2`.
The total squared deviation of the :math:`R_j` from this average value is, therefore, a measure of the departure from :math:`H_0` exhibited by the data.
If there were complete agreement between the comparisons, the rank sums :math:`R_j` would have the values :math:`k,2k,\ldots,nk` (or some permutation thereof).
The total squared deviation of these values is :math:`k^2\left(n^3-n\right)/12`.
Kendall's coefficient of concordance is the ratio
.. math::
W = \frac{{\sum_{{j = 1}}^n{\left(R_j-\frac{1}{2}k\left(n+1\right)\right)}^2}}{{\frac{1}{12}k^2\left(n^3-n\right)}}
and lies between :math:`0` and :math:`1`, the value :math:`0` indicating complete disagreement, and :math:`1` indicating complete agreement.
If there are tied rankings within comparisons, :math:`W` is corrected by subtracting :math:`k\sum T` from the denominator, where :math:`T = \sum \left(t^3-t\right)/12`, each :math:`t` being the number of occurrences of each tied rank within a comparison, and the summation of :math:`T` being over all comparisons containing ties.
``concordance_kendall`` returns the value of :math:`W`, and also an approximation, :math:`p`, of the significance of the observed :math:`W`. (For :math:`n > 7,k\left(n-1\right)W` approximately follows a :math:`\chi_{{n-1}}^2` distribution, so large values of :math:`W` imply rejection of :math:`H_0`.) :math:`H_0` is rejected by a test of chosen size :math:`\alpha` if :math:`p < \alpha`.
If :math:`n\leq 7`, tables should be used to establish the significance of :math:`W` (e.g., Table R of Siegel (1956)).
.. _g08da-py2-py-references:
**References**
Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill
"""
raise NotImplementedError
[docs]def randtest_runs(cl, x, m, nruns, ncount, comm):
r"""
``randtest_runs`` performs a runs up (or a runs down) test on a sequence of observations.
.. _g08ea-py2-py-doc:
For full information please refer to the NAG Library document for g08ea
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08eaf.html
.. _g08ea-py2-py-parameters:
**Parameters**
**cl** : str, length 1
Must specify the type of call to ``randtest_runs``.
:math:`\mathrm{cl} = \texttt{'S'}`
This is the one and only call to ``randtest_runs`` (single call mode). All data are to be input at once. All test statistics are computed after the counting of runs is complete.
:math:`\mathrm{cl} = \texttt{'F'}`
This is the first call to the function. All initializations are carried out and the counting of runs begins. The final test statistics are not computed since further calls will be made to ``randtest_runs``.
:math:`\mathrm{cl} = \texttt{'I'}`
This is an intermediate call during which the counts of runs are updated. The final test statistics are not computed since further calls will be made to ``randtest_runs``.
:math:`\mathrm{cl} = \texttt{'L'}`
This is the last call to ``randtest_runs``. The test statistics are computed after the final counting of runs is completed.
**x** : float, array-like, shape :math:`\left(n\right)`
The sequence of observations.
**m** : int
The maximum number of runs to be sought. If :math:`\mathrm{m}\leq 0` then no limit is placed on the number of runs that are found.
:math:`\mathrm{m}` must not be changed between calls to ``randtest_runs``.
**nruns** : int
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'F'}`, :math:`\mathrm{nruns}` need not be set.
If :math:`\mathrm{cl} = \texttt{'I'}` or :math:`\texttt{'L'}`, :math:`\mathrm{nruns}` must contain the value returned by the previous call to ``randtest_runs``.
**ncount** : int, array-like, shape :math:`\left(\textit{maxr}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'F'}`, :math:`\mathrm{ncount}` need not be set.
If :math:`\mathrm{cl} = \texttt{'I'}` or :math:`\texttt{'L'}`, :math:`\mathrm{ncount}` must contain the values returned by the previous call to ``randtest_runs``.
**comm** : dict, communication object, modified in place
Communication structure.
`On initial entry`: need not be set.
**Returns**
**nruns** : int
The number of runs actually found.
**ncount** : int, ndarray, shape :math:`\left(\textit{maxr}\right)`
The counts of runs of the different lengths, :math:`c_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,r`.
**ex** : float, ndarray, shape :math:`\left(\textit{maxr}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}`, (i.e., if it is the final exit) then :math:`\mathrm{ex}` contains the expected values of the counts, :math:`e_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,r`.
Otherwise the elements of :math:`\mathrm{ex}` are not set.
**cov** : float, ndarray, shape :math:`\left(\textit{maxr}, \textit{maxr}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is the final exit) then :math:`\mathrm{cov}` contains the covariance matrix of the counts, :math:`\Sigma_c`.
Otherwise the elements of :math:`\mathrm{cov}` are not set.
**chi** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is the final exit), :math:`\mathrm{chi}` contains the approximate :math:`\chi^2` test statistic, :math:`X^2`.
Otherwise :math:`\mathrm{chi}` is not set.
**df** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is the final exit) then :math:`\mathrm{df}` contains the degrees of freedom of the :math:`\chi^2` statistic.
Otherwise :math:`\mathrm{df}` is not set.
**prob** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}`, (i.e., if it is the final exit) then :math:`\mathrm{prob}` contains the upper tail probability corresponding to the :math:`\chi^2` test statistic, i.e., the significance level.
Otherwise :math:`\mathrm{prob}` is not set.
.. _g08ea-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{cl} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\texttt{'F'}`, :math:`\texttt{'I'}` or :math:`\texttt{'L'}`.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`n\geq 3`, otherwise :math:`n\geq 1`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{m} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\mathrm{m}\leq n`.
(`errno` :math:`4`)
On entry, :math:`\textit{maxr} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\textit{maxr} < n`.
(`errno` :math:`4`)
On entry, :math:`\textit{maxr} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{maxr}\geq 1`.
(`errno` :math:`7`)
There is a tie in the sequence of observations.
(`errno` :math:`8`)
The total length of the runs found is less than :math:`\textit{maxr}`.
:math:`\textit{maxr} = \langle\mathit{\boldsymbol{value}}\rangle` whereas the total length of all runs is :math:`\langle\mathit{\boldsymbol{value}}\rangle`.
(`errno` :math:`9`)
The covariance matrix stored in :math:`\mathrm{cov}` is not positive definite, thus the approximate :math:`\chi^2` test statistic cannot be computed.
This may be because :math:`\textit{maxr}` is too large relative to the length of the full sequence.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`10`)
The number of runs requested were not found, only :math:`\langle\mathit{\boldsymbol{value}}\rangle` out of the requested :math:`\langle\mathit{\boldsymbol{value}}\rangle` where found.
All statistics are returned and may still be of use.
.. _g08ea-py2-py-notes:
**Notes**
Runs tests may be used to investigate for trends in a sequence of observations. ``randtest_runs`` computes statistics for the runs up test.
If the runs down test is desired then each observation must be multiplied by :math:`-1` before ``randtest_runs`` is called with the modified vector of observations. ``randtest_runs`` may be used in two different modes:
(i) a single call to ``randtest_runs`` which computes all test statistics after counting the runs;
(#) multiple calls to ``randtest_runs`` with the final test statistics only being computed in the last call.
The second mode is necessary if all the data do not fit into the memory.
See argument :math:`\mathrm{cl}` in :ref:`Parameters <g08ea-py2-py-parameters>` for details on how to invoke each mode.
A run up is a sequence of numbers in increasing order.
A run up ends at :math:`x_k` when :math:`x_k > x_{{k+1}}` and the new run then begins at :math:`x_{{k+1}}`. ``randtest_runs`` counts the number of runs up of different lengths.
Let :math:`c_{\textit{i}}` denote the number of runs of length :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,r-1`.
The number of runs of length :math:`r` or greater is then denoted by :math:`c_r`.
An unfinished run at the end of a sequence is not counted unless the sequence is part of an initial or intermediate call to ``randtest_runs`` (i.e., unless there is another call to ``randtest_runs`` to follow) in which case the unfinished run is used together with the beginning of the next sequence of numbers input to ``randtest_runs`` in the next call.
The following is a trivial example.
Suppose we called ``randtest_runs`` twice with the following two sequences:
(:math:`0.20` :math:`0.40` :math:`0.45` :math:`0.40` :math:`0.15` :math:`0.75` :math:`0.95` :math:`0.23`) and
(:math:`0.27` :math:`0.40` :math:`0.25` :math:`0.10` :math:`0.34` :math:`0.39` :math:`0.61` :math:`0.12`).
Then after the second call ``randtest_runs`` would have counted the runs up of the following lengths:
:math:`3`, :math:`1`, :math:`3`, :math:`3`, :math:`1`, and :math:`4`.
When the counting of runs is complete ``randtest_runs`` computes the expected values and covariances of the counts, :math:`c_i`.
For the details of the method used see Knuth (1981).
An approximate :math:`\chi^2` statistic with :math:`r` degrees of freedom is computed, where
.. math::
X^2 = \left(c-\mu_c\right)^\mathrm{T}\Sigma_c^{-1}\left(c-\mu_c\right)\text{,}
where
:math:`c` is the vector of counts, :math:`c_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,r`,
:math:`\mu_c` is the vector of expected values,
:math:`e_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,r`, where :math:`e_i` is the expected value for :math:`c_i` under the null hypothesis of randomness, and
:math:`\Sigma_c` is the covariance matrix of :math:`c` under the null hypothesis.
The use of the :math:`\chi^2`-distribution as an approximation to the exact distribution of the test statistic, :math:`X^2`, improves as the length of the sequence relative to :math:`m` increases and hence the expected value, :math:`e`, increases.
You may specify the total number of runs to be found.
If the specified number of runs is found before the end of a sequence ``randtest_runs`` will exit before counting any further runs.
The number of runs actually counted and used to compute the test statistic is returned via :math:`\mathrm{nruns}`.
.. _g08ea-py2-py-references:
**References**
Dagpunar, J, 1988, `Principles of Random Variate Generation`, Oxford University Press
Knuth, D E, 1981, `The Art of Computer Programming (Volume 2)`, (2nd Edition), Addison--Wesley
Morgan, B J T, 1984, `Elements of Simulation`, Chapman and Hall
Ripley, B D, 1987, `Stochastic Simulation`, Wiley
"""
raise NotImplementedError
[docs]def randtest_pairs(cl, x, lag, ncount, comm):
r"""
``randtest_pairs`` performs a pairs test on a sequence of observations in the interval :math:`\left[0, 1\right]`.
.. _g08eb-py2-py-doc:
For full information please refer to the NAG Library document for g08eb
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ebf.html
.. _g08eb-py2-py-parameters:
**Parameters**
**cl** : str, length 1
Indicates the type of call to ``randtest_pairs``.
:math:`\mathrm{cl} = \texttt{'S'}`
This is the one and only call to ``randtest_pairs`` (single call mode). All data are to be input at once. All test statistics are computed after the counting of pairs is complete.
:math:`\mathrm{cl} = \texttt{'F'}`
This is the first call to the function. All initializations are carried out and the counting of pairs begins. The final test statistics are not computed since further calls will be made to ``randtest_pairs``.
:math:`\mathrm{cl} = \texttt{'I'}`
This is an intermediate call during which the counts of pairs are updated. The final test statistics are not computed since further calls will be made to ``randtest_pairs``.
:math:`\mathrm{cl} = \texttt{'L'}`
This is the last call to ``randtest_pairs``. The test statistics are computed after the final counting of runs is complete.
**x** : float, array-like, shape :math:`\left(n\right)`
The sequence of observations.
**lag** : int
:math:`l`, the lag to be used in choosing pairs.
If :math:`\mathrm{lag} = 1`, then we consider the pairs :math:`\left({\mathrm{x}[\textit{i}-1]}, {\mathrm{x}[\textit{i}]}\right)`, for :math:`\textit{i} = 1,3,\ldots,n-1`, where :math:`n` is the number of observations.
If :math:`\mathrm{lag} > 1`, then we consider the pairs :math:`\left({\mathrm{x}[i-1]}, {\mathrm{x}[i+l-1]}\right)`, for :math:`i = 1,2,\ldots,l,2l+1,2l+2,\ldots,3l,4l+1,\ldots,n-l`, where :math:`n` is the number of observations.
:math:`\mathrm{lag}` must not be changed between calls to ``randtest_pairs``.
**ncount** : int, array-like, shape :math:`\left(\textit{msize}, \textit{msize}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'F'}`, :math:`\mathrm{ncount}` need not be set.
If :math:`\mathrm{cl} = \texttt{'I'}` or :math:`\texttt{'L'}`, :math:`\mathrm{ncount}` must contain the values returned by the previous call to ``randtest_pairs``.
**comm** : dict, communication object, modified in place
Communication structure.
`On initial entry`: need not be set.
**Returns**
**ncount** : int, ndarray, shape :math:`\left(\textit{msize}, \textit{msize}\right)`
Is an :math:`\textit{msize}` by :math:`\textit{msize}` matrix containing the counts of the number of pairs in each cell, :math:`c_{{ij}}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,m`.
**ex** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{ex}` contains the expected number of counts in each cell, :math:`e`.
Otherwise :math:`\mathrm{ex}` is not set.
**chi** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{chi}` contains the :math:`\chi^2` test statistic, :math:`X^2`, for testing the null hypothesis of randomness.
Otherwise :math:`\mathrm{chi}` is not set.
**df** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{df}` contains the degrees of freedom for the :math:`\chi^2` statistic.
Otherwise :math:`\mathrm{df}` is not set.
**prob** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{prob}` contains the upper tail probability associated with the :math:`\chi^2` test statistic, i.e., the significance level.
Otherwise :math:`\mathrm{prob}` is not set.
.. _g08eb-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{cl} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\texttt{'F'}`, :math:`\texttt{'I'}` or :math:`\texttt{'L'}`.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`n\geq 2`, otherwise :math:`n\geq 1`.
(`errno` :math:`3`)
On entry, :math:`\textit{msize} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{msize}\geq 2`
(`errno` :math:`4`)
On entry, :math:`\mathrm{lag} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{lag} > 0` and if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\mathrm{lag} < n`.
(`errno` :math:`6`)
On entry, at least one element of :math:`\mathrm{x}` is out of range.
Constraint: :math:`0\leq \mathrm{x}[i-1]\leq 1`, for :math:`i = 1,2,\ldots,n`.
(`errno` :math:`7`)
No pairs were found. This will occur if the value of :math:`\mathrm{lag}` is greater than or equal to the total number of observations.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`8`)
:math:`\textit{msize}` is too large relative to the number of pairs, therefore, the expected value for at least one cell is less than or equal to :math:`5.0`.
This implies that the :math:`\chi^2` distribution may not be a very good approximation to the distribution of test statistic.
:math:`\textit{msize} = \langle\mathit{\boldsymbol{value}}\rangle`, number of pairs :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle` and expected value :math:`= \langle\mathit{\boldsymbol{value}}\rangle`.
All statistics are returned and may still be of use.
.. _g08eb-py2-py-notes:
**Notes**
``randtest_pairs`` computes the statistics for performing a pairs test which may be used to investigate deviations from randomness in a sequence, :math:`x = \left\{x_i:i = 1,2,\ldots,n\right\}`, of :math:`\left[0, 1\right]` observations.
For a given lag, :math:`l\geq 1`, an :math:`m\times m` matrix, :math:`C`, of counts is formed as follows.
The element :math:`c_{{jk}}` of :math:`C` is the number of pairs :math:`\left(x_i, x_{{i+l}}\right)` such that
.. math::
\frac{{j-1}}{m}\leq x_i < \frac{j}{m}
.. math::
\frac{{k-1}}{m}\leq x_{{i+l}} < \frac{k}{m}
where :math:`i = 1,3,5,\ldots,n-1` if :math:`l = 1`, and :math:`i = 1,2,\ldots,l,2l+1,2l+2,\ldots 3l,4l+1,\ldots,n-l`, if :math:`l > 1`.
Note that all pairs formed are non-overlapping pairs and are thus independent under the assumption of randomness.
Under the assumption that the sequence is random, the expected number of pairs for each class (i.e., each element of the matrix of counts) is the same; that is, the pairs should be uniformly distributed over the unit square :math:`\left[0, 1\right]^2`.
Thus the expected number of pairs for each class is just the total number of pairs, :math:`\sum_{{j,k = 1}}^mc_{{jk}}`, divided by the number of classes, :math:`m^2`.
The :math:`\chi^2` test statistic used to test the hypothesis of randomness is defined as
.. math::
X^2 = \sum_{{j,k = 1}}^m\frac{\left(c_{{jk}}-e\right)^2}{e}\text{,}
where :math:`e = \sum_{{j,k = 1}}^mc_{{jk}}/m^2 = \text{}` expected number of pairs in each class.
The use of the :math:`\chi^2`-distribution as an approximation to the exact distribution of the test statistic, :math:`X^2`, improves as the length of the sequence relative to :math:`m` increases and hence the expected value, :math:`e`, increases.
``randtest_pairs`` may be used in two different modes:
(i) a single call to ``randtest_pairs`` which computes all test statistics after counting the pairs;
(#) multiple calls to ``randtest_pairs`` with the final test statistics only being computed in the last call.
The second mode is necessary if all the data do not fit into the memory.
See argument :math:`\mathrm{cl}` in :ref:`Parameters <g08eb-py2-py-parameters>` for details on how to invoke each mode.
.. _g08eb-py2-py-references:
**References**
Dagpunar, J, 1988, `Principles of Random Variate Generation`, Oxford University Press
Knuth, D E, 1981, `The Art of Computer Programming (Volume 2)`, (2nd Edition), Addison--Wesley
Morgan, B J T, 1984, `Elements of Simulation`, Chapman and Hall
Ripley, B D, 1987, `Stochastic Simulation`, Wiley
"""
raise NotImplementedError
[docs]def randtest_triplets(cl, x, ncount, comm):
r"""
``randtest_triplets`` performs the triplets test on a sequence of observations from the interval :math:`\left[0, 1\right]`.
.. _g08ec-py2-py-doc:
For full information please refer to the NAG Library document for g08ec
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08ecf.html
.. _g08ec-py2-py-parameters:
**Parameters**
**cl** : str, length 1
Indicates the type of call to ``randtest_triplets``.
:math:`\mathrm{cl} = \texttt{'S'}`
This is the one and only call to ``randtest_triplets`` (single call mode). All data are to be input at once. All test statistics are computed after counting of the triplets is complete.
:math:`\mathrm{cl} = \texttt{'F'}`
This is the first call to the function. All initializations are carried out and the counting of triplets begins. The final test statistics are not computed since further calls will be made to ``randtest_triplets``.
:math:`\mathrm{cl} = \texttt{'I'}`
This is an intermediate call during which counts of the triplets are updated. The final test statistics are not computed since further calls will be made to ``randtest_triplets``.
:math:`\mathrm{cl} = \texttt{'L'}`
This is the last call to ``randtest_triplets``. The test statistics are computed after the final counting of the triplets is complete.
**x** : float, array-like, shape :math:`\left(n\right)`
The sequence of observations.
**ncount** : int, array-like, shape :math:`\left(\textit{msize}, \textit{msize}, \textit{msize}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'F'}`, :math:`\mathrm{ncount}` need not be set.
If :math:`\mathrm{cl} = \texttt{'I'}` or :math:`\texttt{'L'}`, :math:`\mathrm{ncount}` must contain the values returned by the previous call to ``randtest_triplets``.
**comm** : dict, communication object, modified in place
Communication structure.
`On initial entry`: need not be set.
**Returns**
**ncount** : int, ndarray, shape :math:`\left(\textit{msize}, \textit{msize}, \textit{msize}\right)`
Is an :math:`\textit{msize}` by :math:`\textit{msize}` by :math:`\textit{msize}` matrix containing the counts of the number of triplets, :math:`c_{{jkl}}`, for :math:`\textit{l} = 1,2,\ldots,m`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`.
**ex** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{ex}` contains the expected number of counts for each element of the count matrix.
Otherwise :math:`\mathrm{ex}` is not set.
**chi** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{chi}` contains the :math:`\chi^2` test statistic, :math:`X^2`, for testing the null hypothesis of randomness.
Otherwise :math:`\mathrm{chi}` is not set.
**df** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{df}` contains the degrees of freedom for the :math:`\chi^2` statistic.
Otherwise :math:`\mathrm{df}` is not set.
**prob** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{prob}` contains the upper tail probability associated with the :math:`\chi^2` test statistic, i.e., the significance level.
Otherwise :math:`\mathrm{prob}` is not set.
.. _g08ec-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{cl} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\texttt{'F'}`, :math:`\texttt{'I'}` or :math:`\texttt{'L'}`.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`n\geq 3`, otherwise :math:`n\geq 1`.
(`errno` :math:`3`)
On entry, :math:`\textit{msize} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{msize}\geq 2`.
(`errno` :math:`5`)
On entry, at least one element of :math:`\mathrm{x}` is out of range.
Constraint: :math:`0\leq \mathrm{x}[i-1]\leq 1`, for :math:`i = 1,2,\ldots,n`.
(`errno` :math:`6`)
No triplets were found because less than :math:`3` observations were provided in total.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`7`)
:math:`\textit{msize}` is too large relative to the number of triplets, therefore, the expected value for at least one cell is less than or equal to :math:`5.0`.
This implies that the :math:`\chi^2` distribution may not be a very good approximation to the distribution of the test statistic.
:math:`\textit{msize} = \langle\mathit{\boldsymbol{value}}\rangle`, number of triplets :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\text{expected value} = \langle\mathit{\boldsymbol{value}}\rangle`.
All statistics are returned and may still be of use.
.. _g08ec-py2-py-notes:
**Notes**
``randtest_triplets`` computes the statistics for performing a triplets test which may be used to investigate deviations from randomness in a sequence, :math:`x = \left\{x_i:i = 1,2,\ldots,n\right\}`, of :math:`\left[0, 1\right]` observations.
An :math:`m\times m` matrix, :math:`C`, of counts is formed as follows.
The element :math:`c_{\mathrm{jkl}}` of :math:`C` is the number of triplets :math:`\left(x_i, x_{{i+1}}, x_{{i+2}}\right)` for :math:`i = 1,4,7,\ldots,n-2`, such that
.. math::
\frac{{j-1}}{m}\leq x_i < \frac{j}{m}
.. math::
\frac{{k-1}}{m}\leq x_{{i+1}} < \frac{k}{m}
.. math::
\frac{{l-1}}{m}\leq x_{{i+2}} < \frac{l}{m}\text{.}
Note that all triplets formed are non-overlapping and are thus independent under the assumption of randomness.
Under the assumption that the sequence is random, the expected number of triplets for each class (i.e., each element of the count matrix) is the same; that is, the triplets should be uniformly distributed over the unit cube :math:`\left[0, 1\right]^3`.
Thus the expected number of triplets for each class is just the total number of triplets, :math:`\sum_{{j,k,l = 1}}^mc_{\mathrm{jkl}}`, divided by the number of classes, :math:`m^3`.
The :math:`\chi^2` test statistic used to test the hypothesis of randomness is defined as
.. math::
X^2 = \sum_{{j,k,l = 1}}^m\frac{\left(c_{\mathrm{jkl}}-e\right)^2}{e}\text{,}
where :math:`e = \sum_{{j,k,l = 1}}^mc_{\mathrm{jkl}}/m^3 = \text{}` expected number of triplets in each class.
The use of the :math:`\chi^2`-distribution as an approximation to the exact distribution of the test statistic, :math:`X^2`, improves as the length of the sequence relative to :math:`m` increases and hence the expected value, :math:`e`, increases.
``randtest_triplets`` may be used in two different modes:
(i) a single call to ``randtest_triplets`` which computes all test statistics after counting the triplets;
(#) multiple calls to ``randtest_triplets`` with the final test statistics only being computed in the last call.
The second mode is necessary if all the data do not fit into the memory.
See argument :math:`\mathrm{cl}` in :ref:`Parameters <g08ec-py2-py-parameters>` for details on how to invoke each mode.
.. _g08ec-py2-py-references:
**References**
Dagpunar, J, 1988, `Principles of Random Variate Generation`, Oxford University Press
Knuth, D E, 1981, `The Art of Computer Programming (Volume 2)`, (2nd Edition), Addison--Wesley
Morgan, B J T, 1984, `Elements of Simulation`, Chapman and Hall
Ripley, B D, 1987, `Stochastic Simulation`, Wiley
"""
raise NotImplementedError
[docs]def randtest_gaps(cl, x, m, rlo, rup, totlen, ngaps, ncount, comm):
r"""
``randtest_gaps`` performs a gaps test on a sequence of observations.
.. _g08ed-py2-py-doc:
For full information please refer to the NAG Library document for g08ed
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08edf.html
.. _g08ed-py2-py-parameters:
**Parameters**
**cl** : str, length 1
Indicates the type of call to ``randtest_gaps``.
:math:`\mathrm{cl} = \texttt{'S'}`
This is the one and only call to ``randtest_gaps`` (single call mode). All data are to be input at once. All test statistics are computed after the counting of gaps is complete.
:math:`\mathrm{cl} = \texttt{'F'}`
This is the first call to the function. All initializations are carried out before the counting of gaps begins. The final test statistics are not computed since further calls will be made to ``randtest_gaps``.
:math:`\mathrm{cl} = \texttt{'I'}`
This is an intermediate call during which the counts of gaps are updated. The final test statistics are not computed since further calls will be made to ``randtest_gaps``.
:math:`\mathrm{cl} = \texttt{'L'}`
This is the last call to ``randtest_gaps``. The test statistics are computed after the final counting of gaps is complete.
**x** : float, array-like, shape :math:`\left(n\right)`
The sequence of observations.
**m** : int
The maximum number of gaps to be sought. If :math:`\mathrm{m}\leq 0` then there is no limit placed on the number of gaps that are found.
:math:`\mathrm{m}` should not be changed between calls to ``randtest_gaps``.
**rlo** : float
The lower limit of the interval to be used to define the gaps, :math:`r_l`.
:math:`\mathrm{rlo}` must not be changed between calls to ``randtest_gaps``.
**rup** : float
The upper limit of the interval to be used to define the gaps, :math:`r_u`.
:math:`\mathrm{rup}` must not be changed between calls to ``randtest_gaps``.
**totlen** : float
The total length of the interval which contains all possible numbers that may arise in the sequence.
**ngaps** : int
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'F'}`, :math:`\mathrm{ngaps}` need not be set.
If :math:`\mathrm{cl} = \texttt{'I'}` or :math:`\texttt{'L'}`, :math:`\mathrm{ngaps}` must contain the value returned by the previous call to ``randtest_gaps``.
**ncount** : int, array-like, shape :math:`\left(\textit{maxg}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'F'}`, :math:`\mathrm{ncount}` need not be set.
If :math:`\mathrm{cl} = \texttt{'I'}` or :math:`\texttt{'L'}`, :math:`\mathrm{ncount}` must contain the values returned by the previous call to ``randtest_gaps``.
**comm** : dict, communication object, modified in place
Communication structure.
`On initial entry`: need not be set.
**Returns**
**ngaps** : int
The number of gaps actually found, :math:`\textit{ngaps}`.
**ncount** : int, ndarray, shape :math:`\left(\textit{maxg}\right)`
The counts of gaps of the different lengths, :math:`c_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**ex** : float, ndarray, shape :math:`\left(\textit{maxg}\right)`
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{ex}` contains the expected values of the counts.
Otherwise the elements of :math:`\mathrm{ex}` are not set.
**chi** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{chi}` contains the :math:`\chi^2` test statistic, :math:`X^2`, for testing the null hypothesis of randomness.
Otherwise :math:`\mathrm{chi}` is not set.
**df** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{df}` contains the degrees of freedom for the :math:`\chi^2` statistic.
Otherwise :math:`\mathrm{df}` is not set.
**prob** : float
If :math:`\mathrm{cl} = \texttt{'S'}` or :math:`\texttt{'L'}` (i.e., if it is a final exit) then :math:`\mathrm{prob}` contains the upper tail probability associated with the :math:`\chi^2` test statistic, i.e., the significance level.
Otherwise :math:`\mathrm{prob}` is not set.
.. _g08ed-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{cl} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\texttt{'F'}`, :math:`\texttt{'I'}` or :math:`\texttt{'L'}`.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n \geq 1`.
(`errno` :math:`3`)
On entry, :math:`\mathrm{m} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\mathrm{m}\leq n`.
(`errno` :math:`4`)
On entry, :math:`\textit{maxg} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: if :math:`\mathrm{cl} = \texttt{'S'}`, :math:`\textit{maxg}\leq n`.
(`errno` :math:`4`)
On entry, :math:`\textit{maxg} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{maxg} > 1`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{rlo} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{rup} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{totlen} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{rup}-\mathrm{rlo} < \mathrm{totlen}`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{totlen} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{totlen} > 0.0`.
(`errno` :math:`5`)
On entry, :math:`\mathrm{rlo} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{rup} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{rup} > \mathrm{rlo}`.
(`errno` :math:`6`)
No gaps were found. Try using a longer sequence, or increase the size of the interval :math:`\mathrm{rup}-\mathrm{rlo}`.
(`errno` :math:`7`)
The expected frequency in class :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` is zero. The value of :math:`\left(\mathrm{rup}-\mathrm{rlo}\right)/\mathrm{totlen}` may be too close to :math:`0.0` or :math:`1.0`. or :math:`\textit{maxg}` is too large relative to the number of gaps found.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`8`)
The number of gaps requested were not found, only :math:`\langle\mathit{\boldsymbol{value}}\rangle` out of the requested :math:`\langle\mathit{\boldsymbol{value}}\rangle` where found.
All statistics are returned and may still be of use.
(`errno` :math:`9`)
The expected frequency of at least one class is less than :math:`1`.
This implies that the :math:`\chi^2` may not be a very good approximation to the distribution of the test statistics.
All statistics are returned and may still be of use.
.. _g08ed-py2-py-notes:
**Notes**
Gaps tests are used to test for cyclical trend in a sequence of observations. ``randtest_gaps`` computes certain statistics for the gaps test.
``randtest_gaps`` may be used in two different modes:
(i) a single call to ``randtest_gaps`` which computes all test statistics after counting the gaps;
(#) multiple calls to ``randtest_gaps`` with the final test statistics only being computed in the last call.
The second mode is necessary if all the data does not fit into the memory.
See argument :math:`\mathrm{cl}` in :ref:`Parameters <g08ed-py2-py-parameters>` for details on how to invoke each mode.
The term gap is used to describe the distance between two numbers in the sequence that lie in the interval :math:`\left(r_l, r_u\right)`.
That is, a gap ends at :math:`x_j` if :math:`r_l\leq x_j\leq r_u`.
The next gap then begins at :math:`x_{{j+1}}`.
The interval :math:`\left(r_l, r_u\right)` should lie within the region of all possible numbers.
For example if the test is carried out on a sequence of :math:`\left(0, 1\right)` random numbers then the interval :math:`\left(r_l, r_u\right)` must be contained in the whole interval :math:`\left(0, 1\right)`.
Let :math:`t_{\text{len}}` be the length of the interval which specifies all possible numbers.
``randtest_gaps`` counts the number of gaps of different lengths.
Let :math:`c_{\textit{i}}` denote the number of gaps of length :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,k-1`.
The number of gaps of length :math:`k` or greater is then denoted by :math:`c_k`.
An unfinished gap at the end of a sequence is not counted unless the sequence is part of an initial or intermediate call to ``randtest_gaps`` (i.e., unless there is another call to ``randtest_gaps`` to follow) in which case the unfinished gap is used.
The following is a trivial example.
Suppose we called ``randtest_gaps`` twice (i.e., with :math:`\mathrm{cl} = \texttt{'F'}` and then with :math:`\mathrm{cl} = \texttt{'L'}`) with the following two sequences and with :math:`\mathrm{rlo} = 0.30` and :math:`\mathrm{rup} = 0.60`:
(:math:`0.20` :math:`0.40` :math:`0.45` :math:`0.40` :math:`0.15` :math:`0.75` :math:`0.95` :math:`0.23`) and
(:math:`0.27` :math:`0.40` :math:`0.25` :math:`0.10` :math:`0.34` :math:`0.39` :math:`0.61` :math:`0.12`).
Then after the second call ``randtest_gaps`` would have counted the gaps of the following lengths:
:math:`2`, :math:`1`, :math:`1`, :math:`6`, :math:`3` and :math:`1`.
When the counting of gaps is complete ``randtest_gaps`` computes the expected values of the counts.
An approximate :math:`\chi^2` statistic with :math:`k` degrees of freedom is computed where
.. math::
X^2 = \frac{{\sum_{{i = 1}}^k\left(c_i-e_i\right)^2}}{e_i}\text{,}
where
:math:`e_i = \textit{ngaps}\times p\times \left(1-p\right)^{{i-1}}`, if :math:`i < k`;
:math:`e_i = \textit{ngaps}\times \left(1-p\right)^{{i-1}}`, if :math:`i = k`;
:math:`\textit{ngaps} = \text{}` the number of gaps found and
:math:`p = \left(r_u-r_l\right)/t_{\text{len}}`.
The use of the :math:`\chi^2`-distribution as an approximation to the exact distribution of the test statistic improves as the expected values increase.
You may specify the total number of gaps to be found.
If the specified number of gaps is found before the end of a sequence ``randtest_gaps`` will exit before counting any further gaps.
.. _g08ed-py2-py-references:
**References**
Dagpunar, J, 1988, `Principles of Random Variate Generation`, Oxford University Press
Knuth, D E, 1981, `The Art of Computer Programming (Volume 2)`, (2nd Edition), Addison--Wesley
Morgan, B J T, 1984, `Elements of Simulation`, Chapman and Hall
Ripley, B D, 1987, `Stochastic Simulation`, Wiley
"""
raise NotImplementedError
[docs]def rank_regsn(nv, y, x, idist, nmax, tol):
r"""
``rank_regsn`` calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations.
.. _g08ra-py2-py-doc:
For full information please refer to the NAG Library document for g08ra
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08raf.html
.. _g08ra-py2-py-parameters:
**Parameters**
**nv** : int, array-like, shape :math:`\left(\textit{ns}\right)`
The number of observations in the :math:`\textit{i}`\ th sample, for :math:`\textit{i} = 1,2,\ldots,\textit{ns}`.
**y** : float, array-like, shape :math:`\left(\mathrm{sum}\left(\mathrm{nv}\right)\right)`
The observations in each sample. Specifically, :math:`\mathrm{y}[ {\sum_{{k = 1}}^{{i-1}}\mathrm{nv}[k-1]+j} -1]` must contain the :math:`j`\ th observation in the :math:`i`\ th sample.
**x** : float, array-like, shape :math:`\left(\textit{nsum}, \textit{ip}\right)`
The design matrices for each sample. Specifically, :math:`\mathrm{x}[ \sum_{{k = 1}}^{{i-1}} \mathrm{nv}[k-1] +j -1,l-1]` must contain the value of the :math:`l`\ th explanatory variable for the :math:`j`\ th observation in the :math:`i`\ th sample.
**idist** : int
The error distribution to be used in the analysis.
:math:`\mathrm{idist} = 1`
Normal.
:math:`\mathrm{idist} = 2`
Logistic.
:math:`\mathrm{idist} = 3`
Extreme value.
:math:`\mathrm{idist} = 4`
Double-exponential.
**nmax** : int
The value of the largest sample size.
**tol** : float
The tolerance for judging whether two observations are tied. Thus, observations :math:`Y_i` and :math:`Y_j` are adjudged to be tied if :math:`\left\lvert Y_i-Y_j\right\rvert < \mathrm{tol}`.
**Returns**
**prvr** : float, ndarray, shape :math:`\left(\textit{ip}+1, \textit{ip}\right)`
The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for :math:`1\leq i\leq j\leq \textit{ip}`, :math:`\mathrm{prvr}[i-1,j-1]` contains an estimate of the covariance between the :math:`i`\ th and :math:`j`\ th score statistics. For :math:`1\leq j\leq i\leq \textit{ip}-1`, :math:`\mathrm{prvr}[i,j-1]` contains an estimate of the covariance between the :math:`i`\ th and :math:`j`\ th parameter estimates.
**irank** : int, ndarray, shape :math:`\left(\mathrm{nmax}\right)`
For the one sample case, :math:`\mathrm{irank}` contains the ranks of the observations.
**zin** : float, ndarray, shape :math:`\left(\mathrm{nmax}\right)`
For the one sample case, :math:`\mathrm{zin}` contains the expected values of the function :math:`g\left(.\right)` of the order statistics.
**eta** : float, ndarray, shape :math:`\left(\mathrm{nmax}\right)`
For the one sample case, :math:`\mathrm{eta}` contains the expected values of the function :math:`g\prime \left(.\right)` of the order statistics.
**vapvec** : float, ndarray, shape :math:`\left(\mathrm{nmax}\times \left(\mathrm{nmax}+1\right)/2\right)`
For the one sample case, :math:`\mathrm{vapvec}` contains the upper triangle of the variance-covariance matrix of the function :math:`g\left(.\right)` of the order statistics stored column-wise.
**parest** : float, ndarray, shape :math:`\left(4\times \textit{ip}+1\right)`
The statistics calculated by the function.
The first :math:`\textit{ip}` components of :math:`\mathrm{parest}` contain the score statistics.
The next :math:`\textit{ip}` elements contain the parameter estimates.
:math:`\mathrm{parest}[2\times \textit{ip}]` contains the value of the :math:`\chi^2` statistic.
The next :math:`\textit{ip}` elements of :math:`\mathrm{parest}` contain the standard errors of the parameter estimates.
Finally, the remaining :math:`\textit{ip}` elements of :math:`\mathrm{parest}` contain the :math:`z`-statistics.
.. _g08ra-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\textit{max}_i\left(\mathrm{nv}[i]\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nmax} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{max}_i\left(\mathrm{nv}[i]\right) = \mathrm{nmax}`.
(`errno` :math:`1`)
On entry, :math:`\sum_i\left(\mathrm{nv}[i]\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nsum} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\sum_i\left(\mathrm{nv}[i]\right) = \textit{nsum}`.
(`errno` :math:`1`)
On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` elements of :math:`\mathrm{nv}\text{ are } < 1`.
Constraint: :math:`\mathrm{nv}[i]\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{nmax} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nmax} > \textit{ip}`.
(`errno` :math:`1`)
On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{ip}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tol} > 0.0`.
(`errno` :math:`1`)
On entry, :math:`\textit{ns} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{ns}\geq 1`.
(`errno` :math:`2`)
On entry, :math:`\mathrm{idist} = \langle\mathit{\boldsymbol{value}}\rangle`.
On entry, :math:`\mathrm{idist} = 1`, :math:`2`, :math:`3` or :math:`4`.
(`errno` :math:`3`)
On entry, all the observations were adjudged to be tied.
(`errno` :math:`4`)
The matrix :math:`X^\mathrm{T}\left(B-A\right)X` is either ill-conditioned or not positive definite.
(`errno` :math:`5`)
On entry, for :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{x}[i,j-1] = \langle\mathit{\boldsymbol{value}}\rangle` for all :math:`i`.
Constraint: :math:`\mathrm{x}[i,j-1]\neq \mathrm{x}[i+1,j-1]` for at least one :math:`i`.
.. _g08ra-py2-py-notes:
**Notes**
Analysis of data can be made by replacing observations by their ranks.
The analysis produces inference for regression parameters arising from the following model.
For random variables :math:`Y_1,Y_2,\ldots,Y_n` we assume that, after an arbitrary monotone increasing differentiable transformation, :math:`h\left(.\right)`, the model
.. math::
h\left(Y_i\right) = x_i^\mathrm{T}\beta +\epsilon_i
holds, where :math:`x_i` is a known vector of explanatory variables and :math:`\beta` is a vector of :math:`p` unknown regression coefficients.
The :math:`\epsilon_i` are random variables assumed to be independent and identically distributed with a completely known distribution which can be one of the following: Normal, logistic, extreme value or double-exponential.
In Pettitt (1982) an estimate for :math:`\beta` is proposed as :math:`\hat{\beta } = MX^\mathrm{T}a` with estimated variance-covariance matrix :math:`M`.
The statistics :math:`a` and :math:`M` depend on the ranks :math:`r_i` of the observations :math:`Y_i` and the density chosen for :math:`\epsilon_i`.
The matrix :math:`X` is the :math:`n\times p` matrix of explanatory variables.
It is assumed that :math:`X` is of rank :math:`p` and that a column or a linear combination of columns of :math:`X` is not equal to the column vector of :math:`1` or a multiple of it.
This means that a constant term cannot be included in the model `(1) <https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08raf.html#eqn1>`__.
The statistics :math:`a` and :math:`M` are found as follows.
Let :math:`\epsilon_i` have pdf :math:`f\left(\epsilon \right)` and let :math:`g = -f^{\prime }/f`.
Let :math:`W_1,W_2,\ldots,W_n` be order statistics for a random sample of size :math:`n` with the density :math:`f\left(.\right)`.
Define :math:`Z_i = g\left(W_i\right)`, then :math:`a_i = E\left(Z_{r_i}\right)`.
To define :math:`M` we need :math:`M^{-1} = X^\mathrm{T}\left(B-A\right)X`, where :math:`B` is an :math:`n\times n` diagonal matrix with :math:`B_{{ii}} = E\left(g^{\prime }\left(W_{r_i}\right)\right)` and :math:`A` is a symmetric matrix with :math:`A_{{ij}} = \mathrm{cov}\left(Z_{r_i},Z_{r_j}\right)`.
In the case of the Normal distribution, the :math:`Z_1 < \cdots < Z_n` are standard Normal order statistics and :math:`E\left(g^{\prime }\left(W_i\right)\right) = 1`, for :math:`i = 1,2,\ldots,n`.
The analysis can also deal with ties in the data.
Two observations are adjudged to be tied if :math:`\left\lvert Y_i-Y_j\right\rvert < \mathrm{tol}`, where :math:`\mathrm{tol}` is a user-supplied tolerance level.
Various statistics can be found from the analysis:
(a) The score statistic :math:`X^\mathrm{T}a`. This statistic is used to test the hypothesis :math:`H_0:\beta = 0`, see (e).
(#) The estimated variance-covariance matrix :math:`X^\mathrm{T}\left(B-A\right)X` of the score statistic in (a).
(#) The estimate :math:`\hat{\beta } = MX^\mathrm{T}a`.
(#) The estimated variance-covariance matrix :math:`M = {\left(X^\mathrm{T}\left(B-A\right)X\right)}^{-1}` of the estimate :math:`\hat{\beta }`.
(#) The :math:`\chi^2` statistic :math:`Q = \hat{\beta }^\mathrm{T}M^{-1}\hat{\beta } = a^\mathrm{T}X{\left(X^\mathrm{T}\left(B-A\right)X\right)}^{-1}X^\mathrm{T}a` used to test :math:`H_0:\beta = 0`. Under :math:`H_0`, :math:`Q` has an approximate :math:`\chi^2`-distribution with :math:`p` degrees of freedom.
(#) The standard errors :math:`M_{{ii}}^{{1/2}}` of the estimates given in (c).
(#) Approximate :math:`z`-statistics, i.e., :math:`Z_i = \hat{\beta }_i/se\left(\hat{\beta }_i\right)` for testing :math:`H_0:\beta_i = 0`. For :math:`i = 1,2,\ldots,n`, :math:`Z_i` has an approximate :math:`N\left(0, 1\right)` distribution.
In many situations, more than one sample of observations will be available.
In this case we assume the model
.. math::
h_k\left(Y_k\right) = X_k^\mathrm{T}\beta +e_k\text{, }\quad k = 1,2,\ldots,\textit{ns}\text{,}
where :math:`\textit{ns}` is the number of samples.
In an obvious manner, :math:`Y_k` and :math:`X_k` are the vector of observations and the design matrix for the :math:`k`\ th sample respectively.
Note that the arbitrary transformation :math:`h_k` can be assumed different for each sample since observations are ranked within the sample.
The earlier analysis can be extended to give a combined estimate of :math:`\beta` as :math:`\hat{\beta } = Dd`, where
.. math::
D^{-1} = \sum_{{k = 1}}^{\textit{ns}}X_k^\mathrm{T}\left(B_k-A_k\right)X_k
and
.. math::
d = \sum_{{k = 1}}^{\textit{ns}}X_k^\mathrm{T}a_k\text{,}
with :math:`a_k`, :math:`B_k` and :math:`A_k` defined as :math:`a`, :math:`B` and :math:`A` above but for the :math:`k`\ th sample.
The remaining statistics are calculated as for the one sample case.
.. _g08ra-py2-py-references:
**References**
Pettitt, A N, 1982, `Inference for the linear model using a likelihood based on ranks`, J. Roy. Statist. Soc. Ser. B (44), 234--243
"""
raise NotImplementedError
[docs]def rank_regsn_censored(nv, y, x, icen, gamma, nmax, tol):
r"""
``rank_regsn_censored`` calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations when some of the observations may be right-censored.
.. _g08rb-py2-py-doc:
For full information please refer to the NAG Library document for g08rb
https://support.nag.com/numeric/nl/nagdoc_30.1/flhtml/g08/g08rbf.html
.. _g08rb-py2-py-parameters:
**Parameters**
**nv** : int, array-like, shape :math:`\left(\textit{ns}\right)`
The number of observations in the :math:`\textit{i}`\ th sample, for :math:`\textit{i} = 1,2,\ldots,\textit{ns}`.
**y** : float, array-like, shape :math:`\left(\mathrm{sum}\left(\mathrm{nv}\right)\right)`
The observations in each sample. Specifically, :math:`\mathrm{y}[ {\sum_{{k = 1}}^{{i-1}}\mathrm{nv}[k-1]+j} -1]` must contain the :math:`j`\ th observation in the :math:`i`\ th sample.
**x** : float, array-like, shape :math:`\left(\textit{nsum}, \textit{ip}\right)`
The design matrices for each sample. Specifically, :math:`\mathrm{x}[ \sum_{{k = 1}}^{{i-1}} \mathrm{nv}[k-1] + j -1,l-1]` must contain the value of the :math:`l`\ th explanatory variable for the :math:`j`\ th observations in the :math:`i`\ th sample.
**icen** : int, array-like, shape :math:`\left(\mathrm{sum}\left(\mathrm{nv}\right)\right)`
Defines the censoring variable for the observations in :math:`\mathrm{y}`.
:math:`\mathrm{icen}[i-1] = 0`
If :math:`\mathrm{y}[i-1]` is uncensored.
:math:`\mathrm{icen}[i-1] = 1`
If :math:`\mathrm{y}[i-1]` is censored.
**gamma** : float
The value of the parameter defining the generalized logistic distribution. For :math:`\mathrm{gamma}\leq 0.0001`, the limiting extreme value distribution is assumed.
**nmax** : int
The value of the largest sample size.
**tol** : float
The tolerance for judging whether two observations are tied. Thus, observations :math:`Y_i` and :math:`Y_j` are adjudged to be tied if :math:`\left\lvert Y_i-Y_j\right\rvert < \mathrm{tol}`.
**Returns**
**prvr** : float, ndarray, shape :math:`\left(\textit{ip}+1, \textit{ip}\right)`
The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for :math:`1\leq i\leq j\leq \textit{ip}`, :math:`\mathrm{prvr}[i-1,j-1]` contains an estimate of the covariance between the :math:`i`\ th and :math:`j`\ th score statistics. For :math:`1\leq j\leq i\leq \textit{ip}-1`, :math:`\mathrm{prvr}[i,j-1]` contains an estimate of the covariance between the :math:`i`\ th and :math:`j`\ th parameter estimates.
**irank** : int, ndarray, shape :math:`\left(\mathrm{nmax}\right)`
For the one sample case, :math:`\mathrm{irank}` contains the ranks of the observations.
**zin** : float, ndarray, shape :math:`\left(\mathrm{nmax}\right)`
For the one sample case, :math:`\mathrm{zin}` contains the expected values of the function :math:`g\left(.\right)` of the order statistics.
**eta** : float, ndarray, shape :math:`\left(\mathrm{nmax}\right)`
For the one sample case, :math:`\mathrm{eta}` contains the expected values of the function :math:`g\prime \left(.\right)` of the order statistics.
**vapvec** : float, ndarray, shape :math:`\left(\mathrm{nmax}\times \left(\mathrm{nmax}+1\right)/2\right)`
For the one sample case, :math:`\mathrm{vapvec}` contains the upper triangle of the variance-covariance matrix of the function :math:`g\left(.\right)` of the order statistics stored column-wise.
**parest** : float, ndarray, shape :math:`\left(4\times \textit{ip}+1\right)`
The statistics calculated by the function.
The first :math:`\textit{ip}` components of :math:`\mathrm{parest}` contain the score statistics.
The next :math:`\textit{ip}` elements contain the parameter estimates.
:math:`\mathrm{parest}[2\times \textit{ip}]` contains the value of the :math:`\chi^2` statistic.
The next :math:`\textit{ip}` elements of :math:`\mathrm{parest}` contain the standard errors of the parameter estimates.
Finally, the remaining :math:`\textit{ip}` elements of :math:`\mathrm{parest}` contain the :math:`z`-statistics.
.. _g08rb-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{gamma} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{gamma}\geq 0.0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{nmax} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nmax} > \textit{ip}`.
(`errno` :math:`1`)
On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{ip}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tol} > 0.0`.
(`errno` :math:`1`)
On entry, :math:`\textit{ns} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{ns}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\textit{max}_i\left(\mathrm{nv}[i]\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nmax} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{max}_i\left(\mathrm{nv}[i]\right) = \mathrm{nmax}`.
(`errno` :math:`1`)
On entry, :math:`\sum_i\left(\mathrm{nv}[i]\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nsum} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\sum_i\left(\mathrm{nv}[i]\right) = \textit{nsum}`.
(`errno` :math:`1`)
On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` elements of :math:`\mathrm{nv}\text{ are } < 1`.
Constraint: :math:`\mathrm{nv}[i]\geq 1`.
(`errno` :math:`2`)
On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` elements of :math:`\mathrm{icen}` are out of range.
Constraint: :math:`\mathrm{icen}[i] = 0` or :math:`1`, for all :math:`i`.
(`errno` :math:`3`)
On entry, all the observations were adjudged to be tied.
(`errno` :math:`4`)
The matrix :math:`X^\mathrm{T}\left(B-A\right)X` is either ill-conditioned or not positive definite.
(`errno` :math:`5`)
On entry, for :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{x}[i,j-1] = \langle\mathit{\boldsymbol{value}}\rangle` for all :math:`i`.
Constraint: :math:`\mathrm{x}[i,j-1]\neq \mathrm{x}[i+1,j-1]` for at least one :math:`i`.
.. _g08rb-py2-py-notes:
**Notes**
Analysis of data can be made by replacing observations by their ranks.
The analysis produces inference for the regression model where the location parameters of the observations, :math:`\theta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, are related by :math:`\theta = X\beta`.
Here :math:`X` is an :math:`n\times p` matrix of explanatory variables and :math:`\beta` is a vector of :math:`p` unknown regression parameters.
The observations are replaced by their ranks and an approximation, based on Taylor's series expansion, made to the rank marginal likelihood.
For details of the approximation see Pettitt (1982).
An observation is said to be right-censored if we can only observe :math:`Y_j^*` with :math:`Y_j^*\leq Y_j`.
We rank censored and uncensored observations as follows.
Suppose we can observe :math:`Y_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,n`, directly but :math:`Y_j^*`, for :math:`\textit{j} = n+1,\ldots,q` and :math:`n\leq q`, are censored on the right.
We define the rank :math:`r_j` of :math:`Y_j`, for :math:`j = 1,2,\ldots,n`, in the usual way; :math:`r_j` equals :math:`i` if and only if :math:`Y_j` is the :math:`i`\ th smallest amongst the :math:`Y_1,Y_2,\ldots,Y_n`.
The right-censored :math:`Y_j^*`, for :math:`j = n+1,n+2,\ldots,q`, has rank :math:`r_j` if and only if :math:`Y_j^*` lies in the interval :math:`\left[{Y_{\left(r_j\right)}}, {Y_{\left(r_j+1\right)}}\right]`, with :math:`Y_0 = {-\infty }`, :math:`Y_{\left(n+1\right)} = {+\infty }` and :math:`Y_{\left(1\right)} < \cdots < Y_{\left(n\right)}` the ordered :math:`Y_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,n`.
The distribution of the :math:`Y` is assumed to be of the following form.
Let :math:`F_L\left(y\right) = e^y/\left(1+e^y\right)`, the logistic distribution function, and consider the distribution function :math:`F_{\gamma }\left(y\right)` defined by :math:`1-F_{\gamma } = {\left[1-F_L\left(y\right)\right]}^{{1/\gamma }}`.
This distribution function can be thought of as either the distribution function of the minimum, :math:`X_{{1,\gamma }}`, of a random sample of size :math:`\gamma^{-1}` from the logistic distribution, or as the :math:`F_{\gamma }\left(y-\log\left(\gamma \right)\right)` being the distribution function of a random variable having the :math:`F`-distribution with :math:`2` and :math:`2\gamma^{-1}` degrees of freedom.
This family of generalized logistic distribution functions :math:`\left[F_{\gamma }\left(.\right)\text{;}0\leq \gamma < \infty \right]` naturally links the symmetric logistic distribution :math:`\left(\gamma = 1\right)` with the skew extreme value distribution (:math:`\mathrm{lim}\left(\gamma \right)→0`) and with the limiting negative exponential distribution (:math:`\mathrm{lim}\left(\gamma \right)→\infty`).
For this family explicit results are available for right-censored data.
See Pettitt (1983) for details.
Let :math:`l_R` denote the logarithm of the rank marginal likelihood of the observations and define the :math:`q\times 1` vector :math:`a` by :math:`a = l_R^{\prime }\left(\theta = 0\right)`, and let the :math:`q\times q` diagonal matrix :math:`B` and :math:`q\times q` symmetric matrix :math:`A` be given by :math:`B-A = -l_R^{{\prime \prime }}\left(\theta = 0\right)`.
Then various statistics can be found from the analysis.
(a) The score statistic :math:`X^\mathrm{T}a`. This statistic is used to test the hypothesis :math:`H_0:\beta = 0` (see \(e)).
(#) The estimated variance-covariance matrix of the score statistic in \(a).
(#) The estimate :math:`\hat{\beta }_R = MX^\mathrm{T}a`.
(#) The estimated variance-covariance matrix :math:`M = {\left(X^\mathrm{T}\left(B-A\right)X\right)}^{-1}` of the estimate :math:`\hat{\beta }_R`.
(#) The :math:`\chi^2` statistic :math:`Q = \hat{\beta }_RM^{-1}\text{ }\hat{\beta }_r = a^\mathrm{T}X{\left(X^\mathrm{T}\left(B-A\right)X\right)}^{-1}X^\mathrm{T}a`, used to test :math:`H_0:\beta = 0`. Under :math:`H_0`, :math:`Q` has an approximate :math:`\chi^2`-distribution with :math:`p` degrees of freedom.
(#) The standard errors :math:`M_{{ii}}^{{1/2}}` of the estimates given in \(c).
(#) Approximate :math:`z`-statistics, i.e., :math:`Z_i = \hat{\beta }_{R_i}/se\left(\hat{\beta }_{R_i}\right)` for testing :math:`H_0:\beta_i = 0`. For :math:`i = 1,2,\ldots,n`, :math:`Z_i` has an approximate :math:`N\left(0, 1\right)` distribution.
In many situations, more than one sample of observations will be available.
In this case we assume the model,
.. math::
h_k\left(Y_k\right) = X_k^\mathrm{T}\beta +e_k\text{, }\quad k = 1,2,\ldots,\textit{ns}\text{,}
where :math:`\textit{ns}` is the number of samples.
In an obvious manner, :math:`Y_k` and :math:`X_k` are the vector of observations and the design matrix for the :math:`k`\ th sample respectively.
Note that the arbitrary transformation :math:`h_k` can be assumed different for each sample since observations are ranked within the sample.
The earlier analysis can be extended to give a combined estimate of :math:`\beta` as :math:`\hat{\beta } = Dd`, where
.. math::
D^{-1} = \sum_{{k = 1}}^{\textit{ns}}X^\mathrm{T}\left(B_k-A_k\right)X_k
and
.. math::
d = \sum_{{k = 1}}^{\textit{ns}}X_k^\mathrm{T}a_k\text{,}
with :math:`a_k`, :math:`B_k` and :math:`A_k` defined as :math:`a`, :math:`B` and :math:`A` above but for the :math:`k`\ th sample.
The remaining statistics are calculated as for the one sample case.
.. _g08rb-py2-py-references:
**References**
Kalbfleisch, J D and Prentice, R L, 1980, `The Statistical Analysis of Failure Time Data`, Wiley
Pettitt, A N, 1982, `Inference for the linear model using a likelihood based on ranks`, J. Roy. Statist. Soc. Ser. B (44), 234--243
Pettitt, A N, 1983, `Approximate methods using ranks for regression with censored data`, Biometrika (70), 121--132
"""
raise NotImplementedError