Source code for naginterfaces.library.contab

# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 30.3 `contab` Chapter.

``contab`` - Contingency Table Analysis

The functions in this module are for the analysis of discrete multivariate data.
One suite of functions computes tables while other functions are for the analysis of two-way contingency tables, conditional logistic models and one-factor analysis of binary data.

Functions in submodule :mod:`~naginterfaces.library.correg` may be used to fit generalized linear models to discrete data including binary data and contingency tables.

Functionality Index
-------------------

:math:`\chi^2` statistics for two-way contingency table: :meth:`chisq`

Conditional logistic model for stratified data: :meth:`condl_logistic`

Frequency count for :meth:`binary`: :meth:`binary_service`

Latent variable model for dichotomous data: :meth:`binary`

**Multiway tables from set of classification factors**

  marginal table from :meth:`tabulate_stat` or :meth:`tabulate_percentile`: :meth:`tabulate_margin`

  using given percentile/quantile: :meth:`tabulate_percentile`

  using selected statistic: :meth:`tabulate_stat`

For full information please refer to the NAG Library document

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11intro.html
"""

# NAG Copyright 2017-2024.

[docs]def chisq(nobs):
    r"""
    ``chisq`` computes :math:`\chi^2` statistics for a two-way contingency table.
    For a :math:`2\times 2` table with a small number of observations exact probabilities are computed.

    .. _g11aa-py2-py-doc:

    For full information please refer to the NAG Library document for g11aa

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11aaf.html

    .. _g11aa-py2-py-parameters:

    **Parameters**
        **nobs** : int, array-like, shape :math:`\left(\textit{nrow}, \textit{ncol}\right)`
            The contingency table :math:`\mathrm{nobs}[\textit{i}-1,\textit{j}-1]` must contain :math:`n_{{\textit{i}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,c`, for :math:`\textit{i} = 1,2,\ldots,r`.

    **Returns**
        **expt** : float, ndarray, shape :math:`\left(\textit{nrow}, \textit{ncol}\right)`
            The table of expected values. :math:`\mathrm{expt}[\textit{i}-1,\textit{j}-1]` contains :math:`f_{{\textit{i}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,c`, for :math:`\textit{i} = 1,2,\ldots,r`.

        **chist** : float, ndarray, shape :math:`\left(\textit{nrow}, \textit{ncol}\right)`
            The table of :math:`\chi^2` contributions. :math:`\mathrm{chist}[\textit{i}-1,\textit{j}-1]` contains :math:`\frac{\left(n_{{\textit{i}\textit{j}}}-f_{{\textit{i}\textit{j}}}\right)^2}{f_{{\textit{i}\textit{j}}}}`, for :math:`\textit{j} = 1,2,\ldots,c`, for :math:`\textit{i} = 1,2,\ldots,r`.

        **prob** : float
            If :math:`c = 2`, :math:`r = 2` and :math:`n\leq 40` then :math:`\mathrm{prob}` contains the two tail significance level for Fisher's exact test, otherwise :math:`\mathrm{prob}` contains the significance level from the Pearson :math:`\chi^2` statistic.

        **chi** : float
            The Pearson :math:`\chi^2` statistic.

        **g** : float
            The likelihood ratio test statistic.

        **df** : float
            The degrees of freedom for the statistics.

    .. _g11aa-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ncol} \geq 2`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{nrow} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{nrow} \geq 2`.

            (`errno` :math:`2`)
                On entry, all elements of :math:`\mathrm{nobs} = 0`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nobs}[i-1,j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{nobs}[i-1,j-1] \geq 0`.

            (`errno` :math:`3`)
                On entry, a :math:`2\times 2` table has a row or column with both values zero.

    **Warns**
        **NagAlgorithmicWarning**
            (`errno` :math:`4`)
                At least one cell has an expected frequency, :math:`f_{{ij}}\leq 0.5`.

    .. _g11aa-py2-py-notes:

    **Notes**
        `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`

        For a set of :math:`n` observations classified by two variables, with :math:`r` and :math:`c` levels respectively, a two-way table of frequencies with :math:`r` rows and :math:`c` columns can be computed.

        .. math::
            \begin{array}{ccccc}n_{{11}}&n_{{12}}&\ldots &n_{{1c}}&n_{{1.}}\\n_{{21}}&n_{{22}}&\ldots &n_{{2c}}&n_{{2.}}\\ \vdots & \vdots & \vdots & \vdots & \vdots \\n_{{r1}}&n_{{r2}}&\ldots &n_{{rc}}&n_{{r.}}\\n_{{.1}}&n_{{.2}}&\ldots &n_{{.c}}&n \end{array}

        To measure the association between the two classification variables two statistics that can be used are, the Pearson :math:`\chi^2` statistic, :math:`{\sum_{{i = 1}}^r\sum_{{j = 1}}^c}\frac{\left(n_{{ij}}-f_{{ij}}\right)^2}{f_{{ij}}}`, and the likelihood ratio test statistic, :math:`2{\sum_{{i = 1}}^r\sum_{{j = 1}}^c}n_{{ij}}\times \log\left(n_{{ij}}/f_{{ij}}\right)`, where :math:`f_{{ij}}` are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association.
        These values are the expected cell frequencies and are given by

        .. math::
            f_{{ij}} = n_{{i.}}n_{{.j}}/n\text{.}

        Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a :math:`\chi^2`-distribution with :math:`\left(c-1\right)\left(r-1\right)` degrees of freedom.
        This distribution is arrived at under the assumption that the expected cell frequencies, :math:`f_{{ij}}`, are not too small.
        For a discussion of this point see Everitt (1977).
        He concludes by saying, '... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of :math:`0.5` in the smallest cell'.

        In the case of the :math:`2\times 2` table, i.e., :math:`c = 2` and :math:`r = 2`, the :math:`\chi^2` approximation can be improved by using Yates' continuity correction factor.
        This decreases the absolute value of :math:`\left(n_{{ij}}-f_{{ij}}\right)` by :math:`\frac{1}{2}`.
        For :math:`2\times 2` tables with a small value of :math:`n` the exact probabilities from Fisher's test are computed.
        These are based on the hypergeometric distribution and are computed using :meth:`stat.prob_hypergeom <naginterfaces.library.stat.prob_hypergeom>`.
        A two tail probability is computed as :math:`\mathrm{min}\left(1, {2p_u}, {2p_l}\right)`, where :math:`p_u` and :math:`p_l` are the upper and lower one-tail probabilities from the hypergeometric distribution.

    .. _g11aa-py2-py-references:

    **References**
        Everitt, B S, 1977, `The Analysis of Contingency Tables`, Chapman and Hall

        Kendall, M G and Stuart, A, 1973, `The Advanced Theory of Statistics (Volume 2)`, (3rd Edition), Griffin
    """
    raise NotImplementedError

[docs]def tabulate_stat(stat, update, weight, isf, lfac, ifac, y, wt, table, ncells, icount, auxt):
    r"""
    ``tabulate_stat`` computes a table from a set of classification factors using a selected statistic.

    .. _g11ba-py2-py-doc:

    For full information please refer to the NAG Library document for g11ba

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11baf.html

    .. _g11ba-py2-py-parameters:

    **Parameters**
        **stat** : str, length 1
            Indicates which statistic is to be computed for the table cells.

            :math:`\mathrm{stat} = \texttt{'N'}`

                The number of observations for each cell.

            :math:`\mathrm{stat} = \texttt{'T'}`

                The total for the variable in :math:`\mathrm{y}` for each cell.

            :math:`\mathrm{stat} = \texttt{'A'}`

                The average (mean) for the variable in :math:`\mathrm{y}` for each cell.

            :math:`\mathrm{stat} = \texttt{'V'}`

                The variance for the variable in :math:`\mathrm{y}` for each cell.

            :math:`\mathrm{stat} = \texttt{'L'}`

                The largest value for the variable in :math:`\mathrm{y}` for each cell.

            :math:`\mathrm{stat} = \texttt{'S'}`

                The smallest value for the variable in :math:`\mathrm{y}` for each cell.

        **update** : str, length 1
            Indicates if an existing table is to be updated by further observation.

            :math:`\mathrm{update} = \texttt{'I'}`

                The table cells will be initialized to zero before tabulations take place.

            :math:`\mathrm{update} = \texttt{'U'}`

                The table input in :math:`\mathrm{table}` will be updated. The arguments :math:`\mathrm{ncells}`, :math:`\mathrm{table}`, :math:`\mathrm{icount}` and :math:`\mathrm{auxt}` must remain unchanged from the previous call to ``tabulate_stat``.

        **weight** : str, length 1
            Indicates if weights are to be used.

            :math:`\mathrm{weight} = \texttt{'U'}`

                Weights are not used and unit weights are assumed.

            :math:`\mathrm{weight} = \texttt{'W'}` or :math:`\texttt{'V'}`

                Weights are used and must be supplied in :math:`\mathrm{wt}`. The only difference between :math:`\mathrm{weight} = \texttt{'W'}` and :math:`\mathrm{weight} = \texttt{'V'}` is if the variance is computed.

            :math:`\mathrm{weight} = \texttt{'W'}`

                The divisor for the variance is the sum of the weights minus one and if :math:`\mathrm{weight} = \texttt{'V'}`, the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.

            If :math:`\mathrm{stat} = \texttt{'T'}` or :math:`\texttt{'A'}`, the weighted total or mean is computed respectively.

            If :math:`\mathrm{stat} = \texttt{'N'}`, :math:`\texttt{'L'}` or :math:`\texttt{'S'}`, the only effect of weights is to eliminate values with zero weights from the computations.

        **isf** : int, array-like, shape :math:`\left(\textit{nfac}\right)`
            Indicates which factors in :math:`\mathrm{ifac}` are to be used in the tabulation.

            If :math:`\mathrm{isf}[i-1] > 0` the :math:`i`\ th factor in :math:`\mathrm{ifac}` is included in the tabulation.

            Note that if :math:`\mathrm{isf}[\textit{i}-1]\leq 0`, for :math:`\textit{i} = 1,2,\ldots,\textit{nfac}` then the statistic for the whole sample is calculated and returned in a :math:`1\times 1` table.

        **lfac** : int, array-like, shape :math:`\left(\textit{nfac}\right)`
            The number of levels of the classifying factors in :math:`\mathrm{ifac}`.

        **ifac** : int, array-like, shape :math:`\left(n, \textit{nfac}\right)`
            The :math:`\textit{nfac}` coded classification factors for the :math:`\textit{n}` observations.

        **y** : float, array-like, shape :math:`\left(n\right)`
            The variable to be tabulated. If :math:`\mathrm{stat} = \texttt{'N'}`, :math:`\mathrm{y}` is not referenced.

        **wt** : float, array-like, shape :math:`\left(:\right)`
            Note: the required length for this argument is determined as follows: if :math:`\mathrm{weight}\text{ in } (\texttt{'W'}, \texttt{'V'})`: :math:`n`; otherwise: :math:`1`.

            If :math:`\mathrm{weight} = \texttt{'W'}` or :math:`\texttt{'V'}`, :math:`\mathrm{wt}` must contain the :math:`\textit{n}` weights. Otherwise :math:`\mathrm{wt}` is not referenced.

        **table** : float, array-like, shape :math:`\left(\textit{maxt}\right)`
            If :math:`\mathrm{update} = \texttt{'U'}`, :math:`\mathrm{table}` must be unchanged from the previous call to ``tabulate_stat``, otherwise :math:`\mathrm{table}` need not be set.

        **ncells** : int
            If :math:`\mathrm{update} = \texttt{'U'}`, :math:`\mathrm{ncells}` must be unchanged from the previous call to ``tabulate_stat``, otherwise :math:`\mathrm{ncells}` need not be set.

        **icount** : int, array-like, shape :math:`\left(\textit{maxt}\right)`
            If :math:`\mathrm{update} = \texttt{'U'}`, :math:`\mathrm{icount}` must be unchanged from the previous call to ``tabulate_stat``, otherwise :math:`\mathrm{icount}` need not be set.

        **auxt** : float, array-like, shape :math:`\left(:\right)`
            Note: the required length for this argument is determined as follows: if :math:`\mathrm{stat}=\texttt{'A'}`: :math:`\textit{maxt}`; if :math:`\mathrm{stat}=\texttt{'V'}`: :math:`{ 2\times \textit{maxt} }`; otherwise: :math:`0`.

            If :math:`\mathrm{update} = \texttt{'U'}`, :math:`\mathrm{auxt}` must be unchanged from the previous call to ``tabulate_stat``, otherwise :math:`\mathrm{auxt}` need not be set.

    **Returns**
        **table** : float, ndarray, shape :math:`\left(\textit{maxt}\right)`
            The computed table. The :math:`\mathrm{ncells}` cells of the table are stored so that for any two factors the index relating to the factor referred to later in :math:`\mathrm{lfac}` and :math:`\mathrm{ifac}` changes faster. For further details see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11baf.html#fcomments>`__.

        **ncells** : int
            The number of cells in the table.

        **ndim** : int
            The number of factors defining the table.

        **idim** : int, ndarray, shape :math:`\left(\textit{nfac}\right)`
            The first :math:`\mathrm{ndim}` elements contain the number of levels for the factors defining the table.

        **icount** : int, ndarray, shape :math:`\left(\textit{maxt}\right)`
            A table containing the number of observations contributing to each cell of the table, stored identically to :math:`\mathrm{table}`. Note if :math:`\mathrm{stat} = \texttt{'N'}` this is the same as is returned in :math:`\mathrm{table}`.

        **auxt** : float, ndarray, shape :math:`\left(:\right)`
            If :math:`\mathrm{stat} = \texttt{'A'}` or :math:`\texttt{'V'}`, the first :math:`\mathrm{ncells}` values hold the table containing the sum of the weights for the observations contributing to each cell, stored identically to :math:`\mathrm{table}`.

            If :math:`\mathrm{stat} = \texttt{'V'}`, the second set of :math:`\mathrm{ncells}` values hold the table of cell means.

            Otherwise :math:`\mathrm{auxt}` is not referenced.

    .. _g11ba-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`\mathrm{weight} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{weight} = \texttt{'U'}`, :math:`\texttt{'V'}` or :math:`\texttt{'W'}`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{update} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{update} = \texttt{'I'}` or :math:`\texttt{'U'}`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{stat} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{stat} = \texttt{'N'}`, :math:`\texttt{'T'}`, :math:`\texttt{'A'}`, :math:`\texttt{'V'}`, :math:`\texttt{'L'}` or :math:`\texttt{'S'}`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{nfac} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{nfac} \geq 1`.

            (`errno` :math:`1`)
                On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`n \geq 2`.

            (`errno` :math:`2`)
                On entry, :math:`\textit{maxt} = \langle\mathit{\boldsymbol{value}}\rangle` and minimum value for :math:`\textit{maxt} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{maxt}\geq \text{product}` of the levels of the factors included in the tabulation.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{wt}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{lfac}[j-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ifac}[i-1,j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ifac}[i-1,j-1]\leq \mathrm{lfac}[j-1]`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ifac}[i-1,j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ifac}[i-1,j-1] > 0`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{lfac}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                On entry, :math:`\mathrm{lfac}[i-1] > 2`.

            (`errno` :math:`3`)
                The variance divisor :math:`\text{}\leq 0.0`.

            (`errno` :math:`4`)
                :math:`\mathrm{auxt}` or :math:`\mathrm{table}` has changed between calls.

            (`errno` :math:`4`)
                :math:`\mathrm{auxt}` has changed between calls.

            (`errno` :math:`4`)
                :math:`\mathrm{ncells}` has changed between calls.

    .. _g11ba-py2-py-notes:

    **Notes**
        `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`

        A dataset may include both classification variables and general variables.
        The classification variables, known as factors, take a small number of values known as levels.
        For example, the factor sex would have the levels male and female.
        These can be coded as :math:`1` and :math:`2` respectively.
        Given several factors, a multi-way table can be constructed such that each cell of the table represents one level from each factor.
        For example, the two factors sex and habitat, habitat having three levels (inner-city, suburban and rural) define the :math:`2\times 3` contingency table

        [table omitted]

        For each cell statistics can be computed.
        If a third variable in the dataset was age, then for each cell the average age could be computed:

        [table omitted]

        That is the average age for all observations for males living in rural areas is :math:`35.6`.
        Other statistics can also be computed: the number of observations, the total, the variance, the largest value and the smallest value.

        ``tabulate_stat`` computes a table for one of the selected statistics.
        The factors have to be coded with levels :math:`1,2,\ldots \text{}`.
        Weights can be used to eliminate values from the calculations, e.g., if they represent 'missing values'.
        There is also the facility to update an existing table with the addition of new observations.

    .. _g11ba-py2-py-references:

    **References**
        John, J A and Quenouille, M H, 1977, `Experiments: Design and Analysis`, Griffin

        Kendall, M G and Stuart, A, 1969, `The Advanced Theory of Statistics (Volume 1)`, (3rd Edition), Griffin

        West, D H D, 1979, `Updating mean and variance estimates: An improved method`, Comm. ACM (22), 532--555
    """
    raise NotImplementedError

[docs]def tabulate_percentile(typ, isf, lfac, ifac, percnt, y, maxt, wt=None):
    r"""
    ``tabulate_percentile`` computes a table from a set of classification factors using a given percentile or quantile, for example the median.

    .. _g11bb-py2-py-doc:

    For full information please refer to the NAG Library document for g11bb

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11bbf.html

    .. _g11bb-py2-py-parameters:

    **Parameters**
        **typ** : str, length 1
            Indicates if the variable to be tabulated is discrete or continuous.

            :math:`\mathrm{typ} = \texttt{'D'}`

                The percentiles are computed for a discrete variable.

            :math:`\mathrm{typ} = \texttt{'C'}`

                The percentiles are computed for a continuous variable using linear interpolation.

        **isf** : int, array-like, shape :math:`\left(\textit{nfac}\right)`
            Indicates which factors in :math:`\mathrm{ifac}` are to be used in the tabulation.

            If :math:`\mathrm{isf}[i-1] > 0` the :math:`i`\ th factor in :math:`\mathrm{ifac}` is included in the tabulation.

            Note that if :math:`\mathrm{isf}[\textit{i}-1]\leq 0`, for :math:`\textit{i} = 1,2,\ldots,\textit{nfac}` then the statistic for the whole sample is calculated and returned in a :math:`1\times 1` table.

        **lfac** : int, array-like, shape :math:`\left(\textit{nfac}\right)`
            The number of levels of the classifying factors in :math:`\mathrm{ifac}`.

        **ifac** : int, array-like, shape :math:`\left(n, \textit{nfac}\right)`
            The :math:`\textit{nfac}` coded classification factors for the :math:`\textit{n}` observations.

        **percnt** : float
            :math:`p`, the percentile to be tabulated.

        **y** : float, array-like, shape :math:`\left(n\right)`
            The variable to be tabulated.

        **maxt** : int
            The maximum size of the table to be computed.

        **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional
            If :math:`\textit{weight} = \texttt{'W'}`, :math:`\mathrm{wt}` must contain the :math:`\textit{n}` weights. Otherwise :math:`\mathrm{wt}` is not referenced.

    **Returns**
        **table** : float, ndarray, shape :math:`\left(\mathrm{maxt}\right)`
            The computed table. The :math:`\mathrm{ncells}` cells of the table are stored so that for any two factors the index relating to the factor occurring later in :math:`\mathrm{lfac}` and :math:`\mathrm{ifac}` changes faster. For further details see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11bbf.html#fcomments>`__.

        **ncells** : int
            The number of cells in the table.

        **ndim** : int
            The number of factors defining the table.

        **idim** : int, ndarray, shape :math:`\left(\textit{nfac}\right)`
            The first :math:`\mathrm{ndim}` elements contain the number of levels for the factors defining the table.

        **icount** : int, ndarray, shape :math:`\left(\mathrm{maxt}\right)`
            A table containing the number of observations contributing to each cell of the table, stored identically to :math:`\mathrm{table}`.

    .. _g11bb-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`\mathrm{percnt} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`0.0 < p < 100.0`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{typ} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{typ} = \texttt{'D'}` or :math:`\texttt{'C'}`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{nfac} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{nfac} \geq 1`.

            (`errno` :math:`1`)
                On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`n \geq 2`.

            (`errno` :math:`2`)
                On entry, :math:`\mathrm{maxt} = \langle\mathit{\boldsymbol{value}}\rangle` and minimum value for :math:`\mathrm{maxt} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{maxt}\geq \text{product}` of the levels of the factors included in the tabulation.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{wt}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{lfac}[j-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ifac}[i-1,j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ifac}[i-1,j-1]\leq \mathrm{lfac}[j-1]`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ifac}[i-1,j-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ifac}[i-1,j-1] > 0`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{lfac}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                On entry, :math:`\mathrm{lfac}[i-1] > 2`.

            (`errno` :math:`3`)
                Some cells of the table are empty.

    .. _g11bb-py2-py-notes:

    **Notes**
        `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`

        A dataset may include both classification variables and general variables.
        The classification variables, known as factors, take a small number of values known as levels.
        For example, the factor sex would have the levels male and female.
        These can be coded as :math:`1` and :math:`2` respectively.
        Given several factors, a multi-way table can be constructed such that each cell of the table represents one level from each factor.
        For example, the two factors sex and habitat, habitat having three levels (inner-city, suburban and rural) define the :math:`2\times 3` contingency table

        [table omitted]

        For each cell statistics can be computed.
        If a third variable in the dataset was age then for each cell the median age could be computed:

        [table omitted]

        That is, the median age for all observations for males living in rural areas is :math:`37`, the median being the 50% quantile.
        Other quantiles can also be computed: the :math:`p` percent quantile or percentile, :math:`q_p`, is the estimate of the value such that :math:`p` percent of observations are less than :math:`q_p`.
        This is calculated in two different ways depending on whether the tabulated variable is continuous or discrete.
        Let there be :math:`m` values in a cell and let :math:`y_{\left(1\right)}`, :math:`y_{\left(2\right)},\ldots,y_{\left(m\right)}` be the values for that cell sorted into ascending order.
        Also, associated with each value there is a weight, :math:`w_{\left(1\right)}`, :math:`w_{\left(2\right)},\ldots,w_{\left(m\right)}`, which could represent the observed frequency for that value, with :math:`W_j = \sum_{{i = 1}}^jw_{\left(i\right)}` and :math:`W_j^{\prime } = \sum_{{i = 1}}^jw_{\left(i\right)}-\frac{1}{2}w_{\left(j\right)}`.
        For the :math:`p` percentile let :math:`p_w = \left(p/100\right)W_m` and :math:`p_w^{\prime } = \left(p/100\right)W_m^{\prime }`, then the percentiles for the two cases are as given below.

        If the variable is discrete, that is, it takes only a limited number of (usually integer) values, then the percentile is defined as

        .. math::
            \begin{array}{ll}y_{\left(j\right)}&\text{if }W_{{j-1}} < p_W < W_j\\&\\\frac{{y_{\left(j+1\right)}+y_{\left(j\right)}}}{2}&\text{if }p_w = W_j\text{.}\end{array}

        If the data is continuous then the quantiles are estimated by linear interpolation.

        .. math::
            \begin{array}{ll}y_{\left(1\right)}&\text{if } p_w^{\prime }\leq W_1^{\prime }\\&\\\left(1-f\right)y_{\left(j-1\right)}+fy_{\left(j\right)}&\text{if } W_{{j-1}}^{\prime } < p_w^{\prime }\leq W_j^{\prime }\\&\\y_{\left(m\right)}&\text{if } p_w^{\prime } > W_m^{\prime }\text{,}\end{array}

        where :math:`f = \left(p_w^{\prime }-W_{{j-1}}^{\prime }\right)/\left(W_j^{\prime }-W_{{j-1}}^{\prime }\right)`.

    .. _g11bb-py2-py-references:

    **References**
        John, J A and Quenouille, M H, 1977, `Experiments: Design and Analysis`, Griffin

        Kendall, M G and Stuart, A, 1969, `The Advanced Theory of Statistics (Volume 1)`, (3rd Edition), Griffin
    """
    raise NotImplementedError

[docs]def tabulate_margin(stat, table, idim, isdim, maxst):
    r"""
    ``tabulate_margin`` computes a marginal table from a table computed by :meth:`tabulate_stat` or :meth:`tabulate_percentile` using a selected statistic.

    .. _g11bc-py2-py-doc:

    For full information please refer to the NAG Library document for g11bc

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11bcf.html

    .. _g11bc-py2-py-parameters:

    **Parameters**
        **stat** : str, length 1
            Indicates which statistic is to be used to compute the marginal table.

            :math:`\mathrm{stat} = \texttt{'T'}`

                The total.

            :math:`\mathrm{stat} = \texttt{'A'}`

                The average or mean.

            :math:`\mathrm{stat} = \texttt{'M'}`

                The median.

            :math:`\mathrm{stat} = \texttt{'V'}`

                The variance.

            :math:`\mathrm{stat} = \texttt{'L'}`

                The largest value.

            :math:`\mathrm{stat} = \texttt{'S'}`

                The smallest value.

        **table** : float, array-like, shape :math:`\left(\textit{ncells}\right)`
            The table as computed by :meth:`tabulate_stat` or :meth:`tabulate_percentile`.

        **idim** : int, array-like, shape :math:`\left(\textit{ndim}\right)`
            The number of levels for each dimension of :math:`\mathrm{table}` as returned by :meth:`tabulate_stat` or :meth:`tabulate_percentile`.

        **isdim** : int, array-like, shape :math:`\left(\textit{ndim}\right)`
            Indicates which dimensions of :math:`\mathrm{table}` are to be included in the sub-table. If :math:`\mathrm{isdim}[i-1] > 0` the dimension or factor indicated by :math:`\mathrm{idim}[i-1]` is to be included in the sub-table, otherwise it is excluded.

        **maxst** : int
            The maximum size of sub-table to be computed.

    **Returns**
        **stable** : float, ndarray, shape :math:`\left(\mathrm{maxst}\right)`
            The first :math:`\mathrm{mcells}` elements contain the sub-table computed using the statistic indicated by :math:`\mathrm{stat}`. The table is stored in a similar way to :math:`\mathrm{table}` with the :math:`\mathrm{mcells}` cells stored so that for any two dimensions the index relating to the dimension given later in :math:`\mathrm{idim}` changes faster. For further details see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11bcf.html#fcomments>`__.

        **mcells** : int
            The number of cells in the sub-table in :math:`\mathrm{stable}`.

        **mdim** : int
            The number of dimensions to the sub-table in :math:`\mathrm{stable}`.

        **mlevel** : int, ndarray, shape :math:`\left(\textit{ndim}\right)`
            The first :math:`\mathrm{mdim}` elements contain the number of levels for the dimensions of the sub-table in :math:`\mathrm{stable}`. The remaining elements are not referenced.

        **auxt** : float, ndarray, shape :math:`\left(:\right)`
            If :math:`\mathrm{stat} = \texttt{'V'}` :math:`\mathrm{auxt}` contains the sub-table of means corresponding to the sub-table of variances in :math:`\mathrm{stable}`. Otherwise :math:`\mathrm{auxt}` is not referenced.

    .. _g11bc-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`\mathrm{stat} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{stat} = \texttt{'T'}`, :math:`\texttt{'A'}`, :math:`\texttt{'M'}`, :math:`\texttt{'V'}`, :math:`\texttt{'L'}` or :math:`\texttt{'S'}`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ndim} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ndim} \geq 2`.

            (`errno` :math:`2`)
                On entry, :math:`\mathrm{maxst} = \langle\mathit{\boldsymbol{value}}\rangle` and minimum value for :math:`\mathrm{maxst} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{maxst}\geq \text{product}` of the levels of the dimensions of the :math:`\mathrm{table}` included in the sub-table, :math:`\mathrm{stable}`.

            (`errno` :math:`2`)
                On entry, all elements of :math:`\mathrm{isdim} > 0`.

            (`errno` :math:`2`)
                On entry, no elements of :math:`\mathrm{isdim} > 0`.

            (`errno` :math:`2`)
                On entry, :math:`\textit{ncells}` is incompatible with :math:`\mathrm{idim}`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{idim}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{idim} \geq 2`.

    .. _g11bc-py2-py-notes:

    **Notes**
        For a dataset containing classification variables (known as factors) the functions :meth:`tabulate_stat` and :meth:`tabulate_percentile` compute a table using selected statistics, for example the mean or the median.
        The table is indexed by the levels of the selected factors, for example if there were three factors A, B and C with :math:`3`, :math:`2` and :math:`4` levels respectively and the mean was to be tabulated the resulting table would be :math:`3\times 2\times 4` with each cell being the mean of all observations with the appropriate combination of levels of the three factors.
        In further analysis the table of means averaged over C for A and B may be required; this can be computed from the full table by taking the mean over the third dimension of the table, C.

        In general, given a table computed by :meth:`tabulate_stat` or :meth:`tabulate_percentile`, ``tabulate_margin`` computes a sub-table defined by a subset of the factors used to define the table such that each cell of the sub-table is the selected statistic computed over the remaining factors.
        The statistics that can be used are the total, the mean, the median, the variance, the smallest and the largest value.

    .. _g11bc-py2-py-references:

    **References**
        John, J A and Quenouille, M H, 1977, `Experiments: Design and Analysis`, Griffin

        Kendall, M G and Stuart, A, 1969, `The Advanced Theory of Statistics (Volume 1)`, (3rd Edition), Griffin

        West, D H D, 1979, `Updating mean and variance estimates: An improved method`, Comm. ACM (22), 532--555
    """
    raise NotImplementedError

[docs]def condl_logistic(ns, z, isz, ic, isi, b, tol, maxit, iprint=0, io_manager=None):
    r"""
    ``condl_logistic`` returns parameter estimates for the conditional logistic analysis of stratified data, for example, data from case-control studies and survival analyses.

    .. _g11ca-py2-py-doc:

    For full information please refer to the NAG Library document for g11ca

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11caf.html

    .. _g11ca-py2-py-parameters:

    **Parameters**
        **ns** : int
            The number of strata, :math:`n_s`.

        **z** : float, array-like, shape :math:`\left(n, m\right)`
            The :math:`i`\ th row must contain the covariates which are associated with the :math:`i`\ th observation.

        **isz** : int, array-like, shape :math:`\left(m\right)`
            Indicates which subset of covariates are to be included in the model.

            If :math:`\mathrm{isz}[j-1]\geq 1`, the :math:`j`\ th covariate is included in the model.

            If :math:`\mathrm{isz}[j-1] = 0`, the :math:`j`\ th covariate is excluded from the model and not referenced.

        **ic** : int, array-like, shape :math:`\left(n\right)`
            Indicates whether the :math:`i`\ th observation is a case or a control.

            If :math:`\mathrm{ic}[i-1] = 0`, indicates that the :math:`i`\ th observation is a case.

            If :math:`\mathrm{ic}[i-1] = 1`, indicates that the :math:`i`\ th observation is a control.

        **isi** : int, array-like, shape :math:`\left(n\right)`
            Stratum indicators which also allow data points to be excluded from the analysis.

            If :math:`\mathrm{isi}[i-1] = k`, indicates that the :math:`i`\ th observation is from the :math:`k`\ th stratum, where :math:`k = 1,2,\ldots,\mathrm{ns}`.

            If :math:`\mathrm{isi}[i-1] = 0`, indicates that the :math:`i`\ th observation is to be omitted from the analysis.

        **b** : float, array-like, shape :math:`\left(\textit{ip}\right)`
            Initial estimates of the covariate coefficient parameters :math:`\beta`. :math:`\mathrm{b}[j-1]` must contain the initial estimate of the coefficent of the covariate in :math:`\mathrm{z}` corresponding to the :math:`j`\ th nonzero value of :math:`\mathrm{isz}`.

            `Suggested value`: in many cases an initial value of zero for :math:`\mathrm{b}[j-1]` may be used. For another suggestion see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11caf.html#fcomments>`__.

        **tol** : float
            Indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than :math:`\mathrm{tol}\times \left(1.0+\textit{CurrentDeviance}\right)`. This corresponds approximately to an absolute accuracy if the deviance is small and a relative accuracy if the deviance is large.

        **maxit** : int
            The maximum number of iterations required for computing the estimates. If :math:`\mathrm{maxit}` is set to :math:`0` then the standard errors, the score functions and the variance-covariance matrix are computed for the input value of :math:`\beta` in :math:`\mathrm{b}` but :math:`\beta` is not updated.

        **iprint** : int, optional
            Indicates if the printing of information on the iterations is required.

            :math:`\mathrm{iprint}\leq 0`

                No printing.

            :math:`\mathrm{iprint}\geq 1`

                The deviance and the current estimates are printed every :math:`\mathrm{iprint}` iterations. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`).

        **io_manager** : FileObjManager, optional
            Manager for I/O in this routine.

    **Returns**
        **dev** : float
            The deviance, that is, minus twice the maximized log-likelihood.

        **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{b}[j-1]` contains the estimate :math:`\hat{\beta }_i` of the coefficient of the covariate stored in the :math:`i`\ th column of :math:`\mathrm{z}` where :math:`i` is the :math:`j`\ th nonzero value in the array :math:`\mathrm{isz}`.

        **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{se}[\textit{j}-1]` is the asymptotic standard error of the estimate contained in :math:`\mathrm{b}[\textit{j}-1]` and score function in :math:`\mathrm{sc}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,\textit{ip}`.

        **sc** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{sc}[j]` is the value of the score function :math:`U_j\left(\beta \right)` for the estimate contained in :math:`\mathrm{b}[j-1]`.

        **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)`
            The variance-covariance matrix of the parameter estimates in :math:`\mathrm{b}` stored in packed form by column, i.e., the covariance between the parameter estimates given in :math:`\mathrm{b}[i-1]` and :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is given in :math:`\mathrm{cov}[j\left(j-1\right)/2+i]`.

        **nca** : int, ndarray, shape :math:`\left(\mathrm{ns}\right)`
            :math:`\mathrm{nca}[\textit{i}-1]` contains the number of cases in the :math:`\textit{i}`\ th stratum, for :math:`\textit{i} = 1,2,\ldots,\mathrm{ns}`.

        **nct** : int, ndarray, shape :math:`\left(\mathrm{ns}\right)`
            :math:`\mathrm{nct}[\textit{i}-1]` contains the number of controls in the :math:`\textit{i}`\ th stratum, for :math:`\textit{i} = 1,2,\ldots,\mathrm{ns}`.

    .. _g11ca-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{tol}\geq 10\times \text{machine precision}`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ip} \geq 1`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{maxit} \geq 0`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ns} \geq 1`.

            (`errno` :math:`1`)
                On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`n \geq 2`.

            (`errno` :math:`1`)
                On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`m \geq 1`.

            (`errno` :math:`2`)
                On entry, there are not :math:`\textit{ip}` values of :math:`\mathrm{isz} > 0`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{isz}[i-1] < \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{isz}[i-1] \geq 0`.

            (`errno` :math:`2`)
                On entry, too few observations included in model.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{isi}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`0\leq \mathrm{isi}[i-1]\leq \mathrm{ns}`.

            (`errno` :math:`2`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ic}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ic}[i] = 0` or :math:`1`.

            (`errno` :math:`4`)
                Overflow in calculations.

            (`errno` :math:`5`)
                The matrix of second partial derivatives is singular.

    **Warns**
        **NagAlgorithmicWarning**
            (`errno` :math:`6`)
                Convergence not achieved in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations.

    .. _g11ca-py2-py-notes:

    **Notes**
        In the analysis of binary data, the logistic model is commonly used.
        This relates the probability of one of the outcomes, say :math:`y = 1`, to :math:`p` explanatory variates or covariates by

        .. math::
            \mathrm{Prob}\left(y = 1\right) = \frac{{\mathrm{exp}\left(\alpha +z^\mathrm{T}\beta \right)}}{{1+\mathrm{exp}\left(\alpha +z^\mathrm{T}\beta \right)}}\text{,}

        where :math:`\beta` is a vector of unknown coefficients for the covariates :math:`z` and :math:`\alpha` is a constant term.
        If the observations come from different strata or groups, :math:`\alpha` would vary from strata to strata.
        If the observed outcomes are independent then the :math:`y`\ s follow a Bernoulli distribution, i.e., a binomial distribution with sample size one and the model can be fitted as a generalized linear model with binomial errors.

        In some situations the number of observations for which :math:`y = 1` may not be independent.
        For example, in epidemiological research, case-control studies are widely used in which one or more observed cases are matched with one or more controls.
        The matching is based on fixed characteristics such as age and sex, and is designed to eliminate the effect of such characteristics in order to more accurately determine the effect of other variables.
        Each case-control group can be considered as a stratum.
        In this type of study the binomial model is not appropriate, except if the strata are large, and a conditional logistic model is used.
        This considers the probability of the cases having the observed vectors of covariates given the set of vectors of covariates in the strata.
        In the situation of one case per stratum, the conditional likelihood for :math:`n_{\mathrm{s}}` strata can be written as

        .. math::
            L = \prod_{{i = 1}}^{n_s}\frac{{\mathrm{exp}\left(z_i^\mathrm{T}\beta \right)}}{{\left[\sum_{{l \in S_i}}{\mathrm{exp}\left(z_l^\mathrm{T}\beta \right)}\right]}}\text{,}

        where :math:`S_i` is the set of observations in the :math:`i`\ th stratum, with associated vectors of covariates :math:`z_l`, :math:`l \in S_i`, and :math:`z_i` is the vector of covariates of the case in the :math:`i`\ th stratum.
        In the general case of :math:`c_i` cases per strata then the full conditional likelihood is

        .. math::
            L = \prod_{{i = 1}}^{n_s}\frac{{\mathrm{exp}\left(s_i^\mathrm{T}\beta \right)}}{{\left[\sum_{{l \in C_i}}{\mathrm{exp}\left(s_l^\mathrm{T}\beta \right)}\right]}}\text{,}

        where :math:`s_i` is the sum of the vectors of covariates for the cases in the :math:`i`\ th stratum and :math:`s_l`, :math:`l \in C_i` refer to the sum of vectors of covariates for all distinct sets of :math:`c_i` observations drawn from the :math:`i`\ th stratum.
        The conditional likelihood can be maximized by a Newton--Raphson procedure.
        The covariances of the parameter estimates can be estimated from the inverse of the matrix of second derivatives of the logarithm of the conditional likelihood, while the first derivatives provide the score function, :math:`U_{\textit{j}}\left(\beta \right)`, for :math:`\textit{j} = 1,2,\ldots,p`, which can be used for testing the significance of parameters.

        If the strata are not small, :math:`C_i` can be large so to improve the speed of computation, the algorithm in Howard (1972) and described by Krailo and Pike (1984) is used.

        A second situation in which the above conditional likelihood arises is in fitting Cox's proportional hazard model (see :meth:`surviv.coxmodel <naginterfaces.library.surviv.coxmodel>`) in which the strata refer to the risk sets for each failure time and where the failures are cases.
        When ties are present in the data :meth:`surviv.coxmodel <naginterfaces.library.surviv.coxmodel>` uses an approximation.
        For an exact estimate, the data can be expanded using :meth:`surviv.coxmodel_risksets <naginterfaces.library.surviv.coxmodel_risksets>` to create the risk sets/strata and ``condl_logistic`` used.

    .. _g11ca-py2-py-references:

    **References**
        Cox, D R, 1972, `Regression models in life tables (with discussion)`, J. Roy. Statist. Soc. Ser. B (34), 187--220

        Cox, D R and Hinkley, D V, 1974, `Theoretical Statistics`, Chapman and Hall

        Howard, S, 1972, `Remark on the paper by Cox, D R (1972): Regression methods`, J. R. Statist. Soc. (B 34), and life tables, 187--220

        Krailo, M D and Pike, M C, 1984, `Algorithm AS 196. Conditional multivariate logistic analysis of stratified case-control studies`, Appl. Statist. (33), 95--103

        Smith, P G, Pike, M C, Hill, P, Breslow, N E and Day, N E, 1981, `Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies`, Appl. Statist. (30), 190--197
    """
    raise NotImplementedError

[docs]def binary(n, gprob, x, irl, a, c, cgetol, chisqr, iprint=1, maxit=1000, ishow=0, io_manager=None):
    r"""
    ``binary`` fits a latent variable model (with a single factor) to data consisting of a set of measurements on individuals in the form of binary-valued sequences (generally referred to as score patterns).
    Various measures of goodness-of-fit are calculated along with the factor (theta) scores.

    .. _g11sa-py2-py-doc:

    For full information please refer to the NAG Library document for g11sa

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11saf.html

    .. _g11sa-py2-py-parameters:

    **Parameters**
        **n** : int
            :math:`n`, the number of individuals in the sample.

        **gprob** : bool
            Must be set equal to :math:`\mathbf{True}` if :math:`G\left(z\right) = \Phi^{-1}\left(z\right)` and :math:`\mathbf{False}` if :math:`G\left(z\right) = \mathrm{logit}\left(z\right)`.

        **x** : bool, array-like, shape :math:`\left(\textit{ns}, \textit{ip}\right)`
            The first :math:`s` rows of :math:`\mathrm{x}` must contain the :math:`s` different score patterns. The :math:`l`\ th row of :math:`\mathrm{x}` must contain the :math:`l`\ th score pattern with :math:`\mathrm{x}[l-1,j-1]` set equal to :math:`\mathbf{True}` if :math:`x_{{lj}} = 1` and :math:`\mathbf{False}` if :math:`x_{{lj}} = 0`. All rows of :math:`\mathrm{x}` must be distinct.

        **irl** : int, array-like, shape :math:`\left(\textit{ns}\right)`
            The :math:`i`\ th component of :math:`\mathrm{irl}` must be set equal to the frequency with which the :math:`i`\ th row of :math:`\mathrm{x}` occurs.

        **a** : float, array-like, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{a}[j-1]` must be set equal to an initial estimate of :math:`\alpha_{{j1}}`. **In order to avoid divergence problems with the E-M algorithm you are strongly advised to set all the** :math:`\mathrm{a}[j-1]` **to** :math:`0.5` **.**

        **c** : float, array-like, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{c}[j-1]` must be set equal to an initial estimate of :math:`\alpha_{{j0}}`. **In order to avoid divergence problems with the E-M algorithm you are strongly advised to set all the** :math:`\mathrm{c}[j-1]` **to** :math:`0.0` **.**

        **cgetol** : float
            The accuracy to which the solution is required.

            If :math:`\mathrm{cgetol}` is set to :math:`10^{{-l}}` and on exit the function exits successfully or :math:`\mathrm{errno}` = 7, then all elements of the gradient vector will be smaller than :math:`10^{{-l}}` in absolute value.

            For most practical purposes the value :math:`10^{-4}` should suffice.

            You should be wary of setting :math:`\mathrm{cgetol}` too small since the convergence criterion may then have become too strict for the machine to handle.

            If :math:`\mathrm{cgetol}` has been set to a value which is less than the square root of the machine precision, :math:`\epsilon`, then ``binary`` will use the value :math:`\sqrt{\epsilon }` instead.

        **chisqr** : bool
            If :math:`\mathrm{chisqr}` is set equal to :math:`\mathbf{True}`, a likelihood ratio statistic will be calculated (see :math:`\mathrm{chi}`).

            If :math:`\mathrm{chisqr}` is set equal to :math:`\mathbf{False}`, no such statistic will be calculated.

        **iprint** : int, optional
            The frequency with which the maximum likelihood search function is to be monitored.

            :math:`\mathrm{iprint} > 0`

                The search is monitored once every :math:`\mathrm{iprint}` iterations, and when the number of quadrature points is increased, and again at the final solution point.

            :math:`\mathrm{iprint} = 0`

                The search is monitored once at the final point.

            :math:`\mathrm{iprint} < 0`

                The search is not monitored at all.

            :math:`\mathrm{iprint}` should normally be set to a small positive number.

        **maxit** : int, optional
            The maximum number of iterations to be made in the maximum likelihood search. There will be an error exit (see :ref:`Exceptions <g11sa-py2-py-errors>`) if the search function has not converged in :math:`\mathrm{maxit}` iterations.

        **ishow** : int, optional
            Indicates which of the following three quantities are to be printed before exit from the function (given a valid parameter set):

            (a) Table of maximum likelihood estimates and standard errors (as returned in the output arrays :math:`\mathrm{a}`, :math:`\mathrm{c}`, :math:`\mathrm{alpha}`, :math:`\mathrm{pigam}` and :math:`\mathrm{cm}`).

            (#) Table of observed and expected first - and second-order margins (as returned in the output arrays :math:`\mathrm{expp}` and :math:`\mathrm{obs}`).

            (#) Table of observed and expected frequencies of score patterns along with theta scores (as returned in the output arrays :math:`\mathrm{irl}`, :math:`\mathrm{exf}`, :math:`\mathrm{y}`, :math:`\mathrm{xl}` and :math:`\mathrm{iob}`) and the likelihood ratio statistic (if required).

            :math:`\mathrm{ishow} = 0`

                None of the above are printed.

            :math:`\mathrm{ishow} = 1`

                \(a) only is printed.

            :math:`\mathrm{ishow} = 2`

                \(b) only is printed.

            :math:`\mathrm{ishow} = 3`

                \(c) only is printed.

            :math:`\mathrm{ishow} = 4`

                \(a) and \(b) are printed.

            :math:`\mathrm{ishow} = 5`

                \(a) and \(c) are printed.

            :math:`\mathrm{ishow} = 6`

                \(b) and \(c) are printed.

            :math:`\mathrm{ishow} = 7`

                \(a), \(b) and \(c) are printed.

        **io_manager** : FileObjManager, optional
            Manager for I/O in this routine.

    **Returns**
        **x** : bool, ndarray, shape :math:`\left(\textit{ns}, \textit{ip}\right)`
            Given a valid parameter set then the first :math:`s` rows of :math:`\mathrm{x}` still contain the :math:`s` different score patterns. However, the following points should be noted:

            (i) If the estimated factor loading for the :math:`j`\ th item is negative then that item is re-coded, i.e., :math:`0`\ s and :math:`1`\ s (or :math:`\mathbf{True}` and :math:`\mathbf{False}`) in the :math:`j`\ th column of :math:`\mathrm{x}` are interchanged.

            (#) The rows of :math:`\mathrm{x}` will be reordered so that the theta scores corresponding to rows of :math:`\mathrm{x}` are in increasing order of magnitude.

        **irl** : int, ndarray, shape :math:`\left(\textit{ns}\right)`
            Given a valid parameter set then the first :math:`s` components of :math:`\mathrm{irl}` are reordered as are the rows of :math:`\mathrm{x}`.

        **a** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{a}[\textit{j}-1]` contains the latest estimate of :math:`\alpha_{{\textit{j}1}}`, for :math:`\textit{j} = 1,2,\ldots,p`. (Because of possible recoding all elements of :math:`\mathrm{a}` will be positive.)

        **c** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            :math:`\mathrm{c}[\textit{j}-1]` contains the latest estimate of :math:`\alpha_{{\textit{j}0}}`, for :math:`\textit{j} = 1,2,\ldots,p`.

        **niter** : int
            Given a valid parameter set then :math:`\mathrm{niter}` contains the number of iterations performed by the maximum likelihood search function.

        **alpha** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            Given a valid parameter set then :math:`\mathrm{alpha}[j-1]` contains the latest estimate of :math:`\alpha_j`. (Because of possible recoding all elements of :math:`\mathrm{alpha}` will be positive.)

        **pigam** : float, ndarray, shape :math:`\left(\textit{ip}\right)`
            Given a valid parameter set then :math:`\mathrm{pigam}[j-1]` contains the latest estimate of either :math:`\pi_j` if :math:`\mathrm{gprob} = \mathbf{False}` (logit model) or :math:`\gamma_j` if :math:`\mathrm{gprob} = \mathbf{True}` (probit model).

        **cm** : float, ndarray, shape :math:`\left(2\times \textit{ip}, 2\times \textit{ip}\right)`
            Given a valid parameter set then the strict lower triangle of :math:`\mathrm{cm}` contains the correlation matrix of the parameter estimates held in :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}` on exit. The diagonal elements of :math:`\mathrm{cm}` contain the standard errors. Thus:

            .. rst-class:: nag-rules-none nag-align-left

            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-2,2\times i-2]`|=|standard error :math:`\left(\mathrm{alpha}[i-1]\right)`                       |
            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-1,2\times i-1]`|=|standard error :math:`\left(\mathrm{pigam}[i-1]\right)`                       |
            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-1,2\times i-2]`|=|correlation :math:`\left({\mathrm{pigam}[i-1]}, {\mathrm{alpha}[i-1]}\right)`,|
            +--------------------------------------------+-+------------------------------------------------------------------------------+

            for :math:`i = 1,2,\ldots,p`;

            .. rst-class:: nag-rules-none nag-align-left

            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-2,2\times j-2]`|=|correlation :math:`\left({\mathrm{alpha}[i-1]}, {\mathrm{alpha}[j-1]}\right)` |
            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-1,2\times j-1]`|=|correlation :math:`\left({\mathrm{pigam}[i-1]}, {\mathrm{pigam}[j-1]}\right)` |
            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-2,2\times j-1]`|=|correlation :math:`\left({\mathrm{alpha}[i-1]}, {\mathrm{pigam}[j-1]}\right)` |
            +--------------------------------------------+-+------------------------------------------------------------------------------+
            |:math:`\mathrm{cm}[2\times i-1,2\times j-2]`|=|correlation :math:`\left({\mathrm{alpha}[j-1]}, {\mathrm{pigam}[i-1]}\right)`,|
            +--------------------------------------------+-+------------------------------------------------------------------------------+

            for :math:`j = 1,2,\ldots,i-1`.

            If the second derivative matrix cannot be computed then all the elements of :math:`\mathrm{cm}` are returned as zero.

        **g** : float, ndarray, shape :math:`\left(2\times \textit{ip}\right)`
            Given a valid parameter set then :math:`\mathrm{g}` contains the estimated gradient vector corresponding to the final point held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`. :math:`\mathrm{g}[2\times \textit{j}-2]` contains the derivative of the log-likelihood with respect to :math:`\mathrm{alpha}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. :math:`\mathrm{g}[2\times \textit{j}-1]` contains the derivative of the log-likelihood with respect to :math:`\mathrm{pigam}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`.

        **expp** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{ip}\right)`
            Given a valid parameter set then :math:`\mathrm{expp}[i-1,j-1]` contains the expected percentage of individuals in the sample who respond positively to items :math:`i` and :math:`j` (:math:`j\leq i`), corresponding to the final point held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`.

        **obs** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{ip}\right)`
            Given a valid parameter set then :math:`\mathrm{obs}[i-1,j-1]` contains the observed percentage of individuals in the sample who responded positively to items :math:`i` and :math:`j` (:math:`j\leq i`).

        **exf** : float, ndarray, shape :math:`\left(\textit{ns}\right)`
            Given a valid parameter set then :math:`\mathrm{exf}[l-1]` contains the expected frequency of the :math:`l`\ th score pattern (:math:`l`\ th row of :math:`\mathrm{x}`), corresponding to the final point held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`.

        **y** : float, ndarray, shape :math:`\left(\textit{ns}\right)`
            Given a valid parameter set then :math:`\mathrm{y}[l-1]` contains the estimated theta score corresponding to the :math:`l`\ th row of :math:`\mathrm{x}`, for the final point held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`.

        **xl** : float, ndarray, shape :math:`\left(:\right)`
            If :math:`\mathrm{gprob}` has been set equal to :math:`\mathbf{False}` (logit model) then, given a valid parameter set, :math:`\mathrm{xl}[l-1]` contains the estimated component score corresponding to the :math:`l`\ th row of :math:`\mathrm{x}` for the final point held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`.

            If :math:`\mathrm{gprob}` is set equal to :math:`\mathbf{True}` (probit model), this array is not used.

        **iob** : int, ndarray, shape :math:`\left(\textit{ns}\right)`
            Given a valid parameter set then :math:`\mathrm{iob}[l-1]` contains the number of items in the :math:`l`\ th row of :math:`\mathrm{x}` for which the response was positive (:math:`\mathbf{True}`).

        **rlogl** : float
            Given a valid parameter set then :math:`\mathrm{rlogl}` contains the value of the log-likelihood kernel corresponding to the final point held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`, namely

            .. math::
                \sum_{{l = 0}}^{{s-1}}\mathrm{irl}[l]\times \log\left(\mathrm{exf}[l]/n\right)\text{.}

        **chi** : float
            If :math:`\mathrm{chisqr}` was set equal to :math:`\mathbf{True}` on entry, then given a valid parameter set, :math:`\mathrm{chi}` will contain the value of the likelihood ratio statistic corresponding to the final parameter estimates held in the arrays :math:`\mathrm{alpha}` and :math:`\mathrm{pigam}`, namely

            .. math::
                2\times \sum_{{l = 0}}^{{s-1}}\mathrm{irl}[l]\times \log\left(\mathrm{exf}[l]/\mathrm{irl}[l]\right)\text{.}

            The summation is over those elements of :math:`\mathrm{irl}` which are positive. If :math:`\mathrm{exf}[l-1]` is less than :math:`5.0`, then adjacent score patterns are pooled (the score patterns in :math:`\mathrm{x}` being first put in order of increasing theta score).

            If :math:`\mathrm{chisqr}` has been set equal to :math:`\mathbf{False}`, then :math:`\mathrm{chi}` is not used.

        **idf** : int
            If :math:`\mathrm{chisqr}` was set equal to :math:`\mathbf{True}` on entry, then given a valid parameter set, :math:`\mathrm{idf}` will contain the degrees of freedom associated with the likelihood ratio statistic, :math:`\mathrm{chi}`.

            .. rst-class:: nag-rules-none nag-align-left

            +--------------------------------------+---------------------+
            |:math:`\mathrm{idf} = s_0-2\times p`  |if :math:`s_0 < 2^p`;|
            +--------------------------------------+---------------------+
            |:math:`\mathrm{idf} = s_0-2\times p-1`|if :math:`s_0 = 2^p`,|
            +--------------------------------------+---------------------+

            where :math:`s_0` denotes the number of terms summed to calculate :math:`\mathrm{chi}` (:math:`s_0 = s` only if there is no pooling).

            If :math:`\mathrm{chisqr}` has been set equal to :math:`\mathbf{False}`, :math:`\mathrm{idf}` is not used.

        **siglev** : float
            If :math:`\mathrm{chisqr}` was set equal to :math:`\mathbf{True}` on entry, then given a valid parameter set, :math:`\mathrm{siglev}` will contain the significance level of :math:`\mathrm{chi}` based on :math:`\mathrm{idf}` degrees of freedom. If :math:`\mathrm{idf}` is zero or negative then :math:`\mathrm{siglev}` is set to zero.

            If :math:`\mathrm{chisqr}` was set equal to :math:`\mathbf{False}`, :math:`\mathrm{siglev}` is not used.

    .. _g11sa-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`\sum_i\left(\mathrm{irl}[i]\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\sum_i\left(\mathrm{irl}[i]\right) = \mathrm{n}`.

            (`errno` :math:`1`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{irl}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{irl}[i-1]\geq 0`.

            (`errno` :math:`1`)
                On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: rows :math:`i` and :math:`j` of :math:`\mathrm{x}` should not be identical.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ns} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ns} > 2\times \textit{ip}`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ip} \geq 3`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{n} \geq 7`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{ishow} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{ishow} = 0`, :math:`1`, :math:`2`, :math:`3`, :math:`4`, :math:`5`, :math:`6` or :math:`7`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ns} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ns}\leq 2^{\textit{ip}}`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ns} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ns}\leq \mathrm{n}`.

            (`errno` :math:`1`)
                On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\mathrm{maxit} \geq 1`.

            (`errno` :math:`2`)
                For at least one of the :math:`\textit{ip}` items the responses are all at the same level.

            (`errno` :math:`3`)
                :math:`\mathrm{maxit}` iterations have been performed: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`.

            (`errno` :math:`4`)
                One of the elements of :math:`\mathrm{a}` has exceeded :math:`10` in absolute.

            (`errno` :math:`5`)
                Failure to invert Hessian matrix and :math:`\mathrm{maxit}` iterations made: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`.

            (`errno` :math:`6`)
                Failure to invert Hessian matrix plus Heywood case encountered.

    **Warns**
        **NagAlgorithmicWarning**
            (`errno` :math:`7`)
                :math:`\chi^2` statistic has less than one degree of freedom.

    .. _g11sa-py2-py-notes:

    **Notes**
        Given a set of :math:`p` dichotomous variables :math:`\tilde{x} = \left(x_1, x_2, \ldots, x_p\right)^{\prime }`, where :math:`{}^{\prime }` denotes vector or matrix transpose, the objective is to investigate whether the association between them can be adequately explained by a latent variable model of the form (see Bartholomew (1980) and Bartholomew (1987))

        .. math::
            G\left\{\pi_i\left(\theta \right)\right\} = \alpha_{{i0}}+\alpha_{{i1}}\theta \text{.}

        The :math:`x_i` are called item responses and take the value :math:`0` or :math:`1`. :math:`\theta` denotes the latent variable assumed to have a standard Normal distribution over a population of individuals to be tested on :math:`p` items.
        Call :math:`\pi_i\left(\theta \right) = P\left(x_i = 1 | \theta \right)` the item response function: it represents the probability that an individual with latent ability :math:`\theta` will produce a positive response `(1) <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11saf.html#eqn1>`__ to item :math:`i`. :math:`\alpha_{{i0}}` and :math:`\alpha_{{i1}}` are item parameters which can assume any real values.
        The set of parameters, :math:`\alpha_{{\textit{i}1}}`, for :math:`\textit{i} = 1,2,\ldots,p`, being coefficients of the unobserved variable :math:`\theta`, can be interpreted as 'factor loadings'.

        :math:`G` is a function selected by you as either :math:`\Phi^{-1}` or logit, mapping the interval :math:`\left(0, 1\right)` onto the whole real line.
        Data from a random sample of :math:`n` individuals takes the form of the matrices :math:`X` and :math:`R` defined below:

        .. math::
            X_{{s\times p}} = \left[\begin{array}{llll}x_{11}^{{}}&x_{12}&\ldots &x_{{1p}}\\x_{21}&x_{22}&\ldots &x_{{2p}}\\ \vdots & \vdots && \vdots \\x_{{s1}}&x_{{s2}}&\ldots &x_{{sp}}\end{array}\right] = \left[\begin{array}{l}\tilde{x}_1\\\tilde{x}_2\\ \vdots \\\tilde{x}_s\end{array}\right]\text{, }\quad R_{{s\times 1}} = \left[\begin{array}{l}r_1^{{}}\\r_2\\ \vdots \\r_s\end{array}\right]

        where :math:`\tilde{x}_l = \left(x_{{l1}}, x_{{l2}}, \ldots, x_{{lp}}\right)` denotes the :math:`l`\ th score pattern in the sample, :math:`r_l` the frequency with which :math:`\tilde{x}_l` occurs and :math:`s` the number of different score patterns observed. (Thus :math:`\sum_{{l = 1}}^sr_l = n`).
        It can be shown that the log-likelihood function is proportional to

        .. math::
            \sum_{{l = 1}}^sr_l\log\left(P_l\right)\text{,}

        where

        .. math::
            P_l = P\left(\tilde{x} = \tilde{x}_l\right) = \int_{{-\infty }}^{\infty }P\left(\tilde{x} = \tilde{x}_l | \theta \right)\phi \left(\theta \right){d\theta }

        (:math:`\phi \left(\theta \right)` being the probability density function of a standard Normal random variable).

        :math:`P_l` denotes the unconditional probability of observing score pattern :math:`\tilde{x}_l`.
        The integral in `(2) <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11saf.html#eqn2>`__ is approximated using Gauss--Hermite quadrature.
        If we take :math:`G\left(z\right) = \mathrm{logit}\left(z\right) = \log\left(\frac{z}{{1-z}}\right)` in `(1) <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11saf.html#eqn1>`__ and reparameterise as follows,

        .. math::
            \begin{array}{lcl}\alpha_i& = &\alpha_{{i1}}\text{,}\\\pi_i& = &\mathrm{logit}^{-1}\left(\alpha_{{i0}}\right)\text{,}\end{array}

        then `(1) <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11saf.html#eqn1>`__ reduces to the logit model (see Bartholomew (1980))

        .. math::
            \pi_i\left(\theta \right) = \frac{\pi_i}{{\pi_i+\left(1-\pi_i\right)\mathrm{exp}\left(-\alpha_i\theta \right)}}\text{.}

        If we take :math:`G\left(z\right) = \Phi^{-1}\left(z\right)` (where :math:`\Phi` is the cumulative distribution function of a standard Normal random variable) and reparameterise as follows,

        .. math::
            \begin{array}{lcl}\alpha_i& = &\frac{\alpha_{{i1}}}{\sqrt{\left(1+\alpha_{{i1}}^2\right)}}\\\\\\\gamma_i& = &\frac{{-\alpha_{{i0}}}}{\sqrt{\left(1+\alpha_{{i1}}^2\right)}}\end{array}\text{,}

        then `(1) <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11saf.html#eqn1>`__ reduces to the probit model (see Bock and Aitkin (1981))

        .. math::
            \pi_i\left(\theta \right) = \phi \left(\frac{{\alpha_i\theta -\gamma_i}}{{\sqrt{\left(1-\alpha_i^2\right)}}}\right)\text{.}

        An E-M algorithm (see Bock and Aitkin (1981)) is used to maximize the log-likelihood function.
        The number of quadrature points used is set initially to :math:`10` and once convergence is attained increased to :math:`20`.

        The theta score of an individual responding in score pattern :math:`\tilde{x}_l` is computed as the posterior mean, i.e., :math:`E\left(\theta | \tilde{x}_l\right)`.
        For the logit model the component score :math:`X_l = \sum_{{j = 1}}^p\alpha_jx_{{lj}}` is also calculated. (Note that in calculating the theta scores and measures of goodness-of-fit ``binary`` automatically reverses the coding on item :math:`j` if :math:`\alpha_j < 0`; it is assumed in the model that a response at the one level is showing a higher measure of latent ability than a response at the zero level.)

        The frequency distribution of score patterns is required as input data.
        If your data is in the form of individual score patterns (uncounted), then :meth:`binary_service` may be used to calculate the frequency distribution.

    .. _g11sa-py2-py-references:

    **References**
        Bartholomew, D J, 1980, `Factor analysis for categorical data (with Discussion)`, J. Roy. Statist. Soc. Ser. B (42), 293--321

        Bartholomew, D J, 1987, `Latent Variable Models and Factor Analysis`, Griffin

        Bock, R D and Aitkin, M, 1981, `Marginal maximum likelihood estimation of item parameters: Application of an E-M algorithm`, Psychometrika (46), 443--459
    """
    raise NotImplementedError

[docs]def binary_service(x):
    r"""
    ``binary_service`` is a service function which may be used prior to calling :meth:`binary` to calculate the frequency distribution of a set of dichotomous score patterns.

    .. _g11sb-py2-py-doc:

    For full information please refer to the NAG Library document for g11sb

    https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g11/g11sbf.html

    .. _g11sb-py2-py-parameters:

    **Parameters**
        **x** : bool, array-like, shape :math:`\left(n, \textit{ip}\right)`
            :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set equal to :math:`\mathbf{True}` if :math:`x_{{\textit{i}\textit{j}}} = 1`, and :math:`\mathbf{False}` if :math:`x_{{\textit{i}\textit{j}}} = 0`, for :math:`\textit{j} = 1,2,\ldots,p`, for :math:`\textit{i} = 1,2,\ldots,n`.

    **Returns**
        **ns** : int
            The number of different score patterns, :math:`s`.

        **x** : bool, ndarray, shape :math:`\left(n, \textit{ip}\right)`
            The first :math:`s` rows of :math:`\mathrm{x}` contain the :math:`s` different score patterns.

        **irl** : int, ndarray, shape :math:`\left(n\right)`
            The frequency with which the :math:`\textit{l}`\ th row of :math:`\mathrm{x}` occurs, for :math:`\textit{l} = 1,2,\ldots,s`.

    .. _g11sb-py2-py-errors:

    **Raises**
        **NagValueError**
            (`errno` :math:`1`)
                On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`n \geq 7`.

            (`errno` :math:`1`)
                On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`.

                Constraint: :math:`\textit{ip} \geq 3`.

    .. _g11sb-py2-py-notes:

    **Notes**
        When each of :math:`n` individuals responds to each of :math:`p` dichotomous variables the data assumes the form of the matrix :math:`X` defined below

        .. math::
            X = \left[\begin{array}{llll}x_{11}&x_{12}&\ldots &x_{{1p}}\\x_{21}&x_{22}&\ldots &x_{{2p}}\\ \vdots & \vdots && \vdots \\x_{{n1}}&x_{{n2}}&\ldots &x_{{np}}\end{array}\right] = \left[\begin{array}{l}{\underline{x}}_1\\{\underline{x}}_2\\ \vdots \\{\underline{x}}_n\end{array}\right]\text{,}

        where the :math:`x` take the value of :math:`0` or :math:`1` and :math:`{\underline{x}}_{\textit{l}} = \left(x_{{\textit{l}1}}, x_{{\textit{l}2}}, \ldots, x_{{\textit{l}p}}\right)`, for :math:`\textit{l} = 1,2,\ldots,n`, denotes the score pattern of the :math:`l`\ th individual. ``binary_service`` calculates the number of different score patterns, :math:`s`, and the frequency with which each occurs.
        This information can then be passed to :meth:`binary`.
    """
    raise NotImplementedError
NAG and Python

Return to Front

Source code for naginterfaces.library.contab