# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 30.2 `anova` Chapter.
``anova`` - Analysis of Variance
This module is concerned with methods for analysing the results of designed experiments.
The range of experiments covered include:
single factor designs with equal sized blocks such as randomized complete block and balanced incomplete block designs,
row and column designs such as Latin squares, and
complete factorial designs.
Further designs may be analysed by combining the analyses provided by multiple calls to functions or by using general linear model functions provided in submodule :mod:`~naginterfaces.library.correg`.
See Also
--------
``naginterfaces.library.examples.anova`` :
This subpackage contains examples for the ``anova`` module.
See also the :ref:`library_anova_ex` subsection.
Functionality Index
-------------------
**Analysis of variance for**
complete factorial design: :meth:`factorial`
general block design or completely randomized design: :meth:`random`
two-way hierarchical classification, subgroups of unequal size: :meth:`hier2`
**General linear model**
generate dummy variables and orthogonal polynomials: :meth:`dummyvars`
**Inferences on means**
simultaneous confidence intervals: :meth:`confidence`
sum of squares for contrast between means: :meth:`contrasts`
**Rater Reliability**
intraclass correlation (ICC): :meth:`icc`
For full information please refer to the NAG Library document
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04intro.html
"""
# NAG Copyright 2017-2024.
[docs]def hier2(y, lsub, nobs):
r"""
``hier2`` performs an analysis of variance for a two-way hierarchical classification with subgroups of possibly unequal size, and also computes the treatment group and subgroup means.
A fixed effects model is assumed.
.. _g04ag-py2-py-doc:
For full information please refer to the NAG Library document for g04ag
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04agf.html
.. _g04ag-py2-py-parameters:
**Parameters**
**y** : float, array-like, shape :math:`\left(n\right)`
The elements of :math:`\mathrm{y}` must contain the observations :math:`y_{\mathrm{mij}}` in the following order:
.. math::
y_{111},y_{211},\ldots,y_{{n_{11}11}},y_{112},y_{212},\ldots,y_{{n_{12}12}},\ldots,y_{{11l_1}},\ldots \text{,}
.. math::
y_{{n_{{1l_1}}1l_1}},\ldots,y_{{1ij}},\ldots,y_{{n_{{ij}}ij}},\ldots,y_{{1kl_k}},\ldots,y_{{n_{{kl_k}}kl_k}}\text{.}
In words, the ordering is by group, and within each group is by subgroup, the members of each subgroup being in consecutive locations in :math:`\mathrm{y}`.
**lsub** : int, array-like, shape :math:`\left(k\right)`
The number of subgroups within group :math:`\textit{i}`, :math:`l_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**nobs** : int, array-like, shape :math:`\left(l\right)`
The numbers of observations in each subgroup, :math:`n_{{ij}}`, in the following order:
.. math::
n_{11},n_{12},\ldots,n_{{1l_1}},n_{21},\ldots,n_{{2l_2}},\ldots,n_{{k1}},\ldots,n_{{kl_k}}
**Returns**
**ngp** : int, ndarray, shape :math:`\left(k\right)`
The total number of observations in group :math:`\textit{i}`, :math:`n_{{\textit{i}.}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**gbar** : float, ndarray, shape :math:`\left(k\right)`
The mean for group :math:`\textit{i}`, :math:`\bar{y}_{{.\textit{i}.}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
**sgbar** : float, ndarray, shape :math:`\left(l\right)`
The subgroup means, :math:`\bar{y}_{{.ij}}`, in the following order:
.. math::
\bar{y}_{.11},\bar{y}_{.12},\ldots,\bar{y}_{{.1l}_1},\bar{y}_{.21},\bar{y}_{.22},\ldots,\bar{y}_{{.2l}_2},\ldots,\bar{y}_{{.k1}},\bar{y}_{{.k2}},\ldots,\bar{y}_{{.kl}_k}\text{.}
**gm** : float
The grand mean, :math:`\bar{y}_{\ldots }`.
**ss** : float, ndarray, shape :math:`\left(4\right)`
Contains the sums of squares for the analysis of variance, as follows;
:math:`\mathrm{ss}[0] = \text{}` Between group sum of squares, :math:`\mathrm{ss}_g`,
:math:`\mathrm{ss}[1] = \text{}` Between subgroup within groups sum of squares, :math:`\mathrm{ss}_{{sg}}`,
:math:`\mathrm{ss}[2] = \text{}` Residual sum of squares, :math:`\mathrm{ss}_{\mathrm{res}}`,
:math:`\mathrm{ss}[3] = \text{}` Corrected total sum of squares, :math:`\mathrm{ss}_{\mathrm{tot}}`.
**idf** : int, ndarray, shape :math:`\left(4\right)`
Contains the degrees of freedom attributable to each sum of squares in the analysis of variance, as follows:
:math:`\mathrm{idf}[0] = \text{}` Degrees of freedom for between group sum of squares,
:math:`\mathrm{idf}[1] = \text{}` Degrees of freedom for between subgroup within groups sum of squares,
:math:`\mathrm{idf}[2] = \text{}` Degrees of freedom for residual sum of squares,
:math:`\mathrm{idf}[3] = \text{}` Degrees of freedom for corrected total sum of squares.
**f** : float, ndarray, shape :math:`\left(2\right)`
Contains the mean square ratios, :math:`F_1` and :math:`F_2`, for the between groups variation, and the between subgroups within groups variation, with respect to the residual, respectively.
**fp** : float, ndarray, shape :math:`\left(2\right)`
Contains the significances of the mean square ratios, :math:`p_1` and :math:`p_2` respectively.
.. _g04ag-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`k > 1`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{lsub}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{lsub}[i-1] > 0`.
(`errno` :math:`3`)
On entry, :math:`\sum_i\mathrm{lsub}[i] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`l = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\sum_i\mathrm{lsub}[i] = l`.
(`errno` :math:`4`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nobs}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nobs}[i-1] > 0`.
(`errno` :math:`5`)
On entry, :math:`\sum_i\mathrm{nobs}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\sum_i\mathrm{nobs}[i-1] = n`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`6`)
The total corrected sum of squares is zero.
(`errno` :math:`7`)
The residual sum of squares is zero.
.. _g04ag-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
In a two-way hierarchical classification, there are :math:`k` (:math:`\text{}\geq 2`) treatment groups, the :math:`i`\ th of which is subdivided into :math:`l_i` treatment subgroups.
The :math:`j`\ th subgroup of group :math:`i` contains :math:`n_{{ij}}` observations, which may be denoted by
.. math::
y_{{1ij}},y_{{2ij}},\ldots,y_{{n_{{ij}}ij}}\text{.}
The general observation is denoted by :math:`y_{{mij}}`, being the :math:`m`\ th observation in subgroup :math:`j` of group :math:`i`, for :math:`1\leq i\leq k`, :math:`1\leq j\leq l_i`, :math:`1\leq m\leq n_{{ij}}`.
The following quantities are computed
(i) The subgroup means
.. math::
\bar{y}_{{.ij}} = \frac{{\sum_{{m = 1}}^{n_{{ij}}}y_{{mij}}}}{n_{{ij}}}
(#) The group means
.. math::
\bar{y}_{{.i.}} = \frac{{\sum_{{j = 1}}^{l_i}\sum_{{m = 1}}^{n_{{ij}}}y_{{mij}}}}{{\sum_{{j = 1}}^{l_i}n_{{ij}}}}
(#) The grand mean
.. math::
\bar{y}_{\ldots } = \frac{{\sum_{{i = 1}}^k\sum_{{j = 1}}^{l_i}\sum_{{m = 1}}^{n_{{ij}}}y_{{mij}}}}{{\sum_{{i = 1}}^k\sum_{{j = 1}}^{l_i}n_{{ij}}}}
(#) The number of observations in each group
.. math::
n_{{i.}} = \sum_{{j = 1}}^{l_i}n_{{ij}}
(#) Sums of squares
.. rst-class:: nag-rules-none nag-align-left
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Between groups |:math:`\text{} = \mathrm{SS}_g = \sum_{{i = 1}}^kn_{{i.}}\left(\bar{y}_{{.i.}}-\bar{y}_{\ldots }\right)^2` |
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Between subgroups within groups|:math:`\text{} = \mathrm{SS}_{{sg}} = \sum_{{i = 1}}^k\sum_{{j = 1}}^{l_i}n_{{ij}}\left(y_{{.ij}}-\bar{y}_{{.i.}}\right)^2` |
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Residual (within subgroups) |:math:`= \mathrm{SS}_{\mathrm{res}} = \sum_{{i = 1}}^k\sum_{{j = 1}}^{l_i}\sum_{{m = 1}}^{n_{{ij}}}\left(y_{{mij}}-\bar{y}_{{.ij}}\right)^2 = \mathrm{SS}_{\mathrm{tot}}-\mathrm{SS}_g-\mathrm{SS}_{{sg}}`|
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Corrected total |:math:`\text{} = \mathrm{SS}_{\mathrm{tot}} = \sum_{{i = 1}}^k\sum_{{j = 1}}^{l_i}\sum_{{m = 1}}^{n_{{ij}}}\left(y_{{mij}}-\bar{y}_{\ldots }\right)^2` |
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(#) Degrees of freedom of variance components
.. rst-class:: nag-rules-none nag-align-left
+------------------------+-----------+
|Between groups: |:math:`k-1`|
+------------------------+-----------+
|Subgroups within groups:|:math:`l-k`|
+------------------------+-----------+
|Residual: |:math:`n-l`|
+------------------------+-----------+
|Total: |:math:`n-1`|
+------------------------+-----------+
where
:math:`l = \sum_{{i = 1}}^kl_i`,
:math:`n = \sum_{{i = 1}}^kn_{{i.}}`
(#) :math:`F` ratios. These are the ratios of the group and subgroup mean squares to the residual mean square.
.. rst-class:: nag-rules-none nag-align-left
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Groups |:math:`F_1 = \frac{{\text{Between groups sum of squares}/\left(k-1\right)}}{{\text{Residual sum of squares}/\left(n-l\right)}} = \frac{{\mathrm{SS}_g/\left(k-1\right)}}{{\mathrm{SS}_{\mathrm{res}}/\left(n-l\right)}}` |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Subgroups|:math:`F_2 = \frac{{\text{Between subgroups (within group) sum of squares}/\left(l-k\right)}}{{\text{Residual sum of squares}/\left(n-l\right)}} = \frac{{\mathrm{SS}_{{sg}}/\left(l-k\right)}}{{\mathrm{SS}_{\mathrm{res}}/\left(n-l\right)}}`|
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
If either :math:`F` ratio exceeds :math:`9999.0`, the value :math:`9999.0` is assigned instead.
(#) :math:`\mathrm{f}` significances. The probability of obtaining a value from the appropriate :math:`F`-distribution which exceeds the computed mean square ratio.
.. rst-class:: nag-rules-none nag-align-left
+---------+-------------------------------------------------------------------------------------+
|Groups |:math:`p_1 = \mathrm{Prob}\left(F_{{\left(k-1\right),\left(n-l\right)}} > F_1\right)`|
+---------+-------------------------------------------------------------------------------------+
|Subgroups|:math:`p_2 = \mathrm{Prob}\left(F_{{\left(l-k\right),\left(n-l\right)}} > F_2\right)`|
+---------+-------------------------------------------------------------------------------------+
where :math:`F_{{\nu_1,\nu_2}}` denotes the central :math:`F`-distribution with degrees of freedom :math:`\nu_1` and :math:`\nu_2`.
If any :math:`F_i = 9999.0`, then :math:`p_i` is set to zero, :math:`i = 1,2`.
.. _g04ag-py2-py-references:
**References**
Kendall, M G and Stuart, A, 1976, `The Advanced Theory of Statistics (Volume 3)`, (3rd Edition), Griffin
Moore, P G, Shirley, E A and Edwards, D E, 1972, `Standard Statistical Calculations`, Pitman
"""
raise NotImplementedError
[docs]def random(y, iblock, nt, it, tol, irdf):
r"""
``random`` computes the analysis of variance and treatment means and standard errors for a randomized block or completely randomized design.
.. _g04bb-py2-py-doc:
For full information please refer to the NAG Library document for g04bb
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04bbf.html
.. _g04bb-py2-py-parameters:
**Parameters**
**y** : float, array-like, shape :math:`\left(n\right)`
The observations in the order as described by :math:`\mathrm{iblock}` and :math:`\mathrm{nt}`.
**iblock** : int
Indicates the block structure.
:math:`\mathrm{abs}\left(\mathrm{iblock}\right)\leq 1`
There are no blocks, i.e., it is a completely randomized design.
:math:`\mathrm{iblock} \geq 2`
There are :math:`\mathrm{iblock}` blocks and the data should be input by blocks, i.e., :math:`\mathrm{y}` must contain the observations for block :math:`1` followed by the observations for block :math:`2`, etc.
:math:`\mathrm{iblock} \leq -2`
There are :math:`\mathrm{abs}\left(\mathrm{iblock}\right)` blocks and the data is input in parallel with respect to blocks, i.e., :math:`\mathrm{y}[0]` must contain the first observation for block :math:`1`, :math:`\mathrm{y}[1]` must contain the first observation for block :math:`2 \cdots \mathrm{y}[\mathrm{abs}\left(\mathrm{iblock}\right)-1]` must contain the first observation for block :math:`\mathrm{abs}\left(\mathrm{iblock}\right),\mathrm{y}[\mathrm{abs}\left(\mathrm{iblock}+1\right)-1]` must contain the second observation for block :math:`1`, etc.
**nt** : int
The number of treatments. If only blocks are required in the analysis then set :math:`\mathrm{nt} = 1`.
**it** : int, array-like, shape :math:`\left(:\right)`
Note: the required length for this argument is determined as follows: if :math:`\mathrm{nt}\geq 2`: :math:`n`; otherwise: :math:`1`.
:math:`\mathrm{it}[\textit{i}-1]` indicates which of the :math:`\mathrm{nt}` treatments plot :math:`\textit{i}` received, for :math:`\textit{i} = 1,2,\ldots,n`.
If :math:`\mathrm{nt} = 1`, :math:`\mathrm{it}` is not referenced.
**tol** : float
The tolerance value used to check for zero eigenvalues of the matrix :math:`\Omega`. If :math:`\mathrm{tol} = 0.0` a default value of :math:`10^{-5}` is used.
**irdf** : int
An adjustment to the degrees of freedom for the residual and total.
:math:`\mathrm{irdf}\geq 1`
The degrees of freedom for the total is set to :math:`n-\mathrm{irdf}` and the residual degrees of freedom adjusted accordingly.
:math:`\mathrm{irdf} = 0`
The total degrees of freedom for the total is set to :math:`n-1`, as usual.
**Returns**
**gmean** : float
The grand mean, :math:`\hat{\mu }`.
**bmean** : float, ndarray, shape :math:`\left(\left\lvert \mathrm{iblock}\right\rvert\right)`
If :math:`\mathrm{abs}\left(\mathrm{iblock}\right)\geq 2`, :math:`\mathrm{bmean}[\textit{j}-1]` contains the mean for the :math:`\textit{j}`\ th block, :math:`\hat{\beta }_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,b`.
**tmean** : float, ndarray, shape :math:`\left(\mathrm{nt}\right)`
If :math:`\mathrm{nt}\geq 2`, :math:`\mathrm{tmean}[\textit{l}-1]` contains the (adjusted) mean for the :math:`\textit{l}`\ th treatment, :math:`\hat{\mu }^*+\hat{\tau }_{\textit{l}}`, for :math:`\textit{l} = 1,2,\ldots,t`, where :math:`\hat{\mu }^*` is the mean of the treatment adjusted observations, :math:`y_{{ij\left(l\right)}}-\hat{\tau }_l`.
**tabl** : float, ndarray, shape :math:`\left(4, 5\right)`
The analysis of variance table. Column 1 contains the degrees of freedom, column 2 the sum of squares, and where appropriate, column 3 the mean squares, column 4 the :math:`F`-statistic and column 5 the significance level of the :math:`F`-statistic. Row 1 is for Blocks, row 2 for Treatments, row 3 for Residual and row 4 for Total. Mean squares are computed for all but the Total row; :math:`F`-statistics and significance are computed for Treatments and Blocks, if present. Any unfilled cells are set to zero.
**c** : float, ndarray, shape :math:`\left(\mathrm{nt}, \mathrm{nt}\right)`
If :math:`\mathrm{nt}\geq 2`, the upper triangular part of :math:`\mathrm{c}` contains the variance-covariance matrix of the treatment effects, the strictly lower triangular part contains the standard errors of the difference between two treatment effects (means), i.e., :math:`\mathrm{c}[\textit{i}-1,\textit{j}-1]` contains the covariance of treatment :math:`\textit{i}` and :math:`\textit{j}` if :math:`\textit{j}\geq \textit{i}` and the standard error of the difference between treatment :math:`\textit{i}` and :math:`\textit{j}` if :math:`\textit{j} < \textit{i}`, for :math:`\textit{j} = 1,2,\ldots,t`, for :math:`\textit{i} = 1,2,\ldots,t`.
**irep** : int, ndarray, shape :math:`\left(\mathrm{nt}\right)`
If :math:`\mathrm{nt}\geq 2`, the treatment replications, :math:`R_{{\textit{l}\textit{l}}}`, for :math:`\textit{l} = 1,2,\ldots,\mathrm{nt}`.
**r** : float, ndarray, shape :math:`\left(n\right)`
The residuals, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`.
**ef** : float, ndarray, shape :math:`\left(\mathrm{nt}\right)`
If :math:`\mathrm{nt}\geq 2`, the canonical efficiency factors.
.. _g04bb-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{irdf} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{irdf}\geq 0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tol}\geq 0.0`.
(`errno` :math:`1`)
On entry, no blocks or treatments in model.
(`errno` :math:`1`)
On entry, :math:`\mathrm{nt} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nt}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n\geq 2`.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{iblock} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n\geq 2` and if :math:`\mathrm{abs}\left(\mathrm{iblock}\right)\geq 2`, :math:`\textit{n}` must be a multiple of :math:`\mathrm{abs}\left(\mathrm{iblock}\right)`.
(`errno` :math:`3`)
On entry, at least one treatment is not present. Treatment :math:`\langle\mathit{\boldsymbol{value}}\rangle` does not appear in :math:`\mathrm{it}`.
(`errno` :math:`3`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{it}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nt} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`1\leq \mathrm{it}[i-1]\leq \mathrm{nt}`.
(`errno` :math:`4`)
On entry, the values in :math:`\mathrm{y}` are constant.
(`errno` :math:`5`)
The computation of the eigenvalues has failed to converge.
(`errno` :math:`5`)
A computed standard error is zero.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`6`)
The treatments are totally confounded with blocks.
(`errno` :math:`7`)
The residual degrees of freedom is zero.
(`errno` :math:`7`)
The residual mean square is zero.
(`errno` :math:`8`)
The design is disconnected.
.. _g04bb-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
In a completely randomized design, experimental material is divided into a number of units, or plots, to which a treatment can be applied.
In a randomized block design the units are grouped into blocks so that the variation within blocks is less than the variation between blocks.
If every treatment is applied to one plot in each block it is a complete block design.
If there are fewer plots per block than treatments then the design will be an incomplete block design and may be balanced or partially balanced.
For a completely randomized design, with :math:`t` treatments and :math:`n_t` plots per treatment, the linear model is
.. math::
y_{{ij}} = \mu +\tau_j+e_{{ij}}\text{, }\quad j = 1,2,\ldots,t\text{ and }i = 1,2,\ldots,n_j\text{,}
where :math:`y_{{ij}}` is the :math:`i`\ th observation for the :math:`j`\ th treatment, :math:`\mu` is the overall mean, :math:`\tau_j` is the effect of the :math:`j`\ th treatment and :math:`e_{{ij}}` is the random error term.
For a randomized block design, with :math:`t` treatments and :math:`b` blocks of :math:`k` plots, the linear model is
.. math::
y_{{ij\left(l\right)}} = \mu +\beta_i+\tau_l+e_{{ij}}\text{, }\quad i = 1,2,\ldots,b\text{, }j = 1,2,\ldots,k\text{ and }l = 1,2,\ldots,t\text{,}
where :math:`\beta_i` is the effect of the :math:`i`\ th block and the :math:`ij\left(l\right)` notation indicates that the :math:`l`\ th treatment is applied to the :math:`j`\ th plot in the :math:`i`\ th block.
The completely randomized design gives rise to a one-way analysis of variance.
The treatments do not have to be equally replicated, i.e., do not have to occur the same number of times.
First the overall mean, :math:`\hat{\mu }`, is computed and subtracted from the observations to give :math:`y_{{ij}}^{\prime } = y_{{ij}}-\hat{\mu }`.
The estimated treatment effects, :math:`\hat{\tau }_j` are then computed as the treatment means of the mean adjusted observations, :math:`y_{{ij}}^{\prime }`, and the treatment sum of squares can be computed from the sum of squares of the treatment totals of the :math:`y_{{ij}}^{\prime }` divided by the number of observations per treatment total, :math:`n_j`.
The final residuals are computed as :math:`r_{{ij}} = y_{{ij}}^{\prime }-\hat{\tau }_j`, and, from the residuals, the residual sum of squares is calculated.
For the randomized block design the mean is computed and subtracted from the observations to give :math:`y_{{ij\left(l\right)}}^{\prime } = y_{{ij\left(l\right)}}-\hat{\mu }`.
The estimated block effects, ignoring treatment effects, :math:`\hat{\beta }_i`, are then computed using the block means of the :math:`y_{{ij\left(l\right)}}^{\prime }` and the unadjusted sum of squares computed as the sum of squared block totals for the :math:`y_{{ij\left(l\right)}}^{\prime }` divided by number of plots per block, :math:`k`.
The block adjusted observations are then computed as :math:`y_{{ij\left(l\right)}}^{{\prime \prime }} = {y_i^{\prime }j}_{\left(l\right)} = \hat{\beta }_i`.
In the case of the complete block design, with the same replication for each treatment within each block, the blocks and treatments are orthogonal, and so the treatment effects are estimated as the treatment means of the block adjusted observations, :math:`y_{{ij\left(l\right)}}^{{\prime \prime }}`.
The treatment sum of squares is computed as the sum of squared treatment totals of the :math:`y_{{ij\left(l\right)}}^{{\prime \prime }}` divided by the number of replicates to the treatments, :math:`r = bk/t`.
Finally the residuals, and hence the residual sum of squares, are given by :math:`r_{{ij\left(l\right)}} = y_{{ij\left(l\right)}}^{{\prime \prime }}-\hat{\tau }_l`.
For a design without the same replication for each treatment within each block the treatments and the blocks will not be orthogonal, so the treatments adjusted for blocks need to be computed.
The adjusted treatment effects are found as the solution to the equations
.. math::
\left(R-NN^{\mathrm{T}}/k\right)\hat{\tau } = q\text{,}
where :math:`q` is the vector of the treatment totals for block adjusted observations, :math:`y_{{ij\left(l\right)}}^{{\prime \prime }}`, :math:`R` is a diagonal matrix with :math:`R_{{ll}}` equal to the number of times the :math:`l`\ th treatment is replicated, and :math:`n` is the :math:`t\times b` incidence matrix, with :math:`N_{{lj}}` equal to the number of times treatment :math:`l` occurs in block :math:`j`.
The solution to the equations can be written as
.. math::
\hat{\tau } = \Omega q
where :math:`\Omega` is a generalized inverse of :math:`\left(R-NN^{\mathrm{T}}/k\right)`.
The solution is found from the eigenvalue decomposition of :math:`\left(R-NN^{\mathrm{T}}/k\right)`.
The residuals are first calculated by subtracting the estimated treatment effects from the block adjusted observations to give :math:`r_{{ij\left(l\right)}}^{\prime } = y_{{ij\left(l\right)}}^{{\prime \prime }}-\hat{\tau }_l`.
However, since only the unadjusted block effects have been removed and blocks and treatments are not orthogonal, the block means of the :math:`r_{{ij\left(l\right)}}^{\prime }` have to be subtracted to give the correct residuals, :math:`r_{{ij\left(l\right)}}` and residual sum of squares.
The mean squares are computed as the sum of squares divided by the degrees of freedom.
The degrees of freedom for the unadjusted blocks is :math:`b-1`, for the completely randomized and the complete block designs the degrees of freedom for the treatments is :math:`t-1`.
In the general case the degrees of freedom for treatments is the rank of the matrix :math:`\Omega`.
The :math:`F`-statistic given by the ratio of the treatment mean square to the residual mean square tests the hypothesis
.. math::
H_0:\tau_1 = \tau_2 = \cdots = \tau_t = 0\text{.}
The standard errors for the difference in treatment effects, or treatment means, for the completely randomized or the complete block designs, are given by:
.. math::
\mathrm{se}\left(\tau_j-\tau_{{j*}}\right) = \left(\frac{1}{n_j}+\frac{1}{n_{{j*}}}\right)s^2
where :math:`s^2` is the residual mean square and :math:`n_j = n_{{j*}} = b` in the complete block design.
In the general case the variances of the treatment effects are given by
.. math::
\mathrm{var}\left(\tau \right) = \Omega s^2
from which the appropriate standard errors of the difference between treatment effects or the difference between adjusted means can be calculated.
In the complete block design all the information on the treatment effects is given by the within block analysis.
In other designs there may be a loss of information due to the non-orthogonality of treatments and blocks.
The efficiency of the within block analysis in these cases is given by the (canonical) efficiency factors, these are the nonzero eigenvalues of the matrix :math:`\left(R-NN^{\mathrm{T}}/k\right)`, divided by the number of replicates in the case of equal replication, or by the mean of the number of replicates in the unequally replicated case, see John (1987).
If more than one eigenvalue is zero then the design is said to be disconnected and some treatments can only be compared using a between block analysis.
.. _g04bb-py2-py-references:
**References**
Cochran, W G and Cox, G M, 1957, `Experimental Designs`, Wiley
Davis, O L, 1978, `The Design and Analysis of Industrial Experiments`, Longman
John, J A, 1987, `Cyclic Designs`, Chapman and Hall
John, J A and Quenouille, M H, 1977, `Experiments: Design and Analysis`, Griffin
Searle, S R, 1971, `Linear Models`, Wiley
"""
raise NotImplementedError
[docs]def rowcol(nrep, nrow, ncol, y, nt, it, tol, irdf):
r"""
``rowcol`` computes the analysis of variance for a general row and column design together with the treatment means and standard errors.
.. _g04bc-py2-py-doc:
For full information please refer to the NAG Library document for g04bc
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04bcf.html
.. _g04bc-py2-py-parameters:
**Parameters**
**nrep** : int
:math:`b`, the number of replicates.
**nrow** : int
:math:`r`, the number of rows per replicate.
**ncol** : int
:math:`c`, the number of columns per replicate.
**y** : float, array-like, shape :math:`\left(\mathrm{nrep}\times \mathrm{nrow}\times \mathrm{ncol}\right)`
The :math:`n = brc` observations ordered by columns within rows within replicates. That is :math:`\mathrm{y}[ rc\left(\textit{i}-1\right) +r\left(\textit{j}-1\right) +\textit{k} -1]` contains the observation from the :math:`\textit{k}`\ th column of the :math:`\textit{j}`\ th row of the :math:`\textit{i}`\ th replicate, for :math:`\textit{k} = 1,2,\ldots,c`, for :math:`\textit{j} = 1,2,\ldots,r`, for :math:`\textit{i} = 1,2,\ldots,b`.
**nt** : int
The number of treatments. If only replicates, rows and columns are required in the analysis then set :math:`\mathrm{nt} = 1`.
**it** : int, array-like, shape :math:`\left(:\right)`
Note: the required length for this argument is determined as follows: if :math:`\mathrm{nt} > 1`: :math:`{ \mathrm{nrep} \times \mathrm{nrow} \times \mathrm{ncol} }`; otherwise: :math:`1`.
If :math:`\mathrm{nt} > 1`, :math:`\mathrm{it}[\textit{i}-1]` indicates which of the :math:`\mathrm{nt}` treatments unit :math:`\textit{i}` received, for :math:`\textit{i} = 1,2,\ldots,n`.
If :math:`\mathrm{nt} = 1`, :math:`\mathrm{it}` is not referenced.
**tol** : float
The tolerance value used to check for zero eigenvalues of the matrix :math:`\Omega`. If :math:`\mathrm{tol} = 0.0` a default value of :math:`0.00001` is used.
**irdf** : int
An adjustment to the degrees of freedom for the residual and total.
:math:`\mathrm{irdf}\geq 1`
The degrees of freedom for the total is set to :math:`n-\mathrm{irdf}` and the residual degrees of freedom adjusted accordingly.
:math:`\mathrm{irdf} = 0`
The total degrees of freedom for the total is set to :math:`n-1`, as usual.
**Returns**
**gmean** : float
The grand mean, :math:`\hat{\mu }`.
**tmean** : float, ndarray, shape :math:`\left(\mathrm{nt}\right)`
If :math:`\mathrm{nt}\geq 2`, :math:`\mathrm{tmean}[\textit{l}-1]` contains the (adjusted) mean for the :math:`\textit{l}`\ th treatment, :math:`\hat{\mu }^*+\hat{\tau }_{\textit{l}}`, for :math:`\textit{l} = 1,2,\ldots,t`, where :math:`\hat{\mu }^*` is the mean of the treatment adjusted observations :math:`y_{{ijk\left(l\right)}}-\hat{\tau }_l`. Otherwise :math:`\mathrm{tmean}` is not referenced.
**tabl** : float, ndarray, shape :math:`\left(6, 5\right)`
The analysis of variance table. Column 1 contains the degrees of freedom, column 2 the sum of squares, and where appropriate, column 3 the mean squares, column 4 the :math:`F`-statistic and column 5 the significance level of the :math:`F`-statistic. Row 1 is for replicates, row 2 for rows, row 3 for columns, row 4 for treatments (if :math:`\mathrm{nt} > 1`), row 5 for residual and row 6 for total. Mean squares are computed for all but the total row, :math:`F`-statistics and significance are computed for treatments, replicates, rows and columns. Any unfilled cells are set to zero.
**c** : float, ndarray, shape :math:`\left(\mathrm{nt}, \mathrm{nt}\right)`
The upper triangular part of :math:`\mathrm{c}` contains the variance-covariance matrix of the treatment effects, the strictly lower triangular part contains the standard errors of the difference between two treatment effects (means), i.e., :math:`\mathrm{c}[\textit{i}-1,\textit{j}-1]` contains the covariance of treatment :math:`\textit{i}` and :math:`\textit{j}` if :math:`\textit{j}\geq \textit{i}` and the standard error of the difference between treatment :math:`\textit{i}` and :math:`\textit{j}` if :math:`\textit{j} < \textit{i}`, for :math:`\textit{j} = 1,2,\ldots,t`, for :math:`\textit{i} = 1,2,\ldots,t`.
**irep** : int, ndarray, shape :math:`\left(\mathrm{nt}\right)`
If :math:`\mathrm{nt} > 1`, the treatment replications, :math:`R_{{\textit{l}\textit{l}}}`, for :math:`\textit{l} = 1,2,\ldots,\mathrm{nt}`. Otherwise :math:`\mathrm{irep}` is not referenced.
**rpmean** : float, ndarray, shape :math:`\left(\mathrm{nrep}\right)`
If :math:`\mathrm{nrep} > 1`, :math:`\mathrm{rpmean}[\textit{i}-1]` contains the mean for the :math:`\textit{i}`\ th replicate, :math:`\hat{\mu }+\hat{\beta }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,b`. Otherwise :math:`\mathrm{rpmean}` is not referenced.
**rmean** : float, ndarray, shape :math:`\left(\mathrm{nrep}\times \mathrm{nrow}\right)`
:math:`\mathrm{rmean}[\textit{j}-1]` contains the mean for the :math:`\textit{j}`\ th row, :math:`\hat{\mu }+\hat{\rho }_i`, for :math:`\textit{j} = 1,2,\ldots,r`.
**cmean** : float, ndarray, shape :math:`\left(\mathrm{nrep}\times \mathrm{ncol}\right)`
:math:`\mathrm{cmean}[\textit{k}-1]` contains the mean for the :math:`\textit{k}`\ th column, :math:`\hat{\mu }+\hat{\gamma }_{\textit{k}}`, for :math:`\textit{k} = 1,2,\ldots,c`.
**r** : float, ndarray, shape :math:`\left(\mathrm{nrep}\times \mathrm{nrow}\times \mathrm{ncol}\right)`
The residuals, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`.
**ef** : float, ndarray, shape :math:`\left(\mathrm{nt}\right)`
If :math:`\mathrm{nt}\geq 2`, the canonical efficiency factors. Otherwise :math:`\mathrm{ef}` is not referenced.
.. _g04bc-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{nrep} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nrep}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{irdf} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{irdf}\geq 0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{tol}\geq 0.0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{ncol}\geq 2`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{nt} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nt}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{nrow} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nrow}\geq 2`.
(`errno` :math:`2`)
On entry, at least one treatment is not present. Treatment :math:`\langle\mathit{\boldsymbol{value}}\rangle` does not appear in :math:`\mathrm{it}`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{it}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nt} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`1\leq \mathrm{it}[i-1]\leq \mathrm{nt}`.
(`errno` :math:`3`)
On entry, the values in :math:`\mathrm{y}` are constant.
(`errno` :math:`4`)
The computation of the eigenvalues has failed to converge.
(`errno` :math:`4`)
A computed standard error is zero.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`5`)
The treatments are totally confounded with blocks.
(`errno` :math:`6`)
The residual degrees of freedom is zero.
(`errno` :math:`6`)
The residual mean square is zero.
(`errno` :math:`7`)
The design is disconnected.
.. _g04bc-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
In a row and column design the experimental material can be characterised by a two-way classification, nominally called rows and columns.
Each experimental unit can be considered as being located in a particular row and column.
It is assumed that all rows are of the same length and all columns are of the same length.
Sets of equal numbers of rows/columns can be grouped together to form replicates, sometimes known as squares or rectangles, as appropriate.
If for a replicate, the number of rows, the number of columns and the number of treatments are equal and every treatment occurs once in each row and each column then the design is a Latin square.
If this is not the case the treatments will be non-orthogonal to rows and columns.
For example in the case of a lattice square each treatment occurs only once in each square.
For a row and column design, with :math:`t` treatments in :math:`r` rows and :math:`c` columns and :math:`b` replicates or squares with :math:`n = brc` observations the linear model is:
.. math::
y_{{ijk\left(l\right)}} = \mu +\beta_i+\rho_j+\gamma_k+\tau_l+e_{{ijk}}
for :math:`i = 1,2\ldots,b`, :math:`j = 1,2,\ldots,r`, :math:`k = 1,2\ldots,c` and :math:`l = 1,2,\ldots,t`, where :math:`\beta_i` is the effect of the :math:`i`\ th replicate, :math:`\rho_j` is the effect of the :math:`j`\ th row, :math:`\gamma_k` is the effect of the :math:`k`\ th column and the :math:`ijk\left(l\right)` notation indicates that the :math:`l`\ th treatment is applied to the unit in row :math:`j`, column :math:`k` of replicate :math:`i`.
To compute the analysis of variance for a row and column design the mean is computed and subtracted from the observations to give, :math:`y_{{ijk\left(l\right)}}^{\prime } = y_{{ijk\left(l\right)}}-\hat{\mu }`.
Since the replicates, rows and columns are orthogonal the estimated effects, ignoring treatment effects, :math:`\hat{\beta }_i`, :math:`\hat{\rho }_j`, :math:`\hat{\gamma }_k`, can be computed using the appropriate means of the :math:`y_{{ijk\left(l\right)}}^{\prime }`, and the unadjusted sum of squares computed as the appropriate sum of squared totals for the :math:`y_{{ijk\left(l\right)}}^{\prime }` divided by number of units per total.
The observations adjusted for replicates, rows and columns can then be computed by subtracting the estimated effects from :math:`y_{{ijk\left(l\right)}}^{\prime }` to give :math:`y_{{ijk\left(l\right)}}^{{\prime \prime }}`.
In the case of a Latin square design the treatments are orthogonal to replicates, rows and columns and so the treatment effects, :math:`\hat{\tau }_l`, can be estimated as the treatment means of the adjusted observations, :math:`y_{{ijk\left(l\right)}}^{{\prime \prime }}`.
The treatment sum of squares is computed as the sum of squared treatment totals of the :math:`y_{{ij\left(l\right)}}^{{\prime \prime }}` divided by the number of times each treatment is replicated.
Finally the residuals, and hence the residual sum of squares, are given by :math:`r_{{ij\left(l\right)}} = y_{{ij\left(l\right)}}^{{\prime \prime }}-\hat{\tau }_l`.
For a design which is not orthogonal, for example a lattice square or an incomplete Latin square, the treatment effects adjusted for replicates, rows and columns need to be computed.
The adjusted treatment effects are found as the solution to the equations:
.. math::
A\hat{\tau } = \left(R-N_bN_b^{\mathrm{T}}/\left(rc\right)-N_rN_r^{\mathrm{T}}/\left(bc\right)-N_cN_c^{\mathrm{T}}/\left(br\right)\right)\hat{\tau } = q
where :math:`q` is the vector of the treatment totals of the observations adjusted for replicates, rows and columns, :math:`y_{{ijk\left(l\right)}}^{{\prime \prime }}`, :math:`R` is a diagonal matrix with :math:`R_{{ll}}` equal to the number of times the :math:`l`\ th treatment is replicated, and :math:`N_b` is the :math:`t\times b` incidence matrix, with :math:`N_{{l,i}}` equal to the number of times treatment :math:`l` occurs in replicate :math:`i`, with :math:`N_r` and :math:`N_c` being similarly defined for rows and columns.
The solution to the equations can be written as:
.. math::
\hat{\tau } = \Omega q
where, :math:`\Omega` is a generalized inverse of :math:`A`.
The solution is found from the eigenvalue decomposition of :math:`A`.
The residuals are first calculated by subtracting the estimated adjusted treatment effects from the adjusted observations to give :math:`r_{{ij\left(l\right)}}^{\prime } = y_{{ij\left(l\right)}}^{{\prime \prime }}-\hat{\tau }_l`.
However, since only the unadjusted replicate, row and column effects have been removed and they are not orthogonal to treatments, the replicate, row and column means of the :math:`r_{{ij\left(l\right)}}^{\prime }` have to be subtracted to give the correct residuals, :math:`r_{{ij\left(l\right)}}` and residual sum of squares.
Given the sums of squares, the mean squares are computed as the sums of squares divided by the degrees of freedom.
The degrees of freedom for the unadjusted replicates, rows and columns are :math:`b-1`, :math:`r-1` and :math:`c-1` respectively and for the Latin square designs the degrees of freedom for the treatments is :math:`t-1`.
In the general case the degrees of freedom for treatments is the rank of the matrix :math:`\Omega`.
The :math:`F`-statistic given by the ratio of the treatment mean square to the residual mean square tests the hypothesis:
.. math::
H_0:\tau_1 = \tau_2 = \cdots = \tau_t = 0\text{.}
The standard errors for the difference in treatment effects, or treatment means, for Latin square designs, are given by:
.. math::
se\left(\hat{\tau }_j-\hat{\tau }_{{j*}}\right) = \sqrt{2s^2/\left(bt\right)}
where :math:`s^2` is the residual mean square.
In the general case the variances of the treatment effects are given by:
.. math::
\mathrm{Var}\left(\hat{\tau }\right) = \Omega s^2
from which the appropriate standard errors of the difference between treatment effects or the difference between adjusted means can be calculated.
The analysis of a row-column design can be considered as consisting of different strata: the replicate stratum, the rows within replicate and the columns within replicate strata and the units stratum.
In the Latin square design all the information on the treatment effects is given at the units stratum.
In other designs there may be a loss of information due to the non-orthogonality of treatments and replicates, rows and columns and information on treatments may be available in higher strata.
The efficiency of the estimation at the units stratum is given by the (canonical) efficiency factors, these are the nonzero eigenvalues of the matrix, :math:`A`, divided by the number of replicates in the case of equal replication, or by the mean of the number of replicates in the unequally replicated case, see John (1987).
If more than one eigenvalue is zero then the design is said to be disconnected and information on some treatment comparisons can only be obtained from higher strata.
.. _g04bc-py2-py-references:
**References**
Cochran, W G and Cox, G M, 1957, `Experimental Designs`, Wiley
Davis, O L, 1978, `The Design and Analysis of Industrial Experiments`, Longman
John, J A, 1987, `Cyclic Designs`, Chapman and Hall
John, J A and Quenouille, M H, 1977, `Experiments: Design and Analysis`, Griffin
Searle, S R, 1971, `Linear Models`, Wiley
"""
raise NotImplementedError
[docs]def factorial(y, lfac, nblock, inter, irdf):
r"""
``factorial`` computes an analysis of variance table and treatment means for a complete factorial design.
.. _g04ca-py2-py-doc:
For full information please refer to the NAG Library document for g04ca
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04caf.html
.. _g04ca-py2-py-parameters:
**Parameters**
**y** : float, array-like, shape :math:`\left(n\right)`
The observations in standard order, see :ref:`Notes <g04ca-py2-py-notes>`.
**lfac** : int, array-like, shape :math:`\left(\textit{nfac}\right)`
:math:`\mathrm{lfac}[\textit{i}]` must contain the number of levels for the :math:`\textit{i}`\ th factor, for :math:`\textit{i} = 1,2,\ldots,k`.
**nblock** : int
The number of blocks. If there are no blocks, set :math:`\mathrm{nblock} = 0` or :math:`1`.
**inter** : int
The maximum number of factors in an interaction term. If no interaction terms are to be computed, set :math:`\mathrm{inter} = 0` or :math:`1`.
**irdf** : int
The adjustment to the residual and total degrees of freedom. The total degrees of freedom are set to :math:`n-\mathrm{irdf}` and the residual degrees of freedom adjusted accordingly. For examples of the use of :math:`\mathrm{irdf}` see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04caf.html#fcomments>`__.
**Returns**
**table** : float, ndarray, shape :math:`\left(\textit{mterm}, 5\right)`
The first :math:`\mathrm{itotal}` rows of :math:`\mathrm{table}` contain the analysis of variance table. The first column contains the degrees of freedom, the second column contains the sum of squares, the third column (except for the row corresponding to the total sum of squares) contains the mean squares, i.e., the sums of squares divided by the degrees of freedom, and the fourth and fifth columns contain the :math:`F` ratio and significance level, respectively (except for rows corresponding to the total sum of squares, and the residual sum of squares). All other cells of the table are set to zero.
The first row corresponds to the blocks and is set to zero if there are no blocks.
The :math:`\mathrm{itotal}`\ th row corresponds to the total sum of squares for :math:`\mathrm{y}` and the :math:`\left(\mathrm{itotal}-1\right)`\ th row corresponds to the residual sum of squares.
The central rows of the table correspond to the main effects followed by the interaction if specified by :math:`\mathrm{inter}`.
The main effects are in the order specified by :math:`\mathrm{lfac}` and the interactions are in lexical order, see :ref:`Notes <g04ca-py2-py-notes>`.
**itotal** : int
The row in :math:`\mathrm{table}` corresponding to the total sum of squares. The number of treatment effects is :math:`\mathrm{itotal}-3`.
**tmean** : float, ndarray, shape :math:`\left(\textit{maxt}\right)`
The treatment means. The position of the means for an effect is given by the index in :math:`\mathrm{imean}`. For a given effect the means are in standard order, see :ref:`Notes <g04ca-py2-py-notes>`.
**e** : float, ndarray, shape :math:`\left(\textit{maxt}\right)`
The estimated effects in the same order as for the means in :math:`\mathrm{tmean}`.
**imean** : int, ndarray, shape :math:`\left(\min\left(\textit{mterm},\left(\mathrm{itotal}-3\right)\right)\right)`
Indicates the position of the effect means in :math:`\mathrm{tmean}`. The effect means corresponding to the first treatment effect in the ANOVA table are stored in :math:`\mathrm{tmean}[0]` up to :math:`\mathrm{tmean}[\mathrm{imean}[0]-1]`. Other effect means corresponding to the :math:`i`\ th treatment effect, :math:`i = 1,2,\ldots,\mathrm{itotal}-3`, are stored in :math:`\mathrm{tmean}[\mathrm{imean}[i-1]]` up to :math:`\mathrm{tmean}[\mathrm{imean}[i]-1]`.
**semean** : float, ndarray, shape :math:`\left(\min\left(\textit{mterm},\left(\mathrm{itotal}-3\right)\right)\right)`
The standard error of the difference between means corresponding to the :math:`i`\ th treatment effect in the ANOVA table.
**bmean** : float, ndarray, shape :math:`\left(\mathrm{nblock}+1\right)`
:math:`\mathrm{bmean}[0]` contains the grand mean, if :math:`\mathrm{nblock} > 1`, :math:`\mathrm{bmean}[1]` up to :math:`\mathrm{bmean}[\mathrm{nblock}]` contain the block means.
**r** : float, ndarray, shape :math:`\left(n\right)`
The residuals.
.. _g04ca-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{irdf} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{irdf}\geq 0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{inter} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nfac} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{inter}\leq \textit{nfac}`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{inter} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{inter}\geq 0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{nblock} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{nblock}\geq 0`.
(`errno` :math:`1`)
On entry, :math:`\textit{nfac} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nfac}\geq 1`.
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n\geq 4`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{lfac}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`
Constraint: :math:`\mathrm{lfac}[i-1] \geq 2`.
(`errno` :math:`2`)
On entry, the number of plots per block is incompatible with the number of plot factors.
(`errno` :math:`2`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nblock} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: when :math:`\mathrm{nblock} > 1`, :math:`\textit{n}` must be a multiple of :math:`\mathrm{nblock}`.
(`errno` :math:`3`)
On entry, the values of :math:`\mathrm{y}` are constant.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`4`)
The residual sum of squares is zero.
(`errno` :math:`4`)
There are no degrees of freedom for the residual sum of squares.
.. _g04ca-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
An experiment consists of a collection of units, or plots, to which a number of treatments are applied.
In a factorial experiment the effects of several different sets of conditions are compared, e.g., three different temperatures, :math:`T_1`, :math:`T_2` and :math:`T_3`, and two different pressures, :math:`P_1` and :math:`P_2`.
The conditions are known as factors and the different values the conditions take are known as levels.
In a factorial experiment the experimental treatments are the combinations of all the different levels of all factors, e.g.,
.. math::
\begin{array}{lll}T_1P_1\text{,}&T_2P_1\text{,}&T_3P_1\\&&\\T_1P_2\text{,}&T_2P_2\text{,}&T_3P_2\end{array}
The effect of a factor averaged over all other factors is known as a main effect, and the effect of a combination of some of the factors averaged over all other factors is known as an interaction.
This can be represented by a linear model.
In the above example if the response was :math:`y_{{ijk}}` for the :math:`k`\ th replicate of the :math:`i`\ th level of :math:`T` and the :math:`j`\ th level of :math:`P` the linear model would be
.. math::
y_{{ijk}} = \mu +t_i+p_j+\gamma_{{ij}}+e_{{ijk}}
where :math:`\mu` is the overall mean, :math:`t_i` is the main effect of :math:`T`, :math:`p_j` is the main effect of :math:`P`, :math:`\gamma_{{ij}}` is the :math:`T\times P` interaction and :math:`e_{{ijk}}` is the random error term.
In order to find unique estimates constraints are placed on the parameters estimates.
For the example here these are:
.. math::
\begin{array}{ll}\sum_{{i = 1}}^3\hat{t}_i = 0\text{,}&\\&\\\sum_{{j = 1}}^2\hat{p}_j = 0\text{,}&\\&\\ \sum_{{i = 1}}^3 \hat{\gamma }_{{ij}} = 0 \text{,} &\text{for }j = 1,2\text{ and}\\&\\ \sum_{{j = 1}}^2 \hat{\gamma }_{{ij}} = 0 \text{,} & \text{for } i = 1,2,3 \text{,} \end{array}
where :math:`\hat{\text{ }}` denotes the estimate.
If there is variation in the experimental conditions (e.g., in an experiment on the production of a material different batches of raw material may be used, or the experiment may be carried out on different days), then plots that are similar are grouped together into blocks.
For a balanced complete factorial experiment all the treatment combinations occur the same number of times in each block.
``factorial`` computes the analysis of variance (ANOVA) table by sequentially computing the totals and means for an effect from the residuals computed when previous effects have been removed.
The effect sum of squares is the sum of squared totals divided by the number of observations per total.
The means are then subtracted from the residuals to compute a new set of residuals.
At the same time the means for the original data are computed.
When all effects are removed the residual sum of squares is computed from the residuals.
Given the sums of squares an ANOVA table is then computed along with standard errors for the difference in treatment means.
The data for ``factorial`` has to be in standard order given by the order of the factors.
Let there be :math:`k` factors, :math:`f_1,f_2,\ldots,f_k` in that order with levels :math:`l_1,l_2,\ldots,l_k` respectively.
Standard order requires the levels of factor :math:`f_1` are in order :math:`1,2,\ldots,l_1` and within each level of :math:`f_1` the levels of :math:`f_2` are in order :math:`1,2,\ldots,l_2` and so on.
For an experiment with blocks the data is for block :math:`1` then for block :math:`2`, etc.
Within each block the data must be arranged so that the levels of factor :math:`f_1` are in order :math:`1,2,\ldots,l_1` and within each level of :math:`f_1` the levels of :math:`f_2` are in order :math:`1,2,\ldots,l_2` and so on.
Any within block replication of treatment combinations must occur within the levels of :math:`f_k`.
The ANOVA table is given in the following order.
For a complete factorial experiment the first row is for blocks, if present, then the main effects of the factors in their order, e.g., :math:`f_1` followed by :math:`f_2`, etc.
These are then followed by all the two factor interactions then all the three factor interactions, etc., the last two rows being for the residual and total sums of squares.
The interactions are arranged in lexical order for the given factor order.
For example, for the three factor interactions for a five factor experiment the :math:`10` interactions would be in the following order:
.. math::
\begin{array}{l}f_1f_2f_3\\f_1f_2f_4\\f_1f_2f_5\\f_1f_3f_4\\f_1f_3f_5\\f_1f_4f_5\\f_2f_3f_4\\f_2f_3f_5\\f_2f_4f_5\\f_3f_4f_5\end{array}
.. _g04ca-py2-py-references:
**References**
Cochran, W G and Cox, G M, 1957, `Experimental Designs`, Wiley
Davis, O L, 1978, `The Design and Analysis of Industrial Experiments`, Longman
John, J A and Quenouille, M H, 1977, `Experiments: Design and Analysis`, Griffin
"""
raise NotImplementedError
[docs]def contrasts(tmean, irep, rms, rdf, ct, tabl, tol, usetx, tx):
r"""
``contrasts`` computes sum of squares for a user-defined contrast between means.
.. _g04da-py2-py-doc:
For full information please refer to the NAG Library document for g04da
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04daf.html
.. _g04da-py2-py-parameters:
**Parameters**
**tmean** : float, array-like, shape :math:`\left(\textit{nt}\right)`
The treatment means, :math:`\hat{\tau }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,t`.
**irep** : int, array-like, shape :math:`\left(\textit{nt}\right)`
The replication for each treatment mean, :math:`n_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,t`.
**rms** : float
The residual mean square, :math:`\hat{\sigma }^2`.
**rdf** : float
The residual degrees of freedom.
**ct** : float, array-like, shape :math:`\left(\textit{nt}, \textit{nc}\right)`
The columns of :math:`\mathrm{ct}` must contain the :math:`\textit{nc}` contrasts, that is :math:`\mathrm{ct}[\textit{i}-1,\textit{j}-1]` must contain :math:`\lambda_{{\textit{i}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,\textit{nc}`, for :math:`\textit{i} = 1,2,\ldots,t`.
**tabl** : float, array-like, shape :math:`\left(\textit{nc}, 5\right)`
The elements of :math:`\mathrm{tabl}` that are not referenced as described below remain unchanged.
**tol** : float
The tolerance, :math:`\epsilon` used to check if the contrasts are orthogonal and if they are orthogonal to the mean. If :math:`\mathrm{tol}\leq 0.0` the value machine precision is used.
**usetx** : bool
If :math:`\mathrm{usetx} = \mathbf{True}` the means :math:`\tau_i^*` are provided in :math:`\mathrm{tx}` and the formula `(2) <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04daf.html#eqn2>`__ is used instead of formula `(1) <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04daf.html#eqn1>`__.
If :math:`\mathrm{usetx} = \mathbf{False}` formula `(1) <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04daf.html#eqn1>`__ is used and :math:`\mathrm{tx}` is not referenced.
**tx** : float, array-like, shape :math:`\left(\textit{nt}\right)`
If :math:`\mathrm{usetx} = \mathbf{True}` :math:`\mathrm{tx}` must contain the means :math:`\tau_{\textit{i}}^*`, for :math:`\textit{i} = 1,2,\ldots,t`.
**Returns**
**est** : float, ndarray, shape :math:`\left(\textit{nc}\right)`
The estimates of the contrast, :math:`\Lambda_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\textit{nc}`.
**tabl** : float, ndarray, shape :math:`\left(\textit{nc}, 5\right)`
The rows of the analysis of variance table for the contrasts. For each row column 1 contains the degrees of freedom, column 2 contains the sum of squares, column 3 contains the mean square, column 4 the :math:`F`-statistic and column 5 the significance level for the contrast. Note that the degrees of freedom are always one and so the mean square equals the sum of squares.
.. _g04da-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{rdf} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{rdf}\geq 1.0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{rms} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{rms} > 0.0`.
(`errno` :math:`1`)
On entry, :math:`\textit{nt} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nt}\geq 2`.
(`errno` :math:`1`)
On entry, :math:`\textit{nc} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nc}\geq 1`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`2`)
The :math:`\langle\mathit{\boldsymbol{value}}\rangle`:math:`\langle\mathit{\boldsymbol{value}}\rangle` and :math:`\langle\mathit{\boldsymbol{value}}\rangle`:math:`\langle\mathit{\boldsymbol{value}}\rangle` contrasts are not orthogonal.
(`errno` :math:`2`)
The :math:`\langle\mathit{\boldsymbol{value}}\rangle`:math:`\langle\mathit{\boldsymbol{value}}\rangle` contrast is not orthogonal to the mean.
.. _g04da-py2-py-notes:
**Notes**
`No equivalent traditional C interface for this routine exists in the NAG Library.`
In the analysis of designed experiments the first stage is to compute the basic analysis of variance table, the estimate of the error variance (the residual or error mean square), :math:`\hat{\sigma }^2`, and the (variance ratio) :math:`F`-statistic for the :math:`t` treatments.
If this :math:`F`-test is significant then the second stage of the analysis is to explore which treatments are significantly different.
If there is a structure to the treatments then this may lead to hypotheses that can be defined before the analysis and tested using linear contrasts.
For example, if the treatments were three different fixed temperatures, say :math:`18`, :math:`20` and :math:`22`, and an uncontrolled temperature (denoted by :math:`\mathrm{N}`) then the following contrasts might be of interest.
.. math::
\begin{array}{rrrrr}&18&20&22&\mathrm{N}\\&&&&\\\left(\mathrm{a}\right)&\frac{1}{3}&\frac{1}{3}&\frac{1}{3}&-1\\&&&&\\\left(\mathrm{b}\right)&-1&0&1&0\end{array}
The first represents the average difference between the controlled temperatures and the uncontrolled temperature.
The second represents the linear effect of an increasing fixed temperature.
For a randomized complete block design or a completely randomized design, let the treatment means be :math:`\hat{\tau }_i`, :math:`i = 1,2,\ldots,t`, and let the :math:`j`\ th contrast be defined by :math:`\lambda_{{ij}}`, :math:`i = 1,2,\ldots,t`, then the estimate of the contrast is simply:
.. math::
\Lambda_j = \sum_{{i = 1}}^t\hat{\tau }_i\lambda_{{ij}}
and the sum of squares for the contrast is:
.. math::
\mathrm{SS}_j = \frac{{\Lambda_j^2}}{{\sum_{{i = 1}}^t\lambda_{{ij}}^2/n_i}}
where :math:`n_i` is the number of observations for the :math:`i`\ th treatment.
Such a contrast has one degree of freedom so that the appropriate :math:`F`-statistic is :math:`\mathrm{SS}_j/\hat{\sigma }^2`.
The two contrasts :math:`\lambda_{{ij}}` and :math:`\lambda_{{ij^{\prime }}}` are orthogonal if :math:`\sum_{{i = 1}}^t\lambda_{{ij}}\lambda_{{ij^{\prime }}} = 0` and the contrast :math:`\lambda_{{ij}}` is orthogonal to the overall mean if :math:`\sum_{{i = 1}}^t\lambda_{{ij}} = 0`.
In practice these sums will be tested against a small quantity, :math:`\epsilon`.
If each of a set of contrasts is orthogonal to the mean and they are all mutually orthogonal then the contrasts provide a partition of the treatment sum of squares into independent components.
Hence the resulting :math:`F`-tests are independent.
If the treatments come from a design in which treatments are not orthogonal to blocks then the sum of squares for a contrast is given by:
.. math::
\mathrm{SS}_j = \frac{{\Lambda_j\Lambda_j^*}}{{\sum_{{i = 1}}^t\lambda_{{ij}}^2/n_i}}
where
.. math::
\Lambda_j^* = \sum_{{i = 1}}^t\tau_i^*\lambda_{{ij}}
with :math:`\tau_{\textit{i}}^*`, for :math:`\textit{i} = 1,2,\ldots,t`, being adjusted treatment means computed by first eliminating blocks then computing the treatment means from the block adjusted observations without taking into account the non-orthogonality between treatments and blocks.
For further details see John (1987).
.. _g04da-py2-py-references:
**References**
Cochran, W G and Cox, G M, 1957, `Experimental Designs`, Wiley
John, J A, 1987, `Cyclic Designs`, Chapman and Hall
Winer, B J, 1970, `Statistical Principles in Experimental Design`, McGraw--Hill
"""
raise NotImplementedError
[docs]def confidence(typ, tmean, rdf, c, clevel):
r"""
``confidence`` computes simultaneous confidence intervals for the differences between means.
It is intended for use after :meth:`random` or :meth:`rowcol`.
.. _g04db-py2-py-doc:
For full information please refer to the NAG Library document for g04db
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04dbf.html
.. _g04db-py2-py-parameters:
**Parameters**
**typ** : str, length 1
Indicates which method is to be used.
:math:`\mathrm{typ} = \texttt{'T'}`
The Tukey--Kramer method is used.
:math:`\mathrm{typ} = \texttt{'B'}`
The Bonferroni method is used.
:math:`\mathrm{typ} = \texttt{'D'}`
The Dunn--Sidak method is used.
:math:`\mathrm{typ} = \texttt{'L'}`
The Fisher LSD method is used.
:math:`\mathrm{typ} = \texttt{'S'}`
The Scheffe's method is used.
**tmean** : float, array-like, shape :math:`\left(\textit{nt}\right)`
The treatment means, :math:`\hat{\tau }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,t`.
**rdf** : float
:math:`\nu`, the residual degrees of freedom.
**c** : float, array-like, shape :math:`\left(\textit{nt}, \textit{nt}\right)`
The strictly lower triangular part of :math:`\mathrm{c}` must contain the standard errors of the differences between the means as returned by :meth:`random` and :meth:`rowcol`. That is :math:`\mathrm{c}[i,j]`, :math:`i > j`, contains the standard error of the difference between the :math:`i`\ th and :math:`j`\ th mean in :math:`\mathrm{tmean}`.
**clevel** : float
The required confidence level for the computed intervals, (:math:`1-\alpha`).
**Returns**
**cil** : float, ndarray, shape :math:`\left(\textit{nt}\times \left(\textit{nt}-1\right)/2\right)`
The :math:`\left(\left(\textit{i}-1\right)\left(\textit{i}-2\right)/2+\textit{j}\right)`\ th element contains the lower limit to the confidence interval for the difference between :math:`\textit{i}`\ th and :math:`\textit{j}`\ th means in :math:`\mathrm{tmean}`, for :math:`\textit{j} = 1,2,\ldots,\textit{i}-1`, for :math:`\textit{i} = 2,3,\ldots,t`.
**ciu** : float, ndarray, shape :math:`\left(\textit{nt}\times \left(\textit{nt}-1\right)/2\right)`
The :math:`\left(\left(\textit{i}-1\right)\left(\textit{i}-2\right)/2+\textit{j}\right)`\ th element contains the upper limit to the confidence interval for the difference between :math:`\textit{i}`\ th and :math:`\textit{j}`\ th means in :math:`\mathrm{tmean}`, for :math:`\textit{j} = 1,2,\ldots,\textit{i}-1`, for :math:`\textit{i} = 2,3,\ldots,t`.
**isig** : int, ndarray, shape :math:`\left(\textit{nt}\times \left(\textit{nt}-1\right)/2\right)`
The :math:`\left(\left(\textit{i}-1\right)\left(\textit{i}-2\right)/2+\textit{j}\right)`\ th element indicates if the difference between :math:`\textit{i}`\ th and :math:`\textit{j}`\ th means in :math:`\mathrm{tmean}` is significant, for :math:`\textit{j} = 1,2,\ldots,\textit{i}-1`, for :math:`\textit{i} = 2,3,\ldots,t`. If the difference is significant then the returned value is :math:`1`; otherwise the returned value is :math:`0`.
.. _g04db-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{typ} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{typ} = \texttt{'T'}`, :math:`\texttt{'B'}`, :math:`\texttt{'D'}`, :math:`\texttt{'L'}` or :math:`\texttt{'S'}`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{clevel} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`0.0 < \mathrm{clevel} < 1.0`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{rdf} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{rdf}\geq 1.0`.
(`errno` :math:`1`)
On entry, :math:`\textit{nt} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nt}\geq 2`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{c}[i-1,j-1] > 0.0`.
(`errno` :math:`3`)
There has been a failure in the computation of the studentized range statistic.
.. _g04db-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
In the computation of analysis of a designed experiment the first stage is to compute the basic analysis of variance table, the estimate of the error variance (the residual or error mean square), :math:`\hat{\sigma }^2`, the residual degrees of freedom, :math:`\nu`, and the (variance ratio) :math:`F`-statistic for the :math:`t` treatments.
The second stage of the analysis is to compare the treatment means.
If the treatments have no structure, for example the treatments are different varieties, rather than being structured, for example a set of different temperatures, then a multiple comparison procedure can be used.
A multiple comparison procedure looks at all possible pairs of means and either computes confidence intervals for the difference in means or performs a suitable test on the difference.
If there are :math:`t` treatments then there are :math:`t\left(t-1\right)/2` comparisons to be considered.
In tests the type :math:`1` error or significance level is the probability that the result is considered to be significant when there is no difference in the means.
If the usual :math:`t`-test is used with, say, a :math:`6\%` significance level then the type :math:`1` error for all :math:`k = t\left(t-1\right)/2` tests will be much higher.
If the tests were independent then if each test is carried out at the :math:`100\alpha` percent level then the overall type :math:`1` error would be :math:`\alpha^* = 1-\left(1-\alpha \right)^k\simeq k\alpha`.
In order to provide an overall protection the individual tests, or confidence intervals, would have to be carried out at a value of :math:`\alpha` such that :math:`\alpha^*` is the required significance level, e.g., five percent.
The :math:`100\left(1-\alpha \right)` percent confidence interval for the difference in two treatment means, :math:`\hat{\tau }_i` and :math:`\hat{\tau }_j` is given by
.. math::
\left(\hat{\tau }_i-\hat{\tau }_j\right)\pm T_{\left(\alpha, \nu, t\right)}^*se\left(\hat{\tau }_i-\hat{\tau }_j\right)\text{,}
where :math:`se\left(\right)` denotes the standard error of the difference in means and :math:`T_{\left(\alpha, \nu, t\right)}^*` is an appropriate percentage point from a distribution.
There are several possible choices for :math:`T_{\left(\alpha, \nu, t\right)}^*`.
These are:
(a) :math:`\frac{1}{2}q_{\left({1-\alpha }, \nu, t\right)}`, the studentized range statistic, see :meth:`stat.inv_cdf_studentized_range <naginterfaces.library.stat.inv_cdf_studentized_range>`. It is the appropriate statistic to compare the largest mean with the smallest mean. This is known as Tukey--Kramer method.
(#) :math:`t_{\left({\alpha /k}, \nu \right)}`, this is the Bonferroni method.
(#) :math:`t_{\left(\alpha_0, \nu \right)}`, where :math:`\alpha_0 = 1-\left(1-\alpha \right)^{{1/k}}`, this is known as the Dunn--Sidak method.
(#) :math:`t_{\left(\alpha, \nu \right)}`, this is known as Fisher's LSD (least significant difference) method. It should only be used if the overall :math:`F`-test is significant, the number of treatment comparisons is small and were planned before the analysis.
(#) :math:`\sqrt{\left(k-1\right)F_{{1-\alpha,k-1,\nu }}}` where :math:`F_{{1-\alpha,k-1,\nu }}` is the deviate corresponding to a lower tail probability of :math:`1-\alpha` from an :math:`F`-distribution with :math:`k-1` and :math:`\nu` degrees of freedom. This is Scheffe's method.
In cases (b), (c) and (d), :math:`t_{\left(\alpha, \nu \right)}` denotes the :math:`\alpha` two tail significance level for the Student's :math:`t`-distribution with :math:`\nu` degrees of freedom, see :meth:`stat.inv_cdf_students_t <naginterfaces.library.stat.inv_cdf_students_t>`.
The Scheffe method is the most conservative, followed closely by the Dunn--Sidak and Tukey--Kramer methods.
To compute a test for the difference between two means the statistic,
.. math::
\frac{{\hat{\tau }_i-\hat{\tau }_j}}{{se\left(\hat{\tau }_i-\hat{\tau }_j\right)}}
is compared with the appropriate value of :math:`T_{\left(\alpha, \nu, t\right)}^*`.
.. _g04db-py2-py-references:
**References**
Kotz, S and Johnson, N L (ed.), 1985, `Multiple range and associated test procedures`, Encyclopedia of Statistical Sciences (5), Wiley, New York
Kotz, S and Johnson, N L (ed.), 1985, `Multiple comparison`, Encyclopedia of Statistical Sciences (5), Wiley, New York
Winer, B J, 1970, `Statistical Principles in Experimental Design`, McGraw--Hill
"""
raise NotImplementedError
[docs]def dummyvars(typ, levels, ifact, v):
r"""
``dummyvars`` computes orthogonal polynomial or dummy variables for a factor or classification variable.
.. _g04ea-py2-py-doc:
For full information please refer to the NAG Library document for g04ea
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04eaf.html
.. _g04ea-py2-py-parameters:
**Parameters**
**typ** : str, length 1
The type of dummy variable to be computed.
If :math:`\mathrm{typ} = \texttt{'P'}`, an orthogonal Polynomial representation is computed.
If :math:`\mathrm{typ} = \texttt{'H'}`, a Helmert matrix representation is computed.
If :math:`\mathrm{typ} = \texttt{'F'}`, the contrasts relative to the First level are computed.
If :math:`\mathrm{typ} = \texttt{'L'}`, the contrasts relative to the Last level are computed.
If :math:`\mathrm{typ} = \texttt{'C'}`, a Complete set of dummy variables is computed.
**levels** : int
:math:`k`, the number of levels of the factor.
**ifact** : int, array-like, shape :math:`\left(n\right)`
The :math:`n` values of the factor.
**v** : float, array-like, shape :math:`\left(:\right)`
Note: the required length for this argument is determined as follows: if :math:`\mathrm{typ}=\texttt{'P'}`: :math:`\mathrm{levels}`; otherwise: :math:`1`.
If :math:`\mathrm{typ} = \texttt{'P'}`, the :math:`k` distinct values of the underlying variable for which the orthogonal polynomial is to be computed.
If :math:`\mathrm{typ} \neq \texttt{'P'}`, :math:`\mathrm{v}` is not referenced.
**Returns**
**x** : float, ndarray, shape :math:`\left(n, :\right)`
The :math:`n\times k^*` matrix of dummy variables, where :math:`k^* = k-1` if :math:`\mathrm{typ} = \texttt{'P'}`, :math:`\texttt{'H'}`, :math:`\texttt{'F'}` or :math:`\texttt{'L'}` and :math:`k^* = k` if :math:`\mathrm{typ} = \texttt{'C'}`.
**rep** : float, ndarray, shape :math:`\left(\mathrm{levels}\right)`
The number of replications for each level of the factor, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,k`.
.. _g04ea-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`1`)
On entry, :math:`\mathrm{typ} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{typ} = \texttt{'P'}`, :math:`\texttt{'H'}`, :math:`\texttt{'F'}`, :math:`\texttt{'L'}` or :math:`\texttt{'C'}`.
(`errno` :math:`1`)
On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{levels} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`n\geq \mathrm{levels}`.
(`errno` :math:`1`)
On entry, :math:`\mathrm{levels} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{levels}\geq 2`.
(`errno` :math:`2`)
On entry, not all values of :math:`\mathrm{v}` are distinct.
(`errno` :math:`2`)
On entry, not all levels are present in :math:`\mathrm{ifact}`.
(`errno` :math:`2`)
On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{ifact}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{levels} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`1\leq \mathrm{levels}[i-1]\leq \mathrm{levels}`.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`3`)
The :math:`\langle\mathit{\boldsymbol{value}}\rangle`:math:`\langle\mathit{\boldsymbol{value}}\rangle` polynomial has all elements zero.
.. _g04ea-py2-py-notes:
**Notes**
`In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.`
In the analysis of an experimental design using a general linear model the factors or classification variables that specify the design have to be coded as dummy variables. ``dummyvars`` computes dummy variables that can then be used in the fitting of the general linear model using :meth:`correg.linregm_fit <naginterfaces.library.correg.linregm_fit>`.
If the factor of length :math:`n` has :math:`k` levels then the simplest representation is to define :math:`k` dummy variables, :math:`X_j` such that :math:`X_j = 1` if the factor is at level :math:`j` and :math:`0` otherwise for :math:`j = 1,2,\ldots,k`.
However, there is usually a mean included in the model and the sum of the dummy variables will be aliased with the mean.
To avoid the extra redundant argument :math:`k-1` dummy variables can be defined as the contrasts between one level of the factor, the reference level, and the remaining levels.
If the reference level is the first level then the dummy variables can be defined as :math:`X_{\textit{j}} = 1` if the factor is at level :math:`\textit{j}` and :math:`0` otherwise, for :math:`\textit{j} = 2,3,\ldots,k`.
Alternatively, the last level can be used as the reference level.
A second way of defining the :math:`k-1` dummy variables is to use a Helmert matrix in which levels :math:`2,3,\ldots,k` are compared with the average effect of the previous levels.
For example if :math:`k = 4` then the contrasts would be:
.. math::
\begin{array}{rrrrr}1&&-1&-1&-1\\2&&1&-1&-1\\3&&0&2&-1\\4&&0&0&3\end{array}
Thus variable :math:`\textit{j}`, for :math:`\textit{j} = 1,2,\ldots,k-1` is given by
:math:`X_j = -1` if factor is at level less than :math:`j+1`
:math:`X_j = \sum_{{i = 1}}^jr_i/r_{{j+1}}` if factor is at level :math:`j+1`
:math:`X_j = 0` if factor is at level greater than :math:`j+1`
where :math:`r_j` is the number of replicates of level :math:`j`.
If the factor can be considered as a set of values from an underlying continuous variable then the factor can be represented by a set of :math:`k-1` orthogonal polynomials representing the linear, quadratic etc. effects of the underlying variable.
The orthogonal polynomial is computed using Forsythe's algorithm (Forsythe (1957), see also Cooper (1968)).
The values of the underlying continuous variable represented by the factor levels have to be supplied to the function.
The orthogonal polynomials are standardized so that the sum of squares for each dummy variable is one.
For the other methods integer (:math:`{\pm 1}`) representations are retained except that in the Helmert representation the code of level :math:`j+1` in dummy variable :math:`j` will be a fraction.
.. _g04ea-py2-py-references:
**References**
Cooper, B E, 1968, `Algorithm AS 10. The use of orthogonal polynomials`, Appl. Statist. (17), 283--287
Forsythe, G E, 1957, `Generation and use of orthogonal polynomials for data fitting with a digital computer`, J. Soc. Indust. Appl. Math. (5), 74--88
"""
raise NotImplementedError
[docs]def icc(mtype, score, rtype=1, smiss=None, alpha=0.05):
r"""
``icc`` calculates the intraclass correlation (ICC).
.. _g04ga-py2-py-doc:
For full information please refer to the NAG Library document for g04ga
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g04/g04gaf.html
.. _g04ga-py2-py-parameters:
**Parameters**
**mtype** : int
Indicates which model is to be used.
:math:`\mathrm{mtype} = 1`
The reliability study is a one-factor design, :math:`\text{ICC}\left(1,1\right)`.
:math:`\mathrm{mtype} = 2`
The reliability study is a random factorial design, :math:`\text{ICC}\left(2,1\right)`.
:math:`\mathrm{mtype} = 3`
The reliability study is a mixed factorial design, :math:`\text{ICC}\left(3,1\right)`.
**score** : float, array-like, shape :math:`\left(\textit{nrep}, \textit{nsubj}, \textit{nrater}\right)`
The matrix of scores, with :math:`\mathrm{score}[k-1,i-1,j-1]` being the score given to the :math:`i`\ th subject by the :math:`j`\ th rater in the :math:`k`\ th replicate.
If rater :math:`j` did not rate subject :math:`i` at replication :math:`k`, the corresponding element of :math:`\mathrm{score}`, :math:`\mathrm{score}[k-1,i-1,j-1]`, should be set to :math:`\mathrm{smiss}`.
**rtype** : int, optional
Indicates which type of reliability is required.
:math:`\mathrm{rtype} = 1`
Interrater reliability is required.
:math:`\mathrm{rtype} = 2`
Intrarater reliability is required.
**smiss** : None or float, optional
The value used to indicate a missing score.
Care should be taken in the selection of the value used to indicate a missing score. ``icc`` will treat any score in the inclusive range :math:`\left(1\pm 0.1^{\left(\texttt{machine.decimal_digits}-2\right)}\right)\times \mathrm{smiss}` as missing.
Alternatively, a NaN (Not A Number) can be used to indicate missing values, in which case the value of :math:`\mathrm{smiss}` and any missing values of :math:`\mathrm{score}` can be set through a call to :meth:`ieee.create_nan <naginterfaces.library.ieee.create_nan>`.
**alpha** : float, optional
:math:`\alpha`, the significance level used in the construction of the confidence intervals for :math:`\mathrm{icc}`.
**Returns**
**icc** : float
An estimate of the intraclass correlation to measure either the interrater reliability, :math:`\rho`, or intrarater reliability, :math:`\gamma`, as specified by :math:`\mathrm{mtype}` and :math:`\mathrm{rtype}`.
**lci** : float
An approximate lower limit for the :math:`100\left(1-\alpha \right)\%` confidence interval for the ICC.
**uci** : float
An approximate upper limit for the :math:`100\left(1-\alpha \% \right)` confidence interval for the ICC.
**fstat** : float
:math:`f`, the :math:`F`-statistic associated with :math:`\mathrm{icc}`.
**df1** : float
:math:`\nu_1` and :math:`\nu_2`, the degrees of freedom associated with :math:`f`.
**df2** : float
:math:`\nu_1` and :math:`\nu_2`, the degrees of freedom associated with :math:`f`.
**pvalue** : float
:math:`P\left(F\geq f:\nu_1,\nu_1\right)`, the upper tail probability from an :math:`F` distribution.
.. _g04ga-py2-py-errors:
**Raises**
**NagValueError**
(`errno` :math:`11`)
On entry, :math:`\mathrm{mtype} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{mtype} = 1`, :math:`2` or :math:`3`.
(`errno` :math:`21`)
On entry, :math:`\mathrm{rtype} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\mathrm{rtype} = 1` or :math:`2`.
(`errno` :math:`31`)
On entry, :math:`\textit{nrep} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nrep}\geq 1`.
(`errno` :math:`32`)
On entry, :math:`\textit{nrep} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: when :math:`\mathrm{mtype} = 2` or :math:`3` and :math:`\mathrm{rtype} = 2`, :math:`\textit{nrep}\geq 2`.
(`errno` :math:`33`)
On entry, after adjusting for missing data, :math:`\textit{nrep} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nrep}\geq 1`.
(`errno` :math:`34`)
On entry, after adjusting for missing data, :math:`\textit{nrep} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: when :math:`\mathrm{mtype} = 2` or :math:`3` and :math:`\mathrm{rtype} = 2`, :math:`\textit{nrep}\geq 2`.
(`errno` :math:`41`)
On entry, :math:`\textit{nsubj} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nsubj}\geq 2`.
(`errno` :math:`42`)
On entry, after adjusting for missing data, :math:`\textit{nsubj} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nsubj}\geq 2`.
(`errno` :math:`51`)
On entry, :math:`\textit{nrater} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nrater}\geq 2`.
(`errno` :math:`52`)
On entry, after adjusting for missing data, :math:`\textit{nrater} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{nrater}\geq 2`.
(`errno` :math:`61`)
Unable to calculate the ICC due to a division by zero.
This is often due to degenerate data, for example all scores being the same.
(`errno` :math:`71`)
On entry, :math:`\textit{mscore} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`\textit{mscore} = 1` or :math:`2`.
(`errno` :math:`91`)
On entry, :math:`\mathrm{alpha} = \langle\mathit{\boldsymbol{value}}\rangle`.
Constraint: :math:`0.0 < \mathrm{alpha} < 1.0`.
(`errno` :math:`92`)
On entry, :math:`\mathrm{alpha} = \langle\mathit{\boldsymbol{value}}\rangle`.
:math:`\mathrm{alpha}` is too close to either zero or one.
This error is unlikely to occur.
**Warns**
**NagAlgorithmicWarning**
(`errno` :math:`62`)
On entry, a replicate, subject or rater contained all missing data.
All output quantities have been calculated using the reduced problem size.
(`errno` :math:`101`)
:math:`\mathrm{icc}` does not fall into the interval :math:`\left[\mathrm{lci},\mathrm{uci}\right]`.
All output quantities have been calculated.
(`errno` :math:`102`)
The estimate of at least one variance component was negative.
Negative estimates were set to zero and all output quantities calculated as documented.
.. _g04ga-py2-py-notes:
**Notes**
Many scientific investigations involve assigning a value (score) to a number of objects of interest (subjects).
In most instances the method used to score the subject will be affected by measurement error which can affect the analysis and interpretation of the data.
When the score is based on the subjective opinion of one or more individuals (raters) the measurement error can be high and, therefore, it is important to be able to assess its magnitude.
One way of doing this is to run a reliability study and calculate the intraclass correlation (ICC).
In a typical reliability study each of a random sample of :math:`n_s` subjects are scored, independently, by :math:`n_r` raters.
Each rater scores the same subject :math:`m` times (i.e., there are :math:`m` replicate scores).
The scores, :math:`y_{{\textit{i}\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,n_r`, for :math:`\textit{i} = 1,2,\ldots,n_s` can be arranged into :math:`m` data tables, with the :math:`n_s` rows of the table, labelled :math:`1,2,\ldots,n_s`, corresponding to the subjects and the :math:`n_r` columns of the table, labelled :math:`1,2,\ldots,n_r`, to the raters.
For example the following data, taken from Shrout and Fleiss (1979), shows a typical situation where four raters (:math:`n_r = 4`) have scored six subjects (:math:`n_s = 6`) once, i.e., there has been no replication (:math:`m = 1`).
[table omitted]
The term intraclass correlation is a general one and can mean either a measure of interrater reliability, i.e., a measure of how similar the raters are, or intrarater reliability, i.e., a measure of how consistent each rater is.
There are a numerous different versions of the ICC, six of which can be calculated using ``icc``.
The different versions of the ICC can lead to different conclusions when applied to the same data, it is, therefore, essential to choose the most appropriate based on the design of the reliability study and whether inter - or intrarater reliability is of interest.
The six measures of the ICC are split into three different types of studies, denoted: :math:`\text{ICC}\left(1,1\right)`, :math:`\text{ICC}\left(2,1\right)` and :math:`\text{ICC}\left(3,1\right)`.
This notation ties up with that used by Shrout and Fleiss (1979).
Each class of study results in two forms of the ICC, depending on whether inter - or intrarater reliability is of interest.
:math:`\text{ICC}\left(1,1\right)` **: One-Factor Design**
The one-factor designs differ, depending on whether inter - or intrarater reliability is of interest:
`Interrater reliability`
In a one-factor design to measure interrater reliability, each subject is scored by a different set of raters randomly selected from a larger population of raters.
Therefore, even though they use the same set of labels each row of the data table is associated with a different set of raters.
A model of the following form is assumed:
.. math::
y_{{ijk}} = \mu +s_i+\epsilon_{{ijk}}
where :math:`s_i` is the subject effect and :math:`\epsilon_{{ijk}}` is the error term, with :math:`s_i\sim N\left(0,\sigma_s^2\right)` and :math:`\epsilon_{{ijk}}\sim N\left(0,\sigma_{\epsilon }^2\right)`.
The measure of the interrater reliability, :math:`\rho`, is then given by:
.. math::
\rho = \frac{\hat{\sigma }_s^2}{{\hat{\sigma }_s^2+\hat{\sigma }_{\epsilon }^2}}
where :math:`\hat{\sigma }_s` and :math:`\hat{\sigma }_{\epsilon }` are the estimated values of :math:`\sigma_s` and :math:`\sigma_{\epsilon }` respectively.
`Intrarater reliability`
In a one-factor design to measure intrarater reliability, each rater scores a different set of subjects.
Therefore, even though they use the same set of labels, each column of the data table is associated with a different set of subjects.
A model of the following form is assumed:
.. math::
y_{{ijk}} = \mu +r_j+\epsilon_{{ijk}}
where :math:`r_i` is the rater effect and :math:`\epsilon_{{ijk}}` is the error term, with :math:`r_j\sim N\left(0,\sigma_r^2\right)` and :math:`\epsilon_{{ijk}}\sim N\left(0,\sigma_{\epsilon }^2\right)`.
The measure of the intrarater reliability, :math:`\gamma`, is then given by:
.. math::
\gamma = \frac{\hat{\sigma }_r^2}{{\hat{\sigma }_r^2+\hat{\sigma }_{\epsilon }^2}}
where :math:`\hat{\sigma }_r` and :math:`\hat{\sigma }_{\epsilon }` are the estimated values of :math:`\sigma_r` and :math:`\sigma_{\epsilon }` respectively.
:math:`\text{ICC}\left(2,1\right)` **: Random Factorial Design**
In a random factorial design, each subject is scored by the same set of raters.
The set of raters have been randomly selected from a larger population of raters.
A model of the following form is assumed:
.. math::
y_{{ijk}} = \mu +s_i+r_j+\left(sr\right)_{{ij}}+\epsilon_{{ijk}}
where :math:`s_i` is the subject effect, :math:`r_i` is the rater effect, :math:`\left(sr\right)_{{ij}}` is the subject-rater interaction effect and :math:`\epsilon_{{ijk}}` is the error term, with :math:`s_i\sim N\left(0,\sigma_s^2\right)`, :math:`r_j\sim N\left(0,\sigma_r^2\right)`, :math:`\left(sr\right)_{{ij}}\sim N\left(0,\sigma_{{sr}}^2\right)` and :math:`\epsilon_{{ijk}}\sim N\left(0,\sigma_{\epsilon }^2\right)`.
`Interrater reliability`
The measure of the interrater reliability, :math:`\rho`, is given by:
.. math::
\rho = \frac{\hat{\sigma }_s^2}{{\hat{\sigma }_s^2+\hat{\sigma }_r^2+\hat{\sigma }_{{sr}}^2+\hat{\sigma }_{\epsilon }^2}}
where :math:`\hat{\sigma }_s`, :math:`\hat{\sigma }_r`, :math:`\hat{\sigma }_{{sr}}` and :math:`\hat{\sigma }_{\epsilon }` are the estimated values of :math:`\sigma_s`, :math:`\sigma_r`, :math:`\sigma_{{sr}}` and :math:`\sigma_{\epsilon }` respectively.
`Intrarater reliability`
The measure of the intrarater reliability, :math:`\gamma`, is given by:
.. math::
\gamma = \frac{\hat{\sigma }_r^2}{{\hat{\sigma }_s^2+\hat{\sigma }_r^2+\hat{\sigma }_{{sr}}^2+\hat{\sigma }_{\epsilon }^2}}
where :math:`\hat{\sigma }_s`, :math:`\hat{\sigma }_r`, :math:`\hat{\sigma }_{{sr}}` and :math:`\hat{\sigma }_{\epsilon }` are the estimated values of :math:`\sigma_s`, :math:`\sigma_r`, :math:`\sigma_{{sr}}` and :math:`\sigma_{\epsilon }` respectively.
:math:`\text{ICC}\left(3,1\right)` **: Mixed Factorial Design**
In a mixed factorial design, each subject is scored by the same set of raters and these are the only raters of interest.
A model of the following form is assumed:
.. math::
y_{{ijk}} = \mu +s_i+r_j+\left(sr\right)_{{ij}}+\epsilon_{{ijk}}
where :math:`s_i` is the subject effect, :math:`r_i` is the fixed rater effect, :math:`\left(sr\right)_{{ij}}` is the subject-rater interaction effect and :math:`\epsilon_{{ijk}}` is the error term, with :math:`s_i\sim N\left(0,\sigma_s^2\right)`, :math:`Σ_{{j = 1}}^{n_r}r_j = 0`, :math:`\left(sr\right)_{{ij}}\sim N\left(0,\sigma_{{sr}}^2\right)`, :math:`Σ_{{j = 1}}^{n_r}\left(sr\right)_{{ij}} = 0` and :math:`\epsilon_{{ijk}}\sim N\left(0,\sigma_{\epsilon }^2\right)`.
`Interrater reliability`
The measure of the interrater reliability, :math:`\rho`, is then given by:
.. math::
\rho = \frac{{\hat{\sigma }_s^2-\hat{\sigma }_{{sr}}^2/\left(r-1\right)}}{{\hat{\sigma }_s^2+\hat{\sigma }_{{sr}}^2+\hat{\sigma }_{\epsilon }^2}}
where :math:`\hat{\sigma }_s`, :math:`\hat{\sigma }_{{sr}}` and :math:`\hat{\sigma }_{\epsilon }` are the estimated values of :math:`\sigma_s`, :math:`\sigma_{{sr}}` and :math:`\sigma_{\epsilon }` respectively.
`Intrarater reliability`
The measure of the intrarater reliability, :math:`\gamma`, is then given by:
.. math::
\gamma = \frac{{\hat{\sigma }_s^2+\hat{\sigma }_{{sr}}^2}}{{\hat{\sigma }_s^2+\hat{\sigma }_{{sr}}^2+\hat{\sigma }_{\epsilon }^2}}
where :math:`\hat{\sigma }_s`, :math:`\hat{\sigma }_{{sr}}` and :math:`\hat{\sigma }_{\epsilon }` are the estimated values of :math:`\sigma_s`, :math:`\sigma_{{sr}}` and :math:`\sigma_{\epsilon }` respectively.
As well as an estimate of the ICC, ``icc`` returns an approximate :math:`\left(1-\alpha \right)\%` confidence interval for the ICC and an :math:`F`-statistic, :math:`f`, associated degrees of freedom (:math:`\nu_1` and :math:`\nu_2`) and p-value, :math:`p`, for testing that the ICC is zero.
Details on the formula used to calculate the confidence interval, :math:`f`, :math:`\nu_1`, :math:`\nu_2`, :math:`\hat{\sigma }_s^2`, :math:`\hat{\sigma }_r^2`, :math:`\hat{\sigma }_{{sr}}^2` and :math:`\hat{\sigma }_{\epsilon }^2` are given in Gwet (2014).
In the case where there are no missing data these should tie up with the formula presented in Shrout and Fleiss (1979).
In some circumstances, the formula presented in Gwet (2014) for calculating :math:`\hat{\sigma }_s^2`, :math:`\hat{\sigma }_r^2`, :math:`\hat{\sigma }_{{sr}}^2` and :math:`\hat{\sigma }_{\epsilon }^2` can result in a negative value being calculated.
In such instances, :math:`\mathrm{errno}` = 102, the offending estimate is set to zero and the calculations continue as normal.
It should be noted that Shrout and Fleiss (1979) also present methods for calculating the ICC based on average scores, denoted :math:`\text{ICC}\left(1,k\right)`, :math:`\text{ICC}\left(2,k\right)` and :math:`\text{ICC}\left(3,k\right)`.
These are not supplied here as multiple replications are allowed (:math:`m > 1`) hence there is no need to average the scores prior to calculating ICC when using ``icc``.
.. _g04ga-py2-py-references:
**References**
Gwet, K L, 2014, `Handbook of Inter-rater Reliability`, Fourth Edition, Advanced Analytics LLC
Shrout, P E and Fleiss, J L, 1979, `Intraclass Correlations: Uses in Assessing Rater Reliability`, Pyschological Bulletin, Vol 86 (2), 420--428
See Also
--------
:meth:`naginterfaces.library.examples.anova.icc_ex.main`
"""
raise NotImplementedError