Source code for naginterfaces.library.correg

# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 30.2 `correg` Chapter.

``correg`` - Correlation and Regression Analysis

This module is concerned with two techniques:

(i) correlation analysis and

(#) regression modelling,

both of which are concerned with determining the inter-relationships among two or more variables.

Other modules of the NAG Library which cover similar problems are submodule :mod:`~naginterfaces.library.fit` and submodule :mod:`~naginterfaces.library.opt`. Submodule :mod:`~naginterfaces.library.fit` functions may be used to fit linear models by criteria other than least squares and also for polynomial regression; submodule :mod:`~naginterfaces.library.opt` functions may be used to fit nonlinear models and linearly constrained linear models.

See Also
--------
``naginterfaces.library.examples.correg`` :
    This subpackage contains examples for the ``correg`` module.
    See also the :ref:`library_correg_ex` subsection.

Functionality Index
-------------------

**Correlation-like coefficients**

  all variables

    casewise treatment of missing values: :meth:`coeffs_zero_miss_case`

    no missing values: :meth:`coeffs_zero`

    pairwise treatment of missing values: :meth:`coeffs_zero_miss_pair`

  subset of variables

    casewise treatment of missing values: :meth:`coeffs_zero_subset_miss_case`

    no missing values: :meth:`coeffs_zero_subset`

    pairwise treatment of missing values: :meth:`coeffs_zero_subset_miss_pair`

**Generalized linear models**

  binomial errors: :meth:`glm_binomial`

  computes estimable function: :meth:`glm_estfunc`

  gamma errors: :meth:`glm_gamma`

  Normal errors: :meth:`glm_normal`

  Poisson errors: :meth:`glm_poisson`

  prediction: :meth:`glm_predict`

  transform model parameters: :meth:`glm_constrain`

**Hierarchical mixed effects regression**

  initiation: :meth:`mixeff_hier_init`

  using maximum likelihood: :meth:`mixeff_hier_ml`

  using restricted maximum likelihood: :meth:`mixeff_hier_reml`

**Least angle regression (includes LASSO)**

  Additional parameter calculation: :meth:`lars_param`

  Model fitting

    Cross-product matrix: :meth:`lars_xtx`

    Raw data: :meth:`lars`

**Linear mixed effects regression**

  fitting (via REML or ML): :meth:`lmm_fit`

  initiation: :meth:`lmm_init`

  initiation, combine: :meth:`lmm_init_combine`

  via maximum likelihood (ML): :meth:`mixeff_ml`

  via restricted maximum likelihood (REML): :meth:`mixeff_reml`

**Multiple linear regression**

  from correlation coefficients: :meth:`linregm_coeffs_const`

  from correlation-like coefficients: :meth:`linregm_coeffs_noconst`

**Multiple linear regression/General linear model**

  add/delete observation from model: :meth:`linregm_obs_edit`

  add independent variable to model: :meth:`linregm_var_add`

  computes estimable function: :meth:`linregm_estfunc`

  delete independent variable from model: :meth:`linregm_var_del`

  general linear regression model: :meth:`linregm_fit`

  regression for new dependent variable: :meth:`linregm_fit_newvar`

  regression parameters from updated model: :meth:`linregm_update`

  transform model parameters: :meth:`linregm_constrain`

**Nearest correlation matrix**

  fixed elements: :meth:`corrmat_fixed`

  fixed submatrix: :meth:`corrmat_shrinking`

  :math:`k`-factor structure: :meth:`corrmat_nearest_kfactor`

  method of Qi and Sun

    element-wise weights: :meth:`corrmat_h_weight`

    unweighted, unbounded: :meth:`corrmat_nearest`

    weighted norm: :meth:`corrmat_nearest_bounded`

  rank-constrained: :meth:`corrmat_nearest_rank`

  shrinkage method: :meth:`corrmat_target`

**Non-parametric rank correlation (Kendall and/or Spearman)**

  missing values

    casewise treatment of missing values

      overwriting input data: :meth:`coeffs_kspearman_miss_case_overwrite`

      preserving input data: :meth:`coeffs_kspearman_miss_case`

    pairwise treatment of missing values: :meth:`coeffs_kspearman_miss_pair`

  no missing values

    overwriting input data: :meth:`coeffs_kspearman_overwrite`

    preserving input data: :meth:`coeffs_kspearman`

**Partial least squares**

  calculates predictions given an estimated PLS model: :meth:`pls_pred`

  fits a PLS model for a given number of factors: :meth:`pls_fit`

  orthogonal scores using SVD: :meth:`pls_svd`

  orthogonal scores using Wold's method: :meth:`pls_wold`

**Product-moment correlation**

  correlation coefficients, all variables

    casewise treatment of missing values: :meth:`coeffs_pearson_miss_case`

    no missing values: :meth:`coeffs_pearson`

    pairwise treatment of missing values: :meth:`coeffs_pearson_miss_pair`

  correlation coefficients, subset of variables

    casewise treatment of missing values: :meth:`coeffs_pearson_subset_miss_case`

    no missing values: :meth:`coeffs_pearson_subset`

    pairwise treatment of missing values: :meth:`coeffs_pearson_subset_miss_pair`

  correlation matrix

    compute correlation and covariance matrices: :meth:`corrmat`

    compute from sum of squares matrix: :meth:`ssqmat_to_corrmat`

    compute partial correlation and covariance matrices: :meth:`corrmat_partial`

  sum of squares matrix

    combine: :meth:`ssqmat_combine`

    compute: :meth:`ssqmat`

    update: :meth:`ssqmat_update`

**Quantile regression**

  linear

    comprehensive: :meth:`quantile_linreg`

    simple: :meth:`quantile_linreg_easy`

**Residuals**

  Durbin--Watson test: :meth:`linregm_stat_durbwat`

  standardized residuals and influence statistics: :meth:`linregm_stat_resinf`

**Ridge regression**

  ridge parameter(s) supplied: :meth:`ridge`

  ridge parameter optimized: :meth:`ridge_opt`

**Robust correlation**

  Huber's method: :meth:`robustm_corr_huber`

  user-supplied weight function only: :meth:`robustm_corr_user`

  user-supplied weight function plus derivatives: :meth:`robustm_corr_user_deriv`

**Robust regression**

  compute weights for use with :meth:`robustm_user`: :meth:`robustm_wts`

  standard :math:`M`-estimates: :meth:`robustm`

  user-supplied weight functions: :meth:`robustm_user`

  variance-covariance matrix following :meth:`robustm_user`: :meth:`robustm_user_varmat`

**Selecting regression model**

  all possible regressions: :meth:`linregm_rssq`

  forward selection: :meth:`linregm_fit_onestep`

  :math:`R^2` and :math:`C_p` statistics: :meth:`linregm_rssq_stat`

**Service functions**

  for multiple linear regression

    reorder elements from vectors and matrices: :meth:`linregm_service_reorder`

    select elements from vectors and matrices: :meth:`linregm_service_select`

  general option getting function: :meth:`optget`

  general option setting function: :meth:`optset`

**Simple linear regression**

  no intercept: :meth:`linregs_noconst`

  no intercept with missing values: :meth:`linregs_noconst_miss`

  with intercept: :meth:`linregs_const`

  with intercept and with missing values: :meth:`linregs_const_miss`

**Stepwise linear regression**

  Clarke's sweep algorithm: :meth:`linregm_fit_stepwise`

For full information please refer to the NAG Library document

https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02intro.html
"""

# NAG Copyright 2017-2024.

def corrmat_nearest(g, errtol=0.0, maxits=0, maxit=0):
    r"""
    ``corrmat_nearest`` computes the nearest correlation matrix, in the Frobenius norm, to a given square, input matrix.

    .. _g02aa-py2-py-doc:

    For full information please refer to the NAG Library document for g02aa

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02aaf.html

    .. _g02aa-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`G`, the initial matrix.

    **errtol** : float, optional
        The termination tolerance for the Newton iteration. If :math:`\mathrm{errtol}\leq 0.0`, :math:`n\times \sqrt{\text{machine precision}}` is used.

    **maxits** : int, optional
        :math:`\mathrm{maxits}` specifies the maximum number of iterations used for the iterative scheme used to solve the linear algebraic equations at each Newton step. If :math:`\mathrm{maxits}\leq 0`, :math:`100` is used.

    **maxit** : int, optional
        Specifies the maximum number of Newton iterations. If :math:`\mathrm{maxit}\leq 0`, :math:`200` is used.

    **Returns**

    **x** : float, ndarray, shape :math:`\left(n, n\right)`
        Contains the nearest correlation matrix.

    **itera** : int
        The number of Newton steps taken.

    **feval** : int
        The number of function evaluations of the dual problem.

    **nrmgrd** : float
        The norm of the gradient of the last Newton step.

    .. _g02aa-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`2`)
        Newton iteration fails to converge in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations.

    (`errno` :math:`4`)
        An intermediate eigenproblem could not be solved. This should not occur. Please contact `NAG <https://www.nag.com>`__ with details of your call.

    **Warns**

    **NagAlgorithmicWarning**

    (`errno` :math:`3`)
        Machine precision is limiting convergence. The array returned in :math:`\mathrm{x}` may still be of interest.

    .. _g02aa-py2-py-notes:

    **Notes**

    A correlation matrix may be characterised as a real square matrix that is symmetric, has a unit diagonal and is positive semidefinite. ``corrmat_nearest`` applies an inexact Newton method to a dual formulation of the problem, as described by Qi and Sun (2006). It applies the improvements suggested by Borsdorf and Higham (2010).

    .. _g02aa-py2-py-references:

    **References**

    Borsdorf, R and Higham, N J, 2010, `A preconditioned (Newton) algorithm for the nearest correlation matrix`, IMA Journal of Numerical Analysis (30(1)), 94--107

    Qi, H and Sun, D, 2006, `A quadratically convergent Newton method for computing the nearest correlation matrix`, SIAM J. Matrix Anal. Appl. (29(2)), 360--385

    See Also
    --------
    :meth:`naginterfaces.library.examples.correg.corrmat_nearest_ex.main`
    """
    raise NotImplementedError
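As a rough illustration of the problem that ``corrmat_nearest`` solves (not of NAG's inexact Newton method), the following NumPy sketch uses Higham's alternating-projections iteration with a Dykstra correction. The function name ``nearest_corr_sketch``, the iteration limit and the tolerance are illustrative assumptions, not part of the NAG interface.

```python
import numpy as np

def nearest_corr_sketch(g, max_iter=200, tol=1e-8):
    # Alternating projections with Dykstra's correction (Higham, 2002).
    # Illustrates the nearest-correlation-matrix problem only; it is NOT
    # the inexact Newton method that ``corrmat_nearest`` implements.
    y = np.asarray(g, dtype=float).copy()
    ds = np.zeros_like(y)  # Dykstra correction term
    for _ in range(max_iter):
        y_old = y.copy()
        r = y - ds
        # Project onto the cone of positive semidefinite matrices.
        w, v = np.linalg.eigh((r + r.T) / 2.0)
        x = (v * np.maximum(w, 0.0)) @ v.T
        ds = x - r
        # Project onto symmetric matrices with unit diagonal.
        y = (x + x.T) / 2.0
        np.fill_diagonal(y, 1.0)
        if np.linalg.norm(y - y_old, "fro") <= tol * max(1.0, np.linalg.norm(y, "fro")):
            break
    return y

# Higham's classic 4x4 example of an invalid "correlation" matrix.
g = np.array([[2.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 2.0]])
x_near = nearest_corr_sketch(g)
```

The result is symmetric with unit diagonal and (numerically) no negative eigenvalues, i.e. a valid correlation matrix close to `g` in the Frobenius norm.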
def corrmat_nearest_bounded(g, opt, alpha=None, w=None, errtol=0.0, maxits=0, maxit=0):
    r"""
    ``corrmat_nearest_bounded`` computes the nearest correlation matrix, in the Frobenius norm or weighted Frobenius norm, and optionally with bounds on the eigenvalues, to a given square, input matrix.

    .. _g02ab-py2-py-doc:

    For full information please refer to the NAG Library document for g02ab

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02abf.html

    .. _g02ab-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`G`, the initial matrix.

    **opt** : str, length 1
        Indicates the problem to be solved.

        :math:`\mathrm{opt} = \texttt{'A'}`
            The lower bound problem is solved.

        :math:`\mathrm{opt} = \texttt{'W'}`
            The weighted norm problem is solved.

        :math:`\mathrm{opt} = \texttt{'B'}`
            Both problems are solved.

    **alpha** : None or float, optional
        The value of :math:`\alpha`. If :math:`\mathrm{opt} = \texttt{'W'}`, :math:`\mathrm{alpha}` need not be set.

    **w** : None or float, array-like, shape :math:`\left(:\right)`, optional
        Note: the required length for this argument is determined as follows: if :math:`\mathrm{opt}\neq\texttt{'A'}`: :math:`n`; otherwise: :math:`0`.

        The square roots of the diagonal elements of :math:`W`, that is the diagonal of :math:`W^{\frac{1}{2}}`. If :math:`\mathrm{opt} = \texttt{'A'}`, :math:`\mathrm{w}` is not referenced and may be **None**.

    **errtol** : float, optional
        The termination tolerance for the Newton iteration. If :math:`\mathrm{errtol}\leq 0.0`, :math:`n\times \sqrt{\text{machine precision}}` is used.

    **maxits** : int, optional
        Specifies the maximum number of iterations to be used by the iterative scheme to solve the linear algebraic equations at each Newton step. If :math:`\mathrm{maxits}\leq 0`, :math:`2\times n` is used.

    **maxit** : int, optional
        Specifies the maximum number of Newton iterations. If :math:`\mathrm{maxit}\leq 0`, :math:`200` is used.

    **Returns**

    **x** : float, ndarray, shape :math:`\left(n, n\right)`
        Contains the nearest correlation matrix.

    **itera** : int
        The number of Newton steps taken.

    **feval** : int
        The number of function evaluations of the dual problem.

    **nrmgrd** : float
        The norm of the gradient of the last Newton step.

    .. _g02ab-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`1`)
        On entry, the value of :math:`\mathrm{opt}` is invalid. Constraint: :math:`\mathrm{opt} = \texttt{'A'}`, :math:`\texttt{'W'}` or :math:`\texttt{'B'}`.

    (`errno` :math:`1`)
        On entry, :math:`\mathrm{alpha} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{alpha} < 1.0`.

    (`errno` :math:`1`)
        On entry, all elements of :math:`\mathrm{w}` were not positive. Constraint: :math:`\mathrm{w}[\textit{i}-1] > 0.0`, for all :math:`i`.

    (`errno` :math:`2`)
        Newton iteration fails to converge in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations. Increase :math:`\mathrm{maxit}` or check the call to the function.

    **Warns**

    **NagAlgorithmicWarning**

    (`errno` :math:`3`)
        The machine precision is limiting convergence. In this instance the returned value of :math:`\mathrm{x}` may be useful.

    (`errno` :math:`4`)
        An intermediate eigenproblem could not be solved. This should not occur. Please contact `NAG <https://www.nag.com>`__ with details of your call.

    .. _g02ab-py2-py-notes:

    **Notes**

    Finds the nearest correlation matrix :math:`X` by minimizing :math:`\frac{1}{2}\left\lVert G-X\right\rVert^2` where :math:`G` is an approximate correlation matrix. The norm can either be the Frobenius norm or the weighted Frobenius norm :math:`\frac{1}{2}\left\lVert W^{\frac{1}{2}}\left(G-X\right)W^{\frac{1}{2}}\right\rVert_F^2`.

    You can optionally specify a lower bound on the eigenvalues, :math:`\alpha`, of the computed correlation matrix, forcing the matrix to be positive definite, :math:`0 < \alpha < 1`.

    Note that if the weights vary by several orders of magnitude from one another the algorithm may fail to converge.

    .. _g02ab-py2-py-references:

    **References**

    Borsdorf, R and Higham, N J, 2010, `A preconditioned (Newton) algorithm for the nearest correlation matrix`, IMA Journal of Numerical Analysis (30(1)), 94--107

    Qi, H and Sun, D, 2006, `A quadratically convergent Newton method for computing the nearest correlation matrix`, SIAM J. Matrix Anal. Appl. (29(2)), 360--385
    """
    raise NotImplementedError
def corrmat_nearest_kfactor(g, k, errtol=0.0, maxit=0):
    r"""
    ``corrmat_nearest_kfactor`` computes the factor loading matrix associated with the nearest correlation matrix with :math:`k`-factor structure, in the Frobenius norm, to a given square, input matrix.

    .. _g02ae-py2-py-doc:

    For full information please refer to the NAG Library document for g02ae

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02aef.html

    .. _g02ae-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`G`, the initial matrix.

    **k** : int
        :math:`k`, the number of factors and columns of :math:`X`.

    **errtol** : float, optional
        The termination tolerance for the projected gradient norm. See references for further details. If :math:`\mathrm{errtol}\leq 0.0`, :math:`0.01` is used. This is often a suitable default value.

    **maxit** : int, optional
        Specifies the maximum number of iterations in the spectral projected gradient method. If :math:`\mathrm{maxit}\leq 0`, :math:`40000` is used.

    **Returns**

    **x** : float, ndarray, shape :math:`\left(n, \mathrm{k}\right)`
        Contains the matrix :math:`X`.

    **itera** : int
        The number of steps taken in the spectral projected gradient method.

    **feval** : int
        The number of evaluations of :math:`\left\lVert G-XX^\mathrm{T}+\mathrm{diag}\left(XX^\mathrm{T}-I\right)\right\rVert_F`.

    **nrmpgd** : float
        The norm of the projected gradient at the final iteration.

    .. _g02ae-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`1`)
        On entry, :math:`\mathrm{k} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0 < \mathrm{k}\leq n`.

    (`errno` :math:`2`)
        Spectral gradient method fails to converge in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations.

    .. _g02ae-py2-py-notes:

    **Notes**

    A correlation matrix :math:`C` with :math:`k`-factor structure may be characterised as a real square matrix that is symmetric, has a unit diagonal, is positive semidefinite and can be written as :math:`C = XX^\mathrm{T}+\mathrm{diag}\left(I-XX^\mathrm{T}\right)`, where :math:`I` is the identity matrix and :math:`X` has :math:`n` rows and :math:`k` columns. :math:`X` is often referred to as the factor loading matrix.

    ``corrmat_nearest_kfactor`` applies a spectral projected gradient method to the modified problem :math:`\mathrm{min}\left(\left\lVert G-XX^\mathrm{T}+\mathrm{diag}\left(XX^\mathrm{T}-I\right)\right\rVert \right)_F` such that :math:`\left\lVert x_{\textit{i}}^\mathrm{T}\right\rVert_2\leq 1`, for :math:`\textit{i} = 1,2,\ldots,n`, where :math:`x_i` is the :math:`i`\ th row of the factor loading matrix, :math:`X`, which gives us the solution.

    .. _g02ae-py2-py-references:

    **References**

    Birgin, E G, Martínez, J M and Raydan, M, 2001, `Algorithm 813: SPG--software for convex-constrained optimization`, ACM Trans. Math. Software (27), 340--349

    Borsdorf, R, Higham, N J and Raydan, M, 2010, `Computing a nearest correlation matrix with factor structure`, SIAM J. Matrix Anal. Appl. (31(5)), 2603--2622
    """
    raise NotImplementedError
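The :math:`k`-factor objective and its row-norm constraint are easy to evaluate directly. The sketch below (helper names ``kfactor_objective`` and ``rows_feasible`` are illustrative, not part of the NAG interface) computes the quantity whose evaluations ``corrmat_nearest_kfactor`` reports in ``feval``, and checks the constraint :math:`\left\lVert x_i\right\rVert_2\leq 1` on the rows of a candidate :math:`X`.

```python
import numpy as np

def kfactor_objective(g, x):
    # ||G - X X^T + diag(X X^T - I)||_F, i.e. the Frobenius distance from G
    # to the k-factor correlation matrix C = X X^T + diag(I - X X^T).
    g = np.asarray(g, dtype=float)
    x = np.asarray(x, dtype=float)
    xxt = x @ x.T
    c = xxt + np.diag(1.0 - np.diag(xxt))  # k-factor correlation matrix C
    return np.linalg.norm(g - c, "fro")

def rows_feasible(x):
    # The constraint ||x_i||_2 <= 1 on each row of the factor loading matrix.
    return bool(np.all(np.linalg.norm(np.asarray(x, dtype=float), axis=1) <= 1.0))
```

If ``g`` is itself a :math:`k`-factor correlation matrix built from ``x``, the objective is zero, which is a quick sanity check on any candidate loading matrix.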
def corrmat_h_weight(g, alpha, h, errtol=0.0, maxit=0):
    r"""
    ``corrmat_h_weight`` computes the nearest correlation matrix, using element-wise weighting in the Frobenius norm and optionally with bounds on the eigenvalues, to a given square, input matrix.

    .. _g02aj-py2-py-doc:

    For full information please refer to the NAG Library document for g02aj

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ajf.html

    .. _g02aj-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`G`, the initial matrix.

    **alpha** : float
        The value of :math:`\alpha`. If :math:`\mathrm{alpha} < 0.0`, :math:`0.0` is used.

    **h** : float, array-like, shape :math:`\left(n, n\right)`
        The matrix of weights :math:`H`.

    **errtol** : float, optional
        The termination tolerance for the iteration. If :math:`\mathrm{errtol}\leq 0.0`, :math:`n\times \sqrt{\text{machine precision}}` is used. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ajf.html#accuracy>`__ for further details.

    **maxit** : int, optional
        Specifies the maximum number of iterations to be used. If :math:`\mathrm{maxit}\leq 0`, :math:`200` is used.

    **Returns**

    **x** : float, ndarray, shape :math:`\left(n, n\right)`
        Contains the nearest correlation matrix.

    **itera** : int
        The number of iterations taken.

    **norm** : float
        The value of :math:`\left\lVert H\circ \left(G-X\right)\right\rVert_F` after the final iteration.

    .. _g02aj-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`5`)
        On entry, :math:`\mathrm{alpha} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{alpha} < 1.0`.

    (`errno` :math:`6`)
        On entry, one or more of the off-diagonal elements of :math:`H` were negative.

    (`errno` :math:`7`)
        Function failed to converge in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations. Increase :math:`\mathrm{maxit}` or check the call to the function.

    **Warns**

    **NagAlgorithmicWarning**

    (`errno` :math:`8`)
        Failure to solve intermediate eigenproblem. This should not occur. Please contact `NAG <https://www.nag.com>`__ with details of your call.

    .. _g02aj-py2-py-notes:

    **Notes**

    ``corrmat_h_weight`` finds the nearest correlation matrix, :math:`X`, to an approximate correlation matrix, :math:`G`, using element-wise weighting; this minimizes :math:`\left\lVert H\circ \left(G-X\right)\right\rVert_F`, where :math:`C = A\circ B` denotes the matrix :math:`C` with elements :math:`C_{{ij}} = A_{{ij}}\times B_{{ij}}`.

    You can optionally specify a lower bound on the eigenvalues, :math:`\alpha`, of the computed correlation matrix, forcing the matrix to be strictly positive definite, if :math:`0 < \alpha < 1`.

    Zero elements in :math:`H` should be used when you wish to put no emphasis on the corresponding element of :math:`G`. The algorithm scales :math:`H` so that the maximum element is :math:`1`. It is this scaled matrix that is used in computing the norm above and for the stopping criteria described in `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ajf.html#accuracy>`__.

    Note that if the elements in :math:`H` vary by several orders of magnitude from one another the algorithm may fail to converge.

    .. _g02aj-py2-py-references:

    **References**

    Borsdorf, R and Higham, N J, 2010, `A preconditioned (Newton) algorithm for the nearest correlation matrix`, IMA Journal of Numerical Analysis (30(1)), 94--107

    Jiang, K, Sun, D and Toh, K-C, 2012, `An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP`, SIAM J. Optim. (22(3)), 1042--1064

    Qi, H and Sun, D, 2006, `A quadratically convergent Newton method for computing the nearest correlation matrix`, SIAM J. Matrix Anal. Appl. (29(2)), 360--385
    """
    raise NotImplementedError
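The element-wise weighted norm that ``corrmat_h_weight`` minimizes, including the scaling of :math:`H` so that its maximum element is :math:`1`, can be sketched as follows (the helper name ``weighted_frob`` is an illustrative assumption, not part of the NAG interface):

```python
import numpy as np

def weighted_frob(g, x, h):
    # ||H o (G - X)||_F with H first scaled so its largest element is 1,
    # mirroring the scaling described in the Notes above (sketch only).
    h = np.asarray(h, dtype=float)
    h = h / h.max()  # scale so the maximum element of H is 1
    diff = np.asarray(g, dtype=float) - np.asarray(x, dtype=float)
    return np.linalg.norm(h * diff, "fro")
```

A zero weight in ``h`` removes the corresponding element of ``g`` from the norm entirely, which is how "no emphasis" elements behave.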
def corrmat_nearest_rank(g, rank, errtol=0.0, ranktol=0.0, maxits=0, maxit=0):
    r"""
    ``corrmat_nearest_rank`` computes the nearest correlation matrix of maximum prescribed rank, in the Frobenius norm, to a given square, input matrix.

    .. _g02ak-py2-py-doc:

    For full information please refer to the NAG Library document for g02ak

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02akf.html

    .. _g02ak-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`\tilde{G}`, the initial matrix.

    **rank** : int
        :math:`r`, the upper bound for the rank of :math:`X`.

    **errtol** : float, optional
        The termination tolerance for the convergence measure of the objective function value. If :math:`\mathrm{errtol}\leq 0.0`, then a value of :math:`1.0e-5` is used. See `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02akf.html#algdetails>`__ for further details.

    **ranktol** : float, optional
        The feasibility tolerance for the rank constraint. If :math:`\mathrm{ranktol}\leq 0.0`, :math:`\sqrt{\text{machine precision}}` is used. See `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02akf.html#algdetails>`__ for further details.

    **maxits** : int, optional
        Specifies the maximum number of iterations used for the majorization approach to solve penalized problems at each level of penalty parameter. If :math:`\mathrm{maxits}\leq 0`, then a value of :math:`100` is used.

    **maxit** : int, optional
        Specifies the maximum number of iterations for the penalty method, i.e., the maximum level of penalty parameter. If :math:`\mathrm{maxit}\leq 0`, then a value of :math:`200` is used.

    **Returns**

    **x** : float, ndarray, shape :math:`\left(n, n\right)`
        :math:`X`, the nearest correlation matrix of rank :math:`r`.

    **f** : float
        The difference between :math:`X` and :math:`G` given by :math:`\frac{1}{2}\left\lVert X-G\right\rVert_F^2`.

    **rankerr** : float
        The rank error of :math:`X`, defined as :math:`\sum_{{i = r+1}}^n\left(\lambda_i\right)`, given that :math:`\lambda_i` denote eigenvalues of :math:`X` sorted in non-increasing order.

    **nsub** : int
        The total number of majorized problems that have been solved, i.e., the total number of calls for :meth:`corrmat_nearest`.

    .. _g02ak-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`4`)
        On entry, :math:`\mathrm{rank} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0 < \mathrm{rank}\leq n`.

    (`errno` :math:`5`)
        Majorized penalty approach fails to converge in :math:`\mathrm{maxit}` level of penalty iterations.

    **Warns**

    **NagAlgorithmicWarning**

    (`errno` :math:`6`)
        Convergence is limited by machine precision. The objective function value or rank is decreasing very slowly. The array returned in :math:`\mathrm{x}` may still be of interest.

    .. _g02ak-py2-py-notes:

    **Notes**

    ``corrmat_nearest_rank`` finds the nearest correlation matrix :math:`X` of maximum prescribed rank :math:`r` to an approximate correlation matrix :math:`G` in the Frobenius norm.

    The solver is based on the Majorized Penalty Approach (MPA) proposed by Gao and Sun (2010). One of the key elements in this type of method is that the subproblems are similar to the nearest correlation matrix problem without rank constraint, and can be solved efficiently by :meth:`corrmat_nearest`. The total number of subproblems solved is controlled by the arguments :math:`\mathrm{maxit}` and :math:`\mathrm{maxits}`. The algorithm behaviour and solver accuracy can be modified by these and other input arguments. The default values for these arguments are chosen to work well in the general case but it is recommended that you tune them to your particular problem.

    For a detailed description of the algorithm see `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02akf.html#algdetails>`__.

    .. _g02ak-py2-py-references:

    **References**

    Bai, S, Qi, H--D and Xiu, N, 2015, `Constrained best Euclidean distance embedding on a sphere: A matrix optimization approach`, SIAM J. Optim. (25(1)), 439--467

    Gao, Y and Sun, D, 2010, `A majorized penalty approach for calibrating rank constrained correlation matrix problems`, Technical report, Department of Mathematics, National University of Singapore

    Qi, H--D and Yuan, X, 2014, `Computing the nearest Euclidean distance matrix with low embedding dimensions`, Mathematical Programming (147(1--2)), 351--389
    """
    raise NotImplementedError
def corrmat_shrinking(g, k, errtol=0.0, eigtol=0.0):
    r"""
    ``corrmat_shrinking`` computes a correlation matrix, subject to preserving a leading principal submatrix and applying the smallest relative perturbation to the remainder of the approximate input matrix.

    .. _g02an-py2-py-doc:

    For full information please refer to the NAG Library document for g02an

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02anf.html

    .. _g02an-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`G`, the initial matrix.

    **k** : int
        :math:`k`, the order of the leading principal submatrix :math:`A`.

    **errtol** : float, optional
        The termination tolerance for the iteration. If :math:`\mathrm{errtol}\leq 0.0`, :math:`\sqrt{\text{machine precision}}` is used. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02anf.html#accuracy>`__ for further details.

    **eigtol** : float, optional
        The tolerance used in determining the definiteness of :math:`A`. If :math:`\lambda_{\mathrm{min}}\left(A\right) > n\times \lambda_{\mathrm{max}}\left(A\right)\times \mathrm{eigtol}`, where :math:`\lambda_{\mathrm{min}}\left(A\right)` and :math:`\lambda_{\mathrm{max}}\left(A\right)` denote the minimum and maximum eigenvalues of :math:`A` respectively, :math:`A` is positive definite. If :math:`\mathrm{eigtol}\leq 0`, machine precision is used.

    **Returns**

    **x** : float, ndarray, shape :math:`\left(n, n\right)`
        Contains the matrix :math:`X`.

    **alpha** : float
        :math:`\alpha`.

    **itera** : int
        The number of iterations taken.

    **eigmin** : float
        The smallest eigenvalue of the leading principal submatrix :math:`A`.

    **norm** : float
        The value of :math:`\left\lVert G-X\right\rVert_F` after the final iteration.

    .. _g02an-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`3`)
        On entry, :math:`\mathrm{k} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq {\mathrm{k} > 0}`.

    (`errno` :math:`5`)
        The :math:`k\times k` principal leading submatrix of the initial matrix :math:`G` is not positive definite.

    (`errno` :math:`6`)
        Failure to solve intermediate eigenproblem. This should not occur. Please contact `NAG <https://www.nag.com>`__.

    .. _g02an-py2-py-notes:

    **Notes**

    ``corrmat_shrinking`` finds a correlation matrix, :math:`X`, starting from an approximate correlation matrix, :math:`G`, with positive definite leading principal submatrix of order :math:`k`. The returned correlation matrix, :math:`X`, has the following structure:

    .. math::

        X = \alpha \begin{pmatrix}A&0\\0&I\end{pmatrix}+\left(1-\alpha\right)G

    where :math:`A` is the :math:`k\times k` leading principal submatrix of the input matrix :math:`G` and positive definite, and :math:`\alpha \in \left[0, 1\right]`.

    ``corrmat_shrinking`` utilizes a shrinking method to find the minimum value of :math:`\alpha` such that :math:`X` is positive definite with unit diagonal.

    .. _g02an-py2-py-references:

    **References**

    Higham, N J, Strabić, N and Šego, V, 2014, `Restoring definiteness via shrinking, with an application to correlation matrices with a fixed block`, MIMS EPrint 2014.54, Manchester Institute for Mathematical Sciences, The University of Manchester, UK
    """
    raise NotImplementedError
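The shrinking idea behind ``corrmat_shrinking`` can be sketched with a simple bisection on :math:`\alpha` (NAG's implementation differs; the function name ``shrink_alpha_sketch`` and the tolerance are illustrative assumptions):

```python
import numpy as np

def shrink_alpha_sketch(g, k, tol=1e-10):
    # Bisection for (approximately) the smallest alpha in [0, 1] such that
    # X(alpha) = alpha * blkdiag(A, I) + (1 - alpha) * G is positive
    # semidefinite, where A is the k x k leading principal submatrix of G.
    g = np.asarray(g, dtype=float)
    n = g.shape[0]
    t = np.eye(n)
    t[:k, :k] = g[:k, :k]  # target: preserve the leading k x k block A
    def min_eig(alpha):
        return np.linalg.eigvalsh(alpha * t + (1.0 - alpha) * g).min()
    if min_eig(0.0) >= 0.0:
        return 0.0  # G is already positive semidefinite
    lo, hi = 0.0, 1.0  # invariant: X(lo) indefinite, X(hi) feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if min_eig(mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi

# An invalid correlation matrix whose leading 2x2 block is positive definite.
g_bad = np.array([[1.0, 0.95, 0.95],
                  [0.95, 1.0, -0.95],
                  [0.95, -0.95, 1.0]])
alpha = shrink_alpha_sketch(g_bad, k=2)
```

Because both the target and `g_bad` have unit diagonals, every `X(alpha)` does too, so only positive semidefiniteness needs to be enforced.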
def corrmat_target(g, theta, h, errtol=0.0, eigtol=0.0):
    r"""
    ``corrmat_target`` computes a correlation matrix, by using a positive definite **target** matrix derived from weighting the approximate input matrix, with an optional bound on the minimum eigenvalue.

    .. _g02ap-py2-py-doc:

    For full information please refer to the NAG Library document for g02ap

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02apf.html

    .. _g02ap-py2-py-parameters:

    **Parameters**

    **g** : float, array-like, shape :math:`\left(n, n\right)`
        :math:`G`, the initial matrix.

    **theta** : float
        The value of :math:`\theta`. If :math:`\mathrm{theta} < 0.0`, :math:`0.0` is used.

    **h** : float, array-like, shape :math:`\left(n, n\right)`
        The matrix of weights :math:`H`.

    **errtol** : float, optional
        The termination tolerance for the iteration. If :math:`\mathrm{errtol}\leq 0.0`, :math:`\sqrt{\text{machine precision}}` is used. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02apf.html#accuracy>`__ for further details.

    **eigtol** : float, optional
        The tolerance used in determining the definiteness of the target matrix :math:`T = H\circ G`. If :math:`\lambda_{\mathrm{min}}\left(T\right) > n\times \lambda_{\mathrm{max}}\left(T\right)\times \mathrm{eigtol}`, where :math:`\lambda_{\mathrm{min}}\left(T\right)` and :math:`\lambda_{\mathrm{max}}\left(T\right)` denote the minimum and maximum eigenvalues of :math:`T` respectively, :math:`T` is positive definite. If :math:`\mathrm{eigtol}\leq 0`, machine precision is used.

    **Returns**

    **h** : float, ndarray, shape :math:`\left(n, n\right)`
        A symmetric matrix :math:`\frac{1}{2}\left(H+H^\mathrm{T}\right)` with its diagonal elements set to :math:`1.0`.

    **x** : float, ndarray, shape :math:`\left(n, n\right)`
        Contains the matrix :math:`X`.

    **alpha** : float
        The constant :math:`\alpha` used in the formation of :math:`X`.

    **itera** : int
        The number of iterations taken.

    **eigmin** : float
        The smallest eigenvalue of the target matrix :math:`T`.

    **norm** : float
        The value of :math:`\left\lVert G-X\right\rVert_F` after the final iteration.

    .. _g02ap-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`1`)
        On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`.

    (`errno` :math:`3`)
        On entry, :math:`\mathrm{theta} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{theta} < 1.0`.

    (`errno` :math:`6`)
        The target matrix is not positive definite.

    (`errno` :math:`7`)
        Failure to solve intermediate eigenproblem. This should not occur. Please contact `NAG <https://www.nag.com>`__.

    .. _g02ap-py2-py-notes:

    **Notes**

    Starting from an approximate correlation matrix, :math:`G`, ``corrmat_target`` finds a correlation matrix, :math:`X`, which has the form

    .. math::

        X = \alpha T+\left(1-\alpha \right)G\text{,}

    where :math:`\alpha \in \left[0, 1\right]` and :math:`T = H\circ G` is a target matrix. :math:`C = A\circ B` denotes the matrix :math:`C` with elements :math:`C_{{ij}} = A_{{ij}}\times B_{{ij}}`. :math:`H` is a matrix of weights that defines the target matrix.

    The target matrix must be positive definite and thus have off-diagonal elements less than :math:`1` in magnitude. A value of :math:`1` in :math:`H` essentially fixes an element in :math:`G` so it is unchanged in :math:`X`.

    ``corrmat_target`` utilizes a shrinking method to find the minimum value of :math:`\alpha` such that :math:`X` is positive definite with unit diagonal and with a smallest eigenvalue of at least :math:`\theta \in \left[0, 1\right)` times the smallest eigenvalue of the target matrix.

    .. _g02ap-py2-py-references:

    **References**

    Higham, N J, Strabić, N and Šego, V, 2014, `Restoring definiteness via shrinking, with an application to correlation matrices with a fixed block`, MIMS EPrint 2014.54, Manchester Institute for Mathematical Sciences, The University of Manchester, UK
    """
    raise NotImplementedError
[docs]def corrmat_fixed(g, alpha, h, m, errtol=0.0, maxit=0): r""" ``corrmat_fixed`` computes the nearest correlation matrix, in the Frobenius norm, while fixing elements and optionally with bounds on the eigenvalues, to a given square input matrix. .. _g02as-py2-py-doc: For full information please refer to the NAG Library document for g02as https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02asf.html .. _g02as-py2-py-parameters: **Parameters** **g** : float, array-like, shape :math:`\left(n, n\right)` :math:`\tilde{G}`, the initial matrix. **alpha** : float The value of :math:`\alpha`. If :math:`\mathrm{alpha} < 0.0`, a value of :math:`0.0` is used. **h** : int, array-like, shape :math:`\left(n, n\right)` The symmetric matrix :math:`H`. If an element of :math:`H` is :math:`1` then the corresponding element in :math:`G` is fixed in the output :math:`X`. Only the strictly lower triangular part of :math:`H` need be set. **m** : int The number of previous iterates to use in the Anderson acceleration. If :math:`\mathrm{m} = 0`, Anderson acceleration is not used. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02asf.html#accuracy>`__ for further details. If :math:`\mathrm{m} < 0`, a value of :math:`4` is used. **errtol** : float, optional The termination tolerance for the iteration. If :math:`\mathrm{errtol}\leq 0.0`, :math:`\sqrt{\text{machine precision}}` is used. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02asf.html#accuracy>`__ for further details. **maxit** : int, optional Specifies the maximum number of iterations. If :math:`\mathrm{maxit}\leq 0`, a value of :math:`200` is used. **Returns** **x** : float, ndarray, shape :math:`\left(n, n\right)` Contains the matrix :math:`X`. **its** : int The number of iterations taken. **fnorm** : float The value of :math:`\left\lVert G-X\right\rVert_F` after the final iteration. .. 
_g02as-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`. (`errno` :math:`4`) On entry, :math:`\mathrm{alpha} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{alpha} < 1.0`. (`errno` :math:`5`) On entry, :math:`\mathrm{m} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{m} \leq {n\times n}`. (`errno` :math:`7`) Function failed to converge in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations. A solution may not exist; however, try increasing :math:`\mathrm{maxit}`. (`errno` :math:`8`) Failure during Anderson acceleration. Consider setting :math:`\mathrm{m} = 0` and recomputing. (`errno` :math:`9`) The fixed element :math:`G_{{ij}}` lies outside the interval :math:`\left[-1, 1\right]`, for :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02as-py2-py-notes: **Notes** ``corrmat_fixed`` finds the nearest correlation matrix, :math:`X`, to a matrix, :math:`G`, in the Frobenius norm. It uses an alternating projections algorithm with Anderson acceleration. Elements in the input matrix can be fixed by supplying the value :math:`1` in the corresponding element of the matrix :math:`H`. However, note that the algorithm may fail to converge if the fixed elements do not form part of a valid correlation matrix. You can optionally specify a lower bound, :math:`\alpha`, on the eigenvalues of the computed correlation matrix, forcing the matrix to be positive definite with :math:`0\leq \alpha < 1`. .. _g02as-py2-py-references: **References** Anderson, D G, 1965, `Iterative Procedures for Nonlinear Integral Equations`, J. Assoc. Comput. Mach. 
(12), 547--560 Higham, N J and Strabić, N, 2016, `Anderson acceleration of the alternating projections method for computing the nearest correlation matrix`, Numer. Algor. (72), 1021--1042 """ raise NotImplementedError
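The alternating-projections idea behind the routine can be illustrated in stripped-down form: alternately project onto the positive semidefinite cone (via an eigendecomposition) and onto the set of matrices with unit diagonal and the flagged elements held at their input values. The sketch below is hypothetical and deliberately simplified: it omits the Anderson acceleration and the eigenvalue lower bound :math:`\alpha`, assumes ``h`` is given in full symmetric form, and converges to *a* correlation matrix satisfying the constraints rather than provably the nearest one:

```python
import numpy as np

def nearest_corr_fixed_sketch(g, h, maxit=500, tol=1e-8):
    """Bare alternating projections between the PSD cone and the set of
    symmetric matrices with unit diagonal and fixed flagged elements."""
    g = np.asarray(g, dtype=float)
    g = 0.5 * (g + g.T)                              # symmetrise the input
    eye = np.eye(len(g), dtype=bool)
    fixed = (np.asarray(h) != 0) | eye               # fix flagged elements plus the diagonal
    target = np.where(eye, 1.0, g)                   # unit diagonal, fixed values taken from g
    x = target.copy()
    for _ in range(maxit):
        w, v = np.linalg.eigh(x)                     # spectral projection onto the PSD cone:
        y = (v * np.maximum(w, 0.0)) @ v.T           #   zero out the negative eigenvalues
        x_new = np.where(fixed, target, y)           # re-impose the fixed elements
        if np.linalg.norm(x_new - x, "fro") < tol:
            return x_new
        x = x_new
    return x
```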
[docs]def coeffs_pearson(x): r""" ``coeffs_pearson`` computes means and standard deviations of variables, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for a set of data. .. _g02ba-py2-py-doc: For full information please refer to the NAG Library document for g02ba https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02baf.html .. _g02ba-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **std** : float, ndarray, shape :math:`\left(m\right)` The standard deviation, :math:`s_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **ssp** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{ssp}[\textit{j}-1,\textit{k}-1]` is the cross-product of deviations :math:`S_{{\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **r** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{r}[\textit{j}-1,\textit{k}-1]` is the product-moment correlation coefficient :math:`R_{{\textit{j}\textit{k}}}` between the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. .. _g02ba-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`m \geq 2`. .. _g02ba-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consist of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{1}{n}\sum_{{i = 1}}^nx_{{ij}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(x_{{ij}}-\bar{x}_j\right)^2}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Sums of squares and cross-products of deviations from means: .. math:: S_{{jk}} = \sum_{{i = 1}}^n\left(x_{{ij}}-\bar{x}_j\right)\left(x_{{ik}}-\bar{x}_k\right)\text{, }\quad j,k = 1,2,\ldots,m\text{.} (#) Pearson product-moment correlation coefficients: .. math:: R_{{jk}} = \frac{S_{{jk}}}{{\sqrt{S_{{jj}}S_{{kk}}}}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} If :math:`S_{{jj}}` or :math:`S_{{kk}}` is zero, :math:`R_{{jk}}` is set to zero. """ raise NotImplementedError
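The four quantities in the Notes translate directly into NumPy; the helper name ``pearson_stats`` is hypothetical and illustrative only, not part of the NAG interface:

```python
import numpy as np

def pearson_stats(x):
    """Means, standard deviations, cross-products of deviations from
    means and Pearson correlations, per the formulae in the Notes."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean(axis=0)                              # (a) means
    d = x - xbar
    ssp = d.T @ d                                      # (c) S_jk
    std = np.sqrt(np.diag(ssp) / (len(x) - 1))         # (b) s_j
    denom = np.sqrt(np.outer(np.diag(ssp), np.diag(ssp)))
    # (d) R_jk, set to zero where S_jj or S_kk is zero
    r = np.divide(ssp, denom, out=np.zeros_like(ssp), where=denom > 0)
    return xbar, std, ssp, r
```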
[docs]def coeffs_pearson_miss_case(x, miss, xmiss): r""" ``coeffs_pearson_miss_case`` computes means and standard deviations of variables, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for a set of data omitting completely any cases with a missing observation for any variable. .. _g02bb-py2-py-doc: For full information please refer to the NAG Library document for g02bb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bbf.html .. _g02bb-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bbf.html#accuracy>`__). **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. 
**std** : float, ndarray, shape :math:`\left(m\right)` The standard deviation, :math:`s_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **ssp** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{ssp}[\textit{j}-1,\textit{k}-1]` is the cross-product of deviations :math:`S_{{\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **r** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{r}[\textit{j}-1,\textit{k}-1]` is the product-moment correlation coefficient :math:`R_{{\textit{j}\textit{k}}}` between the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **ncases** : int The number of cases actually used in the calculations (when cases involving missing values have been eliminated). .. _g02bb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) After observations with missing values were omitted, no cases remained. (`errno` :math:`5`) After observations with missing values were omitted, only one case remained. .. _g02bb-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consist of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),{j = 1,2,\ldots,m\left(m\geq 2\right)\text{,}} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. 
In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`xm_j`. Missing values need not be specified for all variables. Let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables for which missing values have been declared, i.e., if :math:`x_{{ij}} = xm_j` for any :math:`j` for which an :math:`xm_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bbf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_ix_{{ij}}}}{{\sum_{{i = 1}}^nw_i}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Sums of squares and cross-products of deviations from means: .. math:: S_{{jk}} = \sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{x}_j\right)\left(x_{{ik}}-\bar{x}_k\right)\text{, }\quad j,k = 1,2,\ldots,m\text{.} (#) Pearson product-moment correlation coefficients: .. math:: R_{{jk}} = \frac{S_{{jk}}}{{\sqrt{S_{{jj}}S_{{kk}}}}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} If :math:`S_{{jj}}` or :math:`S_{{kk}}` is zero, :math:`R_{{jk}}` is set to zero. """ raise NotImplementedError
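Casewise deletion, as defined by the weights :math:`w_i` above, amounts to dropping every row that matches a declared missing value before computing the statistics. A hypothetical NumPy sketch (not the NAG implementation):

```python
import numpy as np

def pearson_casewise(x, miss, xmiss):
    """Drop every case containing a declared missing value (w_i = 0),
    then compute Pearson statistics on the remaining cases."""
    x = np.asarray(x, dtype=float)
    bad = np.zeros(len(x), dtype=bool)
    for j, (mj, xm) in enumerate(zip(miss, xmiss)):
        if mj:
            bad |= x[:, j] == xm          # w_i = 0 if any declared value matches
    keep = x[~bad]
    xbar = keep.mean(axis=0)
    d = keep - xbar
    ssp = d.T @ d
    denom = np.sqrt(np.outer(np.diag(ssp), np.diag(ssp)))
    r = np.divide(ssp, denom, out=np.zeros_like(ssp), where=denom > 0)
    return xbar, ssp, r, len(keep)        # len(keep) plays the role of ncases
```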
[docs]def coeffs_pearson_miss_pair(n, x, miss, xmiss): r""" ``coeffs_pearson_miss_pair`` computes means and standard deviations of variables, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for a set of data omitting cases with missing values from only those calculations involving the variables for which the values are missing. .. _g02bc-py2-py-doc: For full information please refer to the NAG Library document for g02bc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bcf.html .. _g02bc-py2-py-parameters: **Parameters** **n** : int :math:`n`, the number of observations or cases. **x** : float, array-like, shape :math:`\left(\mathrm{n}, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bcf.html#accuracy>`__). **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. 
**std** : float, ndarray, shape :math:`\left(m\right)` The standard deviation, :math:`s_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **ssp** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{ssp}[\textit{j}-1,\textit{k}-1]` is the cross-product of deviations :math:`S_{{\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **r** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{r}[\textit{j}-1,\textit{k}-1]` is the product-moment correlation coefficient :math:`R_{{\textit{j}\textit{k}}}` between the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **ncases** : int The minimum number of cases used in the calculation of any of the sums of squares and cross-products and correlation coefficients (when cases involving missing values have been eliminated). **cnt** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{cnt}[\textit{j}-1,\textit{k}-1]` is the number of cases, :math:`c_{{\textit{j}\textit{k}}}`, actually used in the calculation of :math:`S_{{\textit{j}\textit{k}}}`, and :math:`R_{{\textit{j}\textit{k}}}`, the sum of cross-products and correlation coefficient for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. .. _g02bc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`4`) After observations with missing values were omitted, fewer than two cases remained for at least one pair of variables. 
(The pairs of variables involved can be determined by examination of the contents of the array :math:`\mathrm{cnt}`). All means, standard deviations, sums of squares and cross-products, and correlation coefficients based on two or more cases are returned by the function even if :math:`\mathrm{errno}` = 4. .. _g02bc-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consist of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. Let :math:`w_{{\textit{i}\textit{j}}} = 0` if the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable is a missing value, i.e., if a missing value, :math:`\textit{xm}_{\textit{j}}`, has been declared for the :math:`\textit{j}`\ th variable, and :math:`x_{{\textit{i}\textit{j}}} = \textit{xm}_{\textit{j}}` (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bcf.html#accuracy>`__); and :math:`w_{{\textit{i}\textit{j}}} = 1` otherwise, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_{{ij}}x_{{ij}}}}{{\sum_{{i = 1}}^nw_{{ij}}}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Standard deviations: .. 
math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_{{ij}}\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\left(\sum_{{i = 1}}^nw_{{ij}}\right)-1}}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Sums of squares and cross-products of deviations from means: .. math:: S_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}\left(x_{{ij}}-\bar{x}_{{j\left(k\right)}}\right)\left(x_{{ik}}-\bar{x}_{{k\left(j\right)}}\right)\text{, }\quad j,k = 1,2,\ldots,m\text{,} where .. math:: \bar{x}_{{j\left(k\right)}} = \frac{{\sum_{{i = 1}}^nw_{{ij}}w_{{ik}}x_{{ij}}}}{{\sum_{{i = 1}}^nw_{{ij}}w_{{ik}}}}\quad \text{ and }\quad \bar{x}_{{k\left(j\right)}} = \frac{{\sum_{{i = 1}}^nw_{{ik}}w_{{ij}}x_{{ik}}}}{{\sum_{{i = 1}}^nw_{{ik}}w_{{ij}}}}\text{,} (i.e., the means used in the calculation of the sums of squares and cross-products of deviations are based on the same set of observations as are the cross-products.) (#) Pearson product-moment correlation coefficients: .. math:: R_{{jk}} = \frac{S_{{jk}}}{{\sqrt{S_{{jj\left(k\right)}}S_{{kk\left(j\right)}}}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`S_{{jj\left(k\right)}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}{\left(x_{{ij}}-\bar{x}_{{j\left(k\right)}}\right)}^2` and :math:`S_{{kk\left(j\right)}} = \sum_{{i = 1}}^nw_{{ik}}w_{{ij}}{\left(x_{{ik}}-\bar{x}_{{k\left(j\right)}}\right)}^2` and :math:`\bar{x}_{{j\left(k\right)}}` and :math:`\bar{x}_{{k\left(j\right)}}` are as defined in (c) above (i.e., the sums of squares of deviations used in the denominator are based on the same set of observations as are used in the calculation of the numerator). If :math:`S_{{jj\left(k\right)}}` or :math:`S_{{kk\left(j\right)}}` is zero, :math:`R_{{jk}}` is set to zero. (#) The number of cases used in the calculation of each of the correlation coefficients: .. 

math:: c_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} (The diagonal terms, :math:`c_{{\textit{j}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,m`, also give the number of cases used in the calculation of the means, :math:`\bar{x}_{\textit{j}}`, and the standard deviations, :math:`s_{\textit{j}}`.) """ raise NotImplementedError
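Pairwise deletion restricts each :math:`S_{{jk}}` and :math:`R_{{jk}}` (and the means in their definitions) to the cases where variables :math:`j` and :math:`k` are both present. A hypothetical NumPy sketch of the formulae above, illustrative only:

```python
import numpy as np

def pearson_pairwise(x, miss, xmiss):
    """Pairwise-deletion Pearson correlations: every S_jk and R_jk
    uses only the cases with both variables j and k present."""
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    w = np.ones((n, m), dtype=bool)                  # w_ij = 0 marks missing values
    for j in range(m):
        if miss[j]:
            w[:, j] = x[:, j] != xmiss[j]
    r = np.zeros((m, m))
    cnt = np.zeros((m, m))
    for j in range(m):
        for k in range(m):
            both = w[:, j] & w[:, k]                 # cases present for j AND k
            cnt[j, k] = both.sum()                   # c_jk
            if cnt[j, k] < 2:
                continue
            xj, xk = x[both, j], x[both, k]
            sjk = np.sum((xj - xj.mean()) * (xk - xk.mean()))
            sjj = np.sum((xj - xj.mean()) ** 2)
            skk = np.sum((xk - xk.mean()) ** 2)
            if sjj > 0 and skk > 0:
                r[j, k] = sjk / np.sqrt(sjj * skk)
    return r, cnt
```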
[docs]def coeffs_zero(x): r""" ``coeffs_zero`` computes means and standard deviations of variables, sums of squares and cross-products about zero, and correlation-like coefficients for a set of data. .. _g02bd-py2-py-doc: For full information please refer to the NAG Library document for g02bd https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bdf.html .. _g02bd-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to the value of :math:`x_{{\textit{i}\textit{j}}}`, the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` :math:`\mathrm{xbar}[\textit{j}-1]` contains the mean value, :math:`\bar{x}_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **std** : float, ndarray, shape :math:`\left(m\right)` The standard deviation, :math:`s_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **sspz** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{sspz}[\textit{j}-1,\textit{k}-1]` is the cross-product about zero, :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **rz** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{rz}[\textit{j}-1,\textit{k}-1]` is the correlation-like coefficient, :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, between the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. .. _g02bd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. 
(`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. .. _g02bd-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right)\text{, }\quad j = 1,2,\ldots,m\quad \text{ }\quad \left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{1}{n}\sum_{{i = 1}}^nx_{{ij}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(x_{{ij}}-\bar{x}_j\right)^2}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Sums of squares and cross-products about zero: .. math:: \tilde{S}_{{jk}} = \sum_{{i = 1}}^nx_{{ij}}x_{{ik}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} (#) Correlation-like coefficients: .. math:: \tilde{R}_{{jk}} = \frac{\tilde{S}_{{jk}}}{{\sqrt{\tilde{S}_{{jj}}\tilde{S}_{{kk}}}}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} If :math:`\tilde{S}_{{jj}}` or :math:`\tilde{S}_{{kk}}` is zero, :math:`\tilde{R}_{{jk}}` is set to zero. """ raise NotImplementedError
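The "about zero" quantities differ from the Pearson case only in that no means are subtracted before forming the cross-products. A hypothetical NumPy sketch of the formulae in the Notes:

```python
import numpy as np

def zero_stats(x):
    """Cross-products about zero and correlation-like coefficients:
    as Pearson, but without subtracting the column means."""
    x = np.asarray(x, dtype=float)
    sspz = x.T @ x                                   # S~_jk, no mean subtraction
    denom = np.sqrt(np.outer(np.diag(sspz), np.diag(sspz)))
    # R~_jk, set to zero where S~_jj or S~_kk is zero
    rz = np.divide(sspz, denom, out=np.zeros_like(sspz), where=denom > 0)
    return sspz, rz
```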
[docs]def coeffs_zero_miss_case(x, miss, xmiss): r""" ``coeffs_zero_miss_case`` computes means and standard deviations of variables, sums of squares and cross-products about zero, and correlation-like coefficients for a set of data omitting completely any cases with a missing observation for any variable. .. _g02be-py2-py-doc: For full information please refer to the NAG Library document for g02be https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bef.html .. _g02be-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bef.html#accuracy>`__). **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. 
**std** : float, ndarray, shape :math:`\left(m\right)` The standard deviation, :math:`s_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **sspz** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{sspz}[\textit{j}-1,\textit{k}-1]` is the cross-product about zero, :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **rz** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{rz}[\textit{j}-1,\textit{k}-1]` is the correlation-like coefficient, :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, between the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **ncases** : int The number of cases actually used in the calculations (when cases involving missing values have been eliminated). .. _g02be-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) After observations with missing values were omitted, no cases remained. (`errno` :math:`5`) After observations with missing values were omitted, only one case remained. .. _g02be-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. 
In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. Let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables for which missing values have been declared, i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bef.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_ix_{{ij}}}}{{\sum_{{i = 1}}^nw_i}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Sums of squares and cross-products about zero: .. math:: \tilde{S}_{{jk}} = \sum_{{i = 1}}^nw_ix_{{ij}}x_{{ik}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} (#) Correlation-like coefficients: .. math:: \tilde{R}_{{jk}} = \frac{\tilde{S}_{{jk}}}{{\sqrt{\tilde{S}_{{jj}}\tilde{S}_{{kk}}}}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} If :math:`\tilde{S}_{{jj}}` or :math:`\tilde{S}_{{kk}}` is zero, :math:`\tilde{R}_{{jk}}` is set to zero. """ raise NotImplementedError
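Combining casewise deletion with the about-zero statistics gives a short sketch of the calculation; the helper name is hypothetical and this is not the NAG implementation:

```python
import numpy as np

def zero_stats_casewise(x, miss, xmiss):
    """Drop cases with a declared missing value (w_i = 0), then form
    cross-products about zero and correlation-like coefficients."""
    x = np.asarray(x, dtype=float)
    bad = np.zeros(len(x), dtype=bool)
    for j, (mj, xm) in enumerate(zip(miss, xmiss)):
        if mj:
            bad |= x[:, j] == xm
    keep = x[~bad]
    sspz = keep.T @ keep                             # S~_jk over the retained cases
    denom = np.sqrt(np.outer(np.diag(sspz), np.diag(sspz)))
    rz = np.divide(sspz, denom, out=np.zeros_like(sspz), where=denom > 0)
    return sspz, rz, len(keep)                       # len(keep) plays the role of ncases
```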
[docs]def coeffs_zero_miss_pair(x, miss, xmiss): r""" ``coeffs_zero_miss_pair`` computes means and standard deviations of variables, sums of squares and cross-products about zero and correlation-like coefficients for a set of data omitting cases with missing values from only those calculations involving the variables for which the values are missing. .. _g02bf-py2-py-doc: For full information please refer to the NAG Library document for g02bf https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bff.html .. _g02bf-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bff.html#accuracy>`__). **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. 
**std** : float, ndarray, shape :math:`\left(m\right)` The standard deviation, :math:`s_{\textit{j}}`, of the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`. **sspz** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{sspz}[\textit{j}-1,\textit{k}-1]` is the cross-product about zero, :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **rz** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{rz}[\textit{j}-1,\textit{k}-1]` is the correlation-like coefficient, :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, between the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. **ncases** : int The minimum number of cases used in the calculation of any of the sums of squares and cross-products and correlation-like coefficients (when cases involving missing values have been eliminated). **cnt** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{cnt}[\textit{j}-1,\textit{k}-1]` is the number of cases, :math:`c_{{\textit{j}\textit{k}}}`, actually used in the calculation of :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, and :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, the sum of cross-products and correlation-like coefficient for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. .. _g02bf-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`4`) After observations with missing values were omitted, fewer than two cases remained for at least one pair of variables. 
(The pairs of variables involved can be determined by examination of the contents of the array :math:`\mathrm{cnt}`). All means, standard deviations, sums of squares and cross-products, and correlation-like coefficients based on two or more cases are returned by the function even if :math:`\mathrm{errno}` = 4. .. _g02bf-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. Let :math:`w_{{\textit{i}\textit{j}}} = 0` if the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable is a missing value, i.e., if a missing value, :math:`\textit{xm}_{\textit{j}}`, has been declared for the :math:`\textit{j}`\ th variable, and :math:`x_{{\textit{i}\textit{j}}} = \textit{xm}_{\textit{j}}` (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bff.html#accuracy>`__); and :math:`w_{{\textit{i}\textit{j}}} = 1` otherwise, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_{{ij}}x_{{ij}}}}{{\sum_{{i = 1}}^nw_{{ij}}}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Standard deviations: .. 
math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_{{ij}}\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_{{ij}}-1}}}\text{, }\quad j = 1,2,\ldots,m\text{.} (#) Sums of squares and cross-products about zero: .. math:: \tilde{S}_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}x_{{ij}}x_{{ik}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} (#) Correlation-like coefficients: .. math:: \tilde{R}_{{jk}} = \frac{\tilde{S}_{{jk}}}{{\sqrt{\tilde{S}_{{jj\left(k\right)}}\tilde{S}_{{kk\left(j\right)}}}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`\tilde{S}_{{jj\left(k\right)}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}x_{{ij}}^2` and :math:`\tilde{S}_{{kk\left(j\right)}} = \sum_{{i = 1}}^nw_{{ik}}w_{{ij}}x_{{ik}}^2` (i.e., the sums of squares about zero are based on the same set of observations as are used in the calculation of the numerator). If :math:`\tilde{S}_{{jj\left(k\right)}}` or :math:`\tilde{S}_{{kk\left(j\right)}}` is zero, :math:`\tilde{R}_{{jk}}` is set to zero. (#) The number of cases used in the calculation of each of the correlation-like coefficients: .. math:: c_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}\text{, }\quad j,k = 1,2,\ldots,m\text{.} (The diagonal terms, :math:`c_{{\textit{j}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,m`, also give the number of cases used in the calculation of the means :math:`\bar{x}_{\textit{j}}` and the standard deviations :math:`s_{\textit{j}}`.) """ raise NotImplementedError
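The pairwise missing-value scheme described in the Notes can be illustrated with a minimal NumPy sketch. This is not the NAG implementation, and the function name ``coeffs_zero_miss_pair_sketch`` is hypothetical; it simply evaluates the formulas above (weights :math:`w_{{ij}}`, about-zero cross-products, and the pair-specific denominators) for small arrays.

```python
import numpy as np

def coeffs_zero_miss_pair_sketch(x, miss, xmiss):
    """Illustrative sketch (not the NAG routine) of the pairwise
    missing-value quantities described above."""
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    # w[i, j] = 0 where observation i equals the declared missing value xm_j
    w = np.ones((n, m))
    for j in range(m):
        if miss[j] == 1:
            w[x[:, j] == xmiss[j], j] = 0.0
    counts = w.sum(axis=0)
    xbar = (w * x).sum(axis=0) / counts
    std = np.sqrt((w * (x - xbar) ** 2).sum(axis=0) / (counts - 1.0))
    wx = w * x
    sspz = wx.T @ wx                      # S~_jk: cross-products about zero
    cnt = w.T @ w                         # c_jk: cases used for each pair
    # S~_jj(k): sums of squares about zero over the same cases as the numerator
    sjj = np.einsum('ij,ik,ij->jk', w, w, x * x)
    denom = np.sqrt(sjj * sjj.T)
    rz = np.divide(sspz, denom, out=np.zeros_like(sspz), where=denom > 0)
    return xbar, std, sspz, rz, int(cnt.min()), cnt
```

For example, with a sentinel of :math:`-999` declared for the first of two variables, the middle case is dropped only from calculations involving that variable, so the pair count is :math:`2` while the second variable's mean still uses all three cases.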
[docs]def coeffs_pearson_subset(x, kvar): r""" ``coeffs_pearson_subset`` computes means and standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for selected variables. .. _g02bg-py2-py-doc: For full information please refer to the NAG Library document for g02bg https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bgf.html .. _g02bg-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **kvar** : int, array-like, shape :math:`\left(\textit{nvars}\right)` :math:`\mathrm{kvar}[\textit{j}]` must be set to the column number in :math:`\mathrm{x}` of the :math:`\textit{j}`\ th variable for which information is required, for :math:`\textit{j} = 0,\ldots,p-1`. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **std** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The standard deviation, :math:`s_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **ssp** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{ssp}[\textit{j}-1,\textit{k}-1]` is the cross-product of deviations, :math:`S_{{\textit{j}\textit{k}}}`, for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. 
**r** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{r}[\textit{j}-1,\textit{k}-1]` is the product-moment correlation coefficient, :math:`R_{{\textit{j}\textit{k}}}`, between the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. .. _g02bg-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\textit{nvars} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nvars} \geq 2` and :math:`\textit{nvars} \leq m`. (`errno` :math:`4`) On entry, :math:`\textit{i} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{kvar}[\textit{i}-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{kvar}[\textit{i}-1]\leq m`. .. _g02bg-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consist of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable, together with the subset of these variables, :math:`v_1,v_2,\ldots,v_p`, for which information is required. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{1}{n}\sum_{{i = 1}}^nx_{{ij}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Standard deviations: .. 
math:: s_j = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(x_{{ij}}-\bar{x}_j\right)^2}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Sums of squares and cross-products of deviations from means: .. math:: S_{{jk}} = \sum_{{i = 1}}^n\left(x_{{ij}}-\bar{x}_j\right)\left(x_{{ik}}-\bar{x}_k\right)\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (#) Pearson product-moment correlation coefficients: .. math:: R_{{jk}} = \frac{S_{{jk}}}{{\sqrt{S_{{jj}}S_{{kk}}}}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} If :math:`S_{{jj}}` or :math:`S_{{kk}}` is zero, :math:`R_{{jk}}` is set to zero. """ raise NotImplementedError
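The subset selection and the Pearson formulas above can be sketched directly in NumPy. This is an illustrative reimplementation under the stated formulas, not the NAG routine; the name ``coeffs_pearson_subset_sketch`` is hypothetical. Note that ``kvar`` holds 1-based column numbers, as in the Parameters section.

```python
import numpy as np

def coeffs_pearson_subset_sketch(x, kvar):
    """Illustrative sketch (not the NAG routine): Pearson statistics
    for the 1-based column numbers listed in kvar."""
    x = np.asarray(x, dtype=float)
    sub = x[:, np.asarray(kvar) - 1]   # kvar holds 1-based column numbers
    xbar = sub.mean(axis=0)
    std = sub.std(axis=0, ddof=1)      # divisor n - 1, as in the Notes
    d = sub - xbar
    ssp = d.T @ d                      # S_jk: cross-products of deviations
    denom = np.sqrt(np.outer(np.diag(ssp), np.diag(ssp)))
    r = np.divide(ssp, denom, out=np.zeros_like(ssp), where=denom > 0)
    return xbar, std, ssp, r
```

Selecting two perfectly proportional columns (here columns 1 and 3 of a three-variable array) yields :math:`R_{{jk}} = 1` for the pair, while the unselected middle column is ignored entirely.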
[docs]def coeffs_pearson_subset_miss_case(x, miss, xmiss, mistyp, kvar): r""" ``coeffs_pearson_subset_miss_case`` computes means and standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for selected variables omitting completely any cases with a missing observation for any variable (either over all variables in the dataset or over only those variables in the selected subset). .. _g02bh-py2-py-doc: For full information please refer to the NAG Library document for g02bh https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bhf.html .. _g02bh-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bhf.html#accuracy>`__). **mistyp** : int Indicates the manner in which missing observations are to be treated. 
:math:`\mathrm{mistyp} = 1` A case is excluded if it contains a missing value for any of the variables :math:`1,2,\ldots,m`. :math:`\mathrm{mistyp} = 0` A case is excluded if it contains a missing value for any of the :math:`p\left(\leq m\right)` variables specified in the array :math:`\mathrm{kvar}`. **kvar** : int, array-like, shape :math:`\left(\textit{nvars}\right)` :math:`\mathrm{kvar}[\textit{j}]` must be set to the column number in :math:`\mathrm{x}` of the :math:`\textit{j}`\ th variable for which information is required, for :math:`\textit{j} = 0,\ldots,p-1`. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **std** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The standard deviation, :math:`s_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **ssp** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{ssp}[\textit{j}-1,\textit{k}-1]` is the cross-product of deviations, :math:`S_{{\textit{j}\textit{k}}}`, for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **r** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{r}[\textit{j}-1,\textit{k}-1]` is the product-moment correlation coefficient, :math:`R_{{\textit{j}\textit{k}}}`, between the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **ncases** : int The number of cases actually used in the calculations (when cases involving missing values have been eliminated). ..
_g02bh-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\textit{nvars} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nvars} \geq 2` and :math:`\textit{nvars} \leq m`. (`errno` :math:`4`) On entry, :math:`\textit{i} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{kvar}[\textit{i}-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{kvar}[\textit{i}-1]\leq m`. (`errno` :math:`5`) On entry, :math:`\mathrm{mistyp} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mistyp} = 0` or :math:`1`. (`errno` :math:`6`) After observations with missing values were omitted, no cases remained. (`errno` :math:`7`) After observations with missing values were omitted, only one case remained. .. _g02bh-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable, together with the subset of these variables, :math:`v_1,v_2,\ldots,v_p`, for which information is required. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. 
The missing values can be utilized in two slightly different ways; you can indicate which scheme is required. Firstly, let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables in the set :math:`1,2,\ldots,m` for which missing values have been declared, i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` (:math:`j = 1,2,\ldots,m`) for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bhf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. Secondly, let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables in the selected subset :math:`v_1,v_2,\ldots,v_p` for which missing values have been declared, i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` (:math:`j = v_1,v_2,\ldots,v_p`) for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bhf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_ix_{{ij}}}}{{\sum_{{i = 1}}^nw_i}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Sums of squares and cross-products of deviations from means: .. math:: S_{{jk}} = \sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{x}_j\right)\left(x_{{ik}}-\bar{x}_k\right)\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (#) Pearson product-moment correlation coefficients: .. math:: R_{{jk}} = \frac{S_{{jk}}}{{\sqrt{S_{{jj}}S_{{kk}}}}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} If :math:`S_{{jj}}` or :math:`S_{{kk}}` is zero, :math:`R_{{jk}}` is set to zero. """ raise NotImplementedError
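The two casewise schemes selected by :math:`\mathrm{mistyp}` can be illustrated with a short NumPy sketch. This is not the NAG routine, and ``pearson_casewise_sketch`` is a hypothetical name; it drops whole cases (rows) according to the chosen scheme and then applies the Pearson formulas above.

```python
import numpy as np

def pearson_casewise_sketch(x, miss, xmiss, mistyp, kvar):
    """Illustrative sketch (not the NAG routine): casewise deletion of
    missing values, then Pearson statistics on the selected columns."""
    x = np.asarray(x, dtype=float)
    cols = np.asarray(kvar) - 1                       # 1-based column numbers
    # mistyp = 1: scan all m variables; mistyp = 0: scan only the subset
    check = np.arange(x.shape[1]) if mistyp == 1 else cols
    keep = np.ones(x.shape[0], dtype=bool)            # w_i from the Notes
    for j in check:
        if miss[j] == 1:
            keep &= x[:, j] != xmiss[j]
    sub = x[keep][:, cols]
    xbar = sub.mean(axis=0)
    d = sub - xbar
    ssp = d.T @ d
    denom = np.sqrt(np.outer(np.diag(ssp), np.diag(ssp)))
    r = np.divide(ssp, denom, out=np.zeros_like(ssp), where=denom > 0)
    return xbar, ssp, r, int(keep.sum())
```

With a missing value declared only for an unselected variable, :math:`\mathrm{mistyp} = 0` retains every case, whereas :math:`\mathrm{mistyp} = 1` also excludes cases that are missing in the unselected variable.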
[docs]def coeffs_pearson_subset_miss_pair(x, miss, xmiss, kvar): r""" ``coeffs_pearson_subset_miss_pair`` computes means and standard deviations, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients for selected variables omitting cases with missing values from only those calculations involving the variables for which the values are missing. .. _g02bj-py2-py-doc: For full information please refer to the NAG Library document for g02bj https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bjf.html .. _g02bj-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bjf.html#accuracy>`__). 
**kvar** : int, array-like, shape :math:`\left(\textit{nvars}\right)` :math:`\mathrm{kvar}[\textit{j}]` must be set to the column number in :math:`\mathrm{x}` of the :math:`\textit{j}`\ th variable for which information is required, for :math:`\textit{j} = 0,\ldots,p-1`. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **std** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The standard deviation, :math:`s_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **ssp** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{ssp}[\textit{j}-1,\textit{k}-1]` is the cross-product of deviations, :math:`S_{{\textit{j}\textit{k}}}`, for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **r** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{r}[\textit{j}-1,\textit{k}-1]` is the product-moment correlation coefficient, :math:`R_{{\textit{j}\textit{k}}}`, between the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **ncases** : int The minimum number of cases used in the calculation of any of the sums of squares and cross-products and correlation coefficients (when cases involving missing values have been eliminated). 
**cnt** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{cnt}[\textit{j}-1,\textit{k}-1]` is the number of cases, :math:`c_{{\textit{j}\textit{k}}}`, actually used in the calculation of :math:`S_{{\textit{j}\textit{k}}}`, and :math:`R_{{\textit{j}\textit{k}}}`, the sum of cross-products and correlation coefficient for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. .. _g02bj-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\textit{nvars} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nvars} \geq 2` and :math:`\textit{nvars} \leq m`. (`errno` :math:`4`) On entry, :math:`\textit{i} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{kvar}[\textit{i}-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{kvar}[\textit{i}-1]\leq m`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) After observations with missing values were omitted, fewer than two cases remained. .. _g02bj-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable, together with the subset of these variables, :math:`v_1,v_2,\ldots,v_p`, for which information is required. 
In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. Let :math:`w_{{\textit{i}\textit{j}}} = 0` if the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable is a missing value, i.e., if a missing value, :math:`\textit{xm}_{\textit{j}}`, has been declared for the :math:`\textit{j}`\ th variable, and :math:`x_{{\textit{i}\textit{j}}} = \textit{xm}_{\textit{j}}` (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bjf.html#accuracy>`__); and :math:`w_{{\textit{i}\textit{j}}} = 1` otherwise, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_{{ij}}x_{{ij}}}}{{\sum_{{i = 1}}^nw_{{ij}}}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_{{ij}}\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_{{ij}}-1}}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Sums of squares and cross-products of deviations from means: .. math:: S_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}\left(x_{{ij}}-\bar{x}_{{j\left(k\right)}}\right)\left(x_{{ik}}-\bar{x}_{{k\left(j\right)}}\right)\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{,} where .. math:: \bar{x}_{{j\left(k\right)}} = \frac{{\sum_{{i = 1}}^nw_{{ij}}w_{{ik}}x_{{ij}}}}{{\sum_{{i = 1}}^nw_{{ij}}w_{{ik}}}}\quad \text{ and }\quad \bar{x}_{{k\left(j\right)}} = \frac{{\sum_{{i = 1}}^nw_{{ik}}w_{{ij}}x_{{ik}}}}{{\sum_{{i = 1}}^nw_{{ik}}w_{{ij}}}}\text{,} (i.e., the means used in the calculation of the sum of squares and cross-products of deviations are based on the same set of observations as are the cross-products). 
(#) Pearson product-moment correlation coefficients: .. math:: R_{{jk}} = \frac{S_{{jk}}}{{\sqrt{S_{{jj\left(k\right)}}S_{{kk\left(j\right)}}}}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{,} where .. math:: S_{{jj\left(k\right)}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}{\left(x_{{ij}}-\bar{x}_{{j\left(k\right)}}\right)}^2\quad \text{ and }\quad S_{{kk\left(j\right)}} = \sum_{{i = 1}}^nw_{{ik}}w_{{ij}}{\left(x_{{ik}}-\bar{x}_{{k\left(j\right)}}\right)}^2\text{,} (i.e., the sums of squares of deviations used in the denominator are based on the same set of observations as are used in the calculation of the numerator). If :math:`S_{{jj\left(k\right)}}` or :math:`S_{{kk\left(j\right)}}` is zero, :math:`R_{{jk}}` is set to zero. (#) The number of cases used in the calculation of each of the correlation coefficients: .. math:: c_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (The diagonal terms, :math:`c_{{jj}}`, for :math:`j = v_1,v_2,\ldots,v_p`, also give the number of cases used in the calculation of the means, :math:`\bar{x}_j`, and the standard deviations, :math:`s_j`.) """ raise NotImplementedError
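The key point in the formulas above is that every pair :math:`\left(j, k\right)` gets its own means :math:`\bar{x}_{{j\left(k\right)}}` and :math:`\bar{x}_{{k\left(j\right)}}` computed over only the cases present for both variables. The following NumPy sketch (hypothetical name ``pearson_pairwise_sketch``, not the NAG implementation) makes that explicit with a double loop over the selected variables.

```python
import numpy as np

def pearson_pairwise_sketch(x, miss, xmiss, kvar):
    """Illustrative sketch (not the NAG routine): pairwise deletion with
    pair-specific means, as in the formulas above."""
    x = np.asarray(x, dtype=float)
    cols = np.asarray(kvar) - 1            # kvar holds 1-based column numbers
    sub = x[:, cols]
    p = sub.shape[1]
    w = np.ones_like(sub)                  # w_ij = 0 marks a missing entry
    for idx, j in enumerate(cols):
        if miss[j] == 1:
            w[sub[:, idx] == xmiss[j], idx] = 0.0
    ssp = np.zeros((p, p))
    r = np.zeros((p, p))
    cnt = np.zeros((p, p))
    for j in range(p):
        for k in range(p):
            wjk = w[:, j] * w[:, k]        # cases present for both variables
            cnt[j, k] = wjk.sum()
            mj = (wjk * sub[:, j]).sum() / cnt[j, k]   # xbar_j(k)
            mk = (wjk * sub[:, k]).sum() / cnt[j, k]   # xbar_k(j)
            ssp[j, k] = (wjk * (sub[:, j] - mj) * (sub[:, k] - mk)).sum()
            sjj = (wjk * (sub[:, j] - mj) ** 2).sum()
            skk = (wjk * (sub[:, k] - mk) ** 2).sum()
            if sjj > 0 and skk > 0:
                r[j, k] = ssp[j, k] / np.sqrt(sjj * skk)
    return ssp, r, int(cnt.min()), cnt
```

Because both the numerator and the denominator sums of squares are restricted to the same case set, a pair of proportional variables still yields :math:`R_{{jk}} = 1` even when one of them has a missing case.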
[docs]def coeffs_zero_subset(x, kvar): r""" ``coeffs_zero_subset`` computes means and standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients for selected variables. .. _g02bk-py2-py-doc: For full information please refer to the NAG Library document for g02bk https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bkf.html .. _g02bk-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **kvar** : int, array-like, shape :math:`\left(\textit{nvars}\right)` :math:`\mathrm{kvar}[\textit{j}]` must be set to the column number in :math:`\mathrm{x}` of the :math:`\textit{j}`\ th variable for which information is required, for :math:`\textit{j} = 0,\ldots,p-1`. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **std** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The standard deviation, :math:`s_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **sspz** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{sspz}[\textit{j}-1,\textit{k}-1]` is the cross-product about zero, :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. 
**rz** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{rz}[\textit{j}-1,\textit{k}-1]` is the correlation-like coefficient, :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, between the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. .. _g02bk-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\textit{nvars} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nvars} \geq 2` and :math:`\textit{nvars} \leq m`. (`errno` :math:`4`) On entry, :math:`\textit{i} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{kvar}[\textit{i}-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{kvar}[\textit{i}-1]\leq m`. .. _g02bk-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable, together with the subset of these variables, :math:`v_1,v_2,\ldots,v_p`, for which information is required. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nx_{{ij}}}}{n}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(x_{{ij}}-\bar{x}_j\right)^2}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Sums of squares and cross-products about zero: .. 
math:: \tilde{S}_{{jk}} = \sum_{{i = 1}}^nx_{{ij}}x_{{ik}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (#) Correlation-like coefficients: .. math:: \tilde{R}_{{jk}} = \frac{\tilde{S}_{{jk}}}{{\sqrt{\tilde{S}_{{jj}}\tilde{S}_{{kk}}}}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} If :math:`\tilde{S}_{{jj}}` or :math:`\tilde{S}_{{kk}}` is zero, :math:`\tilde{R}_{{jk}}` is set to zero. """ raise NotImplementedError
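Since no missing values are involved here, the about-zero statistics reduce to a few array products. The NumPy sketch below (hypothetical name ``coeffs_zero_subset_sketch``, not the NAG routine) evaluates the formulas above for the selected columns.

```python
import numpy as np

def coeffs_zero_subset_sketch(x, kvar):
    """Illustrative sketch (not the NAG routine): about-zero sums of
    squares/cross-products and correlation-like coefficients."""
    sub = np.asarray(x, dtype=float)[:, np.asarray(kvar) - 1]  # 1-based kvar
    xbar = sub.mean(axis=0)
    std = sub.std(axis=0, ddof=1)          # divisor n - 1, as in the Notes
    sspz = sub.T @ sub                     # S~_jk: cross-products about zero
    denom = np.sqrt(np.outer(np.diag(sspz), np.diag(sspz)))
    rz = np.divide(sspz, denom, out=np.zeros_like(sspz), where=denom > 0)
    return xbar, std, sspz, rz
```

Unlike the Pearson coefficient, the correlation-like coefficient is not invariant to shifts of the data: two proportional columns give :math:`\tilde{R}_{{jk}} = 1`, but adding a constant to one column changes the result.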
[docs]def coeffs_zero_subset_miss_case(x, miss, xmiss, mistyp, kvar): r""" ``coeffs_zero_subset_miss_case`` computes means and standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients for selected variables omitting completely any cases with a missing observation for any variable (either over all variables in the dataset or over only those variables in the selected subset). .. _g02bl-py2-py-doc: For full information please refer to the NAG Library document for g02bl https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02blf.html .. _g02bl-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02blf.html#accuracy>`__). **mistyp** : int Indicates the manner in which missing observations are to be treated. :math:`\mathrm{mistyp} = 1` A case is excluded if it contains a missing value for any of the variables :math:`1,2,\ldots,m`. 
:math:`\mathrm{mistyp} = 0` A case is excluded if it contains a missing value for any of the :math:`p\left(\leq m\right)` variables specified in the array :math:`\mathrm{kvar}`. **kvar** : int, array-like, shape :math:`\left(\textit{nvars}\right)` :math:`\mathrm{kvar}[\textit{j}]` must be set to the column number in :math:`\mathrm{x}` of the :math:`\textit{j}`\ th variable for which information is required, for :math:`\textit{j} = 0,\ldots,p-1`. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **std** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The standard deviation, :math:`s_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **sspz** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{sspz}[\textit{j}-1,\textit{k}-1]` is the cross-product about zero, :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **rz** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{rz}[\textit{j}-1,\textit{k}-1]` is the correlation-like coefficient, :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, between the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **ncases** : int The number of cases actually used in the calculations (when cases involving missing values have been eliminated). .. _g02bl-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\textit{nvars} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nvars} \geq 2` and :math:`\textit{nvars} \leq m`. (`errno` :math:`4`) On entry, :math:`\textit{i} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{kvar}[\textit{i}-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{kvar}[\textit{i}-1]\leq m`. (`errno` :math:`5`) On entry, :math:`\mathrm{mistyp} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mistyp} = 0` or :math:`1`. (`errno` :math:`6`) After observations with missing values were omitted, no cases remained. (`errno` :math:`7`) After observations with missing values were omitted, only one case remained. .. _g02bl-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consist of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right)\text{ and }j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable, together with the subset of these variables, :math:`v_1,v_2,\ldots,v_p`, for which information is required. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. The missing values can be utilized in two slightly different ways; the argument :math:`\mathrm{mistyp}` indicates which scheme is required.
Firstly, let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables in the set :math:`1,2,\ldots,m` for which missing values have been declared, i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` (:math:`j = 1,2,\ldots,m`) for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02blf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. Secondly, let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables in the selected subset :math:`v_1,v_2,\ldots,v_p` for which missing values have been declared, i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` (:math:`j = v_1,v_2,\ldots,v_p`) for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02blf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_ix_{{ij}}}}{{\sum_{{i = 1}}^nw_i}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Sums of squares and cross-products about zero: .. math:: \tilde{S}_{{jk}} = \sum_{{i = 1}}^nw_ix_{{ij}}x_{{ik}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (#) Correlation-like coefficients: .. math:: \tilde{R}_{{jk}} = \frac{\tilde{S}_{{jk}}}{{\sqrt{\tilde{S}_{{jj}}\tilde{S}_{{kk}}}}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} If :math:`\tilde{S}_{{jj}}` or :math:`\tilde{S}_{{kk}}` is zero, :math:`\tilde{R}_{{jk}}` is set to zero. """ raise NotImplementedError
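The casewise scheme above reduces to a few array operations: drop every row that matches a missing-value sentinel, then form the means, standard deviations, cross-products about zero and correlation-like coefficients from the surviving rows. The following is a minimal NumPy sketch of those formulas, not the NAG routine itself; the name `zero_stats` and its calling sequence are illustrative assumptions.

```python
import numpy as np

def zero_stats(x, kvar, xmiss=None):
    """Illustrative sketch (not the NAG routine): means, standard
    deviations, cross-products about zero and correlation-like
    coefficients for the 1-based columns listed in kvar, with casewise
    deletion of any row matching its column's sentinel in xmiss."""
    x = np.asarray(x, dtype=float)
    keep = np.ones(x.shape[0], dtype=bool)
    if xmiss is not None:
        # w_i = 0 if observation i equals the missing value of any variable
        keep = ~np.any(x == np.asarray(xmiss, dtype=float), axis=1)
    xs = x[keep][:, np.asarray(kvar) - 1]
    xbar = xs.mean(axis=0)
    std = xs.std(axis=0, ddof=1)
    sspz = xs.T @ xs                     # S~_jk = sum_i w_i x_ij x_ik
    d = np.sqrt(np.diag(sspz))
    denom = np.outer(d, d)
    # R~_jk = S~_jk / sqrt(S~_jj S~_kk), set to zero when a diagonal is zero
    rz = np.divide(sspz, denom, out=np.zeros_like(sspz), where=denom > 0)
    return xbar, std, sspz, rz, int(keep.sum())
```

For example, with `x = [[1, 2], [2, -999], [3, 6]]` and sentinel `-999` for both columns, the middle case is dropped and two cases remain.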
[docs]def coeffs_zero_subset_miss_pair(x, miss, xmiss, kvar): r""" ``coeffs_zero_subset_miss_pair`` computes means and standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients for selected variables omitting cases with missing values from only those calculations involving the variables for which the values are missing. .. _g02bm-py2-py-doc: For full information please refer to the NAG Library document for g02bm https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bmf.html .. _g02bm-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bmf.html#accuracy>`__). **kvar** : int, array-like, shape :math:`\left(\textit{nvars}\right)` :math:`\mathrm{kvar}[\textit{j}]` must be set to the column number in :math:`\mathrm{x}` of the :math:`\textit{j}`\ th variable for which information is required, for :math:`\textit{j} = 0,\ldots,p-1`. 
**Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The mean value, :math:`\bar{x}_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **std** : float, ndarray, shape :math:`\left(\textit{nvars}\right)` The standard deviation, :math:`s_{\textit{j}}`, of the variable specified in :math:`\mathrm{kvar}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,p`. **sspz** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{sspz}[\textit{j}-1,\textit{k}-1]` is the cross-product about zero, :math:`\tilde{S}_{{\textit{j}\textit{k}}}`, for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **rz** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{rz}[\textit{j}-1,\textit{k}-1]` is the correlation-like coefficient, :math:`\tilde{R}_{{\textit{j}\textit{k}}}`, between the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. **ncases** : int The minimum number of cases used in the calculation of any of the sums of squares and cross-products and correlation-like coefficients (when cases involving missing values have been eliminated). **cnt** : float, ndarray, shape :math:`\left(\textit{nvars}, \textit{nvars}\right)` :math:`\mathrm{cnt}[\textit{j}-1,\textit{k}-1]` is the number of cases, :math:`c_{{\textit{j}\textit{k}}}`, actually used in the calculation of the sum of cross-product and correlation-like coefficient for the variables specified in :math:`\mathrm{kvar}[\textit{j}-1]` and :math:`\mathrm{kvar}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,p`, for :math:`\textit{j} = 1,2,\ldots,p`. .. 
_g02bm-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\textit{nvars} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nvars} \geq 2` and :math:`\textit{nvars} \leq m`. (`errno` :math:`4`) On entry, :math:`\textit{i} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{kvar}[\textit{i}-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{kvar}[\textit{i}-1]\leq m`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) After observations with missing values were omitted, fewer than two cases remained for at least one pair of variables. (The pairs of variables involved can be determined by examination of the contents of the array :math:`\mathrm{cnt}`). All means, standard deviations, sums of squares and cross-products, and correlation-like coefficients based on two or more cases are returned by the function even if :math:`\mathrm{errno}` = 5. .. _g02bm-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\quad \text{ }\quad \left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable, together with the subset of these variables, :math:`v_1,v_2,\ldots,v_p`, for which information is required. 
In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. Let :math:`w_{{\textit{i}\textit{j}}} = 0` if the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable is a missing value, i.e., if a missing value, :math:`\textit{xm}_{\textit{j}}`, has been declared for the :math:`\textit{j}`\ th variable, and :math:`x_{{\textit{i}\textit{j}}} = \textit{xm}_{\textit{j}}` (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bmf.html#accuracy>`__); and :math:`w_{{\textit{i}\textit{j}}} = 1` otherwise, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. math:: \bar{x}_j = \frac{{\sum_{{i = 1}}^nw_{{ij}}x_{{ij}}}}{{\sum_{{i = 1}}^nw_{{ij}}}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Standard deviations: .. math:: s_j = \sqrt{\frac{{\sum_{{i = 1}}^nw_{{ij}}\left(x_{{ij}}-\bar{x}_j\right)^2}}{{\sum_{{i = 1}}^nw_{{ij}}-1}}}\text{, }\quad j = v_1,v_2,\ldots,v_p\text{.} (#) Sums of squares and cross-products about zero: .. math:: \tilde{S}_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}x_{{ij}}x_{{ik}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (#) Correlation-like coefficients: .. math:: \tilde{R}_{{jk}} = \frac{\tilde{S}_{{jk}}}{{\sqrt{\tilde{S}_{{jj\left(k\right)}}\tilde{S}_{{kk\left(j\right)}}}}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{,} where :math:`\tilde{S}_{{jj\left(k\right)}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}x_{{ij}}^2` and :math:`\tilde{S}_{{kk\left(j\right)}} = \sum_{{i = 1}}^nw_{{ik}}w_{{ij}}x_{{ik}}^2` (i.e., the sums of squares about zero are based on the same set of observations as are used in the calculation of the numerator). 
If :math:`\tilde{S}_{{jj\left(k\right)}}` or :math:`\tilde{S}_{{kk\left(j\right)}}` is zero, :math:`\tilde{R}_{{jk}}` is set to zero. (#) The number of cases used in the calculation of each of the correlation-like coefficients: .. math:: c_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}\text{, }\quad j,k = v_1,v_2,\ldots,v_p\text{.} (The diagonal terms, :math:`c_{{\textit{j}\textit{j}}}`, for :math:`\textit{j} = v_1,v_2,\ldots,v_p`, also give the number of cases used in the calculation of the means :math:`\bar{x}_{\textit{j}}` and the standard deviations :math:`s_{\textit{j}}`.) """ raise NotImplementedError
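The pairwise scheme differs from the casewise one in that each :math:`\left(j, k\right)` statistic keeps every row where both variables are valid, so the weights become a per-entry indicator matrix :math:`w_{{ij}}` rather than a per-row flag. A minimal NumPy sketch of the pairwise formulas above (illustrative only; `zero_stats_pairwise` is a hypothetical name, not a NAG interface):

```python
import numpy as np

def zero_stats_pairwise(x, xmiss):
    """Illustrative sketch of the pairwise scheme: each (j, k) statistic
    uses only the rows where both variables are valid (w_ij = w_ik = 1)."""
    x = np.asarray(x, dtype=float)
    w = (x != np.asarray(xmiss, dtype=float)).astype(float)  # w_ij indicators
    a = w * x                          # missing entries zeroed out
    cnt = w.T @ w                      # c_jk = sum_i w_ij w_ik
    sspz = a.T @ a                     # S~_jk over jointly valid rows
    sjj_k = (a * a).T @ w              # S~_jj(k): squares of j where k is valid
    denom = np.sqrt(sjj_k * sjj_k.T)   # sqrt(S~_jj(k) S~_kk(j))
    rz = np.divide(sspz, denom, out=np.zeros_like(sspz), where=denom > 0)
    return sspz, rz, cnt
```

Note that the diagonal of `cnt` recovers the per-variable case counts used for the means and standard deviations, as the parenthetical remark above describes.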
[docs]def coeffs_kspearman_overwrite(x, itype): r""" ``coeffs_kspearman_overwrite`` computes Kendall and/or Spearman nonparametric rank correlation coefficients for a set of data; the data array is overwritten with the ranks of the observations. .. _g02bn-py2-py-doc: For full information please refer to the NAG Library document for g02bn https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bnf.html .. _g02bn-py2-py-parameters: **Parameters** **x** : float, ndarray, shape :math:`\left(n, m\right)`, modified in place `On entry`: :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. `On exit`: :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` contains the rank :math:`y_{{\textit{i}\textit{j}}}` of the observation :math:`x_{{\textit{i}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **itype** : int The type of correlation coefficients which are to be calculated. :math:`\mathrm{itype} = -1` Only Kendall's tau coefficients are calculated. :math:`\mathrm{itype} = 0` Both Kendall's tau and Spearman's coefficients are calculated. :math:`\mathrm{itype} = 1` Only Spearman's coefficients are calculated. **Returns** **rr** : float, ndarray, shape :math:`\left(m, m\right)` The requested correlation coefficients. If only Kendall's tau coefficients are requested (:math:`\mathrm{itype} = -1`), :math:`\mathrm{rr}[j-1,k-1]` contains Kendall's tau for the :math:`j`\ th and :math:`k`\ th variables. If only Spearman's coefficients are requested (:math:`\mathrm{itype} = 1`), :math:`\mathrm{rr}[j-1,k-1]` contains Spearman's rank correlation coefficient for the :math:`j`\ th and :math:`k`\ th variables. 
If both Kendall's tau and Spearman's coefficients are requested (:math:`\mathrm{itype} = 0`), the upper triangle of :math:`\mathrm{rr}` contains the Spearman coefficients and the lower triangle the Kendall coefficients. That is, for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, where :math:`\textit{j}` is less than :math:`\textit{k}`, :math:`\mathrm{rr}[\textit{j}-1,\textit{k}-1]` contains the Spearman rank correlation coefficient, and :math:`\mathrm{rr}[\textit{k}-1,\textit{j}-1]` contains Kendall's tau, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. (Diagonal terms, :math:`\mathrm{rr}[j-1,j-1]`, are unity for all three values of :math:`\mathrm{itype}`.) .. _g02bn-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) On entry, :math:`\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{itype} = -1`, :math:`0` or :math:`1`. .. _g02bn-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation of the :math:`j`\ th variable. The quantities calculated are: (a) Ranks For a given variable, :math:`j` say, each of the :math:`n` observations, :math:`x_{{1j}},x_{{2j}},\ldots,x_{{nj}}`, has associated with it an additional number, the 'rank' of the observation, which indicates the magnitude of that observation relative to the magnitudes of the other :math:`n-1` observations on that same variable. 
The smallest observation for variable :math:`j` is assigned the rank :math:`1`, the second smallest observation for variable :math:`j` the rank :math:`2`, the third smallest the rank :math:`3`, and so on until the largest observation for variable :math:`j` is given the rank :math:`n`. If a number of cases all have the same value for the given variable, :math:`j`, then they are each given an 'average' rank, e.g., if in attempting to assign the rank :math:`h+1`, :math:`k` observations were found to have the same value, then instead of giving them the ranks .. math:: h+1,h+2,\ldots,h+k\text{,} all :math:`k` observations would be assigned the rank .. math:: \frac{{2h+k+1}}{2} and the next value in ascending order would be assigned the rank .. math:: h+k+1\text{.} The process is repeated for each of the :math:`m` variables. Let :math:`y_{{ij}}` be the rank assigned to the observation :math:`x_{{ij}}` when the :math:`j`\ th variable is being ranked. The actual observations :math:`x_{{ij}}` are replaced by the ranks :math:`y_{{ij}}`. (#) Nonparametric rank correlation coefficients (i) Kendall's tau: .. math:: R_{{jk}} = \frac{{\sum_{{h = 1}}^n\sum_{{i = 1}}^n\mathrm{sign}\left(y_{{hj}}-y_{{ij}}\right)\mathrm{sign}\left(y_{{hk}}-y_{{ik}}\right)}}{{\sqrt{\left[n\left(n-1\right)-T_j\right]\left[n\left(n-1\right)-T_k\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------+ |where|:math:`\mathrm{sign}\left(u\right) = 1` if :math:`u > 0`, | +-----+----------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = 0` if :math:`u = 0`, | +-----+----------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = -1` if :math:`u < 0`,| +-----+----------------------------------------------------------+ and :math:`T_j = \sum t_j\left(t_j-1\right)`, where :math:`t_j` is the number of ties of a particular value of variable :math:`j`, and the summation is over all tied values of variable :math:`j`. (#) Spearman's: .. math:: R_{{jk}}^* = \frac{{n\left(n^2-1\right)-6\sum_{{i = 1}}^n\left(y_{{ij}}-y_{{ik}}\right)^2-\frac{1}{2}\left(T_j^*+T_k^*\right)}}{{\sqrt{\left[n\left(n^2-1\right)-T_j^*\right]\left[n\left(n^2-1\right)-T_k^*\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`T_j^* = \sum t_j\left(t_j^2-1\right)`, :math:`t_j` being the number of ties of a particular value of variable :math:`j`, and the summation being over all tied values of variable :math:`j`. .. _g02bn-py2-py-references: **References** Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill """ raise NotImplementedError
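The 'average' rank convention described in (a), where :math:`k` tied observations that would occupy ranks :math:`h+1,\ldots,h+k` all receive :math:`\left(2h+k+1\right)/2`, can be sketched directly. This is an illustrative helper (the name `midranks` is an assumption, not part of the library):

```python
import numpy as np

def midranks(v):
    """Assign ranks 1..n, giving each group of k tied observations the
    'average' rank (2h + k + 1)/2, as in the ranking scheme above."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="stable")   # indices in ascending order of value
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        # extend j over the run of observations tied with v[order[i]]
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        # positions i..j correspond to ranks i+1..j+1; assign their average
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks
```

For instance, `midranks([3, 1, 4, 1])` gives the ranks `[3.0, 1.5, 4.0, 1.5]`: the two tied smallest values share the average of ranks 1 and 2.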
[docs]def coeffs_kspearman_miss_case_overwrite(x, miss, xmiss, itype): r""" ``coeffs_kspearman_miss_case_overwrite`` computes Kendall and/or Spearman nonparametric rank correlation coefficients for a set of data omitting completely any cases with a missing observation for any variable; the data array is overwritten with the ranks of the observations. .. _g02bp-py2-py-doc: For full information please refer to the NAG Library document for g02bp https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bpf.html .. _g02bp-py2-py-parameters: **Parameters** **x** : float, ndarray, shape :math:`\left(n, m\right)`, modified in place `On entry`: :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. `On exit`: :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` contains the rank :math:`y_{{\textit{i}\textit{j}}}` of the observation :math:`x_{{\textit{i}\textit{j}}}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. (For those observations containing missing values, and, therefore, excluded from the calculation, :math:`y_{{\textit{i}\textit{j}}} = 0`, for :math:`\textit{j} = 1,2,\ldots,m`.) **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. 
**xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bpf.html#accuracy>`__). **itype** : int The type of correlation coefficients which are to be calculated. :math:`\mathrm{itype} = -1` Only Kendall's tau coefficients are calculated. :math:`\mathrm{itype} = 0` Both Kendall's tau and Spearman's coefficients are calculated. :math:`\mathrm{itype} = 1` Only Spearman's coefficients are calculated. **Returns** **rr** : float, ndarray, shape :math:`\left(m, m\right)` The requested correlation coefficients. If only Kendall's tau coefficients are requested (:math:`\mathrm{itype} = -1`), :math:`\mathrm{rr}[j-1,k-1]` contains Kendall's tau for the :math:`j`\ th and :math:`k`\ th variables. If only Spearman's coefficients are requested (:math:`\mathrm{itype} = 1`), :math:`\mathrm{rr}[j-1,k-1]` contains Spearman's rank correlation coefficient for the :math:`j`\ th and :math:`k`\ th variables. If both Kendall's tau and Spearman's coefficients are requested (:math:`\mathrm{itype} = 0`), the upper triangle of :math:`\mathrm{rr}` contains the Spearman coefficients and the lower triangle the Kendall coefficients. That is, for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, where :math:`\textit{j}` is less than :math:`\textit{k}`, :math:`\mathrm{rr}[\textit{j}-1,\textit{k}-1]` contains the Spearman rank correlation coefficient, and :math:`\mathrm{rr}[\textit{k}-1,\textit{j}-1]` contains Kendall's tau, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. (Diagonal terms, :math:`\mathrm{rr}[j-1,j-1]`, are unity for all three values of :math:`\mathrm{itype}`.) 
**ncases** : int The number of cases, :math:`n_{\mathrm{c}}`, actually used in the calculations (when cases involving missing values have been eliminated). **incase** : int, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{incase}[\textit{i}-1]` holds the value :math:`1` if the :math:`\textit{i}`\ th case was included in the calculations, and the value :math:`0` if the :math:`\textit{i}`\ th case contained a missing value for at least one variable. That is, :math:`\mathrm{incase}[\textit{i}-1] = w_{\textit{i}}` (see :ref:`Notes <g02bp-py2-py-notes>`), for :math:`\textit{i} = 1,2,\ldots,n`. .. _g02bp-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) On entry, :math:`\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{itype} = -1`, :math:`0` or :math:`1`. (`errno` :math:`5`) After observations with missing values were omitted, fewer than two cases remained. .. _g02bp-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\text{ }\left(n\geq 2\right)\text{ and }j = 1,2,\ldots,m\text{ }\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. 
Let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables for which missing values have been declared; i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bpf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Ranks For a given variable, :math:`j` say, each of the observations :math:`x_{{ij}}` for which :math:`w_i = 1`, for :math:`\textit{i} = 1,2,\ldots,n`, has associated with it an additional number, the 'rank' of the observation, which indicates the magnitude of that observation relative to the magnitudes of the other observations on that same variable for which :math:`w_i = 1`. The smallest of these valid observations for variable :math:`j` is assigned the rank :math:`1`, the second smallest observation for variable :math:`j` the rank :math:`2`, the third smallest the rank :math:`3`, and so on until the largest such observation is given the rank :math:`n_c`, where :math:`n_c = \sum_{{i = 1}}^nw_i`. If a number of cases all have the same value for the given variable, :math:`j`, then they are each given an 'average' rank, e.g., if in attempting to assign the rank :math:`h+1`, :math:`k` observations for which :math:`w_i = 1` were found to have the same value, then instead of giving them the ranks .. math:: h+1,h+2,\ldots,h+k\text{,} all :math:`k` observations would be assigned the rank .. math:: \frac{{2h+k+1}}{2} and the next value in ascending order would be assigned the rank .. math:: h+k+1\text{.} The process is repeated for each of the :math:`m` variables. Let :math:`y_{{ij}}` be the rank assigned to the observation :math:`x_{{ij}}` when the :math:`j`\ th variable is being ranked. For those observations, :math:`i`, for which :math:`w_i = 0`, :math:`y_{{ij}} = 0`, for :math:`j = 1,2,\ldots,m`. 
The actual observations :math:`x_{{ij}}` are replaced by the ranks :math:`y_{{ij}}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. (#) Nonparametric rank correlation coefficients (i) Kendall's tau: .. math:: R_{{jk}} = \frac{{\sum_{{h = 1}}^n\sum_{{i = 1}}^nw_hw_i\mathrm{sign}\left(y_{{hj}}-y_{{ij}}\right)\mathrm{sign}\left(y_{{hk}}-y_{{ik}}\right)}}{{\sqrt{\left[n_c\left(n_c-1\right)-T_j\right]\left[n_c\left(n_c-1\right)-T_k\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} .. rst-class:: nag-rules-none nag-align-left +-----+---------------------------------------------------------+ |where|:math:`n_c = \sum_{{i = 1}}^nw_i` | +-----+---------------------------------------------------------+ |and |:math:`\mathrm{sign}\left(u\right) = 1` if :math:`u > 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = 0` if :math:`u = 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = -1` if :math:`u < 0`| +-----+---------------------------------------------------------+ and :math:`T_j = \sum t_j\left(t_j-1\right)` where :math:`t_j` is the number of ties of a particular value of variable :math:`j`, and the summation is over all tied values of variable :math:`j`. (#) Spearman's: .. math:: R_{{jk}}^* = \frac{{n_c\left(n_c^2-1\right)-6\sum_{{i = 1}}^nw_i\left(y_{{ij}}-y_{{ik}}\right)^2-\frac{1}{2}\left(T_j^*+T_k^*\right)}}{{\sqrt{\left[n_c\left(n_c^2-1\right)-T_j^*\right]\left[n_c\left(n_c^2-1\right)-T_k^*\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`n_c = \sum_{{i = 1}}^nw_i` and :math:`T_j^* = \sum t_j\left(t_j^2-1\right)` where :math:`t_j` is the number of ties of a particular value of variable :math:`j`, and the summation is over all tied values of variable :math:`j`. .. 
_g02bp-py2-py-references: **References** Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill """ raise NotImplementedError
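The casewise treatment amounts to dropping every row that matches a missing-value sentinel (the :math:`w_i` flags) and then applying the tie-corrected Kendall formula to the surviving rows. A direct :math:`O\left(n^2\right)` sketch of that formula, with an illustrative sentinel of `-999.0` (the helper name `kendall_tau` is an assumption, not a NAG interface):

```python
import numpy as np

def kendall_tau(y1, y2):
    """Direct O(n^2) evaluation of the tie-corrected Kendall formula
    above, with T = sum t(t - 1) over groups of tied values."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n = len(y1)
    # numerator: sum over all ordered pairs of sign products
    num = sum(np.sum(np.sign(y1[h] - y1) * np.sign(y2[h] - y2)) for h in range(n))
    def tie_term(y):
        _, t = np.unique(y, return_counts=True)
        return np.sum(t * (t - 1))
    den = np.sqrt((n * (n - 1) - tie_term(y1)) * (n * (n - 1) - tie_term(y2)))
    return num / den

# Casewise deletion (w_i = 0 for rows containing the sentinel), then tau:
x = np.array([[1., 10.], [2., -999.], [3., 30.], [4., 25.]])
keep = ~np.any(x == -999.0, axis=1)
tau = kendall_tau(x[keep, 0], x[keep, 1])   # 1/3 for this data
```

Note the formula is invariant to replacing the raw values by their ranks, since only the signs of pairwise differences enter the numerator.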
[docs]def coeffs_kspearman(x, itype): r""" ``coeffs_kspearman`` computes Kendall and/or Spearman nonparametric rank correlation coefficients for a set of data; the data array is preserved, and the ranks of the observations are not available on exit from the function. .. _g02bq-py2-py-doc: For full information please refer to the NAG Library document for g02bq https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bqf.html .. _g02bq-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to data value :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **itype** : int The type of correlation coefficients which are to be calculated. :math:`\mathrm{itype} = -1` Only Kendall's tau coefficients are calculated. :math:`\mathrm{itype} = 0` Both Kendall's tau and Spearman's coefficients are calculated. :math:`\mathrm{itype} = 1` Only Spearman's coefficients are calculated. **Returns** **rr** : float, ndarray, shape :math:`\left(m, m\right)` The requested correlation coefficients. If only Kendall's tau coefficients are requested (:math:`\mathrm{itype} = -1`), :math:`\mathrm{rr}[j-1,k-1]` contains Kendall's tau for the :math:`j`\ th and :math:`k`\ th variables. If only Spearman's coefficients are requested (:math:`\mathrm{itype} = 1`), :math:`\mathrm{rr}[j-1,k-1]` contains Spearman's rank correlation coefficient for the :math:`j`\ th and :math:`k`\ th variables. If both Kendall's tau and Spearman's coefficients are requested (:math:`\mathrm{itype} = 0`), the upper triangle of :math:`\mathrm{rr}` contains the Spearman coefficients and the lower triangle the Kendall coefficients. 
That is, for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, where :math:`\textit{j}` is less than :math:`\textit{k}`, :math:`\mathrm{rr}[\textit{j}-1,\textit{k}-1]` contains the Spearman rank correlation coefficient, and :math:`\mathrm{rr}[\textit{k}-1,\textit{j}-1]` contains Kendall's tau, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. (Diagonal terms, :math:`\mathrm{rr}[j-1,j-1]`, are unity for all three values of :math:`\mathrm{itype}`.) .. _g02bq-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) On entry, :math:`\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{itype} = -1`, :math:`0` or :math:`1`. .. _g02bq-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right),j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. The observations are first ranked, as follows. For a given variable, :math:`j` say, each of the :math:`n` observations, :math:`x_{{1j}},x_{{2j}},\ldots,x_{{nj}}`, has associated with it an additional number, the 'rank' of the observation, which indicates the magnitude of that observation relative to the magnitudes of the other :math:`n-1` observations on that same variable. 
The smallest observation for variable :math:`j` is assigned the rank :math:`1`, the second smallest observation for variable :math:`j` the rank :math:`2`, the third smallest the rank :math:`3`, and so on until the largest observation for variable :math:`j` is given the rank :math:`n`. If a number of cases all have the same value for the given variable, :math:`j`, then they are each given an 'average' rank -- e.g., if in attempting to assign the rank :math:`h+1`, :math:`k` observations were found to have the same value, then instead of giving them the ranks .. math:: h+1,h+2,\ldots,h+k\text{,} all :math:`k` observations would be assigned the rank .. math:: \frac{{2h+k+1}}{2} and the next value in ascending order would be assigned the rank .. math:: h+k+1\text{.} The process is repeated for each of the :math:`m` variables. Let :math:`y_{{ij}}` be the rank assigned to the observation :math:`x_{{ij}}` when the :math:`j`\ th variable is being ranked. The quantities calculated are: (a) Kendall's tau rank correlation coefficients: .. math:: R_{{jk}} = \frac{{\sum_{{h = 1}}^n\sum_{{i = 1}}^n\mathrm{sign}\left(y_{{hj}}-y_{{ij}}\right)\mathrm{sign}\left(y_{{hk}}-y_{{ik}}\right)}}{{\sqrt{\left[n\left(n-1\right)-T_j\right]\left[n\left(n-1\right)-T_k\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+---------------------------------------------------------+ |where|:math:`\mathrm{sign}\left(u\right) = 1` if :math:`u > 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = 0` if :math:`u = 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = -1` if :math:`u < 0`| +-----+---------------------------------------------------------+ and :math:`T_j = \sum t_j\left(t_j-1\right)`, :math:`t_j` being the number of ties of a particular value of variable :math:`j`, and the summation being over all tied values of variable :math:`j`. (#) Spearman's rank correlation coefficients: .. math:: R_{{jk}}^* = \frac{{n\left(n^2-1\right)-6\sum_{{i = 1}}^n\left(y_{{ij}}-y_{{ik}}\right)^2-\frac{1}{2}\left(T_j^*+T_k^*\right)}}{{\sqrt{\left[n\left(n^2-1\right)-T_j^*\right]\left[n\left(n^2-1\right)-T_k^*\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`T_j^* = \sum t_j\left(t_j^2-1\right)` where :math:`t_j` is the number of ties of a particular value of variable :math:`j`, and the summation is over all tied values of variable :math:`j`. .. _g02bq-py2-py-references: **References** Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill """ raise NotImplementedError
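The ranking-and-ties scheme and both tie-corrected coefficients described in the Notes above can be sketched in plain Python. This is an illustrative re-implementation for a single pair of variables, not the NAG code; the names ``average_ranks``, ``tie_term`` and ``kendall_spearman`` are invented for the sketch:

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied observations their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # 1-based ranks i+1 .. j+1 share the average (2h+k+1)/2 of the Notes
        avg = (i + j + 2) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_term(values, power):
    """Tie correction: sum t(t-1) (power=1, Kendall's T_j)
    or sum t(t**2-1) (power=2, Spearman's T_j*)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    if power == 1:
        return sum(t * (t - 1) for t in counts.values())
    return sum(t * (t * t - 1) for t in counts.values())

def kendall_spearman(xj, xk):
    """Return (tau, rho) for two equal-length samples, tie-corrected
    exactly as in the R_jk and R_jk* formulas of the Notes."""
    n = len(xj)
    yj, yk = average_ranks(xj), average_ranks(xk)
    sign = lambda u: (u > 0) - (u < 0)
    num = sum(sign(yj[h] - yj[i]) * sign(yk[h] - yk[i])
              for h in range(n) for i in range(n))
    tj, tk = tie_term(xj, 1), tie_term(xk, 1)
    tau = num / sqrt((n * (n - 1) - tj) * (n * (n - 1) - tk))
    tjs, tks = tie_term(xj, 2), tie_term(xk, 2)
    d2 = sum((yj[i] - yk[i]) ** 2 for i in range(n))
    rho = (n * (n * n - 1) - 6 * d2 - 0.5 * (tjs + tks)) / sqrt(
        (n * (n * n - 1) - tjs) * (n * (n * n - 1) - tks))
    return tau, rho
```

When there are no ties the two correction terms vanish and both expressions reduce to the familiar untied definitions.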
[docs]def coeffs_kspearman_miss_case(x, miss, xmiss, itype): r""" ``coeffs_kspearman_miss_case`` computes Kendall and/or Spearman nonparametric rank correlation coefficients for a set of data, omitting completely any cases with a missing observation for any variable; the data array is preserved, and the ranks of the observations are not available on exit from the function. .. _g02br-py2-py-doc: For full information please refer to the NAG Library document for g02br https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02brf.html .. _g02br-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[i-1,j-1]` must be set to :math:`x_{{ij}}`, the value of the :math:`i`\ th observation on the :math:`j`\ th variable, where :math:`i = 1,2,\ldots,n` and :math:`j = 1,2,\ldots,m\text{.}` **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02brf.html#accuracy>`__). **itype** : int The type of correlation coefficients which are to be calculated. :math:`\mathrm{itype} = -1` Only Kendall's tau coefficients are calculated. :math:`\mathrm{itype} = 0` Both Kendall's tau and Spearman's coefficients are calculated. :math:`\mathrm{itype} = 1` Only Spearman's coefficients are calculated. 
**Returns** **rr** : float, ndarray, shape :math:`\left(m, m\right)` The requested correlation coefficients. If only Kendall's tau coefficients are requested (:math:`\mathrm{itype} = -1`), :math:`\mathrm{rr}[j-1,k-1]` contains Kendall's tau for the :math:`j`\ th and :math:`k`\ th variables. If only Spearman's coefficients are requested (:math:`\mathrm{itype} = 1`), :math:`\mathrm{rr}[j-1,k-1]` contains Spearman's rank correlation coefficient for the :math:`j`\ th and :math:`k`\ th variables. If both Kendall's tau and Spearman's coefficients are requested (:math:`\mathrm{itype} = 0`), the upper triangle of :math:`\mathrm{rr}` contains the Spearman coefficients and the lower triangle the Kendall coefficients. That is, for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, where :math:`\textit{j}` is less than :math:`\textit{k}`, :math:`\mathrm{rr}[\textit{j}-1,\textit{k}-1]` contains the Spearman rank correlation coefficient, and :math:`\mathrm{rr}[\textit{k}-1,\textit{j}-1]` contains Kendall's tau, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. (Diagonal terms, :math:`\mathrm{rr}[j-1,j-1]`, are unity for all three values of :math:`\mathrm{itype}`.) **ncases** : int The number of cases, :math:`n_{\mathrm{c}}`, actually used in the calculations (when cases involving missing values have been eliminated). **incase** : int, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{incase}[\textit{i}-1]` holds the value :math:`1` if the :math:`\textit{i}`\ th case was included in the calculations, and the value :math:`0` if the :math:`\textit{i}`\ th case contained a missing value for at least one variable. That is, :math:`\mathrm{incase}[\textit{i}-1] = w_{\textit{i}}` (see :ref:`Notes <g02br-py2-py-notes>`), for :math:`\textit{i} = 1,2,\ldots,n`. .. _g02br-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. 
(`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) On entry, :math:`\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{itype} = -1`, :math:`0` or :math:`1`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) After observations with missing values were omitted, fewer than two cases remained. .. _g02br-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right), j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. Let :math:`w_i = 0` if observation :math:`i` contains a missing value for any of those variables for which missing values have been declared, i.e., if :math:`x_{{ij}} = \textit{xm}_j` for any :math:`j` for which an :math:`\textit{xm}_j` has been assigned (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02brf.html#accuracy>`__); and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The observations are first ranked as follows. 
For a given variable, :math:`j` say, each of the observations :math:`x_{{ij}}` for which :math:`w_i = 1`, (:math:`i = 1,2,\ldots,n`) has associated with it an additional number, the 'rank' of the observation, which indicates the magnitude of that observation relative to the magnitudes of the other observations on that same variable for which :math:`w_i = 1`. The smallest of these valid observations for variable :math:`j` is assigned the rank :math:`1`, the second smallest observation for variable :math:`j` the rank :math:`2`, the third smallest the rank :math:`3`, and so on until the largest such observation is given the rank :math:`n_c`, where :math:`n_c = \sum_{{i = 1}}^nw_i`. If a number of cases all have the same value for the given variable, :math:`j`, then they are each given an 'average' rank, e.g., if in attempting to assign the rank :math:`h+1`, :math:`k` observations for which :math:`w_i = 1` were found to have the same value, then instead of giving them the ranks .. math:: h+1,h+2,\ldots,h+k\text{,} all :math:`k` observations would be assigned the rank .. math:: \frac{{2h+k+1}}{2} and the next value in ascending order would be assigned the rank .. math:: h+k+1\text{.} The process is repeated for each of the :math:`m` variables. Let :math:`y_{{ij}}` be the rank assigned to the observation :math:`x_{{ij}}` when the :math:`j`\ th variable is being ranked. For those observations, :math:`i`, for which :math:`w_i = 0`, :math:`y_{{ij}} = 0`, for :math:`j = 1,2,\ldots,m`. The quantities calculated are: (a) Kendall's tau rank correlation coefficients: .. math:: R_{{jk}} = \frac{{\sum_{{h = 1}}^n\sum_{{i = 1}}^nw_hw_i\mathrm{sign}\left(y_{{hj}}-y_{{ij}}\right)\mathrm{sign}\left(y_{{hk}}-y_{{ik}}\right)}}{{\sqrt{\left[n_c\left(n_c-1\right)-T_j\right]\left[n_c\left(n_c-1\right)-T_k\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+---------------------------------------------------------+ |where|:math:`n_c = \sum_{{i = 1}}^nw_i` | +-----+---------------------------------------------------------+ |and |:math:`\mathrm{sign}\left(u\right) = 1` if :math:`u > 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = 0` if :math:`u = 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = -1` if :math:`u < 0`| +-----+---------------------------------------------------------+ and :math:`T_j = \sum t_j\left(t_j-1\right)` where :math:`t_j` is the number of ties of a particular value of variable :math:`j`, and the summation is over all tied values of variable :math:`j`. (#) Spearman's rank correlation coefficients: .. math:: R_{{jk}}^* = \frac{{n_c\left(n_c^2-1\right)-6\sum_{{i = 1}}^nw_i\left(y_{{ij}}-y_{{ik}}\right)^2-\frac{1}{2}\left(T_j^*+T_k^*\right)}}{{\sqrt{\left[n_c\left(n_c^2-1\right)-T_j^*\right]\left[n_c\left(n_c^2-1\right)-T_k^*\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`n_c = \sum_{{i = 1}}^nw_i` and :math:`T_j^* = \sum t_j\left(t_j^2-1\right)` where :math:`t_j` is the number of ties of a particular value of variable :math:`j`, and the summation is over all tied values of variable :math:`j`. .. _g02br-py2-py-references: **References** Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill See Also -------- :meth:`naginterfaces.library.examples.correg.coeffs_kspearman_miss_case_ex.main` """ raise NotImplementedError
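The casewise treatment of missing values described in the Notes above amounts to computing the indicator weights :math:`w_i` and discarding every case with :math:`w_i = 0` before any ranking is done. A minimal sketch, assuming exact equality against the declared missing values (the name ``casewise_filter`` is invented here):

```python
def casewise_filter(x, miss, xmiss):
    """Keep only rows with no declared missing value in any variable.

    Returns (rows_kept, incase), where incase[i] is the w_i indicator
    of the Notes: 1 if case i was usable, 0 if it was omitted.
    """
    incase = []
    for row in x:
        w = 1
        for j, v in enumerate(row):
            if miss[j] and v == xmiss[j]:
                w = 0
                break
        incase.append(w)
    kept = [row for row, w in zip(x, incase) if w]
    return kept, incase
```

The rows in ``kept`` would then be ranked and correlated exactly as in the complete-data routine, with :math:`n_c` equal to ``len(kept)``.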
[docs]def coeffs_kspearman_miss_pair(x, miss, xmiss, itype): r""" ``coeffs_kspearman_miss_pair`` computes Kendall and/or Spearman nonparametric rank correlation coefficients for a set of data omitting cases with missing values from only those calculations involving the variables for which the values are missing; the data array is preserved, and the ranks of the observations are not available on exit from the function. .. _g02bs-py2-py-doc: For full information please refer to the NAG Library document for g02bs https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bsf.html .. _g02bs-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must be set to :math:`x_{{\textit{i}\textit{j}}}`, the value of the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **miss** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{miss}[j-1]` must be set equal to :math:`1` if a missing value, :math:`xm_j`, is to be specified for the :math:`j`\ th variable in the array :math:`\mathrm{x}`, or set equal to :math:`0` otherwise. Values of :math:`\mathrm{miss}` must be given for all :math:`m` variables in the array :math:`\mathrm{x}`. **xmiss** : float, array-like, shape :math:`\left(m\right)` :math:`\mathrm{xmiss}[j-1]` must be set to the missing value, :math:`xm_j`, to be associated with the :math:`j`\ th variable in the array :math:`\mathrm{x}`, for those variables for which missing values are specified by means of the array :math:`\mathrm{miss}` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bsf.html#accuracy>`__). **itype** : int The type of correlation coefficients which are to be calculated. :math:`\mathrm{itype} = -1` Only Kendall's tau coefficients are calculated. 
:math:`\mathrm{itype} = 0` Both Kendall's tau and Spearman's coefficients are calculated. :math:`\mathrm{itype} = 1` Only Spearman's coefficients are calculated. **Returns** **rr** : float, ndarray, shape :math:`\left(m, m\right)` The requested correlation coefficients. If only Kendall's tau coefficients are requested (:math:`\mathrm{itype} = -1`), :math:`\mathrm{rr}[j-1,k-1]` contains Kendall's tau for the :math:`j`\ th and :math:`k`\ th variables. If only Spearman's coefficients are requested (:math:`\mathrm{itype} = 1`), :math:`\mathrm{rr}[j-1,k-1]` contains Spearman's rank correlation coefficient for the :math:`j`\ th and :math:`k`\ th variables. If both Kendall's tau and Spearman's coefficients are requested (:math:`\mathrm{itype} = 0`), the upper triangle of :math:`\mathrm{rr}` contains the Spearman coefficients and the lower triangle the Kendall coefficients. That is, for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, where :math:`\textit{j}` is less than :math:`\textit{k}`, :math:`\mathrm{rr}[\textit{j}-1,\textit{k}-1]` contains the Spearman rank correlation coefficient, and :math:`\mathrm{rr}[\textit{k}-1,\textit{j}-1]` contains Kendall's tau, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. (Diagonal terms, :math:`\mathrm{rr}[j-1,j-1]`, are unity for all three values of :math:`\mathrm{itype}`.) **ncases** : int The minimum number of cases used in the calculation of any of the correlation coefficients (when cases involving missing values have been eliminated). **cnt** : float, ndarray, shape :math:`\left(m, m\right)` The number of cases, :math:`n_{{\textit{j}\textit{k}}}`, actually used in the calculation of the rank correlation coefficient for the :math:`\textit{j}`\ th and :math:`\textit{k}`\ th variables, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. .. 
_g02bs-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`4`) On entry, :math:`\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{itype} = -1`, :math:`0` or :math:`1`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) After observations with missing values were omitted, fewer than two cases remained for at least one pair of variables. (The pairs of variables involved can be determined by examination of the contents of the array :math:`\mathrm{cnt}`.) All rank correlation coefficients based on two or more cases are returned by the function even if :math:`\mathrm{errno}` = 5. .. _g02bs-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` The input data consists of :math:`n` observations for each of :math:`m` variables, given as an array .. math:: \left[x_{{ij}}\right]\text{, }\quad i = 1,2,\ldots,n\left(n\geq 2\right)\text{ and }j = 1,2,\ldots,m\left(m\geq 2\right)\text{,} where :math:`x_{{ij}}` is the :math:`i`\ th observation on the :math:`j`\ th variable. In addition, each of the :math:`m` variables may optionally have associated with it a value which is to be considered as representing a missing observation for that variable; the missing value for the :math:`j`\ th variable is denoted by :math:`\textit{xm}_j`. Missing values need not be specified for all variables. 
Let :math:`w_{{\textit{i}\textit{j}}} = 0` if the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable is a missing value, i.e., if a missing value, :math:`\textit{xm}_{\textit{j}}`, has been declared for the :math:`\textit{j}`\ th variable, and :math:`x_{{\textit{i}\textit{j}}} = \textit{xm}_{\textit{j}}` (see also `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bsf.html#accuracy>`__); and :math:`w_{{\textit{i}\textit{j}}} = 1` otherwise, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. The observations are first ranked, a pair of variables at a time, as follows: For a given pair of variables, :math:`j` and :math:`l` say, each of the observations :math:`x_{{\textit{i}j}}` for which the product :math:`w_{{\textit{i}j}}w_{{\textit{i}l}} = 1`, for :math:`\textit{i} = 1,2,\ldots,n`, has associated with it an additional number, the 'rank' of the observation, which indicates the magnitude of that observation relative to the magnitude of the other observations on variable :math:`j` for which :math:`w_{{ij}}w_{{il}} = 1`. The smallest of these valid observations for variable :math:`j` is assigned the rank :math:`1`, the second smallest valid observation for variable :math:`j` the rank :math:`2`, the third smallest the rank :math:`3`, and so on until the largest such observation is given the rank :math:`n_{{jl}}`, where .. math:: n_{{jl}} = \sum_{{i = 1}}^nw_{{ij}}w_{{il}}\text{.} If a number of cases all have the same value for the variable :math:`j`, then they are each given an 'average' rank, e.g., if in attempting to assign the rank :math:`h+1`, :math:`k` observations for which :math:`w_{{ij}}w_{{il}} = 1` were found to have the same value, then instead of giving them the ranks .. math:: h+1,h+2,\ldots,h+k\text{,} all :math:`k` observations would be assigned the rank .. math:: \frac{{2h+k+1}}{2} and the next value in ascending order would be assigned the rank .. 
math:: h+k+1\text{.} The variable :math:`\textit{l}` is then ranked in a similar way. The process is then repeated for all pairs of variables :math:`\textit{j}` and :math:`\textit{l}`, for :math:`\textit{l} = \textit{j},\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. Let :math:`y_{{\textit{i}\textit{j}\left(\textit{l}\right)}}` be the rank assigned to the observation :math:`x_{{\textit{i}\textit{j}}}` when the :math:`\textit{j}`\ th and :math:`\textit{l}`\ th variables are being ranked, and :math:`y_{{\textit{i}\textit{l}\left(\textit{j}\right)}}` be the rank assigned to the observation :math:`x_{{\textit{i}\textit{l}}}` during the same process, for :math:`\textit{l} = j,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Kendall's tau rank correlation coefficients: .. math:: R_{{jk}} = \frac{{\sum_{{h = 1}}^n\sum_{{i = 1}}^nw_{{hj}}w_{{hk}}w_{{ij}}w_{{ik}}\mathrm{sign}\left(y_{{hj\left(k\right)}}-y_{{ij\left(k\right)}}\right)\mathrm{sign}\left(y_{{hk\left(j\right)}}-y_{{ik\left(j\right)}}\right)}}{{\sqrt{\left[n_{{jk}}\left(n_{{jk}}-1\right)-T_{{j\left(k\right)}}\right]\left[n_{{jk}}\left(n_{{jk}}-1\right)-T_{{k\left(j\right)}}\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+---------------------------------------------------------+ |where|:math:`n_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}` | +-----+---------------------------------------------------------+ |and |:math:`\mathrm{sign}\left(u\right) = 1` if :math:`u > 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = 0` if :math:`u = 0` | +-----+---------------------------------------------------------+ | |:math:`\mathrm{sign}\left(u\right) = -1` if :math:`u < 0`| +-----+---------------------------------------------------------+ and :math:`T_{{j\left(k\right)}} = \sum t_j\left(t_j-1\right)` where :math:`t_j` is the number of ties of a particular value of variable :math:`j` when the :math:`j`\ th and :math:`k`\ th variables are being ranked, and the summation is over all tied values of variable :math:`j`. (#) Spearman's rank correlation coefficients: .. math:: R_{{jk}}^* = \frac{{n_{{jk}}\left(n_{{jk}}^2-1\right)-6\sum_{{i = 1}}^nw_{{ij}}w_{{ik}}{\left(y_{{ij\left(k\right)}}-y_{{ik\left(j\right)}}\right)}^2-\frac{1}{2}\left(T_{{j\left(k\right)}}^*+T_{{k\left(j\right)}}^*\right)}}{{\sqrt{\left[n_{{jk}}\left(n_{{jk}}^2-1\right)-T_{{j\left(k\right)}}^*\right]\left[n_{{jk}}\left(n_{{jk}}^2-1\right)-T_{{k\left(j\right)}}^*\right]}}}\text{, }\quad j,k = 1,2,\ldots,m\text{,} where :math:`n_{{jk}} = \sum_{{i = 1}}^nw_{{ij}}w_{{ik}}` and :math:`T_{{j\left(k\right)}}^* = \sum t_j\left(t_j^2-1\right)`, where :math:`t_j` is the number of ties of a particular value of variable :math:`j` when the :math:`j`\ th and :math:`k`\ th variables are being ranked, and the summation is over all tied values of variable :math:`j`. .. _g02bs-py2-py-references: **References** Siegel, S, 1956, `Non-parametric Statistics for the Behavioral Sciences`, McGraw--Hill """ raise NotImplementedError
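The pairwise treatment differs from the casewise one in that each pair of variables :math:`\left(j, k\right)` keeps its own set of usable cases, of size :math:`n_{{jk}}`, which is why the routine also returns the array :math:`\mathrm{cnt}`. A sketch of just that bookkeeping (the name ``pairwise_counts`` is invented; exact equality against the declared missing values is assumed):

```python
def pairwise_counts(x, miss, xmiss):
    """For every pair of variables (j, l), count the cases usable for
    that pair -- rows where neither value is a declared missing value.
    This is the n_jl = sum_i w_ij * w_il quantity of the Notes."""
    m = len(x[0])
    # w_ij indicators: 0 where a declared missing value occurs
    present = [[0 if (miss[j] and v == xmiss[j]) else 1
                for j, v in enumerate(row)] for row in x]
    cnt = [[0] * m for _ in range(m)]
    for j in range(m):
        for l in range(m):
            cnt[j][l] = sum(p[j] * p[l] for p in present)
    return cnt
```

Each pair's ranks and coefficients would then be computed from its own subset of ``cnt[j][l]`` cases, so different entries of ``rr`` can be based on different numbers of observations.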
[docs]def ssqmat_update(wt, x, sw, xbar, c, mean='M'): r""" ``ssqmat_update`` updates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean, for a new observation. The data may be weighted. .. _g02bt-py2-py-doc: For full information please refer to the NAG Library document for g02bt https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02btf.html .. _g02bt-py2-py-parameters: **Parameters** **wt** : float The weight to use for the current observation, :math:`w_i`. For unweighted means and cross-products set :math:`\mathrm{wt} = 1.0`. The use of a suitable negative value of :math:`\mathrm{wt}`, e.g., :math:`{-w_i}` will have the effect of deleting the observation. **x** : float, array-like, shape :math:`\left(m\times \textit{incx}\right)` :math:`\mathrm{x}[\left(j-1\right)\times \textit{incx}]` must contain the value of the :math:`j`\ th variable for the current observation, :math:`j = 1,2,\ldots,m`. **sw** : float The sum of weights for the previous observations, :math:`W_{{i-1}}`. :math:`\mathrm{sw} = 0.0` The update procedure is initialized. :math:`\mathrm{sw}+\mathrm{wt} = 0.0` All elements of :math:`\mathrm{xbar}` and :math:`\mathrm{c}` are set to zero. **xbar** : float, array-like, shape :math:`\left(m\right)` If :math:`\mathrm{sw} = 0.0`, :math:`\mathrm{xbar}` is initialized, otherwise :math:`\mathrm{xbar}[\textit{j}-1]` must contain the weighted mean of the :math:`\textit{j}`\ th variable for the previous :math:`\left(\textit{i}-1\right)` observations, :math:`\bar{x}_{\textit{j}}\left(\textit{i}-1\right)`, for :math:`\textit{j} = 1,2,\ldots,m`. **c** : float, array-like, shape :math:`\left(\left(m\times m+m\right)/2\right)` If :math:`\mathrm{sw}\neq 0.0`, :math:`\mathrm{c}` must contain the upper triangular part of the matrix of weighted sums of squares and cross-products or weighted sums of squares and cross-products of deviations about the mean. 
It is stored in packed form by column, i.e., the cross-product between the :math:`j`\ th and :math:`k`\ th variable, :math:`k\geq j`, is stored in :math:`\mathrm{c}[k\times \left(k-1\right)/2+j-1]`. **mean** : str, length 1, optional Indicates whether ``ssqmat_update`` is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean. :math:`\mathrm{mean} = \texttt{'M'}` The sums of squares and cross-products of deviations about the mean are calculated. :math:`\mathrm{mean} = \texttt{'Z'}` The sums of squares and cross-products are calculated. **Returns** **sw** : float Contains the updated sum of weights, :math:`W_i`. **xbar** : float, ndarray, shape :math:`\left(m\right)` :math:`\mathrm{xbar}[\textit{j}-1]` contains the weighted mean of the :math:`\textit{j}`\ th variable, :math:`\bar{x}_{\textit{j}}\left(\textit{i}\right)`, for :math:`\textit{j} = 1,2,\ldots,m`. **c** : float, ndarray, shape :math:`\left(\left(m\times m+m\right)/2\right)` The updated sums of squares and cross-products, stored as on input. .. _g02bt-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{incx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{incx} \geq 1`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{sw} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{sw}\geq 0.0`. (`errno` :math:`3`) On entry, :math:`{\left(\mathrm{sw}+\mathrm{wt}\right)} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`{\left(\mathrm{sw}+\mathrm{wt}\right)} \geq 0.0`. (`errno` :math:`4`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. .. _g02bt-py2-py-notes: **Notes** ``ssqmat_update`` is an adaptation of West's WV2 algorithm; see West (1979). 
This function updates the weighted means of variables and weighted sums of squares and cross-products or weighted sums of squares and cross-products of deviations about the mean for observations on :math:`m` variables :math:`X_j`, for :math:`j = 1,2,\ldots,m`. For the first :math:`i-1` observations let the mean of the :math:`j`\ th variable be :math:`\bar{x}_j\left(i-1\right)`, the cross-product about the mean for the :math:`j`\ th and :math:`k`\ th variables be :math:`c_{{jk}}\left(i-1\right)` and the sum of weights be :math:`W_{{i-1}}`. These are updated by the :math:`i`\ th observation, :math:`x_{{ij}}`, for :math:`\textit{j} = 1,2,\ldots,m`, with weight :math:`w_i` as follows: .. math:: W_i = W_{{i-1}}+w_i\text{, }\quad \bar{x}_j\left(i\right) = \bar{x}_j\left(i-1\right)+\frac{w_i}{W_i}\left(x_j-\bar{x}_j\left(i-1\right)\right)\text{, }\quad j = 1,2,\ldots,m and .. math:: c_{{jk}}\left(i\right) = c_{{jk}}\left(i-1\right)+\frac{w_i}{W_i}\left(x_j-\bar{x}_j\left(i-1\right)\right)\left(x_k-\bar{x}_k\left(i-1\right)\right)W_{{i-1}}\text{, }\quad j = 1,2,\ldots,m\text{;}k = j,j+1,\ldots,m\text{.} The algorithm is initialized by taking :math:`\bar{x}_j\left(1\right) = x_{{1j}}`, the first observation, and :math:`c_{{jk}}\left(1\right) = 0.0`. For the unweighted case :math:`w_i = 1` and :math:`W_i = i` for all :math:`i`. .. _g02bt-py2-py-references: **References** Chan, T F, Golub, G H and Leveque, R J, 1982, `Updating Formulae and a Pairwise Algorithm for Computing Sample Variances`, Compstat, Physica-Verlag West, D H D, 1979, `Updating mean and variance estimates: An improved method`, Comm. ACM (22), 532--535 """ raise NotImplementedError
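The update formulae in the Notes above translate almost line for line into code. A sketch of one West-style update step, using the 0-based packed index ``k*(k+1)//2 + j``, which corresponds to the documented 1-based :math:`k\times \left(k-1\right)/2+j-1` (the name ``ssq_update`` is invented; only the :math:`\mathrm{mean} = \texttt{'M'}` case is shown, and the documented reset when :math:`\mathrm{sw}+\mathrm{wt} = 0.0` is omitted):

```python
def ssq_update(wt, x, sw, xbar, c):
    """One update step for weighted means and packed cross-products
    about the mean. c is the upper triangle packed by column:
    c[k*(k+1)//2 + j] holds the (j, k) cross-product for 0-based j <= k.
    """
    m = len(x)
    if sw == 0.0:
        # Initialization: xbar(1) = first observation, c(1) = 0
        return wt, list(x), [0.0] * (m * (m + 1) // 2)
    sw_new = sw + wt                                  # W_i = W_{i-1} + w_i
    d = [x[j] - xbar[j] for j in range(m)]            # x_j - xbar_j(i-1)
    xbar_new = [xbar[j] + (wt / sw_new) * d[j] for j in range(m)]
    c_new = list(c)
    for k in range(m):
        for j in range(k + 1):
            # c_jk(i) = c_jk(i-1) + (w_i / W_i) d_j d_k W_{i-1}
            c_new[k * (k + 1) // 2 + j] += (wt / sw_new) * d[j] * d[k] * sw
    return sw_new, xbar_new, c_new
```

Feeding an observation back in with weight :math:`-w_i` reverses its contribution, which is the deletion mechanism mentioned under :math:`\mathrm{wt}`.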
[docs]def ssqmat(x, mean='M', wt=None): r""" ``ssqmat`` calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted. .. _g02bu-py2-py-doc: For full information please refer to the NAG Library document for g02bu https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02buf.html .. _g02bu-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **mean** : str, length 1, optional Indicates whether ``ssqmat`` is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean. :math:`\mathrm{mean} = \texttt{'M'}` The sums of squares and cross-products of deviations about the mean are calculated. :math:`\mathrm{mean} = \texttt{'Z'}` The sums of squares and cross-products are calculated. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional The optional weights of each observation. If weights are not provided then :math:`\mathrm{wt}` must be set to **None**, otherwise :math:`\mathrm{wt}[i-1]` must contain the weight for the :math:`i`\ th observation. **Returns** **sw** : float The sum of weights. If :math:`\mathrm{wt}\text{ is }\mathbf{None}`, :math:`\mathrm{sw}` contains the number of observations, :math:`n`. **wmean** : float, ndarray, shape :math:`\left(m\right)` The sample means. :math:`\mathrm{wmean}[j-1]` contains the mean for the :math:`j`\ th variable. **c** : float, ndarray, shape :math:`\left(\left(m\times m+m\right)/2\right)` The cross-products. 
If :math:`\mathrm{mean} = \texttt{'M'}`, :math:`\mathrm{c}` contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean. If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{c}` contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products. These are stored packed by columns, i.e., the cross-product between the :math:`j`\ th and :math:`k`\ th variable, :math:`k\geq j`, is stored in :math:`\mathrm{c}[k\times \left(k-1\right)/2+j-1]`. .. _g02bu-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`3`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`4`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. .. _g02bu-py2-py-notes: **Notes** ``ssqmat`` is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products or sums of squares and cross-products of deviations from the (weighted) mean for a sample of :math:`n` observations on :math:`m` variables :math:`X_j`, for :math:`\textit{j} = 1,2,\ldots,m`. The algorithm makes a single pass through the data. 
For the first :math:`i-1` observations let the mean of the :math:`j`\ th variable be :math:`\bar{x}_j\left(i-1\right)`, the cross-product about the mean for the :math:`j`\ th and :math:`k`\ th variables be :math:`c_{{jk}}\left(i-1\right)` and the sum of weights be :math:`W_{{i-1}}`. These are updated by the :math:`i`\ th observation, :math:`x_{{ij}}`, for :math:`\textit{j} = 1,2,\ldots,m`, with weight :math:`w_i` as follows: .. math:: \begin{array}{c} W_i = W_{{i-1}} + w_i \\ \bar{x}_j \left(i\right) = \bar{x}_j \left(i-1\right) + \frac{w_i}{W_i} \left(x_j-\bar{x}_j\left(i-1\right)\right) \text{, }\quad j = 1,2,\ldots,m \end{array} and .. math:: c_{{jk}}\left(i\right) = c_{{jk}}\left(i-1\right)+\frac{w_i}{W_i}\left(x_j-\bar{x}_j\left(i-1\right)\right)\left(x_k-\bar{x}_k\left(i-1\right)\right)W_{{i-1}}\text{, }\quad j = 1,2,\ldots,m\text{ and }k = j,j+1,\ldots,m\text{.} The algorithm is initialized by taking :math:`\bar{x}_j\left(1\right) = x_{{1j}}`, the first observation, and :math:`c_{{jk}}\left(1\right) = 0.0`. For the unweighted case :math:`w_i = 1` and :math:`W_i = i` for all :math:`i`. Note that only the upper triangle of the matrix is calculated and returned packed by column. .. _g02bu-py2-py-references: **References** Chan, T F, Golub, G H and Leveque, R J, 1982, `Updating Formulae and a Pairwise Algorithm for Computing Sample Variances`, Compstat, Physica-Verlag West, D H D, 1979, `Updating mean and variance estimates: An improved method`, Comm. ACM (22), 532--535 See Also -------- :meth:`naginterfaces.library.examples.correg.lars_param_ex.main` :meth:`naginterfaces.library.examples.correg.linregm_fit_stepwise_ex.main` """ raise NotImplementedError
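Applying the recurrences once per observation gives the single-pass computation that ``ssqmat`` performs. A self-contained sketch (the name ``ssqmat_sketch`` is invented; only the about-the-mean case is shown, and starting the means at zero makes the first step reproduce the documented initialization):

```python
def ssqmat_sketch(x, weights=None):
    """Single-pass (optionally weighted) means and packed upper-triangular
    sums of squares and cross-products about the mean, via West's
    recurrences. Returns (sw, xbar, c) with c packed by column:
    c[k*(k+1)//2 + j] holds the (j, k) entry for 0-based j <= k."""
    n, m = len(x), len(x[0])
    w = weights if weights is not None else [1.0] * n
    sw = 0.0
    xbar = [0.0] * m
    c = [0.0] * (m * (m + 1) // 2)
    for i in range(n):
        sw_new = sw + w[i]
        d = [x[i][j] - xbar[j] for j in range(m)]
        # update cross-products first, using the previous mean and W_{i-1}
        for k in range(m):
            for j in range(k + 1):
                c[k * (k + 1) // 2 + j] += (w[i] / sw_new) * d[j] * d[k] * sw
        for j in range(m):
            xbar[j] += (w[i] / sw_new) * d[j]
        sw = sw_new
    return sw, xbar, c
```

With unit weights, ``sw`` equals the number of observations :math:`n`, matching the documented behaviour when :math:`\mathrm{wt}` is **None**.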
[docs]def ssqmat_to_corrmat(m, r): r""" ``ssqmat_to_corrmat`` calculates a matrix of Pearson product-moment correlation coefficients from sums of squares and cross-products of deviations about the mean. .. _g02bw-py2-py-doc: For full information please refer to the NAG Library document for g02bw https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bwf.html .. _g02bw-py2-py-parameters: **Parameters** **m** : int :math:`m`, the number of variables. **r** : float, array-like, shape :math:`\left(\left(\mathrm{m}\times \mathrm{m}+\mathrm{m}\right)/2\right)` Contains the upper triangular part of the sums of squares and cross-products matrix of deviations from the mean. These are stored packed by column, i.e., the cross-product between variable :math:`j` and :math:`k`, :math:`k\geq j`, is stored in :math:`\mathrm{r}[\left(k\times \left(k-1\right)/2+j\right)-1]`. **Returns** **r** : float, ndarray, shape :math:`\left(\left(\mathrm{m}\times \mathrm{m}+\mathrm{m}\right)/2\right)` The Pearson product-moment correlation coefficients. These are stored packed by column corresponding to the input cross-products. .. _g02bw-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{m} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{m} \geq 1`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`2`) A variable has a zero variance. All correlations involving the variable with zero variance will be returned as zero. .. _g02bw-py2-py-notes: **Notes** ``ssqmat_to_corrmat`` calculates a matrix of Pearson product-moment correlation coefficients from sums of squares and cross-products about the mean for observations on :math:`m` variables which can be computed by a single call to :meth:`ssqmat` or a series of calls to :meth:`ssqmat_update`. The sums of squares and cross-products are stored in an array packed by column and are overwritten by the correlation coefficients. 
Let :math:`c_{{jk}}` be the cross-product of deviations from the mean, for :math:`\textit{k} = j,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`; then the product-moment correlation coefficient, :math:`r_{{jk}}`, is given by .. math:: r_{{jk}} = \frac{c_{{jk}}}{{\sqrt{c_{{jj}}c_{{kk}}}}}\text{.} """ raise NotImplementedError
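The conversion :math:`r_{jk} = c_{jk}/\sqrt{c_{jj}c_{kk}}` on the packed column-major storage can be sketched as follows. The function name ``packed_to_corr`` is hypothetical; like the documented routine, the sketch returns zero for any correlation involving a zero-variance variable.

```python
import math

def packed_to_corr(c, m):
    """Convert an upper-triangular, column-packed matrix of sums of
    squares and cross-products about the mean into Pearson correlation
    coefficients (illustrative sketch, not the NAG code)."""
    def idx(j, k):
        # 1-based j <= k: element (j, k) lives at k*(k-1)/2 + j - 1
        return k * (k - 1) // 2 + j - 1
    r = list(c)
    for k in range(1, m + 1):
        for j in range(1, k + 1):
            denom = math.sqrt(c[idx(j, j)] * c[idx(k, k)])
            r[idx(j, k)] = c[idx(j, k)] / denom if denom > 0.0 else 0.0
    return r
```

The packed layout means a matrix of :math:`m` variables needs only :math:`(m^2+m)/2` elements, at the cost of the explicit index arithmetic shown in ``idx``.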
[docs]def corrmat(x, nonzwt='W', wt=None): r""" ``corrmat`` calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used. .. _g02bx-py2-py-doc: For full information please refer to the NAG Library document for g02bx https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bxf.html .. _g02bx-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **nonzwt** : str, length 1, optional The variance calculation uses a divisor which is either the number of weights or the number of nonzero weights. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional :math:`w`, the optional frequency weighting for each observation, with :math:`\mathrm{wt}[i-1] = w_i`. Usually :math:`w_i` will be an integral value corresponding to the number of observations associated with the :math:`i`\ th data value, or zero if the :math:`i`\ th data value is to be ignored. If :math:`\mathrm{wt}\text{ is }\mathbf{None}`, :math:`w_i` is set to :math:`1` for all :math:`i`. **Returns** **xbar** : float, ndarray, shape :math:`\left(m\right)` The sample means. :math:`\mathrm{xbar}[j-1]` contains the mean of the :math:`j`\ th variable. **std** : float, ndarray, shape :math:`\left(m\right)` The standard deviations. :math:`\mathrm{std}[j-1]` contains the standard deviation for the :math:`j`\ th variable. **v** : float, ndarray, shape :math:`\left(m, m\right)` The variance-covariance matrix. :math:`\mathrm{v}[\textit{j}-1,\textit{k}-1]` contains the covariance between variables :math:`\textit{j}` and :math:`\textit{k}`, for :math:`\textit{k} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`. 
**r** : float, ndarray, shape :math:`\left(m, m\right)` The matrix of Pearson product-moment correlation coefficients. :math:`\mathrm{r}[j-1,k-1]` contains the correlation coefficient between variables :math:`j` and :math:`k`. .. _g02bx-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. (`errno` :math:`2`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'U'}`, :math:`\texttt{'V'}` or :math:`\texttt{'W'}` (`errno` :math:`3`) On entry, at least one value of :math:`\mathrm{wt}` is negative. Constraint: :math:`\mathrm{wt}[i-1] \geq 0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`4`) On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` observations have nonzero weight. Constraint: at least two observations must have a nonzero weight. (`errno` :math:`4`) On entry, Sum of the weights is :math:`\langle\mathit{\boldsymbol{value}}\rangle`. Constraint: Sum of the weights must be greater than :math:`1`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) A variable has a zero variance. In this case :math:`\mathrm{v}` and :math:`\mathrm{std}` are returned as calculated but :math:`\mathrm{r}` will contain zero for any correlation involving a variable with zero variance. .. _g02bx-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` For :math:`n` observations on :math:`m` variables the one-pass algorithm of West (1979) as implemented in :meth:`ssqmat` is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for :math:`p` selected variables. 
Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by: (a) The means .. math:: \bar{ x }_j = \frac{{\sum_{{i = 1}}^nw_ix_{{ij}}}}{{\sum_{{i = 1}}^nw_i}}\quad \text{ }\quad j = 1,\ldots,p (b) The variance-covariance matrix .. math:: C_{{jk}} = \frac{{\sum_{{i = 1}}^nw_i\left(x_{{ij}}-\bar{ x }_j\right)\left(x_{{ik}}-\bar{ x }_k\right)}}{{\sum_{{i = 1}}^nw_i-1}}\quad \text{ }\quad j,k = 1,\ldots,p (c) The standard deviations .. math:: s_j = \sqrt{C_{{jj}}}\quad \text{ }\quad j = 1,\ldots,p (d) The Pearson product-moment correlation coefficients .. math:: R_{{jk}} = \frac{C_{{jk}}}{\sqrt{{C_{{jj}}C_{{kk}}}}}\quad \text{ }\quad j,k = 1,\ldots,p where :math:`x_{{ij}}` is the value of the :math:`i`\ th observation on the :math:`j`\ th variable and :math:`w_i` is the weight for the :math:`i`\ th observation, which will be :math:`1` in the unweighted case. Note that the denominator for the variance-covariance matrix is :math:`\sum_{{i = 1}}^nw_i-1`, so the weights should be scaled so that the sum of weights reflects the true sample size. .. _g02bx-py2-py-references: **References** Chan, T F, Golub, G H and Leveque, R J, 1982, `Updating Formulae and a Pairwise Algorithm for Computing Sample Variances`, Compstat, Physica-Verlag West, D H D, 1979, `Updating mean and variance estimates: An improved method`, Comm. ACM (22), 532--535 """ raise NotImplementedError
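Definitions (a)-(d) above translate directly into a short NumPy sketch. This is a naive two-pass illustration of the formulae, not the one-pass West algorithm the documented routine uses, and the function name ``weighted_summary`` is invented here.

```python
import numpy as np

def weighted_summary(x, wt=None):
    """Weighted means, variance-covariance matrix, standard deviations
    and Pearson correlations per definitions (a)-(d) above.
    Illustrative two-pass sketch, not the NAG implementation."""
    x = np.asarray(x, dtype=float)
    w = np.ones(x.shape[0]) if wt is None else np.asarray(wt, dtype=float)
    sw = w.sum()
    xbar = (w[:, None] * x).sum(axis=0) / sw          # (a) means
    d = x - xbar
    v = (w[:, None] * d).T @ d / (sw - 1.0)           # (b) divisor sum(w) - 1
    std = np.sqrt(np.diag(v))                         # (c)
    r = v / np.outer(std, std)                        # (d)
    return xbar, std, v, r
```

Note the divisor :math:`\sum w_i - 1` in the covariance step: with unit weights this is the familiar :math:`n-1`, which is why the documentation advises scaling the weights to reflect the true sample size.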
[docs]def corrmat_partial(ny, nx, isz, r): r""" ``corrmat_partial`` computes a partial correlation/variance-covariance matrix from a correlation or variance-covariance matrix computed by :meth:`corrmat`. .. _g02by-py2-py-doc: For full information please refer to the NAG Library document for g02by https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02byf.html .. _g02by-py2-py-parameters: **Parameters** **ny** : int The number of :math:`Y` variables, :math:`n_y`, for which partial correlation coefficients are to be computed. **nx** : int The number of :math:`X` variables, :math:`n_x`, which are to be considered as fixed. **isz** : int, array-like, shape :math:`\left(m\right)` Indicates which variables belong to sets :math:`X` and :math:`Y`. :math:`\mathrm{isz}[i-1] < 0` The :math:`\textit{i}`\ th variable is a :math:`Y` variable, for :math:`\textit{i} = 1,2,\ldots,m`. :math:`\mathrm{isz}[i-1] > 0` The :math:`i`\ th variable is an :math:`X` variable. :math:`\mathrm{isz}[i-1] = 0` The :math:`i`\ th variable is not included in the computations. **r** : float, array-like, shape :math:`\left(m, m\right)` The variance-covariance or correlation matrix for the :math:`\textit{m}` variables as given by :meth:`corrmat`. Only the upper triangle need be given. **Note:** the matrix must be a full rank variance-covariance or correlation matrix and so be positive definite. This condition is not directly checked by the function. **Returns** **p** : float, ndarray, shape :math:`\left(\mathrm{ny}, \mathrm{ny}\right)` The strict upper triangle of :math:`\mathrm{p}` contains the strict upper triangular part of the :math:`n_y\times n_y` partial correlation matrix. The lower triangle contains the lower triangle of the :math:`n_y\times n_y` partial variance-covariance matrix if the matrix given in :math:`\mathrm{r}` is a variance-covariance matrix. 
If the matrix given in :math:`\mathrm{r}` is a correlation matrix then the variance-covariance matrix is for standardized variables. .. _g02by-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{ny} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{nx} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`{\mathrm{ny}+\mathrm{nx}} \leq m`. (`errno` :math:`1`) On entry, :math:`\mathrm{nx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{nx} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{ny} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ny} > 1`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m > 2`. (`errno` :math:`2`) On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` values of :math:`\mathrm{isz}` are greater than zero and :math:`\mathrm{nx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: exactly :math:`\mathrm{nx}` values of :math:`\mathrm{isz}` must be greater than zero. (`errno` :math:`2`) On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` values of :math:`\mathrm{isz}` are less than zero and :math:`\mathrm{ny} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: exactly :math:`\mathrm{ny}` values of :math:`\mathrm{isz}` must be less than zero. (`errno` :math:`3`) On entry, the square root of the correlation matrix of the independent variables is singular. Try removing some of the :math:`X` variables by setting the appropriate element of :math:`\mathrm{isz} = 0`. (`errno` :math:`3`) On entry, the correlation matrix of the independent variables is singular. Try removing some of the :math:`X` variables by setting the appropriate element of :math:`\mathrm{isz} = 0`. (`errno` :math:`4`) On entry, an element of the partial correlation matrix is greater than :math:`1`. 
Constraint: :math:`\mathrm{r}` must be positive definite. (`errno` :math:`4`) On entry, a diagonal element of the partial covariance matrix is zero. Constraint: :math:`\mathrm{r}` must be positive definite. (`errno` :math:`4`) On entry, a diagonal element of the partial covariance matrix is zero and an element of the partial correlation matrix is greater than :math:`1`. Constraint: :math:`\mathrm{r}` must be positive definite. .. _g02by-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` Partial correlation can be used to explore the association between pairs of random variables in the presence of other variables. For three variables, :math:`y_1`, :math:`y_2` and :math:`x_3`, the partial correlation coefficient between :math:`y_1` and :math:`y_2` given :math:`x_3` is computed as: .. math:: \frac{{r_{12}-r_{13}r_{23}}}{{\sqrt{\left(1-r_{13}^2\right)\left(1-r_{23}^2\right)}}}\text{,} where :math:`r_{{ij}}` is the product-moment correlation coefficient between variables with subscripts :math:`i` and :math:`j`. The partial correlation coefficient is a measure of the linear association between :math:`y_1` and :math:`y_2` having eliminated the effect due to both :math:`y_1` and :math:`y_2` being linearly associated with :math:`x_3`. That is, it is a measure of association between :math:`y_1` and :math:`y_2` conditional upon fixed values of :math:`x_3`. Like the full correlation coefficients the partial correlation coefficient takes a value in the range (:math:`-1,1`) with the value :math:`0` indicating no association. In general, let a set of variables be partitioned into two groups :math:`Y` and :math:`X` with :math:`n_y` variables in :math:`Y` and :math:`n_x` variables in :math:`X` and let the variance-covariance matrix of all :math:`n_y+n_x` variables be partitioned into, .. 
math:: \left[\begin{array}{ll}\Sigma_{{xx}}&\Sigma_{{xy}}\\\Sigma_{{yx}}&\Sigma_{{yy}}\end{array}\right]\text{.} The variance-covariance of :math:`Y` conditional on fixed values of the :math:`X` variables is given by: .. math:: \Sigma_{{y | x}} = \Sigma_{{yy}}-\Sigma_{{yx}}\Sigma_{{xx}}^{-1}\Sigma_{{xy}}\text{.} The partial correlation matrix is then computed by standardizing :math:`\Sigma_{{y | x}}`, .. math:: \mathrm{diag}\left({\left(\Sigma_{{y | x}}\right)}^{{-\frac{1}{2}}}\right)\Sigma_{{y | x}}\mathrm{diag}\left({\left(\Sigma_{{y | x}}\right)}^{{-\frac{1}{2}}}\right)\text{.} To test the hypothesis that a partial correlation is zero under the assumption that the data has an approximately Normal distribution a test similar to the test for the full correlation coefficient can be used. If :math:`r` is the computed partial correlation coefficient then the appropriate :math:`t` statistic is .. math:: r\sqrt{\frac{{n-n_x-2}}{{1-r^2}}}\text{,} which has approximately a Student's :math:`t`-distribution with :math:`n-n_x-2` degrees of freedom, where :math:`n` is the number of observations from which the full correlation coefficients were computed. .. _g02by-py2-py-references: **References** Krzanowski, W J, 1990, `Principles of Multivariate Analysis`, Oxford University Press Morrison, D F, 1967, `Multivariate Statistical Methods`, McGraw--Hill Osborn, J F, 1979, `Statistical Exercises in Medical Research`, Blackwell Snedecor, G W and Cochran, W G, 1967, `Statistical Methods`, Iowa State University Press """ raise NotImplementedError
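The conditional-covariance and standardization steps above can be sketched directly with NumPy. The function name ``partial_corr`` is invented for this sketch; the test case checks the two-given-one result against the closed-form three-variable formula from the start of the Notes.

```python
import numpy as np

def partial_corr(sigma, ny):
    """Partial correlation matrix of the first ny variables given the
    remaining ones, from a full-rank variance-covariance (or
    correlation) matrix.  Illustrative sketch, not the NAG code."""
    s = np.asarray(sigma, dtype=float)
    syy, syx = s[:ny, :ny], s[:ny, ny:]
    sxx, sxy = s[ny:, ny:], s[ny:, :ny]
    # Sigma_{y|x} = Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy
    s_cond = syy - syx @ np.linalg.solve(sxx, sxy)
    # standardize: diag(S)^{-1/2} S diag(S)^{-1/2}
    d = 1.0 / np.sqrt(np.diag(s_cond))
    return d[:, None] * s_cond * d[None, :]
```

Using ``np.linalg.solve`` rather than forming :math:`\Sigma_{xx}^{-1}` explicitly is the usual numerically preferable choice.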
[docs]def ssqmat_combine(xsw, xmean, xc, ysw, ymean, yc, mean='M'): r""" ``ssqmat_combine`` combines two sets of sample means and sums of squares and cross-products matrices. It is designed to be used in conjunction with :meth:`ssqmat` to allow large datasets to be summarised. .. _g02bz-py2-py-doc: For full information please refer to the NAG Library document for g02bz https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02bzf.html .. _g02bz-py2-py-parameters: **Parameters** **xsw** : float :math:`W_x`, the sum of weights, from the first set of data, :math:`X`. If the data is unweighted then this will be the number of observations in the first dataset. **xmean** : float, array-like, shape :math:`\left(m\right)` :math:`\mu_x`, the sample means for the first set of data, :math:`X`. **xc** : float, array-like, shape :math:`\left(\left(m\times m+m\right)/2\right)` :math:`C_x`, the sums of squares and cross-products matrix for the first set of data, :math:`X`, as returned by :meth:`ssqmat`. :meth:`ssqmat` returns this matrix packed by columns, i.e., the cross-product between the :math:`j`\ th and :math:`k`\ th variable, :math:`k\geq j`, is stored in :math:`\mathrm{xc}[k\times \left(k-1\right)/2+j-1]`. No check is made that :math:`C_x` is a valid cross-products matrix. **ysw** : float :math:`W_y`, the sum of weights, from the second set of data, :math:`Y`. If the data is unweighted then this will be the number of observations in the second dataset. **ymean** : float, array-like, shape :math:`\left(m\right)` :math:`\mu_y`, the sample means for the second set of data, :math:`Y`. **yc** : float, array-like, shape :math:`\left(\left(m\times m+m\right)/2\right)` :math:`C_y`, the sums of squares and cross-products matrix for the second set of data, :math:`Y`, as returned by :meth:`ssqmat`. 
:meth:`ssqmat` returns this matrix packed by columns, i.e., the cross-product between the :math:`j`\ th and :math:`k`\ th variable, :math:`k\geq j`, is stored in :math:`\mathrm{yc}[k\times \left(k-1\right)/2+j-1]`. No check is made that :math:`C_y` is a valid cross-products matrix. **mean** : str, length 1, optional Indicates whether the matrices supplied in :math:`\mathrm{xc}` and :math:`\mathrm{yc}` are sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean. :math:`\mathrm{mean} = \text{‘M'}` Sums of squares and cross-products of deviations about the mean have been supplied. :math:`\mathrm{mean} = \text{‘Z'}` Sums of squares and cross-products have been supplied. **Returns** **xsw** : float :math:`W_z`, the sum of weights, from the combined dataset, :math:`Z`. If both datasets are unweighted then this will be the number of observations in the combined dataset. **xmean** : float, ndarray, shape :math:`\left(m\right)` :math:`\mu_z`, the sample means for the combined data, :math:`Z`. **xc** : float, ndarray, shape :math:`\left(\left(m\times m+m\right)/2\right)` :math:`C_z`, the sums of squares and cross-products matrix for the combined dataset, :math:`Z`. This matrix is again stored packed by columns. .. _g02bz-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \text{‘M'} \text{ or } \text{‘Z'}`. (`errno` :math:`21`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`31`) On entry, :math:`\mathrm{xsw} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{xsw}\geq 0.0`. (`errno` :math:`61`) On entry, :math:`\mathrm{ysw} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ysw}\geq 0.0`. .. 
_g02bz-py2-py-notes: **Notes** Let :math:`X` and :math:`Y` denote two sets of data, each with :math:`m` variables and :math:`n_x` and :math:`n_y` observations respectively. Let :math:`\mu_x` denote the (optionally weighted) vector of :math:`m` means for the first dataset and :math:`C_x` denote either the sums of squares and cross-products of deviations from :math:`\mu_x` .. math:: C_x = \left(X-e\mu_x^\mathrm{T}\right)^\mathrm{T}D_x\left(X-e\mu_x^\mathrm{T}\right) or the sums of squares and cross-products, in which case .. math:: C_x = X^\mathrm{T}D_xX where :math:`e` is a vector of :math:`n_x` ones and :math:`D_x` is a diagonal matrix of (optional) weights and :math:`W_x` is defined as the sum of the diagonal elements of :math:`D_x`. Similarly, let :math:`\mu_y`, :math:`C_y` and :math:`W_y` denote the same quantities for the second dataset. Given :math:`\mu_x,\mu_y,C_x,C_y,W_x` and :math:`W_y`, ``ssqmat_combine`` calculates :math:`\mu_z`, :math:`C_z` and :math:`W_z` as if a dataset :math:`Z`, with :math:`m` variables and :math:`n_x+n_y` observations were supplied to :meth:`ssqmat`, with :math:`Z` constructed as .. math:: Z = \left(\begin{array}{c}X\\Y\end{array}\right)\text{.} ``ssqmat_combine`` has been designed to combine the results from two calls to :meth:`ssqmat` allowing large datasets, or cases where all the data is not available at the same time, to be summarised. .. _g02bz-py2-py-references: **References** Bennett, J, Pebay, P, Roe, D and Thompson, D, 2009, `Numerically stable, single-pass, parallel statistics algorithms`, Proceedings of IEEE International Conference on Cluster Computing """ raise NotImplementedError
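The pairwise combination can be sketched with the standard update formula from the cited references (Chan et al. (1982); Bennett et al. (2009)); the documentation itself does not spell this formula out, so treat the sketch as an assumption. For readability the invented ``combine_ssq`` below works with full (unpacked) cross-product matrices about the mean, unlike the packed storage of the documented routine.

```python
import numpy as np

def combine_ssq(xsw, xmean, xc, ysw, ymean, yc):
    """Combine two (sum-of-weights, mean, cross-products-about-the-mean)
    summaries into one, via the standard pairwise update.
    Illustrative sketch with full matrices, not the NAG code."""
    xmean = np.asarray(xmean, dtype=float)
    ymean = np.asarray(ymean, dtype=float)
    zsw = xsw + ysw
    # weighted average of the two mean vectors
    zmean = (xsw * xmean + ysw * ymean) / zsw
    # cross-products pick up a rank-one correction for the mean shift
    d = xmean - ymean
    zc = (np.asarray(xc, dtype=float) + np.asarray(yc, dtype=float)
          + (xsw * ysw / zsw) * np.outer(d, d))
    return zsw, zmean, zc
```

Combining summaries this way gives exactly the result of a single pass over the concatenated dataset :math:`Z`, which is what makes the split-summarise-combine workflow safe for large or distributed data.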
[docs]def linregs_const(x, y): r""" ``linregs_const`` performs a simple linear regression with dependent variable :math:`y` and independent variable :math:`x`. .. _g02ca-py2-py-doc: For full information please refer to the NAG Library document for g02ca https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02caf.html .. _g02ca-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{x}[\textit{i}-1]` must contain :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{y}[\textit{i}-1]` must contain :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **Returns** **result** : float, ndarray, shape :math:`\left(20\right)` The following information: .. rst-class:: nag-rules-none nag-align-left +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[0]` |:math:`\bar{x}`, the mean value of the independent variable, :math:`x`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[1]` |:math:`\bar{y}`, the mean value of the dependent variable, :math:`y`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[2]` |:math:`s_x` the standard deviation of the independent variable, :math:`x`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[3]` |:math:`s_y` the standard deviation of the dependent variable, :math:`y`; | 
+---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[4]` |:math:`r`, the Pearson product-moment correlation between the independent variable :math:`x` and the dependent variable :math:`y`;| +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[5]` |:math:`b`, the regression coefficient; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[6]` |:math:`a`, the regression constant; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[7]` |:math:`se\left(b\right)`, the standard error of the regression coefficient; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[8]` |:math:`se\left(a\right)`, the standard error of the regression constant; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[9]` |:math:`t\left(b\right)`, the :math:`t` value for the regression coefficient; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[10]`|:math:`t\left(a\right)`, the :math:`t` value for the regression constant; | 
+---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[11]`|:math:`\mathrm{SSR}`, the sum of squares attributable to the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[12]`|:math:`\mathrm{DFR}`, the degrees of freedom attributable to the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[13]`|:math:`\mathrm{MSR}`, the mean square attributable to the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[14]`|:math:`F`, the :math:`F` value for the analysis of variance; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[15]`|:math:`\mathrm{SSD}`, the sum of squares of deviations about the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[16]`|:math:`\mathrm{DFD}`, the degrees of freedom of deviations about the regression | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[17]`|:math:`\mathrm{MSD}`, the mean square of deviations about the regression; | 
+---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[18]`|:math:`\mathrm{SST}`, the total sum of squares; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[19]`|DFT, the total degrees of freedom. | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ .. _g02ca-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 2`. (`errno` :math:`2`) On entry, all :math:`\textit{n}` values of at least one of :math:`\mathrm{x}` and :math:`\mathrm{y}` are identical. .. _g02ca-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``linregs_const`` fits a straight line of the form .. math:: y = a+bx to the data points .. math:: \left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right)\text{,} such that .. math:: y_i = a+bx_i+e_i\text{, }\quad i = 1,2,\ldots,n\left(n > 2\right)\text{.} The function calculates the regression coefficient, :math:`b`, the regression constant, :math:`a` (and various other statistical quantities) by minimizing .. math:: \sum_{{i = 1}}^ne_i^2\text{.} The input data consist of the :math:`n` pairs of observations .. math:: \left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right) on the independent variable :math:`x` and the dependent variable :math:`y`. The quantities calculated are: (a) Means: .. 
math:: \bar{x} = \frac{1}{n}\sum_{{i = 1}}^nx_i\text{; }\quad \bar{y} = \frac{1}{n}\sum_{{i = 1}}^ny_i\text{.} (#) Standard deviations: .. math:: s_x = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2}\text{; }\quad s_y = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(y_i-\bar{y}\right)^2}\text{.} (#) Pearson product-moment correlation coefficient: .. math:: r = \frac{{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}}{{\sqrt{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2\sum_{{i = 1}}^n\left(y_i-\bar{y}\right)^2}}}\text{.} (#) The regression coefficient, :math:`b`, and the regression constant, :math:`a`: .. math:: b = \frac{{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}}{{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2}}\text{;}a = \bar{y}-b\bar{x}\text{.} (#) The sum of squares attributable to the regression, :math:`\mathrm{SSR}`, the sum of squares of deviations about the regression, :math:`\mathrm{SSD}`, and the total sum of squares, :math:`\mathrm{SST}`: .. math:: \mathrm{SST} = \sum_{{i = 1}}^n\left(y_i-\bar{y}\right)^2\text{;}\mathrm{SSD} = \sum_{{i = 1}}^n\left(y_i-a-bx_i\right)^2\text{;}\mathrm{SSR} = \mathrm{SST}-\mathrm{SSD}\text{.} (#) The degrees of freedom attributable to the regression, :math:`\mathrm{DFR}`, the degrees of freedom of deviations about the regression, :math:`\mathrm{DFD}`, and the total degrees of freedom, :math:`\mathrm{DFT}`: .. math:: \mathrm{DFT} = n-1\text{; }\mathrm{DFD} = n-2\text{; }\mathrm{DFR} = 1\text{.} (#) The mean square attributable to the regression, :math:`\mathrm{MSR}`, and the mean square of deviations about the regression, :math:`\mathrm{MSD}`: .. math:: \mathrm{MSR} = \mathrm{SSR}/\mathrm{DFR}\text{;}\mathrm{MSD} = \mathrm{SSD}/\mathrm{DFD}\text{.} (#) The :math:`F` value for the analysis of variance: .. 
math:: F = \mathrm{MSR}/\mathrm{MSD}\text{.} (#) The standard error of the regression coefficient, :math:`se\left(b\right)`, and the standard error of the regression constant, :math:`se\left(a\right)`: .. math:: se\left(b\right) = \sqrt{\frac{\mathrm{MSD}}{{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2}}}\text{; }\quad se\left(a\right) = \sqrt{\mathrm{MSD}\left(\frac{1}{n}+\frac{{\bar{x}^2}}{{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2}}\right)}\text{.} (#) The :math:`t` value for the regression coefficient, :math:`t\left(b\right)`, and the :math:`t` value for the regression constant, :math:`t\left(a\right)`: .. math:: t\left(b\right) = \frac{b}{{se\left(b\right)}}\text{; }\quad t\left(a\right) = \frac{a}{{se\left(a\right)}}\text{.} .. _g02ca-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley """ raise NotImplementedError
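The chain of quantities (a)-(j) above reduces to a few lines of plain Python. The function name ``simple_regression`` is invented for this sketch, which returns only the headline quantities rather than the full 20-element ``result`` array of the documented routine.

```python
def simple_regression(x, y):
    """Core quantities of the simple linear regression y = a + b*x + e:
    slope b, intercept a, the sums of squares SSR/SSD/SST and the F
    statistic.  Illustrative sketch, not the NAG implementation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # regression coefficient
    a = ybar - b * xbar           # regression constant
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssd = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ssr = sst - ssd               # SSR = SST - SSD
    f = (ssr / 1.0) / (ssd / (n - 2))   # F = MSR / MSD, DFR = 1, DFD = n - 2
    return a, b, ssr, ssd, sst, f
```

Note the requirement :math:`n > 2` from the error section: with only two points ``ssd`` is zero and the :math:`F` and standard-error quantities are undefined.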
[docs]def linregs_noconst(x, y): r""" ``linregs_noconst`` performs a simple linear regression with no constant, with dependent variable :math:`y` and independent variable :math:`x`. .. _g02cb-py2-py-doc: For full information please refer to the NAG Library document for g02cb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cbf.html .. _g02cb-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{x}[\textit{i}-1]` must contain :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{y}[\textit{i}-1]` must contain :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **Returns** **result** : float, ndarray, shape :math:`\left(20\right)` The following information: .. rst-class:: nag-rules-none nag-align-left +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[0]` |:math:`\bar{x}`, the mean value of the independent variable, :math:`x`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[1]` |:math:`\bar{y}`, the mean value of the dependent variable, :math:`y`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[2]` |:math:`s_x`, the standard deviation of the independent variable, :math:`x`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[3]` |:math:`s_y`, the standard deviation of the dependent variable, :math:`y`; | 
+---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[4]` |:math:`r`, the Pearson product-moment correlation between the independent variable :math:`x` and the dependent variable :math:`y`;| +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[5]` |:math:`b`, the regression coefficient; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[6]` |the value :math:`0.0`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[7]` |:math:`se\left(b\right)`, the standard error of the regression coefficient; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[8]` |the value :math:`0.0`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[9]` |:math:`t\left(b\right)`, the :math:`t` value for the regression coefficient; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[10]`|the value :math:`0.0`; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[11]`|:math:`SSR`, the sum of squares attributable to the regression; | 
+---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[12]`|:math:`DFR`, the degrees of freedom attributable to the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[13]`|:math:`MSR`, the mean square attributable to the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[14]`|:math:`F`, the :math:`F` value for the analysis of variance; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[15]`|:math:`SSD`, the sum of squares of deviations about the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[16]`|:math:`DFD`, the degrees of freedom of deviations about the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[17]`|:math:`MSD`, the mean square of deviations about the regression; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[18]`|:math:`SST`, the total sum of squares; | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[19]`|:math:`DFT`, the total degrees 
of freedom. | +---------------------------+----------------------------------------------------------------------------------------------------------------------------------+ .. _g02cb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, all :math:`\textit{n}` values of at least one of :math:`\mathrm{x}` and :math:`\mathrm{y}` are identical. .. _g02cb-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``linregs_noconst`` fits a straight line of the form .. math:: y = bx to the data points .. math:: \left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right)\text{,} such that .. math:: y_i = bx_i+e_i\text{, }\quad i = 1,2,\ldots,n\quad \left(n\geq 2\right)\text{.} The function calculates the regression coefficient, :math:`b`, and various other statistical quantities by minimizing .. math:: \sum_{{i = 1}}^ne_i^2\text{.} The input data consists of the :math:`n` pairs of observations :math:`\left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right)` on the independent variable :math:`x` and the dependent variable :math:`y`. The quantities calculated are: (a) Means: .. math:: \bar{x} = \frac{1}{n}\sum_{{i = 1}}^nx_i\text{; }\quad \bar{y} = \frac{1}{n}\sum_{{i = 1}}^ny_i\text{.} (#) Standard deviations: .. math:: s_x = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2}\text{; }\quad s_y = \sqrt{\frac{1}{{n-1}}\sum_{{i = 1}}^n\left(y_i-\bar{y}\right)^2}\text{.} (#) Pearson product-moment correlation coefficient: .. math:: r = \frac{{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}}{{\sqrt{\sum_{{i = 1}}^n\left(x_i-\bar{x}\right)^2\sum_{{i = 1}}^n\left(y_i-\bar{y}\right)^2}}}\text{.} (#) The regression coefficient, :math:`b`: ..
math:: b = \frac{{\sum_{{i = 1}}^nx_iy_i}}{{\sum_{{i = 1}}^nx_i^2}}\text{.} (#) The sum of squares attributable to the regression, :math:`SSR`, the sum of squares of deviations about the regression, :math:`SSD`, and the total sum of squares, :math:`SST`: .. math:: SST = \sum_{{i = 1}}^ny_i^2\text{; }\quad SSD = \sum_{{i = 1}}^n\left(y_i-bx_i\right)^2\text{, }\quad SSR = SST-SSD\text{.} (#) The degrees of freedom attributable to the regression, :math:`DFR`, the degrees of freedom of deviations about the regression, :math:`DFD`, and the total degrees of freedom, :math:`DFT`: .. math:: DFT = n\text{; }\quad DFD = n-1\text{, }\quad DFR = 1\text{.} (#) The mean square attributable to the regression, :math:`MSR`, and the mean square of deviations about the regression, :math:`MSD\text{.}` .. math:: MSR = SSR/DFR\text{; }\quad MSD = SSD/DFD\text{.} (#) The :math:`F` value for the analysis of variance: .. math:: F = MSR/MSD\text{.} (#) The standard error of the regression coefficient: .. math:: se\left(b\right) = \sqrt{\frac{{MSD}}{{\sum_{{i = 1}}^nx_i^2}}}\text{.} (#) The :math:`t` value for the regression coefficient: .. math:: t\left(b\right) = \frac{b}{{se\left(b\right)}}\text{.} .. _g02cb-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley """ raise NotImplementedError
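For the no-constant model, every sum above is taken about zero rather than about the mean. The following pure-Python sketch checks those formulas numerically on an invented data set (illustrative only, not the NAG implementation):

```python
import math

# Invented data for y = b*x + e (no constant term).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.0, 8.1]
n = len(x)

sxx = sum(xi * xi for xi in x)                    # sum of x_i^2 about zero
b = sum(xi * yi for xi, yi in zip(x, y)) / sxx    # regression coefficient

sst = sum(yi * yi for yi in y)                          # total SS about zero
ssd = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))   # SS of deviations
ssr = sst - ssd

dfd = n - 1                    # only one parameter estimated, so DFD = n - 1
msd = ssd / dfd
se_b = math.sqrt(msd / sxx)    # standard error of b
t_b = b / se_b                 # t value for b
```

Note the contrast with the model that includes a constant: there `DFD = n - 2` and the sums are about the means.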
[docs]def linregs_const_miss(x, y, xmiss, ymiss): r""" ``linregs_const_miss`` performs a simple linear regression with dependent variable :math:`y` and independent variable :math:`x`, omitting cases involving missing values. .. _g02cc-py2-py-doc: For full information please refer to the NAG Library document for g02cc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ccf.html .. _g02cc-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{x}[\textit{i}-1]` must contain :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{y}[\textit{i}-1]` must contain :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **xmiss** : float The value :math:`xm` which is to be taken as the missing value for the variable :math:`x`. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ccf.html#accuracy>`__. **ymiss** : float The value :math:`ym` which is to be taken as the missing value for the variable :math:`y`. See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ccf.html#accuracy>`__. **Returns** **result** : float, ndarray, shape :math:`\left(21\right)` The following information: .. 
rst-class:: nag-rules-none nag-align-left +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[0]` |:math:`\bar{x}`, the mean value of the independent variable, :math:`x`; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[1]` |:math:`\bar{y}`, the mean value of the dependent variable, :math:`y`; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[2]` |:math:`s_x`, the standard deviation of the independent variable, :math:`x`; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[3]` |:math:`s_y`, the standard deviation of the dependent variable, :math:`y`; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[4]` |:math:`r`, the Pearson product-moment correlation between the independent variable :math:`x` and the dependent variable :math:`y`| +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[5]` |:math:`b`, the regression coefficient; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[6]` |:math:`a`, the regression constant; | 
+---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[7]` |:math:`se\left(b\right)`, the standard error of the regression coefficient; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[8]` |:math:`se\left(a\right)`, the standard error of the regression constant; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[9]` |:math:`t\left(b\right)`, the :math:`t` value for the regression coefficient; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[10]`|:math:`t\left(a\right)`, the :math:`t` value for the regression constant; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[11]`|:math:`SSR`, the sum of squares attributable to the regression; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[12]`|:math:`DFR`, the degrees of freedom attributable to the regression; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[13]`|:math:`MSR`, the mean square attributable to the regression; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ 
|:math:`\mathrm{result}[14]`|:math:`F`, the :math:`F` value for the analysis of variance; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[15]`|:math:`SSD`, the sum of squares of deviations about the regression; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[16]`|:math:`DFD`, the degrees of freedom of deviations about the regression; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[17]`|:math:`MSD`, the mean square of deviations about the regression; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[18]`|:math:`SST`, the total sum of squares; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[19]`|:math:`DFT`, the total degrees of freedom; | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[20]`|:math:`n_c`, the number of observations used in the calculations. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------+ .. _g02cc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 2`. 
(`errno` :math:`2`) After observations with missing values were omitted, two or fewer cases remained. (`errno` :math:`3`) After observations with missing values were omitted, all remaining values of at least one of :math:`\mathrm{x}` and :math:`\mathrm{y}` were identical. .. _g02cc-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` ``linregs_const_miss`` fits a straight line of the form .. math:: y = a+bx to those of the data points .. math:: \left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right) that do not include missing values, such that .. math:: y_i = a+bx_i+e_i for those :math:`\left(x_i, y_i\right)`, :math:`i = 1,2,\ldots,n\quad \text{ }\quad \left(n > 2\right)` which do not include missing values. The function eliminates all pairs of observations :math:`\left(x_i, y_i\right)` which contain a missing value for either :math:`x` or :math:`y`, and then calculates the regression coefficient, :math:`b`, the regression constant, :math:`a`, and various other statistical quantities, by minimizing the sum of the :math:`e_i^2` over those cases remaining in the calculations. The input data consists of the :math:`n` pairs of observations :math:`\left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right)` on the independent variable :math:`x` and the dependent variable :math:`y`. In addition two values, :math:`\textit{xm}` and :math:`\textit{ym}`, are given which are considered to represent missing observations for :math:`x` and :math:`y` respectively. (See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ccf.html#accuracy>`__). Let :math:`w_i = 0` if the :math:`i`\ th observation of either :math:`x` or :math:`y` is missing, i.e., if :math:`x_i = \textit{xm}` and/or :math:`y_i = \textit{ym}`; and :math:`w_i = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. 
math:: \bar{x} = \frac{{\sum_{{i = 1}}^nw_ix_i}}{{\sum_{{i = 1}}^nw_i}}\text{; }\quad \bar{y} = \frac{{\sum_{{i = 1}}^nw_iy_i}}{{\sum_{{i = 1}}^nw_i}}\text{.} (#) Standard deviations: .. math:: s_x = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{; }\quad s_y = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(y_i-\bar{y}\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{.} (#) Pearson product-moment correlation coefficient: .. math:: r = \frac{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}}{{\sqrt{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2\sum_{{i = 1}}^nw_i\left(y_i-\bar{y}\right)^2}}}\text{.} (#) The regression coefficient, :math:`b`, and the regression constant, :math:`a`: .. math:: b = \frac{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}}{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2}}\text{, }\quad a = \bar{y}-b\bar{x}\text{.} (#) The sum of squares attributable to the regression, :math:`SSR`, the sum of squares of deviations about the regression, :math:`SSD`, and the total sum of squares, :math:`SST`: .. math:: SST = \sum_{{i = 1}}^nw_i\left(y_i-\bar{y}\right)^2\text{; }\quad SSD = \sum_{{i = 1}}^nw_i\left(y_i-a-bx_i\right)^2\text{; }\quad SSR = SST-SSD\text{.} (#) The degrees of freedom attributable to the regression, :math:`DFR`, the degrees of freedom of deviations about the regression, :math:`DFD`, and the total degrees of freedom, :math:`DFT`: .. math:: DFT = \sum_{{i = 1}}^nw_i-1\text{; }\quad DFD = \sum_{{i = 1}}^nw_i-2\text{; }\quad DFR = 1\text{.} (#) The mean square attributable to the regression, :math:`MSR`, and the mean square of deviations about the regression, :math:`MSD`: .. math:: MSR = SSR/DFR\text{; }\quad MSD = SSD/DFD\text{.} (#) The :math:`F` value for the analysis of variance: .. 
math:: F = MSR/MSD\text{.} (#) The standard error of the regression coefficient, :math:`se\left(b\right)`, and the standard error of the regression constant, :math:`se\left(a\right)`: .. math:: se\left(b\right) = \sqrt{\frac{{MSD}}{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2}}}\text{; }\quad se\left(a\right) = \sqrt{MSD\left(\frac{1}{{\sum_{{i = 1}}^nw_i}}+\frac{{\bar{x}^2}}{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2}}\right)}\text{.} (#) The :math:`t` value for the regression coefficient, :math:`t\left(b\right)`, and the :math:`t` value for the regression constant, :math:`t\left(a\right)`: .. math:: t\left(b\right) = \frac{b}{{se\left(b\right)}}\text{; }\quad t\left(a\right) = \frac{a}{{se\left(a\right)}}\text{.} (#) The number of observations used in the calculations: .. math:: n_c = \sum_{{i = 1}}^nw_i\text{.} .. _g02cc-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley """ raise NotImplementedError
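The casewise treatment of missing values above can be sketched directly with the weights :math:`w_i`. This illustrative pure-Python fragment (not the NAG implementation) uses hypothetical sentinel values and an exact equality test for "missing"; the library's Accuracy section describes a tolerance-based comparison instead:

```python
# Pairs where x equals xmiss or y equals ymiss get weight w_i = 0
# and are excluded from all sums (casewise deletion).
xmiss, ymiss = -999.0, -999.0          # hypothetical sentinel values
x = [1.0, 2.0, -999.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, -999.0, 10.1]

w = [0 if (xi == xmiss or yi == ymiss) else 1 for xi, yi in zip(x, y)]
nc = sum(w)                            # n_c, number of cases actually used

xbar = sum(wi * xi for wi, xi in zip(w, x)) / nc
ybar = sum(wi * yi for wi, yi in zip(w, y)) / nc
sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))

b = sxy / sxx                          # regression coefficient
a = ybar - b * xbar                    # regression constant
```

The remaining statistics (standard errors, :math:`t` and :math:`F` values) then follow the same formulas as the no-missing-values case, with :math:`n` replaced by :math:`n_c`.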
[docs]def linregs_noconst_miss(x, y, xmiss, ymiss): r""" ``linregs_noconst_miss`` performs a simple linear regression with no constant, with dependent variable :math:`y` and independent variable :math:`x`, omitting cases involving missing values. .. _g02cd-py2-py-doc: For full information please refer to the NAG Library document for g02cd https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cdf.html .. _g02cd-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{x}[\textit{i}-1]` must contain :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{y}[\textit{i}-1]` must contain :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **xmiss** : float The value :math:`xm`, which is to be taken as the missing value for the variable :math:`x` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cdf.html#accuracy>`__). **ymiss** : float The value :math:`ym`, which is to be taken as the missing value for the variable :math:`y` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cdf.html#accuracy>`__). **Returns** **result** : float, ndarray, shape :math:`\left(21\right)` The following information: .. 
rst-class:: nag-rules-none nag-align-left +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[0]` |:math:`\bar{x}`, the mean value of the independent variable, :math:`x`; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[1]` |:math:`\bar{y}`, the mean value of the dependent variable, :math:`y`; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[2]` |:math:`s_x`, the standard deviation of the independent variable, :math:`x`; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[3]` |:math:`s_y`, the standard deviation of the dependent variable, :math:`y`; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[4]` |:math:`r`, the Pearson product-moment correlation between the independent variable :math:`x` and the dependent variable, :math:`y`;| +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[5]` |:math:`b`, the regression coefficient; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[6]` |the value :math:`0.0`; | 
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[7]` |:math:`se\left(b\right)`, the standard error of the regression coefficient; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[8]` |the value :math:`0.0`; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[9]` |:math:`t\left(b\right)`, the :math:`t` value for the regression coefficient; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[10]`|the value :math:`0.0`; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[11]`|:math:`SSR`, the sum of squares attributable to the regression; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[12]`|:math:`DFR`, the degrees of freedom attributable to the regression; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[13]`|:math:`MSR`, the mean square attributable to the regression; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[14]`|:math:`F`, the :math:`F` value for the analysis of variance; | 
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[15]`|:math:`SSD`, the sum of squares of deviations about the regression; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[16]`|:math:`DFD`, the degrees of freedom of deviations about the regression; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[17]`|:math:`MSD`, the mean square of deviations about the regression; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[18]`|:math:`SST`, the total sum of squares | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[19]`|:math:`DFT`, the total degrees of freedom; | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[20]`|:math:`n_c`, the number of observations used in the calculations. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ .. _g02cd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) After observations with missing values were omitted, fewer than two cases remained. 
(`errno` :math:`3`) After observations with missing values were omitted, all remaining values of at least one of :math:`\mathrm{x}` and :math:`\mathrm{y}` were identical. .. _g02cd-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` ``linregs_noconst_miss`` fits a straight line of the form .. math:: y = bx to those of the data points .. math:: \left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right) that do not include missing values, such that .. math:: y_i = bx_i+e_i for those :math:`\left(x_i, y_i\right)`, for :math:`i = 1,2,\ldots,n\quad \text{ }\quad \left(n\geq 2\right)` which do not include missing values. The function eliminates all pairs of observations :math:`\left(x_i, y_i\right)` which contain a missing value for either :math:`x` or :math:`y`, and then calculates the regression coefficient, :math:`b`, and various other statistical quantities by minimizing the sum of the :math:`e_i^2` over those cases remaining in the calculations. The input data consists of the :math:`n` pairs of observations :math:`\left(x_1, y_1\right),\left(x_2, y_2\right),\ldots,\left(x_n, y_n\right)` on the independent variable :math:`x` and the dependent variable :math:`y`. In addition two values, :math:`\textit{xm}` and :math:`\textit{ym}`, are given which are considered to represent missing observations for :math:`x` and :math:`y` respectively. (See `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cdf.html#accuracy>`__). Let :math:`w_{\textit{i}} = 0`, if the :math:`\textit{i}`\ th observation of either :math:`x` or :math:`y` is missing, i.e., if :math:`x_{\textit{i}} = \textit{xm}` and/or :math:`y_{\textit{i}} = \textit{ym}`; and :math:`w_{\textit{i}} = 1` otherwise, for :math:`\textit{i} = 1,2,\ldots,n`. The quantities calculated are: (a) Means: .. 
math:: \bar{x} = \frac{{\sum_{{i = 1}}^nw_ix_i}}{{\sum_{{i = 1}}^nw_i}}\text{; }\quad \bar{y} = \frac{{\sum_{{i = 1}}^nw_iy_i}}{{\sum_{{i = 1}}^nw_i}}\text{.} (#) Standard deviations: .. math:: s_x = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{; }\quad s_y = \sqrt{\frac{{\sum_{{i = 1}}^nw_i\left(y_i-\bar{y}\right)^2}}{{\sum_{{i = 1}}^nw_i-1}}}\text{.} (#) Pearson product-moment correlation coefficient: .. math:: r = \frac{{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}}{{\sqrt{\sum_{{i = 1}}^nw_i\left(x_i-\bar{x}\right)^2\sum_{{i = 1}}^nw_i\left(y_i-\bar{y}\right)^2}}}\text{.} (#) The regression coefficient, :math:`b`: .. math:: b = \frac{{\sum_{{i = 1}}^nw_ix_iy_i}}{{\sum_{{i = 1}}^nw_ix_i^2}}\text{.} (#) The sum of squares attributable to the regression, :math:`SSR`, the sum of squares of deviations about the regression, :math:`SSD`, and the total sum of squares, :math:`SST`: .. math:: SST = \sum_{{i = 1}}^nw_iy_i^2\text{; }\quad SSD = \sum_{{i = 1}}^nw_i\left(y_i-bx_i\right)^2\text{; }\quad SSR = SST-SSD\text{.} (#) The degrees of freedom attributable to the regression, :math:`DFR`, the degrees of freedom of deviations about the regression, :math:`DFD`, and the total degrees of freedom, :math:`DFT`: .. math:: DFT = \sum_{{i = 1}}^nw_i\text{; }\quad DFD = \sum_{{i = 1}}^nw_i-1\text{; }\quad DFR = 1\text{.} (#) The mean square attributable to the regression, :math:`MSR`, and the mean square of deviations about the regression, :math:`MSD`: .. math:: MSR = SSR/DFR\text{; }\quad MSD = SSD/DFD\text{.} (#) The :math:`F` value for the analysis of variance: .. math:: F = MSR/MSD\text{.} (#) The standard error of the regression coefficient: .. math:: se\left(b\right) = \sqrt{\frac{{MSD}}{{\sum_{{i = 1}}^nw_ix_i^2}}}\text{.} (#) The :math:`t` value for the regression coefficient: .. math:: t\left(b\right) = \frac{b}{{se\left(b\right)}}\text{.} (#) The number of observations used in the calculations: ..
math:: n_c = \sum_{{i = 1}}^nw_i\text{.} .. _g02cd-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley """ raise NotImplementedError
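Combining the no-constant sums about zero with the weights :math:`w_i` gives the quantities above. A minimal illustrative sketch (not the NAG implementation, hypothetical sentinel values, exact equality standing in for the tolerance test described under Accuracy):

```python
# Casewise deletion, then a no-constant least-squares fit.
xmiss, ymiss = -1.0, -99.0             # hypothetical sentinel values
x = [1.0, 2.0, 3.0, -1.0]
y = [2.0, 4.0, 6.1, 8.0]

w = [0 if (xi == xmiss or yi == ymiss) else 1 for xi, yi in zip(x, y)]
nc = sum(w)                            # n_c, number of cases used

# b = (sum w*x*y) / (sum w*x^2), sums taken about zero.
b = (sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
     / sum(wi * xi * xi for wi, xi in zip(w, x)))

sst = sum(wi * yi * yi for wi, yi in zip(w, y))            # total SS
ssd = sum(wi * (yi - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
dfd = nc - 1                           # DFD = (sum of w_i) - 1
```

As in the documented formulas, the degrees of freedom are driven by :math:`n_c = \sum w_i`, not by the original :math:`n`.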
[docs]def linregm_service_select(xbar, std, ssp, r, korder): r""" ``linregm_service_select`` takes selected elements from two vectors (typically vectors of means and standard deviations) to form two smaller vectors, and selected rows and columns from two matrices (typically either matrices of sums of squares and cross-products of deviations from means and Pearson product-moment correlation coefficients, or matrices of sums of squares and cross-products about zero and correlation-like coefficients) to form two smaller matrices, allowing reordering of elements in the process. .. _g02ce-py2-py-doc: For full information please refer to the NAG Library document for g02ce https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cef.html .. _g02ce-py2-py-parameters: **Parameters** **xbar** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{xbar}[\textit{i}-1]` must be set to :math:`\bar{x}_{\textit{i}}`, the mean of variable :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,n`. **std** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{std}[\textit{i}-1]` must be set to :math:`s_{\textit{i}}`, the standard deviation of variable :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,n`. **ssp** : float, array-like, shape :math:`\left(n, n\right)` :math:`\mathrm{ssp}[\textit{i}-1,\textit{j}-1]` must be set to the sum of cross-products of deviations from means :math:`S_{{\textit{i}\textit{j}}}` (or about zero, :math:`\tilde{S}_{{\textit{i}\textit{j}}}`) for variables :math:`\textit{i}` and :math:`\textit{j}`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,n`. 
**r** : float, array-like, shape :math:`\left(n, n\right)` :math:`\mathrm{r}[\textit{i}-1,\textit{j}-1]` must be set to the Pearson product-moment correlation coefficient :math:`R_{{\textit{i}\textit{j}}}` (or the correlation-like coefficient, :math:`\tilde{R}_{{\textit{i}\textit{j}}}`) for variables :math:`\textit{i}` and :math:`\textit{j}`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,n`. **korder** : int, array-like, shape :math:`\left(m\right)` :math:`\mathrm{korder}[\textit{i}-1]` must be set to the number of the original variable which is to be the :math:`\textit{i}`\ th variable in the output vectors and matrices, for :math:`\textit{i} = 1,2,\ldots,m`. **Returns** **xbar2** : float, ndarray, shape :math:`\left(m\right)` The mean of variable :math:`i`, :math:`\mathrm{xbar}[i-1]`, where :math:`i = \mathrm{korder}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,m`. (The array :math:`\mathrm{xbar2}` must differ from :math:`\mathrm{xbar}` and :math:`\mathrm{std}`.) **std2** : float, ndarray, shape :math:`\left(m\right)` The standard deviation of variable :math:`i`, :math:`\mathrm{std}[i-1]`, where :math:`i = \mathrm{korder}[\textit{k}-1]`, for :math:`\textit{k} = 1,2,\ldots,m`. (The array :math:`\mathrm{std2}` must differ from both :math:`\mathrm{xbar}` and :math:`\mathrm{std}`.) **ssp2** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{ssp2}[\textit{k}-1,\textit{l}-1]` contains the value of :math:`\mathrm{ssp}[i-1,j-1]`, where :math:`i = \mathrm{korder}[\textit{k}-1]` and :math:`j = \mathrm{korder}[\textit{l}-1]`, for :math:`\textit{l} = 1,2,\ldots,m`, for :math:`\textit{k} = 1,2,\ldots,m`. (The array :math:`\mathrm{ssp2}` must differ from both :math:`\mathrm{ssp}` and :math:`\mathrm{r}`.) That is to say: on exit, :math:`\mathrm{ssp2}[k-1,l-1]` contains the sum of cross-products of deviations from means :math:`S_{{ij}}` (or about zero, :math:`\tilde{S}_{{ij}}`). 
**r2** : float, ndarray, shape :math:`\left(m, m\right)` :math:`\mathrm{r2}[\textit{k}-1,\textit{l}-1]` contains the value of :math:`\mathrm{r}[i-1,j-1]`, where :math:`i = \mathrm{korder}[\textit{k}-1]` and :math:`j = \mathrm{korder}[\textit{l}-1]`, for :math:`\textit{l} = 1,2,\ldots,m`, for :math:`\textit{k} = 1,2,\ldots,m`. (The array :math:`\mathrm{r2}` must differ from both :math:`\mathrm{ssp}` and :math:`\mathrm{r}`.) That is to say: on exit, :math:`\mathrm{r2}[k-1,l-1]` contains the Pearson product-moment correlation coefficient :math:`R_{{ij}}` (or the correlation-like coefficient, :math:`\tilde{R}_{{ij}}`). .. _g02ce-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq m`. (`errno` :math:`4`) On entry, :math:`\mathrm{korder}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{korder}[i-1]\leq n`, for :math:`i = 1,2,\ldots,m`. .. _g02ce-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` Input to the function consists of: (a) A vector of means: .. math:: \left({\bar{x}_1}, {\bar{x}_2}, {\bar{x}_3}, \ldots, {\bar{x}_n}\right)\text{,} where :math:`n` is the number of input variables. (#) A vector of standard deviations: .. math:: \left(s_1, s_2, s_3, \ldots, s_n\right)\text{.} (#) A matrix of sums of squares and cross-products of deviations from means: ..
math:: \begin{pmatrix}S_{11}&S_{12}&S_{13}&.&.&.&S_{{1n}}\\S_{21}&S_{22}&&&&&S_{{2n}}\\S_{31}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\S_{{n1}}&S_{{n2}}&.&.&.&.&S_{{nn}}\end{pmatrix}\text{.} (#) A matrix of correlation coefficients: .. math:: \begin{pmatrix}R_{11}&R_{12}&R_{13}&.&.&.&R_{{1n}}\\R_{21}&R_{22}&&&&&R_{{2n}}\\R_{31}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\R_{{n1}}&R_{{n2}}&.&.&.&.&R_{{nn}}\end{pmatrix}\text{.} (#) The number of variables, :math:`m`, in the required subset, and their row/column numbers in the input data, :math:`i_1,i_2,i_3,\ldots,i_m`, .. math:: 1\leq i_k\leq n\quad \text{ for }k = 1,2,\ldots,m\quad \text{ }\quad \left({n\geq 2}, {m\geq 1\text{ and }m\leq n}\right)\text{.} New vectors and matrices are output containing the following information: (i) A vector of means: .. math:: \left({\bar{x}_{i_1}}, {\bar{x}_{i_2}}, {\bar{x}_{i_3}}, \ldots, {\bar{x}_{i_m}}\right)\text{.} (#) A vector of standard deviations: .. math:: \left({s_{i_1}}, {s_{i_2}}, {s_{i_3}}, \ldots, {s_{i_m}}\right)\text{.} (#) A matrix of sums of squares and cross-products of deviations from means: .. math:: \begin{pmatrix}S_{{i_1i_1}}&S_{{i_1i_2}}&S_{{i_1i_3}}&.&.&.&S_{{i_1i_m}}\\S_{{i_2i_1}}&S_{{i_2i_2}}&&&&&.\\S_{{i_3i_1}}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\S_{{i_mi_1}}&S_{{i_mi_2}}&.&.&.&.&S_{{i_mi_m}}\end{pmatrix}\text{.} (#) A matrix of correlation coefficients: .. math:: \begin{pmatrix}R_{{i_1i_1}}&R_{{i_1i_2}}&R_{{i_1i_3}}&.&.&.&R_{{i_1i_m}}\\R_{{i_2i_1}}&R_{{i_2i_2}}&&&&&.\\R_{{i_3i_1}}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\R_{{i_mi_1}}&R_{{i_mi_2}}&.&.&.&.&R_{{i_mi_m}}\end{pmatrix}\text{.} **Note:** for sums of squares and cross-products of deviations about zero and correlation-like coefficients :math:`S_{{ij}}` and :math:`R_{{ij}}` should be replaced by :math:`\tilde{S}_{{ij}}` and :math:`\tilde{R}_{{ij}}` in the description of the input and output above. """ raise NotImplementedError
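Since this stub is not implemented, the selection described above can be emulated with NumPy fancy indexing. The helper name ``select_subset`` and the use of ``numpy`` are illustrative assumptions and not part of the NAG interface; this is a sketch of the documented behaviour, not the NAG implementation:

```python
import numpy as np

def select_subset(xbar, std, ssp, r, korder):
    """Illustrative analogue of linregm_service_select.

    korder holds 1-based original variable numbers, as in the NAG
    documentation.  The outputs are freshly allocated arrays, so they
    always differ from the input arrays, as the docstring requires.
    """
    idx = np.asarray(korder) - 1              # convert to 0-based indices
    xbar2 = np.asarray(xbar)[idx]             # means of selected variables
    std2 = np.asarray(std)[idx]               # standard deviations, reordered
    ssp2 = np.asarray(ssp)[np.ix_(idx, idx)]  # selected rows AND columns
    r2 = np.asarray(r)[np.ix_(idx, idx)]
    return xbar2, std2, ssp2, r2
```

Here ``np.ix_`` builds the open mesh needed to pick out the same subset of rows and columns, which is exactly the row/column selection the matrices above describe.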
[docs]def linregm_service_reorder(korder, xbar, std, ssp, r): r""" ``linregm_service_reorder`` reorders the elements in two vectors (typically vectors of means and standard deviations), and the rows and columns in two matrices (typically either matrices of sums of squares and cross-products of deviations from means and Pearson product-moment correlation coefficients, or matrices of sums of squares and cross-products about zero and correlation-like coefficients). .. _g02cf-py2-py-doc: For full information please refer to the NAG Library document for g02cf https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cff.html .. _g02cf-py2-py-parameters: **Parameters** **korder** : int, array-like, shape :math:`\left(n\right)` :math:`\mathrm{korder}[\textit{i}-1]` must be set to the number of the original variable which is to be the :math:`\textit{i}`\ th variable in the re-arranged data, for :math:`\textit{i} = 1,2,\ldots,n`. **xbar** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{xbar}[\textit{i}-1]` must be set to the mean of variable :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,n`. **std** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{std}[\textit{i}-1]` must be set to the standard deviation of variable :math:`\textit{i}`, for :math:`\textit{i} = 1,2,\ldots,n`. **ssp** : float, array-like, shape :math:`\left(n, n\right)` :math:`\mathrm{ssp}[\textit{i}-1,\textit{j}-1]` must be set to the sum of cross-products of deviations from means :math:`S_{{\textit{i}\textit{j}}}` (or about zero :math:`\tilde{S}_{{\textit{i}\textit{j}}}`) for variables :math:`\textit{i}` and :math:`\textit{j}`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,n`. 
**r** : float, array-like, shape :math:`\left(n, n\right)` :math:`\mathrm{r}[\textit{i}-1,\textit{j}-1]` must be set to the Pearson product-moment correlation coefficient :math:`R_{{\textit{i}\textit{j}}}` (or the correlation-like coefficient :math:`\tilde{R}_{{\textit{i}\textit{j}}}`) for variables :math:`\textit{i}` and :math:`\textit{j}`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,n`. **Returns** **xbar** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{xbar}[\textit{i}-1]` contains the mean of variable :math:`k` where :math:`k = \mathrm{korder}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,n`. **std** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{std}[\textit{i}-1]` contains the standard deviation of variable :math:`k` where :math:`k = \mathrm{korder}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,n`. **ssp** : float, ndarray, shape :math:`\left(n, n\right)` :math:`\mathrm{ssp}[i-1,j-1]` contains the sum of cross-products of deviations from means :math:`S_{{kl}}` (or about zero :math:`\tilde{S}_{{kl}}`) for variables :math:`k` and :math:`l`, where :math:`k = \mathrm{korder}[i-1]`, and :math:`l = \mathrm{korder}[j-1]`, :math:`i,j = 1,2,\ldots,n`. **r** : float, ndarray, shape :math:`\left(n, n\right)` :math:`\mathrm{r}[\textit{i}-1,\textit{j}-1]` contains the Pearson product-moment correlation coefficient :math:`R_{{kl}}` (or the correlation-like coefficient :math:`\tilde{R}_{{kl}}`) for variables :math:`k` and :math:`l`, where :math:`k = \mathrm{korder}[\textit{i}-1]` and :math:`l = \mathrm{korder}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,n`, for :math:`\textit{i} = 1,2,\ldots,n`. .. _g02cf-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. 
(`errno` :math:`3`) On entry, :math:`\mathrm{korder}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{korder}[i-1]\leq n`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`4`) On entry, there is not a one-to-one correspondence between the old variables and the new variables; at least one of the original variables is not included in the new set, and consequently at least one other variable has been included more than once. .. _g02cf-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` Input to the function consists of: (a) A list of the order in which the :math:`n` variables are to be arranged on exit: .. math:: i_1,i_2,i_3,\ldots,i_n\text{.} (#) A vector of means: .. math:: \left({\bar{x}_1}, {\bar{x}_2}, {\bar{x}_3}, \ldots, {\bar{x}_n}\right)\text{.} (#) A vector of standard deviations: .. math:: \left(s_1, s_2, s_3, \ldots, s_n\right)\text{.} (#) A matrix of sums of squares and cross-products of deviations from means: .. math:: \begin{pmatrix}S_{11}&S_{12}&S_{13}&.&.&.&S_{{1n}}\\S_{21}&S_{22}&&&&&.\\S_{31}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\S_{{n1}}&S_{{n2}}&.&.&.&.&S_{{nn}}\end{pmatrix}\text{.} (#) A matrix of correlation coefficients: .. math:: \begin{pmatrix}R_{11}&R_{12}&R_{13}&.&.&.&R_{{1n}}\\R_{21}&R_{22}&&&&&.\\R_{31}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\R_{{n1}}&R_{{n2}}&.&.&.&.&R_{{nn}}\end{pmatrix}\text{.} On exit from the function, these same vectors and matrices are reordered, in the manner specified, and contain the following information: (i) The vector of means: .. math:: \left({\bar{x}_{i_1}}, {\bar{x}_{i_2}}, {\bar{x}_{i_3}}, \ldots, {\bar{x}_{i_n}}\right)\text{.} (#) The vector of standard deviations: .. math:: \left({s_{i_1}}, {s_{i_2}}, {s_{i_3}}, \ldots, {s_{i_n}}\right)\text{.} (#) The matrix of sums of squares and cross-products of deviations from means: ..
math:: \begin{pmatrix}S_{{i_1i_1}}&S_{{i_1i_2}}&S_{{i_1i_3}}&.&.&.&S_{{i_1i_n}}\\S_{{i_2i_1}}&S_{{i_2i_2}}&&&&&.\\S_{{i_3i_1}}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\S_{{i_ni_1}}&S_{{i_ni_2}}&.&.&.&.&S_{{i_ni_n}}\end{pmatrix}\text{.} (#) The matrix of correlation coefficients: .. math:: \begin{pmatrix}R_{{i_1i_1}}&R_{{i_1i_2}}&R_{{i_1i_3}}&.&.&.&R_{{i_1i_n}}\\R_{{i_2i_1}}&R_{{i_2i_2}}&&&&&.\\R_{{i_3i_1}}&&&&&&.\\.&&&&&&.\\.&&&&&&.\\.&&&&&&.\\R_{{i_ni_1}}&R_{{i_ni_2}}&.&.&.&.&R_{{i_ni_n}}\end{pmatrix}\text{.} **Note:** for sums of squares and cross-products of deviations about zero and correlation-like coefficients :math:`S_{{ij}}` and :math:`R_{{ij}}` should be replaced by :math:`\tilde{S}_{{ij}}` and :math:`\tilde{R}_{{ij}}` in the description of the input and output above. """ raise NotImplementedError
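The reordering above, including the one-to-one check behind `errno` 4, can likewise be sketched with NumPy. The helper name ``reorder_stats`` is a hypothetical stand-in for illustration only:

```python
import numpy as np

def reorder_stats(korder, xbar, std, ssp, r):
    """Illustrative analogue of linregm_service_reorder.

    korder holds 1-based original variable numbers; unlike the
    selection routine, here it must be a full permutation of 1..n.
    """
    idx = np.asarray(korder) - 1          # 0-based permutation
    n = len(idx)
    # Analogue of errno 4: korder must map old variables onto new
    # variables one-to-one, i.e., be a permutation of 1..n.
    if sorted(idx.tolist()) != list(range(n)):
        raise ValueError("korder is not a permutation of 1..n")
    perm = np.ix_(idx, idx)               # reorder rows and columns together
    return (np.asarray(xbar)[idx], np.asarray(std)[idx],
            np.asarray(ssp)[perm], np.asarray(r)[perm])
```

The NAG routine overwrites its arguments in place; the sketch returns new arrays instead, which is the more idiomatic NumPy style.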
[docs]def linregm_coeffs_const(n, xbar, ssp, r): r""" ``linregm_coeffs_const`` performs a multiple linear regression on a set of variables whose means, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients are given. .. _g02cg-py2-py-doc: For full information please refer to the NAG Library document for g02cg https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02cgf.html .. _g02cg-py2-py-parameters: **Parameters** **n** : int The number of cases :math:`n`, used in calculating the sums of squares and cross-products and correlation coefficients. **xbar** : float, array-like, shape :math:`\left(k+1\right)` :math:`\mathrm{xbar}[\textit{i}-1]` must be set to :math:`\bar{x}_{\textit{i}}`, the mean value of the :math:`\textit{i}`\ th variable, for :math:`\textit{i} = 1,2,\ldots,k+1`; the mean of the dependent variable must be contained in :math:`\mathrm{xbar}[k]`. **ssp** : float, array-like, shape :math:`\left(k+1, k+1\right)` :math:`\mathrm{ssp}[\textit{i}-1,\textit{j}-1]` must be set to :math:`S_{{\textit{i}\textit{j}}}`, the sum of cross-products of deviations from means for the :math:`\textit{i}`\ th and :math:`\textit{j}`\ th variables, for :math:`\textit{j} = 1,2,\ldots,k+1`, for :math:`\textit{i} = 1,2,\ldots,k+1`; terms involving the dependent variable appear in row :math:`k+1` and column :math:`k+1`. **r** : float, array-like, shape :math:`\left(k+1, k+1\right)` :math:`\mathrm{r}[\textit{i}-1,\textit{j}-1]` must be set to :math:`R_{{\textit{i}\textit{j}}}`, the Pearson product-moment correlation coefficient for the :math:`\textit{i}`\ th and :math:`\textit{j}`\ th variables, for :math:`\textit{j} = 1,2,\ldots,k+1`, for :math:`\textit{i} = 1,2,\ldots,k+1`; terms involving the dependent variable appear in row :math:`k+1` and column :math:`k+1`. **Returns** **result** : float, ndarray, shape :math:`\left(13\right)` The following information: .. 
rst-class:: nag-rules-none nag-align-left +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[0]` |:math:`SSR`, the sum of squares attributable to the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[1]` |:math:`DFR`, the degrees of freedom attributable to the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[2]` |:math:`MSR`, the mean square attributable to the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[3]` |:math:`F`, the :math:`F` value for the analysis of variance; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[4]` |:math:`SSD`, the sum of squares of deviations about the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[5]` |:math:`DFD`, the degrees of freedom of deviations about the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[6]` |:math:`MSD`, the mean square of deviations about the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[7]` |:math:`SST`, the total sum of squares; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[8]` |:math:`DFT`, the 
total degrees of freedom; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[9]` |:math:`s`, the standard error estimate; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[10]`|:math:`R`, the coefficient of multiple correlation; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[11]`|:math:`R^2`, the coefficient of multiple determination; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[12]`|:math:`\bar{R}^2`, the coefficient of multiple determination corrected for the degrees of freedom.| +---------------------------+--------------------------------------------------------------------------------------------------+ **coef** : float, ndarray, shape :math:`\left(k, 3\right)` For :math:`i = 1,2,\ldots,k`, the following information: :math:`\mathrm{coef}[i-1,0]` :math:`b_i`, the regression coefficient for the :math:`i`\ th variable. :math:`\mathrm{coef}[i-1,1]` :math:`se\left(b_i\right)`, the standard error of the regression coefficient for the :math:`i`\ th variable. :math:`\mathrm{coef}[i-1,2]` :math:`t\left(b_i\right)`, the :math:`t` value of the regression coefficient for the :math:`i`\ th variable. **con** : float, ndarray, shape :math:`\left(3\right)` The following information: .. 
rst-class:: nag-rules-none nag-align-left +-----------------------+-------------------------------------------------------------------------+ |:math:`\mathrm{con}[0]`|:math:`a`, the regression constant; | +-----------------------+-------------------------------------------------------------------------+ |:math:`\mathrm{con}[1]`|:math:`se\left(a\right)`, the standard error of the regression constant; | +-----------------------+-------------------------------------------------------------------------+ |:math:`\mathrm{con}[2]`|:math:`t\left(a\right)`, the :math:`t` value for the regression constant.| +-----------------------+-------------------------------------------------------------------------+ **rinv** : float, ndarray, shape :math:`\left(k, k\right)` The inverse of the matrix of correlation coefficients for the independent variables; that is, the inverse of the matrix consisting of the first :math:`k` rows and columns of :math:`\mathrm{r}`. **c** : float, ndarray, shape :math:`\left(k, k\right)` The modified inverse matrix, where :math:`\mathrm{c}[\textit{i}-1,\textit{j}-1] = \mathrm{r}[\textit{i}-1,\textit{j}-1]\times \mathrm{rinv}[\textit{i}-1,\textit{j}-1]/\mathrm{ssp}[\textit{i}-1,\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,k`, for :math:`\textit{i} = 1,2,\ldots,k`. .. _g02cg-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`k \geq 1`. (`errno` :math:`3`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} > {k+1}`. (`errno` :math:`5`) The :math:`\textit{k}` by :math:`\textit{k}` partition of :math:`\mathrm{r}` which requires inversion is not positive definite. (`errno` :math:`6`) The refinement following inversion has failed. .. 
_g02cg-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` ``linregm_coeffs_const`` fits a curve of the form .. math:: y = a+b_1x_1+b_2x_2 + \cdots +b_kx_k to the data points .. math:: \begin{array}{c}\left(x_{11}, x_{21}, \ldots, x_{{k1}}, y_1\right)\\\left(x_{12}, x_{22}, \ldots, x_{{k2}}, y_2\right)\\ \vdots \\\left(x_{{1n}}, x_{{2n}}, \ldots, x_{{kn}}, y_n\right)\end{array} such that .. math:: y_i = a+b_1x_{{1i}}+b_2x_{{2i}} + \cdots +b_kx_{{ki}}+e_i\text{, }\quad i = 1,2,\ldots,n\text{.} The function calculates the regression coefficients, :math:`b_1,b_2,\ldots,b_k`, the regression constant, :math:`a`, and various other statistical quantities by minimizing .. math:: \sum_{{i = 1}}^ne_i^2\text{.} The actual data values :math:`\left(x_{{1i}}, x_{{2i}}, \ldots, x_{{ki}}, y_i\right)` are not provided as input to the function. Instead, input consists of: (i) The number of cases, :math:`n`, on which the regression is based. (#) The total number of variables, dependent and independent, in the regression, :math:`\left(k+1\right)`. (#) The number of independent variables in the regression, :math:`k`. (#) The means of all :math:`k+1` variables in the regression, both the independent variables :math:`\left(x_1, x_2, \ldots, x_k\right)` and the dependent variable :math:`\left(y\right)`, which is the :math:`\left(k+1\right)`\ th variable: i.e., :math:`\bar{x}_1,\bar{x}_2,\ldots,\bar{x}_k,\bar{y}`. (#) The :math:`\left(k+1\right)\times \left(k+1\right)` matrix [:math:`S_{{ij}}`] of sums of squares and cross-products of deviations from means of all the variables in the regression; the terms involving the dependent variable, :math:`y`, appear in the :math:`\left(k+1\right)`\ th row and column. 
(#) The :math:`\left(k+1\right)\times \left(k+1\right)` matrix [:math:`R_{{ij}}`] of the Pearson product-moment correlation coefficients for all the variables in the regression; the correlations involving the dependent variable, :math:`y`, appear in the :math:`\left(k+1\right)`\ th row and column. The quantities calculated are: (a) The inverse of the :math:`k\times k` partition of the matrix of correlation coefficients, [:math:`R_{{ij}}`], involving only the independent variables. The inverse is obtained using an accurate method which assumes that this sub-matrix is positive definite. (#) The modified inverse matrix, :math:`C = \left[c_{{ij}}\right]`, where .. math:: c_{{ij}} = \frac{{R_{{ij}}r_{{ij}}}}{S_{{ij}}}\text{, }\quad i,j = 1,2,\ldots,k\text{,} where :math:`r_{{ij}}` is the :math:`\left(i, j\right)`\ th element of the inverse matrix of [:math:`R_{{ij}}`] as described in \(a) above. Each element of :math:`C` is thus the corresponding element of the matrix of correlation coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products of deviations from means. (#) The regression coefficients: .. math:: b_i = \sum_{{j = 1}}^kc_{{ij}}S_{{j\left(k+1\right)}}\text{, }\quad i = 1,2,\ldots,k\text{,} where :math:`S_{{j\left(k+1\right)}}` is the sum of cross-products of deviations from means for the independent variable :math:`x_j` and the dependent variable :math:`y`. 
(#) The sum of squares attributable to the regression, :math:`SSR`, the sum of squares of deviations about the regression, :math:`SSD`, and the total sum of squares, :math:`SST`: :math:`SST = S_{{\left(k+1\right)\left(k+1\right)}}`, the sum of squares of deviations from the mean for the dependent variable, :math:`y`; :math:`SSR = \sum_{{j = 1}}^kb_jS_{{j\left(k+1\right)}}\text{; }\quad SSD = SST-SSR`. (#) The degrees of freedom attributable to the regression, :math:`DFR`, the degrees of freedom of deviations about the regression, :math:`DFD`, and the total degrees of freedom, :math:`DFT`: .. math:: DFR = k\text{; }\quad DFD = n-k-1\text{; }\quad DFT = n-1\text{.} (#) The mean square attributable to the regression, :math:`MSR`, and the mean square of deviations about the regression, :math:`MSD`: .. math:: MSR = SSR/DFR\text{; }\quad MSD = SSD/DFD\text{.} (#) The :math:`F` value for the analysis of variance: .. math:: F = MSR/MSD\text{.} (#) The standard error estimate: .. math:: s = \sqrt{MSD}\text{.} (#) The coefficient of multiple correlation, :math:`R`, the coefficient of multiple determination, :math:`R^2`, and the coefficient of multiple determination corrected for the degrees of freedom, :math:`\bar{R}^2`: .. math:: R = \sqrt{1-\frac{{SSD}}{{SST}}}\text{; }\quad R^2 = 1-\frac{{SSD}}{{SST}}\text{; }\quad \bar{R}^2 = 1-\frac{{SSD\times DFT}}{{SST\times DFD}}\text{.} (#) The standard error of the regression coefficients: .. math:: se\left(b_i\right) = \sqrt{MSD\times c_{{ii}}}\text{, }\quad i = 1,2,\ldots,k\text{.} (#) The :math:`t` values for the regression coefficients: .. math:: t\left(b_i\right) = \frac{{b_i}}{{se\left(b_i\right)}}\text{, }\quad i = 1,2,\ldots,k\text{.} (#) The regression constant, :math:`a`, its standard error, :math:`se\left(a\right)`, and its :math:`t` value, :math:`t\left(a\right)`: ..
math:: a = \bar{y}-\sum_{{i = 1}}^kb_i\bar{x}_i\text{; }\quad se\left(a\right) = \sqrt{MSD\times \left(\frac{1}{n}+\sum_{{i = 1}}^k\sum_{{j = 1}}^k\bar{x}_ic_{{ij}}\bar{x}_j\right)}\text{; }\quad t\left(a\right) = \frac{a}{{se\left(a\right)}}\text{.} .. _g02cg-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley """ raise NotImplementedError
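The core of the calculation above can be sketched directly from the formulas in the Notes. The helper name ``fit_from_summaries`` and the use of NumPy are illustrative assumptions, not the NAG implementation; note that the elementwise product :math:`C = R \circ R^{-1} / S` over the :math:`k\times k` partition is algebraically the inverse of the sums-of-squares partition, so :math:`b` solves the usual normal equations:

```python
import numpy as np

def fit_from_summaries(n, xbar, ssp, r):
    """Sketch of the formulas in Notes (a)-(c) of linregm_coeffs_const."""
    xbar, ssp, r = map(np.asarray, (xbar, ssp, r))
    k = len(xbar) - 1
    rinv = np.linalg.inv(r[:k, :k])    # (a) inverse of the correlation partition
    # (b) modified inverse, elementwise as in the Notes (this sketch
    # assumes no zero off-diagonal S_ij; C equals inv(ssp[:k, :k]))
    c = r[:k, :k] * rinv / ssp[:k, :k]
    b = c @ ssp[:k, k]                 # (c) regression coefficients
    a = xbar[k] - b @ xbar[:k]         # regression constant
    sst = ssp[k, k]                    # total sum of squares
    ssr = b @ ssp[:k, k]               # sum of squares due to regression
    ssd = sst - ssr                    # residual sum of squares
    return a, b, ssd
```

Feeding in summary statistics computed from raw data recovers the same coefficients as an ordinary least squares fit with intercept.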
[docs]def linregm_coeffs_noconst(n, sspz, rz): r""" ``linregm_coeffs_noconst`` performs a multiple linear regression with no constant on a set of variables whose sums of squares and cross-products about zero and correlation-like coefficients are given. .. _g02ch-py2-py-doc: For full information please refer to the NAG Library document for g02ch https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02chf.html .. _g02ch-py2-py-parameters: **Parameters** **n** : int :math:`n`, the number of cases used in calculating the sums of squares and cross-products and correlation-like coefficients. **sspz** : float, array-like, shape :math:`\left(k+1, k+1\right)` :math:`\mathrm{sspz}[\textit{i}-1,\textit{j}-1]` must be set to :math:`\tilde{S}_{{\textit{i}\textit{j}}}`, the sum of cross-products about zero for the :math:`\textit{i}`\ th and :math:`\textit{j}`\ th variables, for :math:`\textit{j} = 1,2,\ldots,k+1`, for :math:`\textit{i} = 1,2,\ldots,k+1`; terms involving the dependent variable appear in row :math:`k+1` and column :math:`k+1`. **rz** : float, array-like, shape :math:`\left(k+1, k+1\right)` :math:`\mathrm{rz}[\textit{i}-1,\textit{j}-1]` must be set to :math:`\tilde{R}_{{\textit{i}\textit{j}}}`, the correlation-like coefficient for the :math:`\textit{i}`\ th and :math:`\textit{j}`\ th variables, for :math:`\textit{j} = 1,2,\ldots,k+1`, for :math:`\textit{i} = 1,2,\ldots,k+1`; coefficients involving the dependent variable appear in row :math:`k+1` and column :math:`k+1`. **Returns** **result** : float, ndarray, shape :math:`\left(13\right)` The following information: .. 
rst-class:: nag-rules-none nag-align-left +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[0]` |:math:`SSR`, the sum of squares attributable to the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[1]` |:math:`DFR`, the degrees of freedom attributable to the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[2]` |:math:`MSR`, the mean square attributable to the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[3]` |:math:`F`, the :math:`F` value for the analysis of variance; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[4]` |:math:`SSD`, the sum of squares of deviations about the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[5]` |:math:`DFD`, the degrees of freedom of deviations about the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[6]` |:math:`MSD`, the mean square of deviations about the regression; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[7]` |:math:`SST`, the total sum of squares; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[8]` |:math:`DFT`, the 
total degrees of freedom; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[9]` |:math:`s`, the standard error estimate; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[10]`|:math:`R`, the coefficient of multiple correlation; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[11]`|:math:`R^2`, the coefficient of multiple determination; | +---------------------------+--------------------------------------------------------------------------------------------------+ |:math:`\mathrm{result}[12]`|:math:`\bar{R}^2`, the coefficient of multiple determination corrected for the degrees of freedom.| +---------------------------+--------------------------------------------------------------------------------------------------+ **coef** : float, ndarray, shape :math:`\left(k, 3\right)` For :math:`i = 1,2,\ldots,k`, the following information: :math:`\mathrm{coef}[i-1,0]` :math:`b_i`, the regression coefficient for the :math:`i`\ th variable. :math:`\mathrm{coef}[i-1,1]` :math:`se\left(b_i\right)`, the standard error of the regression coefficient for the :math:`i`\ th variable. :math:`\mathrm{coef}[i-1,2]` :math:`t\left(b_i\right)`, the :math:`t` value of the regression coefficient for the :math:`i`\ th variable. **rznv** : float, ndarray, shape :math:`\left(k, k\right)` The inverse of the matrix of correlation-like coefficients for the independent variables; that is, the inverse of the matrix consisting of the first :math:`k` rows and columns of :math:`\mathrm{rz}`. **cz** : float, ndarray, shape :math:`\left(k, k\right)` The modified inverse matrix, :math:`C`, where .. 
math:: \mathrm{cz}[i-1,j-1] = \frac{{\mathrm{rz}[i-1,j-1]\times \mathrm{rznv}[i-1,j-1]}}{{\mathrm{sspz}[i-1,j-1]}}\text{, }\quad i,j = 1,2,\ldots,k\text{.} .. _g02ch-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`k \geq 1`. (`errno` :math:`3`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} \geq {k+1}`. (`errno` :math:`5`) The :math:`\textit{K}\times \textit{K}` partition of :math:`\mathrm{rz}` which requires inversion is not positive definite. (`errno` :math:`6`) The refinement following the actual inversion has failed. .. _g02ch-py2-py-notes: **Notes** `No equivalent traditional C interface for this routine exists in the NAG Library.` ``linregm_coeffs_noconst`` fits a curve of the form .. math:: y = b_1x_1+b_2x_2 + \cdots +b_kx_k to the data points .. math:: \begin{array}{c}\left(x_{11}, x_{21}, \ldots, x_{{k1}}, y_1\right)\\\left(x_{12}, x_{22}, \ldots, x_{{k2}}, y_2\right)\\ \vdots \\\left(x_{{1n}}, x_{{2n}}, \ldots, x_{{kn}}, y_n\right)\end{array} such that .. math:: y_i = b_1x_{{1i}}+b_2x_{{2i}} + \cdots +b_kx_{{ki}}+e_i\text{, }\quad i = 1,2,\ldots,n\text{.} The function calculates the regression coefficients, :math:`b_1,b_2,\ldots,b_k`, (and various other statistical quantities) by minimizing .. math:: \sum_{{i = 1}}^ne_i^2\text{.} The actual data values :math:`\left(x_{{1i}}, x_{{2i}}, \ldots, x_{{ki}}, y_i\right)` are not provided as input to the function. Instead, input to the function consists of: (i) The number of cases, :math:`n`, on which the regression is based. (#) The total number of variables, dependent and independent, in the regression, :math:`\left(k+1\right)`. (#) The number of independent variables in the regression, :math:`k`. 
(#) The :math:`\left(k+1\right)\times \left(k+1\right)` matrix :math:`\left[\tilde{S}_{{ij}}\right]` of sums of squares and cross-products about zero of all the variables in the regression; the terms involving the dependent variable, :math:`y`, appear in the :math:`\left(k+1\right)`\ th row and column. (#) The :math:`\left(k+1\right)\times \left(k+1\right)` matrix :math:`\left[\tilde{R}_{{ij}}\right]` of correlation-like coefficients for all the variables in the regression; the correlations involving the dependent variable, :math:`y`, appear in the :math:`\left(k+1\right)`\ th row and column. The quantities calculated are: (a) The inverse of the :math:`k\times k` partition of the matrix of correlation-like coefficients, :math:`\left[\tilde{R}_{{ij}}\right]`, involving only the independent variables. The inverse is obtained using an accurate method which assumes that this sub-matrix is positive definite (see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02chf.html#fcomments>`__). (#) The modified matrix, :math:`C = \left[c_{{ij}}\right]`, where .. math:: c_{{ij}} = \frac{{\tilde{R}_{{ij}}\tilde{r}^{{ij}}}}{\tilde{S}_{{ij}}}\text{, }\quad i,j = 1,2,\ldots,k\text{,} where :math:`\tilde{r}^{{ij}}` is the :math:`\left(i, j\right)`\ th element of the inverse matrix of :math:`\left[\tilde{R}_{{ij}}\right]` as described in \(a) above. Each element of :math:`C` is thus the corresponding element of the matrix of correlation-like coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products about zero. (#) The regression coefficients: .. math:: b_i = \sum_{{j = 1}}^kc_{{ij}}\tilde{S}_{{j\left(k+1\right)}}\text{, }\quad i = 1,2,\ldots,k\text{,} where :math:`\tilde{S}_{{j\left(k+1\right)}}` is the sum of cross-products about zero for the independent variable :math:`x_j` and the dependent variable :math:`y`. 
(#) The sum of squares attributable to the regression, :math:`SSR`, the sum of squares of deviations about the regression, :math:`SSD`, and the total sum of squares, :math:`SST`: :math:`SST = \tilde{S}_{{\left(k+1\right)\left(k+1\right)}}`, the sum of squares about zero for the dependent variable, :math:`y`; :math:`SSR = \sum_{{j = 1}}^kb_j\tilde{S}_{{j\left(k+1\right)}}\text{; }\quad SSD = SST-SSR`. (#) The degrees of freedom attributable to the regression, :math:`DFR`, the degrees of freedom of deviations about the regression, :math:`DFD`, and the total degrees of freedom, :math:`DFT`: .. math:: DFR = k\text{; }\quad DFD = n-k\text{; }\quad DFT = n\text{.} (#) The mean square attributable to the regression, :math:`MSR`, and the mean square of deviations about the regression, :math:`MSD`: .. math:: MSR = SSR/DFR\text{; }\quad MSD = SSD/DFD\text{.} (#) The :math:`F` value for the analysis of variance: .. math:: F = MSR/MSD\text{.} (#) The standard error estimate: .. math:: s = \sqrt{MSD}\text{.} (#) The coefficient of multiple correlation, :math:`R`, the coefficient of multiple determination, :math:`R^2`, and the coefficient of multiple determination corrected for the degrees of freedom, :math:`\bar{R}^2`: .. math:: R = \sqrt{1-\frac{{SSD}}{{SST}}}\text{; }\quad R^2 = 1-\frac{{SSD}}{{SST}}\text{; }\quad \bar{R}^2 = 1-\frac{{SSD\times DFT}}{{SST\times DFD}}\text{.} (#) The standard error of the regression coefficients: .. math:: se\left(b_i\right) = \sqrt{MSD\times c_{{ii}}}\text{, }\quad i = 1,2,\ldots,k\text{.} (#) The :math:`t` values for the regression coefficients: .. math:: t\left(b_i\right) = \frac{{b_i}}{{se\left(b_i\right)}}\text{, }\quad i = 1,2,\ldots,k\text{.} .. _g02ch-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley """ raise NotImplementedError
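The arithmetic in the Notes above can be illustrated numerically. The following NumPy sketch is a hypothetical illustration (it is not the NAG implementation, and the variable names are made up for the example): it recovers the coefficients of a regression through the origin from the matrix of sums of squares and cross-products about zero.

```python
import numpy as np

# Hypothetical illustration of the Notes above, not the NAG code:
# fit y = b1*x1 + ... + bk*xk through the origin using only the
# sums of squares and cross-products about zero.
rng = np.random.default_rng(0)
n, k = 20, 3
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(n)

Z = np.column_stack([X, y])
S = Z.T @ Z                    # (k+1) x (k+1) sums about zero; y in last row/column

Sxx = S[:k, :k]                # partition for the independent variables
sxy = S[:k, k]                 # cross-products with the dependent variable

b = np.linalg.solve(Sxx, sxy)  # regression coefficients b_1, ..., b_k
SST = S[k, k]                  # total sum of squares about zero
SSR = b @ sxy                  # sum of squares attributable to the regression
SSD = SST - SSR                # sum of squares about the regression
R2 = 1.0 - SSD / SST           # coefficient of multiple determination
```

Note that, because no intercept is fitted, the residual degrees of freedom are :math:`n-k` rather than :math:`n-k-1`, matching :math:`DFD = n-k` above.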
[docs]def linregm_fit(x, isx, y, mean='M', wt=None, tol=0.000001): r""" ``linregm_fit`` performs a general multiple linear regression when the independent variables may be linearly dependent. Parameter estimates, standard errors, residuals and influence statistics are computed. ``linregm_fit`` may be used to perform a weighted regression. .. _g02da-py2-py-doc: For full information please refer to the NAG Library document for g02da https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02daf.html .. _g02da-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. :math:`\mathrm{isx}[j-1] > 0` The variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the regression model. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the observations on the dependent variable. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. The values of :math:`\mathrm{res}` and :math:`\mathrm{h}` will be set to zero for observations with zero weights. 
If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. **tol** : float, optional The value of :math:`\mathrm{tol}` is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of :math:`\mathrm{tol}` the stricter the criterion for selecting the singular value decomposition. If :math:`\mathrm{tol} = 0.0`, the singular value decomposition will never be used; this may cause run time errors or inaccurate results if the independent variables are not of full rank. **Returns** **rss** : float The residual sum of squares for the regression. **idf** : int The degrees of freedom associated with the residual sum of squares. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` :math:`\mathrm{b}[i-1]`, :math:`i = 1,2,\ldots,\textit{ip}` contains the least squares estimates of the parameters of the regression model, :math:`\hat{\beta }`. If :math:`\mathrm{mean} = \texttt{'M'}`, :math:`\mathrm{b}[0]` will contain the estimate of the mean parameter and :math:`\mathrm{b}[i]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{b}[i-1]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` :math:`\mathrm{se}[i-1]`, :math:`i = 1,2,\ldots,\textit{ip}` contains the standard errors of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. 
**cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The first :math:`\textit{ip}\times \left(\textit{ip}+1\right)/2` elements of :math:`\mathrm{cov}` contain the upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[j\times \left(j-1\right)/2+i-1]`. **res** : float, ndarray, shape :math:`\left(n\right)` The (weighted) residuals, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **h** : float, ndarray, shape :math:`\left(n\right)` The diagonal elements of :math:`H`, :math:`h_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **q** : float, ndarray, shape :math:`\left(n, \textit{ip}+1\right)` The results of the :math:`QR` decomposition: the first column of :math:`\mathrm{q}` contains :math:`c`; the upper triangular part of columns :math:`2` to :math:`\textit{ip}+1` contain the :math:`R` matrix; the strictly lower triangular part of columns :math:`2` to :math:`\textit{ip}+1` contain details of the :math:`Q` matrix. **svd** : bool If a singular value decomposition has been performed then :math:`\mathrm{svd}` will be :math:`\mathbf{True}`, otherwise :math:`\mathrm{svd}` will be :math:`\mathbf{False}`. **irank** : int The rank of the independent variables. If :math:`\mathrm{svd} = \mathbf{False}`, :math:`\mathrm{irank} = \textit{ip}`. If :math:`\mathrm{svd} = \mathbf{True}`, :math:`\mathrm{irank}` is an estimate of the rank of the independent variables. :math:`\mathrm{irank}` is calculated as the number of singular values greater than :math:`\mathrm{tol}\times \text{}` (largest singular value). It is possible for the SVD to be carried out but :math:`\mathrm{irank}` to be returned as :math:`\textit{ip}`. 
**p** : float, ndarray, shape :math:`\left(2\times \textit{ip}+\textit{ip}\times \textit{ip}\right)` Details of the :math:`QR` decomposition and SVD if used. If :math:`\mathrm{svd} = \mathbf{False}`, only the first :math:`\textit{ip}` elements of :math:`\mathrm{p}` are used; these will contain the zeta values for the :math:`QR` decomposition (see :meth:`lapackeig.dgeqrf <naginterfaces.library.lapackeig.dgeqrf>` for details). If :math:`\mathrm{svd} = \mathbf{True}`, the first :math:`\textit{ip}` elements of :math:`\mathrm{p}` will contain the zeta values for the :math:`QR` decomposition (see :meth:`lapackeig.dgeqrf <naginterfaces.library.lapackeig.dgeqrf>` for details) and the next :math:`\textit{ip}` elements of :math:`\mathrm{p}` contain singular values. The following :math:`\textit{ip}` by :math:`\textit{ip}` elements contain the matrix :math:`P^*` stored by columns. **wk** : float, ndarray, shape :math:`\left(\max\left(2,{ 5 \times \left(\textit{ip}-1\right) + \textit{ip} \times \textit{ip} }\right)\right)` If on exit :math:`\mathrm{svd} = \mathbf{True}`, :math:`\mathrm{wk}` contains information which is needed by :meth:`linregm_fit_newvar`; otherwise :math:`\mathrm{wk}` is used as workspace. .. _g02da-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \leq n`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. 
(`errno` :math:`2`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'U'}` or :math:`\texttt{'W'}`. (`errno` :math:`2`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`3`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`4`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[i-1]\geq 0.0`, for :math:`i = 1,2,\ldots,m`. (`errno` :math:`4`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip}` must be compatible with the number of nonzero elements in :math:`\mathrm{isx}`. (`errno` :math:`6`) SVD solution failed to converge. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) The degrees of freedom for the residuals are zero, i.e., the designated number of arguments is equal to the effective number of observations. In this case the parameter estimates will be returned along with the diagonal elements of :math:`H`, but neither standard errors nor the variance-covariance matrix will be calculated. .. _g02da-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` The general linear regression model is defined by .. 
math:: y = X\beta +\epsilon \text{,} where :math:`y` is a vector of :math:`n` observations on the dependent variable, :math:`X` is an :math:`n\times p` matrix of the independent variables of column rank :math:`k`, :math:`\beta` is a vector of length :math:`p` of unknown parameters, and :math:`\epsilon` is a vector of length :math:`n` of unknown random errors such that :math:`\mathrm{var}\left(\epsilon \right) = V\sigma^2`, where :math:`V` is a known diagonal matrix. If :math:`V = I`, the identity matrix, then least squares estimation is used. If :math:`V\neq I`, then for a given weight matrix :math:`W\propto V^{-1}`, weighted least squares estimation is used. The least squares estimates :math:`\hat{\beta }` of the parameters :math:`\beta` minimize :math:`\left(y-X\beta \right)^\mathrm{T}\left(y-X\beta \right)` while the weighted least squares estimates minimize :math:`\left(y-X\beta \right)^\mathrm{T}W\left(y-X\beta \right)`. ``linregm_fit`` finds a :math:`QR` decomposition of :math:`X` (or :math:`W^{{1/2}}X` in the weighted case), i.e., .. math:: X = QR^*\quad \text{ }\quad \left(\text{or }\quad W^{{1/2}}X = QR^*\right)\text{,} where :math:`R^* = \begin{pmatrix}R\\0\end{pmatrix}` and :math:`R` is a :math:`p\times p` upper triangular matrix and :math:`Q` is an :math:`n\times n` orthogonal matrix. If :math:`R` is of full rank, then :math:`\hat{\beta }` is the solution to .. math:: R\hat{\beta } = c_1\text{,} where :math:`c = Q^\mathrm{T}y` (or :math:`Q^\mathrm{T}W^{{1/2}}y`) and :math:`c_1` is the first :math:`p` elements of :math:`c`. If :math:`R` is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`, .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R`, and :math:`Q_*` and :math:`P` are :math:`p\times p` orthogonal matrices. This gives the solution .. 
math:: \hat{\beta } = P_1D^{-1}Q_{*_1}^\mathrm{T}c_1\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \begin{pmatrix}P_1&P_0\end{pmatrix}`, and :math:`Q_{*_1}` being the first :math:`k` columns of :math:`Q_*`. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} This will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using :meth:`linregm_constrain` after using ``linregm_fit``. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions. The fit of the model can be examined by considering the residuals, :math:`r_i = y_i-\hat{y}_i`, where :math:`\hat{y} = X\hat{\beta }` are the fitted values. The fitted values can be written as :math:`Hy` for an :math:`n\times n` matrix :math:`H`. The :math:`i`\ th diagonal elements of :math:`H`, :math:`h_i`, give a measure of the influence of the :math:`i`\ th values of the independent variables on the fitted regression model. The values :math:`h_i` are sometimes known as leverages. Both :math:`r_i` and :math:`h_i` are provided by ``linregm_fit``. The output of ``linregm_fit`` also includes :math:`\hat{\beta }`, the residual sum of squares and associated degrees of freedom, :math:`\left(n-k\right)`, the standard errors of the parameter estimates and the variance-covariance matrix of the parameter estimates. In many linear regression models the first term is taken as a mean term or an intercept, i.e., :math:`X_{{i,1}} = 1`, for :math:`i = 1,2,\ldots,n`. This is provided as an option. Also, since only some of the possible independent variables may be required in a model, a facility to select the variables to be included is provided. 
Details of the :math:`QR` decomposition and, if used, the SVD, are made available. These allow the regression to be updated by adding or deleting an observation using :meth:`linregm_obs_edit`, adding or deleting a variable using :meth:`linregm_var_add` and :meth:`linregm_var_del` or estimating and testing an estimable function using :meth:`linregm_estfunc`. For the same matrix of independent variables, a new set of parameter estimates can be quickly calculated from a new vector of dependent variables using :meth:`linregm_fit_newvar`. The details of the factorizations held in :math:`\textit{q}`, :math:`\textit{p}` and :math:`\textit{wk}` are only for use by this suite of functions and cannot be used by other functions that use such factorizations, e.g., :meth:`lapackeig.dormqr <naginterfaces.library.lapackeig.dormqr>` since these will expect a different storage scheme for the input factorization. .. _g02da-py2-py-references: **References** Cook, R D and Weisberg, S, 1982, `Residuals and Influence in Regression`, Chapman and Hall Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall Searle, S R, 1971, `Linear Models`, Wiley See Also -------- :meth:`naginterfaces.library.examples.correg.linregm_fit_ex.main` """ raise NotImplementedError
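The :math:`QR` route described in the Notes can be sketched in a few lines of NumPy. This is a hedged illustration assuming a full-rank, unweighted :math:`X` (so the SVD branch is never needed); it is not the internals of ``linregm_fit``:

```python
import numpy as np

# Sketch of the full-rank QR path from the Notes, not NAG's code.
rng = np.random.default_rng(1)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])  # mean term
y = X @ np.array([0.5, 1.0, -1.0, 2.0]) + 0.1 * rng.standard_normal(n)

Q, R = np.linalg.qr(X)         # economy-size: Q is n x p, R is p x p
c1 = Q.T @ y                   # first p elements of c = Q^T y
beta = np.linalg.solve(R, c1)  # solve R beta = c1 by back-substitution

res = y - X @ beta             # residuals r_i
h = np.sum(Q**2, axis=1)       # leverages h_i: diagonal of H = Q Q^T
rss = res @ res                # residual sum of squares, df = n - p
```

A quick sanity check on the leverages: the trace of :math:`H` equals the number of fitted parameters, so `h.sum()` should be :math:`p`.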
[docs]def linregm_obs_edit(update, isx, q, x, ix, y, rss, mean='M', wt=None): r""" ``linregm_obs_edit`` adds or deletes an observation from a general regression model fitted by :meth:`linregm_fit`. .. _g02dc-py2-py-doc: For full information please refer to the NAG Library document for g02dc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02dcf.html .. _g02dc-py2-py-parameters: **Parameters** **update** : str, length 1 Indicates if an observation is to be added or deleted. :math:`\mathrm{update} = \texttt{'A'}` The observation is added. :math:`\mathrm{update} = \texttt{'D'}` The observation is deleted. **isx** : int, array-like, shape :math:`\left(m\right)` If :math:`\mathrm{isx}[\textit{j}-1]` is greater than :math:`0`, the value contained in :math:`\mathrm{x}[\left(\textit{j}-1\right)\times \mathrm{ix}]` is to be included as a value of :math:`x^\mathrm{T}`, for :math:`\textit{j} = 1,2,\ldots,m`. **q** : float, array-like, shape :math:`\left(\textit{ip}, \textit{ip}+1\right)` Must be array :math:`\mathrm{q}` as output by :meth:`linregm_fit`, :meth:`linregm_var_add`, :meth:`linregm_var_del` or :meth:`linregm_fit_onestep`, or a previous call to ``linregm_obs_edit``. **x** : float, array-like, shape :math:`\left(\left(m-1\right)\times \mathrm{ix}+1\right)` The :math:`\textit{ip}` values for the independent variables of the new observation, :math:`x^\mathrm{T}`. The positions will depend on the value of :math:`\mathrm{ix}`. **ix** : int The increment for elements of :math:`\mathrm{x}`. Two situations are common: :math:`\mathrm{ix} = 1` The values of :math:`x` are to be chosen from consecutive locations in :math:`\mathrm{x}`, i.e., :math:`\mathrm{x}[0],\mathrm{x}[1],\ldots,\mathrm{x}[m-1]`. :math:`\mathrm{ix} = {\textit{ldx}}` The values of :math:`x` are to be chosen from a row of a two-dimensional array with first dimension :math:`\textit{ldx}`, i.e., :math:`\mathrm{x}[0],\mathrm{x}[{\textit{ldx}}],\ldots,\mathrm{x}[\left(m-1\right){\textit{ldx}}]`. 
**y** : float The value of the dependent variable for the new observation, :math:`y_{\text{new}}`. **rss** : float The value of the residual sums of squares for the original set of observations. **mean** : str, length 1, optional Indicates if a mean has been used in the model. :math:`\mathrm{mean} = \texttt{'M'}` A mean term or intercept will have been included in the model by :meth:`linregm_fit`. :math:`\mathrm{mean} = \texttt{'Z'}` A model with no mean term or intercept will have been fitted by :meth:`linregm_fit`. **wt** : None or float, optional If provided, :math:`\mathrm{wt}` must contain the weight to be used with the new observation. If :math:`\mathrm{wt}` is **None**, the observation is not included in the model. **Returns** **q** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{ip}+1\right)` The first :math:`\textit{ip}` elements of the first column of :math:`\mathrm{q}` will contain :math:`c_1^*`; the upper triangular part of columns :math:`2` to :math:`\textit{ip}+1` will contain :math:`R^*`; the remainder is unchanged. **rss** : float The updated values of the residual sums of squares. **Note:** this will only be valid if the model is of full rank. .. _g02dc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` elements of :math:`\mathrm{isx} > 0` instead of :math:`\textit{ip}-1` (for mean) :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`1`) On entry, :math:`\mathrm{ix} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ix} \geq 1`. (`errno` :math:`1`) On entry, :math:`\langle\mathit{\boldsymbol{value}}\rangle` elements of :math:`\mathrm{isx} > 0` instead of :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. 
(`errno` :math:`1`) On entry, :math:`\mathrm{rss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rss}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{update} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{update} = \texttt{'A'}` or :math:`\texttt{'D'}`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'U'}` or :math:`\texttt{'W'}`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wt}\geq 0.0`. (`errno` :math:`3`) The :math:`R` matrix could not be updated. (`errno` :math:`4`) The residual sums of squares cannot be updated. .. _g02dc-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` :meth:`linregm_fit` fits a general linear regression model to a dataset. You may wish to change the model by either adding or deleting an observation from the dataset. ``linregm_obs_edit`` takes the results from :meth:`linregm_fit` and makes the required changes to the vector :math:`c` and the upper triangular matrix :math:`R` produced by :meth:`linregm_fit`. The regression coefficients, standard errors and the variance-covariance matrix of the regression coefficients can be obtained from :meth:`linregm_update` after all required changes to the dataset have been made. :meth:`linregm_fit` performs a :math:`QR` decomposition on the (weighted) :math:`X` matrix of independent variables. 
To add a new observation to a model with :math:`p` parameters, the upper triangular matrix :math:`R` and vector :math:`c_1` (the first :math:`p` elements of :math:`c`) are augmented by the new observation on independent variables in :math:`x^\mathrm{T}` and dependent variable :math:`y_{\text{new}}`. Givens rotations are then used to restore the upper triangular form. .. math:: \begin{pmatrix}R:c_1\\x:y_{\text{new}}\end{pmatrix}\rightarrow \begin{pmatrix}R^*:c_1^*\\0:y_{\text{new}}^*\end{pmatrix}\text{.} **Note:** only :math:`R` and the upper part of :math:`c` are updated; the remainder of the :math:`Q` matrix is unchanged. .. _g02dc-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 """ raise NotImplementedError
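The Givens update described in the Notes can be imitated with NumPy. The sketch below is hypothetical (``add_observation`` is an invented name, and this is not the NAG routine): it rotates the new row into (R : c1) to restore triangular form, as in the displayed transformation.

```python
import numpy as np

# Hypothetical sketch of the Givens update in the Notes, not NAG's code.
def add_observation(R, c1, x, y_new):
    """Return (R, c1) updated for one added observation (x, y_new)."""
    p = R.shape[0]
    R = R.copy()
    c1 = c1.copy()
    x = x.astype(float).copy()
    for j in range(p):
        # rotate the new row into row j of (R : c1), zeroing x[j]
        r = np.hypot(R[j, j], x[j])
        if r == 0.0:
            continue
        cs, sn = R[j, j] / r, x[j] / r
        Rj = R[j, j:].copy()
        R[j, j:] = cs * Rj + sn * x[j:]
        x[j:] = -sn * Rj + cs * x[j:]
        c1_j = c1[j]
        c1[j] = cs * c1_j + sn * y_new
        y_new = -sn * c1_j + cs * y_new  # ends as the residual contribution
    return R, c1

# tiny usage example: update a QR fit with one extra observation
rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
y = rng.standard_normal(10)
Q, R0 = np.linalg.qr(X)
xnew = rng.standard_normal(3)
ynew = 0.7
R1, c1 = add_observation(R0, Q.T @ y, xnew, ynew)
beta = np.linalg.solve(R1, c1)   # estimates after the update
```

Because each rotation is orthogonal, solving the updated triangular system reproduces the least squares fit of the augmented dataset without refactorizing :math:`X`.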
[docs]def linregm_update(q, rss, p, tol=0.000001): r""" ``linregm_update`` calculates the regression parameters for a general linear regression model. It is intended to be called after :meth:`linregm_obs_edit`, :meth:`linregm_var_add` or :meth:`linregm_var_del`. .. _g02dd-py2-py-doc: For full information please refer to the NAG Library document for g02dd https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ddf.html .. _g02dd-py2-py-parameters: **Parameters** **q** : float, array-like, shape :math:`\left(:, \textit{ip}+1\right)` Note: the required extent for this argument in dimension 1 is determined as follows: if :math:`\mathrm{rss}\leq 0.0`: :math:`n`; otherwise: :math:`\textit{ip}`. Must be the array :math:`\mathrm{q}` as output by :meth:`linregm_obs_edit`, :meth:`linregm_var_add`, :meth:`linregm_var_del` or :meth:`linregm_fit_onestep`. If on entry :math:`\mathrm{rss}\leq 0.0` then all :math:`\textit{n}` elements of :math:`c` are needed. This is provided by functions :meth:`linregm_var_add`, :meth:`linregm_var_del` or :meth:`linregm_fit_onestep`. **rss** : float Either the residual sum of squares or a value less than or equal to :math:`0.0` to indicate that the residual sum of squares is to be calculated by the function. **p** : float, array-like, shape :math:`\left(\textit{ip}\times \textit{ip}+2\times \textit{ip}\right)` Must be array :math:`\mathrm{p}` as output by :meth:`linregm_fit`, :meth:`linregm_var_add` or :meth:`linregm_fit_onestep`, or a previous call to ``linregm_update``. **tol** : float, optional The value of :math:`\mathrm{tol}` is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of :math:`\mathrm{tol}` the stricter the criterion for selecting the singular value decomposition. 
If :math:`\mathrm{tol} = 0.0`, the singular value decomposition will never be used; this may cause run time errors or inaccuracies if the independent variables are not of full rank. **Returns** **rss** : float If :math:`\mathrm{rss}\leq 0.0` on entry, then on exit :math:`\mathrm{rss}` will contain the residual sum of squares as calculated by ``linregm_update``. If :math:`\mathrm{rss}` was positive on entry, it will be unchanged. **idf** : int The degrees of freedom associated with the residual sum of squares. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The estimates of the :math:`p` parameters, :math:`\hat{\beta }`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard errors of the :math:`p` parameters given in :math:`\mathrm{b}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`p` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[j\times \left(j-1\right)/2+i-1]`. **svd** : bool If a singular value decomposition has been performed, :math:`\mathrm{svd} = \mathbf{True}`, otherwise :math:`\mathrm{svd} = \mathbf{False}`. **irank** : int The rank of the independent variables. If :math:`\mathrm{svd} = \mathbf{False}`, :math:`\mathrm{irank} = \textit{ip}`. If :math:`\mathrm{svd} = \mathbf{True}`, :math:`\mathrm{irank}` is an estimate of the rank of the independent variables. :math:`\mathrm{irank}` is calculated as the number of singular values greater than :math:`\mathrm{tol}\times \text{}` (largest singular value). It is possible for the SVD to be carried out but :math:`\mathrm{irank}` to be returned as :math:`\textit{ip}`. 
**p** : float, ndarray, shape :math:`\left(\textit{ip}\times \textit{ip}+2\times \textit{ip}\right)` Contains details of the singular value decomposition if used. If :math:`\mathrm{svd} = \mathbf{False}`, :math:`\mathrm{p}` is not referenced. If :math:`\mathrm{svd} = \mathbf{True}`, the first :math:`\textit{ip}` elements of :math:`\mathrm{p}` are unchanged, the next :math:`\textit{ip}` values contain the singular values. The following :math:`\textit{ip}\times \textit{ip}` values contain the matrix :math:`P^*` stored by columns. .. _g02dd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{ldq} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{rss}\leq 0.0`, :math:`\textit{ldq} \geq n`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`2`) The degrees of freedom for error are less than or equal to :math:`0`. In this case the estimates of :math:`\beta` are returned but not the standard errors or covariances. (`errno` :math:`3`) SVD solution failed to converge. .. _g02dd-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A general linear regression model fitted by :meth:`linregm_fit` may be adjusted by adding or deleting an observation using :meth:`linregm_obs_edit`, adding a new independent variable using :meth:`linregm_var_add` or deleting an existing independent variable using :meth:`linregm_var_del`. 
Alternatively a model may be constructed by a forward selection procedure using :meth:`linregm_fit_onestep`. These functions compute the vector :math:`c` and the upper triangular matrix :math:`R`. ``linregm_update`` takes these basic results and computes the regression coefficients, :math:`\hat{\beta }`, their standard errors and their variance-covariance matrix. If :math:`R` is of full rank, then :math:`\hat{\beta }` is the solution to .. math:: R\hat{\beta } = c_1\text{,} where :math:`c_1` is the first :math:`p` elements of :math:`c`. If :math:`R` is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`, .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R`, and :math:`Q_*` and :math:`P` are :math:`p\times p` orthogonal matrices. This gives the solution .. math:: \hat{\beta } = P_1D^{-1}Q_{*_1}^\mathrm{T}c_1\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`, and :math:`Q_{*_1}` being the first :math:`k` columns of :math:`Q_*`. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} This will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by calling :meth:`linregm_constrain` after calling ``linregm_update``. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions. These can be estimated using :meth:`linregm_estfunc`. The residual sum of squares required to calculate the standard errors and the variance-covariance matrix can either be input or can be calculated if additional information on :math:`c` for the whole sample is provided. .. 
_g02dd-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
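The rank-deficient branch described in the Notes can be sketched with NumPy's SVD standing in for the decomposition of :math:`R` that the library holds internally (a hedged illustration, not ``linregm_update`` itself):

```python
import numpy as np

# Sketch of the SVD solution in the Notes, not NAG's code: when R is
# rank-deficient, beta = P1 D^{-1} Q*1^T c1 is the minimum-norm solution.
rng = np.random.default_rng(3)
p = 4
R = np.triu(rng.standard_normal((p, p)))
R[p - 1, :] = 0.0                  # force a rank-deficient R (rank p - 1)
c1 = rng.standard_normal(p)

U, s, Vt = np.linalg.svd(R)        # R = U diag(s) Vt; U, Vt play Q*, P^T
k = int(np.sum(s > 1e-10 * s[0]))  # numerical rank, cf. the tol argument
beta = Vt[:k].T @ ((U[:, :k].T @ c1) / s[:k])  # P1 D^{-1} Q*1^T c1
```

Any vector in the null space of :math:`R` could be added to `beta` without changing the fit, which is why only estimable functions of the parameters have unique estimates.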
[docs]def linregm_var_add(q, p, x, wt=None, tol=0.000001): r""" ``linregm_var_add`` adds a new independent variable to a general linear regression model. .. _g02de-py2-py-doc: For full information please refer to the NAG Library document for g02de https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02def.html .. _g02de-py2-py-parameters: **Parameters** **q** : float, array-like, shape :math:`\left(n, \textit{ip}+2\right)` If :math:`\textit{ip}\neq 0`, :math:`\mathrm{q}` must contain the results of the :math:`QR` decomposition for the model with :math:`p` parameters as returned by :meth:`linregm_fit` or a previous call to ``linregm_var_add``. If :math:`\textit{ip} = 0`, the first column of :math:`\mathrm{q}` should contain the :math:`n` values of the dependent variable, :math:`y`. **p** : float, array-like, shape :math:`\left(\textit{ip}+1\right)` Contains further details of the :math:`QR` decomposition used. The first :math:`\textit{ip}` elements of :math:`\mathrm{p}` must contain the zeta values for the :math:`QR` decomposition (see :meth:`lapackeig.dgeqrf <naginterfaces.library.lapackeig.dgeqrf>` for details). The first :math:`\textit{ip}` elements of array :math:`\mathrm{p}` are provided by :meth:`linregm_fit` or by previous calls to ``linregm_var_add``. **x** : float, array-like, shape :math:`\left(n\right)` :math:`x`, the new independent variable. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. 
**tol** : float, optional The value of :math:`\mathrm{tol}` is used to decide if the new independent variable is linearly related to independent variables already included in the model. If the new variable is linearly related then :math:`c` is not updated. The smaller the value of :math:`\mathrm{tol}` the stricter the criterion for deciding if there is a linear relationship. **Returns** **q** : float, ndarray, shape :math:`\left(n, \textit{ip}+2\right)` The results of the :math:`QR` decomposition for the model with :math:`p+1` parameters: the first column of :math:`\mathrm{q}` contains the updated value of :math:`c`; the columns :math:`2` to :math:`\textit{ip}+1` are unchanged; the first :math:`\textit{ip}+1` elements of column :math:`\textit{ip}+2` contain the new column of :math:`R`, while the remaining :math:`n-\textit{ip}-1` elements contain details of the matrix :math:`Q_{{p+1}}`. **p** : float, ndarray, shape :math:`\left(\textit{ip}+1\right)` The first :math:`\textit{ip}` elements of :math:`\mathrm{p}` are unchanged and the :math:`\left(\textit{ip}+1\right)`\ th element contains the zeta value for :math:`Q_{{p+1}}`. **rss** : float The residual sum of squares for the new fitted model. **Note:** this will only be valid if the model is of full rank, see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02def.html#fcomments>`__. .. _g02de-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} < n`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. 
(`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`, for :math:`i = 1,2,\ldots,n`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`3`) :math:`\mathrm{x}` variable is a linear combination of existing model terms. .. _g02de-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A linear regression model may be built up by adding new independent variables to an existing model. ``linregm_var_add`` updates the :math:`QR` decomposition used in the computation of the linear regression model. The :math:`QR` decomposition may come from :meth:`linregm_fit` or a previous call to ``linregm_var_add``. The general linear regression model is defined by .. math:: y = X\beta +\epsilon \text{,}

.. rst-class:: nag-rules-none nag-align-left

    +-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |where|:math:`y` is a vector of :math:`n` observations on the dependent variable,                                                                                                               |
    +-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |     |:math:`X` is an :math:`n\times p` matrix of the independent variables of column rank :math:`k`,                                                                                          |
    +-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |     |:math:`\beta` is a vector of length :math:`p` of unknown parameters,                                                                                                                      |
    +-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |and  |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors such that :math:`\mathrm{var}\left(\epsilon \right) = V\sigma^2`, where :math:`V` is a known diagonal matrix.|
    +-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

If :math:`V = I`, the identity matrix, then least squares estimation is used. If :math:`V\neq I`, then for a given weight matrix :math:`W\propto V^{-1}`, weighted least squares estimation is used. The least squares estimates, :math:`\hat{\beta }`, of the parameters :math:`\beta` minimize :math:`\left(y-X\beta \right)^\mathrm{T}\left(y-X\beta \right)` while the weighted least squares estimates minimize :math:`\left(y-X\beta \right)^\mathrm{T}W\left(y-X\beta \right)`. 
The parameter estimates may be found by computing a :math:`QR` decomposition of :math:`X` (or :math:`W^{{\frac{1}{2}}}X` in the weighted case), i.e., .. math:: X = QR^*\quad \text{ }\quad \left(\text{or }\quad W^{{\frac{1}{2}}}X = QR^*\right)\text{,} where :math:`R^* = \begin{pmatrix}R\\0\end{pmatrix}` and :math:`R` is a :math:`p\times p` upper triangular matrix and :math:`Q` is an :math:`n\times n` orthogonal matrix. If :math:`R` is of full rank, then :math:`\hat{\beta }` is the solution to .. math:: R\hat{\beta } = c_1\text{,} where :math:`c = Q^\mathrm{T}y` (or :math:`Q^\mathrm{T}W^{{\frac{1}{2}}}y`) and :math:`c_1` is the first :math:`p` elements of :math:`c`. If :math:`R` is not of full rank, a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`. To add a new independent variable, :math:`x_{{p+1}}`, :math:`R` and :math:`c` have to be updated. The matrix :math:`Q_{{p+1}}` is found such that :math:`Q_{{p+1}}^\mathrm{T}\left[R:Q^\mathrm{T}x_{{p+1}}\right]` (or :math:`Q_{{p+1}}^\mathrm{T}\left[R:Q^\mathrm{T}W^{{\frac{1}{2}}}x_{{p+1}}\right]`) is upper triangular. The vector :math:`c` is then updated by multiplying by :math:`Q_{{p+1}}^\mathrm{T}`. The new independent variable is tested to see if it is linearly related to the existing independent variables by checking that at least one of the values :math:`\left(Q^\mathrm{T}x_{{p+1}}\right)_{\textit{i}}`, for :math:`\textit{i} = p+2,\ldots,n`, is nonzero. The new parameter estimates, :math:`\hat{\beta }`, can then be obtained by a call to :meth:`linregm_update`. The function can be used with :math:`p = 0`, in which case :math:`R` and :math:`c` are initialized. .. _g02de-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
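The update in the Notes above can be sketched with ``numpy``. This is a hypothetical illustration of the underlying algebra only (random data and an assumed tolerance), not the NAG implementation: the trailing elements of :math:`Q^\mathrm{T}x_{p+1}` decide whether the new variable is linearly related to the existing columns, and the updated fit agrees with a fresh fit of the augmented matrix.

```python
import numpy as np

# Hypothetical sketch of the algebra behind adding a variable to a
# QR-based regression fit; not the NAG implementation.
rng = np.random.default_rng(0)
n, p = 8, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

Q, R = np.linalg.qr(X, mode='complete')  # full n x n Q
c = Q.T @ y                              # c = Q^T y
rss_old = float(c[p:] @ c[p:])           # current residual sum of squares

x_new = rng.standard_normal(n)
t = Q.T @ x_new                          # Q^T x_{p+1}
# If t[p:] is (numerically) zero, x_new is a linear combination of the
# existing columns and R and c must not be updated.
tol = 1e-6                               # assumed tolerance
dependent = np.linalg.norm(t[p:]) <= tol * np.linalg.norm(t)

# The updated fit agrees with a fresh fit of the augmented matrix;
# its residual sum of squares can only decrease.
X_aug = np.column_stack([X, x_new])
_, rss_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
```

The routine itself performs this update with orthogonal rotations rather than a refactorization, which is what makes repeated model building cheap.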
[docs]def linregm_var_del(q, indx, rss): r""" ``linregm_var_del`` deletes an independent variable from a general linear regression model. .. _g02df-py2-py-doc: For full information please refer to the NAG Library document for g02df https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02dff.html .. _g02df-py2-py-parameters: **Parameters** **q** : float, array-like, shape :math:`\left(\textit{ip}, \textit{ip}+1\right)` The results of the :math:`QR` decomposition as returned by functions :meth:`linregm_fit`, :meth:`linregm_obs_edit`, :meth:`linregm_var_add` or :meth:`linregm_fit_onestep`, or previous calls to ``linregm_var_del``. **indx** : int Indicates which independent variable is to be deleted from the model. **rss** : float The residual sum of squares for the full regression. **Returns** **q** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{ip}+1\right)` The updated :math:`QR` decomposition. **rss** : float The residual sum of squares with the (:math:`\mathrm{indx}`)th variable removed. Note that the residual sum of squares will only be valid if the regression is of full rank, otherwise the residual sum of squares should be obtained using :meth:`linregm_update`. .. _g02df-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{rss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rss}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{indx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{indx} \geq 1` and :math:`\mathrm{indx} \leq \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{q}[\langle\mathit{\boldsymbol{value}}\rangle,\langle\mathit{\boldsymbol{value}}\rangle] = 0.0`. .. _g02df-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. 
Please contact NAG if you have any questions about compatibility.` When selecting a linear regression model it is sometimes useful to drop independent variables from the model and to examine the resulting sub-model. ``linregm_var_del`` updates the :math:`QR` decomposition used in the computation of the linear regression model. The :math:`QR` decomposition may come from :meth:`linregm_fit` or :meth:`linregm_var_add`, or a previous call to ``linregm_var_del``. For the general linear regression model with :math:`p` independent variables fitted, :meth:`linregm_fit` or :meth:`linregm_var_add` computes a :math:`QR` decomposition of the (weighted) independent variables and forms an upper triangular matrix :math:`R` and a vector :math:`c`. To remove an independent variable, :math:`R` and :math:`c` have to be updated. The column of :math:`R` corresponding to the variable to be dropped is removed and the matrix is then restored to upper triangular form by applying a series of Givens rotations. The rotations are then applied to :math:`c`. Note that only the first :math:`p` elements of :math:`c` are affected. The method used means that, while the updated values of :math:`R` and :math:`c` are computed, an updated value of :math:`Q` from the :math:`QR` decomposition is not available, so a call to :meth:`linregm_var_add` cannot be made after a call to ``linregm_var_del``. :meth:`linregm_update` can be used to calculate the parameter estimates, :math:`\hat{\beta }`, from the information provided by ``linregm_var_del``. .. _g02df-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 """ raise NotImplementedError
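The Givens-rotation deletion described in the Notes above can be sketched with ``numpy``. This is a hypothetical illustration (the data and the dropped index are made up), not the NAG code: removing a column of :math:`R` leaves a matrix with one nonzero subdiagonal, which a short sequence of rotations, also applied to :math:`c`, restores to upper triangular form.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

# Hypothetical sketch; not the NAG implementation.
rng = np.random.default_rng(1)
n, p, k = 6, 4, 1                 # drop the 2nd variable (0-based k)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
Q, R = np.linalg.qr(X)            # reduced factorization: R is p x p
c = Q.T @ y

Rd = np.delete(R, k, axis=1)      # R with column k removed: p x (p-1)
cd = c.copy()
for i in range(k, p - 1):         # re-triangularize with Givens rotations
    cs, sn = givens(Rd[i, i], Rd[i + 1, i])
    G = np.array([[cs, sn], [-sn, cs]])
    Rd[i:i + 2, :] = G @ Rd[i:i + 2, :]
    cd[i:i + 2] = G @ cd[i:i + 2] # the rotations are applied to c too
Rd = Rd[:p - 1, :]                # new (p-1) x (p-1) upper triangular R
beta = np.linalg.solve(Rd, cd[:p - 1])

# Agrees with refitting the reduced model from scratch.
beta_ref, *_ = np.linalg.lstsq(np.delete(X, k, axis=1), y, rcond=None)
```

Note that, exactly as the text says, this update never forms the new :math:`Q`; only :math:`R` and :math:`c` are carried forward.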
[docs]def linregm_fit_newvar(rss, irank, cov, q, svd, p, y, wk, wt=None): r""" ``linregm_fit_newvar`` calculates the estimates of the parameters of a general linear regression model for a new dependent variable after a call to :meth:`linregm_fit`. .. _g02dg-py2-py-doc: For full information please refer to the NAG Library document for g02dg https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02dgf.html .. _g02dg-py2-py-parameters: **Parameters** **rss** : float The residual sum of squares for the original dependent variable. **irank** : int The rank of the independent variables, as given by :meth:`linregm_fit`. **cov** : float, array-like, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The covariance matrix of the parameter estimates as given by :meth:`linregm_fit`. **q** : float, array-like, shape :math:`\left(n, \textit{ip}+1\right)` The results of the :math:`QR` decomposition as returned by :meth:`linregm_fit`. **svd** : bool Indicates if a singular value decomposition was used by :meth:`linregm_fit`. :math:`\mathrm{svd} = \mathbf{True}` A singular value decomposition was used by :meth:`linregm_fit`. :math:`\mathrm{svd} = \mathbf{False}` A singular value decomposition was not used by :meth:`linregm_fit`. **p** : float, array-like, shape :math:`\left(:\right)` Note: the required length for this argument is determined as follows: if :math:`\mathrm{not}\left(\mathrm{svd}\right)`: :math:`\textit{ip}`; otherwise: :math:`{ \textit{ip} \times \textit{ip} + 2 \times \textit{ip} }`. Details of the :math:`QR` decomposition and SVD, if used, as returned in array :math:`\mathrm{p}` by :meth:`linregm_fit`. If :math:`\mathrm{svd} = \mathbf{False}`, only the first :math:`\textit{ip}` elements of :math:`\mathrm{p}` are used; these contain the zeta values for the :math:`QR` decomposition (see :meth:`lapackeig.dgeqrf <naginterfaces.library.lapackeig.dgeqrf>` for details). 
If :math:`\mathrm{svd} = \mathbf{True}`, the first :math:`\textit{ip}` elements of :math:`\mathrm{p}` contain the zeta values for the :math:`QR` decomposition (see :meth:`lapackeig.dgeqrf <naginterfaces.library.lapackeig.dgeqrf>` for details) and the next :math:`\textit{ip}\times \textit{ip}+\textit{ip}` elements of :math:`\mathrm{p}` contain details of the singular value decomposition. **y** : float, array-like, shape :math:`\left(n\right)` The new dependent variable, :math:`y_{\text{new}}`. **wk** : float, array-like, shape :math:`\left(5\times \left(\textit{ip}-1\right)+\textit{ip}\times \textit{ip}\right)` If :math:`\mathrm{svd} = \mathbf{True}`, :math:`\mathrm{wk}` must be unaltered from the previous call to :meth:`linregm_fit` or ``linregm_fit_newvar``. If :math:`\mathrm{svd} = \mathbf{False}`, :math:`\mathrm{wk}` is used as workspace. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. **Returns** **rss** : float The residual sum of squares for the new dependent variable. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. 
**q** : float, ndarray, shape :math:`\left(n, \textit{ip}+1\right)` The first column of :math:`\mathrm{q}` contains the new values of :math:`c`, the remainder of :math:`\mathrm{q}` will be unchanged. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The least squares estimates of the parameters of the regression model, :math:`\hat{\beta }`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard error of the estimates of the parameters. **res** : float, ndarray, shape :math:`\left(n\right)` The residuals for the new regression model. .. _g02dg-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{irank} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{svd} = \mathbf{False}`, :math:`\mathrm{irank} = \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{svd} = \mathbf{True}`, :math:`\mathrm{irank} \leq \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{rss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rss} > 0.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. 
(`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`, for :math:`i = 1,2,\ldots,n`. .. _g02dg-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``linregm_fit_newvar`` uses the results given by :meth:`linregm_fit` to fit the same set of independent variables to a new dependent variable. :meth:`linregm_fit` computes a :math:`QR` decomposition of the matrix of :math:`p` independent variables and also, if the model is not of full rank, a singular value decomposition (SVD). These results can be used to compute estimates of the parameters for a general linear model with a new dependent variable. The :math:`QR` decomposition leads to the formation of an upper triangular :math:`p\times p` matrix :math:`R` and an :math:`n\times n` orthogonal matrix :math:`Q`. In addition the vector :math:`c = Q^\mathrm{T}y` (or :math:`Q^\mathrm{T}W^{{1/2}}y`) is computed. For a new dependent variable, :math:`y_{\mathrm{new}}`, ``linregm_fit_newvar`` computes a new value of :math:`c = Q^\mathrm{T}y_{\text{new}}` or :math:`Q^\mathrm{T}W^{{1/2}}y_{\text{new}}`. If :math:`R` is of full rank, then the least squares parameter estimates, :math:`\hat{\beta }`, are the solution to .. math:: R\hat{\beta } = c_1\text{,} where :math:`c_1` is the first :math:`p` elements of :math:`c`. If :math:`R` is not of full rank, then :meth:`linregm_fit` will have computed an SVD of :math:`R`, .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R`, and :math:`Q_*` and :math:`P` are :math:`p\times p` orthogonal matrices. This gives the solution .. math:: \hat{\beta } = P_1D^{-1}Q_{*_1}^\mathrm{T}c_1\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`, and :math:`Q_{*_1}` being the first :math:`k` columns of :math:`Q_*`. Details of the SVD are made available by :meth:`linregm_fit` in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} The matrix :math:`Q_*` is made available through the workspace of :meth:`linregm_fit`. In addition to parameter estimates, the new residuals are computed and the variance-covariance matrix of the parameter estimates is found by scaling the variance-covariance matrix for the original regression. .. _g02dg-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
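The reuse of the factorization can be sketched as follows; this is a hypothetical full-rank illustration with random data, not the NAG routine. Once :math:`X = QR` has been formed, fitting a new dependent variable needs only a matrix-vector product and a triangular solve.

```python
import numpy as np

# Hypothetical sketch: once X = QR has been factorized, fitting a new
# dependent variable needs no refactorization. Not the NAG routine.
rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.standard_normal((n, p))
Q, R = np.linalg.qr(X, mode='complete')  # factorized once

y_new = rng.standard_normal(n)
c = Q.T @ y_new                          # new c = Q^T y_new
beta = np.linalg.solve(R[:p], c[:p])     # solve R beta = c_1 (full rank)
rss = float(c[p:] @ c[p:])               # residual sum of squares
res = y_new - X @ beta                   # residuals for the new model
```

The triangular solve costs :math:`O\left(p^2\right)` against :math:`O\left(np^2\right)` for a fresh factorization, which is the point of the routine.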
[docs]def linregm_constrain(p, c, b, rss, idf): r""" ``linregm_constrain`` calculates the estimates of the parameters of a general linear regression model for given constraints from the singular value decomposition results. .. _g02dk-py2-py-doc: For full information please refer to the NAG Library document for g02dk https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02dkf.html .. _g02dk-py2-py-parameters: **Parameters** **p** : float, array-like, shape :math:`\left(\textit{ip}\times \textit{ip}+2\times \textit{ip}\right)` As returned by :meth:`linregm_fit` and :meth:`linregm_update`. **c** : float, array-like, shape :math:`\left(\textit{ip}, \textit{iconst}\right)` The :math:`\textit{iconst}` constraints stored by column, i.e., the :math:`i`\ th constraint is stored in the :math:`i`\ th column of :math:`\mathrm{c}`. **b** : float, array-like, shape :math:`\left(\textit{ip}\right)` The parameter estimates computed by using the singular value decomposition, :math:`\hat{\beta }_{\text{svd}}`. **rss** : float The residual sum of squares as returned by :meth:`linregm_fit` or :meth:`linregm_update`. **idf** : int The degrees of freedom associated with the residual sum of squares as returned by :meth:`linregm_fit` or :meth:`linregm_update`. **Returns** **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The parameter estimates of the parameters with the constraints imposed, :math:`\hat{\beta }_{\mathrm{c}}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard error of the parameter estimates in :math:`\mathrm{b}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. 
They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. .. _g02dk-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{idf} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{idf} > 0`. (`errno` :math:`1`) On entry, :math:`\textit{iconst} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{iconst} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{rss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rss} > 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{iconst} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{iconst} < \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`2`) :math:`\mathrm{c}` does not give a model of full rank. .. _g02dk-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``linregm_constrain`` computes the estimates given a set of linear constraints for a general linear regression model which is not of full rank. It is intended for use after a call to :meth:`linregm_fit` or :meth:`linregm_update`. In the case of a model not of full rank the functions use a singular value decomposition (SVD) to find the parameter estimates, :math:`\hat{\beta }_{\text{svd}}`, and their variance-covariance matrix. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{,} as described by :meth:`linregm_fit` and :meth:`linregm_update`. Alternative solutions can be formed by imposing constraints on the parameters. If there are :math:`p` parameters and the rank of the model is :math:`k`, then :math:`n_c = p-k` constraints will have to be imposed to obtain a unique solution. Let :math:`C` be a :math:`p\times n_c` matrix of constraints, such that .. math:: C^{\mathrm{T}}\beta = 0 then the new parameter estimates :math:`\hat{\beta }_c` are given by .. math:: \begin{array}{cc}\hat{\beta }_c& = A\hat{\beta }_{\mathrm{svd}}\text{;}\\\\& = \left(I-P_0\left(C^\mathrm{T}P_0\right)^{-1}C^\mathrm{T}\right)\hat{\beta }_{\mathrm{svd}}\text{,}\end{array} where :math:`I` is the identity matrix, and the variance-covariance matrix is given by .. math:: AP_1D^{-2}P_1^{\mathrm{T}}A^{\mathrm{T}}\text{,} provided :math:`\left(C^{\mathrm{T}}P_0\right)^{-1}` exists. .. _g02dk-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
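The constrained-estimation algebra can be illustrated with a small rank-deficient design; this is a hypothetical sketch (an assumed one-way layout with an intercept and two exhaustive dummy variables), not the NAG implementation. Imposing :math:`C^\mathrm{T}\beta = 0` via :math:`\hat{\beta }_c = \left(I-P_0\left(C^\mathrm{T}P_0\right)^{-1}C^\mathrm{T}\right)\hat{\beta }_{\mathrm{svd}}` leaves the fitted values unchanged, since :math:`XP_0 = 0`.

```python
import numpy as np

# Hypothetical sketch of constrained estimation for a rank-deficient
# model; not the NAG implementation. X has an intercept plus two group
# dummies that sum to the intercept, so rank k = p - 1 and one
# constraint is needed for a unique solution.
g = np.array([0, 0, 1, 1, 0, 1])
X = np.column_stack([np.ones(6), g == 0, g == 1]).astype(float)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))    # rank of the model
P0 = Vt[k:].T                        # basis of the null space of X
beta_svd = np.linalg.pinv(X) @ y     # minimum-norm (SVD) solution

C = np.array([[0.0], [1.0], [1.0]])  # constraint: beta_1 + beta_2 = 0
# beta_c = (I - P0 (C^T P0)^{-1} C^T) beta_svd
A = np.eye(3) - P0 @ np.linalg.inv(C.T @ P0) @ C.T
beta_c = A @ beta_svd                # constrained estimates
```

``beta_c`` satisfies the constraint exactly while producing the same fitted values as ``beta_svd``; different choices of :math:`C` pick out different members of the solution family.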
[docs]def linregm_estfunc(irank, b, cov, p, f, tol=0.0): r""" ``linregm_estfunc`` gives the estimate of an estimable function along with its standard error. .. _g02dn-py2-py-doc: For full information please refer to the NAG Library document for g02dn https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02dnf.html .. _g02dn-py2-py-parameters: **Parameters** **irank** : int :math:`k`, the rank of the independent variables. **b** : float, array-like, shape :math:`\left(\textit{ip}\right)` The :math:`\textit{ip}` values of the estimates of the parameters of the model, :math:`\hat{\beta }`. **cov** : float, array-like, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. **p** : float, array-like, shape :math:`\left(\textit{ip}\times \textit{ip}+2\times \textit{ip}\right)` As returned by :meth:`linregm_fit` and :meth:`linregm_update`. **f** : float, array-like, shape :math:`\left(\textit{ip}\right)` :math:`f`, the linear function to be estimated. **tol** : float, optional :math:`\eta`, the tolerance value used in the check for estimability. If :math:`\mathrm{tol}\leq 0.0` then :math:`\sqrt{\epsilon }`, where :math:`\epsilon` is the machine precision, is used instead. **Returns** **est** : bool Indicates if the function was estimable. :math:`\mathrm{est} = \mathbf{True}` The function is estimable. :math:`\mathrm{est} = \mathbf{False}` The function is not estimable and :math:`\mathrm{stat}`, :math:`\mathrm{sestat}` and :math:`\mathrm{t}` are not set. 
**stat** : float If :math:`\mathrm{est} = \mathbf{True}`, :math:`\mathrm{stat}` contains the estimate of the function, :math:`f^\mathrm{T}\hat{\beta }`. **sestat** : float If :math:`\mathrm{est} = \mathbf{True}`, :math:`\mathrm{sestat}` contains the standard error of the estimate of the function, :math:`\mathrm{se}\left(F\right)`. **t** : float If :math:`\mathrm{est} = \mathbf{True}`, :math:`\mathrm{t}` contains the :math:`t`-statistic for the test of the function being equal to zero. .. _g02dn-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{irank} \leq \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{irank} \geq 1`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`3`) Standard error of statistic :math:`\text{} = 0.0`; this may be due to rounding errors if the standard error is very small or due to mis-specified inputs :math:`\mathrm{cov}` and :math:`\mathrm{f}`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`2`) On entry, :math:`\mathrm{irank} = \textit{ip}`, i.e., model of full rank. In this case :math:`\mathrm{est}` is returned as true and all statistics are calculated. .. _g02dn-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``linregm_estfunc`` computes the estimates of an estimable function for a general linear regression model which is not of full rank. It is intended for use after a call to :meth:`linregm_fit` or :meth:`linregm_update`. An estimable function is a linear combination of the parameters such that it has a unique estimate. 
For a full rank model all linear combinations of parameters are estimable. In the case of a model not of full rank the functions use a singular value decomposition (SVD) to find the parameter estimates, :math:`\hat{\beta }`, and their variance-covariance matrix. Given the upper triangular matrix :math:`R` obtained from the :math:`QR` decomposition of the independent variables the SVD gives .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R`, and :math:`Q_*` and :math:`P` are :math:`p\times p` orthogonal matrices. This gives the solution .. math:: \hat{\beta } = P_1D^{-1}Q_{*_1}^\mathrm{T}c_1\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`, :math:`Q_{*_1}` being the first :math:`k` columns of :math:`Q_*`, and :math:`c_1` being the first :math:`p` elements of :math:`c`. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{,} as given by :meth:`linregm_fit` and :meth:`linregm_update`. A linear function of the parameters, :math:`F = f^\mathrm{T}\beta`, can be tested to see if it is estimable by computing :math:`\zeta = P_0^\mathrm{T}f`. If :math:`\zeta` is zero, then the function is estimable; if not, the function is not estimable. In practice :math:`\left\lvert \zeta \right\rvert` is tested against some small quantity :math:`\eta`. Given that :math:`F` is estimable it can be estimated by :math:`f^\mathrm{T}\hat{\beta }` and its standard error calculated from the variance-covariance matrix of :math:`\hat{\beta }`, :math:`C_{\beta }`, as .. math:: \mathrm{se}\left(F\right) = \sqrt{f^\mathrm{T}C_{\beta }f}\text{.} Also a :math:`t`-statistic, .. math:: t = \frac{{f^\mathrm{T}\hat{\beta }}}{{\mathrm{se}\left(F\right)}}\text{,} can be computed. 
The :math:`t`-statistic will have a Student's :math:`t`-distribution with degrees of freedom as given by the degrees of freedom for the residual sum of squares for the model. .. _g02dn-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore Hammarling, S, 1985, `The singular value decomposition in multivariate statistics`, SIGNUM Newsl. (20(3)), 2--25 Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
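The estimability test :math:`\zeta = P_0^\mathrm{T}f` can be sketched with a small rank-deficient design; this is a hypothetical illustration (an assumed intercept-plus-dummies layout and the default tolerance), not the NAG routine.

```python
import numpy as np

# Hypothetical sketch of the estimability check; not the NAG routine.
# Rank-deficient design: an intercept plus two exhaustive group dummies.
g = np.array([0, 0, 1, 1, 0, 1])
X = np.column_stack([np.ones(6), g == 0, g == 1]).astype(float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))   # rank of the model
P0 = Vt[k:].T                       # null-space basis of X

eta = np.sqrt(np.finfo(float).eps)  # default tolerance, sqrt(machine eps)

def estimable(f):
    """F = f^T beta is estimable iff zeta = P0^T f is (numerically) zero."""
    return np.linalg.norm(P0.T @ f) <= eta * np.linalg.norm(f)

# The group difference beta_1 - beta_2 is estimable; the intercept alone
# is not, because it is confounded with the two dummies.
diff_ok = estimable(np.array([0.0, 1.0, -1.0]))
mean_ok = estimable(np.array([1.0, 0.0, 0.0]))
```

For an estimable :math:`f`, the estimate :math:`f^\mathrm{T}\hat{\beta }` and its standard error then follow directly from the formulae above.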
[docs]def linregm_rssq(x, vname, isx, y, mean='M', wt=None): r""" ``linregm_rssq`` calculates the residual sums of squares for all possible linear regressions for a given set of independent variables. .. _g02ea-py2-py-doc: For full information please refer to the NAG Library document for g02ea https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02eaf.html .. _g02ea-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **vname** : str, array-like, shape :math:`\left(m\right)` :math:`\mathrm{vname}[\textit{j}-1]` must contain the name of the variable in column :math:`\textit{j}` of :math:`\mathrm{x}`, for :math:`\textit{j} = 1,2,\ldots,m`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be considered in the model. :math:`\mathrm{isx}[j-1]\geq 2` The variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in all regression models, i.e., is a forced variable. :math:`\mathrm{isx}[j-1] = 1` The variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the set from which the regression models are chosen, i.e., is a free variable. :math:`\mathrm{isx}[j-1] = 0` The variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is not included in the models. **y** : float, array-like, shape :math:`\left(n\right)` :math:`\mathrm{y}[\textit{i}-1]` must contain the :math:`\textit{i}`\ th observation on the dependent variable, :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. 
:math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. **Returns** **nmod** : int The total number of models for which residual sums of squares have been calculated. **modl** : str, ndarray, shape :math:`\left(\max\left(2^{\textit{k}},m\right), m\right)` The first :math:`\mathrm{nterms}[i-1]` elements of the :math:`i`\ th row of :math:`\mathrm{modl}` contain the names of the independent variables, as given in :math:`\mathrm{vname}`, that are included in the :math:`i`\ th model. **rss** : float, ndarray, shape :math:`\left(\max\left(2^{\textit{k}},m\right)\right)` :math:`\mathrm{rss}[\textit{i}-1]` contains the residual sum of squares for the :math:`\textit{i}`\ th model, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nmod}`. **nterms** : int, ndarray, shape :math:`\left(\max\left(2^{\textit{k}},m\right)\right)` :math:`\mathrm{nterms}[\textit{i}-1]` contains the number of independent variables in the :math:`\textit{i}`\ th model, not including the mean if one is fitted, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nmod}`. **mrank** : int, ndarray, shape :math:`\left(\max\left(2^{\textit{k}},m\right)\right)` :math:`\mathrm{mrank}[i-1]` contains the rank of the residual sum of squares for the :math:`i`\ th model. .. _g02ea-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ldmodl} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldmodl} \geq m`. 
(`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 2`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`3`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[i-1] \geq 0`, for :math:`i = 1,2,\ldots,m`. (`errno` :math:`3`) There are no free variables, i.e., no element of :math:`\mathrm{isx} = 1`. (`errno` :math:`4`) On entry, :math:`\textit{ldmodl} = \langle\mathit{\boldsymbol{value}}\rangle` and number of possible models is :math:`\langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldmodl}\geq` the number of possible models. (`errno` :math:`5`) On entry, the number of independent variables to be considered (forced plus free plus mean if included) is greater than or equal to the effective number of observations. (`errno` :math:`6`) The full model is not of full rank, i.e., some of the independent variables may be linear combinations of other independent variables. Variables must be excluded from the model in order to give full rank. .. _g02ea-py2-py-notes: **Notes** For a set of :math:`\textit{k}` possible independent variables there are :math:`2^{\textit{k}}` linear regression models with from zero to :math:`\textit{k}` independent variables in each model.
For example if :math:`\textit{k} = 3` and the variables are :math:`A`, :math:`B` and :math:`C` then the possible models are: (i) null model (#) :math:`A` (#) :math:`B` (#) :math:`C` (#) :math:`A` and :math:`B` (#) :math:`A` and :math:`C` (#) :math:`B` and :math:`C` (#) :math:`A`, :math:`B` and :math:`C`. ``linregm_rssq`` calculates the residual sums of squares from each of the :math:`2^{\textit{k}}` possible models. The method used involves a :math:`QR` decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations and the residual sums of squares computed for each model; see Clark (1981) and Smith and Bremner (1989). The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sums of squares. So the first model will always have the largest residual sum of squares and the :math:`2^{\textit{k}}`\ th will always have the smallest. This aids you in selecting the best possible model from the given set of independent variables. ``linregm_rssq`` allows you to specify some independent variables that must be in the model, the forced variables. The other independent variables from which the possible models are to be formed are the free variables. .. _g02ea-py2-py-references: **References** Clark, M R B, 1981, `A Givens algorithm for moving from one linear model to another without going back to the data`, Appl. Statist. (30), 198--203 Smith, D M and Bremner, J M, 1989, `All possible subset regressions using the` :math:`{QR}` `decomposition`, Comput. Statist. Data Anal. (7), 217--236 Weisberg, S, 1985, `Applied Linear Regression`, Wiley """ raise NotImplementedError
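The eight models above are just the subsets of :math:`\left\{A, B, C\right\}`; a minimal sketch in plain Python (independent of the library) enumerates all :math:`2^{\textit{k}}` candidate models in order of increasing number of terms:

```python
from itertools import combinations

free = ['A', 'B', 'C']  # the k = 3 free variables from the example above

# All 2**k subsets, grouped by increasing number of terms, as in the Notes.
models = [list(subset)
          for nterms in range(len(free) + 1)
          for subset in combinations(free, nterms)]

print(len(models))            # 2**3 = 8 models
print(models[0], models[-1])  # [] (null model) and ['A', 'B', 'C']
```

Within each group of equal size, ``linregm_rssq`` additionally orders the models by decreasing residual sum of squares, which with real data requires fitting each subset.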
[docs]def linregm_rssq_stat(n, sigsq, tss, nterms, rss, mean='M'): r""" ``linregm_rssq_stat`` calculates :math:`R^2` and :math:`C_p`-values from the residual sums of squares for a series of linear regression models. .. _g02ec-py2-py-doc: For full information please refer to the NAG Library document for g02ec https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ecf.html .. _g02ec-py2-py-parameters: **Parameters** **n** : int :math:`n`, the number of observations used in the regression model. **sigsq** : float The best estimate of true variance of the errors, :math:`\hat{\sigma }^2`. **tss** : float The total sum of squares for the regression model. **nterms** : int, array-like, shape :math:`\left(\textit{nmod}\right)` :math:`\mathrm{nterms}[\textit{i}-1]` must contain the number of independent variables (not counting the mean) fitted to the :math:`\textit{i}`\ th model, for :math:`\textit{i} = 1,2,\ldots,\textit{nmod}`. **rss** : float, array-like, shape :math:`\left(\textit{nmod}\right)` :math:`\mathrm{rss}[i-1]` must contain the residual sum of squares for the :math:`i`\ th model. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **Returns** **rsq** : float, ndarray, shape :math:`\left(\textit{nmod}\right)` :math:`\mathrm{rsq}[\textit{i}-1]` contains the :math:`R^2`-value for the :math:`\textit{i}`\ th model, for :math:`\textit{i} = 1,2,\ldots,\textit{nmod}`. **cp** : float, ndarray, shape :math:`\left(\textit{nmod}\right)` :math:`\mathrm{cp}[\textit{i}-1]` contains the :math:`C_p`-value for the :math:`\textit{i}`\ th model, for :math:`\textit{i} = 1,2,\ldots,\textit{nmod}`. .. _g02ec-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`.

(`errno` :math:`1`) On entry, :math:`\mathrm{tss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tss} > 0.0`.

(`errno` :math:`1`) On entry, :math:`\mathrm{sigsq} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{sigsq} > 0.0`.

(`errno` :math:`1`) On entry, :math:`\textit{nmod} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nmod} > 0`.

(`errno` :math:`2`) On entry, the number of parameters, :math:`p`, is :math:`\langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} \geq {2p}`.

(`errno` :math:`3`) On entry, :math:`\mathrm{rss}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{tss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rss}[i] \leq \mathrm{tss}`, for all :math:`i`.

(`errno` :math:`4`) A value of :math:`C_p` is less than :math:`0.0`. This may occur if :math:`\mathrm{sigsq}` is too large or if :math:`\mathrm{rss}`, :math:`\mathrm{n}` or IP are incorrect.

.. _g02ec-py2-py-notes:

**Notes**

When selecting a linear regression model for a set of :math:`n` observations a balance has to be found between the number of independent variables in the model and the fit as measured by the residual sum of squares. The more variables included, the smaller the residual sum of squares will be. Two statistics can help in selecting the best model.

(a) :math:`R^2` represents the proportion of variation in the dependent variable that is explained by the independent variables.

.. math:: R^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}\text{,}

.. rst-class:: nag-rules-none nag-align-left

+-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|where|:math:`\text{Total Sum of Squares} = \mathrm{tss} = \sum \left(y-\bar{y}\right)^2` (if mean is fitted, otherwise :math:`\mathrm{tss} = \sum y^2`) and                              |
+-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|     |:math:`\text{Regression Sum of Squares} = \text{RegSS} = \mathrm{tss}-\mathrm{rss}`, where :math:`\mathrm{rss} = \text{residual sum of squares} = \sum \left(y-\hat{y}\right)^2`.|
+-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The :math:`R^2`-values can be examined to find a model with a high :math:`R^2`-value but with a small number of independent variables.

(#) :math:`C_p` statistic.

.. math:: C_p = \frac{\mathrm{rss}}{{\hat{\sigma }^2}}-\left(n-2p\right)\text{,}

where :math:`p` is the number of parameters (including the mean) in the model and :math:`\hat{\sigma }^2` is an estimate of the true variance of the errors. This can often be obtained from fitting the full model. A well fitting model will have :math:`C_p\simeq p`. :math:`C_p` is often plotted against :math:`p` to see which models are closest to the :math:`C_p = p` line. ``linregm_rssq_stat`` may be called after :meth:`linregm_rssq` which calculates the residual sums of squares for all possible linear regression models.

.. _g02ec-py2-py-references:

**References**

Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley

Weisberg, S, 1985, `Applied Linear Regression`, Wiley """ raise NotImplementedError
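The two formulas in the Notes translate directly into code; the sketch below (plain Python, not a call to the library) computes :math:`R^2` and :math:`C_p` for a single model, with ``nterms`` counting the independent variables but not the mean, as in the parameter descriptions:

```python
def rsq_and_cp(n, sigsq, tss, nterms, rss, mean='M'):
    """R^2 and Mallows' C_p for one model, following the formulas above."""
    p = nterms + 1 if mean == 'M' else nterms  # parameters, including any mean
    rsq = (tss - rss) / tss                    # regression SS over total SS
    cp = rss / sigsq - (n - 2 * p)
    return rsq, cp

# Hypothetical values: 20 observations, error variance estimate 2.0, total
# sum of squares 100, and a 2-variable model (plus mean) with RSS 30.
rsq, cp = rsq_and_cp(n=20, sigsq=2.0, tss=100.0, nterms=2, rss=30.0)
print(rsq, cp)  # 0.7 1.0; a well fitting model would have C_p close to p = 3
```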
[docs]def linregm_fit_onestep(istep, x, vname, isx, y, model, nterm, rss, idf, ifr, free, q, p, mean='M', wt=None, fin=2.0): r""" ``linregm_fit_onestep`` carries out one step of a forward selection procedure in order to enable the 'best' linear regression model to be found. .. _g02ee-py2-py-doc: For full information please refer to the NAG Library document for g02ee https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02eef.html .. _g02ee-py2-py-parameters: **Parameters** **istep** : int Indicates which step in the forward selection process is to be carried out. :math:`\mathrm{istep} = 0` The process is initialized. **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **vname** : str, array-like, shape :math:`\left(m\right)` :math:`\mathrm{vname}[\textit{j}-1]` must contain the name of the independent variable in column :math:`\textit{j}` of :math:`\mathrm{x}`, for :math:`\textit{j} = 1,2,\ldots,m`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables could be considered for inclusion in the regression. :math:`\mathrm{isx}[j-1]\geq 2` The variable contained in the :math:`\textit{j}`\ th column of :math:`\mathrm{x}` is automatically included in the regression model, for :math:`\textit{j} = 1,2,\ldots,m`. :math:`\mathrm{isx}[j-1] = 1` The variable contained in the :math:`\textit{j}`\ th column of :math:`\mathrm{x}` is considered for inclusion in the regression model, for :math:`\textit{j} = 1,2,\ldots,m`. :math:`\mathrm{isx}[j-1] = 0` The variable in the :math:`\textit{j}`\ th column is not considered for inclusion in the model, for :math:`\textit{j} = 1,2,\ldots,m`. **y** : float, array-like, shape :math:`\left(n\right)` The dependent variable. 
**model** : str, array-like, shape :math:`\left(\textit{maxip}\right)` If :math:`\mathrm{istep} = 0`, :math:`\mathrm{model}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{model}` must contain the values returned by the previous call to ``linregm_fit_onestep``. **nterm** : int If :math:`\mathrm{istep} = 0`, :math:`\mathrm{nterm}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{nterm}` must contain the value returned by the previous call to ``linregm_fit_onestep``. **rss** : float If :math:`\mathrm{istep} = 0`, :math:`\mathrm{rss}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{rss}` must contain the value returned by the previous call to ``linregm_fit_onestep``. **idf** : int If :math:`\mathrm{istep} = 0`, :math:`\mathrm{idf}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{idf}` must contain the value returned by the previous call to ``linregm_fit_onestep``. **ifr** : int If :math:`\mathrm{istep} = 0`, :math:`\mathrm{ifr}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{ifr}` must contain the value returned by the previous call to ``linregm_fit_onestep``. **free** : str, array-like, shape :math:`\left(\textit{maxip}\right)` If :math:`\mathrm{istep} = 0`, :math:`\mathrm{free}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{free}` must contain the values returned by the previous call to ``linregm_fit_onestep``. **q** : float, array-like, shape :math:`\left(n, \textit{maxip}+2\right)` If :math:`\mathrm{istep} = 0`, :math:`\mathrm{q}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{q}` must contain the values returned by the previous call to ``linregm_fit_onestep``. **p** : float, array-like, shape :math:`\left(\textit{maxip}+1\right)` If :math:`\mathrm{istep} = 0`, :math:`\mathrm{p}` need not be set. If :math:`\mathrm{istep}\neq 0`, :math:`\mathrm{p}` must contain the values returned by the previous call to ``linregm_fit_onestep``. 
**mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. **fin** : float, optional The critical value of the :math:`F` statistic for the term to be included in the model, :math:`F_{\mathrm{c}}`. **Returns** **istep** : int Is incremented by :math:`1`. **addvar** : bool Indicates if a variable has been added to the model. :math:`\mathrm{addvar} = \mathbf{True}` A variable has been added to the model. :math:`\mathrm{addvar} = \mathbf{False}` No variable had an :math:`F` value greater than :math:`F_{\mathrm{c}}` and none were added to the model. **newvar** : str If :math:`\mathrm{addvar} = \mathbf{True}`, :math:`\mathrm{newvar}` contains the name of the variable added to the model. **chrss** : float If :math:`\mathrm{addvar} = \mathbf{True}`, :math:`\mathrm{chrss}` contains the change in the residual sum of squares due to adding variable :math:`\mathrm{newvar}`. **f** : float If :math:`\mathrm{addvar} = \mathbf{True}`, :math:`\mathrm{f}` contains the :math:`F` statistic for the inclusion of the variable in :math:`\mathrm{newvar}`. **model** : str, ndarray, shape :math:`\left(\textit{maxip}\right)` The names of the variables in the current model. **nterm** : int The number of independent variables in the current model, not including the mean, if any. **rss** : float The residual sums of squares for the current model. 
**idf** : int The degrees of freedom for the residual sum of squares for the current model. **ifr** : int The number of free independent variables, i.e., the number of variables not in the model that are still being considered for selection. **free** : str, ndarray, shape :math:`\left(\textit{maxip}\right)` The first :math:`\mathrm{ifr}` values of :math:`\mathrm{free}` contain the names of the free variables. **exss** : float, ndarray, shape :math:`\left(\textit{maxip}\right)` The first :math:`\mathrm{ifr}` values of :math:`\mathrm{exss}` contain what would be the change in regression sum of squares if the free variables had been added to the model, i.e., the extra sum of squares for the free variables. :math:`\mathrm{exss}[i-1]` contains what would be the change in regression sum of squares if the variable :math:`\mathrm{free}[i-1]` had been added to the model. **q** : float, ndarray, shape :math:`\left(n, \textit{maxip}+2\right)` The results of the :math:`QR` decomposition for the current model: the first column of :math:`\mathrm{q}` contains :math:`c = Q^\mathrm{T}y` (or :math:`Q^\mathrm{T}W^{{\frac{1}{2}}}y` where :math:`W` is the vector of weights if used); the upper triangular part of columns :math:`2` to :math:`\textit{p}+1` contain the :math:`R` matrix; the strictly lower triangular part of columns :math:`2` to :math:`\textit{p}+1` contain details of the :math:`Q` matrix; the remaining :math:`\textit{p}+1` to :math:`\textit{p}+\mathrm{ifr}` columns of :math:`\mathrm{q}` contain :math:`Q^\mathrm{T}X_{\textit{free}}` (or :math:`Q^\mathrm{T}W^{{\frac{1}{2}}}X_{\textit{free}}`), where :math:`\textit{p} = \mathrm{nterm}`, or :math:`\textit{p} = \mathrm{nterm}+1` if :math:`\mathrm{mean} = \texttt{'M'}`.
**p** : float, ndarray, shape :math:`\left(\textit{maxip}+1\right)` The first :math:`\textit{p}` elements of :math:`\mathrm{p}` contain details of the :math:`QR` decomposition, where :math:`\textit{p} = \mathrm{nterm}`, or :math:`\textit{p} = \mathrm{nterm}+1` if :math:`\mathrm{mean} = \texttt{'M'}`. .. _g02ee-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{fin} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{fin}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{rss} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rss} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{istep} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nterm} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{istep} \neq 0`, :math:`\mathrm{nterm} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{istep} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{istep} \geq 0`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`3`) On entry, number of forced variables :math:`\text{}\geq n`. 
(`errno` :math:`3`) Degrees of freedom for error will equal :math:`0` if new variable is added, i.e., the number of variables in the model plus :math:`1` is equal to the effective number of observations. (`errno` :math:`4`) On entry, :math:`\textit{maxip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{maxip}` must be large enough to accommodate the number of terms given by :math:`\mathrm{isx}`. (`errno` :math:`4`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[i-1] \geq 0`, for :math:`i = 1,2,\ldots,m`. (`errno` :math:`4`) On entry, :math:`\mathrm{isx}[i-1] = 0`, for all :math:`i = 1,2,\ldots,m`. Constraint: at least one value of :math:`\mathrm{isx}` must be nonzero. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) On entry, the variables forced into the model are not of full rank, i.e., some of these variables are linear combinations of others. (`errno` :math:`6`) There are no free variables, i.e., no element of :math:`\mathrm{isx} = 0`. (`errno` :math:`7`) The value of the change in the sum of squares is greater than the input value of :math:`\mathrm{rss}`. This may occur due to rounding errors if the true residual sum of squares for the new model is small relative to the residual sum of squares for the previous model. .. _g02ee-py2-py-notes: **Notes** One method of selecting a linear regression model from a given set of independent variables is by forward selection. The following procedure is used: (i) Select the best fitting independent variable, i.e., the independent variable which gives the smallest residual sum of squares. If the :math:`F`-test for this variable is greater than a chosen critical value, :math:`F_{\mathrm{c}}`, then include the variable in the model, else stop. (#) Find the independent variable that leads to the greatest reduction in the residual sum of squares when added to the current model. 
(#) If the :math:`F`-test for this variable is greater than a chosen critical value, :math:`F_{\mathrm{c}}`, then include the variable in the model and go to (ii), otherwise stop. At any step the variables not in the model are known as the free terms. ``linregm_fit_onestep`` allows you to specify some independent variables that must be in the model; these are known as forced variables. The computational procedure involves the use of :math:`QR` decompositions, the :math:`R` and the :math:`Q` matrices being updated as each new variable is added to the model. In addition the matrix :math:`Q^\mathrm{T}X_{\mathrm{free}}`, where :math:`X_{\mathrm{free}}` is the matrix of variables not included in the model, is updated. ``linregm_fit_onestep`` computes one step of the forward selection procedure at a call. The results produced at each step may be printed or used as inputs to :meth:`linregm_update`, in order to compute the regression coefficients for the model fitted at that step. Repeated calls to ``linregm_fit_onestep`` should be made until :math:`F < F_{\mathrm{c}}` is indicated. .. _g02ee-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley Weisberg, S, 1985, `Applied Linear Regression`, Wiley """ raise NotImplementedError
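The selection loop above can be sketched without the library by driving the same :math:`F`-test off a table of residual sums of squares (the RSS values below are hypothetical; in practice each would come from a least squares fit, as ``linregm_fit_onestep`` performs internally):

```python
def forward_select(rss_table, variables, n, fin):
    """One-variable-at-a-time forward selection, stopping when F < fin.

    rss_table maps a frozenset of variable names to that model's residual
    sum of squares; a mean term is assumed fitted in every model.
    """
    model = []
    while True:
        free = [v for v in variables if v not in model]
        if not free:
            return model
        cur_rss = rss_table[frozenset(model)]
        # Variable giving the greatest reduction in the residual sum of squares.
        best = max(free, key=lambda v: cur_rss - rss_table[frozenset(model + [v])])
        new_rss = rss_table[frozenset(model + [best])]
        df = n - len(model) - 2                   # residual df after adding (mean fitted)
        f = (cur_rss - new_rss) / (new_rss / df)  # F-statistic for the added variable
        if f <= fin:
            return model                          # no candidate exceeds F_c: stop
        model.append(best)

# Hypothetical RSS values for n = 10 observations and free variables A, B, C.
rss = {frozenset(): 100.0,
       frozenset('A'): 40.0, frozenset('B'): 60.0, frozenset('C'): 90.0,
       frozenset('AB'): 30.0, frozenset('AC'): 38.0, frozenset('BC'): 55.0,
       frozenset('ABC'): 29.5}
print(forward_select(rss, ['A', 'B', 'C'], n=10, fin=2.0))  # ['A', 'B']
```

With these numbers, A enters first (:math:`F = 12`), then B (:math:`F \approx 2.33`), and C is rejected (:math:`F \approx 0.10 < F_{\mathrm{c}} = 2`), mirroring the repeated-call pattern described above.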
[docs]def linregm_fit_stepwise(n, wmean, c, sw, isx, fin=4.0, fout=None, tau=0.000001, monlev=0, monfun=None, data=None, io_manager=None): r""" ``linregm_fit_stepwise`` calculates a full stepwise selection from :math:`p` variables by using Clarke's sweep algorithm on the correlation matrix of a design and data matrix, :math:`Z`. The (weighted) variance-covariance, (weighted) means and sum of weights of :math:`Z` must be supplied. .. _g02ef-py2-py-doc: For full information please refer to the NAG Library document for g02ef https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02eff.html .. _g02ef-py2-py-parameters: **Parameters** **n** : int The number of observations used in the calculations. **wmean** : float, array-like, shape :math:`\left(m+1\right)` The mean of the design matrix, :math:`Z`. **c** : float, array-like, shape :math:`\left(\left(m+1\right)\times \left(m+2\right)/2\right)` The upper-triangular variance-covariance matrix packed by column for the design matrix, :math:`Z`. Because the function computes the correlation matrix :math:`R` from :math:`\mathrm{c}`, the variance-covariance matrix need only be supplied up to a scaling factor. **sw** : float If weights were used to calculate :math:`\mathrm{c}` then :math:`\mathrm{sw}` is the sum of positive weight values; otherwise :math:`\mathrm{sw}` is the number of observations used to calculate :math:`\mathrm{c}`. **isx** : int, array-like, shape :math:`\left(m\right)` The value of :math:`\mathrm{isx}[\textit{j}-1]` determines the set of variables used to perform full stepwise model selection, for :math:`\textit{j} = 1,2,\ldots,m`. :math:`\mathrm{isx}[\textit{j}-1] = -1` To exclude the variable corresponding to the :math:`j`\ th column of :math:`X` from the final model. :math:`\mathrm{isx}[\textit{j}-1] = 1` To consider the variable corresponding to the :math:`j`\ th column of :math:`X` for selection in the final model. 
:math:`\mathrm{isx}[\textit{j}-1] = 2` To force the inclusion of the variable corresponding to the :math:`j`\ th column of :math:`X` in the final model. **fin** : float, optional The value of the variance ratio which an explanatory variable must exceed to be included in a model. **fout** : None or float, optional Note: if this argument is **None** then a default value will be used, determined as follows: :math:`\mathrm{fin}`. The explanatory variable in a model with the lowest variance ratio value is removed from the model if its value is less than :math:`\mathrm{fout}`. :math:`\mathrm{fout}` is usually set equal to the value of :math:`\mathrm{fin}`; a value less than :math:`\mathrm{fin}` is occasionally preferred. **tau** : float, optional The tolerance, :math:`\tau`, for detecting collinearities between variables when adding or removing an explanatory variable from a model. Explanatory variables deemed to be collinear are excluded from the final model. **monlev** : int, optional A value of :math:`1` for :math:`\mathrm{monlev}` enables monitoring of the model selection process; a value of :math:`0` disables it. **monfun** : None or callable monfun(flag, var, val, data=None), optional Note: if this argument is **None** then a NAG-supplied facility will be used. The function for monitoring the model selection process. **Parameters** **flag** : str, length 1 The value of :math:`\mathrm{flag}` indicates the stage of the stepwise selection of explanatory variables. :math:`\mathrm{flag} = \texttt{'A'}` Variable :math:`\mathrm{var}` was added to the current model. :math:`\mathrm{flag} = \texttt{'B'}` Beginning the backward elimination step. :math:`\mathrm{flag} = \texttt{'C'}` Variable :math:`\mathrm{var}` failed the collinearity test and is excluded from the model. :math:`\mathrm{flag} = \texttt{'D'}` Variable :math:`\mathrm{var}` was dropped from the current model. 
:math:`\mathrm{flag} = \texttt{'F'}` Beginning the forward selection step :math:`\mathrm{flag} = \texttt{'K'}` Backward elimination did not remove any variables from the current model. :math:`\mathrm{flag} = \texttt{'S'}` Starting stepwise selection procedure. :math:`\mathrm{flag} = \texttt{'V'}` The variance ratio for variable :math:`\mathrm{var}` takes the value :math:`\mathrm{val}`. :math:`\mathrm{flag} = \texttt{'X'}` Finished stepwise selection procedure. **var** : int The index of the explanatory variable in the design matrix :math:`Z` to which :math:`\mathrm{flag}` pertains. **val** : float If :math:`\mathrm{flag} = \texttt{'V'}`, :math:`\mathrm{val}` is the variance ratio value for the coefficient associated with explanatory variable index :math:`\mathrm{var}`. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **data** : arbitrary, optional User-communication data for callback functions. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **isx** : int, ndarray, shape :math:`\left(m\right)` The value of :math:`\mathrm{isx}[\textit{j}-1]` indicates the status of the :math:`j`\ th explanatory variable in the model. :math:`\mathrm{isx}[\textit{j}-1] = -1` Forced exclusion. :math:`\mathrm{isx}[\textit{j}-1] = 0` Excluded. :math:`\mathrm{isx}[\textit{j}-1] = 1` Selected. :math:`\mathrm{isx}[\textit{j}-1] = 2` Forced selection. **b** : float, ndarray, shape :math:`\left(m+1\right)` :math:`\mathrm{b}[0]` contains the estimate for the intercept term in the fitted model. If :math:`\mathrm{isx}[j-1]\neq 0`, then :math:`\mathrm{b}[{j+1}-1]` contains the estimate for the :math:`j`\ th explanatory variable in the fitted model; otherwise :math:`\mathrm{b}[{j+1}-1] = 0`. **se** : float, ndarray, shape :math:`\left(m+1\right)` :math:`\mathrm{se}[\textit{j}-1]` contains the standard error for the estimate of :math:`\mathrm{b}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,m+1`. 
**rsq** : float The :math:`R^2`-statistic for the fitted regression model. **rms** : float The mean square of residuals for the fitted regression model. **df** : int The number of degrees of freedom for the sum of squares of residuals. .. _g02ef-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m > 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} > 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{sw} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{sw} > 1.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{fin} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{fin} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{fout} = \langle\mathit{\boldsymbol{value}}\rangle`; :math:`\mathrm{fin} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0\leq \mathrm{fout}\leq \mathrm{fin}`. (`errno` :math:`1`) On entry, :math:`\mathrm{tau} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tau} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{monlev} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{monlev} = 0` or :math:`1`. (`errno` :math:`2`) On entry at least one diagonal element of :math:`\mathrm{c}\leq 0.0`. Constraint: :math:`\mathrm{c}` must be positive definite. (`errno` :math:`2`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isx}[\textit{j}-1] = -1`, :math:`1` or :math:`2`, for :math:`\textit{j} = 1,2,\ldots,m`. (`errno` :math:`2`) On entry, :math:`\mathrm{isx}[i-1]\neq 1`, for all :math:`i = 1,2,\ldots,m`. Constraint: there must be at least one free variable. (`errno` :math:`4`) All variables are collinear, no model to select. 
**Warns** **NagAlgorithmicWarning** (`errno` :math:`3`) Matrix not positive definite, results may be inaccurate. .. _g02ef-py2-py-notes: **Notes** The general multiple linear regression model is defined by .. math:: y = \beta_0+X\beta +\epsilon \text{,} where :math:`y` is a vector of :math:`n` observations on the dependent variable, :math:`\beta_0` is an intercept coefficient, :math:`X` is an :math:`n\times p` matrix of :math:`p` explanatory variables, :math:`\beta` is a vector of :math:`p` unknown coefficients, and :math:`\epsilon` is a vector of length :math:`n` of unknown, Normally distributed, random errors. ``linregm_fit_stepwise`` employs a full stepwise regression to select a subset of explanatory variables from the :math:`p` available variables (the intercept is included in the model) and computes regression coefficients and their standard errors, and various other statistical quantities, by minimizing the sum of squares of residuals. The method applies repeatedly a forward selection step followed by a backward elimination step and halts when neither step updates the current model. The criterion used to update a current model is the variance ratio of residual sum of squares. Let :math:`s_1` and :math:`s_2` be the residual sum of squares of the current model and this model after undergoing a single update, with degrees of freedom :math:`q_1` and :math:`q_2`, respectively. Then the condition: .. math:: \frac{{\left(s_2-s_1\right)/\left(q_2-q_1\right)}}{{s_1/q_1}} > f_1\text{,} must be satisfied if a variable :math:`k` will be considered for entry to the current model, and the condition: .. math:: \frac{{\left(s_1-s_2\right)/\left(q_1-q_2\right)}}{{s_1/q_1}} < f_2\text{,} must be satisfied if a variable :math:`k` will be considered for removal from the current model, where :math:`f_1` and :math:`f_2` are user-supplied values and :math:`f_2\leq f_1`. In the entry step the entry statistic is computed for each variable not in the current model. 
If no variable is associated with a test value that exceeds :math:`f_1` then this step is terminated; otherwise the variable associated with the largest value for the entry statistic is entered into the model. In the removal step the removal statistic is computed for each variable in the current model. If no variable is associated with a test value less than :math:`f_2` then this step is terminated; otherwise the variable associated with the smallest value for the removal statistic is removed from the model. The data values :math:`X` and :math:`y` are not provided as input to the function. Instead, summary statistics of the design and data matrix :math:`Z = \left(X | y\right)` are required. Explanatory variables are entered into and removed from the current model by using sweep operations on the correlation matrix :math:`R` of :math:`Z`, given by: .. math:: R = \begin{pmatrix}1&\ldots &r_{{1p}}&r_{{1y}}\\ \vdots &\ddots & \vdots & \vdots \\r_{{p1}}&\ldots &1&r_{{py}}\\r_{{y1}}&\ldots &r_{{yp}}&1\end{pmatrix}\text{,} where :math:`r_{{\textit{i}\textit{j}}}` is the correlation between the explanatory variables :math:`\textit{i}` and :math:`\textit{j}`, for :math:`\textit{j} = 1,2,\ldots,p`, for :math:`\textit{i} = 1,2,\ldots,p`, and :math:`r_{{yi}}` (and :math:`r_{{iy}}`) is the correlation between the response variable :math:`y` and the :math:`\textit{i}`\ th explanatory variable, for :math:`\textit{i} = 1,2,\ldots,p`. A sweep operation on the :math:`k`\ th row and column (:math:`k\leq p`) of :math:`R` replaces: .. 
math:: \begin{array}{l} r_{{kk}} \text{ by } -1 / r_{{kk}} \text{;} \\ r_{{ik}} \text{ by } r_{{ik}} / \left\lvert r_{{kk}}\right\rvert \text{, }\quad i = 1,2,\ldots,p+1 \text{ } \left(i\neq k\right) \text{;} \\ r_{{kj}} \text{ by } r_{{kj}} / \left\lvert r_{{kk}}\right\rvert \text{, }\quad j = 1,2,\ldots,p+1 \text{ } \left(j\neq k\right) \text{;} \\ r_{{ij}} \text{ by } r_{{ij}} - r_{{ik}} r_{{kj}} / r_{{kk}} \text{, } i = 1,2,\ldots,p+1 \text{ } \left(i\neq k\right) \text{; } j = 1,2,\ldots,p+1 \text{ } \left(j\neq k\right) \text{.} \end{array} The :math:`k`\ th explanatory variable is eligible for entry into the current model if it satisfies the collinearity tests: :math:`r_{{kk}} > \tau` and .. math:: \left(r_{{ii}}-\frac{{r_{{ik}}r_{{ki}}}}{{r_{{kk}}}}\right)\tau \leq 1\text{,} for a user-supplied value (:math:`> 0`) of :math:`\tau` and where the index :math:`i` runs over explanatory variables in the current model. The sweep operation is its own inverse; therefore, pivoting on an explanatory variable :math:`k` in the current model has the effect of removing it from the model. Once the stepwise model selection procedure is finished, the function calculates: (a) the least squares estimate for the :math:`i`\ th explanatory variable included in the fitted model; (#) standard error estimates for each coefficient in the final model; (#) the square root of the mean square of residuals and its degrees of freedom; (#) the multiple correlation coefficient. The function makes use of the symmetry of the sweep operations and correlation matrix, which almost halves the storage and computation required by the sweep algorithm; see Clarke (1981) for details. .. _g02ef-py2-py-references: **References** Clarke, M R B, 1981, `Algorithm AS 178: the Gauss--Jordan sweep operator with detection of collinearity`, Appl. Statist. 
(31), 166--169 Dempster, A P, 1969, `Elements of Continuous Multivariate Analysis`, Addison--Wesley Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley See Also -------- :meth:`naginterfaces.library.examples.correg.linregm_fit_stepwise_ex.main` """ raise NotImplementedError
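The sweep operation at the heart of ``linregm_fit_stepwise`` can be sketched in a few lines of Python. This is an illustration only, not the NAG implementation: it pivots a symmetric matrix stored as a list of lists, dividing the off-diagonal update by the signed pivot so that the operation is its own inverse, as described above (scaling conventions differ slightly between published sweep variants; see Clarke (1981)).

```python
def sweep(r, k):
    """Sweep the symmetric matrix ``r`` (list of lists) on 0-based pivot ``k``.

    The pivot becomes -1/r[k][k], the pivot row and column are divided by
    |r[k][k]|, and every other element r[i][j] becomes
    r[i][j] - r[i][k]*r[k][j]/r[k][k].
    """
    n = len(r)
    d = r[k][k]
    out = [row[:] for row in r]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = -1.0 / d
            elif i == k:
                out[i][j] = r[k][j] / abs(d)
            elif j == k:
                out[i][j] = r[i][k] / abs(d)
            else:
                out[i][j] = r[i][j] - r[i][k] * r[k][j] / d
    return out
```

With this scaling, sweeping twice on the same index returns the original matrix (up to rounding), which is the self-inverse property the stepwise algorithm exploits when removing a variable from the current model.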
[docs]def linregm_stat_resinf(n, ip, res, h, rms): r""" ``linregm_stat_resinf`` calculates two types of standardized residuals and two measures of influence for a linear regression. .. _g02fa-py2-py-doc: For full information please refer to the NAG Library document for g02fa https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02faf.html .. _g02fa-py2-py-parameters: **Parameters** **n** : int :math:`n`, the number of observations included in the regression. **ip** : int :math:`p`, the number of linear parameters estimated in the regression model. **res** : float, array-like, shape :math:`\left(\textit{nres}\right)` The residuals, :math:`r_i`. **h** : float, array-like, shape :math:`\left(\textit{nres}\right)` The diagonal elements of :math:`H`, :math:`h_i`, corresponding to the residuals in :math:`\mathrm{res}`. **rms** : float The estimate of :math:`\sigma^2` based on all :math:`n` observations, :math:`s^2`, i.e., the residual mean square. **Returns** **sres** : float, ndarray, shape :math:`\left(\textit{nres}, 4\right)` The standardized residuals and influence statistics. For the observation with residual, :math:`r_i`, given in :math:`\mathrm{res}[i-1]`. :math:`\mathrm{sres}[i-1,0]` Is the internally standardized residual, :math:`\mathrm{RI}_i`. :math:`\mathrm{sres}[i-1,1]` Is the externally standardized residual, :math:`\mathrm{RE}_i`. :math:`\mathrm{sres}[i-1,2]` Is Cook's :math:`D` statistic, :math:`D_i`. :math:`\mathrm{sres}[i-1,3]` Is Atkinson's :math:`T` statistic, :math:`T_i`. .. _g02fa-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{nres} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nres} \leq \mathrm{n}`. (`errno` :math:`1`) On entry, :math:`\mathrm{rms} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rms} > 0.0`. 
(`errno` :math:`1`) On entry, :math:`\mathrm{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`{\mathrm{n}-1} > \mathrm{ip}`. (`errno` :math:`1`) On entry, :math:`\mathrm{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`\textit{nres} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nres} \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{h}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0 < \mathrm{h}[i] < 1.0`, for all :math:`i`. (`errno` :math:`3`) On entry, a value in :math:`\mathrm{res}` is too large for given :math:`\mathrm{rms}`. :math:`\mathrm{res}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{rms} = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02fa-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` For the general linear regression model .. math:: y = X\beta +\epsilon \text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+-------------------------------------------------------------------------------------------------------------------------------------------+ |where|:math:`y` is a vector of length :math:`n` of the dependent variable, | +-----+-------------------------------------------------------------------------------------------------------------------------------------------+ | |:math:`X` is an :math:`n\times p` matrix of the independent variables, | +-----+-------------------------------------------------------------------------------------------------------------------------------------------+ | |:math:`\beta` is a vector of length :math:`p` of unknown parameters, | +-----+-------------------------------------------------------------------------------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors such that :math:`\mathrm{var}\left(\epsilon \right) = \sigma^2I`.| +-----+-------------------------------------------------------------------------------------------------------------------------------------------+ The residuals are given by .. math:: r = y-\hat{y} = y-X\hat{\beta } and the fitted values, :math:`\hat{y} = X\hat{\beta }`, can be written as :math:`Hy` for an :math:`n\times n` matrix :math:`H`. The :math:`i`\ th diagonal elements of :math:`H`, :math:`h_i`, give a measure of the influence of the :math:`i`\ th values of the independent variables on the fitted regression model. The values of :math:`r` and the :math:`h_i` are returned by :meth:`linregm_fit`. ``linregm_stat_resinf`` calculates statistics which help to indicate if an observation is extreme and having an undue influence on the fit of the regression model. 
Two types of standardized residual are calculated: (i) The :math:`i`\ th residual is standardized by its variance when the estimate of :math:`\sigma^2`, :math:`s^2`, is calculated from all the data; this is known as internal Studentization. .. math:: RI_i = \frac{r_i}{{s\sqrt{1-h_i}}}\text{.} (#) The :math:`i`\ th residual is standardized by its variance when the estimate of :math:`\sigma^2`, :math:`s_{{-i}}^2`, is calculated from the data excluding the :math:`i`\ th observation; this is known as external Studentization. .. math:: RE_i = \frac{r_i}{{s_{{-i}}\sqrt{1-h_i}}} = RI_i\sqrt{\frac{{n-p-1}}{{n-p-RI_i^2}}}\text{.} The two measures of influence are: (i) Cook's :math:`D` .. math:: D_i = \frac{1}{p}RE_i^2\frac{h_i}{{1-h_i}}\text{.} (#) Atkinson's :math:`T` .. math:: T_i = \left\lvert RE_i\right\rvert \sqrt{\left(\frac{{n-p}}{p}\right)\left(\frac{h_i}{{1-h_i}}\right)}\text{.} .. _g02fa-py2-py-references: **References** Atkinson, A C, 1981, `Two graphical displays for outlying and influential observations in regression`, Biometrika (68), 13--20 Cook, R D and Weisberg, S, 1982, `Residuals and Influence in Regression`, Chapman and Hall """ raise NotImplementedError
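The four statistics returned in ``sres`` follow directly from the residuals and leverages; a minimal pure-Python sketch (not the NAG implementation), using the standard identity :math:`RE_i = RI_i\sqrt{\left(n-p-1\right)/\left(n-p-RI_i^2\right)}` for the externally Studentized residual:

```python
import math

def residual_diagnostics(res, h, n, p, rms):
    """Return (RI_i, RE_i, Cook's D_i, Atkinson's T_i) per observation."""
    s = math.sqrt(rms)
    out = []
    for r_i, h_i in zip(res, h):
        ri = r_i / (s * math.sqrt(1.0 - h_i))                       # internal
        re = ri * math.sqrt((n - p - 1.0) / (n - p - ri * ri))      # external
        d = re * re * h_i / (p * (1.0 - h_i))                       # Cook's D
        t = abs(re) * math.sqrt((n - p) * h_i / (p * (1.0 - h_i)))  # Atkinson's T
        out.append((ri, re, d, t))
    return out
```

For example, with :math:`n = 5`, :math:`p = 2`, :math:`s^2 = 1` an observation with residual :math:`0.5` and leverage :math:`0.2` has :math:`RI\approx 0.559` and :math:`RE\approx 0.482`.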
[docs]def linregm_stat_durbwat(ip, res): r""" ``linregm_stat_durbwat`` calculates the Durbin--Watson statistic, for a set of residuals, and the upper and lower bounds for its significance. .. _g02fc-py2-py-doc: For full information please refer to the NAG Library document for g02fc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02fcf.html .. _g02fc-py2-py-parameters: **Parameters** **ip** : int :math:`p`, the number of independent variables in the regression model, including the mean. **res** : float, array-like, shape :math:`\left(n\right)` The residuals, :math:`r_1,r_2,\ldots,r_n`. **Returns** **d** : float The Durbin--Watson statistic, :math:`d`. **pdl** : float Lower bound for the significance of the Durbin--Watson statistic, :math:`p_{\mathrm{l}}`. **pdu** : float Upper bound for the significance of the Durbin--Watson statistic, :math:`p_{\mathrm{u}}`. .. _g02fc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > \mathrm{ip}`. (`errno` :math:`2`) On entry, mean of :math:`\mathrm{res} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: the mean of the residuals :math:`\text{}\leq \sqrt{\epsilon }`, where :math:`\epsilon = \text{machine precision}`. (`errno` :math:`3`) On entry, all residuals are identical. .. _g02fc-py2-py-notes: **Notes** For the general linear regression model .. math:: y = X\beta +\epsilon \text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |where|:math:`y` is a vector of length :math:`n` of the dependent variable, :math:`X` is an :math:`n\times p` matrix of the independent variables, :math:`\beta` is a vector of length :math:`p` of unknown parameters,| +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors. | +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ The residuals are given by .. math:: r = y-\hat{y} = y-X\hat{\beta } and the fitted values, :math:`\hat{y} = X\hat{\beta }`, can be written as :math:`Hy` for an :math:`n\times n` matrix :math:`H`. Note that when a mean term is included in the model the sum of the residuals is zero. If the observations have been taken serially, that is :math:`y_1,y_2,\ldots,y_n` can be considered as a time series, the Durbin--Watson test can be used to test for serial correlation in the :math:`\epsilon_i`, see Durbin and Watson (1950), Durbin and Watson (1951) and Durbin and Watson (1971). The Durbin--Watson statistic is .. math:: d = \frac{{\sum_{{i = 1}}^{{n-1}}\left(r_{{i+1}}-r_i\right)^2}}{{\sum_{{i = 1}}^nr_i^2}}\text{.} Positive serial correlation in the :math:`\epsilon_i` will lead to a small value of :math:`d` while for independent errors :math:`d` will be close to :math:`2`. 
Durbin and Watson show that the exact distribution of :math:`d` depends on the eigenvalues of the matrix :math:`HA` where the matrix :math:`A` is such that :math:`d` can be written as .. math:: d = \frac{{r^\mathrm{T}Ar}}{{r^\mathrm{T}r}} and the eigenvalues of the matrix :math:`A` are :math:`\lambda_j = \left(1-\cos\left(\pi j/n\right)\right)`, for :math:`j = 1,2,\ldots,n-1`. However bounds on the distribution can be obtained, the lower bound being .. math:: d_{\mathrm{l}} = \frac{{\sum_{{i = 1}}^{{n-p}}\lambda_iu_i^2}}{{\sum_{{i = 1}}^{{n-p}}u_i^2}} and the upper bound being .. math:: d_{\mathrm{u}} = \frac{{\sum_{{i = 1}}^{{n-p}}\lambda_{{i-1+p}}u_i^2}}{{\sum_{{i = 1}}^{{n-p}}u_i^2}}\text{,} where the :math:`u_i` are independent standard Normal variables. The lower tail probabilities associated with these bounds, :math:`p_{\mathrm{l}}` and :math:`p_{\mathrm{u}}`, are computed by :meth:`stat.prob_durbin_watson <naginterfaces.library.stat.prob_durbin_watson>`. The interpretation of the bounds is that, for a test of size (significance) :math:`\alpha`, if :math:`p_l\leq \alpha` the test is significant, if :math:`p_u > \alpha` the test is not significant, while if :math:`p_{\mathrm{l}} > \alpha` and :math:`p_{\mathrm{u}}\leq \alpha` no conclusion can be reached. The above probabilities are for the usual test of positive auto-correlation. If the alternative of negative auto-correlation is required, then a call to :meth:`stat.prob_durbin_watson <naginterfaces.library.stat.prob_durbin_watson>` should be made with the argument :math:`\mathrm{d}` taking the value of :math:`4-d`; see Newbold (1988). .. _g02fc-py2-py-references: **References** Durbin, J and Watson, G S, 1950, `Testing for serial correlation in least squares regression. I`, Biometrika (37), 409--428 Durbin, J and Watson, G S, 1951, `Testing for serial correlation in least squares regression. 
II`, Biometrika (38), 159--178 Durbin, J and Watson, G S, 1971, `Testing for serial correlation in least squares regression. III`, Biometrika (58), 1--19 Granger, C W J and Newbold, P, 1986, `Forecasting Economic Time Series`, (2nd Edition), Academic Press Newbold, P, 1988, `Statistics for Business and Economics`, Prentice--Hall """ raise NotImplementedError
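The Durbin--Watson statistic itself is simple to compute from a set of residuals; the significance bounds, by contrast, require the eigenvalue-based probabilities from :meth:`stat.prob_durbin_watson <naginterfaces.library.stat.prob_durbin_watson>`. A minimal sketch of the statistic:

```python
def durbin_watson(res):
    """d = sum of squared successive differences over the residual sum of squares."""
    num = sum((res[i + 1] - res[i]) ** 2 for i in range(len(res) - 1))
    den = sum(r * r for r in res)
    return num / den
```

Positively autocorrelated residuals drive :math:`d` well below :math:`2`, while alternating (negatively correlated) residuals push it towards :math:`4`.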
[docs]def glm_normal(x, isx, y, link='I', mean='M', wt=None, s=0.0, a=0.0, v=None, tol=0.0, maxit=10, iprint=0, eps=0.0, io_manager=None): r""" ``glm_normal`` fits a generalized linear model with normal errors. .. _g02ga-py2-py-doc: For full information please refer to the NAG Library document for g02ga https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gaf.html .. _g02ga-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. If :math:`\mathrm{isx}[j-1] > 0`, the variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the regression model. **y** : float, array-like, shape :math:`\left(n\right)` The observations on the dependent variable, :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **link** : str, length 1, optional Indicates which link function is to be used. :math:`\mathrm{link} = \texttt{'E'}` An exponent link is used. :math:`\mathrm{link} = \texttt{'I'}` An identity link is used. :math:`\mathrm{link} = \texttt{'L'}` A log link is used. :math:`\mathrm{link} = \texttt{'S'}` A square root link is used. :math:`\mathrm{link} = \texttt{'R'}` A reciprocal link is used. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. 
If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided, the effective number of observations is :math:`n`. **s** : float, optional The scale parameter for the model, :math:`\sigma^2`. If :math:`\mathrm{s} = 0.0`, the scale parameter is estimated by the function using the residual mean square. **a** : float, optional If :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}` must contain the power of the exponential. If :math:`\mathrm{link} \neq \texttt{'E'}`, :math:`\mathrm{a}` is not referenced. **v** : None or float, array-like, shape :math:`\left(n, \textit{ip}+7\right)`, optional If :math:`\mathrm{v}\text{ is not }\mathbf{None}`, :math:`\mathrm{v}[\textit{i}-1,6]`, for :math:`\textit{i} = 1,2,\ldots,n`, must contain the offset values :math:`o_{\textit{i}}`. All other values need not be set. **tol** : float, optional Indicates the accuracy required for the fit of the model. The iterative weighted least squares procedure is deemed to have converged if the absolute change in deviance between iterations is less than :math:`\mathrm{tol}\times \left(1.0+\text{current residual sum of squares}\right)`. This is approximately an absolute precision if the residual sum of squares is small and a relative precision if the residual sum of squares is large. If :math:`0.0\leq \mathrm{tol} < \text{machine precision}`, ``glm_normal`` will use :math:`10\times \text{machine precision}`. **maxit** : int, optional The maximum number of iterations for the iterative weighted least squares. If :math:`\mathrm{maxit} = 0`, a default value of :math:`10` is used. **iprint** : int, optional Indicates if the printing of information on the iterations is required. :math:`\mathrm{iprint}\leq 0` There is no printing. 
:math:`\mathrm{iprint} > 0` Every :math:`\mathrm{iprint}` iteration, the following is printed: the deviance, the current estimates, and if the weighted least squares equations are singular, then this is indicated. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **eps** : float, optional The value of :math:`\mathrm{eps}` is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of :math:`\mathrm{eps}` the stricter the criterion for selecting the singular value decomposition. If :math:`0.0\leq \mathrm{eps} < \text{machine precision}`, the function will use machine precision instead. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **s** : float If on input :math:`\mathrm{s} = 0.0`, :math:`\mathrm{s}` contains the estimated value of the scale parameter, :math:`\hat{\sigma }^2`. If on input :math:`\mathrm{s}\neq 0.0`, :math:`\mathrm{s}` is unchanged on exit. **rss** : float The residual sum of squares for the fitted model. **idf** : int The degrees of freedom associated with the residual sum of squares for the fitted model. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The estimates of the parameters of the generalized linear model, :math:`\hat{\beta }`. If :math:`\mathrm{mean} = \texttt{'M'}`, :math:`\mathrm{b}[0]` will contain the estimate of the mean parameter and :math:`\mathrm{b}[i]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. 
If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{b}[i-1]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. **irank** : int The rank of the independent variables. If the model is of full rank, :math:`\mathrm{irank} = \textit{ip}`. If the model is not of full rank, :math:`\mathrm{irank}` is an estimate of the rank of the independent variables. :math:`\mathrm{irank}` is calculated as the number of singular values greater than :math:`\mathrm{eps}\times \text{}` (largest singular value). It is possible for the SVD to be carried out but for :math:`\mathrm{irank}` to be returned as :math:`\textit{ip}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard errors of the linear parameters. :math:`\mathrm{se}[\textit{i}-1]` contains the standard error of the parameter estimate in :math:`\mathrm{b}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. **v** : float, ndarray, shape :math:`\left(n, \textit{ip}+7\right)` Auxiliary information on the fitted model. .. 
rst-class:: nag-rules-none nag-align-left +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,0]` |contains the linear predictor value, :math:`\eta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,1]` |contains the fitted value, :math:`\hat{\mu }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,2]` |is only included for consistency with other functions. :math:`\mathrm{v}[\textit{i}-1,2] = 1.0`, for :math:`\textit{i} = 1,2,\ldots,n`.| +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,3]` |contains the square root of the working weight, :math:`w_{\textit{i}}^{\frac{1}{2}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,4]` |contains the residual, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,5]` |contains the leverage, :math:`h_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. 
| +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,6]` |contains the offset, for :math:`i = 1,2,\ldots,n`. If :math:`\mathrm{v}` is **None** on entry, all values will be zero. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,j-1]`|for :math:`j = 8,\ldots,\textit{ip}+7`, contains the results of the :math:`QR` decomposition or the singular value decomposition. | +---------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ If the model is not of full rank, i.e., :math:`\mathrm{irank} < \textit{ip}`, the first :math:`\textit{ip}` rows of columns :math:`8` to :math:`\textit{ip}+7` contain the :math:`P^*` matrix. .. _g02ga-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{eps} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{eps}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} \geq 0`. (`errno` :math:`1`) On entry, :math:`\textit{offset} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{offset} = \texttt{'Y'}` or :math:`\texttt{'N'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{link} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{link} = \texttt{'E'}`, :math:`\texttt{'I'}`, :math:`\texttt{'L'}`, :math:`\texttt{'S'}` or :math:`\texttt{'R'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{a} = 0.0` and :math:`\mathrm{link} = \texttt{'E'}`. Constraint: if :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}\neq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{s} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{s}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`3`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[j-1] \geq 0.0`, for :math:`j = 1,2,\ldots,m`. (`errno` :math:`3`) On entry, :math:`\textit{ip}` incompatible with number of nonzero values of :math:`\mathrm{isx}`: :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`3`) Number of requested x-variables greater than :math:`\textit{n}`. (`errno` :math:`4`) A fitted value is at a boundary. This will only occur with :math:`\mathrm{link} = \texttt{'L'}`, :math:`\texttt{'R'}` or :math:`\texttt{'E'}`. This may occur if there are small values of :math:`y` and the model is not suitable for the data. The model should be reformulated with, perhaps, some observations dropped. 
(`errno` :math:`5`) SVD solution failed to converge. (`errno` :math:`6`) The iterative weighted least squares has failed to converge in :math:`\mathrm{maxit}` (or default :math:`10`) iterations. The value of :math:`\mathrm{maxit}` could be increased but it may be advantageous to examine the convergence using the :math:`\mathrm{iprint}` option. This may indicate that the convergence is slow because the solution is at a boundary in which case it may be better to reformulate the model. **Warns** **NagAlgorithmicWarning** (`errno` :math:`7`) The rank of the model has changed during the weighted least squares iterations. The estimate for :math:`\beta` returned may be reasonable, but you should check how the deviance has changed during iterations. (`errno` :math:`8`) The degrees of freedom for error are :math:`0`. A saturated model has been fitted. .. _g02ga-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A generalized linear model with Normal errors consists of the following elements: (a) a set of :math:`n` observations, :math:`y_i`, from a Normal distribution with probability density function: .. math:: \frac{1}{{\sqrt{2\pi }\sigma }}\mathrm{exp}\left(-\frac{\left(y-\mu \right)^2}{{2\sigma^2}}\right)\text{,} where :math:`\mu` is the mean and :math:`\sigma^2` is the variance. (#) :math:`X`, a set of :math:`p` independent variables for each observation, :math:`x_1,x_2,\ldots,x_p`. (#) a linear model: .. math:: \eta = \sum \beta_jx_j\text{.} (#) a link between the linear predictor, :math:`\eta`, and the mean of the distribution, :math:`\mu`, i.e., :math:`\eta = g\left(\mu \right)`. 
The possible link functions are: (i) exponent link: :math:`\eta = \mu^a`, for a constant :math:`a`, (#) identity link: :math:`\eta = \mu`, (#) log link: :math:`\eta = \log\left(\mu \right)`, (#) square root link: :math:`\eta = \sqrt{\mu }`, (#) reciprocal link: :math:`\eta = \frac{1}{\mu }`. (#) a measure of fit, the residual sum of squares :math:`\text{} = \sum \left(y_i-\hat{\mu }_i\right)^2`. The linear parameters are estimated by iterative weighted least squares. An adjusted dependent variable, :math:`z`, is formed: .. math:: z = \eta +\left(y-\mu \right)\frac{{d\eta }}{{d\mu }} and a working weight, :math:`w`, .. math:: w^{-1} = \left(\frac{{d\eta }}{{d\mu }}\right)^2\text{.} At each iteration an approximation to the estimate of :math:`\beta`, :math:`\hat{\beta }`, is found by the weighted least squares regression of :math:`z` on :math:`X` with weights :math:`w`. ``glm_normal`` finds a :math:`QR` decomposition of :math:`w^{{\frac{1}{2}}}X`, i.e., :math:`w^{{\frac{1}{2}}}X = QR` where :math:`R` is a :math:`p\times p` triangular matrix and :math:`Q` is an :math:`n\times p` column orthogonal matrix. If :math:`R` is of full rank, then :math:`\hat{\beta }` is the solution to .. math:: R\hat{\beta } = Q^\mathrm{T}w^{{\frac{1}{2}}}z\text{.} If :math:`R` is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`. .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R` and :math:`w^{{\frac{1}{2}}}X`. This gives the solution .. math:: \hat{\beta } = P_1D^{-1}\begin{pmatrix}Q_*&0\\0&I\end{pmatrix}Q^\mathrm{T}w^{{\frac{1}{2}}}z :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`. The iterations are continued until there is only a small change in the residual sum of squares. The initial values for the algorithm are obtained by taking .. 
math:: \hat{\eta } = g\left(y\right)\text{.} The fit of the model can be assessed by examining and testing the residual sum of squares, in particular comparing the difference in residual sums of squares between nested models, i.e., when one model is a sub-model of the other. Let :math:`\mathrm{RSS}_f` be the residual sum of squares for the full model with degrees of freedom :math:`\nu_f` and let :math:`\mathrm{RSS}_s` be the residual sum of squares for the sub-model with degrees of freedom :math:`\nu_s` then: .. math:: F = \frac{{\left(\mathrm{RSS}_s-\mathrm{RSS}_f\right)/\left(\nu_s-\nu_f\right)}}{{\mathrm{RSS}_f/\nu_f}}\text{,} has, approximately, an :math:`F`-distribution with (:math:`\nu_s-\nu_f`), :math:`\nu_f` degrees of freedom. The parameter estimates, :math:`\hat{\beta }`, are asymptotically Normally distributed with variance-covariance matrix: :math:`C = R^{-1}{R^{-1}}^\mathrm{T}\sigma^2` in the full rank case, otherwise :math:`C = P_1D^{-2}P_1^\mathrm{T}\sigma^2` The residuals and influence statistics can also be examined. The estimated linear predictor :math:`\hat{\eta } = X\hat{\beta }`, can be written as :math:`Hw^{{\frac{1}{2}}}z` for an :math:`n\times n` matrix :math:`H`. The :math:`i`\ th diagonal elements of :math:`H`, :math:`h_i`, give a measure of the influence of the :math:`i`\ th values of the independent variables on the fitted regression model. These are sometimes known as leverages. The fitted values are given by :math:`\hat{\mu } = g^{-1}\left(\hat{\eta }\right)`. ``glm_normal`` also computes the residuals, :math:`r`: .. math:: r_i = y_i-\hat{\mu }_i\text{.} An option allows prior weights :math:`\omega_i` to be used; this gives a model with: .. math:: \sigma_i^2 = \frac{\sigma^2}{\omega_i}\text{.} In many linear regression models the first term is taken as a mean term or an intercept, i.e., :math:`x_{{\textit{i},1}} = 1`, for :math:`\textit{i} = 1,2,\ldots,n`; this is provided as an option. 
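The :math:`F`-statistic above is straightforward to evaluate once the two residual sums of squares and their degrees of freedom are available. The following sketch uses purely illustrative values (not taken from any NAG example):

```python
# Nested-model F-test for Normal-errors GLMs, as described above.
# The RSS values and degrees of freedom below are illustrative only.

def nested_f_stat(rss_sub, df_sub, rss_full, df_full):
    """F = ((RSS_s - RSS_f)/(nu_s - nu_f)) / (RSS_f/nu_f)."""
    return ((rss_sub - rss_full) / (df_sub - df_full)) / (rss_full / df_full)

# Sub-model: RSS = 100.0 on 18 degrees of freedom;
# full model: RSS = 80.0 on 16 degrees of freedom.
f = nested_f_stat(100.0, 18, 80.0, 16)
print(f)  # 2.0, to be referred to an F(2, 16) distribution
```

The resulting value is compared with the upper tail of the :math:`F`-distribution with :math:`\left(\nu_s-\nu_f\right)` and :math:`\nu_f` degrees of freedom.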
Often only some of the possible independent variables are included in a model; the facility to select variables to be included in the model is provided. If part of the linear predictor can be represented by a variable with a known coefficient, then this can be included in the model by using an offset, :math:`o`: .. math:: \eta = o+\sum \beta_jx_j\text{.} If the model is not of full rank the solution given will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using :meth:`glm_constrain` after using ``glm_normal``. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions and can be estimated and tested using :meth:`glm_estfunc`. Details of the SVD are made available, in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} .. _g02ga-py2-py-references: **References** Cook, R D and Weisberg, S, 1982, `Residuals and Influence in Regression`, Chapman and Hall McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall See Also -------- :meth:`naginterfaces.library.examples.correg.glm_normal_ex.main` """ raise NotImplementedError
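The iterative weighted least squares scheme described in the Notes for ``glm_normal`` can be sketched in a few lines for the simplest case: Normal errors, a log link (:math:`\eta = \log\left(\mu \right)`, so :math:`d\eta /d\mu = 1/\mu` and :math:`w = \mu^2`) and an intercept-only model, where the weighted regression reduces to a weighted mean. This is a minimal illustration of the update formulae only, not the NAG implementation:

```python
# Illustrative sketch (not the NAG implementation) of iterative weighted
# least squares for a Normal-errors GLM with log link, intercept only.
import math

def irls_normal_log_intercept(y, tol=1.0e-12, maxit=100):
    beta = 0.0  # the single (mean) parameter
    for _ in range(maxit):
        mu = math.exp(beta)  # mu = g^{-1}(eta) with eta = beta
        # adjusted dependent variable: z = eta + (y - mu) * d(eta)/d(mu)
        z = [beta + (yi - mu) / mu for yi in y]
        # working weight: w^{-1} = (d(eta)/d(mu))^2 = 1/mu^2
        w = mu * mu  # equal for every observation in this model
        # weighted least squares step (here just a weighted mean of z)
        beta_new = sum(w * zi for zi in z) / (w * len(y))
        if abs(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

beta_hat = irls_normal_log_intercept([1.0, 2.0, 3.0, 6.0])
# For this model the fitted mean exp(beta_hat) converges to the sample mean.
print(math.exp(beta_hat))  # approximately 3.0
```

The iterations stop when the change in the estimate is small, mirroring the convergence test on the residual sum of squares described above.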
def glm_binomial(x, isx, y, t, link='G', mean='M', wt=None, v=None, tol=0.0, maxit=10, iprint=0, eps=0.0, io_manager=None): r""" ``glm_binomial`` fits a generalized linear model with binomial errors. .. _g02gb-py2-py-doc: For full information please refer to the NAG Library document for g02gb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gbf.html .. _g02gb-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. If :math:`\mathrm{isx}[j-1] > 0`, the variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the regression model. **y** : float, array-like, shape :math:`\left(n\right)` The observations on the dependent variable, :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **t** : float, array-like, shape :math:`\left(n\right)` :math:`t`, the binomial denominator. **link** : str, length 1, optional Indicates which link function is to be used. :math:`\mathrm{link} = \texttt{'G'}` A logistic link is used. :math:`\mathrm{link} = \texttt{'P'}` A probit link is used. :math:`\mathrm{link} = \texttt{'C'}` A complementary log-log link is used. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. 
If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. **v** : None or float, array-like, shape :math:`\left(n, \textit{ip}+7\right)`, optional If :math:`\mathrm{v}\text{ is not }\mathbf{None}`, :math:`\mathrm{v}[\textit{i}-1,6]`, for :math:`\textit{i} = 1,2,\ldots,n`, must contain the offset values :math:`o_{\textit{i}}`. All other values need not be set. **tol** : float, optional Indicates the accuracy required for the fit of the model. The iterative weighted least squares procedure is deemed to have converged if the absolute change in deviance between iterations is less than :math:`\mathrm{tol}\times \left(1.0+\text{Current Deviance}\right)`. This is approximately an absolute precision if the deviance is small and a relative precision if the deviance is large. If :math:`0.0\leq \mathrm{tol} < \text{machine precision}`, the function will use :math:`10\times \text{machine precision}` instead. **maxit** : int, optional The maximum number of iterations for the iterative weighted least squares. If :math:`\mathrm{maxit} = 0`, a default value of :math:`10` is used. **iprint** : int, optional Indicates if the printing of information on the iterations is required. :math:`\mathrm{iprint}\leq 0` There is no printing. :math:`\mathrm{iprint} > 0` The following is printed every :math:`\mathrm{iprint}` iterations: the deviance, the current estimates, and if the weighted least squares equations are singular, then this is indicated. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). 
**eps** : float, optional The value of :math:`\mathrm{eps}` is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of :math:`\mathrm{eps}` the stricter the criterion for selecting the singular value decomposition. If :math:`0.0\leq \mathrm{eps} < \text{machine precision}`, the function will use machine precision instead. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **dev** : float The deviance for the fitted model. **idf** : int The degrees of freedom associated with the deviance for the fitted model. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The estimates of the parameters of the generalized linear model, :math:`\hat{\beta }`. If :math:`\mathrm{mean} = \texttt{'M'}`, the first element of :math:`\mathrm{b}` will contain the estimate of the mean parameter and :math:`\mathrm{b}[i]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{b}[i-1]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. **irank** : int The rank of the independent variables. If the model is of full rank, :math:`\mathrm{irank} = \textit{ip}`. If the model is not of full rank, :math:`\mathrm{irank}` is an estimate of the rank of the independent variables. :math:`\mathrm{irank}` is calculated as the number of singular values greater than :math:`\mathrm{eps}\times \text{}`\ (largest singular value). It is possible for the SVD to be carried out but for :math:`\mathrm{irank}` to be returned as :math:`\textit{ip}`. 
**se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard errors of the linear parameters. :math:`\mathrm{se}[\textit{i}-1]` contains the standard error of the parameter estimate in :math:`\mathrm{b}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored in packed form by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. **v** : float, ndarray, shape :math:`\left(n, \textit{ip}+7\right)` Auxiliary information on the fitted model. .. rst-class:: nag-rules-none nag-align-left +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,0]` |contains the linear predictor value, :math:`\eta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,1]` |contains the fitted value, :math:`\hat{\mu }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,2]` |contains the variance standardization, :math:`\frac{1}{\tau_{\textit{i}}}`, for :math:`\textit{i} = 1,2,\ldots,n`. 
| +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,3]` |contains the square root of the working weight, :math:`w_{\textit{i}}^{\frac{1}{2}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,4]` |contains the deviance residual, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,5]` |contains the leverage, :math:`h_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,6]` |contains the offset, :math:`o_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{v}` is **None** on entry, all values will be zero.| +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,j-1]`|for :math:`j = 8,\ldots,\textit{ip}+7`, contains the results of the :math:`QR` decomposition or the singular value decomposition. 
| +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ If the model is not of full rank, i.e., :math:`\mathrm{irank} < \textit{ip}`, the first :math:`\textit{ip}` rows of columns :math:`8` to :math:`\textit{ip}+7` contain the :math:`P^*` matrix. .. _g02gb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{eps} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{eps}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} \geq 0`. (`errno` :math:`1`) On entry, :math:`\textit{offset} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{offset} = \texttt{'Y'}` or :math:`\texttt{'N'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{link} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{link} = \texttt{'G'}`, :math:`\texttt{'P'}` or :math:`\texttt{'C'}`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. 
(`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`3`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[j-1] \geq 0.0`, for :math:`j = 1,2,\ldots,m`. (`errno` :math:`3`) On entry, :math:`\textit{ip}` incompatible with number of nonzero values of :math:`\mathrm{isx}`: :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`3`) Number of requested x-variables greater than :math:`\textit{n}`. (`errno` :math:`4`) On entry, :math:`\mathrm{t}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{t}[i-1] \geq 0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`5`) On entry, :math:`\mathrm{y}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{t}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0\leq \mathrm{y}[i-1]\leq \mathrm{t}[i-1]`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`6`) A fitted value is at the boundary, i.e., :math:`0.0` or :math:`1.0`. This may occur if there are :math:`y` values of :math:`0.0` or :math:`t` and the model is too complex for the data. The model should be reformulated with, perhaps, some observations dropped. (`errno` :math:`7`) SVD solution failed to converge. (`errno` :math:`8`) The iterative weighted least squares has failed to converge in :math:`\mathrm{maxit}` (or default :math:`10`) iterations. The value of :math:`\mathrm{maxit}` could be increased but it may be advantageous to examine the convergence using the :math:`\mathrm{iprint}` option. This may indicate that the convergence is slow because the solution is at a boundary in which case it may be better to reformulate the model. 
**Warns** **NagAlgorithmicWarning** (`errno` :math:`9`) The rank of the model has changed during the weighted least squares iterations. The estimate for :math:`\beta` returned may be reasonable, but you should check how the deviance has changed during iterations. (`errno` :math:`10`) Degrees of freedom for error are :math:`0`. .. _g02gb-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A generalized linear model with binomial errors consists of the following elements: (a) a set of :math:`n` observations, :math:`y_i`, from a binomial distribution: .. math:: \begin{pmatrix}t\\y\end{pmatrix}\pi^y\left(1-\pi \right)^{{t-y}}\text{.} (#) :math:`X`, a set of :math:`p` independent variables for each observation, :math:`x_1,x_2,\ldots,x_p`. (#) a linear model: .. math:: \eta = \sum \beta_jx_j\text{.} (#) a link between the linear predictor, :math:`\eta`, and the mean of the distribution, :math:`\mu = \pi t`, the link function, :math:`\eta = g\left(\mu \right)`. The possible link functions are: (i) logistic link: :math:`\eta = \log\left(\frac{\mu }{{t-\mu }}\right)`, (#) probit link: :math:`\eta = \Phi^{-1}\left(\frac{\mu }{t}\right)`, (#) complementary log-log link: :math:`\log\left(-\log\left(1-\frac{\mu }{t}\right)\right)\text{.}` (#) a measure of fit, the deviance: .. math:: \sum_{{i = 1}}^n\mathrm{dev}\left(y_i,\hat{\mu }_i\right) = \sum_{{i = 1}}^n2\left(y_i\log\left(\frac{y_i}{\hat{\mu }_i}\right)+\left(t_i-y_i\right)\log\left(\frac{\left(t_i-y_i\right)}{\left(t_i-\hat{\mu }_i\right)}\right)\right)\text{.} The linear parameters are estimated by iterative weighted least squares. An adjusted dependent variable, :math:`z`, is formed: .. math:: z = \eta +\left(y-\mu \right)\frac{{d\eta }}{{d\mu }} and a working weight, :math:`w`, .. 
math:: w^{-1} = \left(\tau \frac{{d\eta }}{{d\mu }}\right)^2\text{, where }\tau = \sqrt{\frac{t}{{\mu \left(t-\mu \right)}}}\text{.} At each iteration an approximation to the estimate of :math:`\beta`, :math:`\hat{\beta }`, is found by the weighted least squares regression of :math:`z` on :math:`X` with weights :math:`w`. ``glm_binomial`` finds a :math:`QR` decomposition of :math:`w^{{1/2}}X`, i.e., :math:`w^{{1/2}}X = QR` where :math:`R` is a :math:`p\times p` triangular matrix and :math:`Q` is an :math:`n\times p` column orthogonal matrix. If :math:`R` is of full rank, then :math:`\hat{\beta }` is the solution to .. math:: R\hat{\beta } = Q^\mathrm{T}w^{{1/2}}z\text{.} If :math:`R` is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`. .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R` and :math:`w^{{1/2}}X`. This gives the solution .. math:: \hat{\beta } = P_1D^{-1}\begin{pmatrix}Q_*&0\\0&I\end{pmatrix}Q^\mathrm{T}w^{{1/2}}z\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`. The iterations are continued until there is only a small change in the deviance. The initial values for the algorithm are obtained by taking .. math:: \hat{\eta } = g\left(y\right)\text{.} The fit of the model can be assessed by examining and testing the deviance, in particular by comparing the difference in deviance between nested models, i.e., when one model is a sub-model of the other. The difference in deviance between two nested models has, asymptotically, a :math:`\chi^2`-distribution with degrees of freedom given by the difference in the degrees of freedom associated with the two deviances. 
The parameter estimates, :math:`\hat{\beta }`, are asymptotically Normally distributed with variance-covariance matrix :math:`C = R^{-1}{R^{-1}}^\mathrm{T}` in the full rank case, otherwise :math:`C = P_1D^{-2}P_1^\mathrm{T}`. The residuals and influence statistics can also be examined. The estimated linear predictor :math:`\hat{\eta } = X\hat{\beta }`, can be written as :math:`Hw^{{1/2}}z` for an :math:`n\times n` matrix :math:`H`. The :math:`i`\ th diagonal elements of :math:`H`, :math:`h_i`, give a measure of the influence of the :math:`i`\ th values of the independent variables on the fitted regression model. These are sometimes known as leverages. The fitted values are given by :math:`\hat{\mu } = g^{-1}\left(\hat{\eta }\right)`. ``glm_binomial`` also computes the deviance residuals, :math:`r`: .. math:: r_i = \mathrm{sign}\left(y_i-\hat{\mu }_i\right)\sqrt{\mathrm{dev}\left(y_i,\hat{\mu }_i\right)}\text{.} An option allows the use of prior weights in the model. In many linear regression models the first term is taken as a mean term or an intercept, i.e., :math:`x_{{i,1}} = 1`, for :math:`i = 1,2,\ldots,n`. This is provided as an option. Often only some of the possible independent variables are included in a model; the facility to select variables to be included in the model is provided. If part of the linear predictor can be represented by variables with a known coefficient then this can be included in the model by using an offset, :math:`o`: .. math:: \eta = o+\sum \beta_jx_j\text{.} If the model is not of full rank the solution given will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using :meth:`glm_constrain` after using ``glm_binomial``. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions and can be estimated and tested using :meth:`glm_estfunc`. 
Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} .. _g02gb-py2-py-references: **References** Cook, R D and Weisberg, S, 1982, `Residuals and Influence in Regression`, Chapman and Hall Cox, D R, 1983, `Analysis of Binary Data`, Chapman and Hall McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall See Also -------- :meth:`naginterfaces.library.examples.correg.glm_binomial_ex.main` """ raise NotImplementedError
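The iterative weighted least squares scheme described in the Notes for ``glm_binomial`` can likewise be sketched for the simplest case: the logistic link with an intercept-only model. This illustration (not the NAG implementation) uses the standard binomial working weight :math:`w = \mu \left(t-\mu \right)/t` and the adjusted dependent variable :math:`z = \eta +\left(y-\mu \right)d\eta /d\mu` with :math:`d\eta /d\mu = t/\left(\mu \left(t-\mu \right)\right)`:

```python
# Illustrative sketch (not the NAG implementation) of iterative weighted
# least squares for a binomial GLM with logistic link, intercept only.
import math

def irls_binomial_logistic_intercept(y, t, tol=1.0e-12, maxit=100):
    beta = 0.0
    for _ in range(maxit):
        pi = 1.0 / (1.0 + math.exp(-beta))  # inverse logistic link
        num = 0.0
        den = 0.0
        for yi, ti in zip(y, t):
            mu = ti * pi                          # fitted mean, mu = t*pi
            deta_dmu = ti / (mu * (ti - mu))      # derivative of the link
            z = beta + (yi - mu) * deta_dmu       # adjusted dependent variable
            w = mu * (ti - mu) / ti               # working weight
            num += w * z
            den += w
        beta_new = num / den                      # weighted least squares step
        if abs(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

# For an intercept-only model the fitted proportion is sum(y)/sum(t).
beta_hat = irls_binomial_logistic_intercept([3.0, 5.0, 2.0], [10.0, 10.0, 10.0])
print(1.0 / (1.0 + math.exp(-beta_hat)))  # approximately 1/3
```

With covariates the scalar weighted mean becomes a weighted least squares regression of :math:`z` on :math:`X`, solved via the :math:`QR` (or SVD) factorization described above.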
def glm_poisson(x, isx, y, link='L', mean='M', wt=None, a=0.0, v=None, tol=0.0, maxit=10, iprint=0, eps=0.0, io_manager=None): r""" ``glm_poisson`` fits a generalized linear model with Poisson errors. .. _g02gc-py2-py-doc: For full information please refer to the NAG Library document for g02gc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gcf.html .. _g02gc-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` The matrix of all possible independent variables. :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}\textit{j}`\ th element of :math:`\mathrm{x}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. If :math:`\mathrm{isx}[j-1] > 0`, the variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the regression model. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, observations on the dependent variable. **link** : str, length 1, optional Indicates which link function is to be used. :math:`\mathrm{link} = \texttt{'E'}` An exponent link is used. :math:`\mathrm{link} = \texttt{'I'}` An identity link is used. :math:`\mathrm{link} = \texttt{'L'}` A log link is used. :math:`\mathrm{link} = \texttt{'S'}` A square root link is used. :math:`\mathrm{link} = \texttt{'R'}` A reciprocal link is used. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. 
If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided the effective number of observations is :math:`n`. **a** : float, optional If :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}` must contain the power of the exponential. If :math:`\mathrm{link}\neq \texttt{'E'}`, :math:`\mathrm{a}` is not referenced. **v** : None or float, array-like, shape :math:`\left(n, \textit{ip}+7\right)`, optional If :math:`\mathrm{v}\text{ is not }\mathbf{None}`, :math:`\mathrm{v}[\textit{i}-1,6]`, for :math:`\textit{i} = 1,2,\ldots,n`, must contain the offset values :math:`o_{\textit{i}}`. All other values need not be set. **tol** : float, optional Indicates the accuracy required for the fit of the model. The iterative weighted least squares procedure is deemed to have converged if the absolute change in deviance between iterations is less than :math:`\mathrm{tol}\times \left(1.0+\text{Current Deviance}\right)`. This is approximately an absolute precision if the deviance is small and a relative precision if the deviance is large. If :math:`0.0\leq \mathrm{tol} < \text{machine precision}`, the function will use :math:`10\times \text{machine precision}` instead. **maxit** : int, optional The maximum number of iterations for the iterative weighted least squares. If :math:`\mathrm{maxit} = 0`, a default value of :math:`10` is used. **iprint** : int, optional Indicates if the printing of information on the iterations is required. :math:`\mathrm{iprint}\leq 0` There is no printing. :math:`\mathrm{iprint} > 0` Every :math:`\mathrm{iprint}` iteration, the following are printed: the deviance; the current estimates; and if the weighted least squares equations are singular, then this is indicated. 
When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **eps** : float, optional The value of :math:`\mathrm{eps}` is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of :math:`\mathrm{eps}` the stricter the criterion for selecting the singular value decomposition. If :math:`0.0\leq \mathrm{eps} < \text{machine precision}`, the function will use machine precision instead. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **dev** : float The deviance for the fitted model. **idf** : int The degrees of freedom associated with the deviance for the fitted model. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The estimates of the parameters of the generalized linear model, :math:`\hat{\beta }`. If :math:`\mathrm{mean} = \texttt{'M'}`, the first element of :math:`\mathrm{b}` will contain the estimate of the mean parameter and :math:`\mathrm{b}[i]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{b}[i-1]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. **irank** : int The rank of the independent variables. If the model is of full rank, :math:`\mathrm{irank} = \textit{ip}`. If the model is not of full rank, :math:`\mathrm{irank}` is an estimate of the rank of the independent variables. :math:`\mathrm{irank}` is calculated as the number of singular values greater than :math:`\mathrm{eps}\times \text{}`\ (largest singular value). 
It is possible for the SVD to be carried out but for :math:`\mathrm{irank}` to be returned as :math:`\textit{ip}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard errors of the linear parameters. :math:`\mathrm{se}[\textit{i}-1]` contains the standard error of the parameter estimate in :math:`\mathrm{b}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. **v** : float, ndarray, shape :math:`\left(n, \textit{ip}+7\right)` Auxiliary information on the fitted model. .. rst-class:: nag-rules-none nag-align-left +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,0]` |contains the linear predictor value, :math:`\eta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,1]` |contains the fitted value, :math:`\hat{\mu }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. 
| +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,2]` |contains the variance standardization, :math:`\frac{1}{\tau_{\textit{i}}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,3]` |contains the square root of the working weight, :math:`w_{\textit{i}}^{\frac{1}{2}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,4]` |contains the deviance residual, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,5]` |contains the leverage, :math:`h_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,6]` |contains the offset, :math:`o_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{v}` is **None** on entry, all values will be zero.| +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,j-1]`|for :math:`j = 8,\ldots,\textit{ip}+7`, contains the results of the :math:`QR` decomposition or the singular value decomposition. 
| +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ If the model is not of full rank, i.e., :math:`\mathrm{irank} < \textit{ip}`, the first :math:`\textit{ip}` rows of columns :math:`8` to :math:`\textit{ip}+7` contain the :math:`P^*` matrix. .. _g02gc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{eps} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{eps}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} \geq 0`. (`errno` :math:`1`) On entry, :math:`\textit{offset} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{offset} = \texttt{'Y'}` or :math:`\texttt{'N'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{link} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{link} = \texttt{'E'}`, :math:`\texttt{'I'}`, :math:`\texttt{'L'}`, :math:`\texttt{'S'}` or :math:`\texttt{'R'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{a} = 0.0` and :math:`\mathrm{link} = \texttt{'E'}`. Constraint: if :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}\neq 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. 
(`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`3`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[j-1] \geq 0.0`, for :math:`j = 1,2,\ldots,m`. (`errno` :math:`3`) On entry, :math:`\textit{ip}` incompatible with number of nonzero values of :math:`\mathrm{isx}`: :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`3`) Number of requested x-variables greater than :math:`\textit{n}`. (`errno` :math:`4`) On entry, :math:`\mathrm{y}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{y}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`5`) A fitted value is at the boundary, i.e., :math:`\hat{\mu } = 0.0`. This may occur if there are :math:`y` values of :math:`0.0` and the model is too complex for the data. The model should be reformulated with, perhaps, some observations dropped. (`errno` :math:`6`) SVD solution failed to converge. (`errno` :math:`7`) The iterative weighted least squares has failed to converge in :math:`\mathrm{maxit}` (or default :math:`10`) iterations. The value of :math:`\mathrm{maxit}` could be increased but it may be advantageous to examine the convergence using the :math:`\mathrm{iprint}` option. This may indicate that the convergence is slow because the solution is at a boundary in which case it may be better to reformulate the model. **Warns** **NagAlgorithmicWarning** (`errno` :math:`8`) The rank of the model has changed during the weighted least squares iterations. 
The estimate for :math:`\beta` returned may be reasonable, but you should check how the deviance has changed during iterations. (`errno` :math:`9`) The degrees of freedom for error are :math:`0`. A saturated model has been fitted. .. _g02gc-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A generalized linear model with Poisson errors consists of the following elements: (a) a set of :math:`n` observations, :math:`y_i`, from a Poisson distribution: .. math:: \frac{{\mu^ye^{{-\mu }}}}{{y!}}\text{.} (#) :math:`X`, a set of :math:`p` independent variables for each observation, :math:`x_1,x_2,\ldots,x_p`. (#) a linear model: .. math:: \eta = \sum \beta_jx_j\text{.} (#) a link between the linear predictor, :math:`\eta`, and the mean of the distribution, :math:`\mu`, :math:`\eta = g\left(\mu \right)`. The possible link functions are: (i) exponent link: :math:`\eta = \mu^a`, for a constant :math:`a`, (#) identity link: :math:`\eta = \mu`, (#) log link: :math:`\eta = \log\left(\mu \right)`, (#) square root link: :math:`\eta = \sqrt{\mu }`, (#) reciprocal link: :math:`\eta = \frac{1}{\mu }`. (#) a measure of fit, the deviance: .. math:: \sum_{{i = 1}}^n\mathrm{dev}\left(y_i,\hat{\mu }_i\right) = \sum_{{i = 1}}^n2\left(y_i\log\left(\frac{y_i}{\hat{\mu }_i}\right)-\left(y_i-\hat{\mu }_i\right)\right)\text{.} The linear parameters are estimated by iterative weighted least squares. An adjusted dependent variable, :math:`z`, is formed: .. math:: z = \eta +\left(y-\mu \right)\frac{{\mathrm{d}\eta }}{{\mathrm{d}\mu }} and a working weight, :math:`w`, .. math:: w^{-1} = \left(\tau \frac{{\mathrm{d}\eta }}{{\mathrm{d}\mu }}\right)^2\text{,} where :math:`\tau = \sqrt{\mu }`. 
At each iteration an approximation to the estimate of :math:`\beta`, :math:`\hat{\beta }`, is found by the weighted least squares regression of :math:`z` on :math:`X` with weights :math:`w`. ``glm_poisson`` finds a :math:`QR` decomposition of :math:`w^{{1/2}}X`, i.e., :math:`w^{{1/2}}X = QR` where :math:`R` is a :math:`p\times p` triangular matrix and :math:`Q` is an :math:`n\times p` column orthogonal matrix. If :math:`R` is of full rank, then :math:`\hat{\beta }` is the solution to: .. math:: R\hat{\beta } = Q^\mathrm{T}w^{{1/2}}z\text{.} If :math:`R` is not of full rank, a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`. .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R` and :math:`w^{{1/2}}X`. This gives the solution .. math:: \hat{\beta } = P_1D^{-1}\begin{pmatrix}Q_*&0\\0&I\end{pmatrix}Q^\mathrm{T}w^{{1/2}}z\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`. The iterations are continued until there is only a small change in the deviance. The initial values for the algorithm are obtained by taking .. math:: \hat{\eta } = g\left(y\right)\text{.} The fit of the model can be assessed by examining and testing the deviance, in particular by comparing the difference in deviance between nested models, i.e., when one model is a sub-model of the other. The difference in deviance between two nested models has, asymptotically, a :math:`\chi^2`-distribution with degrees of freedom given by the difference in the degrees of freedom associated with the two deviances. The parameter estimates, :math:`\hat{\beta }`, are asymptotically Normally distributed with variance-covariance matrix :math:`C = R^{-1}{R^{-1}}^\mathrm{T}` in the full rank case, otherwise :math:`C = P_1D^{-2}P_1^\mathrm{T}`. 
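The iterative scheme just described can be sketched in a few lines of NumPy for the common special case of a log link, where :math:`\mathrm{d}\eta /\mathrm{d}\mu = 1/\mu` and hence the working weight is :math:`w = \mu`. This is an illustrative reimplementation under those assumptions, not the NAG routine; the function name and defaults below are hypothetical.

```python
# Illustrative sketch (not the NAG implementation) of IRLS for a Poisson GLM
# with a log link: eta = log(mu), d(eta)/d(mu) = 1/mu, working weight w = mu.
import numpy as np

def irls_poisson_log(X, y, maxit=25, tol=1e-10):
    """Estimate beta by iterative weighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.where(y > 0.0, y, 0.5)         # initial values from eta_hat = g(y)
    eta = np.log(mu)
    dev_old = np.inf
    for _ in range(maxit):
        z = eta + (y - mu) / mu            # adjusted dependent variable
        sw = np.sqrt(mu)                   # sqrt of working weight, log link
        # weighted least squares step via a QR decomposition of w^{1/2} X
        q, r = np.linalg.qr(sw[:, None] * X)
        beta = np.linalg.solve(r, q.T @ (sw * z))
        eta = X @ beta
        mu = np.exp(eta)
        pos = y > 0.0                      # y*log(y/mu) is taken as 0 at y = 0
        dev = 2.0 * (np.sum(y[pos] * np.log(y[pos] / mu[pos])) - np.sum(y - mu))
        if abs(dev_old - dev) <= tol * (1.0 + dev):
            break                          # small change in the deviance
        dev_old = dev
    return beta, dev
```

For an intercept-only model the fitted mean is the sample mean, so the estimate converges to :math:`\hat{\beta } = \log\left(\bar{y}\right)`.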
The residuals and influence statistics can also be examined. The estimated linear predictor :math:`\hat{\eta } = X\hat{\beta }` can be written as :math:`Hw^{{1/2}}z` for an :math:`n\times n` matrix :math:`H`. The :math:`i`\ th diagonal elements of :math:`H`, :math:`h_i`, give a measure of the influence of the :math:`i`\ th values of the independent variables on the fitted regression model. These are known as leverages. The fitted values are given by :math:`\hat{\mu } = g^{-1}\left(\hat{\eta }\right)`. ``glm_poisson`` also computes the deviance residuals, :math:`r`: .. math:: r_i = \mathrm{sign}\left(y_i-\hat{\mu }_i\right)\sqrt{\mathrm{dev}\left(y_i,\hat{\mu }_i\right)}\text{.} An option allows prior weights to be used with the model. In many linear regression models the first term is taken as a mean term or an intercept, i.e., :math:`x_{{i,1}} = 1`, for :math:`i = 1,2,\ldots,n`. This is provided as an option. Often only some of the possible independent variables are included in a model; the facility to select variables to be included in the model is provided. If part of the linear predictor can be represented by a variable with a known coefficient then this can be included in the model by using an offset, :math:`o`: .. math:: \eta = o+\sum \beta_jx_j\text{.} If the model is not of full rank the solution given will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using :meth:`glm_constrain` after using ``glm_poisson``. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions and can be estimated and tested using :meth:`glm_estfunc`. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. 
math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} The generalized linear model with Poisson errors can be used to model contingency table data; see Cook and Weisberg (1982) and McCullagh and Nelder (1983). .. _g02gc-py2-py-references: **References** Cook, R D and Weisberg, S, 1982, `Residuals and Influence in Regression`, Chapman and Hall McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall Plackett, R L, 1974, `The Analysis of Categorical Data`, Griffin See Also -------- :meth:`naginterfaces.library.examples.surviv.coxmodel_ex.main` """ raise NotImplementedError
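The deviance residuals defined in the Notes for ``glm_poisson`` are straightforward to evaluate once the fitted means are available. The sketch below is a hypothetical helper, not part of the library; it treats the :math:`y\log\left(y/\hat{\mu }\right)` term as zero when :math:`y = 0`.

```python
# Illustrative sketch: Poisson deviance residuals
# r_i = sign(y_i - mu_i) * sqrt(dev(y_i, mu_i)), as defined in the Notes.
import numpy as np

def poisson_deviance_residuals(y, mu):
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    term = np.zeros_like(y)                # y*log(y/mu), taken as 0 when y = 0
    pos = y > 0.0
    term[pos] = y[pos] * np.log(y[pos] / mu[pos])
    dev = 2.0 * (term - (y - mu))
    return np.sign(y - mu) * np.sqrt(np.maximum(dev, 0.0))
```

Residuals are exactly zero wherever :math:`y_i = \hat{\mu }_i`, and carry the sign of :math:`y_i-\hat{\mu }_i` otherwise.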
[docs]def glm_gamma(x, isx, y, link='R', mean='M', wt=None, s=0, a=0.0, v=None, tol=0.0, maxit=10, iprint=0, eps=0.0, io_manager=None): r""" ``glm_gamma`` fits a generalized linear model with gamma errors. .. _g02gd-py2-py-doc: For full information please refer to the NAG Library document for g02gd https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gdf.html .. _g02gd-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. If :math:`\mathrm{isx}[j-1] > 0`, the variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the regression model. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the dependent variable. **link** : str, length 1, optional Indicates which link function is to be used. :math:`\mathrm{link} = \texttt{'E'}` An exponential link is used. :math:`\mathrm{link} = \texttt{'I'}` An identity link is used. :math:`\mathrm{link} = \texttt{'L'}` A log link is used. :math:`\mathrm{link} = \texttt{'S'}` A square root link is used. :math:`\mathrm{link} = \texttt{'R'}` A reciprocal link is used. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If provided :math:`\mathrm{wt}` must contain the weights to be used with the model. 
If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If :math:`\mathrm{wt}` is not provided, the effective number of observations is :math:`n`. **s** : float, optional The scale parameter for the gamma model, :math:`\nu^{-1}`. :math:`\mathrm{s} = 0.0` The scale parameter is estimated by the function using the formula described in :ref:`Notes <g02gd-py2-py-notes>`. **a** : float, optional If :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}` must contain the power of the exponential. If :math:`\mathrm{link} \neq \texttt{'E'}`, :math:`\mathrm{a}` is not referenced. **v** : None or float, array-like, shape :math:`\left(n, \textit{ip}+7\right)`, optional The offset values :math:`o_{\textit{i}}`; otherwise :math:`\mathrm{v}` need not be set and no offset is used. **tol** : float, optional Indicates the accuracy required for the fit of the model. The iterative weighted least squares procedure is deemed to have converged if the absolute change in deviance between iterations is less than :math:`\mathrm{tol}\times \left(1.0+\text{Current Deviance}\right)`. This is approximately an absolute precision if the deviance is small and a relative precision if the deviance is large. If :math:`0.0\leq \mathrm{tol} < \text{machine precision}`, then the function will use :math:`10\times \text{machine precision}` instead. **maxit** : int, optional The maximum number of iterations for the iterative weighted least squares. :math:`\mathrm{maxit} = 0` A default value of :math:`10` is used. **iprint** : int, optional Indicates if the printing of information on the iterations is required. :math:`\mathrm{iprint}\leq 0` There is no printing. 
:math:`\mathrm{iprint} > 0` Every :math:`\mathrm{iprint}` iteration, the following are printed: the deviance; the current estimates; and if the weighted least squares equations are singular, then this is indicated. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **eps** : float, optional The value of :math:`\mathrm{eps}` is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of :math:`\mathrm{eps}` the stricter the criterion for selecting the singular value decomposition. If :math:`0.0\leq \mathrm{eps} < \text{machine precision}`, the function will use machine precision instead. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **s** : float If on input :math:`\mathrm{s} = 0.0`, :math:`\mathrm{s}` contains the estimated value of the scale parameter, :math:`\hat{\nu }^{-1}`. If on input :math:`\mathrm{s}\neq 0.0`, :math:`\mathrm{s}` is unchanged on exit. **dev** : float The adjusted deviance for the fitted model. **idf** : int The degrees of freedom associated with the deviance for the fitted model. **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The estimates of the parameters of the generalized linear model, :math:`\hat{\beta }`. If :math:`\mathrm{mean} = \texttt{'M'}`, the first element of :math:`\mathrm{b}` will contain the estimate of the mean parameter and :math:`\mathrm{b}[i]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. 
If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{b}[i-1]` will contain the coefficient of the variable contained in column :math:`j` of :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. **irank** : int The rank of the independent variables. If the model is of full rank, :math:`\mathrm{irank} = \textit{ip}`. If the model is not of full rank, :math:`\mathrm{irank}` is an estimate of the rank of the independent variables. :math:`\mathrm{irank}` is calculated as the number of singular values greater than :math:`\mathrm{eps}\times`\ (largest singular value). It is possible for the SVD to be carried out but for :math:`\mathrm{irank}` to be returned as :math:`\textit{ip}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard errors of the linear parameters. :math:`\mathrm{se}[\textit{i}-1]` contains the standard error of the parameter estimate in :math:`\mathrm{b}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored in packed form by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. **v** : float, ndarray, shape :math:`\left(n, \textit{ip}+7\right)` Auxiliary information on the fitted model. .. 
rst-class:: nag-rules-none nag-align-left +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,0]` |contains the linear predictor value, :math:`\eta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,1]` |contains the fitted value, :math:`\hat{\mu }_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,2]` |contains the variance standardization, :math:`\frac{1}{\tau_{\textit{i}}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,3]` |contains the square root of the working weight, :math:`w_{\textit{i}}^{\frac{1}{2}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,4]` |contains the Anscombe residual, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,5]` |contains the leverage, :math:`h_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. 
| +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,6]` |contains the offset, :math:`o_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{v}\text{ is }\mathbf{None}`, all values will be zero.| +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`\mathrm{v}[i-1,j-1]`|for :math:`j = 8,\ldots,\textit{ip}+7`, contains the results of the :math:`QR` decomposition or the singular value decomposition. | +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ If the model is not of full rank, i.e., :math:`\mathrm{irank} < \textit{ip}`, the first :math:`\textit{ip}` rows of columns :math:`8` to :math:`\textit{ip}+7` contain the :math:`P^*` matrix. .. _g02gd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{eps} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{eps}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} \geq 0`. (`errno` :math:`1`) On entry, :math:`\textit{offset} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{offset} = \texttt{'Y'}` or :math:`\texttt{'N'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{link} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{link} = \texttt{'E'}`, :math:`\texttt{'I'}`, :math:`\texttt{'L'}`, :math:`\texttt{'S'}` or :math:`\texttt{'R'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{a} = 0.0` and :math:`\mathrm{link} = \texttt{'E'}`. Constraint: if :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}\neq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{s} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{s}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] < 0.0`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`3`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[j-1] \geq 0.0`, for :math:`j = 1,2,\ldots,m`. (`errno` :math:`3`) On entry, :math:`\textit{ip}` incompatible with number of nonzero values of :math:`\mathrm{isx}`: :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`3`) Number of requested x-variables greater than :math:`\textit{n}`. (`errno` :math:`4`) On entry, :math:`\mathrm{y}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{y}[i-1] \geq 0.0`, for :math:`i = 1,2,\ldots,n`. (`errno` :math:`5`) A fitted value is at the boundary, i.e., :math:`\hat{\mu } = 0.0`. 
This may occur if there are :math:`y` values of :math:`0.0` and the model is too complex for the data. The model should be reformulated with, perhaps, some observations dropped. (`errno` :math:`6`) SVD solution failed to converge. (`errno` :math:`7`) The iterative weighted least squares has failed to converge in :math:`\mathrm{maxit}` (or default :math:`10`) iterations. The value of :math:`\mathrm{maxit}` could be increased but it may be advantageous to examine the convergence using the :math:`\mathrm{iprint}` option. This may indicate that the convergence is slow because the solution is at a boundary in which case it may be better to reformulate the model. **Warns** **NagAlgorithmicWarning** (`errno` :math:`8`) The rank of the model has changed during the weighted least squares iterations. The estimate for :math:`\beta` returned may be reasonable, but you should check how the deviance has changed during iterations. (`errno` :math:`9`) The degrees of freedom for error are :math:`0`. A saturated model has been fitted. .. _g02gd-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A generalized linear model with gamma errors consists of the following elements: (a) a set of :math:`n` observations, :math:`y_i`, from a gamma distribution with probability density function: .. math:: \frac{1}{{\Gamma \left(\nu \right)}}\left(\frac{{\nu y}}{\mu }\right)^{\nu }\mathrm{exp}\left(-\frac{{\nu y}}{\mu }\right)\frac{1}{y} :math:`\nu` being constant for the sample. (#) :math:`X`, a set of :math:`p` independent variables for each observation, :math:`x_1,x_2,\ldots,x_p`. (#) a linear model: .. math:: \eta = \sum \beta_jx_j\text{.} (#) a link between the linear predictor, :math:`\eta`, and the mean of the distribution, :math:`\mu`, :math:`\eta = g\left(\mu \right)`. 
The possible link functions are: (i) exponent link: :math:`\eta = \mu^a`, for a constant :math:`a`, (#) identity link: :math:`\eta = \mu`, (#) log link: :math:`\eta = \log\left(\mu \right)`, (#) square root link: :math:`\eta = \sqrt{\mu }`, (#) reciprocal link: :math:`\eta = \frac{1}{\mu }`. (#) a measure of fit, an adjusted deviance. This is a function related to the deviance, but defined for :math:`y = 0`: .. math:: \sum_{{i = 1}}^n\mathrm{dev}^*\left(y_i,\hat{\mu }_i\right) = \sum_{{i = 1}}^n2\left(\log\left(\hat{\mu }_i\right)+\left(\frac{y_i}{\hat{\mu }_i}\right)\right)\text{.} The linear parameters are estimated by iterative weighted least squares. An adjusted dependent variable, :math:`z`, is formed: .. math:: z = \eta +\left(y-\mu \right)\frac{{d\eta }}{{d\mu }} and a working weight, :math:`w`, .. math:: w^{-1} = \left(\tau \frac{{d\eta }}{{d\mu }}\right)^2\text{, where }\quad \tau = \mu \text{.} At each iteration an approximation to the estimate of :math:`\beta`, :math:`\hat{\beta }`, is found by the weighted least squares regression of :math:`z` on :math:`X` with weights :math:`w`. ``glm_gamma`` finds a :math:`QR` decomposition of :math:`w^{{\frac{1}{2}}}X`, i.e., :math:`w^{{\frac{1}{2}}}X = QR` where :math:`R` is a :math:`p\times p` triangular matrix and :math:`Q` is an :math:`n\times p` column orthogonal matrix. If :math:`R` is of full rank then :math:`\hat{\beta }` is the solution to: :math:`R\hat{\beta } = Q^\mathrm{T}w^{{\frac{1}{2}}}z` If :math:`R` is not of full rank, a solution is obtained by means of a singular value decomposition (SVD) of :math:`R`. .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R` and :math:`w^{{\frac{1}{2}}}X`. This gives the solution .. 
math:: \hat{\beta } = P_1D^{-1}\begin{pmatrix}Q_*&0\\0&I\end{pmatrix}Q^\mathrm{T}w^{{\frac{1}{2}}}z\text{,} where :math:`P_1` is the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`. The iterations are continued until there is only a small change in the deviance. The initial values for the algorithm are obtained by taking .. math:: \hat{\eta } = g\left(y\right)\text{.} The scale parameter, :math:`\nu^{-1}`, is estimated by a moment estimator: .. math:: \hat{\nu }^{-1} = \sum_{{i = 1}}^n\frac{{{\left[\left(y_i-\hat{\mu }_i\right)/\hat{\mu }_i\right]}^2}}{\left(n-k\right)}\text{.} The fit of the model can be assessed by examining and testing the deviance, in particular, by comparing the difference in deviance between nested models, i.e., when one model is a sub-model of the other. The difference in deviance or adjusted deviance between two nested models with known :math:`\nu` has, asymptotically, a :math:`\chi^2`-distribution with degrees of freedom given by the difference in the degrees of freedom associated with the two deviances. The parameter estimates, :math:`\hat{\beta }`, are asymptotically Normally distributed with variance-covariance matrix: :math:`C = R^{-1}{R^{-1}}^\mathrm{T}\nu^{-1}` in the full rank case, otherwise :math:`C = P_1D^{-2}P_1^\mathrm{T}\nu^{-1}`. The residuals and influence statistics can also be examined. The estimated linear predictor :math:`\hat{\eta } = X\hat{\beta }` can be written as :math:`Hw^{{\frac{1}{2}}}z` for an :math:`n\times n` matrix :math:`H`. The :math:`i`\ th diagonal elements of :math:`H`, :math:`h_i`, give a measure of the influence of the :math:`i`\ th values of the independent variables on the fitted regression model. These are known as leverages. The fitted values are given by :math:`\hat{\mu } = g^{-1}\left(\hat{\eta }\right)`. ``glm_gamma`` also computes the Anscombe residuals, :math:`r`: .. 
math:: r_i = \frac{{3\left(y_i^{{\frac{1}{3}}}-\hat{\mu }_i^{{\frac{1}{3}}}\right)}}{{\hat{\mu }_i^{{\frac{1}{3}}}}}\text{.} An option allows the use of prior weights, :math:`\omega_i`. This gives a model with: .. math:: \nu_i = \nu \omega_i\text{.} In many linear regression models the first term is taken as a mean term or an intercept, i.e., :math:`x_{{i,1}} = 1`, for :math:`i = 1,2,\ldots,n`. This is provided as an option. Often only some of the possible independent variables are included in a model; the facility to select variables to be included in the model is provided. If part of the linear predictor can be represented by a variable with a known coefficient then this can be included in the model by using an offset, :math:`o`: .. math:: \eta = o+\sum \beta_jx_j\text{.} If the model is not of full rank the solution given will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using :meth:`glm_constrain` after using ``glm_gamma``. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions and can be estimated and tested using :meth:`glm_estfunc`. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix} D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix}\text{.} .. _g02gd-py2-py-references: **References** Cook, R D and Weisberg, S, 1982, `Residuals and Influence in Regression`, Chapman and Hall McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall """ raise NotImplementedError
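The moment estimator of the scale parameter and the Anscombe residuals described in the Notes for ``glm_gamma`` can be computed directly; the following NumPy fragment is an illustrative sketch with hypothetical helper names, not the NAG implementation.

```python
# Illustrative NumPy sketch of two quantities from the glm_gamma Notes
# (hypothetical helper names, not the NAG implementation).
import numpy as np

def gamma_scale_moment(y, mu, rank):
    """Moment estimator nu^{-1} = sum(((y - mu)/mu)^2) / (n - k)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.sum(((y - mu) / mu) ** 2) / (y.size - rank)

def gamma_anscombe_residuals(y, mu):
    """Anscombe residuals r_i = 3*(y^(1/3) - mu^(1/3)) / mu^(1/3)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return 3.0 * (np.cbrt(y) - np.cbrt(mu)) / np.cbrt(mu)
```

Here ``rank`` plays the role of :math:`k`, the rank of the independent variables, so the divisor :math:`n-k` matches the degrees of freedom for error.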
[docs]def glm_constrain(v, c, b, s): r""" ``glm_constrain`` calculates the estimates of the parameters of a generalized linear model for given constraints from the singular value decomposition results. .. _g02gk-py2-py-doc: For full information please refer to the NAG Library document for g02gk https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gkf.html .. _g02gk-py2-py-parameters: **Parameters** **v** : float, array-like, shape :math:`\left(\textit{ip}, \textit{ip}+7\right)` The array :math:`\mathrm{v}` as returned by :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` or :meth:`glm_gamma`. **c** : float, array-like, shape :math:`\left(\textit{ip}, \textit{iconst}\right)` Contains the :math:`\textit{iconst}` constraints stored by column, i.e., the :math:`i`\ th constraint is stored in the :math:`i`\ th column of :math:`\mathrm{c}`. **b** : float, array-like, shape :math:`\left(\textit{ip}\right)` The parameter estimates computed by using the singular value decomposition, :math:`\hat{\beta }_{\text{svd}}`. **s** : float The estimate of the scale parameter. For results from :meth:`glm_normal` and :meth:`glm_gamma`, :math:`\mathrm{s}` is the scale parameter for the model. For results from :meth:`glm_binomial` and :meth:`glm_poisson`, :math:`\mathrm{s}` should be set to :math:`1.0`. **Returns** **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The estimates of the parameters with the constraints imposed, :math:`\hat{\beta }_{\mathrm{c}}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The standard error of the parameter estimates in :math:`\mathrm{b}`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. 
They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. .. _g02gk-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{iconst} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{iconst} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{s} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{s} > 0.0`. (`errno` :math:`1`) On entry, :math:`\textit{iconst} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{iconst} < \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`2`) :math:`\mathrm{c}` does not give a model of full rank. .. _g02gk-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``glm_constrain`` computes the estimates given a set of linear constraints for a generalized linear model which is not of full rank. It is intended for use after a call to :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` or :meth:`glm_gamma`. In the case of a model not of full rank the functions use a singular value decomposition to find the parameter estimates, :math:`\hat{\beta }_{\text{svd}}`, and their variance-covariance matrix. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix} as described by :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` and :meth:`glm_gamma`. 
Alternative solutions can be formed by imposing constraints on the parameters. If there are :math:`p` parameters and the rank of the model is :math:`k` then :math:`n_{\mathrm{c}} = p-k` constraints will have to be imposed to obtain a unique solution. Let :math:`C` be a :math:`p\times n_{\mathrm{c}}` matrix of constraints, such that .. math:: C^\mathrm{T}\beta = 0\text{,} then the new parameter estimates :math:`\hat{\beta }_{\mathrm{c}}` are given by: .. math:: \begin{array}{ll}\hat{\beta }_{\mathrm{c}}& = A\hat{\beta }_{\mathrm{svd}}\\& = \left(I-P_0\left(C^\mathrm{T}P_0\right)^{-1}C^\mathrm{T}\right)\hat{\beta }_{\mathrm{svd}}\text{, where }I\text{ is the identity matrix,}\end{array} and the variance-covariance matrix is given by .. math:: AP_1D^{-2}P_1^\mathrm{T}A^\mathrm{T} provided :math:`\left(C^\mathrm{T}P_0\right)^{-1}` exists. .. _g02gk-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
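The constraint transformation :math:`\hat{\beta }_{\mathrm{c}} = \left(I-P_0\left(C^\mathrm{T}P_0\right)^{-1}C^\mathrm{T}\right)\hat{\beta }_{\mathrm{svd}}` can be sketched in NumPy. This is an illustration under assumed data, not the NAG implementation; the design, :math:`P_0` and :math:`C` below are hypothetical. Because :math:`P_0` spans the null space of :math:`X`, the fitted values :math:`X\hat{\beta }` are unchanged while :math:`C^\mathrm{T}\hat{\beta }_{\mathrm{c}} = 0` is enforced:

```python
import numpy as np

def constrained_estimates(beta_svd, p0, c):
    """beta_c = (I - P0 (C^T P0)^{-1} C^T) beta_svd."""
    p = beta_svd.shape[0]
    a = np.eye(p) - p0 @ np.linalg.solve(c.T @ p0, c.T)
    return a @ beta_svd

# Hypothetical rank-deficient design: columns [1, d, 1-d],
# so (1, -1, -1) spans the null space of X.
p0 = np.array([[1.0], [-1.0], [-1.0]]) / np.sqrt(3.0)
c = np.array([[0.0], [1.0], [1.0]])      # constraint: beta_2 + beta_3 = 0
beta_svd = np.array([1.0, 2.0, 3.0])
beta_c = constrained_estimates(beta_svd, p0, c)
```

Here :math:`\hat{\beta }_{\mathrm{c}} = \left(3.5, -0.5, 0.5\right)`: the constraint holds exactly, and rows of :math:`X` such as :math:`\left(1,1,0\right)` give the same fitted value for both estimates.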
[docs]def glm_estfunc(irank, b, cov, v, f, tol=0.0): r""" ``glm_estfunc`` gives the estimate of an estimable function along with its standard error from the results of fitting a generalized linear model. .. _g02gn-py2-py-doc: For full information please refer to the NAG Library document for g02gn https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gnf.html .. _g02gn-py2-py-parameters: **Parameters** **irank** : int :math:`k`, the rank of the independent variables. **b** : float, array-like, shape :math:`\left(\textit{ip}\right)` The :math:`\textit{ip}` values of the estimates of the parameters of the model, :math:`\hat{\beta }`. **cov** : float, array-like, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix of the :math:`\textit{ip}` parameter estimates given in :math:`\mathrm{b}`. They are stored packed by column, i.e., the covariance between the parameter estimate given in :math:`\mathrm{b}[i-1]` and the parameter estimate given in :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`. **v** : float, array-like, shape :math:`\left(\textit{ip}, \textit{ip}+7\right)` As returned by :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` and :meth:`glm_gamma`. **f** : float, array-like, shape :math:`\left(\textit{ip}\right)` :math:`f`, the linear function to be estimated. **tol** : float, optional The tolerance value used in the check for estimability, :math:`\eta`. If :math:`\mathrm{tol}\leq 0.0` then :math:`\sqrt{\epsilon }`, where :math:`\epsilon` is the machine precision, is used instead. **Returns** **est** : bool Indicates if the function was estimable. :math:`\mathrm{est} = \mathbf{True}` The function is estimable. :math:`\mathrm{est} = \mathbf{False}` The function is not estimable and :math:`\mathrm{stat}`, :math:`\mathrm{sestat}` and :math:`\mathrm{z}` are not set. 
**stat** : float If :math:`\mathrm{est} = \mathbf{True}`, :math:`\mathrm{stat}` contains the estimate of the function, :math:`f^\mathrm{T}\hat{\beta }`. **sestat** : float If :math:`\mathrm{est} = \mathbf{True}`, :math:`\mathrm{sestat}` contains the standard error of the estimate of the function, :math:`\mathrm{se}\left(F\right)`. **z** : float If :math:`\mathrm{est} = \mathbf{True}`, :math:`\mathrm{z}` contains the :math:`z` statistic for the test of the function being equal to zero. .. _g02gn-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{irank} \leq \textit{ip}`. (`errno` :math:`1`) On entry, :math:`\mathrm{irank} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{irank} \geq 1`. (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`2`) :math:`\mathrm{irank} = \textit{ip}`. In this case :math:`\mathrm{est}` is returned as true and all statistics are calculated. (`errno` :math:`3`) Standard error of statistic :math:`\text{} = 0.0`; this may be due to rounding errors if the standard error is very small or due to mis-specified inputs :math:`\mathrm{cov}` and :math:`\mathrm{f}`. .. _g02gn-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` ``glm_estfunc`` computes the estimates of an estimable function for a generalized linear model which is not of full rank. It is intended for use after a call to :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` or :meth:`glm_gamma`. An estimable function is a linear combination of the parameters such that it has a unique estimate. 
For a full rank model all linear combinations of parameters are estimable. In the case of a model not of full rank the functions use a singular value decomposition (SVD) to find the parameter estimates, :math:`\hat{\beta }`, and their variance-covariance matrix. Given the upper triangular matrix :math:`R` obtained from the :math:`QR` decomposition of the independent variables the SVD gives .. math:: R = Q_*\begin{pmatrix}D&0\\0&0\end{pmatrix}P^\mathrm{T}\text{,} where :math:`D` is a :math:`k\times k` diagonal matrix with nonzero diagonal elements, :math:`k` being the rank of :math:`R`, and :math:`Q_*` and :math:`P` are :math:`p\times p` orthogonal matrices. This leads to a solution: .. math:: \hat{\beta } = P_1D^{-1}Q_{*_1}^\mathrm{T}c_1\text{,} :math:`P_1` being the first :math:`k` columns of :math:`P`, i.e., :math:`P = \left(P_1P_0\right)`; :math:`Q_{*_1}` being the first :math:`k` columns of :math:`Q_*`, and :math:`c_1` being the first :math:`p` elements of :math:`c`. Details of the SVD are made available in the form of the matrix :math:`P^*`: .. math:: P^* = \begin{pmatrix}D^{-1} P_1^\mathrm{T} \\ P_0^\mathrm{T} \end{pmatrix} as described by :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` and :meth:`glm_gamma`. A linear function of the parameters, :math:`F = f^\mathrm{T}\beta`, can be tested to see if it is estimable by computing :math:`\zeta = P_0^\mathrm{T}f`. If :math:`\zeta` is zero, then the function is estimable; if not, the function is not estimable. In practice :math:`\left\lvert \zeta \right\rvert` is tested against some small quantity :math:`\eta`. Given that :math:`F` is estimable it can be estimated by :math:`f^\mathrm{T}\hat{\beta }` and its standard error calculated from the variance-covariance matrix of :math:`\hat{\beta }`, :math:`C_{\beta }`, as .. math:: \mathrm{se}\left(F\right) = \sqrt{f^\mathrm{T}C_{\beta }f}\text{.} Also a :math:`z` statistic .. 
math:: z = \frac{{f^\mathrm{T}\hat{\beta }}}{{\mathrm{se}\left(F\right)}}\text{,} can be computed. The distribution of :math:`z` will be approximately Normal. .. _g02gn-py2-py-references: **References** Golub, G H and Van Loan, C F, 1996, `Matrix Computations`, (3rd Edition), Johns Hopkins University Press, Baltimore McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall Searle, S R, 1971, `Linear Models`, Wiley """ raise NotImplementedError
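The estimability test and the resulting statistics can be sketched directly in NumPy. This is an illustration only (the function and the example :math:`P_0` are hypothetical, not the NAG routine): :math:`\zeta = P_0^\mathrm{T}f` decides estimability, then :math:`\mathrm{stat} = f^\mathrm{T}\hat{\beta }`, :math:`\mathrm{se}\left(F\right) = \sqrt{f^\mathrm{T}C_{\beta }f}` and :math:`z = \mathrm{stat}/\mathrm{se}\left(F\right)`:

```python
import numpy as np

def estimable_function(f, beta_hat, cov_beta, p0, eta_tol=1e-8):
    """Return (stat, sestat, z) if f^T beta is estimable, else None."""
    zeta = p0.T @ f                     # zeta = P0^T f
    if np.max(np.abs(zeta)) > eta_tol:
        return None                     # not estimable
    stat = f @ beta_hat                 # f^T beta_hat
    sestat = np.sqrt(f @ cov_beta @ f)  # sqrt(f^T C_beta f)
    return stat, sestat, stat / sestat
```

With :math:`P_0 = \left(1,-1,-1\right)^\mathrm{T}/\sqrt{3}` (a one-dimensional null space), the contrast :math:`f = \left(0,1,-1\right)` is estimable while :math:`f = \left(0,1,0\right)` is not.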
[docs]def glm_predict(errfn, x, isx, b, cov, vfobs, link=None, mean='M', t=None, off=None, wt=None, s=0.0, a=0.0): r""" ``glm_predict`` allows prediction from a generalized linear model fit via :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` or :meth:`glm_gamma` or a linear model fit via :meth:`linregm_fit`. .. _g02gp-py2-py-doc: For full information please refer to the NAG Library document for g02gp https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02gpf.html .. _g02gp-py2-py-parameters: **Parameters** **errfn** : str, length 1 Indicates the distribution used to model the dependent variable, :math:`y`. :math:`\mathrm{errfn} = \texttt{'B'}` The binomial distribution is used. :math:`\mathrm{errfn} = \texttt{'G'}` The gamma distribution is used. :math:`\mathrm{errfn} = \texttt{'N'}` The Normal (Gaussian) distribution is used. :math:`\mathrm{errfn} = \texttt{'P'}` The Poisson distribution is used. **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th independent variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. If :math:`\mathrm{isx}[j-1] > 0`, the variable contained in the :math:`j`\ th column of :math:`\mathrm{x}` is included in the regression model. **b** : float, array-like, shape :math:`\left(\textit{ip}\right)` The model parameters, :math:`\beta`. If :math:`\mathrm{mean} = \texttt{'M'}`, :math:`\mathrm{b}[0]` must contain the mean parameter and :math:`\mathrm{b}[i]` the coefficient of the variable contained in the :math:`j`\ th independent :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. 
If :math:`\mathrm{mean} = \texttt{'Z'}`, :math:`\mathrm{b}[i-1]` must contain the coefficient of the variable contained in the :math:`j`\ th independent :math:`\mathrm{x}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th positive value in the array :math:`\mathrm{isx}`. **cov** : float, array-like, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The upper triangular part of the variance-covariance matrix, :math:`C`, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters :math:`\beta_i` and :math:`\beta_j`, that is the values stored in :math:`\mathrm{b}[i-1]` and :math:`\mathrm{b}[j-1]`, should be supplied in :math:`\mathrm{cov}[\textit{j}\times \left(\textit{j}-1\right)/2+\textit{i}-1]`, for :math:`\textit{j} = \textit{i},\ldots,\textit{ip}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **vfobs** : bool If :math:`\mathrm{vfobs} = \mathbf{True}`, the variance of future observations is included in the standard error of the predicted variable (i.e., :math:`I_{\mathrm{fobs}} = 1`), otherwise :math:`I_{\mathrm{fobs}} = 0`. **link** : None or str, length 1, optional Note: if this argument is **None** then a default value will be used, determined as follows: if :math:`\mathrm{errfn} = \texttt{'B'}`: :math:`{ \texttt{'G'} }`; if :math:`\mathrm{errfn} = \texttt{'G'}`: :math:`{ \texttt{'R'} }`; if :math:`\mathrm{errfn} = \texttt{'P'}`: :math:`{ \texttt{'L'} }`; otherwise: :math:`{ \texttt{'I'} }`. Indicates which link function is to be used. :math:`\mathrm{link} = \texttt{'C'}` A complementary log-log link is used. :math:`\mathrm{link} = \texttt{'E'}` An exponent link is used. :math:`\mathrm{link} = \texttt{'G'}` A logistic link is used. :math:`\mathrm{link} = \texttt{'I'}` An identity link is used. :math:`\mathrm{link} = \texttt{'L'}` A log link is used. :math:`\mathrm{link} = \texttt{'P'}` A probit link is used. :math:`\mathrm{link} = \texttt{'R'}` A reciprocal link is used. 
:math:`\mathrm{link} = \texttt{'S'}` A square root link is used. Details on the functional form of the different links can be found in `the G02 Introduction <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02intro.html>`__. **mean** : str, length 1, optional Indicates if a mean term is to be included. :math:`\mathrm{mean} = \texttt{'M'}` A mean term, intercept, will be included in the model. :math:`\mathrm{mean} = \texttt{'Z'}` The model will pass through the origin, zero-point. **t** : None or float, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{errfn}=\texttt{'B'}`: :math:`n`; otherwise: :math:`0`. If :math:`\mathrm{errfn} = \texttt{'B'}`, :math:`\mathrm{t}[i-1]` must contain the binomial denominator, :math:`t_i`, for the :math:`i`\ th observation. Otherwise :math:`\mathrm{t}` is not referenced and may be **None**. **off** : None or float, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{off}\text{ is not }\mathbf{None}`: :math:`n`; otherwise: :math:`0`. If an offset is required, then :math:`\mathrm{off}[i-1]` must contain the value of the offset :math:`o_i`, for the :math:`i`\ th observation. Otherwise :math:`\mathrm{off}` must be **None**. **wt** : None or float, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{wt}\text{ is not }\mathbf{None}\text{ and }\mathrm{vfobs}= \mathbf{True}`: :math:`n`; otherwise: :math:`0`. If weighted estimates are required then :math:`\mathrm{wt}[i-1]` must contain the weight, :math:`\omega_i` for the :math:`i`\ th observation. Otherwise :math:`\mathrm{wt}` must be supplied as **None**. 
If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with positive weights. If :math:`\mathrm{wt} = \text{}` **None**, the effective number of observations is :math:`n`. If the variance of future observations is not included in the standard error of the predicted variable, :math:`\mathrm{wt}` is not referenced. **s** : float, optional If :math:`\mathrm{errfn} = \texttt{'N'}` or :math:`\texttt{'G'}` and :math:`\mathrm{vfobs} = \mathbf{True}`, the scale parameter, :math:`\phi`. Otherwise :math:`\mathrm{s}` is not referenced and :math:`\phi = 1`. **a** : float, optional If :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}` must contain the power of the exponential. If :math:`\mathrm{link} \neq \texttt{'E'}`, :math:`\mathrm{a}` is not referenced. **Returns** **eta** : float, ndarray, shape :math:`\left(n\right)` The linear predictor, :math:`\eta`. **seeta** : float, ndarray, shape :math:`\left(n\right)` The standard error of the linear predictor, :math:`\mathrm{se}\left(\eta \right)`. **pred** : float, ndarray, shape :math:`\left(n\right)` The predicted value, :math:`\hat{y}`. **sepred** : float, ndarray, shape :math:`\left(n\right)` The standard error of the predicted value, :math:`\mathrm{se}\left(\hat{y}\right)`. If :math:`\mathrm{pred}[i-1]` could not be calculated, ``glm_predict`` returns :math:`\mathrm{errno}` = 22, and :math:`\mathrm{sepred}[i-1]` is set to :math:`{-99.0}`. .. _g02gp-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{errfn} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{errfn} = \texttt{'B'}`, :math:`\texttt{'G'}`, :math:`\texttt{'N'}` or :math:`\texttt{'P'}`. (`errno` :math:`2`) On entry, :math:`\mathrm{link} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: if :math:`\mathrm{errfn} = \texttt{'B'}`, :math:`\mathrm{link} = \texttt{'C'}`, :math:`\texttt{'G'}` or :math:`\texttt{'P'}`, otherwise, :math:`\mathrm{link} = \texttt{'E'}`, :math:`\texttt{'I'}`, :math:`\texttt{'L'}`, :math:`\texttt{'R'}` or :math:`\texttt{'S'}`. (`errno` :math:`2`) On entry, :math:`\mathrm{errfn} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{link} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{errfn} = \texttt{'B'}`, :math:`\mathrm{link} = \texttt{'C'}`, :math:`\texttt{'G'}` or :math:`\texttt{'P'}`, otherwise, :math:`\mathrm{link} = \texttt{'E'}`, :math:`\texttt{'I'}`, :math:`\texttt{'L'}`, :math:`\texttt{'R'}` or :math:`\texttt{'S'}`. (`errno` :math:`3`) On entry, :math:`\mathrm{mean} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mean} = \texttt{'M'}` or :math:`\texttt{'Z'}`. (`errno` :math:`5`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{vfobs} = \mathbf{True}`, :math:`\textit{weight} = \texttt{'U'}` or :math:`\texttt{'W'}`. (`errno` :math:`6`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`9`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`10`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] < 0`. Constraint: :math:`\mathrm{isx}[j-1] \geq 0.0`, for :math:`j = 1,2,\ldots,m`. (`errno` :math:`11`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} > 0`. (`errno` :math:`12`) On entry, :math:`\mathrm{t}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{t}[i-1] \geq 0.0`, for all :math:`i`. (`errno` :math:`14`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`, for all :math:`i`. (`errno` :math:`15`) On entry, :math:`\mathrm{s} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{s} > 0.0`. (`errno` :math:`16`) On entry, :math:`\mathrm{a} = 0.0`. Constraint: if :math:`\mathrm{link} = \texttt{'E'}`, :math:`\mathrm{a}\neq 0.0`. (`errno` :math:`18`) On entry, :math:`\mathrm{cov}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{cov}[i] \geq 0.0` for at least one diagonal element. **Warns** **NagAlgorithmicWarning** (`errno` :math:`22`) At least one predicted value could not be calculated as required. :math:`\mathrm{sepred}` is set to :math:`-99.0` for affected predicted values. .. _g02gp-py2-py-notes: **Notes** A generalized linear model consists of the following elements: (i) A suitable distribution for the dependent variable :math:`y`. (#) A linear model, with linear predictor :math:`\eta = X\beta`, where :math:`X` is a matrix of independent variables and :math:`\beta` a column vector of :math:`p` parameters. (#) A link function :math:`g\left(.\right)` between the expected value of :math:`y` and the linear predictor, that is :math:`E\left(y\right) = \mu = g\left(\eta \right)`. In order to predict from a generalized linear model, that is estimate a value for the dependent variable, :math:`y`, given a set of independent variables :math:`X`, the matrix :math:`X` must be supplied, along with values for the parameters :math:`\beta` and their associated variance-covariance matrix, :math:`C`. Suitable values for :math:`\beta` and :math:`C` are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example :meth:`glm_normal`, :meth:`glm_binomial`, :meth:`glm_poisson` or :meth:`glm_gamma`. The predicted variable, and its standard error can then be obtained from: .. 
math:: \hat{y} = g^{-1}\left(\eta \right)\text{, }\quad \mathrm{se}\left(\hat{y}\right) = \sqrt{{\left(\frac{{\delta g^{-1}\left(x\right)}}{{\delta x}}\right)}_{\eta }^2\mathrm{se}\left(\eta \right)^2+I_{\mathrm{fobs}}\mathrm{Var}\left(y\right)} where .. math:: \eta = o+X\beta \text{, }\quad \mathrm{se}\left(\eta \right) = \sqrt{\mathrm{diag}\left(XCX^\mathrm{T}\right)}\text{,} :math:`o` is a vector of offsets and :math:`I_{\mathrm{fobs}} = 0` if the variance of future observations is not taken into account, and :math:`1` otherwise. Here :math:`\mathrm{diag}\left(A\right)` indicates the diagonal elements of matrix :math:`A`. If required, the variance for the :math:`i`\ th future observation, :math:`\mathrm{Var}\left(y_i\right)`, can be calculated as: .. math:: \mathrm{Var}\left(y_i\right) = \frac{{\phi V\left(\theta \right)}}{{w_i}} where :math:`w_i` is a weight, :math:`\phi` is the scale (or dispersion) parameter, and :math:`V\left(\theta \right)` is the variance function. Both the scale parameter and the variance function depend on the distribution used for :math:`y`, with: .. 
rst-class:: nag-rules-none nag-align-left +--------+---------------------------------------------------------------------------------------------+ |Poisson |:math:`V\left(\theta \right) = \mu_i`, :math:`\phi = 1` | +--------+---------------------------------------------------------------------------------------------+ |binomial|:math:`V\left(\theta \right) = \frac{{\mu_i\left(t_i-\mu_i\right)}}{{t_i}}`, :math:`\phi = 1`| +--------+---------------------------------------------------------------------------------------------+ |Normal |:math:`V\left(\theta \right) = 1` | +--------+---------------------------------------------------------------------------------------------+ |gamma |:math:`V\left(\theta \right) = \mu_i^2` | +--------+---------------------------------------------------------------------------------------------+ In the cases of a Normal and gamma error structure, the scale parameter (:math:`\phi`) is supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, :math:`\phi = \hat{\sigma }^2`, i.e., the estimated variance. .. _g02gp-py2-py-references: **References** McCullagh, P and Nelder, J A, 1983, `Generalized Linear Models`, Chapman and Hall See Also -------- :meth:`naginterfaces.library.examples.correg.glm_binomial_ex.main` """ raise NotImplementedError
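The prediction quantities above can be illustrated with a small NumPy sketch of the standard delta-method computation, assuming a log link and the Poisson variance function :math:`V\left(\theta \right) = \mu`, :math:`\phi = 1` (an illustrative assumption, not the NAG routine itself). Since :math:`g^{-1}\left(\eta \right) = e^{\eta }` and :math:`\mathrm{d}e^{\eta }/\mathrm{d}\eta = e^{\eta }`, the variance of :math:`\hat{y}` is approximately :math:`\hat{y}^2\mathrm{Var}\left(\eta \right)`, plus :math:`\mathrm{Var}\left(y\right) = \mu` when future observations are included:

```python
import numpy as np

def predict_log_link(x, beta, cov, vfobs=False):
    """Delta-method prediction for a log link (Poisson variance for vfobs)."""
    x = np.asarray(x, dtype=float)
    eta = x @ np.asarray(beta, dtype=float)
    var_eta = np.einsum('ij,jk,ik->i', x, cov, x)  # diag(X C X^T)
    yhat = np.exp(eta)
    var_yhat = yhat**2 * var_eta                   # (d g^{-1}/d eta)^2 Var(eta)
    if vfobs:
        var_yhat = var_yhat + yhat                 # Poisson: Var(y) = mu, phi = 1
    return eta, np.sqrt(var_eta), yhat, np.sqrt(var_yhat)
```

For a single design row :math:`\left(1,0\right)` with :math:`\beta = \left(0,1\right)` and :math:`C = I`, this gives :math:`\eta = 0`, :math:`\hat{y} = 1`, :math:`\mathrm{se}\left(\eta \right) = 1`, and :math:`\mathrm{se}\left(\hat{y}\right) = \sqrt{2}` when the future-observation variance is included.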
[docs]def robustm(indw, ipsi, isigma, indc, x, y, cpsi, h1, h2, h3, cucv, dchi, theta, sigma, tol=5e-5, maxit=50, nitmon=0, io_manager=None): r""" ``robustm`` performs bounded influence regression (:math:`M`-estimates). Several standard methods are available. .. _g02ha-py2-py-doc: For full information please refer to the NAG Library document for g02ha https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02haf.html .. _g02ha-py2-py-parameters: **Parameters** **indw** : int Specifies the type of regression to be performed. :math:`\mathrm{indw} < 0` Mallows type regression with Maronna's proposed weights. :math:`\mathrm{indw} = 0` Huber type regression. :math:`\mathrm{indw} > 0` Schweppe type regression with Krasker--Welsch weights. **ipsi** : int Specifies which :math:`\psi` function is to be used. :math:`\mathrm{ipsi} = 0` :math:`\psi \left(t\right) = t`, i.e., least squares. :math:`\mathrm{ipsi} = 1` Huber's function. :math:`\mathrm{ipsi} = 2` Hampel's piecewise linear function. :math:`\mathrm{ipsi} = 3` Andrew's sine wave. :math:`\mathrm{ipsi} = 4` Tukey's bi-weight. **isigma** : int Specifies how :math:`\sigma` is to be estimated. :math:`\mathrm{isigma} < 0` :math:`\sigma` is estimated by median absolute deviation of residuals. :math:`\mathrm{isigma} = 0` :math:`\sigma` is held constant at its initial value. :math:`\mathrm{isigma} > 0` :math:`\sigma` is estimated using the :math:`\chi` function. **indc** : int If :math:`\mathrm{indw} \neq 0`, :math:`\mathrm{indc}` specifies the approximations used in estimating the covariance matrix of :math:`\hat{\theta }`. :math:`\mathrm{indc} = 1` Averaging over residuals. :math:`\mathrm{indc} \neq 1` Replacing expected by observed. :math:`\mathrm{indw} = 0` :math:`\mathrm{indc}` is not referenced. **x** : float, array-like, shape :math:`\left(n, m\right)` The values of the :math:`X` matrix, i.e., the independent variables. 
:math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}\textit{j}`\ th element of :math:`X`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{indw} < 0`, then during calculations the elements of :math:`\mathrm{x}` will be transformed as described in :ref:`Notes <g02ha-py2-py-notes>`. Before exit the inverse transformation will be applied. As a result there may be slight differences between the input :math:`\mathrm{x}` and the output :math:`\mathrm{x}`. **y** : float, array-like, shape :math:`\left(n\right)` The data values of the dependent variable. :math:`\mathrm{y}[\textit{i}-1]` must contain the value of :math:`y` for the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{indw} < 0`, then during calculations the elements of :math:`\mathrm{y}` will be transformed as described in :ref:`Notes <g02ha-py2-py-notes>`. Before exit the inverse transformation will be applied. As a result there may be slight differences between the input :math:`\mathrm{y}` and the output :math:`\mathrm{y}`. **cpsi** : float If :math:`\mathrm{ipsi} = 1`, :math:`\mathrm{cpsi}` must specify the parameter, :math:`c`, of Huber's :math:`\psi` function. If :math:`\mathrm{ipsi} \neq 1` on entry, :math:`\mathrm{cpsi}` is not referenced. **h1** : float If :math:`\mathrm{ipsi} = 2`, :math:`\mathrm{h1}`, :math:`\mathrm{h2}`, and :math:`\mathrm{h3}` must specify the parameters :math:`h_1`, :math:`h_2`, and :math:`h_3`, of Hampel's piecewise linear :math:`\psi` function. :math:`\mathrm{h1}`, :math:`\mathrm{h2}`, and :math:`\mathrm{h3}` are not referenced if :math:`\mathrm{ipsi}\neq 2`. **h2** : float If :math:`\mathrm{ipsi} = 2`, :math:`\mathrm{h1}`, :math:`\mathrm{h2}`, and :math:`\mathrm{h3}` must specify the parameters :math:`h_1`, :math:`h_2`, and :math:`h_3`, of Hampel's piecewise linear :math:`\psi` function. 
:math:`\mathrm{h1}`, :math:`\mathrm{h2}`, and :math:`\mathrm{h3}` are not referenced if :math:`\mathrm{ipsi}\neq 2`. **h3** : float If :math:`\mathrm{ipsi} = 2`, :math:`\mathrm{h1}`, :math:`\mathrm{h2}`, and :math:`\mathrm{h3}` must specify the parameters :math:`h_1`, :math:`h_2`, and :math:`h_3`, of Hampel's piecewise linear :math:`\psi` function. :math:`\mathrm{h1}`, :math:`\mathrm{h2}`, and :math:`\mathrm{h3}` are not referenced if :math:`\mathrm{ipsi}\neq 2`. **cucv** : float If :math:`\mathrm{indw} < 0`, must specify the value of the constant, :math:`c`, of the function :math:`u` for Maronna's proposed weights. If :math:`\mathrm{indw} > 0`, must specify the value of the function :math:`u` for the Krasker--Welsch weights. If :math:`\mathrm{indw} = 0`, is not referenced. **dchi** : float :math:`d`, the constant of the :math:`\chi` function. :math:`\mathrm{dchi}` is not referenced if :math:`\mathrm{ipsi} = 0`, or if :math:`\mathrm{isigma} \leq 0`. **theta** : float, array-like, shape :math:`\left(m\right)` Starting values of the parameter vector :math:`\theta`. These may be obtained from least squares regression. Alternatively if :math:`\mathrm{isigma} < 0` and :math:`\mathrm{sigma} = 1` or if :math:`\mathrm{isigma} > 0` and :math:`\mathrm{sigma}` approximately equals the standard deviation of the dependent variable, :math:`y`, then :math:`\mathrm{theta}[\textit{i}-1] = 0.0`, for :math:`\textit{i} = 1,2,\ldots,m` may provide reasonable starting values. **sigma** : float A starting value for the estimation of :math:`\sigma`. :math:`\mathrm{sigma}` should be approximately the standard deviation of the residuals from the model evaluated at the value of :math:`\theta` given by :math:`\mathrm{theta}` on entry. **tol** : float, optional The relative precision for the calculation of :math:`A` (if :math:`\mathrm{indw} \neq 0`), the estimates of :math:`\theta` and the estimate of :math:`\sigma` (if :math:`\mathrm{isigma} \neq 0`). 
Convergence is assumed when the relative change in all elements being considered is less than :math:`\mathrm{tol}`. If :math:`\mathrm{indw} < 0` and :math:`\mathrm{isigma} < 0`, :math:`\mathrm{tol}` is also used to determine the precision of :math:`\beta_1`. It is advisable for :math:`\mathrm{tol}` to be greater than :math:`100\times \text{machine precision}`. **maxit** : int, optional The maximum number of iterations that should be used in the calculation of :math:`A` (if :math:`\mathrm{indw} \neq 0`), and of the estimates of :math:`\theta` and :math:`\sigma`, and of :math:`\beta_1` (if :math:`\mathrm{indw} < 0` and :math:`\mathrm{isigma} < 0`). A value of :math:`\mathrm{maxit} = 50` should be adequate for most uses. **nitmon** : int, optional The amount of information that is printed on each iteration. :math:`\mathrm{nitmon} = 0` No information is printed. :math:`\mathrm{nitmon}\neq 0` The current estimate of :math:`\theta`, the change in :math:`\theta` during the current iteration and the current value of :math:`\sigma` are printed on the first and every :math:`\mathrm{abs}\left(\mathrm{nitmon}\right)` iterations. Also, if :math:`\mathrm{indw} \neq 0` and :math:`\mathrm{nitmon} > 0`, then information on the iterations to calculate :math:`A` is printed. This is the current estimate of :math:`A` and the maximum value of :math:`S_{{ij}}` (see :ref:`Notes <g02ha-py2-py-notes>`). When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **x** : float, ndarray, shape :math:`\left(n, m\right)` Unchanged, except as described above. **y** : float, ndarray, shape :math:`\left(n\right)` Unchanged, except as described above. 
**theta** : float, ndarray, shape :math:`\left(m\right)` :math:`\mathrm{theta}[\textit{i}-1]` contains the M-estimate of :math:`\theta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,m`. **sigma** : float Contains the final estimate of :math:`\sigma` if :math:`\mathrm{isigma} \neq 0` or the value assigned on entry if :math:`\mathrm{isigma} = 0`. **c** : float, ndarray, shape :math:`\left(m, m\right)` The diagonal elements of :math:`\mathrm{c}` contain the estimated asymptotic standard errors of the estimates of :math:`\theta`, i.e., :math:`\mathrm{c}[i-1,i-1]` contains the estimated asymptotic standard error of the estimate contained in :math:`\mathrm{theta}[i-1]`. The elements above the diagonal contain the estimated asymptotic correlation between the estimates of :math:`\theta`, i.e., :math:`\mathrm{c}[i-1,j-1]`, :math:`1\leq i < j\leq m` contains the asymptotic correlation between the estimates contained in :math:`\mathrm{theta}[i-1]` and :math:`\mathrm{theta}[j-1]`. The elements below the diagonal contain the estimated asymptotic covariance between the estimates of :math:`\theta`, i.e., :math:`\mathrm{c}[i-1,j-1]`, :math:`1\leq j < i\leq m` contains the estimated asymptotic covariance between the estimates contained in :math:`\mathrm{theta}[i-1]` and :math:`\mathrm{theta}[j-1]`. **rs** : float, ndarray, shape :math:`\left(n\right)` The residuals from the model evaluated at final value of :math:`\mathrm{theta}`, i.e., :math:`\mathrm{rs}` contains the vector :math:`\left(y-X\hat{\theta }\right)`. **wgt** : float, ndarray, shape :math:`\left(n\right)` The vector of weights. :math:`\mathrm{wgt}[\textit{i}-1]` contains the weight for the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. **stat** : float, ndarray, shape :math:`\left(4\right)` The following values are assigned to :math:`\mathrm{stat}`: :math:`\mathrm{stat}[0] = \beta_1` if :math:`\mathrm{isigma} < 0`, or :math:`\mathrm{stat}[0] = \beta_2` if :math:`\mathrm{isigma} > 0`. 
:math:`\mathrm{stat}[1] = \text{}` number of iterations used to calculate :math:`A`. :math:`\mathrm{stat}[2] = \text{}` number of iterations used to calculate final estimates of :math:`\theta` and :math:`\sigma`. :math:`\mathrm{stat}[3] = k`, the rank of the weighted least squares equations. .. _g02ha-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ldx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldx} \geq n`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > m`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry: :math:`\mathrm{ipsi} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ipsi} = 0`, :math:`1`, :math:`2`, :math:`3` or :math:`4`. (`errno` :math:`3`) On entry, :math:`\mathrm{cucv} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{cucv} \geq m`. (`errno` :math:`3`) On entry, :math:`\mathrm{cucv} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{cucv}\geq \sqrt{m}`. (`errno` :math:`3`) On entry: :math:`\mathrm{h1}`, :math:`\mathrm{h2}`, :math:`\mathrm{h3}` incorrectly set. Constraint: :math:`0.0\leq \mathrm{h1}\leq \mathrm{h2}\leq \mathrm{h3}` and :math:`\mathrm{h3} > 0.0`. (`errno` :math:`3`) On entry: :math:`\mathrm{cpsi} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{cpsi} > 0.0`. (`errno` :math:`3`) On entry, :math:`\mathrm{dchi} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{dchi} > 0.0`. (`errno` :math:`3`) On entry, :math:`\mathrm{sigma} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{sigma} > 0.0`. (`errno` :math:`4`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`4`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`5`) The number of iterations required to calculate the weights exceeds :math:`\mathrm{maxit}`. (Only if :math:`\mathrm{indw}\neq 0`.) (`errno` :math:`6`) The number of iterations required to calculate :math:`\beta_1` exceeds :math:`\mathrm{maxit}`. (Only if :math:`\mathrm{indw} < 0` and :math:`\mathrm{isigma} < 0`.) (`errno` :math:`7`) Iterations to calculate estimates of :math:`\mathrm{theta}` failed to converge in :math:`\mathrm{maxit}` iterations: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`7`) The number of iterations required to calculate :math:`\theta` and :math:`\sigma` exceeds :math:`\mathrm{maxit}`. (`errno` :math:`12`) Error degrees of freedom :math:`n-k\leq 0`, where :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and the rank of :math:`\mathrm{x}`, :math:`k = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`12`) Estimated value of :math:`\mathrm{sigma}` is zero. **Warns** **NagAlgorithmicWarning** (`errno` :math:`8`) Weighted least squares equations not of full rank: rank :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`9`) Failure to invert matrix while calculating covariance. (`errno` :math:`10`) Factor for covariance matrix :math:`\text{} = 0`, uncorrected :math:`\left({X^\mathrm{T}X}\right)^{-1}` given. (`errno` :math:`11`) Variance of an element of :math:`\mathrm{theta}\leq 0.0`, correlations set to :math:`0`. .. _g02ha-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. 
Please contact NAG if you have any questions about compatibility.` For the linear regression model .. math:: y = X\theta +\epsilon \text{,} .. rst-class:: nag-rules-none nag-align-left +-----+-------------------------------------------------------------------------------------------------------------------------------+ |where|:math:`y` is a vector of length :math:`n` of the dependent variable, | +-----+-------------------------------------------------------------------------------------------------------------------------------+ | |:math:`X` is an :math:`n\times m` matrix of independent variables of column rank :math:`k`, | +-----+-------------------------------------------------------------------------------------------------------------------------------+ | |:math:`\theta` is a vector of length :math:`m` of unknown parameters, | +-----+-------------------------------------------------------------------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown errors with :math:`\mathrm{var}\left(\epsilon_i\right) = \sigma^2`,| +-----+-------------------------------------------------------------------------------------------------------------------------------+ ``robustm`` calculates the M-estimates given by the solution, :math:`\hat{\theta }`, to the equation .. math:: \sum_{{i = 1}}^n\psi \left(r_i/\left(\sigma w_i\right)\right)w_ix_{{ij}} = 0\text{, }\quad j = 1,2,\ldots,m\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |where|:math:`r_i` is the :math:`i`\ th residual, i.e., the :math:`i`\ th element of :math:`r = y-X\hat{\theta }`, | +-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |:math:`\psi` is a suitable weight function, | +-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |:math:`w_i` are suitable weights, | +-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |and |:math:`\sigma` may be estimated at each iteration by the median absolute deviation of the residuals :math:`\displaystyle \hat{\sigma } = \mathrm{med}_i\left({\left[\left\lvert r_i\right\rvert \right]/\beta_1}\right)`| +-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ or as the solution to .. math:: \sum_{{i = 1}}^n\chi \left(r_i/\left(\hat{\sigma }w_i\right)\right)w_i^2 = \left(n-k\right)\beta_2 for suitable weight function :math:`\chi`, where :math:`\beta_1` and :math:`\beta_2` are constants, chosen so that the estimator of :math:`\sigma` is asymptotically unbiased if the errors, :math:`\epsilon_i`, have a Normal distribution. 
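The median-absolute-deviation estimate of :math:`\sigma` above is simple enough to sketch directly. The following pure-Python fragment is an illustration only (not part of the NAG interface); it computes :math:`\hat{\sigma } = \mathrm{med}_i\left(\left\lvert r_i\right\rvert \right)/\beta_1` with :math:`\beta_1 = \Phi^{-1}\left(0.75\right)\approx 0.6745`, the value used for Huber and Schweppe type regressions:

```python
from statistics import NormalDist, median

def mad_sigma(residuals):
    # Median-absolute-deviation scale estimate:
    # sigma-hat = med_i(|r_i|) / beta_1, with beta_1 = Phi^{-1}(0.75),
    # chosen so the estimator is asymptotically unbiased for Normal errors.
    beta1 = NormalDist().inv_cdf(0.75)  # approximately 0.6745
    return median(abs(r) for r in residuals) / beta1
```

Dividing by :math:`\beta_1` rescales the raw median absolute deviation so that, for Normally distributed residuals, the estimate is consistent for the standard deviation.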
Alternatively :math:`\sigma` may be held at a constant value. The above describes the Schweppe type regression. If the :math:`w_i` are assumed to equal :math:`1` for all :math:`i` then Huber type regression is obtained. A third type, due to Mallows, replaces `(1) <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02haf.html#eqn1>`__ by .. math:: \sum_{{i = 1}}^n\psi \left(r_i/\sigma \right)w_ix_{{ij}} = 0\text{, }\quad j = 1,2,\ldots,m\text{.} This may be obtained by use of the transformations .. math:: \begin{array}{ll}w_i^*←\sqrt{w_i}&\\y_i^*←y_i \sqrt{w_i}&\\x_{{ij}}^*←x_{{ij}} \sqrt{w_i}\text{,}&j = 1,2,\ldots,m\end{array} (see Section 3 of Marazzi (1987a)). For Huber and Schweppe type regressions, :math:`\beta_1` is the 75th percentile of the standard Normal distribution. For Mallows type regression :math:`\beta_1` is the solution to .. math:: \frac{1}{n}\sum_{{i = 1}}^n\Phi \left(\beta_1/\sqrt{w_i}\right) = 0.75\text{,} where :math:`\Phi` is the standard Normal cumulative distribution function (see :meth:`specfun.cdf_normal <naginterfaces.library.specfun.cdf_normal>`). :math:`\beta_2` is given by .. math:: \begin{array}{ll} \beta_2 = \int_{{-\infty }}^{\infty } \chi \left(z\right) \phi \left(z\right) {dz} &\text{in the Huber case;}\\&\\&\\ \beta_2 = \frac{1}{n} \sum_{{i = 1}}^n w_i \int_{{-\infty }}^{\infty } \chi \left(z\right) \phi \left(z\right) {dz} &\text{in the Mallows case;}\\&\\&\\ \beta_2 = \frac{1}{n} \sum_{{i = 1}}^n w_i^2 \int_{{-\infty }}^{\infty } \chi \left(z/w_i\right) \phi \left(z\right) {dz} &\text{in the Schweppe case;}\end{array} where :math:`\phi` is the standard Normal density, i.e., :math:`\frac{1}{\sqrt{2\pi }}\mathrm{exp}\left(-\frac{1}{2}x^2\right)\text{.}` The calculation of the estimates of :math:`\theta` can be formulated as an iterative weighted least squares problem with a diagonal weight matrix :math:`G` given by .. 
math:: G_{{ii}} = \left\{\begin{array}{ll} \frac{{\psi \left(r_i/\left(\sigma w_i\right)\right)}}{{\left(r_i/\left(\sigma w_i\right)\right)}} \text{,} &r_i\neq 0\\&\\ \psi^{\prime } \left(0\right) \text{,} &r_i = 0\end{array}\text{,}\right. where :math:`\psi^{\prime }\left(t\right)` is the derivative of :math:`\psi` at the point :math:`t`. The value of :math:`\theta` at each iteration is given by the weighted least squares regression of :math:`y` on :math:`X`. This is carried out by first transforming the :math:`y` and :math:`X` by .. math:: \begin{array}{ll}\tilde{y}_i = y_i\sqrt{G_{{ii}}}&\\\tilde{x}_{{ij}} = x_{{ij}}\sqrt{G_{{ii}}}\text{,}&j = 1,2,\ldots,m\end{array} and then solving the associated least squares problem. If :math:`X` is of full column rank then an orthogonal-triangular (:math:`QR`) decomposition is used; if not, a singular value decomposition is used. The following functions are available for :math:`\psi` and :math:`\chi` in ``robustm``. (a) **Unit Weights** .. math:: \psi \left(t\right) = t\text{, }\quad \chi \left(t\right) = \frac{t^2}{2}\text{.} This gives least squares regression. (#) **Huber's Function** .. math:: \psi \left(t\right) = \mathrm{max}\left({-c}, \mathrm{min}\left(c, t\right)\right)\text{, }\quad \chi \left(t\right) = \left\{\begin{array}{ll} \frac{t^2}{2} \text{,} &\left\lvert t\right\rvert \leq d\\&\\ \frac{d^2}{2} \text{,} &\left\lvert t\right\rvert > d\end{array}\right. (#) **Hampel's Piecewise Linear Function** .. math:: \psi_{{h_1,h_2,h_3}}\left(t\right) = -\psi_{{h_1,h_2,h_3}}\left(-t\right) = \left\{\begin{array}{ll}t\text{,}& 0\leq t\leq h_1 \\&\\h_1\text{,}& h_1 \leq t\leq h_2 \\&\\ h_1 \left(h_3-t\right) / \left(h_3-h_2\right) \text{,} & h_2 \leq t\leq h_3 \\&\\0\text{,}&h_3 < t\end{array}\right. .. math:: \chi \left(t\right) = \left\{\begin{array}{ll} \frac{t^2}{2} \text{,} &\left\lvert t\right\rvert \leq d\\&\\ \frac{d^2}{2} \text{,} &\left\lvert t\right\rvert > d\end{array}\right. 
(#) **Andrew's Sine Wave Function** .. math:: \psi \left(t\right) = \left\{\begin{array}{ll}\sin\left(t\right)\text{,}&-\pi \leq t\leq \pi \\&\\0\text{,}&\left\lvert t\right\rvert > \pi \end{array}\right. \quad \text{ }\quad \chi \left(t\right) = \left\{\begin{array}{ll} \frac{t^2}{2} \text{,} &\left\lvert t\right\rvert \leq d\\&\\ \frac{d^2}{2} \text{,} &\left\lvert t\right\rvert > d\end{array}\right. (#) **Tukey's Bi-weight** .. math:: \psi \left(t\right) = \left\{\begin{array}{ll} t \left(1-t^2\right)^2 \text{,} &\left\lvert t\right\rvert \leq 1\\&\\0\text{,}&\left\lvert t\right\rvert > 1\end{array}\right. \quad \text{ }\quad \chi \left(t\right) = \left\{\begin{array}{ll} \frac{t^2}{2} \text{,} &\left\lvert t\right\rvert \leq d\\&\\ \frac{d^2}{2} \text{,} &\left\lvert t\right\rvert > d\end{array}\right. where :math:`c`, :math:`h_1`, :math:`h_2`, :math:`h_3`, and :math:`d` are given constants. Several schemes for calculating weights have been proposed, see Hampel `et al.` (1986) and Marazzi (1987a). As the different independent variables may be measured on different scales, one group of proposed weights aims to bound a standardized measure of influence. To obtain such weights the matrix :math:`A` has to be found such that: .. math:: \frac{1}{n}\sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_iz_i^\mathrm{T} = I and .. math:: z_i = Ax_i\text{,} .. rst-class:: nag-rules-none nag-align-left +-----+------------------------------------------------------------------------------------------+ |where|:math:`x_i` is a vector of length :math:`m` containing the :math:`i`\ th row of :math:`X`,| +-----+------------------------------------------------------------------------------------------+ | |:math:`A` is an :math:`m\times m` lower triangular matrix, | +-----+------------------------------------------------------------------------------------------+ |and |:math:`u` is a suitable function. 
| +-----+------------------------------------------------------------------------------------------+ The weights are then calculated as .. math:: w_i = f\left(\left\lVert z_i\right\rVert_2\right) for a suitable function :math:`f`. ``robustm`` finds :math:`A` using the iterative procedure .. math:: A_k = \left(S_k+I\right)A_{{k-1}}\text{,} where :math:`S_k = \left(s_{{jl}}\right)`, .. math:: s_{{jl}} = \left\{\begin{array}{ll} {-\mathrm{min}\left[\mathrm{max}\left({h_{{jl}}/n}, {-BL}\right), {BL}\right]} \text{,} &j > \mathrm{l}\\&\\ {-\mathrm{min}\left[\mathrm{max}\left({\frac{1}{2}\left(h_{{jj}}/n-1\right)}, {-BD}\right), {BD}\right]} \text{,}&j = \mathrm{l}\end{array}\right. and .. math:: h_{{jl}} = \sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_{{ij}}z_{{il}} and :math:`BL` and :math:`BD` are bounds set at :math:`0.9`. Two weights are available in ``robustm``: (i) **Krasker--Welsch Weights** .. math:: u\left(t\right) = g_1\left(\frac{c}{t}\right)\text{,} .. rst-class:: nag-rules-none nag-align-left +-----+------------------------------------------------------------------------------------------------------------+ |where|:math:`g_1\left(t\right) = t^2+\left(1-t^2\right)\left(2\Phi \left(t\right)-1\right)-2t\phi \left(t\right)`,| +-----+------------------------------------------------------------------------------------------------------------+ | |:math:`\Phi \left(t\right)` is the standard Normal cumulative distribution function, | +-----+------------------------------------------------------------------------------------------------------------+ | |:math:`\phi \left(t\right)` is the standard Normal probability density function, | +-----+------------------------------------------------------------------------------------------------------------+ |and |:math:`f\left(t\right) = \frac{1}{t}`. 
| +-----+------------------------------------------------------------------------------------------------------------+ These are for use with Schweppe type regression. (#) **Maronna's Proposed Weights** .. math:: \begin{array}{l} u\left(t\right) = \left\{\begin{array}{ll} \frac{c}{t^2} &\left\lvert t\right\rvert > c\\1&\left\lvert t\right\rvert \leq c\end{array}\right. \\ f\left(t\right) = \sqrt{u\left(t\right)} \text{.} \end{array} These are for use with Mallows type regression. Finally the asymptotic variance-covariance matrix, :math:`C`, of the estimates :math:`\theta` is calculated. For Huber type regression .. math:: C = f_H\left(X^\mathrm{T}X\right)^{-1}\hat{\sigma }^2\text{,} where .. math:: f_H = \frac{1}{{n-m}}\frac{{\sum_{{i = 1}}^n\psi^2\left(r_i/\hat{\sigma }\right)}}{\left(\frac{1}{n}\sum_{{i = 1}}^n\psi^{\prime }\left(\frac{r_i}{\hat{\sigma }}\right)\right)^2}\kappa^2 .. math:: \kappa^2 = 1+\frac{m}{n}\frac{{\frac{1}{n}\sum_{{i = 1}}^n\left(\psi^{\prime }\left(r_i/\hat{\sigma }\right)-\frac{1}{n}\sum_{{i = 1}}^n\psi^{\prime }\left(r_i/\hat{\sigma }\right)\right)^2}}{\left(\frac{1}{n}\sum_{{i = 1}}^n\psi^{\prime }\left(\frac{r_i}{\hat{\sigma }}\right)\right)^2}\text{.} See Huber (1981) and Marazzi (1987b). For Mallows and Schweppe type regressions :math:`C` is of the form .. math:: \frac{\hat{\sigma }^2}{n}S_1^{-1}S_2S_1^{-1}\text{,} where :math:`S_1 = \frac{1}{n}X^\mathrm{T}DX` and :math:`S_2 = \frac{1}{n}X^\mathrm{T}PX`. :math:`D` is a diagonal matrix such that the :math:`i`\ th element approximates :math:`E\left(\psi^{\prime }\left(r_i/\left(\sigma w_i\right)\right)\right)` in the Schweppe case and :math:`E\left(\psi^{\prime }\left(r_i/\sigma \right)w_i\right)` in the Mallows case. :math:`P` is a diagonal matrix such that the :math:`i`\ th element approximates :math:`E\left(\psi^2\left(r_i/\left(\sigma w_i\right)\right)w_i^2\right)` in the Schweppe case and :math:`E\left(\psi^2\left(r_i/\sigma \right)w_i^2\right)` in the Mallows case. 
Two approximations are available in ``robustm``: (1) Average over the :math:`r_i` .. math:: \begin{array}{cc}\text{Schweppe}&\text{Mallows}\\&\\ D_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^{\prime }\left(\frac{r_j}{{\hat{\sigma }w_i}}\right)\right) w_i \quad \text{ }\quad & D_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^{\prime }\left(\frac{r_j}{\hat{\sigma }}\right)\right) w_i \\&\\ P_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^2\left(\frac{r_j}{{\hat{\sigma }w_i}}\right)\right) w_i^2 \quad \text{ }\quad & P_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^2\left(\frac{r_j}{\hat{\sigma }}\right)\right) w_i^2 \end{array} (#) Replace expected value by observed .. math:: \begin{array}{cc}\text{Schweppe}&\text{Mallows}\\&\\ D_i = \psi^{\prime } \left(\frac{r_i}{{\hat{\sigma }w_i}}\right) w_i \quad \text{ }\quad & D_i = \psi^{\prime } \left(\frac{r_i}{\hat{\sigma }}\right) w_i \\&\\ P_i = \psi^2 \left(\frac{r_i}{{\hat{\sigma }w_i}}\right) w_i^2 \quad \text{ }\quad & P_i = \psi^2 \left(\frac{r_i}{\hat{\sigma }}\right) w_i^2 \end{array}\text{.} See Hampel `et al.` (1986) and Marazzi (1987b). **Note:** there is no explicit provision in the function for a constant term in the regression model. However, the addition of a dummy variable whose value is :math:`1.0` for all observations will produce a value of :math:`\hat{\theta }` corresponding to the usual constant term. ``robustm`` is based on routines in ROBETH; see Marazzi (1987a). .. _g02ha-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Weights for bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 3, Institut Universitaire de Médecine Sociale et Préventive, Lausanne Marazzi, A, 1987, `Subroutines for robust and bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 
3 ROB 2, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
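The iterative weighted least squares scheme described in the Notes above can be sketched in pure Python for the Huber case. This is an illustration under simplifying assumptions (plain lists, MAD scale estimation, an assumed tuning constant :math:`c = 1.345`, and a dense solve with no rank handling), not the NAG implementation, which additionally supports Mallows and Schweppe weighting, rank-deficient :math:`X` and the covariance calculations:

```python
from statistics import NormalDist, median

def huber_psi(t, c=1.345):
    # Huber's psi function: identity on [-c, c], clipped outside.
    return max(-c, min(c, t))

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def huber_m_estimate(X, y, tol=1e-8, maxit=50):
    # Iteratively reweighted least squares for the Huber case: each
    # iteration solves X^T G X theta = X^T G y with
    # G_ii = psi(r_i/sigma)/(r_i/sigma), and G_ii = psi'(0) = 1 when
    # r_i = 0; sigma is re-estimated by the median absolute deviation.
    n, m = len(X), len(X[0])
    beta1 = NormalDist().inv_cdf(0.75)
    theta = [0.0] * m
    for _ in range(maxit):
        r = [y[i] - sum(X[i][j] * theta[j] for j in range(m)) for i in range(n)]
        # Guard against a zero scale when the fit is exact.
        sigma = median(abs(ri) for ri in r) / beta1 or 1.0
        g = [huber_psi(ri / sigma) / (ri / sigma) if ri != 0.0 else 1.0 for ri in r]
        xgx = [[sum(g[i] * X[i][j] * X[i][l] for i in range(n)) for l in range(m)]
               for j in range(m)]
        xgy = [sum(g[i] * X[i][j] * y[i] for i in range(n)) for j in range(m)]
        new = solve(xgx, xgy)
        if max(abs(new[j] - theta[j]) for j in range(m)) <= tol * (1.0 + max(map(abs, new))):
            return new
        theta = new
    return theta
```

When all standardized residuals fall inside :math:`\left[-c, c\right]` the weights :math:`G_{{ii}}` are all one and a single iteration reduces to ordinary least squares, which is the behaviour described under Unit Weights above.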
[docs]def robustm_wts(ucv, x, a, bl=0.9, bd=0.9, tol=5e-5, maxit=50, nitmon=0, data=None, io_manager=None): r""" ``robustm_wts`` finds, for a real matrix :math:`X` of full column rank, a lower triangular matrix :math:`A` such that :math:`\left(A^\mathrm{T}A\right)^{-1}` is proportional to a robust estimate of the covariance of the variables. ``robustm_wts`` is intended for the calculation of weights of bounded influence regression using :meth:`robustm_user`. .. _g02hb-py2-py-doc: For full information please refer to the NAG Library document for g02hb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hbf.html .. _g02hb-py2-py-parameters: **Parameters** **ucv** : callable retval = ucv(t, data=None) :math:`\mathrm{ucv}` must return the value of the function :math:`u` for a given value of its argument. The value of :math:`u` must be non-negative. **Parameters** **t** : float The argument for which :math:`\mathrm{ucv}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of :math:`u\left(t\right)` evaluated at :math:`\mathrm{t}`. **x** : float, array-like, shape :math:`\left(n, m\right)` The real matrix :math:`X`, i.e., the independent variables. :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}\textit{j}`\ th element of :math:`\mathrm{x}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **a** : float, array-like, shape :math:`\left(m\times \left(m+1\right)/2\right)` An initial estimate of the lower triangular real matrix :math:`A`. Only the lower triangular elements must be given and these should be stored row-wise in the array. The diagonal elements must be :math:`\text{}\neq 0`, although in practice will usually be :math:`\text{} > 0`. If the magnitudes of the columns of :math:`X` are of the same order the identity matrix will often provide a suitable initial value for :math:`A`. 
If the columns of :math:`X` are of different magnitudes, the diagonal elements of the initial value of :math:`A` should be approximately inversely proportional to the magnitude of the columns of :math:`X`. **bl** : float, optional The magnitude of the bound for the off-diagonal elements of :math:`S_k`. **bd** : float, optional The magnitude of the bound for the diagonal elements of :math:`S_k`. **tol** : float, optional The relative precision for the final value of :math:`A`. Iteration will stop when the maximum value of :math:`\left\lvert s_{{jl}}\right\rvert` is less than :math:`\mathrm{tol}`. **maxit** : int, optional The maximum number of iterations that will be used during the calculation of :math:`A`. A value of :math:`\mathrm{maxit} = 50` will often be adequate. **nitmon** : int, optional Determines the amount of information that is printed on each iteration. :math:`\mathrm{nitmon} > 0` The value of :math:`A` and the maximum value of :math:`\left\lvert s_{{jl}}\right\rvert` will be printed at the first and every :math:`\mathrm{nitmon}` iterations. :math:`\mathrm{nitmon}\leq 0` No iteration monitoring is printed. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **data** : arbitrary, optional User-communication data for callback functions. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **a** : float, ndarray, shape :math:`\left(m\times \left(m+1\right)/2\right)` The lower triangular elements of the matrix :math:`A`, stored row-wise. **z** : float, ndarray, shape :math:`\left(n\right)` The value :math:`\left\lVert z_{\textit{i}}\right\rVert_2`, for :math:`\textit{i} = 1,2,\ldots,n`. **nit** : int The number of iterations performed. .. 
_g02hb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ldx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldx} \geq n`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq m`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{bd} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{bd} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{bl} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{bl} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`2`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and the :math:`i`\ th diagonal element of :math:`A` is :math:`0`. Constraint: all diagonal elements of :math:`A` must be non-zero. (`errno` :math:`3`) Value returned by :math:`\mathrm{ucv}` function :math:`\text{} < 0`: :math:`u\left(\langle\mathit{\boldsymbol{value}}\rangle\right) = \langle\mathit{\boldsymbol{value}}\rangle`. The value of :math:`u` must be non-negative. (`errno` :math:`4`) Iterations to calculate weights failed to converge in :math:`\mathrm{maxit}` iterations: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02hb-py2-py-notes: **Notes** In fitting the linear regression model .. math:: y = X\theta +\epsilon \text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+---------------------------------------------------------------------+ |where|:math:`y` is a vector of length :math:`n` of the dependent variable, | +-----+---------------------------------------------------------------------+ | |:math:`X` is an :math:`n\times m` matrix of independent variables, | +-----+---------------------------------------------------------------------+ | |:math:`\theta` is a vector of length :math:`m` of unknown parameters,| +-----+---------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown errors, | +-----+---------------------------------------------------------------------+ it may be desirable to bound the influence of rows of the :math:`X` matrix. This can be achieved by calculating a weight for each observation. Several schemes for calculating weights have been proposed (see Hampel `et al.` (1986) and Marazzi (1987)). As the different independent variables may be measured on different scales one group of proposed weights aims to bound a standardized measure of influence. To obtain such weights the matrix :math:`A` has to be found such that .. math:: \frac{1}{n}\sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_iz_i^\mathrm{T} = I\text{ }\quad \left(I\text{ is the identity matrix}\right) and .. math:: z_i = Ax_i\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------------------------------+ |where|:math:`x_i` is a vector of length :math:`m` containing the elements of the :math:`i`\ th row of :math:`X`,| +-----+----------------------------------------------------------------------------------------------------------+ | |:math:`A` is an :math:`m\times m` lower triangular matrix, | +-----+----------------------------------------------------------------------------------------------------------+ | |:math:`z_i` is a vector of length :math:`m`, | +-----+----------------------------------------------------------------------------------------------------------+ |and |:math:`u` is a suitable function. | +-----+----------------------------------------------------------------------------------------------------------+ The weights for use with :meth:`robustm_user` may then be computed using .. math:: w_i = f\left(\left\lVert z_i\right\rVert_2\right) for a suitable user-supplied function :math:`f`. ``robustm_wts`` finds :math:`A` using the iterative procedure .. math:: A_k = \left(S_k+I\right)A_{{k-1}}\text{,} where :math:`S_k = \left(s_{{jl}}\right)`, for :math:`\textit{l} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`, is a lower triangular matrix such that :math:`s_{{jl}} = \left\{\begin{array}{ll}-\mathrm{min}\left[\mathrm{max}\left(h_{{jl}}/n,-\textit{BL}\right),\textit{BL}\right]\text{,}&j > l\\&\\-\mathrm{min}\left[\mathrm{max}\left(\frac{1}{2}\left(h_{{jj}}/n-1\right),-\textit{BD}\right),\textit{BD}\right]\text{,}&j = l\end{array}\right.` :math:`h_{{jl}} = \sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_{{ij}}z_{{il}}` and :math:`\textit{BD}` and :math:`\textit{BL}` are suitable bounds. In addition the values of :math:`\left\lVert z_i\right\rVert_2`, for :math:`i = 1,2,\ldots,n`, are calculated. ``robustm_wts`` is based on routines in ROBETH; see Marazzi (1987). .. 
_g02hb-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Weights for bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 3, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
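The fixed-point iteration :math:`A_k = \left(S_k+I\right)A_{{k-1}}` performed by ``robustm_wts`` can be sketched in a few lines of pure Python. This is an illustrative re-implementation under simplifying assumptions (identity starting value, :math:`A` held as a full lower triangular list-of-lists rather than packed row-wise, no monitoring), not the ROBETH-based NAG code:

```python
def robust_cov_transform(X, u, bl=0.9, bd=0.9, tol=5e-5, maxit=50):
    # Iterate A <- (S + I) A, where S is built from
    # h_jl = sum_i u(||z_i||_2) z_ij z_il with z_i = A x_i, and the
    # off-diagonal/diagonal elements of S are clipped at bl/bd.
    n, m = len(X), len(X[0])
    A = [[1.0 if j == l else 0.0 for l in range(m)] for j in range(m)]
    for _ in range(maxit):
        z = [[sum(A[j][k] * X[i][k] for k in range(j + 1)) for j in range(m)]
             for i in range(n)]
        w = [u(sum(zj * zj for zj in zi) ** 0.5) for zi in z]
        h = [[sum(w[i] * z[i][j] * z[i][l] for i in range(n)) for l in range(m)]
             for j in range(m)]
        S = [[0.0] * m for _ in range(m)]
        smax = 0.0
        for j in range(m):
            for l in range(j + 1):
                if j == l:
                    S[j][l] = -min(max(0.5 * (h[j][j] / n - 1.0), -bd), bd)
                else:
                    S[j][l] = -min(max(h[j][l] / n, -bl), bl)
                smax = max(smax, abs(S[j][l]))
        # A <- (S + I) A, both factors lower triangular.
        A = [[sum((S[j][k] + (1.0 if j == k else 0.0)) * A[k][l]
                  for k in range(l, j + 1)) for l in range(j + 1)]
             + [0.0] * (m - j - 1) for j in range(m)]
        if smax < tol:
            break
    return A
```

With the constant function :math:`u\left(t\right) = 1` the iteration simply standardizes the columns of :math:`X` so that :math:`\frac{1}{n}\sum_iz_iz_i^\mathrm{T} = I`; a bounded-influence choice of :math:`u`, such as the Krasker--Welsch form, instead downweights rows with large :math:`\left\lVert z_i\right\rVert_2`.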
[docs]def robustm_user(psi, psip0, beta, indw, isigma, x, y, wgt, theta, sigma, chi=None, tol=5e-5, eps=5e-6, maxit=50, nitmon=0, data=None, io_manager=None): r""" ``robustm_user`` performs bounded influence regression (:math:`M`-estimates) using an iterative weighted least squares algorithm. .. _g02hd-py2-py-doc: For full information please refer to the NAG Library document for g02hd https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hdf.html .. _g02hd-py2-py-parameters: **Parameters** **psi** : callable retval = psi(t, data=None) :math:`\mathrm{psi}` must return the value of the weight function :math:`\psi` for a given value of its argument. **Parameters** **t** : float The argument for which :math:`\mathrm{psi}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of the weight function :math:`\psi` evaluated at :math:`\mathrm{t}`. **psip0** : float The value of :math:`\psi^{\prime }\left(0\right)`. **beta** : float If :math:`\mathrm{isigma} < 0`, :math:`\mathrm{beta}` must specify the value of :math:`\beta_1`. For Huber and Schweppe type regressions, :math:`\beta_1` is the :math:`75`\ th percentile of the standard Normal distribution (see :meth:`stat.inv_cdf_normal <naginterfaces.library.stat.inv_cdf_normal>`). For Mallows type regression :math:`\beta_1` is the solution to .. math:: \frac{1}{n}\sum_{{i = 1}}^n\Phi \left(\beta_1/\sqrt{w_i}\right) = 0.75\text{,} where :math:`\Phi` is the standard Normal cumulative distribution function (see :meth:`specfun.cdf_normal <naginterfaces.library.specfun.cdf_normal>`). If :math:`\mathrm{isigma} > 0`, :math:`\mathrm{beta}` must specify the value of :math:`\beta_2`. .. 
math:: \begin{array}{lll}\beta_2 = &\int_{{-\infty }}^{\infty }\chi \left(z\right)\phi \left(z\right){dz}\text{,}&\text{in the Huber case;}\\&&\\&&\\\beta_2 = &\frac{1}{n}\sum_{{i = 1}}^nw_i\int_{{-\infty }}^{\infty }\chi \left(z\right)\phi \left(z\right){dz}\text{,}&\text{in the Mallows case;}\\&&\\&&\\\beta_2 = &\frac{1}{n}\sum_{{i = 1}}^nw_i^2\int_{{-\infty }}^{\infty }\chi \left(z/w_i\right)\phi \left(z\right){dz}\text{,}&\text{in the Schweppe case;}\end{array} where :math:`\phi` is the standard normal density, i.e., :math:`\frac{1}{\sqrt{2\pi }}\mathrm{exp}\left(-\frac{1}{2}x^2\right)`. If :math:`\mathrm{isigma} = 0`, :math:`\mathrm{beta}` is not referenced. **indw** : int Determines the type of regression to be performed. :math:`\mathrm{indw} = 0` Huber type regression. :math:`\mathrm{indw} < 0` Mallows type regression. :math:`\mathrm{indw} > 0` Schweppe type regression. **isigma** : int Determines how :math:`\sigma` is to be estimated. :math:`\mathrm{isigma} = 0` :math:`\sigma` is held constant at its initial value. :math:`\mathrm{isigma} < 0` :math:`\sigma` is estimated by median absolute deviation of residuals. :math:`\mathrm{isigma} > 0` :math:`\sigma` is estimated using the :math:`\chi` function. **x** : float, array-like, shape :math:`\left(n, m\right)` The values of the :math:`X` matrix, i.e., the independent variables. :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}\textit{j}`\ th element of :math:`\mathrm{x}`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{indw} < 0`, during calculations the elements of :math:`\mathrm{x}` will be transformed as described in :ref:`Notes <g02hd-py2-py-notes>`. Before exit the inverse transformation will be applied. As a result there may be slight differences between the input :math:`\mathrm{x}` and the output :math:`\mathrm{x}`. **y** : float, array-like, shape :math:`\left(n\right)` The data values of the dependent variable. 
:math:`\mathrm{y}[\textit{i}-1]` must contain the value of :math:`y` for the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{indw} < 0`, during calculations the elements of :math:`\mathrm{y}` will be transformed as described in :ref:`Notes <g02hd-py2-py-notes>`. Before exit the inverse transformation will be applied. As a result there may be slight differences between the input :math:`\mathrm{y}` and the output :math:`\mathrm{y}`. **wgt** : float, array-like, shape :math:`\left(n\right)` The weight for the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{indw} < 0`, during calculations elements of :math:`\mathrm{wgt}` will be transformed as described in :ref:`Notes <g02hd-py2-py-notes>`. Before exit the inverse transformation will be applied. As a result there may be slight differences between the input :math:`\mathrm{wgt}` and the output :math:`\mathrm{wgt}`. If :math:`\mathrm{wgt}[i-1]\leq 0`, the :math:`i`\ th observation is not included in the analysis. If :math:`\mathrm{indw} = 0`, :math:`\mathrm{wgt}` is not referenced. **theta** : float, array-like, shape :math:`\left(m\right)` Starting values of the parameter vector :math:`\theta`. These may be obtained from least squares regression. Alternatively if :math:`\mathrm{isigma} < 0` and :math:`\mathrm{sigma} = 1` or if :math:`\mathrm{isigma} > 0` and :math:`\mathrm{sigma}` approximately equals the standard deviation of the dependent variable, :math:`y`, then :math:`\mathrm{theta}[\textit{i}-1] = 0.0`, for :math:`\textit{i} = 1,2,\ldots,m` may provide reasonable starting values. **sigma** : float A starting value for the estimation of :math:`\sigma`. :math:`\mathrm{sigma}` should be approximately the standard deviation of the residuals from the model evaluated at the value of :math:`\theta` given by :math:`\mathrm{theta}` on entry. 
**chi** : None or callable retval = chi(t, data=None), optional Note: if this argument is **None** then a NAG-supplied facility will be used. If :math:`\mathrm{isigma} > 0`, :math:`\mathrm{chi}` must return the value of the weight function :math:`\chi` for a given value of its argument. The value of :math:`\chi` must be non-negative. **Parameters** **t** : float The argument for which :math:`\mathrm{chi}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of the weight function :math:`\chi` evaluated at :math:`\mathrm{t}`. **tol** : float, optional The relative precision for the final estimates. Convergence is assumed when both the relative change in the value of :math:`\mathrm{sigma}` and the relative change in the value of each element of :math:`\mathrm{theta}` are less than :math:`\mathrm{tol}`. It is advisable for :math:`\mathrm{tol}` to be greater than :math:`100\times \text{machine precision}`. **eps** : float, optional A relative tolerance to be used to determine the rank of :math:`X`. See :meth:`linsys.real_gen_solve <naginterfaces.library.linsys.real_gen_solve>` for further details. If :math:`\mathrm{eps} < \text{machine precision}` or :math:`\mathrm{eps} > 1.0`, machine precision will be used in place of :math:`\mathrm{eps}`. A reasonable value for :math:`\mathrm{eps}` is :math:`5.0\times 10^{-6}` where this value is possible. **maxit** : int, optional The maximum number of iterations that should be used during the estimation. A value of :math:`\mathrm{maxit} = 50` should be adequate for most uses. **nitmon** : int, optional Determines the amount of information that is printed on each iteration. :math:`\mathrm{nitmon}\leq 0` No information is printed. 
:math:`\mathrm{nitmon} > 0` On the first and every :math:`\mathrm{nitmon}` iterations the values of :math:`\mathrm{sigma}`, :math:`\mathrm{theta}` and the change in :math:`\mathrm{theta}` during the iteration are printed. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **data** : arbitrary, optional User-communication data for callback functions. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **x** : float, ndarray, shape :math:`\left(n, m\right)` Unchanged, except as described above. **y** : float, ndarray, shape :math:`\left(n\right)` Unchanged, except as described above. **wgt** : float, ndarray, shape :math:`\left(n\right)` Unchanged, except as described above. **theta** : float, ndarray, shape :math:`\left(m\right)` The M-estimate of :math:`\theta_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,m`. **k** : int The column rank of the matrix :math:`X`. **sigma** : float The final estimate of :math:`\sigma` if :math:`\mathrm{isigma} \neq 0` or the value assigned on entry if :math:`\mathrm{isigma} = 0`. **rs** : float, ndarray, shape :math:`\left(n\right)` The residuals from the model evaluated at final value of :math:`\mathrm{theta}`, i.e., :math:`\mathrm{rs}` contains the vector :math:`\left(y-X\hat{\theta }\right)`. **nit** : int The number of iterations that were used during the estimation. .. _g02hd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ldx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldx} \geq n`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > m`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{beta} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{beta} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{sigma} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{sigma} > 0.0`. (`errno` :math:`3`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`3`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`4`) Value given by :math:`\mathrm{chi}` function :math:`\text{} < 0`: :math:`\mathrm{chi}\left(\langle\mathit{\boldsymbol{value}}\rangle\right) = \langle\mathit{\boldsymbol{value}}\rangle`. The value of :math:`\mathrm{chi}` must be non-negative. (`errno` :math:`5`) Estimated value of :math:`\mathrm{sigma}` is zero. (`errno` :math:`6`) Iterations to solve the weighted least squares equations failed to converge. (`errno` :math:`8`) The function has failed to converge in :math:`\mathrm{maxit}` iterations. (`errno` :math:`9`) Having removed cases with zero weight, the value of :math:`n-\mathrm{k}\leq 0`, i.e., no degree of freedom for error. This error will only occur if :math:`\mathrm{isigma} > 0`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`7`) The weighted least squares equations are not of full rank. This may be due to the :math:`X` matrix not being of full rank, in which case the results will be valid. It may also occur if some of the :math:`G_{{ii}}` values become very small or zero, see `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hdf.html#fcomments>`__. The rank of the equations is given by :math:`\mathrm{k}`. 
If the matrix just fails the test for nonsingularity then the result :math:`\mathrm{errno}` = 7 and :math:`\mathrm{k} = m` is possible (see :meth:`linsys.real_gen_solve <naginterfaces.library.linsys.real_gen_solve>`). .. _g02hd-py2-py-notes: **Notes** For the linear regression model .. math:: y = X\theta +\epsilon \text{,} .. rst-class:: nag-rules-none nag-align-left +-----+-----------------------------------------------------------------------------------------------------------------------+ |where|:math:`y` is a vector of length :math:`n` of the dependent variable, | +-----+-----------------------------------------------------------------------------------------------------------------------+ | |:math:`X` is an :math:`n\times m` matrix of independent variables of column rank :math:`k`, | +-----+-----------------------------------------------------------------------------------------------------------------------+ | |:math:`\theta` is a vector of length :math:`m` of unknown parameters, | +-----+-----------------------------------------------------------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown errors with var :math:`\left(\epsilon_i\right) = \sigma^2`,| +-----+-----------------------------------------------------------------------------------------------------------------------+ ``robustm_user`` calculates the M-estimates given by the solution, :math:`\hat{\theta }`, to the equation .. math:: \sum_{{i = 1}}^n\psi \left(r_i/\left(\sigma w_i\right)\right)w_ix_{{ij}} = 0\text{, }\quad j = 1,2,\ldots,m\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |where|:math:`r_i` is the :math:`i`\ th residual, i.e., the :math:`i`\ th element of the vector :math:`r = y-X\hat{\theta }`, | +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |:math:`\psi` is a suitable weight function, | +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | |:math:`w_i` are suitable weights such as those that can be calculated by using output from :meth:`robustm_wts`, | +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |and |:math:`\sigma` may be estimated at each iteration by the median absolute deviation of the residuals :math:`\hat{\sigma } = \mathrm{med}_i\left({\left[\left\lvert r_i\right\rvert \right]/\beta_1}\right)`| +-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ or as the solution to .. math:: \sum_{{i = 1}}^n\chi \left(r_i/\left(\hat{\sigma }w_i\right)\right)w_i^2 = \left(n-k\right)\beta_2 for a suitable weight function :math:`\chi`, where :math:`\beta_1` and :math:`\beta_2` are constants, chosen so that the estimator of :math:`\sigma` is asymptotically unbiased if the errors, :math:`\epsilon_i`, have a Normal distribution. 
Alternatively :math:`\sigma` may be held at a constant value. The above describes the Schweppe type regression. If the :math:`w_i` are assumed to equal :math:`1` for all :math:`i`, then Huber type regression is obtained. A third type, due to Mallows, replaces `(1) <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hdf.html#eqn1>`__ by .. math:: \sum_{{i = 1}}^n\psi \left(r_i/\sigma \right)w_ix_{{ij}} = 0\text{, }\quad j = 1,2,\ldots,m\text{.} This may be obtained by use of the transformations .. math:: \begin{array}{ll}w_i^*&←\sqrt{w_i}\\y_i^*&←y_i\sqrt{w_i}\\x_{{ij}}^*&←x_{{ij}}\sqrt{w_i}\text{, }\quad j = 1,2,\ldots,m\end{array} (see Marazzi (1987)). The calculation of the estimates of :math:`\theta` can be formulated as an iterative weighted least squares problem with a diagonal weight matrix :math:`G` given by .. math:: G_{{ii}} = \left\{\begin{array}{cl}\frac{{\psi \left(r_i/\left(\sigma w_i\right)\right)}}{{\left(r_i/\left(\sigma w_i\right)\right)}}\text{,}&r_i\neq 0\\&\\\psi^{\prime }\left(0\right)\text{,}&r_i = 0\text{.}\end{array}\right. \text{.} The value of :math:`\theta` at each iteration is given by the weighted least squares regression of :math:`y` on :math:`X`. This is carried out by first transforming the :math:`y` and :math:`X` by .. math:: \begin{array}{ll}\tilde{y}_i& = y_i\sqrt{G_{{ii}}}\\\tilde{x}_{{ij}}& = x_{{ij}}\sqrt{G_{{ii}}}\text{, }\quad j = 1,2,\ldots,m\end{array} and then using :meth:`linsys.real_gen_solve <naginterfaces.library.linsys.real_gen_solve>`. If :math:`X` is of full column rank then an orthogonal-triangular (:math:`QR`) decomposition is used; if not, a singular value decomposition is used. Observations with zero or negative weights are not included in the solution. **Note:** there is no explicit provision in the function for a constant term in the regression model. 
However, the addition of a dummy variable whose value is :math:`1.0` for all observations will produce a value of :math:`\hat{\theta }` corresponding to the usual constant term. ``robustm_user`` is based on routines in ROBETH, see Marazzi (1987). .. _g02hd-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Subroutines for robust and bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 2, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
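Since ``robustm_user`` takes the weight function :math:`\psi` as a user-supplied callback, the iterative weighting scheme in the Notes above can be sketched in plain Python. This is an illustrative sketch only, not the NAG implementation: ``huber_psi`` with clipping constant :math:`c = 1.345` is one common choice of weight function (:math:`\psi` is entirely user-supplied), and ``irls_weight`` evaluates the diagonal IRLS weight :math:`G_{ii}` exactly as defined in the Notes.

```python
def huber_psi(t, c=1.345):
    """Huber's psi function: psi(t) = max(-c, min(c, t)).

    The clipping constant c = 1.345 is an illustrative choice, not a
    value mandated by the routine.
    """
    return max(-c, min(c, t))


def irls_weight(r_i, sigma, w_i=1.0, psip0=1.0, psi=huber_psi):
    """Diagonal IRLS weight G_ii from the Notes:
    psi(r/(sigma*w)) / (r/(sigma*w)), with the limit psi'(0) at r = 0."""
    t = r_i / (sigma * w_i)
    if t == 0.0:
        return psip0
    return psi(t) / t


# Small residuals keep full weight; gross residuals are down-weighted.
print(irls_weight(0.5, 1.0))   # 1.0 (inside the clipping region)
print(irls_weight(10.0, 1.0))  # 0.1345
```

A function pair like this is what would be passed as ``psi`` (together with ``psip0`` = :math:`\psi^{\prime}\left(0\right)` = 1 for Huber's :math:`\psi`) when calling the routine.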
[docs]def robustm_user_varmat(psi, psp, indw, indc, sigma, x, rs, wgt, data=None): r""" ``robustm_user_varmat`` calculates an estimate of the asymptotic variance-covariance matrix for the bounded influence regression estimates (M-estimates). It is intended for use with :meth:`robustm_user`. .. _g02hf-py2-py-doc: For full information please refer to the NAG Library document for g02hf https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hff.html .. _g02hf-py2-py-parameters: **Parameters** **psi** : callable retval = psi(t, data=None) :math:`\mathrm{psi}` must return the value of the :math:`\psi` function for a given value of its argument. **Parameters** **t** : float The argument for which :math:`\mathrm{psi}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of the function :math:`\psi` evaluated at :math:`\mathrm{t}`. **psp** : callable retval = psp(t, data=None) :math:`\mathrm{psp}` must return the value of :math:`\psi^{\prime }\left(t\right) = \frac{d}{{dt}}\psi \left(t\right)` for a given value of its argument. **Parameters** **t** : float The argument for which :math:`\mathrm{psp}` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **retval** : float The value of :math:`\psi^{\prime }\left(t\right)` evaluated at :math:`\mathrm{t}`. **indw** : int The type of regression for which the asymptotic variance-covariance matrix is to be calculated. :math:`\mathrm{indw} = -1` Mallows type regression. :math:`\mathrm{indw} = 0` Huber type regression. :math:`\mathrm{indw} = 1` Schweppe type regression. **indc** : int If :math:`\mathrm{indw} \neq 0`, :math:`\mathrm{indc}` must specify the approximation to be used. If :math:`\mathrm{indc} = 1`, averaging over residuals. If :math:`\mathrm{indc} = 0`, replacing expected by observed. 
If :math:`\mathrm{indw} = 0`, :math:`\mathrm{indc}` is not referenced. **sigma** : float The value of :math:`\hat{\sigma }`, as given by :meth:`robustm_user`. **x** : float, array-like, shape :math:`\left(n, m\right)` The values of the :math:`X` matrix, i.e., the independent variables. :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}\textit{j}`\ th element of :math:`X`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **rs** : float, array-like, shape :math:`\left(n\right)` The residuals from the bounded influence regression. These are given by :meth:`robustm_user`. **wgt** : float, array-like, shape :math:`\left(n\right)` If :math:`\mathrm{indw} \neq 0`, :math:`\mathrm{wgt}` must contain the vector of weights used by the bounded influence regression. These should be used with :meth:`robustm_user`. If :math:`\mathrm{indw} = 0`, :math:`\mathrm{wgt}` is not referenced. **data** : arbitrary, optional User-communication data for callback functions. **Returns** **c** : float, ndarray, shape :math:`\left(m, m\right)` The estimate of the variance-covariance matrix. **wk** : float, ndarray, shape :math:`\left(m\times \left(n+m+1\right)+2\times n\right)` If :math:`\mathrm{indw} \neq 0`, :math:`\mathrm{wk}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,n`, will contain the diagonal elements of the matrix :math:`D` and :math:`\mathrm{wk}[\textit{i}-1]`, for :math:`\textit{i} = n+1,\ldots,2n`, will contain the diagonal elements of matrix :math:`P`. The rest of the array is used as workspace. .. _g02hf-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ldc} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldc} \geq m`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ldx} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\textit{ldx} \geq n`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > m`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{sigma} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{sigma}\geq 0.0`. (`errno` :math:`3`) :math:`\textit{S}_1` matrix is singular or almost singular. (`errno` :math:`3`) :math:`X^\mathrm{T}X` matrix not positive definite. (`errno` :math:`4`) Correction factor :math:`= 0` (Huber type regression). .. _g02hf-py2-py-notes: **Notes** For a description of bounded influence regression see :meth:`robustm_user`. Let :math:`\theta` be the regression parameters and let :math:`C` be the asymptotic variance-covariance matrix of :math:`\hat{\theta }`. Then for Huber type regression .. math:: C = f_H\left(X^\mathrm{T}X\right)^{-1}\hat{\sigma }^2\text{,} where .. math:: f_H = \frac{1}{{n-m}}\frac{{\sum_{{i = 1}}^n\psi^2\left(r_i/\hat{\sigma }\right)}}{\left(\frac{1}{n}\sum \psi^{\prime }\left(\frac{r_i}{\hat{\sigma }}\right)\right)^2}\kappa^2 .. math:: \kappa^2 = 1+\frac{m}{n}\frac{{\frac{1}{n}\sum_{{i = 1}}^n\left(\psi^{\prime }\left(r_i/\hat{\sigma }\right)-\frac{1}{n}\sum_{{i = 1}}^n\psi^{\prime }\left(r_i/\hat{\sigma }\right)\right)^2}}{\left(\frac{1}{n}\sum_{{i = 1}}^n\psi^{\prime }\left(\frac{r_i}{\hat{\sigma }}\right)\right)^2}\text{,} see Huber (1981) and Marazzi (1987). For Mallows and Schweppe type regressions, :math:`C` is of the form .. math:: \frac{\hat{\sigma }^2}{n}S_1^{-1}S_2S_1^{-1}\text{,} where :math:`S_1 = \frac{1}{n}X^\mathrm{T}DX` and :math:`S_2 = \frac{1}{n}X^\mathrm{T}PX`. 
:math:`D` is a diagonal matrix such that the :math:`i`\ th element approximates :math:`E\left(\psi^{\prime }\left(r_i/\left(\sigma w_i\right)\right)\right)` in the Schweppe case and :math:`E\left(\psi^{\prime }\left(r_i/\sigma \right)w_i\right)` in the Mallows case. :math:`P` is a diagonal matrix such that the :math:`i`\ th element approximates :math:`E\left(\psi^2\left(r_i/\left(\sigma w_i\right)\right)w_i^2\right)` in the Schweppe case and :math:`E\left(\psi^2\left(r_i/\sigma \right)w_i^2\right)` in the Mallows case. Two approximations are available in ``robustm_user_varmat``: (1) Average over the :math:`r_i` .. math:: \begin{array}{cc}\text{Schweppe}&\text{Mallows}\\&\\D_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^{\prime }\left(\frac{r_j}{{\hat{\sigma }w_i}}\right)\right) w_i\quad \text{ }\quad &D_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^{\prime }\left(\frac{r_j}{\hat{\sigma }}\right)\right) w_i\\&\\P_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^2\left(\frac{r_j}{{\hat{\sigma }w_i}}\right)\right) w_i^2\quad \text{ }\quad &P_i = \left(\frac{1}{n}\sum_{{j = 1}}^n\psi^2\left(\frac{r_j}{\hat{\sigma }}\right)\right) w_i^2\end{array} (#) Replace expected value by observed .. math:: \begin{array}{cc}\text{Schweppe}&\text{Mallows}\\&\\D_i = \psi^{\prime } \left(\frac{r_i}{{\hat{\sigma }w_i}}\right) w_i \quad \text{ }\quad &D_i = \psi^{\prime } \left(\frac{r_i}{\hat{\sigma }}\right) w_i\\&\\P_i = \psi^2 \left(\frac{r_i}{{\hat{\sigma }w_i}}\right) w_i^2 \quad \text{ }\quad &P_i = \psi^2 \left(\frac{r_i}{\hat{\sigma }}\right) w_i^2\end{array} See Hampel `et al.` (1986) and Marazzi (1987). In all cases :math:`\hat{\sigma }` is a robust estimate of :math:`\sigma`. ``robustm_user_varmat`` is based on routines in ROBETH; see Marazzi (1987). .. _g02hf-py2-py-references: **References** Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, `Robust Statistics. 
The Approach Based on Influence Functions`, Wiley Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Subroutines for robust and bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 2, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
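For the Huber case, the correction factor :math:`f_H` (including :math:`\kappa^2`) in the Notes above can be computed directly from the residuals. The following is a minimal pure-Python sketch, not the NAG routine: the Huber :math:`\psi` with clipping constant :math:`c = 1.345` is an illustrative choice, standing in for the user-supplied ``psi`` and ``psp`` callbacks.

```python
def huber_psi(t, c=1.345):
    # Huber's psi; c = 1.345 is an illustrative choice.
    return max(-c, min(c, t))


def huber_psp(t, c=1.345):
    # Derivative of Huber's psi: 1 inside the clipping region, 0 outside.
    return 1.0 if abs(t) <= c else 0.0


def huber_correction(rs, sigma, m, psi=huber_psi, psp=huber_psp):
    """f_H = kappa^2 * sum(psi(r/s)^2) / ((n - m) * mean(psi'(r/s))^2),
    with kappa^2 = 1 + (m/n) * var(psi'(r/s)) / mean(psi'(r/s))^2,
    as in the Notes above."""
    n = len(rs)
    t = [r / sigma for r in rs]
    psi_sq_sum = sum(psi(ti) ** 2 for ti in t)
    d = [psp(ti) for ti in t]
    dbar = sum(d) / n
    dvar = sum((di - dbar) ** 2 for di in d) / n
    kappa2 = 1.0 + (m / n) * dvar / dbar ** 2
    return kappa2 * psi_sq_sum / ((n - m) * dbar ** 2)
```

When no residual is clipped, :math:`\psi\left(t\right) = t` and :math:`\psi^{\prime} = 1`, so :math:`\kappa^2 = 1` and :math:`C = f_H\left(X^\mathrm{T}X\right)^{-1}\hat{\sigma }^2` reduces to the classical least squares covariance estimate.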
[docs]def robustm_corr_huber(x, eps, maxit=150, nitmon=0, tol=5e-5, io_manager=None): r""" ``robustm_corr_huber`` computes a robust estimate of the covariance matrix for an expected fraction of gross errors. .. _g02hk-py2-py-doc: For full information please refer to the NAG Library document for g02hk https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hkf.html .. _g02hk-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **eps** : float :math:`\epsilon`, the expected fraction of gross errors in the sample. **maxit** : int, optional The maximum number of iterations that will be used during the calculation of the covariance matrix. **nitmon** : int, optional Indicates the amount of information on the iteration that is printed. :math:`\mathrm{nitmon} > 0` The value of :math:`A`, :math:`\theta` and :math:`\delta` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hkf.html#accuracy>`__) will be printed at the first and every :math:`\mathrm{nitmon}` iterations. :math:`\mathrm{nitmon}\leq 0` No iteration monitoring is printed. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **tol** : float, optional The relative precision for the final estimates of the covariance matrix. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **cov** : float, ndarray, shape :math:`\left(m\times \left(m+1\right)/2\right)` A robust estimate of the covariance matrix, :math:`C`. The upper triangular part of the matrix :math:`C` is stored packed by columns. 
:math:`C_{{ij}}` is returned in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`, :math:`i\leq j`. **theta** : float, ndarray, shape :math:`\left(m\right)` The robust estimate of the location parameters :math:`\theta_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,m`. **nit** : int The number of iterations performed. .. _g02hk-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{eps} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0.0\leq \mathrm{eps}\leq 1.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq m`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`2`) On entry, a variable has a constant value, i.e., all elements in column :math:`\langle\mathit{\boldsymbol{value}}\rangle` of :math:`\mathrm{x}` are identical. (`errno` :math:`3`) The iterative procedure to find :math:`C` has failed to converge in :math:`\mathrm{maxit}` iterations. (`errno` :math:`4`) The iterative procedure to find :math:`C` has become unstable. This may happen if the value of :math:`\mathrm{eps}` is too large for the sample. .. _g02hk-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. 
Please contact NAG if you have any questions about compatibility.` For a set of :math:`n` observations on :math:`m` variables in a matrix :math:`X`, a robust estimate of the covariance matrix, :math:`C`, and a robust estimate of location, :math:`\theta`, are given by .. math:: C = \tau^2\left(A^\mathrm{T}A\right)^{-1}\text{,} where :math:`\tau^2` is a correction factor and :math:`A` is a lower triangular matrix found as the solution to the following equations: .. math:: z_i = A\left(x_i-\theta \right)\text{,} .. math:: \frac{1}{n}\sum_{{i = 1}}^nw\left(\left\lVert z_i\right\rVert_2\right)z_i = 0\text{,} and .. math:: \frac{1}{n}\sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_iz_i^\mathrm{T}-I = 0\text{,} .. rst-class:: nag-rules-none nag-align-left +-----+-------------------------------------------------------------------------------------------------------------------+ |where|:math:`x_i` is a vector of length :math:`m` containing the elements of the :math:`i`\ th row of :math:`\mathrm{x}`,| +-----+-------------------------------------------------------------------------------------------------------------------+ | |:math:`z_i` is a vector of length :math:`m`, | +-----+-------------------------------------------------------------------------------------------------------------------+ | |:math:`I` is the identity matrix and :math:`0` is the zero matrix, | +-----+-------------------------------------------------------------------------------------------------------------------+ |and |:math:`w` and :math:`u` are suitable functions. | +-----+-------------------------------------------------------------------------------------------------------------------+ ``robustm_corr_huber`` uses weight functions: .. math:: \begin{array}{ll}u\left(t\right) = \frac{a_u}{t^2}\text{,}&\text{if }t < a_u^2\\&\\u\left(t\right) = 1\text{,}&\text{if }a_u^2\leq t\leq b_u^2\\&\\u\left(t\right) = \frac{{b_u}}{t^2}\text{,}&\text{if }t > b_u^2\end{array} and .. 
math:: \begin{array}{ll}w\left(t\right) = 1\text{,}&\text{if }t\leq c_w\\&\\w\left(t\right) = \frac{c_w}{t}\text{,}&\text{if }t > c_w\end{array} for constants :math:`a_u`, :math:`b_u` and :math:`c_w`. These functions solve a minimax problem considered by Huber (see Huber (1981)). The values of :math:`a_u`, :math:`b_u` and :math:`c_w` are calculated from the expected fraction of gross errors, :math:`\epsilon` (see Huber (1981) and Marazzi (1987)). The expected fraction of gross errors is the estimated proportion of outliers in the sample. In order to make the estimate asymptotically unbiased under a Normal model a correction factor, :math:`\tau^2`, is calculated, (see Huber (1981) and Marazzi (1987)). The matrix :math:`C` is calculated using :meth:`robustm_corr_user_deriv`. Initial estimates of :math:`\theta_j`, for :math:`j = 1,2,\ldots,m`, are given by the median of the :math:`j`\ th column of :math:`X` and the initial value of :math:`A` is based on the median absolute deviation (see Marazzi (1987)). ``robustm_corr_huber`` is based on routines in ROBETH; see Marazzi (1987). .. _g02hk-py2-py-references: **References** Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Weights for bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 3, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
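The weight functions :math:`u` and :math:`w` used by ``robustm_corr_huber`` can be written down directly from the piecewise forms in the Notes above. In the routine the constants :math:`a_u`, :math:`b_u` and :math:`c_w` are derived internally from :math:`\epsilon`; in this sketch they are supplied explicitly, so the values used below are purely illustrative.

```python
def u_weight(t, a_u, b_u):
    """The u weight function from the Notes: a_u/t**2 below a_u**2,
    1 on [a_u**2, b_u**2], and b_u/t**2 above b_u**2."""
    if t < a_u ** 2:
        return a_u / t ** 2
    if t <= b_u ** 2:
        return 1.0
    return b_u / t ** 2


def w_weight(t, c_w):
    """The w weight function from the Notes: full weight up to c_w,
    then decaying as c_w/t."""
    return 1.0 if t <= c_w else c_w / t
```

In the defining equations these are evaluated at :math:`t = \left\lVert z_i\right\rVert_2`, so observations with large :math:`\left\lVert z_i\right\rVert_2` (potential outliers) receive reduced weight.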
[docs]def robustm_corr_user_deriv(ucv, indm, x, a, theta, bl=0.9, bd=0.9, maxit=150, nitmon=0, tol=5e-5, data=None, io_manager=None): r""" ``robustm_corr_user_deriv`` calculates a robust estimate of the covariance matrix for user-supplied weight functions and their derivatives. .. _g02hl-py2-py-doc: For full information please refer to the NAG Library document for g02hl https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hlf.html .. _g02hl-py2-py-parameters: **Parameters** **ucv** : callable (u, ud, w, wd) = ucv(t, data=None) :math:`\mathrm{ucv}` must return the values of the functions :math:`u` and :math:`w` and their derivatives for a given value of its argument. **Parameters** **t** : float The argument for which the functions :math:`u` and :math:`w` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **u** : float The value of the :math:`u` function at the point :math:`\mathrm{t}`. **ud** : float The value of the derivative of the :math:`u` function at the point :math:`\mathrm{t}`. **w** : float The value of the :math:`w` function at the point :math:`\mathrm{t}`. **wd** : float The value of the derivative of the :math:`w` function at the point :math:`\mathrm{t}`. **indm** : int Indicates which form of the function :math:`v` will be used. :math:`\mathrm{indm} = 1` :math:`v = 1`. :math:`\mathrm{indm}\neq 1` :math:`v = u`. **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **a** : float, array-like, shape :math:`\left(m\times \left(m+1\right)/2\right)` An initial estimate of the lower triangular real matrix :math:`A`. Only the lower triangular elements must be given and these should be stored row-wise in the array. 
The diagonal elements must be :math:`\text{}\neq 0`, and in practice will usually be :math:`\text{} > 0`. If the magnitudes of the columns of :math:`X` are of the same order, the identity matrix will often provide a suitable initial value for :math:`A`. If the columns of :math:`X` are of different magnitudes, the diagonal elements of the initial value of :math:`A` should be approximately inversely proportional to the magnitude of the columns of :math:`X`. **theta** : float, array-like, shape :math:`\left(m\right)` An initial estimate of the location parameter, :math:`\theta_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,m`. In many cases an initial estimate of :math:`\theta_{\textit{j}} = 0`, for :math:`\textit{j} = 1,2,\ldots,m`, will be adequate. Alternatively medians may be used as given by :meth:`univar.robust_1var_median <naginterfaces.library.univar.robust_1var_median>`. **bl** : float, optional The magnitude of the bound for the off-diagonal elements of :math:`S_k`, :math:`BL`. **bd** : float, optional The magnitude of the bound for the diagonal elements of :math:`S_k`, :math:`BD`. **maxit** : int, optional The maximum number of iterations that will be used during the calculation of :math:`A`. **nitmon** : int, optional Indicates the amount of information on the iteration that is printed. :math:`\mathrm{nitmon} > 0` The value of :math:`A`, :math:`\theta` and :math:`\delta` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hlf.html#accuracy>`__) will be printed at the first and every :math:`\mathrm{nitmon}` iterations. :math:`\mathrm{nitmon}\leq 0` No iteration monitoring is printed. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **tol** : float, optional The relative precision for the final estimates of the covariance matrix. 
Iteration will stop when maximum :math:`\delta` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hlf.html#accuracy>`__) is less than :math:`\mathrm{tol}`. **data** : arbitrary, optional User-communication data for callback functions. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **cov** : float, ndarray, shape :math:`\left(m\times \left(m+1\right)/2\right)` Contains a robust estimate of the covariance matrix, :math:`C`. The upper triangular part of the matrix :math:`C` is stored packed by columns (lower triangular stored by rows), :math:`C_{{ij}}` is returned in :math:`\mathrm{cov}[\left(j\times \left(j-1\right)/2+i\right)-1]`, :math:`i\leq j`. **a** : float, ndarray, shape :math:`\left(m\times \left(m+1\right)/2\right)` The lower triangular elements of the inverse of the matrix :math:`A`, stored row-wise. **wt** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{wt}[\textit{i}-1]` contains the weights, :math:`\textit{wt}_{\textit{i}} = u\left(\left\lVert z_{\textit{i}}\right\rVert_2\right)`, for :math:`\textit{i} = 1,2,\ldots,n`. **theta** : float, ndarray, shape :math:`\left(m\right)` Contains the robust estimate of the location parameter, :math:`\theta_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,m`. **nit** : int The number of iterations performed. .. _g02hl-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq m`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{bd} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{bd} > 0.0`. 
(`errno` :math:`2`) On entry, :math:`\mathrm{bl} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{bl} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`2`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and the :math:`i`\ th diagonal element of :math:`A` is :math:`0`. Constraint: all diagonal elements of :math:`A` must be non-zero. (`errno` :math:`3`) On entry, a variable has a constant value, i.e., all elements in column :math:`\langle\mathit{\boldsymbol{value}}\rangle` of :math:`\mathrm{x}` are identical. (`errno` :math:`4`) :math:`w` value returned by :math:`\mathrm{ucv} < 0.0`: :math:`w\left(\langle\mathit{\boldsymbol{value}}\rangle\right) = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{w}\geq 0.0`. (`errno` :math:`4`) :math:`\textit{u}` value returned by :math:`\mathrm{ucv} < 0.0`: :math:`u\left(\langle\mathit{\boldsymbol{value}}\rangle\right) = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{u}\geq 0.0`. (`errno` :math:`5`) Iterations to calculate weights failed to converge. **Warns** **NagAlgorithmicWarning** (`errno` :math:`6`) The sum :math:`\textit{D}_2` is zero. Try either a larger initial estimate of :math:`A` or make :math:`u` and :math:`w` less strict. (`errno` :math:`6`) The sum :math:`\textit{D}_3` is zero. Try either a larger initial estimate of :math:`A` or make :math:`u` and :math:`w` less strict. (`errno` :math:`6`) The sum :math:`\textit{D}_1` is zero. Try either a larger initial estimate of :math:`A` or make :math:`u` and :math:`w` less strict. .. 
_g02hl-py2-py-notes: **Notes** For a set of :math:`n` observations on :math:`m` variables in a matrix :math:`X`, a robust estimate of the covariance matrix, :math:`C`, and a robust estimate of location, :math:`\theta`, are given by: .. math:: C = \tau^2\left(A^\mathrm{T}A\right)^{-1}\text{,} where :math:`\tau^2` is a correction factor and :math:`A` is a lower triangular matrix found as the solution to the following equations. .. math:: z_i = A\left(x_i-\theta \right) .. math:: \frac{1}{n}\sum_{{i = 1}}^nw\left(\left\lVert z_i\right\rVert_2\right)z_i = 0 and .. math:: \frac{1}{n}\sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_iz_i^\mathrm{T}-v\left(\left\lVert z_i\right\rVert_2\right)I = 0\text{,} .. rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------------------------------+ |where|:math:`x_i` is a vector of length :math:`m` containing the elements of the :math:`i`\ th row of :math:`X`,| +-----+----------------------------------------------------------------------------------------------------------+ | |:math:`z_i` is a vector of length :math:`m`, | +-----+----------------------------------------------------------------------------------------------------------+ | |:math:`I` is the identity matrix and :math:`0` is the zero matrix, | +-----+----------------------------------------------------------------------------------------------------------+ |and |:math:`w` and :math:`u` are suitable functions. | +-----+----------------------------------------------------------------------------------------------------------+ ``robustm_corr_user_deriv`` covers two situations: (i) :math:`v\left(t\right) = 1` for all :math:`t`, (#) :math:`v\left(t\right) = u\left(t\right)`. The robust covariance matrix may be calculated from a weighted sum of squares and cross-products matrix about :math:`\theta` using weights :math:`\textit{wt}_i = u\left(\left\lVert z_i\right\rVert \right)`. 
In case \(1) a divisor of :math:`n` is used and in case \(2) a divisor of :math:`\sum_{{i = 1}}^n\textit{wt}_i` is used. If :math:`w\left(.\right) = \sqrt{u\left(.\right)}`, then the robust covariance matrix can be calculated by scaling each row of :math:`X` by :math:`\sqrt{\textit{wt}_i}` and calculating an unweighted covariance matrix about :math:`\theta`. In order to make the estimate asymptotically unbiased under a Normal model a correction factor, :math:`\tau^2`, is needed. The value of the correction factor will depend on the functions employed (see Huber (1981) and Marazzi (1987)). ``robustm_corr_user_deriv`` finds :math:`A` using the iterative procedure as given by Huber. .. math:: A_k = \left(S_k+I\right)A_{{k-1}} and .. math:: \theta_{j_k} = \frac{{b_j}}{D_1}+\theta_{j_{{k-1}}}\text{,} where :math:`S_k = \left(s_{{jl}}\right)`, for :math:`\textit{l} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m`, is a lower triangular matrix such that: .. math:: s_{{jl}} = \left\{\begin{array}{ll}-\mathrm{min}\left[\mathrm{max}\left(h_{{jl}}/D_3,-\textit{BL}\right),\textit{BL}\right]\text{,}&j > l\\&\\-\mathrm{min}\left[\mathrm{max}\left(\left(h_{{jj}}/\left(2D_3-D_4/D_2\right)\right),-\textit{BD}\right),\textit{BD}\right]\text{,}&j = l\end{array}\right. 
\text{,} where :math:`D_1 = \sum_{{i = 1}}^n\left\{w\left(\left\lVert z_i\right\rVert_2\right)+\frac{1}{m}w^{\prime }\left(\left\lVert z_i\right\rVert_2\right)\left\lVert z_i\right\rVert_2\right\}` :math:`D_2 = \sum_{{i = 1}}^n\left\{\frac{1}{m}\left(u^{\prime }\left(\left\lVert z_i\right\rVert_2\right)\left\lVert z_i\right\rVert_2+2u\left(\left\lVert z_i\right\rVert_2\right)\right)\left\lVert z_i\right\rVert_2-v^{\prime }\left(\left\lVert z_i\right\rVert_2\right)\right\}\left\lVert z_i\right\rVert_2` :math:`D_3 = \frac{1}{{m+2}}\sum_{{i = 1}}^n\left\{\frac{1}{m}\left(u^{\prime }\left(\left\lVert z_i\right\rVert_2\right)\left\lVert z_i\right\rVert_2+2u\left(\left\lVert z_i\right\rVert_2\right)\right)+u\left(\left\lVert z_i\right\rVert_2\right)\right\}\left\lVert z_i\right\rVert_2^2` :math:`D_4 = \sum_{{i = 1}}^n\left\{\frac{1}{m}u\left(\left\lVert z_i\right\rVert_2\right)\left\lVert z_i\right\rVert_2^2-v\left(\left\lVert z_i\right\rVert_2\right)\right\}` :math:`h_{{jl}} = \sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_{{ij}}z_{{il}}`, for :math:`j > l` :math:`h_{{jj}} = \sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)\left(z_{{ij}}^2-\left\lVert z_i\right\rVert_2^2/m\right)` :math:`b_j = \sum_{{i = 1}}^nw\left(\left\lVert z_i\right\rVert_2\right)\left(x_{{ij}}-\theta_j\right)` and :math:`\textit{BD}` and :math:`\textit{BL}` are suitable bounds. ``robustm_corr_user_deriv`` is based on routines in ROBETH; see Marazzi (1987). .. _g02hl-py2-py-references: **References** Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Weights for bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 3, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
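The shape of the ``ucv`` callback required above can be illustrated with a Huber-type pair of weight functions. This is a minimal sketch: the tuning constant ``C`` and the choice :math:`u = w^2` are illustrative assumptions only, not library defaults; see Huber (1981) and Marazzi (1987) for principled choices of weight function.

```python
# Illustrative Huber-type weight functions for the ucv callback of
# robustm_corr_user_deriv.  The tuning constant C and the relation
# u = w**2 are example choices, not NAG recommendations.
C = 1.5

def ucv(t, data=None):
    """Return (u, u', w, w') at t, in the order the callback requires."""
    if t <= C:
        w, wd = 1.0, 0.0            # w(t) = 1 in the central region
    else:
        w, wd = C / t, -C / t**2    # w(t) = C/t, so w'(t) = -C/t**2
    u = w * w                       # u = w**2, hence u' = 2*w*w'
    ud = 2.0 * w * wd
    return u, ud, w, wd
```

Both branches return non-negative ``u`` and ``w``, as required by the `errno` 4 constraints above.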
[docs]def robustm_corr_user(ucv, indm, x, a, theta, bl=0.9, bd=0.9, maxit=150, nitmon=0, tol=5e-5, data=None, io_manager=None): r""" ``robustm_corr_user`` computes a robust estimate of the covariance matrix for user-supplied weight functions. The derivatives of the weight functions are not required. .. _g02hm-py2-py-doc: For full information please refer to the NAG Library document for g02hm https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hmf.html .. _g02hm-py2-py-parameters: **Parameters** **ucv** : callable (u, w) = ucv(t, data=None) :math:`\mathrm{ucv}` must return the values of the functions :math:`u` and :math:`w` for a given value of its argument. **Parameters** **t** : float The argument for which the functions :math:`u` and :math:`w` must be evaluated. **data** : arbitrary, optional, modifiable in place User-communication data for callback functions. **Returns** **u** : float The value of the :math:`u` function at the point :math:`\mathrm{t}`. **w** : float The value of the :math:`w` function at the point :math:`\mathrm{t}`. **indm** : int Indicates which form of the function :math:`v` will be used. :math:`\mathrm{indm} = 1` :math:`v = 1`. :math:`\mathrm{indm}\neq 1` :math:`v = u`. **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **a** : float, array-like, shape :math:`\left(m\times \left(m+1\right)/2\right)` An initial estimate of the lower triangular real matrix :math:`A`. Only the lower triangular elements must be given and these should be stored row-wise in the array. The diagonal elements must be :math:`\text{}\neq 0`, and in practice will usually be :math:`\text{} > 0`. If the magnitudes of the columns of :math:`X` are of the same order, the identity matrix will often provide a suitable initial value for :math:`A`. 
If the columns of :math:`X` are of different magnitudes, the diagonal elements of the initial value of :math:`A` should be approximately inversely proportional to the magnitude of the columns of :math:`X`. **theta** : float, array-like, shape :math:`\left(m\right)` An initial estimate of the location parameter, :math:`\theta_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,m`. In many cases an initial estimate of :math:`\theta_{\textit{j}} = 0`, for :math:`\textit{j} = 1,2,\ldots,m`, will be adequate. Alternatively medians may be used as given by :meth:`univar.robust_1var_median <naginterfaces.library.univar.robust_1var_median>`. **bl** : float, optional The magnitude of the bound for the off-diagonal elements of :math:`S_k`, :math:`BL`. **bd** : float, optional The magnitude of the bound for the diagonal elements of :math:`S_k`, :math:`BD`. **maxit** : int, optional The maximum number of iterations that will be used during the calculation of :math:`A`. **nitmon** : int, optional Indicates the amount of information on the iteration that is printed. :math:`\mathrm{nitmon} > 0` The value of :math:`A`, :math:`\theta` and :math:`\delta` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hmf.html#accuracy>`__) will be printed at the first and every :math:`\mathrm{nitmon}` iterations. :math:`\mathrm{nitmon}\leq 0` No iteration monitoring is printed. **tol** : float, optional The relative precision for the final estimate of the covariance matrix. Iteration will stop when maximum :math:`\delta` (see `Accuracy <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02hmf.html#accuracy>`__) is less than :math:`\mathrm{tol}`. **data** : arbitrary, optional User-communication data for callback functions. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **cov** : float, ndarray, shape :math:`\left(m\times \left(m+1\right)/2\right)` A robust estimate of the covariance matrix, :math:`C`. 
The upper triangular part of the matrix :math:`C` is stored packed by columns (lower triangular stored by rows), that is :math:`C_{{ij}}` is returned in :math:`\mathrm{cov}[j\times \left(j-1\right)/2+i-1]`, :math:`i\leq j`. **a** : float, ndarray, shape :math:`\left(m\times \left(m+1\right)/2\right)` The lower triangular elements of the inverse of the matrix :math:`A`, stored row-wise. **wt** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{wt}[\textit{i}-1]` contains the weights, :math:`\textit{wt}_{\textit{i}} = u\left(\left\lVert z_{\textit{i}}\right\rVert_2\right)`, for :math:`\textit{i} = 1,2,\ldots,n`. **theta** : float, ndarray, shape :math:`\left(m\right)` Contains the robust estimate of the location parameter, :math:`\theta_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,m`. **nit** : int The number of iterations performed. .. _g02hm-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq m`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{bd} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{bd} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{bl} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{bl} > 0.0`. (`errno` :math:`2`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} > 0`. (`errno` :math:`2`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. 
(`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and the :math:`i`\ th diagonal element of :math:`A` is :math:`0`. Constraint: all diagonal elements of :math:`A` must be non-zero. (`errno` :math:`3`) On entry, a variable has a constant value, i.e., all elements in column :math:`\langle\mathit{\boldsymbol{value}}\rangle` of :math:`\mathrm{x}` are identical. (`errno` :math:`4`) :math:`w` value returned by :math:`\mathrm{ucv} < 0.0`: :math:`w\left(\langle\mathit{\boldsymbol{value}}\rangle\right) = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{w}\geq 0.0`. (`errno` :math:`4`) :math:`\textit{u}` value returned by :math:`\mathrm{ucv} < 0.0`: :math:`u\left(\langle\mathit{\boldsymbol{value}}\rangle\right) = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{u}\geq 0.0`. (`errno` :math:`5`) Iterations to calculate weights failed to converge. **Warns** **NagAlgorithmicWarning** (`errno` :math:`6`) The sum :math:`D_2` is zero. Try either a larger initial estimate of :math:`A` or make :math:`u` and :math:`w` less strict. (`errno` :math:`6`) The sum :math:`D_1` is zero. Try either a larger initial estimate of :math:`A` or make :math:`u` and :math:`w` less strict. .. _g02hm-py2-py-notes: **Notes** For a set of :math:`n` observations on :math:`m` variables in a matrix :math:`X`, a robust estimate of the covariance matrix, :math:`C`, and a robust estimate of location, :math:`\theta`, are given by .. math:: C = \tau^2\left(A^\mathrm{T}A\right)^{-1}\text{,} where :math:`\tau^2` is a correction factor and :math:`A` is a lower triangular matrix found as the solution to the following equations. .. math:: z_i = A\left(x_i-\theta \right) .. math:: \frac{1}{n}\sum_{{i = 1}}^nw\left(\left\lVert z_i\right\rVert_2\right)z_i = 0 and .. math:: \frac{1}{n}\sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_iz_i^\mathrm{T}-v\left(\left\lVert z_i\right\rVert_2\right)I = 0\text{,} .. 
rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------------------------------+ |where|:math:`x_i` is a vector of length :math:`m` containing the elements of the :math:`i`\ th row of :math:`X`,| +-----+----------------------------------------------------------------------------------------------------------+ | |:math:`z_i` is a vector of length :math:`m`, | +-----+----------------------------------------------------------------------------------------------------------+ | |:math:`I` is the identity matrix and :math:`0` is the zero matrix. | +-----+----------------------------------------------------------------------------------------------------------+ |and |:math:`w` and :math:`u` are suitable functions. | +-----+----------------------------------------------------------------------------------------------------------+ ``robustm_corr_user`` covers two situations: (i) :math:`v\left(t\right) = 1` for all :math:`t`, (#) :math:`v\left(t\right) = u\left(t\right)`. The robust covariance matrix may be calculated from a weighted sum of squares and cross-products matrix about :math:`\theta` using weights :math:`\textit{wt}_i = u\left(\left\lVert z_i\right\rVert \right)`. In case \(1) a divisor of :math:`n` is used and in case \(2) a divisor of :math:`\sum_{{i = 1}}^n\textit{wt}_i` is used. If :math:`w\left(.\right) = \sqrt{u\left(.\right)}`, then the robust covariance matrix can be calculated by scaling each row of :math:`X` by :math:`\sqrt{\textit{wt}_i}` and calculating an unweighted covariance matrix about :math:`\theta`. In order to make the estimate asymptotically unbiased under a Normal model a correction factor, :math:`\tau^2`, is needed. The value of the correction factor will depend on the functions employed (see Huber (1981) and Marazzi (1987)). ``robustm_corr_user`` finds :math:`A` using the iterative procedure as given by Huber; see Huber (1981). .. 
math:: A_k = \left(S_k+I\right)A_{{k-1}} and .. math:: \theta_{j_k} = \frac{{b_j}}{D_1}+\theta_{j_{{k-1}}}\text{,} where :math:`S_k = \left(s_{{jl}}\right)`, for :math:`\textit{l} = 1,2,\ldots,m`, for :math:`\textit{j} = 1,2,\ldots,m` is a lower triangular matrix such that .. math:: s_{{jl}} = \left\{\begin{array}{ll}-\mathrm{min}\left[\mathrm{max}\left(h_{{jl}}/D_2,-\textit{BL}\right),\textit{BL}\right]\text{,}&j > l\\&\\-\mathrm{min}\left[\mathrm{max}\left(\frac{1}{2}\left(h_{{jj}}/D_2-1\right),-\textit{BD}\right),\textit{BD}\right]\text{,}&j = l\end{array}\right. \text{,} where :math:`D_1 = \sum_{{i = 1}}^nw\left(\left\lVert z_i\right\rVert_2\right)` :math:`D_2 = \sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)` :math:`h_{{jl}} = \sum_{{i = 1}}^nu\left(\left\lVert z_i\right\rVert_2\right)z_{{ij}}z_{{il}}`, for :math:`j\geq l` :math:`b_j = \sum_{{i = 1}}^nw\left(\left\lVert z_i\right\rVert_2\right)\left(x_{{ij}}-\theta_j\right)` and :math:`\textit{BD}` and :math:`\textit{BL}` are suitable bounds. The value of :math:`\tau` may be chosen so that :math:`C` is unbiased if the observations are from a given distribution. ``robustm_corr_user`` is based on routines in ROBETH; see Marazzi (1987). .. _g02hm-py2-py-references: **References** Huber, P J, 1981, `Robust Statistics`, Wiley Marazzi, A, 1987, `Weights for bounded influence regression in ROBETH`, Cah. Rech. Doc. IUMSP, No. 3 ROB 3, Institut Universitaire de Médecine Sociale et Préventive, Lausanne """ raise NotImplementedError
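The Huber iteration described in the Notes can be sketched in pure Python for the simplest univariate case (:math:`m = 1`, :math:`v = u`). This is an illustrative reimplementation of the update equations, not a call to the NAG routine: the function name is hypothetical, the convergence measure is a simplification of the :math:`\delta` described in the Accuracy section, and the full routine additionally handles the multivariate case with packed triangular storage and the correction factor :math:`\tau^2`.

```python
# A pure-Python sketch of the univariate case of Huber's iteration
# from the Notes above.  u and w are the user-supplied weight
# functions of the norm of z_i = a*(x_i - theta).
def huber_scatter_1d(x, u, w, a=1.0, theta=0.0,
                     bl=0.9, bd=0.9, tol=5e-5, maxit=150):
    """Return (theta, a, nit): location, scatter factor, iterations."""
    n = len(x)
    for nit in range(1, maxit + 1):
        z = [a * (xi - theta) for xi in x]
        norms = [abs(zi) for zi in z]
        # Location step: theta_k = b/D1 + theta_{k-1}
        d1 = sum(w(t) for t in norms)
        b = sum(w(t) * (xi - theta) for t, xi in zip(norms, x))
        theta += b / d1
        # Scatter step: A_k = (s + 1) A_{k-1}, with the diagonal
        # element s bounded by +/- BD as in the Notes.
        d2 = sum(u(t) for t in norms)
        h = sum(u(t) * zi * zi for t, zi in zip(norms, z))
        s = -min(max(0.5 * (h / d2 - 1.0), -bd), bd)
        a_new = (s + 1.0) * a
        delta = abs(a_new - a) / abs(a)   # crude convergence measure
        a = a_new
        if delta < tol and abs(b / d1) < tol:
            break
    return theta, a, nit
```

With the trivial weights :math:`u = w = 1` the fixed point reduces to the sample mean and :math:`1/a^2` to the variance about it (divisor :math:`n`), which is a useful sanity check on the update equations.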
[docs]def mixeff_reml(dat, levels, yvid, cwid, fvid, fint, rvid, nvpr, vpr, rint, svid, gamma, lb, maxit=-1, tol=0.0, io_manager=None): r""" ``mixeff_reml`` fits a linear mixed effects regression model using restricted maximum likelihood (REML). .. deprecated:: 27.0.0.0 ``mixeff_reml`` is deprecated. Please use :meth:`lmm_init` followed by :meth:`lmm_fit` instead. See also the :ref:`Replacement Calls <replace>` document. .. _g02ja-py2-py-doc: For full information please refer to the NAG Library document for g02ja https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jaf.html .. _g02ja-py2-py-parameters: **Parameters** **dat** : float, array-like, shape :math:`\left(n, \textit{ncol}\right)` Array containing all of the data. For the :math:`i`\ th observation: :math:`\mathrm{dat}[i-1,\mathrm{yvid}-1]` holds the dependent variable, :math:`y`; if :math:`\mathrm{cwid}\neq 0`, :math:`\mathrm{dat}[i-1,\mathrm{cwid}-1]` holds the case weights; if :math:`\mathrm{svid}\neq 0`, :math:`\mathrm{dat}[i-1,\mathrm{svid}-1]` holds the subject variable. The remaining columns hold the values of the independent variables. **levels** : int, array-like, shape :math:`\left(\textit{ncol}\right)` :math:`\mathrm{levels}[i-1]` contains the number of levels associated with the :math:`i`\ th variable of the data matrix :math:`\mathrm{dat}`. If this variable is continuous or binary (i.e., only takes the values zero or one) then :math:`\mathrm{levels}[i-1]` should be :math:`1`; if the variable is discrete then :math:`\mathrm{levels}[i-1]` is the number of levels associated with it and :math:`\mathrm{dat}[\textit{j}-1,i-1]` is assumed to take the values :math:`1` to :math:`\mathrm{levels}[i-1]`, for :math:`\textit{j} = 1,2,\ldots,n`. **yvid** : int The column of :math:`\mathrm{dat}` holding the dependent, :math:`y`, variable. **cwid** : int The column of :math:`\mathrm{dat}` holding the case weights. If :math:`\mathrm{cwid} = 0`, no weights are used. 
**fvid** : int, array-like, shape :math:`\left(\textit{nfv}\right)` The columns of the data matrix :math:`\mathrm{dat}` holding the fixed independent variables with :math:`\mathrm{fvid}[i-1]` holding the column number corresponding to the :math:`i`\ th fixed variable. **fint** : int Flag indicating whether a fixed intercept is included (:math:`\mathrm{fint} = 1`). **rvid** : int, array-like, shape :math:`\left(\textit{nrv}\right)` The columns of the data matrix :math:`\mathrm{dat}` holding the random independent variables with :math:`\mathrm{rvid}[i-1]` holding the column number corresponding to the :math:`i`\ th random variable. **nvpr** : int If :math:`\mathrm{rint} = 1` and :math:`\mathrm{svid}\neq 0`, :math:`\mathrm{nvpr}` is two less than the number of variance components being estimated (i.e., :math:`\mathrm{nvpr} = g-1`), else :math:`\mathrm{nvpr} = g`. If :math:`\textit{nrv} = 0`, :math:`\mathrm{nvpr}` is not referenced. **vpr** : int, array-like, shape :math:`\left(\textit{nrv}\right)` :math:`\mathrm{vpr}[i-1]` holds a flag indicating the variance of the :math:`i`\ th random variable. The variance of the :math:`i`\ th random variable is :math:`\sigma_j^2`, where :math:`j = \mathrm{vpr}[i-1]+1` if :math:`\mathrm{rint} = 1` and :math:`\mathrm{svid} \neq 0` and :math:`j = \mathrm{vpr}[i-1]` otherwise. Random variables with the same value of :math:`j` are assumed to be taken from the same distribution. **rint** : int Flag indicating whether a random intercept is included (:math:`\mathrm{rint} = 1`). If :math:`\mathrm{svid} = 0`, :math:`\mathrm{rint}` is not referenced. **svid** : int The column of :math:`\mathrm{dat}` holding the subject variable. If :math:`\mathrm{svid} = 0`, no subject variable is used. Specifying a subject variable is equivalent to specifying the interaction between that variable and all of the random effects. 
Letting the notation :math:`Z_1\times Z_S` denote the interaction between variables :math:`Z_1` and :math:`Z_S`, fitting a model with :math:`\mathrm{rint} = 0`, random-effects :math:`Z_1+Z_2` and subject variable :math:`Z_S` is equivalent to fitting a model with random-effects :math:`Z_1\times Z_S+Z_2\times Z_S` and no subject variable. If :math:`\mathrm{rint} = 1` the model is equivalent to fitting :math:`Z_S+Z_1\times Z_S+Z_2\times Z_S` and no subject variable. **gamma** : float, array-like, shape :math:`\left(\mathrm{nvpr}+2\right)` Holds the initial values of the variance components, :math:`\gamma_0`, with :math:`\mathrm{gamma}[\textit{i}-1]` the initial value for :math:`\sigma_{\textit{i}}^2/\sigma_R^2`, for :math:`\textit{i} = 1,2,\ldots,g`. If :math:`\mathrm{rint} = 1` and :math:`\mathrm{svid}\neq 0`, :math:`g = \mathrm{nvpr}+1`, else :math:`g = \mathrm{nvpr}`. If :math:`\mathrm{gamma}[0] = {-1.0}`, the remaining elements of :math:`\mathrm{gamma}` are ignored and the initial values for the variance components are estimated from the data using MIVQUE0. **lb** : int The size of the array :math:`\mathrm{b}`. **maxit** : int, optional The maximum number of iterations. If :math:`\mathrm{maxit} < 0`, the default value of :math:`100` is used. If :math:`\mathrm{maxit} = 0`, the parameter estimates :math:`\left(\beta, \nu \right)` and corresponding standard errors are calculated based on the value of :math:`\gamma_0` supplied in :math:`\mathrm{gamma}`. **tol** : float, optional The tolerance used to assess convergence. If :math:`\mathrm{tol}\leq 0.0`, the default value of :math:`\epsilon^{0.7}` is used, where :math:`\epsilon` is the machine precision. **io_manager** : FileObjManager, optional Manager for I/O in this routine. 
**Returns** **gamma** : float, ndarray, shape :math:`\left(\mathrm{nvpr}+2\right)` :math:`\mathrm{gamma}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,g`, holds the final estimate of :math:`\sigma_{\textit{i}}^2` and :math:`\mathrm{gamma}[g]` holds the final estimate for :math:`\sigma_R^2`. **nff** : int The number of fixed effects estimated (i.e., the number of columns, :math:`p`, in the design matrix :math:`X`). **nrf** : int The number of random effects estimated (i.e., the number of columns, :math:`q`, in the design matrix :math:`Z`). **df** : int The degrees of freedom. **reml** : float :math:`-2l_R\left(\hat{\gamma }\right)` where :math:`l_R` is the log of the restricted maximum likelihood calculated at :math:`\hat{\gamma }`, the estimated variance components returned in :math:`\mathrm{gamma}`. **b** : float, ndarray, shape :math:`\left(\mathrm{lb}\right)` The parameter estimates, :math:`\left(\beta, \nu \right)`, with the first :math:`\mathrm{nff}` elements of :math:`\mathrm{b}` containing the fixed effect parameter estimates, :math:`\beta` and the next :math:`\mathrm{nrf}` elements of :math:`\mathrm{b}` containing the random effect parameter estimates, :math:`\nu`. **Fixed effects** If :math:`\mathrm{fint} = 1`, :math:`\mathrm{b}[0]` contains the estimate of the fixed intercept. Let :math:`L_i` denote the number of levels associated with the :math:`i`\ th fixed variable, that is :math:`L_i = \mathrm{levels}[ \mathrm{fvid}[i-1] -1]`. Define if :math:`\mathrm{fint} = 1`, :math:`F_1 = 2` else if :math:`\mathrm{fint} = 0`, :math:`F_1 = 1`; :math:`F_{{i+1}} = F_i+\mathrm{max}\left({L_i-1}, 1\right)`, :math:`i\geq 1`. 
Then for :math:`i = 1,2,\ldots,\textit{nfv}`: if :math:`L_i > 1`, :math:`\mathrm{b}[ F_i+\textit{j}-2 -1]` contains the parameter estimate for the :math:`\textit{j}`\ th level of the :math:`i`\ th fixed variable, for :math:`\textit{j} = 2,3,\ldots,L_i`; if :math:`L_i\leq 1`, :math:`\mathrm{b}[F_i-1]` contains the parameter estimate for the :math:`i`\ th fixed variable. **Random effects** Redefining :math:`L_i` to denote the number of levels associated with the :math:`i`\ th random variable, that is :math:`L_i = \mathrm{levels}[ \mathrm{rvid}[i-1] -1]`. Define if :math:`\mathrm{rint} = 1`, :math:`R_1 = 2` else if :math:`\mathrm{rint} = 0`, :math:`R_1 = 1`; :math:`R_{{i+1}} = R_i+L_i`, :math:`i\geq 1`. Then for :math:`i = 1,2,\ldots,\textit{nrv}`: if :math:`\mathrm{svid} = 0`, if :math:`L_i > 1`, :math:`\mathrm{b}[ \mathrm{nff} + R_i +\textit{j}-1 -1]` contains the parameter estimate for the :math:`\textit{j}`\ th level of the :math:`i`\ th random variable, for :math:`\textit{j} = 1,2,\ldots,L_i`; if :math:`L_i\leq 1`, :math:`\mathrm{b}[ \mathrm{nff} + R_i -1]` contains the parameter estimate for the :math:`i`\ th random variable; if :math:`\mathrm{svid}\neq 0`, let :math:`L_S` denote the number of levels associated with the subject variable, that is :math:`L_S = \mathrm{levels}[ \mathrm{svid} -1]`; if :math:`L_i > 1`, :math:`\mathrm{b}[ \mathrm{nff} + \left(\textit{s}-1\right) L_S + R_i + \textit{j} - 1 -1]` contains the parameter estimate for the interaction between the :math:`\textit{s}`\ th level of the subject variable and the :math:`\textit{j}`\ th level of the :math:`i`\ th random variable, for :math:`\textit{j} = 1,2,\ldots,L_i`, for :math:`\textit{s} = 1,2,\ldots,L_S`; if :math:`L_i\leq 1`, :math:`\mathrm{b}[ \mathrm{nff} + \left(\textit{s}-1\right) L_S + R_i -1]` contains the parameter estimate for the interaction between the :math:`\textit{s}`\ th level of the subject variable and the :math:`i`\ th random variable, for :math:`\textit{s} = 1,2,\ldots,L_S`; 
if :math:`\mathrm{rint} = 1`, :math:`\mathrm{b}[\mathrm{nff}]` contains the estimate of the random intercept. **se** : float, ndarray, shape :math:`\left(\mathrm{lb}\right)` The standard errors of the parameter estimates given in :math:`\mathrm{b}`. **warn** : int Is set to :math:`1` if a variance component was estimated to be a negative value during the fitting process. Otherwise :math:`\mathrm{warn}` is set to :math:`0`. If :math:`\mathrm{warn} = 1`, the negative estimate is set to zero and the estimation process allowed to continue. .. _g02ja-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: number of observations with nonzero weights must be greater than one. (`errno` :math:`1`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ncol} \geq 1`. (`errno` :math:`1`) On entry, :math:`\textit{nfv} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \textit{nfv} < \textit{ncol}`. (`errno` :math:`1`) On entry, :math:`\textit{nrv} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \textit{nrv} < \textit{ncol}` and :math:`\textit{nrv}+\mathrm{rint} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nrv} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{nvpr}\leq \textit{nrv}` and (:math:`\textit{nrv} \neq 0` or :math:`\mathrm{nvpr} \geq 1`). (`errno` :math:`1`) On entry, :math:`\mathrm{yvid} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`1\leq \mathrm{yvid}\leq \textit{ncol}`. (`errno` :math:`1`) On entry, :math:`\mathrm{svid} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{svid}\leq \textit{ncol}`. (`errno` :math:`1`) On entry, :math:`\mathrm{cwid} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{cwid}\leq \textit{ncol}` and any supplied weights must be :math:`\text{}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{fint} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{fint} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{rint} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rint} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{lb}` too small: :math:`\mathrm{lb} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`2`) On entry, :math:`\mathrm{levels}[\textit{i}] < 1`, for at least one :math:`\textit{i}`. (`errno` :math:`2`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{fvid}[i]\leq \textit{ncol}`, for all :math:`i`. (`errno` :math:`2`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{rvid}[i]\leq \textit{ncol}`, for all :math:`i`. (`errno` :math:`2`) On entry, :math:`\mathrm{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{vpr}[i]\leq \mathrm{nvpr}`, for all :math:`i`. (`errno` :math:`2`) On entry, invalid data: categorical variable with value greater than that specified in :math:`\mathrm{levels}`. (`errno` :math:`2`) On entry, :math:`\mathrm{gamma}[i] < 0.0`, for at least one :math:`i`. (`errno` :math:`3`) Degrees of freedom :math:`< 1`: :math:`\mathrm{df} = \langle\mathit{\boldsymbol{value}}\rangle`. 
(`errno` :math:`4`) Routine failed to converge to specified tolerance: :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`4`) Routine failed to converge in :math:`\mathrm{maxit}` iterations: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02ja-py2-py-notes: **Notes** ``mixeff_reml`` fits a model of the form: .. math:: y = X\beta +Z\nu +\epsilon where :math:`y` is a vector of :math:`n` observations on the dependent variable, :math:`X` is a known :math:`n\times p` design matrix for the fixed independent variables, :math:`\beta` is a vector of length :math:`p` of unknown `fixed effects`, :math:`Z` is a known :math:`n\times q` design matrix for the random independent variables, :math:`\nu` is a vector of length :math:`q` of unknown `random effects`, and :math:`\epsilon` is a vector of length :math:`n` of unknown random errors. Both :math:`\nu` and :math:`\epsilon` are assumed to have a Gaussian distribution with expectation zero and .. math:: \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right] where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. It is assumed that the random variables, :math:`Z`, can be subdivided into :math:`g\leq q` groups with each group being identically distributed with expectations zero and variance :math:`\sigma_i^2`. The diagonal elements of matrix :math:`G`, therefore, take one of the values :math:`\left\{\sigma_i^2:i = 1,2,\ldots,g\right\}`, depending on which group the associated random variable belongs to. The model, therefore, contains three sets of unknowns, the fixed effects, :math:`\beta`, the random effects :math:`\nu` and a vector of :math:`g+1` variance components, :math:`\gamma`, where :math:`\gamma = \left\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{{g-1}}^2, \sigma_g^2, \sigma_R^2\right\}`. 
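The variance structure above can be made concrete with a toy example. The sketch below (pure Python, hypothetical sizes and values, not part of the library) builds the diagonal matrix :math:`G` from a group mapping and then forms :math:`V = ZGZ^{\prime}+\sigma_R^2 I` explicitly, for :math:`n = 2` observations, :math:`q = 3` random effects and :math:`g = 2` groups:

```python
# Toy illustration of V = Z G Z' + sigma_R^2 I: random effects 1-2
# share variance sigma_1^2, random effect 3 has variance sigma_2^2.
sigma = {1: 0.5, 2: 2.0}   # group variances sigma_1^2, sigma_2^2
group = [1, 1, 2]          # group membership of each random effect
sigma_r2 = 1.0             # error variance sigma_R^2

Z = [[1.0, 0.0, 1.0],      # n x q design matrix for the random effects
     [0.0, 1.0, 1.0]]

# G is diagonal: G[j][j] = sigma_i^2 for the group i that effect j belongs to
G = [[sigma[group[j]] if j == k else 0.0 for k in range(3)]
     for j in range(3)]

# V[i][k] = sum_j Z[i][j] * G[j][j] * Z[k][j], plus sigma_R^2 on the diagonal
V = [[sum(Z[i][j] * G[j][j] * Z[k][j] for j in range(3))
      + (sigma_r2 if i == k else 0.0) for k in range(2)]
     for i in range(2)]
```

With these invented values, both observations load on random effect 3, so :math:`V` picks up an off-diagonal covariance of :math:`\sigma_2^2 = 2.0`.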
Rather than working directly with :math:`\gamma`, ``mixeff_reml`` uses an iterative process to estimate :math:`\gamma^* = \left\{{\sigma_1^2/\sigma_R^2}, {\sigma_2^2/\sigma_R^2}, \ldots, {\sigma_{{g-1}}^2/\sigma_R^2}, {\sigma_g^2/\sigma_R^2}, 1\right\}`. Due to the iterative nature of the estimation a set of initial values, :math:`\gamma_0`, for :math:`\gamma^*` is required. ``mixeff_reml`` allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972). ``mixeff_reml`` fits the model using a quasi-Newton algorithm to maximize the restricted log-likelihood function: .. math:: -2l_R = \log\left(\left\lvert V\right\rvert \right)+\left(n-p\right)\log\left(r^{\prime }V^{-1}r\right)+\log\left(\left\lvert X^{\prime }V^{-1}X\right\rvert \right)+\left(n-p\right)\left(1+\log\left(2\pi /\left(n-p\right)\right)\right) where .. math:: V = ZGZ^{\prime }+R\text{, }\quad r = y-Xb\quad \text{ and }\quad b = \left(X^{\prime }V^{-1}X\right)^{-1}X^{\prime }V^{-1}y\text{.} Once the final estimates for :math:`\gamma^*` have been obtained, the value of :math:`\sigma_R^2` is given by: .. math:: \sigma_R^2 = \left(r^{\prime }V^{-1}r\right)/\left(n-p\right)\text{.} Case weights, :math:`W_c`, can be incorporated into the model by replacing :math:`X^{\prime }X` and :math:`Z^{\prime }Z` with :math:`X^{\prime }W_cX` and :math:`Z^{\prime }W_cZ` respectively, for a diagonal weight matrix :math:`W_c`. The log-likelihood, :math:`l_R`, is calculated using the sweep algorithm detailed in Wolfinger `et al.` (1994). .. _g02ja-py2-py-references: **References** Goodnight, J H, 1979, `A tutorial on the SWEEP operator`, The American Statistician (33(3)), 149--158 Harville, D A, 1977, `Maximum likelihood approaches to variance component estimation and to related problems`, JASA (72), 320--340 Rao, C R, 1972, `Estimation of variance and covariance components in a linear model`, J. Am. Stat. Assoc. 
(67), 112--115 Stroup, W W, 1989, `Predictable functions and prediction space in the mixed model procedure`, Applications of Mixed Models in Agriculture and Related Disciplines (Southern Cooperative Series Bulletin No. 343), 39--48 Wolfinger, R, Tobias, R and Sall, J, 1994, `Computing Gaussian likelihoods and their derivatives for general linear mixed models`, SIAM Sci. Statist. Comput. (15), 1294--1310 """ raise NotImplementedError
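The restricted log-likelihood in the Notes above can be evaluated directly for a small problem. The sketch below (pure Python, invented toy data, not part of the library) handles only the special case of a single fixed variable and a single 0/1 random-effect column in which each random-effect level is hit by at most one observation, so that :math:`V` is diagonal; it profiles out :math:`\sigma_R^2` exactly as in the formulas above:

```python
import math

def reml_neg2ll(y, x, z, g):
    """-2 * restricted log-likelihood for the toy model
    y = x*beta + z*nu + eps, in the special case where V = g*zz' + I
    is diagonal (g = sigma_1^2 / sigma_R^2, working on the gamma* scale)."""
    n, p = len(y), 1
    v = [g * zi * zi + 1.0 for zi in z]                 # diag(V)
    xtvx = sum(xi * xi / vi for xi, vi in zip(x, v))    # X'V^-1 X
    xtvy = sum(xi * yi / vi for xi, yi, vi in zip(x, y, v))
    b = xtvy / xtvx                                     # GLS estimate of beta
    r = [yi - xi * b for xi, yi in zip(x, y)]           # residuals r = y - Xb
    rtvr = sum(ri * ri / vi for ri, vi in zip(r, v))    # r'V^-1 r
    neg2lr = (sum(math.log(vi) for vi in v)             # log|V|
              + (n - p) * math.log(rtvr)
              + math.log(xtvx)                          # log|X'V^-1 X|
              + (n - p) * (1.0 + math.log(2.0 * math.pi / (n - p))))
    sigma_r2 = rtvr / (n - p)                           # profiled error variance
    return neg2lr, sigma_r2

neg2lr, s2 = reml_neg2ll([1.0, 2.0], [1.0, 1.0], [1.0, 0.0], 0.5)
```

``mixeff_reml`` itself minimizes this quantity over :math:`\gamma^*` with a quasi-Newton algorithm; the sketch only shows how one evaluation of the objective fits together.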
[docs]def mixeff_ml(dat, levels, yvid, cwid, fvid, fint, rvid, nvpr, vpr, rint, svid, gamma, lb, maxit=-1, tol=0.0, io_manager=None): r""" ``mixeff_ml`` fits a linear mixed effects regression model using maximum likelihood (ML). .. deprecated:: 27.0.0.0 ``mixeff_ml`` is deprecated. Please use :meth:`lmm_init` followed by :meth:`lmm_fit` instead. See also the :ref:`Replacement Calls <replace>` document. .. _g02jb-py2-py-doc: For full information please refer to the NAG Library document for g02jb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jbf.html .. _g02jb-py2-py-parameters: **Parameters** **dat** : float, array-like, shape :math:`\left(n, \textit{ncol}\right)` Array containing all of the data. For the :math:`i`\ th observation: :math:`\mathrm{dat}[i-1,\mathrm{yvid}-1]` holds the dependent variable, :math:`y`; if :math:`\mathrm{cwid}\neq 0`, :math:`\mathrm{dat}[i-1,\mathrm{cwid}-1]` holds the case weights; if :math:`\mathrm{svid}\neq 0`, :math:`\mathrm{dat}[i-1,\mathrm{svid}-1]` holds the subject variable. The remaining columns hold the values of the independent variables. **levels** : int, array-like, shape :math:`\left(\textit{ncol}\right)` :math:`\mathrm{levels}[i-1]` contains the number of levels associated with the :math:`i`\ th variable of the data matrix :math:`\mathrm{dat}`. If this variable is continuous or binary (i.e., only takes the values zero or one) then :math:`\mathrm{levels}[i-1]` should be :math:`1`; if the variable is discrete then :math:`\mathrm{levels}[i-1]` is the number of levels associated with it and :math:`\mathrm{dat}[\textit{j}-1,i-1]` is assumed to take the values :math:`1` to :math:`\mathrm{levels}[i-1]`, for :math:`\textit{j} = 1,2,\ldots,n`. **yvid** : int The column of :math:`\mathrm{dat}` holding the dependent, :math:`y`, variable. **cwid** : int The column of :math:`\mathrm{dat}` holding the case weights. If :math:`\mathrm{cwid} = 0`, no weights are used. 
**fvid** : int, array-like, shape :math:`\left(\textit{nfv}\right)` The columns of the data matrix :math:`\mathrm{dat}` holding the fixed independent variables with :math:`\mathrm{fvid}[i-1]` holding the column number corresponding to the :math:`i`\ th fixed variable. **fint** : int Flag indicating whether a fixed intercept is included (:math:`\mathrm{fint} = 1`). **rvid** : int, array-like, shape :math:`\left(\textit{nrv}\right)` The columns of the data matrix :math:`\mathrm{dat}` holding the random independent variables with :math:`\mathrm{rvid}[i-1]` holding the column number corresponding to the :math:`i`\ th random variable. **nvpr** : int If :math:`\mathrm{rint} = 1` and :math:`\mathrm{svid}\neq 0`, :math:`\mathrm{nvpr}` is the number of variance components being :math:`\text{estimated}-2`, (:math:`g-1`), else :math:`\mathrm{nvpr} = g`. If :math:`\textit{nrv} = 0`, :math:`\mathrm{nvpr}` is not referenced. **vpr** : int, array-like, shape :math:`\left(\textit{nrv}\right)` :math:`\mathrm{vpr}[i-1]` holds a flag indicating the variance of the :math:`i`\ th random variable. The variance of the :math:`i`\ th random variable is :math:`\sigma_j^2`, where :math:`j = \mathrm{vpr}[i-1]+1` if :math:`\mathrm{rint} = 1` and :math:`\mathrm{svid} \neq 0` and :math:`j = \mathrm{vpr}[i-1]` otherwise. Random variables with the same value of :math:`j` are assumed to be taken from the same distribution. **rint** : int Flag indicating whether a random intercept is included (:math:`\mathrm{rint} = 1`). If :math:`\mathrm{svid} = 0`, :math:`\mathrm{rint}` is not referenced. **svid** : int The column of :math:`\mathrm{dat}` holding the subject variable. If :math:`\mathrm{svid} = 0`, no subject variable is used. Specifying a subject variable is equivalent to specifying the interaction between that variable and all of the random-effects. 
Letting the notation :math:`Z_1\times Z_S` denote the interaction between variables :math:`Z_1` and :math:`Z_S`, fitting a model with :math:`\mathrm{rint} = 0`, random-effects :math:`Z_1+Z_2` and subject variable :math:`Z_S` is equivalent to fitting a model with random-effects :math:`Z_1\times Z_S+Z_2\times Z_S` and no subject variable. If :math:`\mathrm{rint} = 1` the model is equivalent to fitting :math:`Z_S+Z_1\times Z_S+Z_2\times Z_S` and no subject variable. **gamma** : float, array-like, shape :math:`\left(\mathrm{nvpr}+2\right)` Holds the initial values of the variance components, :math:`\gamma_0`, with :math:`\mathrm{gamma}[\textit{i}-1]` the initial value for :math:`\sigma_{\textit{i}}^2/\sigma_R^2`, for :math:`\textit{i} = 1,2,\ldots,g`. If :math:`\mathrm{rint} = 1` and :math:`\mathrm{svid}\neq 0`, :math:`g = \mathrm{nvpr}+1`, else :math:`g = \mathrm{nvpr}`. If :math:`\mathrm{gamma}[0] = {-1.0}`, the remaining elements of :math:`\mathrm{gamma}` are ignored and the initial values for the variance components are estimated from the data using MIVQUE0. **lb** : int The size of the array :math:`\mathrm{b}`. **maxit** : int, optional The maximum number of iterations. If :math:`\mathrm{maxit} < 0`, the default value of :math:`100` is used. If :math:`\mathrm{maxit} = 0`, the parameter estimates :math:`\left(\beta, \nu \right)` and corresponding standard errors are calculated based on the value of :math:`\gamma_0` supplied in :math:`\mathrm{gamma}`. **tol** : float, optional The tolerance used to assess convergence. If :math:`\mathrm{tol}\leq 0.0`, the default value of :math:`\epsilon^{0.7}` is used, where :math:`\epsilon` is the machine precision. **io_manager** : FileObjManager, optional Manager for I/O in this routine. 
**Returns** **gamma** : float, ndarray, shape :math:`\left(\mathrm{nvpr}+2\right)` :math:`\mathrm{gamma}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,g`, holds the final estimate of :math:`\sigma_{\textit{i}}^2` and :math:`\mathrm{gamma}[g]` holds the final estimate for :math:`\sigma_R^2`. **nff** : int The number of fixed effects estimated (i.e., the number of columns, :math:`p`, in the design matrix :math:`X`). **nrf** : int The number of random effects estimated (i.e., the number of columns, :math:`q`, in the design matrix :math:`Z`). **df** : int The degrees of freedom. **ml** : float :math:`-2l_R\left(\hat{\gamma }\right)` where :math:`l_R` is the log of the maximum likelihood calculated at :math:`\hat{\gamma }`, the estimated variance components returned in :math:`\mathrm{gamma}`. **b** : float, ndarray, shape :math:`\left(\mathrm{lb}\right)` The parameter estimates, :math:`\left(\beta, \nu \right)`, with the first :math:`\mathrm{nff}` elements of :math:`\mathrm{b}` containing the fixed effect parameter estimates, :math:`\beta` and the next :math:`\mathrm{nrf}` elements of :math:`\mathrm{b}` containing the random effect parameter estimates, :math:`\nu`. **Fixed effects** If :math:`\mathrm{fint} = 1`, :math:`\mathrm{b}[0]` contains the estimate of the fixed intercept. Let :math:`L_i` denote the number of levels associated with the :math:`i`\ th fixed variable, that is :math:`L_i = \mathrm{levels}[ \mathrm{fvid}[i-1] -1]`. Define if :math:`\mathrm{fint} = 1`, :math:`F_1 = 2` else if :math:`\mathrm{fint} = 0`, :math:`F_1 = 1`; :math:`F_{{i+1}} = F_i+\mathrm{max}\left({L_i-1}, 1\right)`, :math:`i\geq 1`. 
Then for :math:`i = 1,2,\ldots,\textit{nfv}`: if :math:`L_i > 1`, :math:`\mathrm{b}[ F_i+\textit{j}-2 -1]` contains the parameter estimate for the :math:`\textit{j}`\ th level of the :math:`i`\ th fixed variable, for :math:`\textit{j} = 2,3,\ldots,L_i`; if :math:`L_i\leq 1`, :math:`\mathrm{b}[F_i-1]` contains the parameter estimate for the :math:`i`\ th fixed variable. **Random effects** Redefining :math:`L_i` to denote the number of levels associated with the :math:`i`\ th random variable, that is :math:`L_i = \mathrm{levels}[ \mathrm{rvid}[i-1] -1]`. Define if :math:`\mathrm{rint} = 1`, :math:`R_1 = 2` else if :math:`\mathrm{rint} = 0`, :math:`R_1 = 1`; :math:`R_{{i+1}} = R_i+L_i`, :math:`i\geq 1`. Then for :math:`i = 1,2,\ldots,\textit{nrv}`: if :math:`\mathrm{svid} = 0`, if :math:`L_i > 1`, :math:`\mathrm{b}[ \mathrm{nff} + R_i +\textit{j}-1 -1]` contains the parameter estimate for the :math:`\textit{j}`\ th level of the :math:`i`\ th random variable, for :math:`\textit{j} = 1,2,\ldots,L_i`; if :math:`L_i\leq 1`, :math:`\mathrm{b}[ \mathrm{nff} + R_i -1]` contains the parameter estimate for the :math:`i`\ th random variable; if :math:`\mathrm{svid}\neq 0`, let :math:`L_S` denote the number of levels associated with the subject variable, that is :math:`L_S = \mathrm{levels}[ \mathrm{svid} -1]`; if :math:`L_i > 1`, :math:`\mathrm{b}[ \mathrm{nff} + \left(\textit{s}-1\right) L_S + R_i + \textit{j} - 1 -1]` contains the parameter estimate for the interaction between the :math:`\textit{s}`\ th level of the subject variable and the :math:`\textit{j}`\ th level of the :math:`i`\ th random variable, for :math:`\textit{j} = 1,2,\ldots,L_i`, for :math:`\textit{s} = 1,2,\ldots,L_S`; if :math:`L_i\leq 1`, :math:`\mathrm{b}[ \mathrm{nff} + \left(\textit{s}-1\right) L_S + R_i -1]` contains the parameter estimate for the interaction between the :math:`\textit{s}`\ th level of the subject variable and the :math:`i`\ th random variable, for :math:`\textit{s} = 1,2,\ldots,L_S`; 
if :math:`\mathrm{rint} = 1`, :math:`\mathrm{b}[\mathrm{nff}]` contains the estimate of the random intercept. **se** : float, ndarray, shape :math:`\left(\mathrm{lb}\right)` The standard errors of the parameter estimates given in :math:`\mathrm{b}`. **warn** : int Is set to :math:`1` if a variance component was estimated to be a negative value during the fitting process. Otherwise :math:`\mathrm{warn}` is set to :math:`0`. If :math:`\mathrm{warn} = 1`, the negative estimate is set to zero and the estimation process allowed to continue. .. _g02jb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: number of observations with nonzero weights must be greater than one. (`errno` :math:`1`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ncol} \geq 1`. (`errno` :math:`1`) On entry, :math:`\textit{nfv} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \textit{nfv} < \textit{ncol}`. (`errno` :math:`1`) On entry, :math:`\textit{nrv} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \textit{nrv} < \textit{ncol}` and :math:`\textit{nrv}+\mathrm{rint} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nrv} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{nvpr}\leq \textit{nrv}` and (:math:`\textit{nrv} \leq 0` or :math:`\mathrm{nvpr} \geq 1`). (`errno` :math:`1`) On entry, :math:`\mathrm{yvid} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`1\leq \mathrm{yvid}\leq \textit{ncol}`. (`errno` :math:`1`) On entry, :math:`\mathrm{svid} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{svid}\leq \textit{ncol}`. (`errno` :math:`1`) On entry, :math:`\mathrm{cwid} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{cwid}\leq \textit{ncol}` and any supplied weights must be :math:`\text{}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{fint} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{fint} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{rint} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rint} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{lb}` too small: :math:`\mathrm{lb} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`2`) On entry, :math:`\mathrm{levels}[\textit{i}] < 1`, for at least one :math:`\textit{i}`. (`errno` :math:`2`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{fvid}[i]\leq \textit{ncol}`, for all :math:`i`. (`errno` :math:`2`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{rvid}[i]\leq \textit{ncol}`, for all :math:`i`. (`errno` :math:`2`) On entry, :math:`\mathrm{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{vpr}[i]\leq \mathrm{nvpr}`, for all :math:`i`. (`errno` :math:`2`) On entry, invalid data: categorical variable with value greater than that specified in :math:`\mathrm{levels}`. (`errno` :math:`2`) On entry, :math:`\mathrm{gamma}[\textit{i}] < 0`, for at least one :math:`\textit{i}`. (`errno` :math:`3`) Degrees of freedom :math:`< 1`: :math:`\mathrm{df} = \langle\mathit{\boldsymbol{value}}\rangle`. 
(`errno` :math:`4`) Routine failed to converge to specified tolerance: :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`4`) Routine failed to converge in :math:`\mathrm{maxit}` iterations: :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02jb-py2-py-notes: **Notes** ``mixeff_ml`` fits a model of the form: .. math:: y = X\beta +Z\nu +\epsilon where :math:`y` is a vector of :math:`n` observations on the dependent variable, :math:`X` is a known :math:`n\times p` design matrix for the fixed independent variables, :math:`\beta` is a vector of length :math:`p` of unknown `fixed effects`, :math:`Z` is a known :math:`n\times q` design matrix for the random independent variables, :math:`\nu` is a vector of length :math:`q` of unknown `random effects`; and :math:`\epsilon` is a vector of length :math:`n` of unknown random errors. Both :math:`\nu` and :math:`\epsilon` are assumed to have a Gaussian distribution with expectation zero and .. math:: \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right] where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. It is assumed that the random variables, :math:`Z`, can be subdivided into :math:`g\leq q` groups with each group being identically distributed with expectations zero and variance :math:`\sigma_i^2`. The diagonal elements of matrix :math:`G`, therefore, take one of the values :math:`\left\{\sigma_i^2:i = 1,2,\ldots,g\right\}`, depending on which group the associated random variable belongs to. The model, therefore, contains three sets of unknowns, the fixed effects, :math:`\beta`, the random effects :math:`\nu` and a vector of :math:`g+1` variance components, :math:`\gamma`, where :math:`\gamma = \left\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{{g-1}}^2, \sigma_g^2, \sigma_R^2\right\}`. 
Rather than working directly with :math:`\gamma`, ``mixeff_ml`` uses an iterative process to estimate :math:`\gamma^* = \left\{{\sigma_1^2/\sigma_R^2}, {\sigma_2^2/\sigma_R^2}, \ldots, {\sigma_{{g-1}}^2/\sigma_R^2}, {\sigma_g^2/\sigma_R^2}, 1\right\}`. Due to the iterative nature of the estimation a set of initial values, :math:`\gamma_0`, for :math:`\gamma^*` is required. ``mixeff_ml`` allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972). ``mixeff_ml`` fits the model using a quasi-Newton algorithm to maximize the log-likelihood function: .. math:: -2l_R = \log\left(\left\lvert V\right\rvert \right)+n\log\left(r^{\prime }V^{-1}r\right)+n\left(1+\log\left(2\pi /n\right)\right) where .. math:: V = ZGZ^{\prime }+R\text{, }\quad r = y-Xb\quad \text{ and }\quad b = \left(X^{\prime }V^{-1}X\right)^{-1}X^{\prime }V^{-1}y\text{.} Once the final estimates for :math:`\gamma^*` have been obtained, the value of :math:`\sigma_R^2` is given by: .. math:: \sigma_R^2 = \left(r^{\prime }V^{-1}r\right)/\left(n-p\right)\text{.} Case weights, :math:`W_c`, can be incorporated into the model by replacing :math:`X^{\prime }X` and :math:`Z^{\prime }Z` with :math:`X^{\prime }W_cX` and :math:`Z^{\prime }W_cZ` respectively, for a diagonal weight matrix :math:`W_c`. The log-likelihood, :math:`l_R`, is calculated using the sweep algorithm detailed in Wolfinger `et al.` (1994). .. _g02jb-py2-py-references: **References** Goodnight, J H, 1979, `A tutorial on the SWEEP operator`, The American Statistician (33(3)), 149--158 Harville, D A, 1977, `Maximum likelihood approaches to variance component estimation and to related problems`, JASA (72), 320--340 Rao, C R, 1972, `Estimation of variance and covariance components in a linear model`, J. Am. Stat. Assoc. 
(67), 112--115 Stroup, W W, 1989, `Predictable functions and prediction space in the mixed model procedure`, Applications of Mixed Models in Agriculture and Related Disciplines (Southern Cooperative Series Bulletin No. 343), 39--48 Wolfinger, R, Tobias, R and Sall, J, 1994, `Computing Gaussian likelihoods and their derivatives for general linear mixed models`, SIAM Sci. Statist. Comput. (15), 1294--1310 """ raise NotImplementedError
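The fixed-effects layout of :math:`\mathrm{b}` described in the Returns section follows a simple recurrence (:math:`F_1 = 2` if :math:`\mathrm{fint} = 1`, else :math:`1`; :math:`F_{i+1} = F_i + \mathrm{max}\left(L_i-1, 1\right)`). A small helper (pure Python, not part of the library) makes the indexing concrete:

```python
def fixed_effect_offsets(fint, fixed_levels):
    """0-based start offsets F_i - 1 into b for each fixed variable,
    following the documented layout: a factor with L > 1 levels
    occupies L - 1 slots (level 1 is the reference level); a
    continuous or binary variable occupies a single slot."""
    f = 2 if fint == 1 else 1       # F_1 from the documentation
    offsets = []
    for L in fixed_levels:          # L_i = levels[fvid[i] - 1]
        offsets.append(f - 1)       # convert 1-based F_i to 0-based
        f += max(L - 1, 1)          # F_{i+1} = F_i + max(L_i - 1, 1)
    return offsets, f - 1           # offsets, plus the resulting nff

# One continuous variable then a 3-level factor, with a fixed intercept:
offs, nff = fixed_effect_offsets(1, [1, 3])
# b[0] is the intercept, b[1] the continuous coefficient, and
# b[2], b[3] hold levels 2 and 3 of the factor, so nff = 4.
```

The random-effects half of :math:`\mathrm{b}` follows the analogous recurrence with :math:`R_{i+1} = R_i + L_i` (no reference level is dropped), offset by :math:`\mathrm{nff}`.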
[docs]def mixeff_hier_init(dat, levels, y, fixed, rndm, wt=None): r""" ``mixeff_hier_init`` preprocesses a dataset prior to fitting a linear mixed effects regression model of the following form via either :meth:`mixeff_hier_reml` or :meth:`mixeff_hier_ml`. .. deprecated:: 27.0.0.0 ``mixeff_hier_init`` is deprecated. Please use :meth:`lmm_init` instead. See also the :ref:`Replacement Calls <replace>` document. .. _g02jc-py2-py-doc: For full information please refer to the NAG Library document for g02jc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jcf.html .. _g02jc-py2-py-parameters: **Parameters** **dat** : float, array-like, shape :math:`\left(n, \textit{ncol}\right)` A matrix of data, with :math:`\mathrm{dat}[i-1,j-1]` holding the :math:`i`\ th observation on the :math:`j`\ th variable. The two design matrices :math:`X` and :math:`Z` are constructed from :math:`\mathrm{dat}` and the information given in :math:`\mathrm{fixed}` (for :math:`X`) and :math:`\mathrm{rndm}` (for :math:`Z`). **levels** : int, array-like, shape :math:`\left(\textit{ncol}\right)` :math:`\mathrm{levels}[i-1]` contains the number of levels associated with the :math:`i`\ th variable held in :math:`\mathrm{dat}`. If the :math:`i`\ th variable is continuous or binary (i.e., only takes the values zero or one), then :math:`\mathrm{levels}[i-1]` must be set to :math:`1`. Otherwise the :math:`i`\ th variable is assumed to take an integer value between :math:`1` and :math:`\mathrm{levels}[i-1]`, (i.e., the :math:`i`\ th variable is discrete with :math:`\mathrm{levels}[i-1]` levels). **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the vector of observations on the dependent variable. **fixed** : int, array-like, shape :math:`\left(\textit{lfixed}\right)` Defines the structure of the fixed effects design matrix, :math:`X`. :math:`\mathrm{fixed}[0]` The number of variables, :math:`N_F`, to include as fixed effects (not including the intercept if present). 
:math:`\mathrm{fixed}[1]` The fixed intercept flag which must contain :math:`1` if a fixed intercept is to be included and :math:`0` otherwise. :math:`\mathrm{fixed}[2+i-1]` The column of :math:`\mathrm{dat}` holding the :math:`\textit{i}`\ th fixed variable, for :math:`\textit{i} = 1,2,\ldots,\mathrm{fixed}[0]`. See `Construction of the fixed effects design matrix, X <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jcf.html#fc-constructionofx>`__ for more details on the construction of :math:`X`. **rndm** : int, array-like, shape :math:`\left(:, \textit{nrndm}\right)` :math:`\mathrm{rndm}[i-1,j-1]` defines the structure of the `random effects` design matrix, :math:`Z`. The :math:`b`\ th column of :math:`\mathrm{rndm}` defines a block of columns in the design matrix :math:`Z`. :math:`\mathrm{rndm}[0,b-1]` The number of variables, :math:`N_{R_b}`, to include as random effects in the :math:`b`\ th block (not including the random intercept if present). :math:`\mathrm{rndm}[1,b-1]` The random intercept flag which must contain :math:`1` if block :math:`b` includes a random intercept and :math:`0` otherwise. :math:`\mathrm{rndm}[2+i-1,b-1]` The column of :math:`\mathrm{dat}` holding the :math:`\textit{i}`\ th random variable in the :math:`b`\ th block, for :math:`\textit{i} = 1,2,\ldots,\mathrm{rndm}[0,b-1]`. :math:`\mathrm{rndm}[3+N_{R_b}-1,b-1]` The number of subject variables, :math:`N_{S_b}`, for the :math:`b`\ th block. The subject variables define the nesting structure for this block. :math:`\mathrm{rndm}[3+N_{R_b}+i-1,b-1]` The column of :math:`\mathrm{dat}` holding the :math:`\textit{i}`\ th subject variable in the :math:`b`\ th block, for :math:`\textit{i} = 1,2,\ldots,\mathrm{rndm}[3+N_{R_b}-1,b-1]`. See `Construction of random effects design matrix, Z <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jcf.html#fc-constructionofz>`__ for more details on the construction of :math:`Z`. 
**wt** : None or float, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{wt}\text{ is not }\mathbf{None}`: :math:`n`; otherwise: :math:`0`. Optionally, the weights to be used in the weighted regression. If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If weights are not provided then :math:`\mathrm{wt}` must be set to **None** and the effective number of observations is :math:`\textit{n}`. **Returns** **nff** : int :math:`p`, the number of fixed effects estimated, i.e., the number of columns in the design matrix :math:`X`. **nlsv** : int The number of levels for the overall subject variable (see `Construction of random effects design matrix, Z <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jcf.html#fc-constructionofz>`__ for a description of what this means). If there is no overall subject variable, :math:`\mathrm{nlsv} = 1`. **nrf** : int The number of random effects estimated in each of the overall subject blocks. The number of columns in the design matrix :math:`Z` is given by :math:`q = \mathrm{nrf}\times \mathrm{nlsv}`. **comm** : dict, communication object Communication structure. .. _g02jc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`2`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`3`) On entry, :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ncol} \geq 0`. 
(`errno` :math:`4`) On entry, variable :math:`j` of observation :math:`i` is less than :math:`1` or greater than :math:`\mathrm{levels}[j-1]`: :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, value :math:`= \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{levels}[j-1] = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`6`) On entry, :math:`\mathrm{levels}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{levels}[i-1] \geq 1`. (`errno` :math:`8`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`. (`errno` :math:`9`) On entry, number of fixed parameters, :math:`\langle\mathit{\boldsymbol{value}}\rangle` is less than zero. (`errno` :math:`10`) On entry, :math:`\textit{lfixed} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lfixed} \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`11`) On entry, :math:`\textit{nrndm} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nrndm} > 0`. (`errno` :math:`12`) On entry, number of random parameters for random statement :math:`i` is less than :math:`0`: :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, number of parameters :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`102`) On entry, more fixed factors than observations, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`108`) On entry, no observations due to zero weights. (`errno` :math:`109`) On entry, invalid value for fixed intercept flag: value :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle`. 
(`errno` :math:`112`) On entry, invalid value for random intercept flag for random statement :math:`i`: :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, value :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`209`) On entry, index of fixed variable :math:`j` is less than :math:`1` or greater than :math:`\textit{ncol}`: :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, index :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`212`) On entry, must be at least one parameter, or an intercept in each random statement :math:`i`: :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`312`) On entry, index of random variable :math:`j` in random statement :math:`i` is less than :math:`1` or greater than :math:`\textit{ncol}`: :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, index :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ncol} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`412`) On entry, number of subject parameters for random statement :math:`i` is less than :math:`0`: :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, number of parameters :math:`\text{} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`512`) On entry, nesting variable :math:`j` in random statement :math:`i` has one level: :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02jc-py2-py-notes: **Notes** ``mixeff_hier_init`` must be called prior to fitting a linear mixed effects regression model with either :meth:`mixeff_hier_reml` or :meth:`mixeff_hier_ml`. The model fitting functions :meth:`mixeff_hier_reml` and :meth:`mixeff_hier_ml` fit a model of the following form: .. math:: y = X\beta +Z\nu +\epsilon .. 
rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------+ |where|:math:`y` is a vector of :math:`n` observations on the dependent variable, | +-----+----------------------------------------------------------------------------------+ | |:math:`X` is an :math:`n\times p` design matrix of `fixed` independent variables, | +-----+----------------------------------------------------------------------------------+ | |:math:`\beta` is a vector of :math:`p` unknown `fixed effects`, | +-----+----------------------------------------------------------------------------------+ | |:math:`Z` is an :math:`n\times q` design matrix of `random` independent variables,| +-----+----------------------------------------------------------------------------------+ | |:math:`\nu` is a vector of length :math:`q` of unknown `random effects`, | +-----+----------------------------------------------------------------------------------+ | |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors, | +-----+----------------------------------------------------------------------------------+ and :math:`\nu` and :math:`\epsilon` are Normally distributed with expectation zero and variance/covariance matrix defined by .. math:: \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right] where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. Case weights can be incorporated into the model by replacing :math:`X` and :math:`Z` with :math:`W_c^{{1/2}}X` and :math:`W_c^{{1/2}}Z` respectively where :math:`W_c` is a diagonal weight matrix. """ raise NotImplementedError
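The packed encodings of :math:`\mathrm{fixed}` and :math:`\mathrm{rndm}` described in the Parameters section above can be sketched as follows (plain Python lists; the column numbers are hypothetical). The example encodes one fixed variable plus a fixed intercept, and a single random statement consisting of a random intercept nested within one subject variable:

```python
# fixed: [N_F, intercept flag, columns of the N_F fixed variables]
fixed = [1,    # N_F: one fixed variable
         1,    # fixed intercept flag
         2]    # column of dat holding the fixed variable

# One column per random statement:
#   [N_R, intercept flag, N_R variable columns, N_S, N_S subject columns]
rndm_block = [0,   # N_R: no random variables beyond the intercept
              1,   # random intercept flag
              1,   # N_S: one subject variable (slot 2 + N_R)
              4]   # column of dat holding the subject variable

# rndm has shape (:, nrndm); here nrndm = 1, so each row has one entry.
rndm = [[v] for v in rndm_block]
```

This corresponds to a random intercept fitted separately for each level of the subject variable in column 4, alongside a fixed intercept and a fixed effect for column 2.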
[docs]def mixeff_hier_reml(vpr, gamma, comm, iopt=None, ropt=None, io_manager=None): r""" ``mixeff_hier_reml`` fits a multi-level linear mixed effects regression model using restricted maximum likelihood (REML). Prior to calling ``mixeff_hier_reml`` the initialization function :meth:`mixeff_hier_init` must be called. .. deprecated:: 27.0.0.0 ``mixeff_hier_reml`` is deprecated. Please use :meth:`lmm_fit` instead. See also the :ref:`Replacement Calls <replace>` document. .. _g02jd-py2-py-doc: For full information please refer to the NAG Library document for g02jd https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jdf.html .. _g02jd-py2-py-parameters: **Parameters** **vpr** : int, array-like, shape :math:`\left(\textit{lvpr}\right)` A vector of flags indicating the mapping between the random variables specified in :math:`\textit{rndm}` and the variance components, :math:`\sigma_i^2`. See `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jdf.html#fcomments>`__ for more details. **gamma** : float, array-like, shape :math:`\left(\textit{nvpr}+1\right)` Holds the initial values of the variance components, :math:`\gamma_0`, with :math:`\mathrm{gamma}[\textit{i}-1]` the initial value for :math:`\sigma_{\textit{i}}^2/\sigma_R^2`, for :math:`\textit{i} = 1,2,\ldots,\textit{nvpr}`. If :math:`\mathrm{gamma}[0] = {-1.0}`, the remaining elements of :math:`\mathrm{gamma}` are ignored and the initial values for the variance components are estimated from the data using MIVQUE0. **comm** : dict, communication object, modified in place Communication structure. This argument must have been initialized by a prior call to :meth:`mixeff_hier_init`. **iopt** : None or int, array-like, shape :math:`\left(\textit{liopt}\right)`, optional Options passed to the optimization function. 
By default ``mixeff_hier_reml`` fits the specified model using a modified Newton optimization algorithm as implemented in :meth:`opt.bounds_mod_deriv2_comp <naginterfaces.library.opt.bounds_mod_deriv2_comp>`. In some cases, where the calculation of the derivatives is computationally expensive it may be more efficient to use a sequential QP algorithm. The sequential QP algorithm as implemented in :meth:`opt.nlp1_solve <naginterfaces.library.opt.nlp1_solve>` can be chosen by setting :math:`\mathrm{iopt}[4] = 1`. If :math:`\textit{liopt} < 5` or :math:`\mathrm{iopt}[4]\neq 1`, then the modified Newton algorithm will be used. Different options are available depending on the optimization function used. In all cases, using a value of :math:`-1` will cause the default value to be used. In addition only the first :math:`\textit{liopt}` values of :math:`\mathrm{iopt}` are used, so for example, if only the first element of :math:`\mathrm{iopt}` needs changing and default values for all other options are sufficient :math:`\textit{liopt}` can be set to :math:`1`. The following table lists the association between elements of :math:`\mathrm{iopt}` and arguments in the optimizer when the modified Newton algorithm is being used. .. 
rst-class:: nag-rules-none nag-align-left +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`i`|Description |Equivalent argument |Default Value | +=========+================================================+=======================+==================================+ |:math:`0`|Number of iterations |:math:`\textit{maxcal}`|:math:`1000` | +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`1`|Unit number for monitoring information |n/a |The advisory unit number | +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`2`|Print options (:math:`1 =` print) |n/a |:math:`-1` (no printing performed)| +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`3`|Frequency that monitoring information is printed|:math:`\textit{iprint}`|:math:`-1` | +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`4`|Optimizer used |n/a |n/a | +---------+------------------------------------------------+-----------------------+----------------------------------+ If requested, monitoring information is displayed in a similar format to that given by the modified Newton optimizer. The following table lists the association between elements of :math:`\mathrm{iopt}` and options in the optimizer when the sequential QP algorithm is being used. .. 
rst-class:: nag-rules-none nag-align-left +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`i`|Description |Equivalent option |Default Value | +=========+=============================================================+=======================+=========================================================+ |:math:`0`|Number of iterations |'Major Iteration Limit'|:math:`\mathrm{max}\left(50,3\times \textit{nvpr}\right)`| +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`1`|Unit number for monitoring information |n/a |The advisory unit number | +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`2`|Print options (:math:`1 = \text{}` print, otherwise no print)|'List'/'Nolist' |:math:`-1` (no printing performed) | +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`3`|Frequency that monitoring information is printed |'Major Print Level' |:math:`0` | +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`4`|Optimizer used |n/a |n/a | +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`5`|Number of minor iterations |'Minor Iteration Limit'|:math:`\mathrm{max}\left(50,3\times \textit{nvpr}\right)`| +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ |:math:`6`|Frequency that additional monitoring information is printed 
|'Minor Print Level' |:math:`0` | +---------+-------------------------------------------------------------+-----------------------+---------------------------------------------------------+ If :math:`\textit{liopt}\leq 0`, default values are used for all options and :math:`\mathrm{iopt}` is not referenced. **ropt** : None or float, array-like, shape :math:`\left(\textit{lropt}\right)`, optional Options passed to the optimization function. Different options are available depending on the optimization function used. In all cases, using a value of :math:`-1.0` will cause the default value to be used. In addition only the first :math:`\textit{lropt}` values of :math:`\mathrm{ropt}` are used, so for example, if only the first element of :math:`\mathrm{ropt}` needs changing and default values for all other options are sufficient :math:`\textit{lropt}` can be set to :math:`1`. The following table lists the association between elements of :math:`\mathrm{ropt}` and arguments in the optimizer when the modified Newton algorithm is being used. .. 
rst-class:: nag-rules-none nag-align-left +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`i`|Description |Equivalent argument |Default Value | +=========+======================================+=======================+============================================================================================================================+ |:math:`0`|Sweep tolerance |n/a |:math:`\mathrm{max}\left(\sqrt{\mathrm{eps}},\sqrt{\mathrm{eps}}\times \mathrm{max}_i\left(\textit{zz}_{{ii}}\right)\right)`| +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`1`|Lower bound for :math:`\gamma^*` |n/a |:math:`\mathrm{eps}/100` | +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`2`|Upper bound for :math:`\gamma^*` |n/a |:math:`10^{20}` | +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`3`|Accuracy of linear minimizations |:math:`\textit{eta}` |:math:`0.9` | +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`4`|Accuracy to which solution is required|:math:`\textit{xtol}` |:math:`0.0` | +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`5`|Initial 
distance from solution |:math:`\textit{stepmx}`|:math:`100000.0` | +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ The following table lists the association between elements of :math:`\mathrm{ropt}` and options in the optimizer when the sequential QP algorithm is being used. .. rst-class:: nag-rules-none nag-align-left +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`i`|Description |Equivalent option |Default Value | +=========+================================+=======================+============================================================================================================================+ |:math:`0`|Sweep tolerance |n/a |:math:`\mathrm{max}\left(\sqrt{\mathrm{eps}},\sqrt{\mathrm{eps}}\times \mathrm{max}_i\left(\textit{zz}_{{ii}}\right)\right)`| +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`1`|Lower bound for :math:`\gamma^*`|n/a |:math:`\mathrm{eps}/100` | +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`2`|Upper bound for :math:`\gamma^*`|n/a |:math:`10^{20}` | +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`3`|Line search tolerance |'Line Search Tolerance'|:math:`0.9` | 
+---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ |:math:`4`|Optimality tolerance |'Optimality Tolerance' |:math:`\mathrm{eps}^{0.72}` | +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+ where :math:`\mathrm{eps}` is the machine precision returned by :meth:`machine.precision <naginterfaces.library.machine.precision>` and :math:`\textit{zz}_{{ii}}` denotes the :math:`i`\ th diagonal element of :math:`Z^\mathrm{T}Z`. If :math:`\textit{lropt}\leq 0`, then default values are used for all options and :math:`\mathrm{ropt}` may be set to **None**. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **gamma** : float, ndarray, shape :math:`\left(\textit{nvpr}+1\right)` :math:`\mathrm{gamma}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{nvpr}`, holds the final estimate of :math:`\sigma_{\textit{i}}^2` and :math:`\mathrm{gamma}[\textit{nvpr}]` holds the final estimate for :math:`\sigma_R^2`. **effn** : int Effective number of observations. If no weights were supplied to :meth:`mixeff_hier_init` or all supplied weights were nonzero, :math:`\mathrm{effn} = {\textit{n}}`. **rnkx** : int The rank of the design matrix, :math:`X`, for the fixed effects. **ncov** : int Number of variance components not estimated to be zero. If none of the variance components are estimated to be zero, then :math:`\mathrm{ncov} = \textit{nvpr}`. **lnlike** : float :math:`-2l_R\left(\hat{\gamma }\right)` where :math:`l_R` is the log of the restricted maximum likelihood calculated at :math:`\hat{\gamma }`, the estimated variance components returned in :math:`\mathrm{gamma}`. 
**estid** : int, ndarray, shape :math:`\left(:, \textit{lb}\right)` An array describing the parameter estimates returned in :math:`\mathrm{b}`. The first :math:`{\textit{nlsv}}\times {\textit{nrf}}` columns of :math:`\mathrm{estid}` describe the parameter estimates for the random effects and the last :math:`\textit{nff}` columns the parameter estimates for the fixed effects. For fixed effects: for :math:`l = {\textit{nrf}}\times {\textit{nlsv}}+1,\ldots,{\textit{nrf}}\times {\textit{nlsv}}+{\textit{nff}}` if :math:`\mathrm{b}[l-1]` contains the parameter estimate for the intercept, then .. math:: \mathrm{estid}[0,l-1] = \mathrm{estid}[1,l-1] = \mathrm{estid}[2,l-1] = 0\text{;} if :math:`\mathrm{b}[l-1]` contains the parameter estimate for the :math:`i`\ th level of the :math:`j`\ th fixed variable, that is the vector of values held in the :math:`k`\ th column of :math:`\textit{dat}` when :math:`{\textit{fixed}}[j+1] = k` then .. math:: \begin{array}{l} \mathrm{estid}[0,l-1] = 0\text{, }\quad \\ \mathrm{estid}[1,l-1] = j\text{, }\quad \\ \mathrm{estid}[2,l-1] = i\text{;} \end{array} if the :math:`j`\ th variable is continuous or binary, that is :math:`{\textit{levels}}[{\textit{fixed}}[j+1]-1] = 1`, :math:`\mathrm{estid}[2,l-1] = 0`; any remaining rows of the :math:`l`\ th column of :math:`\mathrm{estid}` are set to :math:`{0}`. 
For random effects: let :math:`N_{R_b}` denote the number of random variables in the :math:`b`\ th random statement, that is :math:`N_{R_b} = {\textit{rndm}}\left(1, b\right)`; :math:`R_{{jb}}` denote the :math:`j`\ th random variable from the :math:`b`\ th random statement, that is the vector of values held in the :math:`k`\ th column of :math:`\textit{dat}` when :math:`{\textit{rndm}}\left({2+j}, b\right) = k`; :math:`N_{S_b}` denote the number of subject variables in the :math:`b`\ th random statement, that is :math:`N_{S_b} = {\textit{rndm}}\left({3+N_{R_b}}, b\right)`; :math:`S_{{jb}}` denote the :math:`j`\ th subject variable from the :math:`b`\ th random statement, that is the vector of values held in the :math:`k`\ th column of :math:`\textit{dat}` when :math:`{\textit{rndm}}\left({3+N_{R_b}+j}, b\right) = k`; :math:`L\left(S_{{jb}}\right)` denote the number of levels for :math:`S_{{jb}}`, that is :math:`L\left(S_{{jb}}\right) = {\textit{levels}}[{\textit{rndm}}\left({3+N_{R_b}+j}, b\right)-1]`; then for :math:`l = 1,2,\ldots {\textit{nrf}}\times {\textit{nlsv}}`, if :math:`\mathrm{b}[l-1]` contains the parameter estimate for the :math:`i`\ th level of :math:`R_{{jb}}` when :math:`S_{{\textit{k}b}} = s_{\textit{k}}`, for :math:`\textit{k} = 1,2,\ldots,N_{S_b}` and :math:`1\leq s_{\textit{k}}\leq L\left(S_{{jb}}\right)`, i.e., :math:`s_k` is a valid value for the :math:`k`\ th subject variable, then .. 
math:: \begin{array}{l} \mathrm{estid}[0,l-1] = b\text{, }\quad \\ \mathrm{estid}[1,l-1] = j\text{, }\quad \\ \mathrm{estid}[2,l-1] = i\text{, }\quad \\ \mathrm{estid}[{3+k}-1,l-1] = s_k\text{, }k = 1,2,\ldots,N_{S_b}\text{;} \end{array} if the parameter being estimated is for the intercept, then :math:`\mathrm{estid}[1,l-1] = \mathrm{estid}[2,l-1] = 0`; if the :math:`j`\ th variable is continuous, or binary, that is :math:`L\left(S_{{jb}}\right) = 1`, then :math:`\mathrm{estid}[2,l-1] = 0`; the remaining rows of the :math:`l`\ th column of :math:`\mathrm{estid}` are set to :math:`{0}`. In some situations, certain combinations of variables are never observed. In such circumstances all elements of the :math:`l`\ th column of :math:`\mathrm{estid}` are set to :math:`-999`. **b** : float, ndarray, shape :math:`\left(\textit{nff}+\textit{nrf}\times \textit{nlsv}\right)` The parameter estimates, with the first :math:`{\textit{nrf}}\times {\textit{nlsv}}` elements of :math:`\mathrm{b}` containing the parameter estimates for the random effects, :math:`\nu`, and the remaining :math:`\textit{nff}` elements containing the parameter estimates for the fixed effects, :math:`\beta`. The order of these estimates is described by the :math:`\mathrm{estid}` argument. **se** : float, ndarray, shape :math:`\left(\textit{nff}+\textit{nrf}\times \textit{nlsv}\right)` The standard errors of the parameter estimates given in :math:`\mathrm{b}`. **czz** : float, ndarray, shape :math:`\left(\textit{nrf}, :\right)` If :math:`{\textit{nlsv}} = 1`, then :math:`\mathrm{czz}` holds the lower triangular portion of the matrix :math:`\left(1/\sigma^2\right)\left(Z^\mathrm{T} \hat{R}^{-1}Z + \hat{G}^{-1}\right)`, where :math:`\hat{R}` and :math:`\hat{G}` are the estimates of :math:`R` and :math:`G` respectively. 
If :math:`{\textit{nlsv}} > 1`, then :math:`\mathrm{czz}` holds this matrix in compressed form, with the first :math:`\textit{nrf}` columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next :math:`\textit{nrf}` columns the part corresponding to the second level of the overall subject variable etc. **cxx** : float, ndarray, shape :math:`\left(\textit{nff}, :\right)` :math:`\mathrm{cxx}` holds the lower triangular portion of the matrix :math:`\left(1/\sigma^2\right)X^\mathrm{T}\hat{V}^{-1}X`, where :math:`\hat{V}` is the estimated value of :math:`V`. **cxz** : float, ndarray, shape :math:`\left(\textit{nff}, :\right)` If :math:`{\textit{nlsv}} = 1`, then :math:`\mathrm{cxz}` holds the matrix :math:`\left(1/\sigma^2\right)\left(X^\mathrm{T}\hat{V}^{-1}Z\right)\hat{G}`, where :math:`\hat{V}` and :math:`\hat{G}` are the estimates of :math:`V` and :math:`G` respectively. If :math:`{\textit{nlsv}} > 1`, then :math:`\mathrm{cxz}` holds this matrix in compressed form, with the first :math:`\textit{nrf}` columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next :math:`\textit{nrf}` columns the part corresponding to the second level of the overall subject variable etc. .. _g02jd-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{lvpr} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lvpr} \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`2`) On entry, :math:`\mathrm{vpr}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{vpr}[i]\leq \textit{nvpr}`. (`errno` :math:`3`) On entry, :math:`\textit{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \textit{nvpr}\leq \langle\mathit{\boldsymbol{value}}\rangle`. 
(`errno` :math:`4`) On entry, :math:`\mathrm{gamma}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{gamma}[0] = {-1.0}` or :math:`\mathrm{gamma}[i-1] \geq 0.0`. (`errno` :math:`9`) On entry, :math:`\textit{lb} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lb} \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`11`) On entry, :math:`\textit{ldid} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ldid} \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`21`) On entry, :math:`\mathrm{comm}`\ ['iopts'] has not been initialized correctly. (`errno` :math:`32`) On entry, at least one value of :math:`i`, for :math:`\textit{i} = 1,2,\ldots,\textit{nvpr}`, does not appear in :math:`\mathrm{vpr}`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`101`) Optimal solution found, but requested accuracy not achieved. (`errno` :math:`102`) Too many major iterations. (`errno` :math:`103`) Current point cannot be improved upon. (`errno` :math:`104`) At least one negative estimate for :math:`\mathrm{gamma}` was obtained. All negative estimates have been set to zero. .. _g02jd-py2-py-notes: **Notes** ``mixeff_hier_reml`` fits a model of the form: .. math:: y = X\beta +Z\nu +\epsilon .. 
rst-class:: nag-rules-none nag-align-left +-----+--------------------------------------------------------------------------------------------+ |where|:math:`y` is a vector of :math:`n` observations on the dependent variable, | +-----+--------------------------------------------------------------------------------------------+ | |:math:`X` is a known :math:`n\times p` design matrix for the `fixed` independent variables, | +-----+--------------------------------------------------------------------------------------------+ | |:math:`\beta` is a vector of length :math:`p` of unknown `fixed effects`, | +-----+--------------------------------------------------------------------------------------------+ | |:math:`Z` is a known :math:`n\times q` design matrix for the `random` independent variables,| +-----+--------------------------------------------------------------------------------------------+ | |:math:`\nu` is a vector of length :math:`q` of unknown `random effects`, | +-----+--------------------------------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors. | +-----+--------------------------------------------------------------------------------------------+ Both :math:`\nu` and :math:`\epsilon` are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by .. math:: \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right] where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. It is assumed that the random variables, :math:`Z`, can be subdivided into :math:`g\leq q` groups with each group being identically distributed with expectation zero and variance :math:`\sigma_i^2`. 
The diagonal elements of matrix :math:`G`, therefore, take one of the values :math:`\left\{\sigma_i^2:i = 1,2,\ldots,g\right\}`, depending on which group the associated random variable belongs to. The model, therefore, contains three sets of unknowns: the fixed effects :math:`\beta`, the random effects :math:`\nu` and a vector of :math:`g+1` variance components :math:`\gamma`, where :math:`\gamma = \left\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{{g-1}}^2, \sigma_g^2, \sigma_R^2\right\}`. Rather than working directly with :math:`\gamma`, ``mixeff_hier_reml`` uses an iterative process to estimate :math:`\gamma^* = \left\{{\sigma_1^2/\sigma_R^2}, {\sigma_2^2/\sigma_R^2}, \ldots, {\sigma_{{g-1}}^2/\sigma_R^2}, {\sigma_g^2/\sigma_R^2}, 1\right\}`. Due to the iterative nature of the estimation a set of initial values, :math:`\gamma_0`, for :math:`\gamma^*` is required. ``mixeff_hier_reml`` allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972). ``mixeff_hier_reml`` fits the model by maximizing the restricted log-likelihood function: .. math:: -2l_R = \log\left(\left\lvert V\right\rvert \right)+\left(n-p\right)\log\left(r^\mathrm{T}V^{-1}r\right)+\log\left(\left\lvert X^\mathrm{T}V^{-1}X\right\rvert \right)+\left(n-p\right)\left(1+\log\left(2\pi /\left(n-p\right)\right)\right) where .. math:: V = ZGZ^\mathrm{T}+R\text{, }\quad r = y-Xb\quad \text{ and }\quad b = \left(X^\mathrm{T}V^{-1}X\right)^{-1}X^\mathrm{T}V^{-1}y\text{.} Once the final estimates for :math:`\gamma^*` have been obtained, the value of :math:`\sigma_R^2` is given by .. math:: \sigma_R^2 = \left(r^\mathrm{T}V^{-1}r\right)/\left(n-p\right)\text{.} Case weights, :math:`W_c`, can be incorporated into the model by replacing :math:`X^\mathrm{T}X` and :math:`Z^\mathrm{T}Z` with :math:`X^\mathrm{T}W_cX` and :math:`Z^\mathrm{T}W_cZ` respectively, for a diagonal weight matrix :math:`W_c`. 
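As a concrete illustration of the quantities just defined, the restricted log-likelihood can be evaluated directly for a small dense problem. This NumPy sketch mirrors the formulae above term by term; it is illustrative only (the routine itself works via the sweep algorithm rather than dense inversion, and all data here are hypothetical):

```python
import numpy as np

# Tiny hypothetical problem: evaluate -2*l_R at a fixed gamma* with sigma_R^2 scaled to 1.
rng = np.random.default_rng(1)
n, p, q = 12, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design
Z = np.kron(np.eye(q), np.ones((n // q, 1)))           # random design: 3 groups of 4
y = rng.normal(size=n)                                 # response

G = 0.5 * np.eye(q)                      # diagonal G built from the variance components
R = np.eye(n)                            # R = sigma_R^2 * I, with sigma_R^2 = 1 (gamma* scaling)
V = Z @ G @ Z.T + R                      # V = Z G Z^T + R
Vi = np.linalg.inv(V)

XtViX = X.T @ Vi @ X
b = np.linalg.solve(XtViX, X.T @ Vi @ y) # b = (X^T V^-1 X)^-1 X^T V^-1 y
r = y - X @ b                            # r = y - X b

minus_2lR = (np.log(np.linalg.det(V))
             + (n - p) * np.log(r @ Vi @ r)
             + np.log(np.linalg.det(XtViX))
             + (n - p) * (1.0 + np.log(2.0 * np.pi / (n - p))))

sigma_R2 = (r @ Vi @ r) / (n - p)        # sigma_R^2 = (r^T V^-1 r)/(n - p)
```

Minimizing ``minus_2lR`` over the ratios :math:`\sigma_i^2/\sigma_R^2` (the entries of :math:`G` here) is the optimization the routine performs; :math:`\sigma_R^2` then follows from the closed-form expression on the last line.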
The log-likelihood, :math:`l_R`, is calculated using the sweep algorithm detailed in Wolfinger `et al.` (1994). .. _g02jd-py2-py-references: **References** Goodnight, J H, 1979, `A tutorial on the SWEEP operator`, The American Statistician (33(3)), 149--158 Harville, D A, 1977, `Maximum likelihood approaches to variance component estimation and to related problems`, JASA (72), 320--340 Rao, C R, 1972, `Estimation of variance and covariance components in a linear model`, J. Am. Stat. Assoc. (67), 112--115 Stroup, W W, 1989, `Predictable functions and prediction space in the mixed model procedure`, Applications of Mixed Models in Agriculture and Related Disciplines (Southern Cooperative Series Bulletin No. 343), 39--48 Wolfinger, R, Tobias, R and Sall, J, 1994, `Computing Gaussian likelihoods and their derivatives for general linear mixed models`, SIAM Sci. Statist. Comput. (15), 1294--1310 """ raise NotImplementedError
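As a pointer to how the ``iopt``, ``ropt`` and ``gamma`` arguments of ``mixeff_hier_reml`` described above are typically assembled: the sketch below uses hypothetical sizes and values (``nvpr = 3``, the iteration limits, the tolerance), and the call itself is only indicated in a comment since it additionally needs the communication structure from :meth:`mixeff_hier_init` and real data:

```python
import numpy as np

nvpr = 3                                  # hypothetical number of variance components

# -1 (integer) / -1.0 (real) means "use the default" for that option.
iopt = np.full(7, -1, dtype=int)
iopt[0] = 500                             # element 0: iteration limit
iopt[4] = 1                               # element 4: 1 => sequential QP optimizer
iopt[5] = 200                             # element 5: minor iteration limit (SQP only)

ropt = np.full(4, -1.0)
ropt[3] = 0.5                             # element 3: line search tolerance (SQP) / eta (Newton)

# gamma[0] = -1.0 => initial variance components estimated from the data by MIVQUE0.
gamma = np.full(nvpr + 1, -1.0)

# With vpr set up and comm initialized by mixeff_hier_init, the fit would then be, e.g.:
# (gamma, effn, rnkx, ncov, lnlike,
#  estid, b, se, czz, cxx, cxz) = correg.mixeff_hier_reml(vpr, gamma, comm,
#                                                         iopt=iopt, ropt=ropt)
```

Because only the first ``liopt``/``lropt`` elements are referenced, the arrays may be truncated; e.g. ``iopt = [500]`` alone would change just the iteration limit of the default modified Newton optimizer.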
[docs]def mixeff_hier_ml(vpr, nvpr, gamma, comm, iopt=None, ropt=None, io_manager=None): r""" ``mixeff_hier_ml`` fits a multi-level linear mixed effects regression model using maximum likelihood (ML). Prior to calling ``mixeff_hier_ml`` the initialization function :meth:`mixeff_hier_init` must be called. .. deprecated:: 27.0.0.0 ``mixeff_hier_ml`` is deprecated. Please use :meth:`lmm_fit` instead. See also the :ref:`Replacement Calls <replace>` document. .. _g02je-py2-py-doc: For full information please refer to the NAG Library document for g02je https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jef.html .. _g02je-py2-py-parameters: **Parameters** **vpr** : int, array-like, shape :math:`\left(\textit{lvpr}\right)` A vector of flags indicating the mapping between the random variables specified in :math:`\textit{rndm}` and the variance components, :math:`\sigma_i^2`. See `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jef.html#fcomments>`__ for more details. **nvpr** : int :math:`g`, the number of variance components being estimated (excluding the overall variance, :math:`\sigma_R^2`). **gamma** : float, array-like, shape :math:`\left(\mathrm{nvpr}+1\right)` Holds the initial values of the variance components, :math:`\gamma_0`, with :math:`\mathrm{gamma}[\textit{i}-1]` the initial value for :math:`\sigma_{\textit{i}}^2/\sigma_R^2`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nvpr}`. If :math:`\mathrm{gamma}[0] = {-1.0}`, the remaining elements of :math:`\mathrm{gamma}` are ignored and the initial values for the variance components are estimated from the data using MIVQUE0. **comm** : dict, communication object, modified in place Communication structure. This argument must have been initialized by a prior call to :meth:`mixeff_hier_init`. **iopt** : None or int, array-like, shape :math:`\left(\textit{liopt}\right)`, optional Options passed to the optimization function. 
By default ``mixeff_hier_ml`` fits the specified model using a modified Newton optimization algorithm as implemented in :meth:`opt.bounds_mod_deriv2_comp <naginterfaces.library.opt.bounds_mod_deriv2_comp>`. In some cases, where the calculation of the derivatives is computationally expensive it may be more efficient to use a sequential QP algorithm. The sequential QP algorithm as implemented in :meth:`opt.nlp1_solve <naginterfaces.library.opt.nlp1_solve>` can be chosen by setting :math:`\mathrm{iopt}[4] = 1`. If :math:`\textit{liopt} < 5` or :math:`\mathrm{iopt}[4]\neq 1`, then the modified Newton algorithm will be used. Different options are available depending on the optimization function used. In all cases, using a value of :math:`-1` will cause the default value to be used. In addition only the first :math:`\textit{liopt}` values of :math:`\mathrm{iopt}` are used, so for example, if only the first element of :math:`\mathrm{iopt}` needs changing and default values for all other options are sufficient :math:`\textit{liopt}` can be set to :math:`1`. The following table lists the association between elements of :math:`\mathrm{iopt}` and arguments in the optimizer when the modified Newton algorithm is being used. .. 
rst-class:: nag-rules-none nag-align-left +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`i`|Description |Equivalent argument |Default Value | +=========+================================================+=======================+==================================+ |:math:`0`|Number of iterations |:math:`\textit{maxcal}`|:math:`1000` | +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`1`|Unit number for monitoring information |n/a |The advisory unit number | +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`2`|Print options (:math:`1 =` print) |n/a |:math:`-1` (no printing performed)| +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`3`|Frequency that monitoring information is printed|:math:`\textit{iprint}`|:math:`-1` | +---------+------------------------------------------------+-----------------------+----------------------------------+ |:math:`4`|Optimizer used |n/a |n/a | +---------+------------------------------------------------+-----------------------+----------------------------------+ If requested, monitoring information is displayed in a similar format to that given by the modified Newton optimizer. The following table lists the association between elements of :math:`\mathrm{iopt}` and options in the optimizer when the sequential QP algorithm is being used. .. 
.. rst-class:: nag-rules-none nag-align-left

+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`i`|Description                                                |Equivalent option      |Default Value                                            |
+=========+===========================================================+=======================+=========================================================+
|:math:`0`|Number of iterations                                       |'Major Iteration Limit'|:math:`\mathrm{max}\left(50,3\times \mathrm{nvpr}\right)`|
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`1`|Unit number for monitoring information                     |n/a                    |The advisory unit number                                 |
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`2`|Print options (:math:`1` = print, otherwise no print)      |'List'/'Nolist'        |:math:`-1` (no printing performed)                       |
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`3`|Frequency that monitoring information is printed           |'Major Print Level'    |:math:`0`                                                |
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`4`|Optimizer used                                             |n/a                    |n/a                                                      |
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`5`|Number of minor iterations                                 |'Minor Iteration Limit'|:math:`\mathrm{max}\left(50,3\times \mathrm{nvpr}\right)`|
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+
|:math:`6`|Frequency that additional monitoring information is printed|'Minor Print Level'    |:math:`0`                                                |
+---------+-----------------------------------------------------------+-----------------------+---------------------------------------------------------+

If :math:`\textit{liopt}\leq 0`, default values are used for all options and :math:`\mathrm{iopt}` is not referenced.

**ropt** : None or float, array-like, shape :math:`\left(\textit{lropt}\right)`, optional
    Options passed to the optimization function.

    Different options are available depending on the optimization function used. In all cases, using a value of :math:`-1.0` will cause the default value to be used. In addition, only the first :math:`\textit{lropt}` values of :math:`\mathrm{ropt}` are used, so, for example, if only the first element of :math:`\mathrm{ropt}` needs changing and the default values for all other options are sufficient, :math:`\textit{lropt}` can be set to :math:`1`.

    The following table lists the association between elements of :math:`\mathrm{ropt}` and arguments in the optimizer when the modified Newton algorithm is being used.

    .. rst-class:: nag-rules-none nag-align-left

    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`i`|Description                           |Equivalent argument    |Default Value                                                                                                               |
    +=========+======================================+=======================+============================================================================================================================+
    |:math:`0`|Sweep tolerance                       |n/a                    |:math:`\mathrm{max}\left(\sqrt{\mathrm{eps}},\sqrt{\mathrm{eps}}\times \mathrm{max}_i\left(\textit{zz}_{{ii}}\right)\right)`|
    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`1`|Lower bound for :math:`\gamma^*`      |n/a                    |:math:`\mathrm{eps}/100`                                                                                                    |
    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`2`|Upper bound for :math:`\gamma^*`      |n/a                    |:math:`10^{20}`                                                                                                             |
    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`3`|Accuracy of linear minimizations      |:math:`\textit{eta}`   |:math:`0.9`                                                                                                                 |
    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`4`|Accuracy to which solution is required|:math:`\textit{xtol}`  |:math:`0.0`                                                                                                                 |
    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`5`|Initial distance from solution        |:math:`\textit{stepmx}`|:math:`100000.0`                                                                                                            |
    +---------+--------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+

    The following table lists the association between elements of :math:`\mathrm{ropt}` and options in the optimizer when the sequential QP algorithm is being used.

    .. rst-class:: nag-rules-none nag-align-left

    +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`i`|Description                     |Equivalent option      |Default Value                                                                                                               |
    +=========+================================+=======================+============================================================================================================================+
    |:math:`0`|Sweep tolerance                 |n/a                    |:math:`\mathrm{max}\left(\sqrt{\mathrm{eps}},\sqrt{\mathrm{eps}}\times \mathrm{max}_i\left(\textit{zz}_{{ii}}\right)\right)`|
    +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`1`|Lower bound for :math:`\gamma^*`|n/a                    |:math:`\mathrm{eps}/100`                                                                                                    |
    +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`2`|Upper bound for :math:`\gamma^*`|n/a                    |:math:`10^{20}`                                                                                                             |
    +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`3`|Line search tolerance           |'Line Search Tolerance'|:math:`0.9`                                                                                                                 |
    +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+
    |:math:`4`|Optimality tolerance            |'Optimality Tolerance' |:math:`\mathrm{eps}^{0.72}`                                                                                                 |
    +---------+--------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------+

    where :math:`\mathrm{eps}` is the machine precision returned by :meth:`machine.precision <naginterfaces.library.machine.precision>` and :math:`\textit{zz}_{{ii}}` denotes the :math:`i`\ th diagonal element of :math:`Z^\mathrm{T}Z`.

    If :math:`\textit{lropt}\leq 0`, default values are used for all options and :math:`\mathrm{ropt}` may be set to **None**.

**io_manager** : FileObjManager, optional
    Manager for I/O in this routine.

**Returns**

**gamma** : float, ndarray, shape :math:`\left(\mathrm{nvpr}+1\right)`
    :math:`\mathrm{gamma}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nvpr}`, holds the final estimate of :math:`\sigma_{\textit{i}}^2` and :math:`\mathrm{gamma}[\mathrm{nvpr}]` holds the final estimate for :math:`\sigma_R^2`.

**effn** : int
    Effective number of observations. If no weights were supplied to :meth:`mixeff_hier_init`, or all supplied weights were nonzero, :math:`\mathrm{effn} = \textit{n}`.

**rnkx** : int
    The rank of the design matrix, :math:`X`, for the fixed effects.

**ncov** : int
    Number of variance components not estimated to be zero. If none of the variance components are estimated to be zero, then :math:`\mathrm{ncov} = \mathrm{nvpr}`.

**lnlike** : float
    :math:`-2l_R\left(\hat{\gamma }\right)`, where :math:`l_R` is the log of the maximum likelihood calculated at :math:`\hat{\gamma }`, the estimated variance components returned in :math:`\mathrm{gamma}`.
**estid** : int, ndarray, shape :math:`\left(:, \textit{lb}\right)`
    An array describing the parameter estimates returned in :math:`\mathrm{b}`. The first :math:`{\textit{nlsv}}\times {\textit{nrf}}` columns of :math:`\mathrm{estid}` describe the parameter estimates for the random effects and the last :math:`\textit{nff}` columns the parameter estimates for the fixed effects.

    For fixed effects, for :math:`l = {\textit{nrf}}\times {\textit{nlsv}}+1,\ldots,{\textit{nrf}}\times {\textit{nlsv}}+{\textit{nff}}`:

    if :math:`\mathrm{b}[l-1]` contains the parameter estimate for the intercept, then

    .. math::

        \mathrm{estid}[0,l-1] = \mathrm{estid}[1,l-1] = \mathrm{estid}[2,l-1] = 0\text{;}

    if :math:`\mathrm{b}[l-1]` contains the parameter estimate for the :math:`i`\ th level of the :math:`j`\ th fixed variable, that is, the vector of values held in the :math:`k`\ th column of :math:`\textit{dat}` when :math:`{\textit{fixed}}[j+1] = k`, then

    .. math::

        \begin{array}{l} \mathrm{estid}[0,l-1] = 0\text{, }\quad \\ \mathrm{estid}[1,l-1] = j\text{, }\quad \\ \mathrm{estid}[2,l-1] = i\text{;} \end{array}

    if the :math:`j`\ th variable is continuous or binary, that is, :math:`{\textit{levels}}[{\textit{fixed}}[j+1]-1] = 1`, then :math:`\mathrm{estid}[2,l-1] = 0`;

    any remaining rows of the :math:`l`\ th column of :math:`\mathrm{estid}` are set to :math:`0`.

    For random effects, let

    :math:`N_{R_b}` denote the number of random variables in the :math:`b`\ th random statement, that is, :math:`N_{R_b} = {\textit{rndm}}\left(1, b\right)`;

    :math:`R_{{jb}}` denote the :math:`j`\ th random variable from the :math:`b`\ th random statement, that is, the vector of values held in the :math:`k`\ th column of :math:`\textit{dat}` when :math:`{\textit{rndm}}\left({2+j}, b\right) = k`;

    :math:`N_{S_b}` denote the number of subject variables in the :math:`b`\ th random statement, that is, :math:`N_{S_b} = {\textit{rndm}}\left({3+N_{R_b}}, b\right)`;

    :math:`S_{{jb}}` denote the :math:`j`\ th subject variable from the :math:`b`\ th random statement, that is, the vector of values held in the :math:`k`\ th column of :math:`\textit{dat}` when :math:`{\textit{rndm}}\left({3+N_{R_b}+j}, b\right) = k`;

    :math:`L\left(S_{{jb}}\right)` denote the number of levels for :math:`S_{{jb}}`, that is, :math:`L\left(S_{{jb}}\right) = {\textit{levels}}[{\textit{rndm}}\left({3+N_{R_b}+j}, b\right)-1]`;

    then, for :math:`l = 1,2,\ldots,{\textit{nrf}}\times {\textit{nlsv}}`, if :math:`\mathrm{b}[l-1]` contains the parameter estimate for the :math:`i`\ th level of :math:`R_{{jb}}` when :math:`S_{{\textit{k}b}} = s_{\textit{k}}`, for :math:`\textit{k} = 1,2,\ldots,N_{S_b}` and :math:`1\leq s_{\textit{k}}\leq L\left(S_{{jb}}\right)`, i.e., :math:`s_k` is a valid value for the :math:`k`\ th subject variable, then

    .. math::

        \begin{array}{l} \mathrm{estid}[0,l-1] = b\text{, }\quad \\ \mathrm{estid}[1,l-1] = j\text{, }\quad \\ \mathrm{estid}[2,l-1] = i\text{, }\quad \\ \mathrm{estid}[{3+k}-1,l-1] = s_k\text{, }k = 1,2,\ldots,N_{S_b}\text{;} \end{array}

    if the parameter being estimated is for the intercept, then :math:`\mathrm{estid}[1,l-1] = \mathrm{estid}[2,l-1] = 0`;

    if the :math:`j`\ th variable is continuous or binary, that is, :math:`L\left(S_{{jb}}\right) = 1`, then :math:`\mathrm{estid}[2,l-1] = 0`;

    the remaining rows of the :math:`l`\ th column of :math:`\mathrm{estid}` are set to :math:`0`.

    In some situations, certain combinations of variables are never observed. In such circumstances all elements of the :math:`l`\ th column of :math:`\mathrm{estid}` are set to :math:`-999`.

**b** : float, ndarray, shape :math:`\left(\textit{nff}+\textit{nrf}\times \textit{nlsv}\right)`
    The parameter estimates, with the first :math:`{\textit{nrf}}\times {\textit{nlsv}}` elements of :math:`\mathrm{b}` containing the parameter estimates for the random effects, :math:`\nu`, and the remaining :math:`\textit{nff}` elements containing the parameter estimates for the fixed effects, :math:`\beta`. The order of these estimates is described by the :math:`\mathrm{estid}` argument.

**se** : float, ndarray, shape :math:`\left(\textit{nff}+\textit{nrf}\times \textit{nlsv}\right)`
    The standard errors of the parameter estimates given in :math:`\mathrm{b}`.

**czz** : float, ndarray, shape :math:`\left(\textit{nrf}, :\right)`
    If :math:`{\textit{nlsv}} = 1`, then :math:`\mathrm{czz}` holds the lower triangular portion of the matrix :math:`\left(1/\sigma^2\right)\left(Z^\mathrm{T} \hat{R}^{-1}Z + \hat{G}^{-1}\right)`, where :math:`\hat{R}` and :math:`\hat{G}` are the estimates of :math:`R` and :math:`G` respectively.
    If :math:`{\textit{nlsv}} > 1`, then :math:`\mathrm{czz}` holds this matrix in compressed form, with the first :math:`\textit{nrf}` columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next :math:`\textit{nrf}` columns the part corresponding to the second level of the overall subject variable, etc.

**cxx** : float, ndarray, shape :math:`\left(\textit{nff}, :\right)`
    :math:`\mathrm{cxx}` holds the lower triangular portion of the matrix :math:`\left(1/\sigma^2\right)X^\mathrm{T}\hat{V}^{-1}X`, where :math:`\hat{V}` is the estimated value of :math:`V`.

**cxz** : float, ndarray, shape :math:`\left(\textit{nff}, :\right)`
    If :math:`{\textit{nlsv}} = 1`, then :math:`\mathrm{cxz}` holds the matrix :math:`\left(1/\sigma^2\right)\left(X^\mathrm{T}\hat{V}^{-1}Z\right)\hat{G}`, where :math:`\hat{V}` and :math:`\hat{G}` are the estimates of :math:`V` and :math:`G` respectively.

    If :math:`{\textit{nlsv}} > 1`, then :math:`\mathrm{cxz}` holds this matrix in compressed form, with the first :math:`\textit{nrf}` columns holding the part of the matrix corresponding to the first level of the overall subject variable, the next :math:`\textit{nrf}` columns the part corresponding to the second level of the overall subject variable, etc.

.. _g02je-py2-py-errors:

**Raises**

**NagValueError**

(`errno` :math:`1`)
    On entry, :math:`\textit{lvpr} = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`\textit{lvpr} \geq \langle\mathit{\boldsymbol{value}}\rangle`.

(`errno` :math:`2`)
    On entry, :math:`\mathrm{vpr}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`1\leq \mathrm{vpr}[i]\leq \mathrm{nvpr}`.

(`errno` :math:`3`)
    On entry, :math:`\mathrm{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`1\leq \mathrm{nvpr}\leq \langle\mathit{\boldsymbol{value}}\rangle`.

(`errno` :math:`4`)
    On entry, :math:`\mathrm{gamma}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`\mathrm{gamma}[0] = {-1.0}` or :math:`\mathrm{gamma}[i-1] \geq 0.0`.

(`errno` :math:`9`)
    On entry, :math:`\textit{lb} = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`\textit{lb} \geq \langle\mathit{\boldsymbol{value}}\rangle`.

(`errno` :math:`11`)
    On entry, :math:`\textit{ldid} = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`\textit{ldid} \geq \langle\mathit{\boldsymbol{value}}\rangle`.

(`errno` :math:`21`)
    On entry, :math:`\mathrm{comm}`\ ['iopts'] has not been initialized correctly.

(`errno` :math:`32`)
    On entry, at least one value of :math:`i`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nvpr}`, does not appear in :math:`\mathrm{vpr}`.

**Warns**

**NagAlgorithmicWarning**

(`errno` :math:`101`)
    Optimal solution found, but requested accuracy not achieved.

(`errno` :math:`102`)
    Too many major iterations.

(`errno` :math:`103`)
    Current point cannot be improved upon.

(`errno` :math:`104`)
    At least one negative estimate for :math:`\mathrm{gamma}` was obtained. All negative estimates have been set to zero.

.. _g02je-py2-py-notes:

**Notes**

``mixeff_hier_ml`` fits a model of the form:

.. math::

    y = X\beta +Z\nu +\epsilon

.. rst-class:: nag-rules-none nag-align-left

+-----+--------------------------------------------------------------------------------------------+
|where|:math:`y` is a vector of :math:`n` observations on the dependent variable,                  |
+-----+--------------------------------------------------------------------------------------------+
|     |:math:`X` is a known :math:`n\times p` design matrix for the `fixed` independent variables, |
+-----+--------------------------------------------------------------------------------------------+
|     |:math:`\beta` is a vector of length :math:`p` of unknown `fixed effects`,                   |
+-----+--------------------------------------------------------------------------------------------+
|     |:math:`Z` is a known :math:`n\times q` design matrix for the `random` independent variables,|
+-----+--------------------------------------------------------------------------------------------+
|     |:math:`\nu` is a vector of length :math:`q` of unknown `random effects`,                    |
+-----+--------------------------------------------------------------------------------------------+
|and  |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors.                  |
+-----+--------------------------------------------------------------------------------------------+

Both :math:`\nu` and :math:`\epsilon` are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by

.. math::

    \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right]

where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. It is assumed that the random variables, :math:`Z`, can be subdivided into :math:`g\leq q` groups with each group being identically distributed with expectation zero and variance :math:`\sigma_i^2`.
The diagonal elements of matrix :math:`G`, therefore, take one of the values :math:`\left\{\sigma_i^2:i = 1,2,\ldots,g\right\}`, depending on which group the associated random variable belongs to.

The model, therefore, contains three sets of unknowns: the fixed effects :math:`\beta`, the random effects :math:`\nu` and a vector of :math:`g+1` variance components :math:`\gamma`, where :math:`\gamma = \left\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{{g-1}}^2, \sigma_g^2, \sigma_R^2\right\}`. Rather than working directly with :math:`\gamma`, ``mixeff_hier_ml`` uses an iterative process to estimate :math:`\gamma^* = \left\{{\sigma_1^2/\sigma_R^2}, {\sigma_2^2/\sigma_R^2}, \ldots, {\sigma_{{g-1}}^2/\sigma_R^2}, {\sigma_g^2/\sigma_R^2}, 1\right\}`. Due to the iterative nature of the estimation, a set of initial values, :math:`\gamma_0`, for :math:`\gamma^*` is required. ``mixeff_hier_ml`` allows these initial values either to be supplied by you or to be calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972).

``mixeff_hier_ml`` fits the model by maximizing the log-likelihood function:

.. math::

    -2l_R = \log\left(\left\lvert V\right\rvert \right)+n\log\left(r^\mathrm{T}V^{-1}r\right)+\log\left(2\pi /n\right)

where

.. math::

    V = ZGZ^\mathrm{T}+R\text{, }\quad r = y-Xb\quad \text{ and }\quad b = \left(X^\mathrm{T}V^{-1}X\right)^{-1}X^\mathrm{T}V^{-1}y\text{.}

Once the final estimates for :math:`\gamma^*` have been obtained, the value of :math:`\sigma_R^2` is given by

.. math::

    \sigma_R^2 = \left(r^\mathrm{T}V^{-1}r\right)/\left(n-p\right)\text{.}

Case weights, :math:`W_c`, can be incorporated into the model by replacing :math:`X^\mathrm{T}X` and :math:`Z^\mathrm{T}Z` with :math:`X^\mathrm{T}W_cX` and :math:`Z^\mathrm{T}W_cZ` respectively, for a diagonal weight matrix :math:`W_c`.

The log-likelihood, :math:`l_R`, is calculated using the sweep algorithm detailed in Wolfinger `et al.` (1994).

.. _g02je-py2-py-references:

**References**

Goodnight, J H, 1979, `A tutorial on the SWEEP operator`, The American Statistician (33(3)), 149--158

Harville, D A, 1977, `Maximum likelihood approaches to variance component estimation and to related problems`, J. Am. Stat. Assoc. (72), 320--340

Rao, C R, 1972, `Estimation of variance and covariance components in a linear model`, J. Am. Stat. Assoc. (67), 112--115

Stroup, W W, 1989, `Predictable functions and prediction space in the mixed model procedure`, Applications of Mixed Models in Agriculture and Related Disciplines (Southern Cooperative Series Bulletin No. 343), 39--48

Wolfinger, R, Tobias, R and Sall, J, 1994, `Computing Gaussian likelihoods and their derivatives for general linear mixed models`, SIAM J. Sci. Statist. Comput. (15), 1294--1310
"""
raise NotImplementedError
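The residual-variance formula in the notes above, :math:`\sigma_R^2 = \left(r^\mathrm{T}V^{-1}r\right)/\left(n-p\right)`, can be illustrated with a minimal self-contained sketch. This is not the NAG implementation and the helper names below are hypothetical; it treats only the degenerate case of no random effects, where :math:`V` reduces to the identity matrix and the fixed part is a single intercept (:math:`p = 1`):

```python
# Illustrative sketch only (not the NAG algorithm, which sweeps the full
# mixed-model equations): sigma_R^2 = (r^T V^{-1} r) / (n - p) in the
# special case V = I, so V^{-1} r is simply the residual vector r.

def fit_intercept_model(y):
    """Least-squares fit of y = beta0 + eps; X is a column of ones,
    so the single fixed-effect estimate beta0 is the sample mean."""
    return sum(y) / len(y)

def residual_variance(y, fitted, p):
    """sigma_R^2 = (r^T r) / (n - p) when V is the identity matrix."""
    r = [yi - fi for yi, fi in zip(y, fitted)]
    return sum(ri * ri for ri in r) / (len(y) - p)

y = [3.0, 5.0, 4.0, 8.0]
b = fit_intercept_model(y)                      # b = 5.0
s2 = residual_variance(y, [b] * len(y), p=1)    # (4 + 0 + 1 + 9) / 3
print(b, s2)                                    # -> 5.0 4.666666666666667
```

With random effects present, :math:`V = ZGZ^\mathrm{T}+R` is no longer the identity and :math:`r^\mathrm{T}V^{-1}r` requires the full generalized least-squares machinery described above.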
def lmm_init(hlmm, hddesc, hfixed, y, dat, hrndm=None, wt=None):
    r"""
``lmm_init`` preprocesses a dataset prior to fitting a linear mixed effects regression model via :meth:`lmm_fit`.

Note: this function uses optional algorithmic parameters, see also: :meth:`blgm.optset <naginterfaces.library.blgm.optset>`, :meth:`blgm.optget <naginterfaces.library.blgm.optget>`, :meth:`lmm_fit`.

.. _g02jf-py2-py-doc:

For full information please refer to the NAG Library document for g02jf

https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jff.html

.. _g02jf-py2-py-parameters:

**Parameters**

**hlmm** : Handle, modified in place
    `On entry`: must be set to a null Handle or, alternatively, an existing G22 handle may be supplied, in which case ``lmm_init`` will destroy the supplied G22 handle as if :meth:`blgm.handle_free <naginterfaces.library.blgm.handle_free>` had been called.

    `On exit`: holds a G22 handle to the internal data structure containing a description of the model. You **must not** change the G22 handle other than through the functions in submodule ``correg`` or submodule :mod:`~naginterfaces.library.blgm`.

**hddesc** : Handle
    A G22 handle to the internal data structure containing a description of the data matrix, :math:`D`, as returned in :math:`\textit{hddesc}` by :meth:`blgm.lm_describe_data <naginterfaces.library.blgm.lm_describe_data>`.

**hfixed** : Handle
    A G22 handle to the internal data structure containing a description of the fixed part of the model, :math:`\mathcal{M}_f`, as returned in :math:`\textit{hform}` by :meth:`blgm.lm_formula <naginterfaces.library.blgm.lm_formula>`.

    If :math:`\mathrm{hfixed}` is a null Handle, then the model is assumed not to have a fixed part.

**y** : float, array-like, shape :math:`\left(n\right)`
    :math:`y`, the vector of observations on the dependent variable.

**dat** : float, array-like, shape :math:`\left(:, :\right)`
    The data matrix, :math:`D`.

    By default, :math:`D_{{ij}}`, the :math:`\textit{i}`\ th value for the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m_d`, for :math:`\textit{i} = 1,2,\ldots,n`, should be supplied in :math:`\mathrm{dat}[i-1,j-1]`.

    If the option 'Storage Order', described in :meth:`blgm.lm_describe_data <naginterfaces.library.blgm.lm_describe_data>`, is set to 'VAROBS', :math:`D_{{ij}}` should instead be supplied in :math:`\mathrm{dat}[j-1,i-1]`.

    If :math:`y_i`, :math:`w_i` or :math:`D_{{ij}}`, for a variable :math:`j` used in the model, is NaN (Not A Number), then that value is treated as missing and the whole observation is excluded from the analysis.

**hrndm** : None or Handle, list, shape :math:`\left(\textit{nrndm}\right)`, optional
    A series of G22 handles to internal data structures containing a description of the random part of the model, :math:`\mathcal{M}_r`, as returned in :math:`\textit{hform}` by :meth:`blgm.lm_formula <naginterfaces.library.blgm.lm_formula>`.

**wt** : None or float, array-like, shape :math:`\left(n\right)`, optional
    Optionally, the diagonal elements of the weight matrix :math:`W_c`.

    If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model and the effective number of observations is the number of observations with nonzero weights.

    If weights are not provided, then :math:`\mathrm{wt}` must be set to **None**, in which case the effective number of observations is :math:`n`.

**Returns**

**fnlsv** : int
    The number of levels for the overall subject variable in :math:`\mathcal{M}_f`. If there is no overall subject variable, :math:`\mathrm{fnlsv} = 1`.

**nff** : int
    The number of fixed effects estimated in each of the :math:`\mathrm{fnlsv}` subject blocks. The number of columns, :math:`p`, in the design matrix :math:`X` is given by :math:`p = \mathrm{nff}\times \mathrm{fnlsv}`.

**rnlsv** : int
    The number of levels for the overall subject variable in :math:`\mathcal{M}_r`.
    If there is no overall subject variable, :math:`\mathrm{rnlsv} = 1`.

**nrf** : int
    The number of random effects estimated in each of the :math:`\mathrm{rnlsv}` subject blocks. The number of columns, :math:`q`, in the design matrix :math:`Z` is given by :math:`q = \mathrm{nrf}\times \mathrm{rnlsv}`.

**nvpr** : int
    :math:`g`, the number of variance components being estimated (excluding the overall variance, :math:`\sigma_R^2`). This is defined by the number of terms in the random part of the model, :math:`\mathcal{M}_r` (see `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jff.html#algdetails>`__ for details).

**comm** : dict, communication object
    Communication structure.

.. _g02jf-py2-py-other_params:

**Other Parameters**

**'Gamma Lower Bound'** : float
    Default :math:`\text{} = \sqrt{\text{machine precision}}/100`

    A lower bound for the elements of :math:`\gamma^*`, where :math:`\gamma^* = \gamma /\sigma_R^2`.

**'Gamma Upper Bound'** : float
    Default :math:`\text{} = 10^{20}`

    An upper bound for the elements of :math:`\gamma^*`, where :math:`\gamma^* = \gamma /\sigma_R^2`.

**'Initial Distance'** : float
    Default :math:`\text{} = 100000.0`

    The initial distance from the solution.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, :meth:`lmm_fit` passes 'Initial Distance' to the solver as :math:`\textit{stepmx}`.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, this option is ignored.

**'Initial Value Strategy'** : int
    Default :math:`\text{} = \text{special}`

    Controls how :meth:`lmm_fit` will choose the initial values for the variance components, :math:`\gamma`, if they are not supplied.

    :math:`\text{'Initial Value Strategy'} = 0`
        The MIVQUE0 estimates of the variance components based on the likelihood specified by 'Likelihood' are used.

    :math:`\text{'Initial Value Strategy'} = 1`
        The MIVQUE0 estimates based on the maximum likelihood are used, irrespective of the value of 'Likelihood'.

    See Rao (1972) for a description of the minimum variance quadratic unbiased estimators (MIVQUE0). By default, for small problems, :math:`\text{'Initial Value Strategy'} = 0` and, for large problems, :math:`\text{'Initial Value Strategy'} = 1`.

**'Likelihood'** : str
    Default :math:`\text{} = \texttt{'REML'}`

    'Likelihood' defines whether :meth:`lmm_fit` will use the restricted maximum likelihood (REML) or the maximum likelihood (ML) when fitting the model.

**'Linear Minimization Accuracy'** : float
    Default :math:`\text{} = 0.9`

    The accuracy of the linear minimizations.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, :meth:`lmm_fit` passes 'Linear Minimization Accuracy' to the solver as :math:`\textit{eta}`.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, this option is ignored.

**'Line Search Tolerance'** : float
    Default :math:`\text{} = 0.9`

    The line search tolerance.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, this option is ignored.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, :meth:`lmm_fit` passes 'Line Search Tolerance' to the solver as 'Line Search Tolerance'.

**'List'** : valueless
    Option 'List' enables printing of each option specification as it is supplied. 'NoList' suppresses this printing.

**'NoList'** : valueless
    Default

    Option 'List' enables printing of each option specification as it is supplied. 'NoList' suppresses this printing.

**'Major Iteration Limit'** : int
    Default :math:`\text{} = \text{special}`

    The number of major iterations.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, :meth:`lmm_fit` passes 'Major Iteration Limit' to the solver as :math:`\textit{maxcal}`. In this case, the default value used is :math:`1000`.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, :meth:`lmm_fit` passes 'Major Iteration Limit' to the solver as 'Major Iteration Limit'. In this case, the default value used is :math:`\mathrm{max}\left(50,3\times g\right)`, where :math:`g` is the number of variance components being estimated (excluding the overall variance, :math:`\sigma_R^2`).

**'Major Print Level'** : int
    Default :math:`\text{} = \text{special}`

    The frequency with which monitoring information is output to 'Unit Number'.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, :meth:`lmm_fit` passes 'Major Print Level' to the solver as :math:`\textit{iprint}`. In this case, the default value used is :math:`-1` and hence no monitoring information will be output.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, :meth:`lmm_fit` passes 'Major Print Level' to the solver as 'Major Print Level'. In this case, the default value used is :math:`0` and hence no monitoring information will be output.

**'Maximum Number of Threads'** : int
    Default :math:`\text{} = \text{special}`

    Controls the maximum number of threads used by :meth:`lmm_fit` in a multithreaded library. By default, the maximum number of available threads is used. In a library that is not multithreaded, this option has no effect.

**'Minor Iteration Limit'** : int
    Default :math:`\text{} = \mathrm{max}\left(50,3\times g\right)`

    The number of minor iterations.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, this option is ignored.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, :meth:`lmm_fit` passes 'Minor Iteration Limit' to the solver as 'Minor Iteration Limit'. In this case, the default value used is :math:`\mathrm{max}\left(50,3\times g\right)`, where :math:`g` is the number of variance components being estimated (excluding the overall variance, :math:`\sigma_R^2`).

**'Minor Print Level'** : int
    Default :math:`\text{} = 0`

    The frequency with which additional monitoring information is output to 'Unit Number'.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, this option is ignored.
    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, :meth:`lmm_fit` passes 'Minor Print Level' to the solver as 'Minor Print Level'. The default value of :math:`0` means that no additional monitoring information will be output.

**'Optimality Tolerance'** : float
    Default :math:`\text{} = \text{machine precision}^{0.72}`

    The optimality tolerance.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, this option is ignored.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, :meth:`lmm_fit` passes 'Optimality Tolerance' to the solver as 'Optimality Tolerance'.

**'Parallelisation Strategy'** : int
    Default :math:`\text{} = \text{special}`

    If :math:`\text{'Maximum Number of Threads'} > 0`, then 'Parallelisation Strategy' controls how :meth:`lmm_fit` is parallelised in a multithreaded library.

    :math:`\text{'Parallelisation Strategy'} = 1`
        :meth:`lmm_fit` will attempt to parallelise operations involving :math:`Z`, even if :math:`\mathrm{rnlsv} = 1`.

    :math:`\text{'Parallelisation Strategy'} = 2`
        :meth:`lmm_fit` will only attempt to parallelise operations involving :math:`Z` if :math:`\mathrm{rnlsv} > 1`.

    By default, :math:`\text{'Parallelisation Strategy'} = 1`; however, for some models / datasets, this may be slower than using :math:`\text{'Parallelisation Strategy'} = 2` when :math:`\mathrm{rnlsv} = 1`. In a library that is not multithreaded, this option has no effect.

**'Solution Accuracy'** : float
    Default :math:`\text{} = 0.0`

    The accuracy to which the solution is required.

    When :math:`\text{'Solver'} = \texttt{'E04LB'}`, :meth:`lmm_fit` passes 'Solution Accuracy' to the solver as :math:`\textit{xtol}`.

    When :math:`\text{'Solver'} = \texttt{'E04UC'}`, this option is ignored.

**'Solver'** : str
    Default :math:`\text{} = \text{special}`

    Controls which solver :meth:`lmm_fit` will use when fitting the model. By default, :math:`\text{'Solver'} = \texttt{'E04LB'}` is used for small problems and :math:`\text{'Solver'} = \texttt{'E04UC'}` otherwise.

    If :math:`\text{'Solver'} = \texttt{'E04LB'}`, then the solver used is the one implemented in :meth:`opt.bounds_mod_deriv2_comp <naginterfaces.library.opt.bounds_mod_deriv2_comp>`; if :math:`\text{'Solver'} = \texttt{'E04UC'}`, then the solver used is the one implemented in :meth:`opt.nlp1_solve <naginterfaces.library.opt.nlp1_solve>`.

**'Sweep Tolerance'** : float
    Default :math:`\text{} = \text{special}`

    The sweep tolerance used by :meth:`lmm_fit` when performing the sweep operation (see Wolfinger `et al.` (1994)). The default value used is :math:`\text{'Sweep Tolerance'} = \mathrm{max}\left(\epsilon,\epsilon \times \mathrm{max}_i\left(\left(Z^\mathrm{T}Z\right)_{{ii}}\right)\right)`, where :math:`\epsilon = \sqrt{\text{machine precision}}`.

**'Unit Number'** : int
    Default :math:`= \text{advisory message unit number}`

    The monitoring unit number to which :meth:`lmm_fit` will send any monitoring information.

.. _g02jf-py2-py-errors:

**Raises**

**NagValueError**

(`errno` :math:`11`)
    On entry, :math:`\mathrm{hlmm}` is not a null Handle or a recognised G22 handle.

(`errno` :math:`21`)
    :math:`\mathrm{hddesc}` has not been initialized or is corrupt.

(`errno` :math:`22`)
    :math:`\mathrm{hddesc}` is not a G22 handle as generated by :meth:`blgm.lm_describe_data <naginterfaces.library.blgm.lm_describe_data>`.

(`errno` :math:`31`)
    :math:`\mathrm{hfixed}` has not been initialized or is corrupt.

(`errno` :math:`32`)
    :math:`\mathrm{hfixed}` is not a G22 handle as generated by :meth:`blgm.lm_formula <naginterfaces.library.blgm.lm_formula>`.

(`errno` :math:`33`)
    A variable name used when creating :math:`\mathrm{hfixed}` is not present in :math:`\mathrm{hddesc}`.
    Variable name: :math:`\langle\mathit{\boldsymbol{value}}\rangle`.

(`errno` :math:`41`)
    On entry, :math:`\textit{nrndm} = \langle\mathit{\boldsymbol{value}}\rangle`.
    Constraint: :math:`\textit{nrndm} \geq 0`.

(`errno` :math:`51`)
    :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`.
:math:`\mathrm{hrndm}[i-1]` has not been initialized or is corrupt. (`errno` :math:`52`) :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`. :math:`\mathrm{hrndm}[i-1]` is not a G22 handle as generated by :meth:`blgm.lm_formula <naginterfaces.library.blgm.lm_formula>`. (`errno` :math:`53`) No model has been specified. (`errno` :math:`54`) A variable name used when creating :math:`\mathrm{hrndm}` is not present in :math:`\mathrm{hddesc}`. Variable name: :math:`\langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`61`) On entry, :math:`\textit{lwt} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lwt} = 0` or :math:`n`. (`errno` :math:`71`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`72`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n_d = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\leq n_d`. (`errno` :math:`73`) On entry, no observations due to zero weights or missing values. (`errno` :math:`91`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{wt}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`. (`errno` :math:`101`) On entry, column :math:`j` of the data matrix, :math:`D`, is not consistent with information supplied in :math:`\mathrm{hddesc}`, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`111`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{lddat} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lddat}\geq n`. (`errno` :math:`112`) On entry, :math:`m_d = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{lddat} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lddat}\geq m_d`. 
(`errno` :math:`121`) On entry, :math:`m_d = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{sddat} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{sddat}\geq m_d`. (`errno` :math:`122`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{sddat} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{sddat}\geq n`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`34`) The fixed part of the model contains categorical variables, but no intercept or main effects terms have been requested. (`errno` :math:`102`) Column :math:`j` of the data matrix, :math:`D`, required rounding more than expected when being treated as a categorical variable, :math:`j = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g02jf-py2-py-notes: **Notes** ``lmm_init`` must be called prior to fitting a linear mixed effects regression model via :meth:`lmm_fit`. The model is of the form: .. math:: y = X\beta +Z\nu +\epsilon .. rst-class:: nag-rules-none nag-align-left +-----+----------------------------------------------------------------------------------+ |where|:math:`y` is a vector of :math:`n` observations on the dependent variable, | +-----+----------------------------------------------------------------------------------+ | |:math:`X` is an :math:`n\times p` design matrix of `fixed` independent variables, | +-----+----------------------------------------------------------------------------------+ | |:math:`\beta` is a vector of :math:`p` unknown `fixed effects`, | +-----+----------------------------------------------------------------------------------+ | |:math:`Z` is an :math:`n\times q` design matrix of `random` independent variables,| +-----+----------------------------------------------------------------------------------+ | |:math:`\nu` is a vector of length :math:`q` of unknown `random effects`, | +-----+----------------------------------------------------------------------------------+ | 
|:math:`\epsilon` is a vector of length :math:`n` of unknown random errors. | +-----+----------------------------------------------------------------------------------+ Both :math:`\nu` and :math:`\epsilon` are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by .. math:: \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right] where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. It is assumed that the random variables, :math:`Z`, can be subdivided into :math:`g\leq q` groups with each group being identically distributed with expectation zero and variance :math:`\sigma_i^2`. The diagonal elements of matrix :math:`G`, therefore, take one of the values :math:`\left\{\sigma_i^2:i = 1,2,\ldots,g\right\}`, depending on which group the associated random variable belongs to. The model, therefore, contains three sets of unknowns: the fixed effects :math:`\beta`, the random effects :math:`\nu` and a vector of :math:`g+1` variance components :math:`\gamma`, where :math:`\gamma = \left\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{{g-1}}^2, \sigma_g^2, \sigma_R^2\right\}`. Case weights can be incorporated into the model by replacing :math:`X` and :math:`Z` with :math:`W_c^{{1/2}}X` and :math:`W_c^{{1/2}}Z` respectively where :math:`W_c` is a diagonal weight matrix. The design matrices, :math:`X` and :math:`Z`, are constructed from an :math:`n\times m_d` data matrix, :math:`D`, a description of the fixed independent variables, :math:`\mathcal{M}_f`, and a description of the random independent variables, :math:`\mathcal{M}_r`. See `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jff.html#algdetails>`__ for further details. .. _g02jf-py2-py-references: **References** Rao, C R, 1972, `Estimation of variance and covariance components in a linear model`, J. Am. Stat. Assoc. 
(67), 112--115 Wolfinger, R, Tobias, R and Sall, J, 1994, `Computing Gaussian likelihoods and their derivatives for general linear mixed models`, SIAM Sci. Statist. Comput. (15), 1294--1310 See Also -------- :meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main` """ raise NotImplementedError
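As a small illustration of the model structure described in the Notes above, the quantities in :math:`y = X\beta + Z\nu + \epsilon` with a diagonal :math:`G` can be simulated directly. All dimensions, variances and group assignments below are invented for the sketch and are not part of the ``correg`` API:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, g = 100, 3, 4, 2      # observations, fixed effects, random effects, groups

X = rng.standard_normal((n, p))            # fixed-effects design matrix
Z = rng.standard_normal((n, q))            # random-effects design matrix
beta = np.array([1.0, -0.5, 2.0])          # fixed effects

# G is diagonal: each random effect takes the variance of its group, so
# diag(G) draws its values from {sigma_1^2, ..., sigma_g^2}.
sigma2 = np.array([0.5, 2.0])              # per-group variances sigma_i^2
group = np.array([0, 0, 1, 1])             # group membership of the q effects
G = np.diag(sigma2[group])
sigma2_R = 1.0                             # residual variance, R = sigma_R^2 I

nu = rng.multivariate_normal(np.zeros(q), G)       # random effects
eps = rng.normal(0.0, np.sqrt(sigma2_R), size=n)   # random errors
y = X @ beta + Z @ nu + eps
```

A dataset generated this way has :math:`g+1 = 3` variance components :math:`\left\{\sigma_1^2, \sigma_2^2, \sigma_R^2\right\}`, which is the vector :math:`\gamma` that :meth:`lmm_fit` estimates.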
[docs]def lmm_init_combine(hlmm, xcomm, ycomm): r""" ``lmm_init_combine`` combines output from multiple calls to :meth:`lmm_init` or :meth:`mixeff_hier_init`. .. _g02jg-py2-py-doc: For full information please refer to the NAG Library document for g02jg https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jgf.html .. _g02jg-py2-py-parameters: **Parameters** **hlmm** : Handle If the two sets of communication arrays were generated using :meth:`lmm_init`, then a G22 handle as generated by one of the calls to :meth:`lmm_init`. If the two sets of communication arrays were generated using :meth:`mixeff_hier_init`, then this argument is not referenced and need not be set. **xcomm** : dict, communication object, modified in place Communication structure for :math:`D_x`. This argument must have been initialized by a prior call to :meth:`lmm_init` or :meth:`mixeff_hier_init`. **ycomm** : dict, communication object Communication structure for :math:`D_y`. This argument must have been initialized by a prior call to :meth:`lmm_init` or :meth:`mixeff_hier_init`. .. _g02jg-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) :math:`\mathrm{hlmm}` has not been initialized or is corrupt. (`errno` :math:`12`) :math:`\mathrm{hlmm}` is not a G22 handle as generated by :meth:`lmm_init`. (`errno` :math:`41`) On entry, :math:`\mathrm{xcomm}`\ ['iopts'] has not been initialized correctly. (`errno` :math:`42`) On entry, the information stored in :math:`\mathrm{xcomm}`\ ['iopts'] and :math:`\mathrm{ycomm}`\ ['iopts'] is not compatible. (`errno` :math:`71`) On entry, :math:`\mathrm{ycomm}`\ ['iopts'] has not been initialized correctly. .. _g02jg-py2-py-notes: **Notes** Let :math:`D_x` and :math:`D_y` denote two sets of data, each with :math:`m` variables and :math:`n_x` and :math:`n_y` observations respectively. 
Let :math:`C_x` and :math:`C_y` denote two sets of communication arrays constructed by :meth:`lmm_init` or :meth:`mixeff_hier_init` from datasets :math:`D_x` and :math:`D_y` respectively. Then, given :math:`C_x` and :math:`C_y`, ``lmm_init_combine`` constructs a set of communication arrays, :math:`C_z`, as if a dataset :math:`D_z`, with :math:`m` variables and :math:`n_x+n_y` observations, were supplied to :meth:`lmm_init` or :meth:`mixeff_hier_init`, with :math:`D_z` constructed as .. math:: D_z = \left(\begin{array}{c}D_x\\D_y\end{array}\right)\text{.} Splitting, and then recombining, the data in this manner allows for datasets with an arbitrarily large number of observations (:math:`n`) to be analysed and the preprocessing routines, :meth:`lmm_init` or :meth:`mixeff_hier_init`, to be run in parallel. It should be noted that, while the information in :math:`C_z` should be consistent with the information in the communication arrays obtained by supplying :math:`D_z` to :meth:`lmm_init` or :meth:`mixeff_hier_init`, the ordering of that information may change. In practice, this means that whilst an analysis run using a set of communication arrays constructed using ``lmm_init_combine`` should give similar results to an analysis run using a set of communication arrays constructed directly from :meth:`lmm_init` or :meth:`mixeff_hier_init`, they will not necessarily be identical. In addition, the order of the parameter estimates, :math:`\nu` and :math:`\beta`, may differ. See Also -------- :meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main` """ raise NotImplementedError
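A minimal sketch of the row-wise split that ``lmm_init_combine`` is designed around. The arrays below are invented illustrative data, and the merge of per-chunk means merely illustrates why preprocessing performed independently on each chunk can be recombined exactly:

```python
import numpy as np

# Two chunks of a dataset sharing the same m = 3 variables.
D_x = np.arange(12.0).reshape(4, 3)        # n_x = 4 observations
D_y = np.arange(12.0, 21.0).reshape(3, 3)  # n_y = 3 observations

# D_z is the row-wise concatenation of the chunks.
D_z = np.vstack([D_x, D_y])                # n_x + n_y = 7 observations

# Sufficient statistics accumulated per chunk merge exactly, which is
# what lets the preprocessing step run on the chunks in parallel.
n_x, n_y = len(D_x), len(D_y)
mean_z = (n_x * D_x.mean(axis=0) + n_y * D_y.mean(axis=0)) / (n_x + n_y)
```

The same merge principle applies to the cross-product accumulations held in the communication arrays, although, as noted above, the ordering of the merged information may differ from a single-pass run.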
[docs]def lmm_fit(hlmm, gamma, comm, wantc=False, io_manager=None): r""" ``lmm_fit`` fits a multi-level linear mixed effects regression model using restricted maximum likelihood (REML) or maximum likelihood (ML). Prior to calling ``lmm_fit`` the initialization function :meth:`lmm_init` must be called. .. _g02jh-py2-py-doc: For full information please refer to the NAG Library document for g02jh https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02jhf.html .. _g02jh-py2-py-parameters: **Parameters** **hlmm** : Handle A G22 handle to the internal data structure containing a description of the required model as returned in :math:`\textit{hlmm}` by :meth:`lmm_init`. **gamma** : float, array-like, shape :math:`\left(\textit{nvpr}+1\right)` Holds the initial values of the variance components, :math:`\gamma_0`, with :math:`\mathrm{gamma}[\textit{i}-1]` the initial value for :math:`\sigma_{\textit{i}}^2/\sigma_R^2`, for :math:`\textit{i} = 1,2,\ldots,\textit{nvpr}`. If :math:`\mathrm{gamma}[0] = {-1.0}`, the remaining elements of :math:`\mathrm{gamma}` are ignored and the initial values for the variance components are estimated from the data using MIVQUE0. **comm** : dict, communication object, modified in place Communication structure. This argument must have been initialized by a prior call to :meth:`lmm_init`. **wantc** : bool, optional Flag indicating whether the arrays :math:`\mathrm{czz}`, :math:`\mathrm{cxx}` and :math:`\mathrm{cxz}` are required. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **gamma** : float, ndarray, shape :math:`\left(\textit{nvpr}+1\right)` :math:`\mathrm{gamma}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{nvpr}`, holds the final estimate of :math:`\sigma_{\textit{i}}^2` and :math:`\mathrm{gamma}[\textit{nvpr}]` holds the final estimate for :math:`\sigma_R^2`. Labels for the variance components can be obtained using :meth:`blgm.lm_submodel <naginterfaces.library.blgm.lm_submodel>`. 
**effn** : int Effective number of observations. If there are no weights, or all weights are nonzero, :math:`\mathrm{effn} = {\textit{n}}`. **rnkx** : int The rank of the design matrix, :math:`X`, for the fixed effects. **ncov** : int Number of variance components not estimated to be zero. If none of the variance components are estimated to be zero, :math:`\mathrm{ncov} = \textit{nvpr}`. **lnlike** : float :math:`-2l_R\left(\hat{\gamma }\right)` where :math:`l_R` is the log of the restricted maximum likelihood calculated at :math:`\hat{\gamma }`, the estimated variance components returned in :math:`\mathrm{gamma}`. **b** : float, ndarray, shape :math:`\left(\textit{fnlsv}\times \textit{nff}+\textit{rnlsv}\times \textit{nrf}\right)` The parameter estimates, with the first :math:`{\textit{nrf}}\times {\textit{rnlsv}}` elements of :math:`\mathrm{b}` containing the parameter estimates for the random effects, :math:`\nu`, and the remaining :math:`{\textit{nff}}\times {\textit{fnlsv}}` elements containing the parameter estimates for the fixed effects, :math:`\beta`. Labels for the parameter estimates can be obtained using :meth:`blgm.lm_submodel <naginterfaces.library.blgm.lm_submodel>`. **se** : float, ndarray, shape :math:`\left(\textit{fnlsv}\times \textit{nff}+\textit{rnlsv}\times \textit{nrf}\right)` The standard errors of the parameter estimates given in :math:`\mathrm{b}`. **czz** : None or float, ndarray, shape :math:`\left(:, :\right)` If :math:`{\textit{rnlsv}} = 1`, :math:`\mathrm{czz}` holds the lower triangular portion of the matrix :math:`\left(1/\sigma^2\right)\left(Z^\mathrm{T} \hat{R}^{-1}Z + \hat{G}^{-1}\right)`, where :math:`\hat{R}` and :math:`\hat{G}` are the estimates of :math:`R` and :math:`G` respectively. 
If :math:`{\textit{rnlsv}} > 1`, then :math:`\mathrm{czz}` holds this matrix in compressed form, with the first :math:`\textit{nrf}` columns holding the part of the matrix corresponding to the first level of the overall random subject variable, the next :math:`\textit{nrf}` columns holding the part of the matrix corresponding to the second level of the overall random subject variable etc. If :math:`\mathrm{wantc} = \mathbf{False}`, :math:`\mathrm{czz}` is returned as **None**. **cxx** : None or float, ndarray, shape :math:`\left(:, :\right)` If :math:`{\textit{fnlsv}} = 1`, :math:`\mathrm{cxx}` holds the lower triangular portion of the matrix :math:`\left(1/\sigma^2\right)\left(X^\mathrm{T} \hat{V}^{-1}X\right)`, where :math:`\hat{V}` is the estimated value of :math:`V`. If :math:`{\textit{fnlsv}} > 1`, then :math:`\mathrm{cxx}` holds this matrix in compressed form, with the first :math:`\textit{nff}` columns holding the part of the matrix corresponding to the first level of the overall fixed subject variable, the next :math:`\textit{nff}` columns holding the part of the matrix corresponding to the second level of the overall fixed subject variable, etc. If :math:`\mathrm{wantc} = \mathbf{False}`, :math:`\mathrm{cxx}` is returned as **None**. **cxz** : None or float, ndarray, shape :math:`\left(:, :\right)` :math:`\mathrm{cxz}` holds the matrix :math:`\left(1/\sigma^2\right)\left(X^\mathrm{T}\hat{V}^{-1}Z\right)\hat{G}`, where :math:`\hat{V}` and :math:`\hat{G}` are the estimates of :math:`V` and :math:`G` respectively. If :math:`\mathrm{wantc} = \mathbf{False}`, :math:`\mathrm{cxz}` is returned as **None**. .. _g02jh-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) :math:`\mathrm{hlmm}` has not been initialized or is corrupt. (`errno` :math:`12`) :math:`\mathrm{hlmm}` is not a G22 handle as generated by :meth:`lmm_init`. (`errno` :math:`21`) On entry, :math:`\textit{nvpr} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\textit{nvpr} \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`31`) On entry, :math:`\mathrm{gamma}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{gamma}[0] = {-1.0}` or :math:`\mathrm{gamma}[i-1] \geq 0.0`. (`errno` :math:`81`) On entry, :math:`\textit{lb} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lb} \geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`171`) On entry, the communication arrays, :math:`\mathrm{comm}`\ ['iopts'] and :math:`\mathrm{comm}`\ ['opts'], have not been initialized correctly. **Warns** **NagAlgorithmicWarning** (`errno` :math:`1001`) Optimal solution found, but requested accuracy not achieved. (`errno` :math:`1002`) Too many major iterations. (`errno` :math:`1003`) Current point cannot be improved upon. (`errno` :math:`1004`) At least one negative estimate for :math:`\mathrm{gamma}` was obtained. All negative estimates have been set to zero. .. _g02jh-py2-py-notes: **Notes** ``lmm_fit`` fits a model of the form: .. math:: y = X\beta +Z\nu +\epsilon .. 
rst-class:: nag-rules-none nag-align-left +-----+--------------------------------------------------------------------------------------------+ |where|:math:`y` is a vector of :math:`n` observations on the dependent variable, | +-----+--------------------------------------------------------------------------------------------+ | |:math:`X` is a known :math:`n\times p` design matrix for the `fixed` independent variables, | +-----+--------------------------------------------------------------------------------------------+ | |:math:`\beta` is a vector of length :math:`p` of unknown `fixed effects`, | +-----+--------------------------------------------------------------------------------------------+ | |:math:`Z` is a known :math:`n\times q` design matrix for the `random` independent variables,| +-----+--------------------------------------------------------------------------------------------+ | |:math:`\nu` is a vector of length :math:`q` of unknown `random effects`, | +-----+--------------------------------------------------------------------------------------------+ |and |:math:`\epsilon` is a vector of length :math:`n` of unknown random errors. | +-----+--------------------------------------------------------------------------------------------+ Both :math:`\nu` and :math:`\epsilon` are assumed to have a Gaussian distribution with expectation zero and variance/covariance matrix defined by .. math:: \mathrm{Var}\left[\begin{array}{c}\nu \\\epsilon \end{array}\right] = \left[\begin{array}{cc}G&0\\0&R\end{array}\right] where :math:`R = \sigma_R^2I`, :math:`I` is the :math:`n\times n` identity matrix and :math:`G` is a diagonal matrix. It is assumed that the random variables, :math:`Z`, can be subdivided into :math:`g\leq q` groups with each group being identically distributed with expectation zero and variance :math:`\sigma_i^2`. 
The diagonal elements of matrix :math:`G`, therefore, take one of the values :math:`\left\{\sigma_i^2:i = 1,2,\ldots,g\right\}`, depending on which group the associated random variable belongs to. The model, therefore, contains three sets of unknowns: the fixed effects :math:`\beta`, the random effects :math:`\nu` and a vector of :math:`g+1` variance components :math:`\gamma`, where :math:`\gamma = \left\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_{{g-1}}^2, \sigma_g^2, \sigma_R^2\right\}`. Rather than working directly with :math:`\gamma`, ``lmm_fit`` uses an iterative process to estimate :math:`\gamma^* = \left\{{\sigma_1^2/\sigma_R^2}, {\sigma_2^2/\sigma_R^2}, \ldots, {\sigma_{{g-1}}^2/\sigma_R^2}, {\sigma_g^2/\sigma_R^2}, 1\right\}`. Due to the iterative nature of the estimation, a set of initial values, :math:`\gamma_0`, for :math:`\gamma^*` is required. ``lmm_fit`` allows these initial values either to be supplied by you or calculated from the data using the minimum variance quadratic unbiased estimators (MIVQUE0) suggested by Rao (1972). ``lmm_fit`` fits the model by maximizing the restricted log-likelihood function: .. math:: -2l_R = \log\left(\left\lvert V\right\rvert \right)+\left(n-p\right)\log\left(r^\mathrm{T}V^{-1}r\right)+\log\left(\left\lvert X^\mathrm{T}V^{-1}X\right\rvert \right)+\left(n-p\right)\left(1+\log\left(2\pi /\left(n-p\right)\right)\right) or the log-likelihood function: .. math:: -2l = \log\left(\left\lvert V\right\rvert \right)+n\log\left(r^\mathrm{T}V^{-1}r\right)+\log\left(2\pi /n\right) where .. math:: V = ZGZ^\mathrm{T}+R\text{, }\quad r = y-Xb\quad \text{ and }\quad b = \left(X^\mathrm{T}V^{-1}X\right)^{-1}X^\mathrm{T}V^{-1}y\text{.} By default the restricted log-likelihood function is used; the log-likelihood function can be chosen through the option 'Likelihood' as detailed in the documentation for :meth:`lmm_init`. Once the final estimates for :math:`\gamma^*` have been obtained, the value of :math:`\sigma_R^2` is given by ..
math:: \sigma_R^2 = \left(r^\mathrm{T}V^{-1}r\right)/\left(n-p\right)\text{.} Case weights, :math:`W_c`, can be incorporated into the model by replacing :math:`X^\mathrm{T}X` and :math:`Z^\mathrm{T}Z` with :math:`X^\mathrm{T}W_cX` and :math:`Z^\mathrm{T}W_cZ` respectively, for a diagonal weight matrix :math:`W_c`. The log-likelihood, :math:`l_R`, is calculated using the sweep algorithm detailed in Wolfinger `et al.` (1994). .. _g02jh-py2-py-references: **References** Goodnight, J H, 1979, `A tutorial on the SWEEP operator`, The American Statistician (33(3)), 149--158 Harville, D A, 1977, `Maximum likelihood approaches to variance component estimation and to related problems`, JASA (72), 320--340 Rao, C R, 1972, `Estimation of variance and covariance components in a linear model`, J. Am. Stat. Assoc. (67), 112--115 Stroup, W W, 1989, `Predictable functions and prediction space in the mixed model procedure`, Applications of Mixed Models in Agriculture and Related Disciplines (Southern Cooperative Series Bulletin No. 343), 39--48 Wolfinger, R, Tobias, R and Sall, J, 1994, `Computing Gaussian likelihoods and their derivatives for general linear mixed models`, SIAM Sci. Statist. Comput. (15), 1294--1310 See Also -------- :meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main` """ raise NotImplementedError
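The restricted log-likelihood above can be evaluated directly once :math:`V` is fixed. The sketch below uses invented data, with :math:`V = I` standing in for :math:`ZGZ^\mathrm{T}+R`, and follows the REML expression term by term; it is an illustration of the objective, not the library's sweep-based implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
V = np.eye(n)                    # stand-in for Z G Z^T + R

Vinv = np.linalg.inv(V)
b = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS estimate b
r = y - X @ b                                          # residuals r

_, logdet_V = np.linalg.slogdet(V)
_, logdet_XVX = np.linalg.slogdet(X.T @ Vinv @ X)
rVr = r @ Vinv @ r

# -2 l_R, matching the four terms of the REML expression
minus2lR = (logdet_V + (n - p) * np.log(rVr) + logdet_XVX
            + (n - p) * (1.0 + np.log(2.0 * np.pi / (n - p))))

sigma2_R = rVr / (n - p)         # residual variance estimate
```

With :math:`V = I` the GLS estimate :math:`b` reduces to ordinary least squares, which gives a quick consistency check on the formula.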
[docs]def ridge_opt(x, isx, y, h, opt, niter, tol, orig, optloo, tau=0.0): r""" ``ridge_opt`` calculates a ridge regression, optimizing the ridge parameter according to one of four prediction error criteria. .. _g02ka-py2-py-doc: For full information please refer to the NAG Library document for g02ka https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02kaf.html .. _g02ka-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` The values of independent variables in the data matrix :math:`X`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which :math:`m` independent variables are included in the model. :math:`\mathrm{isx}[j-1] = 1` The :math:`j`\ th variable in :math:`\mathrm{x}` will be included in the model. :math:`\mathrm{isx}[j-1] = 0` Variable :math:`j` is excluded. **y** : float, array-like, shape :math:`\left(n\right)` The :math:`n` values of the dependent variable :math:`y`. **h** : float An initial value for the ridge regression parameter :math:`h`; used as a starting point for the optimization. **opt** : int The measure of prediction error used to optimize the ridge regression parameter :math:`h`. The value of :math:`\mathrm{opt}` must be set equal to one of: :math:`\mathrm{opt} = 1` Generalized cross-validation (GCV); :math:`\mathrm{opt} = 2` Unbiased estimate of variance (UEV); :math:`\mathrm{opt} = 3` Future prediction error (FPE); :math:`\mathrm{opt} = 4` Bayesian information criterion (BIC). **niter** : int The maximum number of iterations allowed to optimize the ridge regression parameter :math:`h`. **tol** : float Iterations of the ridge regression parameter :math:`h` will halt when consecutive values of :math:`h` lie within :math:`\mathrm{tol}`. **orig** : int If :math:`\mathrm{orig} = 1`, the parameter estimates :math:`b` are calculated for the original data; otherwise :math:`\mathrm{orig} = 2` and the parameter estimates :math:`\tilde{b}` are calculated for the standardized data.
**optloo** : int If :math:`\mathrm{optloo} = 2`, the leave-one-out cross-validation estimate of prediction error is calculated; otherwise no such estimate is calculated and :math:`\mathrm{optloo} = 1`. **tau** : float, optional Singular values less than :math:`\mathrm{tau}` of the SVD of the data matrix :math:`X` will be set equal to zero. **Returns** **h** : float :math:`\mathrm{h}` is the optimized value of the ridge regression parameter :math:`h`. **niter** : int The number of iterations used to optimize the ridge regression parameter :math:`h` within :math:`\mathrm{tol}`. **nep** : float The number of effective parameters, :math:`\gamma`, in the model. **b** : float, ndarray, shape :math:`\left(\textit{ip}+1\right)` Contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by :math:`\mathrm{isx}`. The first element of :math:`\mathrm{b}` contains the estimate for the intercept; :math:`\mathrm{b}[\textit{j}]` contains the parameter estimate for the :math:`\textit{j}`\ th independent variable in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{ip}`. **vif** : float, ndarray, shape :math:`\left(\textit{ip}\right)` The variance inflation factors in the order indicated by :math:`\mathrm{isx}`. For the :math:`\textit{j}`\ th independent variable in the model, :math:`\mathrm{vif}[\textit{j}-1]` is the value of :math:`v_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\textit{ip}`. **res** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{res}[\textit{i}-1]` is the value of the :math:`\textit{i}`\ th residual for the fitted ridge regression model, for :math:`\textit{i} = 1,2,\ldots,n`. **rss** : float The sum of squares of residual values. **df** : int The degrees of freedom for the residual sum of squares :math:`\mathrm{rss}`. **perr** : float, ndarray, shape :math:`\left(5\right)` The first four elements contain, in this order, the measures of prediction error: GCV, UEV, FPE and BIC. 
If :math:`\mathrm{optloo} = 2`, :math:`\mathrm{perr}[4]` is the LOOCV estimate of prediction error; otherwise :math:`\mathrm{perr}[4]` is not referenced. .. _g02ka-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{tau} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tau}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{opt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{opt} = 1`, :math:`2`, :math:`3` or :math:`4`. (`errno` :math:`1`) On entry, :math:`\mathrm{h} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{h} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{optloo} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{optloo} = 1` or :math:`2`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol} > 0.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{niter} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{niter} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{orig} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{orig} = 1` or :math:`2`. (`errno` :math:`2`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \leq n`. (`errno` :math:`2`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`; :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \textit{ip}\leq m`. (`errno` :math:`2`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isx}[j-1] = 0` or :math:`1`. (`errno` :math:`2`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{sum}\left(\mathrm{isx}\right) = \textit{ip}`. (`errno` :math:`3`) SVD failed to converge. **Warns** **NagAlgorithmicWarning** (`errno` :math:`-1`) Maximum number of iterations used. .. _g02ka-py2-py-notes: **Notes** A linear model has the form: .. math:: y = c+X\beta +\epsilon \text{,} where :math:`y` is an :math:`n\times 1` matrix of values of a dependent variable; :math:`c` is a scalar intercept term; :math:`X` is an :math:`n\times m` matrix of values of independent variables; :math:`\beta` is an :math:`m\times 1` matrix of unknown values of parameters; :math:`\epsilon` is an :math:`n\times 1` matrix of unknown random errors such that variance of :math:`\epsilon = \sigma^2I`. Let :math:`\tilde{X}` be the mean-centred :math:`X` and :math:`\tilde{y}` the mean-centred :math:`y`. Furthermore, :math:`\tilde{X}` is scaled such that the diagonal elements of the cross product matrix :math:`\tilde{X}^\mathrm{T}\tilde{X}` are one. The linear model now takes the form: .. math:: \tilde{y} = \tilde{X}\tilde{\beta }+\epsilon \text{.} Ridge regression estimates the parameters :math:`\tilde{\beta }` in a penalised least squares sense by finding the :math:`\tilde{b}` that minimizes .. math:: \left\lVert \tilde{X}\tilde{b}-\tilde{y}\right\rVert^2+h\left\lVert \tilde{b}\right\rVert^2\text{,}\quad h > 0\text{,} where :math:`\left\lVert ·\right\rVert` denotes the :math:`\ell_2`-norm and :math:`h` is a scalar regularization or ridge parameter. For a given value of :math:`h`, the parameter estimates :math:`\tilde{b}` are found by evaluating .. math:: \tilde{b} = \left(\tilde{X}^\mathrm{T}\tilde{X}+hI\right)^{-1}\tilde{X}^\mathrm{T}\tilde{y}\text{.} Note that if :math:`h = 0` the ridge regression solution is equivalent to the ordinary least squares solution. Rather than calculate the inverse of (:math:`\tilde{X}^\mathrm{T}\tilde{X}+hI`) directly, ``ridge_opt`` uses the singular value decomposition (SVD) of :math:`\tilde{X}`. 
After decomposing :math:`\tilde{X}` into :math:`UDV^\mathrm{T}` where :math:`U` and :math:`V` are orthogonal matrices and :math:`D` is a diagonal matrix, the parameter estimates become .. math:: \tilde{b} = V \left(D^\mathrm{T}D+hI\right)^{-1}DU^\mathrm{T}\tilde{y}\text{.} A consequence of introducing the ridge parameter is that the effective number of parameters, :math:`\gamma`, in the model is given by the sum of diagonal elements of .. math:: D^\mathrm{T}D \left(D^\mathrm{T}D+hI\right)^{-1}\text{,} see Moody (1992) for details. Any multi-collinearity in the design matrix :math:`X` may be highlighted by calculating the variance inflation factors for the fitted model. The :math:`j`\ th variance inflation factor, :math:`v_j`, is a scaled version of the multiple correlation coefficient between independent variable :math:`j` and the other independent variables, :math:`R_j`, and is given by .. math:: v_j = \frac{1}{{1-R_j}}\text{,}\quad j = 1,2,\ldots,m\text{.} The :math:`m` variance inflation factors are calculated as the diagonal elements of the matrix: .. math:: \left(\tilde{X}^\mathrm{T}\tilde{X}+hI\right)^{-1}\tilde{X}^\mathrm{T}\tilde{X} \left(\tilde{X}^\mathrm{T}\tilde{X}+hI\right)^{-1}\text{,} which, using the SVD of :math:`\tilde{X}`, is equivalent to the diagonal elements of the matrix: .. math:: V \left(D^\mathrm{T}D+hI\right)^{-1}D^\mathrm{T}D \left(D^\mathrm{T}D+hI\right)^{-1}V^\mathrm{T}\text{.} Although parameter estimates :math:`\tilde{b}` are calculated by using :math:`\tilde{X}`, it is usual to report the parameter estimates :math:`b` associated with :math:`X`. These are calculated from :math:`\tilde{b}`, and the means and scalings of :math:`X`. Optionally, either :math:`\tilde{b}` or :math:`b` may be calculated. The method can adopt one of four criteria to minimize while calculating a suitable value for :math:`h`: (a) Generalized cross-validation (GCV): .. 
math:: \frac{{ns}}{\left(n-\gamma \right)^2}\text{;} (#) Unbiased estimate of variance (UEV): .. math:: \frac{s}{{n-\gamma }}\text{;} (#) Future prediction error (FPE): .. math:: \frac{1}{n}\left(s+\frac{{2\gamma s}}{{n-\gamma }}\right)\text{;} (#) Bayesian information criterion (BIC): .. math:: \frac{1}{n}\left(s+\frac{{\log\left(n\right)\gamma s}}{{n-\gamma }}\right)\text{;} where :math:`s` is the sum of squares of residuals. However, the function returns all four of the above prediction errors regardless of the one selected to minimize the ridge parameter, :math:`h`. Furthermore, the function will optionally return the leave-one-out cross-validation (LOOCV) prediction error. .. _g02ka-py2-py-references: **References** Hastie, T, Tibshirani, R and Friedman, J, 2003, `The Elements of Statistical Learning: Data Mining, Inference and Prediction`, Springer Series in Statistics Moody, J.E., 1992, `The effective number of parameters: An analysis of generalisation and regularisation in nonlinear learning systems`, In: Neural Information Processing Systems, (eds J E Moody, S J Hanson, and R P Lippmann), 4, 847--854, Morgan Kaufmann San Mateo CA See Also -------- :meth:`naginterfaces.library.examples.correg.ridge_opt_ex.main` """ raise NotImplementedError
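The SVD route described in the Notes can be sketched in a few lines of NumPy on random illustrative data. This is not the library's implementation, but it reproduces the :math:`\tilde{b} = V\left(D^\mathrm{T}D+hI\right)^{-1}DU^\mathrm{T}\tilde{y}` and effective-parameter formulae above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 4
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# Mean-centre, then scale so that diag(Xt^T Xt) = 1, as in the Notes.
Xt = X - X.mean(axis=0)
Xt /= np.sqrt((Xt**2).sum(axis=0))
yt = y - y.mean()

h = 0.5                                           # fixed ridge parameter
U, d, Vt = np.linalg.svd(Xt, full_matrices=False) # Xt = U diag(d) Vt

# b_tilde = V (D^T D + h I)^{-1} D U^T y_tilde; with D diagonal this is
# an elementwise scaling of U^T y_tilde by d_i / (d_i^2 + h).
b_svd = Vt.T @ (d / (d**2 + h) * (U.T @ yt))

# The same estimate via the normal-equations form (X^T X + h I)^{-1} X^T y.
b_direct = np.linalg.solve(Xt.T @ Xt + h * np.eye(m), Xt.T @ yt)

# Effective number of parameters: trace of D^T D (D^T D + h I)^{-1}.
nep = np.sum(d**2 / (d**2 + h))
```

As :math:`h \to 0` the scaling factors :math:`d_i^2/\left(d_i^2+h\right)` tend to one, so ``nep`` tends to :math:`m` and the estimate reduces to ordinary least squares, in line with the remark above.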
[docs]def ridge(x, isx, y, h, wantb, wantvf, pec=None): r""" ``ridge`` calculates a ridge regression, with ridge parameters supplied by you. .. _g02kb-py2-py-doc: For full information please refer to the NAG Library document for g02kb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02kbf.html .. _g02kb-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` The values of independent variables in the data matrix :math:`X`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which :math:`m` independent variables are included in the model. :math:`\mathrm{isx}[j-1] = 1` The :math:`j`\ th variable in :math:`\mathrm{x}` will be included in the model. :math:`\mathrm{isx}[j-1] = 0` Variable :math:`j` is excluded. **y** : float, array-like, shape :math:`\left(n\right)` The :math:`n` values of the dependent variable :math:`y`. **h** : float, array-like, shape :math:`\left(\textit{lh}\right)` :math:`\mathrm{h}[j-1]` is the value of the :math:`j`\ th ridge parameter :math:`h`. **wantb** : int Defines the options for parameter estimates. :math:`\mathrm{wantb} = 0` Parameter estimates are not calculated and :math:`\mathrm{b}` is not referenced. :math:`\mathrm{wantb} = 1` Parameter estimates :math:`b` are calculated for the original data. :math:`\mathrm{wantb} = 2` Parameter estimates :math:`\tilde{b}` are calculated for the standardized data. **wantvf** : int Defines the options for variance inflation factors. :math:`\mathrm{wantvf} = 0` Variance inflation factors are not calculated and the array :math:`\mathrm{vf}` is not referenced. :math:`\mathrm{wantvf} = 1` Variance inflation factors are calculated. 
**pec** : None or str, length 1, array-like, shape :math:`\left(\textit{lpec}\right)`, optional If :math:`\mathrm{pec}` is not **None**, :math:`\mathrm{pec}[\textit{j}-1]` defines the :math:`\textit{j}`\ th prediction error, for :math:`\textit{j} = 1,2,\ldots,\textit{lpec}`; otherwise :math:`\mathrm{pec}` is not referenced. :math:`\mathrm{pec}[j-1] = \texttt{'B'}` Bayesian information criterion (BIC). :math:`\mathrm{pec}[j-1] = \texttt{'F'}` Future prediction error (FPE). :math:`\mathrm{pec}[j-1] = \texttt{'G'}` Generalized cross-validation (GCV). :math:`\mathrm{pec}[j-1] = \texttt{'L'}` Leave-one-out cross-validation (LOOCV). :math:`\mathrm{pec}[j-1] = \texttt{'U'}` Unbiased estimate of variance (UEV). **Returns** **nep** : float, ndarray, shape :math:`\left(\textit{lh}\right)` :math:`\mathrm{nep}[\textit{j}-1]` is the number of effective parameters, :math:`\gamma`, in the :math:`\textit{j}`\ th model, for :math:`\textit{j} = 1,2,\ldots,\textit{lh}`. **b** : float, ndarray, shape :math:`\left(:, :\right)` If :math:`\mathrm{wantb} \neq 0`, :math:`\mathrm{b}` contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by :math:`\mathrm{isx}`. :math:`\mathrm{b}[0,\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,\textit{lh}`, contains the estimate for the intercept; :math:`\mathrm{b}[\textit{i},j-1]` contains the parameter estimate for the :math:`\textit{i}`\ th independent variable in the model fitted with ridge parameter :math:`\mathrm{h}[j-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **vf** : float, ndarray, shape :math:`\left(:, :\right)` If :math:`\mathrm{wantvf} = 1`, the variance inflation factors. For the :math:`\textit{i}`\ th independent variable in a model fitted with ridge parameter :math:`\mathrm{h}[j-1]`, :math:`\mathrm{vf}[\textit{i}-1,j-1]` is the value of :math:`v_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. 
**pe** : None or float, ndarray, shape :math:`\left(:, :\right)` If :math:`\mathrm{pec}` is **None** on entry, :math:`\mathrm{pe}` is **None**; otherwise :math:`\mathrm{pe}[\textit{i}-1,\textit{j}-1]` contains the prediction error of criterion :math:`\mathrm{pec}[\textit{i}-1]` for the model fitted with ridge parameter :math:`\mathrm{h}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,\textit{lh}`, for :math:`\textit{i} = 1,2,\ldots,\textit{lpec}`. .. _g02kb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \leq n`. (`errno` :math:`1`) On entry, :math:`\mathrm{h}[j-1] < 0` for at least one :math:`j`. Constraint: :math:`\mathrm{h}[j-1] \geq 0.0`, for all :math:`j`. (`errno` :math:`1`) On entry, :math:`\textit{lh} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lh} > 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{wantb} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wantb} = 0`, :math:`1` or :math:`2`. (`errno` :math:`1`) On entry, :math:`\mathrm{wantvf} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wantvf} = 0` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{pec}[j-1]` is invalid for at least one :math:`j`. Constraint: if :math:`\mathrm{pec}` is not **None**, :math:`\mathrm{pec}[\textit{j}-1] = \texttt{'B'}`, :math:`\texttt{'F'}`, :math:`\texttt{'G'}`, :math:`\texttt{'L'}` or :math:`\texttt{'U'}`, for all :math:`j`. (`errno` :math:`2`) On entry, :math:`\mathrm{isx}[j-1] \neq 0` or :math:`1` for at least one :math:`j`. Constraint: :math:`\mathrm{isx}[j-1] = 0` or :math:`1`, for all :math:`j`. (`errno` :math:`2`) On entry, :math:`\textit{ip}` is not equal to the sum of elements in :math:`\mathrm{isx}`.
Constraint: exactly :math:`\textit{ip}` elements of :math:`\mathrm{isx}` must be equal to :math:`1`. (`errno` :math:`2`) On entry, :math:`\textit{ldb} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{wantb} \neq 0`, :math:`\textit{ldb} \geq {\textit{ip}+1}`. (`errno` :math:`2`) On entry, :math:`\textit{ldvf} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{wantvf} \neq 0`, :math:`\textit{ldvf} \geq \textit{ip}`. (`errno` :math:`3`) On entry, :math:`\mathrm{wantb} = 0` and :math:`\mathrm{wantvf} = 0`. Constraint: if :math:`\mathrm{wantb} = 0`, :math:`\mathrm{wantvf} = 1`. .. _g02kb-py2-py-notes: **Notes** A linear model has the form: .. math:: y = c+X\beta +\epsilon \text{,} where :math:`y` is an :math:`n\times 1` matrix of values of a dependent variable; :math:`c` is a scalar intercept term; :math:`X` is an :math:`n\times m` matrix of values of independent variables; :math:`\beta` is an :math:`m\times 1` matrix of unknown values of parameters; :math:`\epsilon` is an :math:`n\times 1` matrix of unknown random errors such that the variance of :math:`\epsilon` is :math:`\sigma^2 I`. Let :math:`\tilde{X}` be the mean-centred :math:`X` and :math:`\tilde{y}` the mean-centred :math:`y`. Furthermore, :math:`\tilde{X}` is scaled such that the diagonal elements of the cross product matrix :math:`\tilde{X}^\mathrm{T}\tilde{X}` are one. The linear model now takes the form: .. math:: \tilde{y} = \tilde{X}\tilde{\beta }+\epsilon \text{.} Ridge regression estimates the parameters :math:`\tilde{\beta }` in a penalised least squares sense by finding the :math:`\tilde{b}` that minimizes ..
math:: \left\lVert \tilde{X}\tilde{b}-\tilde{y}\right\rVert^2+h\left\lVert \tilde{b}\right\rVert^2\text{,}\quad h > 0\text{,} where :math:`\left\lVert ·\right\rVert` denotes the :math:`\ell_2`-norm and :math:`h` is a scalar regularization or ridge parameter. For a given value of :math:`h`, the parameter estimates :math:`\tilde{b}` are found by evaluating .. math:: \tilde{b} = \left(\tilde{X}^\mathrm{T}\tilde{X}+hI\right)^{-1}\tilde{X}^\mathrm{T}\tilde{y}\text{.} Note that if :math:`h = 0` the ridge regression solution is equivalent to the ordinary least squares solution. Rather than calculate the inverse of (:math:`\tilde{X}^\mathrm{T}\tilde{X}+hI`) directly, ``ridge`` uses the singular value decomposition (SVD) of :math:`\tilde{X}`. After decomposing :math:`\tilde{X}` into :math:`UDV^\mathrm{T}` where :math:`U` and :math:`V` are orthogonal matrices and :math:`D` is a diagonal matrix, the parameter estimates become .. math:: \tilde{b} = V \left(D^\mathrm{T}D+hI\right)^{-1}DU^\mathrm{T}\tilde{y}\text{.} A consequence of introducing the ridge parameter is that the effective number of parameters, :math:`\gamma`, in the model is given by the sum of diagonal elements of .. math:: D^\mathrm{T}D \left(D^\mathrm{T}D+hI\right)^{-1}\text{,} see Moody (1992) for details. Any multi-collinearity in the design matrix :math:`X` may be highlighted by calculating the variance inflation factors for the fitted model. The :math:`j`\ th variance inflation factor, :math:`v_j`, is a scaled version of the multiple correlation coefficient between independent variable :math:`j` and the other independent variables, :math:`R_j`, and is given by .. math:: v_j = \frac{1}{{1-R_j^2}}\text{,}\quad j = 1,2,\ldots,m\text{.} The :math:`m` variance inflation factors are calculated as the diagonal elements of the matrix: ..
math:: \left(\tilde{X}^\mathrm{T}\tilde{X}+hI\right)^{-1}\tilde{X}^\mathrm{T}\tilde{X} \left(\tilde{X}^\mathrm{T}\tilde{X}+hI\right)^{-1}\text{,} which, using the SVD of :math:`\tilde{X}`, is equivalent to the diagonal elements of the matrix: .. math:: V \left(D^\mathrm{T}D+hI\right)^{-1}D^\mathrm{T}D \left(D^\mathrm{T}D+hI\right)^{-1}V^\mathrm{T}\text{.} Given a value of :math:`h`, any or all of the following prediction criteria are available: (a) Generalized cross-validation (GCV): .. math:: \frac{{ns}}{\left(n-\gamma \right)^2}\text{;} (#) Unbiased estimate of variance (UEV): .. math:: \frac{s}{{n-\gamma }}\text{;} (#) Future prediction error (FPE): .. math:: \frac{1}{n}\left(s+\frac{{2\gamma s}}{{n-\gamma }}\right)\text{;} (#) Bayesian information criterion (BIC): .. math:: \frac{1}{n}\left(s+\frac{{\log\left(n\right)\gamma s}}{{n-\gamma }}\right)\text{;} (#) Leave-one-out cross-validation (LOOCV), where :math:`s` is the sum of squares of residuals. Although parameter estimates :math:`\tilde{b}` are calculated by using :math:`\tilde{X}`, it is usual to report the parameter estimates :math:`b` associated with :math:`X`. These are calculated from :math:`\tilde{b}`, and the means and scalings of :math:`X`. Optionally, either :math:`\tilde{b}` or :math:`b` may be calculated. .. _g02kb-py2-py-references: **References** Hastie, T, Tibshirani, R and Friedman, J, 2003, `The Elements of Statistical Learning: Data Mining, Inference and Prediction`, Springer Series in Statistics Moody, J.E., 1992, `The effective number of parameters: An analysis of generalisation and regularisation in nonlinear learning systems`, In: Neural Information Processing Systems, (eds J E Moody, S J Hanson, and R P Lippmann), 4, 847--854, Morgan Kaufmann San Mateo CA """ raise NotImplementedError
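The four prediction criteria listed in the Notes are simple functions of the residual sum of squares :math:`s`, the sample size :math:`n` and the effective number of parameters :math:`\gamma`. A minimal sketch (the helper name ``prediction_errors`` is hypothetical, not part of the library):

```python
import math

def prediction_errors(s, n, gamma):
    # GCV, UEV, FPE and BIC exactly as defined in the Notes
    return {
        'GCV': n * s / (n - gamma) ** 2,
        'UEV': s / (n - gamma),
        'FPE': (s + 2.0 * gamma * s / (n - gamma)) / n,
        'BIC': (s + math.log(n) * gamma * s / (n - gamma)) / n,
    }
```

All four criteria penalise the residual sum of squares by the effective number of parameters, so they trade goodness of fit against model complexity as :math:`h` varies.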
[docs]def pls_svd(x, isx, y, iscale, xstd, ystd, maxfac, io_manager=None): r""" ``pls_svd`` fits an orthogonal scores partial least squares (PLS) regression by using singular value decomposition. .. _g02la-py2-py-doc: For full information please refer to the NAG Library document for g02la https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02laf.html .. _g02la-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, \textit{mx}\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th predictor variable, for :math:`\textit{j} = 1,2,\ldots,\textit{mx}`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(\textit{mx}\right)` Indicates which predictor variables are to be included in the model. :math:`\mathrm{isx}[j-1] = 1` The :math:`j`\ th predictor variable (with variates in the :math:`j`\ th column of :math:`X`) is included in the model. :math:`\mathrm{isx}[j-1] = 0` Otherwise. **y** : float, array-like, shape :math:`\left(n, \textit{my}\right)` :math:`\mathrm{y}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th response variable, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,n`. **iscale** : int Indicates how predictor variables are scaled. :math:`\mathrm{iscale} = 1` Data are scaled by the standard deviation of variables. :math:`\mathrm{iscale} = 2` Data are scaled by user-supplied scalings. :math:`\mathrm{iscale} = -1` No scaling. **xstd** : float, array-like, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{iscale} = 2`, :math:`\mathrm{xstd}[\textit{j}-1]` must contain the user-supplied scaling for the :math:`\textit{j}`\ th predictor variable in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{ip}`. Otherwise :math:`\mathrm{xstd}` need not be set. 
**ystd** : float, array-like, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{iscale} = 2`, :math:`\mathrm{ystd}[\textit{j}-1]` must contain the user-supplied scaling for the :math:`\textit{j}`\ th response variable in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`. Otherwise :math:`\mathrm{ystd}` need not be set. **maxfac** : int :math:`k`, the number of latent variables to calculate. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{ip}\right)` Mean values of predictor variables in the model. **ybar** : float, ndarray, shape :math:`\left(\textit{my}\right)` The mean value of each response variable. **xstd** : float, ndarray, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{iscale} = 1`, standard deviations of predictor variables in the model. Otherwise :math:`\mathrm{xstd}` is not changed. **ystd** : float, ndarray, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{iscale} = 1`, the standard deviation of each response variable. Otherwise :math:`\mathrm{ystd}` is not changed. **xres** : float, ndarray, shape :math:`\left(n, \textit{ip}\right)` The predictor variables' residual matrix :math:`X_k`. **yres** : float, ndarray, shape :math:`\left(n, \textit{my}\right)` The residuals for each response variable, :math:`Y_k`. **w** : float, ndarray, shape :math:`\left(\textit{ip}, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`W` contains the :math:`x`-weights :math:`w_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **p** : float, ndarray, shape :math:`\left(\textit{ip}, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`P` contains the :math:`x`-loadings :math:`p_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. 
**t** : float, ndarray, shape :math:`\left(n, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`T` contains the :math:`x`-scores :math:`t_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **c** : float, ndarray, shape :math:`\left(\textit{my}, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`C` contains the :math:`y`-loadings :math:`c_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **u** : float, ndarray, shape :math:`\left(n, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`U` contains the :math:`y`-scores :math:`u_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **xcv** : float, ndarray, shape :math:`\left(\mathrm{maxfac}\right)` :math:`\mathrm{xcv}[\textit{j}-1]` contains the cumulative percentage of variance in the predictor variables explained by the first :math:`\textit{j}` factors, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **ycv** : float, ndarray, shape :math:`\left(\mathrm{maxfac}, \textit{my}\right)` :math:`\mathrm{ycv}[\textit{i}-1,\textit{j}-1]` is the cumulative percentage of variance of the :math:`\textit{j}`\ th response variable explained by the first :math:`\textit{i}` factors, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{maxfac}`. .. _g02la-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. (`errno` :math:`1`) On entry, :math:`\textit{mx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{mx} > 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle]` is invalid. Constraint: :math:`\mathrm{isx}[j-1] = 0` or :math:`1`, for all :math:`j`. (`errno` :math:`1`) On entry, :math:`\textit{my} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{my} \geq 1`. 
(`errno` :math:`1`) On entry, :math:`\mathrm{iscale} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{iscale} = -1` or :math:`1`. (`errno` :math:`2`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{mx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1 < \textit{ip}\leq \textit{mx}`. (`errno` :math:`2`) On entry, :math:`\mathrm{maxfac} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{maxfac}\leq \textit{ip}`. (`errno` :math:`3`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`{\mathrm{sum}\left(\mathrm{isx}\right)} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: the sum of elements in :math:`\mathrm{isx}` must equal :math:`\textit{ip}`. .. _g02la-py2-py-notes: **Notes** Let :math:`X_1` be the mean-centred :math:`n\times m` data matrix :math:`X` of :math:`n` observations on :math:`m` predictor variables. Let :math:`Y_1` be the mean-centred :math:`n\times r` data matrix :math:`Y` of :math:`n` observations on :math:`r` response variables. The first of the :math:`k` factors that PLS methods extract from the data predicts both :math:`X_1` and :math:`Y_1` by regressing on :math:`t_1`, a column vector of :math:`n` scores: .. math:: \begin{array}{cc} \hat{X}_1 = t_1 p_1^\mathrm{T} &\\ \hat{Y}_1 = t_1 c_1^\mathrm{T} \text{,} & \text{with } t_1^\mathrm{T} t_1 = 1 \text{,} \end{array} where the column vectors of :math:`m` :math:`x`-loadings :math:`p_1` and :math:`r` :math:`y`-loadings :math:`c_1` are calculated in the least squares sense: ..
math:: \begin{array}{c} p_1^\mathrm{T} = t_1^\mathrm{T} X_1 \\ c_1^\mathrm{T} = t_1^\mathrm{T} Y_1 \text{.} \end{array} The :math:`x`-score vector :math:`t_1 = X_1w_1` is the linear combination of predictor data :math:`X_1` that has maximum covariance with the :math:`y`-scores :math:`u_1 = Y_1c_1`, where the :math:`x`-weights vector :math:`w_1` is the normalised first left singular vector of :math:`X_1^\mathrm{T}Y_1`. The method extracts subsequent PLS factors by repeating the above process with the residual matrices: .. math:: \begin{array}{c} X_i = X_{{i-1}} - \hat{X}_{{i-1}} \\ Y_i = Y_{{i-1}} - \hat{Y}_{{i-1}} \text{,}\quad i = 2,3,\ldots,k \text{,} \end{array} and with orthogonal scores: .. math:: t_i^\mathrm{T}t_j = 0\text{,}\quad j = 1,2,\ldots,i-1\text{.} Optionally, in addition to being mean-centred, the data matrices :math:`X_1` and :math:`Y_1` may be scaled by standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy. See Also -------- :meth:`naginterfaces.library.examples.correg.pls_ex.main` """ raise NotImplementedError
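The single-factor extraction step described in the Notes can be sketched with NumPy. This is a hedged illustration of the orthogonal-scores construction, not the library's code; ``first_pls_factor`` is a hypothetical name and the inputs are assumed mean-centred:

```python
import numpy as np

def first_pls_factor(x1, y1):
    # x-weights w1: normalised first left singular vector of X1^T Y1
    w = np.linalg.svd(x1.T @ y1)[0][:, 0]
    # x-scores t1 = X1 w1, normalised so that t1^T t1 = 1
    t = x1 @ w
    t = t / np.linalg.norm(t)
    # loadings in the least squares sense: p1 = X1^T t1, c1 = Y1^T t1
    p, c = x1.T @ t, y1.T @ t
    # residuals for the next factor: X2 = X1 - t1 p1^T, Y2 = Y1 - t1 c1^T
    x2, y2 = x1 - np.outer(t, p), y1 - np.outer(t, c)
    return w, t, p, c, x2, y2
```

Because :math:`p_1 = X_1^\mathrm{T}t_1` and :math:`t_1^\mathrm{T}t_1 = 1`, the deflated residual :math:`X_2` is orthogonal to :math:`t_1`, which is what makes the scores of successive factors orthogonal.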
[docs]def pls_wold(x, isx, y, iscale, xstd, ystd, maxfac, maxit=200, tau=1.0e-4, io_manager=None): r""" ``pls_wold`` fits an orthogonal scores partial least squares (PLS) regression by using Wold's iterative method. .. _g02lb-py2-py-doc: For full information please refer to the NAG Library document for g02lb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02lbf.html .. _g02lb-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, \textit{mx}\right)` :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th predictor variable, for :math:`\textit{j} = 1,2,\ldots,\textit{mx}`, for :math:`\textit{i} = 1,2,\ldots,n`. **isx** : int, array-like, shape :math:`\left(\textit{mx}\right)` Indicates which predictor variables are to be included in the model. :math:`\mathrm{isx}[j-1] = 1` The :math:`j`\ th predictor variable (with variates in the :math:`j`\ th column of :math:`X`) is included in the model. :math:`\mathrm{isx}[j-1] = 0` Otherwise. **y** : float, array-like, shape :math:`\left(n, \textit{my}\right)` :math:`\mathrm{y}[\textit{i}-1,\textit{j}-1]` must contain the :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th response variable, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,n`. **iscale** : int Indicates how predictor variables are scaled. :math:`\mathrm{iscale} = 1` Data are scaled by the standard deviation of variables. :math:`\mathrm{iscale} = 2` Data are scaled by user-supplied scalings. :math:`\mathrm{iscale} = -1` No scaling. **xstd** : float, array-like, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{iscale} = 2`, :math:`\mathrm{xstd}[\textit{j}-1]` must contain the user-supplied scaling for the :math:`\textit{j}`\ th predictor variable in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{ip}`. Otherwise :math:`\mathrm{xstd}` need not be set. 
**ystd** : float, array-like, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{iscale} = 2`, :math:`\mathrm{ystd}[\textit{j}-1]` must contain the user-supplied scaling for the :math:`\textit{j}`\ th response variable in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`. Otherwise :math:`\mathrm{ystd}` need not be set. **maxfac** : int :math:`k`, the number of latent variables to calculate. **maxit** : int, optional If :math:`\textit{my} = 1`, :math:`\mathrm{maxit}` is not referenced; otherwise the maximum number of iterations used to calculate the :math:`x`-weights. **tau** : float, optional If :math:`\textit{my} = 1`, :math:`\mathrm{tau}` is not referenced; otherwise the iterative procedure used to calculate the :math:`x`-weights will halt if the Euclidean distance between two subsequent estimates is less than or equal to :math:`\mathrm{tau}`. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **xbar** : float, ndarray, shape :math:`\left(\textit{ip}\right)` Mean values of predictor variables in the model. **ybar** : float, ndarray, shape :math:`\left(\textit{my}\right)` The mean value of each response variable. **xstd** : float, ndarray, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{iscale} = 1`, standard deviations of predictor variables in the model. Otherwise :math:`\mathrm{xstd}` is not changed. **ystd** : float, ndarray, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{iscale} = 1`, the standard deviation of each response variable. Otherwise :math:`\mathrm{ystd}` is not changed. **xres** : float, ndarray, shape :math:`\left(n, \textit{ip}\right)` The predictor variables' residual matrix :math:`X_k`. **yres** : float, ndarray, shape :math:`\left(n, \textit{my}\right)` The residuals for each response variable, :math:`Y_k`. 
**w** : float, ndarray, shape :math:`\left(\textit{ip}, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`W` contains the :math:`x`-weights :math:`w_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **p** : float, ndarray, shape :math:`\left(\textit{ip}, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`P` contains the :math:`x`-loadings :math:`p_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **t** : float, ndarray, shape :math:`\left(n, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`T` contains the :math:`x`-scores :math:`t_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **c** : float, ndarray, shape :math:`\left(\textit{my}, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`C` contains the :math:`y`-loadings :math:`c_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **u** : float, ndarray, shape :math:`\left(n, \mathrm{maxfac}\right)` The :math:`\textit{j}`\ th column of :math:`U` contains the :math:`y`-scores :math:`u_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **xcv** : float, ndarray, shape :math:`\left(\mathrm{maxfac}\right)` :math:`\mathrm{xcv}[\textit{j}-1]` contains the cumulative percentage of variance in the predictor variables explained by the first :math:`\textit{j}` factors, for :math:`\textit{j} = 1,2,\ldots,\mathrm{maxfac}`. **ycv** : float, ndarray, shape :math:`\left(\mathrm{maxfac}, \textit{my}\right)` :math:`\mathrm{ycv}[\textit{i}-1,\textit{j}-1]` is the cumulative percentage of variance of the :math:`\textit{j}`\ th response variable explained by the first :math:`\textit{i}` factors, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{maxfac}`. .. _g02lb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 1`. 
(`errno` :math:`1`) On entry, :math:`\textit{mx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{mx} > 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle]` is invalid. Constraint: :math:`\mathrm{isx}[j-1] = 0` or :math:`1`, for all :math:`j`. (`errno` :math:`1`) On entry, :math:`\textit{my} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{my} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{iscale} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{iscale} = -1` or :math:`1`. (`errno` :math:`2`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{mx} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1 < \textit{ip}\leq \textit{mx}`. (`errno` :math:`2`) On entry, :math:`\mathrm{maxfac} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{maxfac}\leq \textit{ip}`. (`errno` :math:`2`) On entry, :math:`\textit{my} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\textit{my} > 1`, :math:`\mathrm{maxit} > 1`. (`errno` :math:`2`) On entry, :math:`\mathrm{tau} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\textit{my} > 1`, :math:`\mathrm{tau} > 0.0`. (`errno` :math:`3`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`{\mathrm{sum}\left(\mathrm{isx}\right)} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: the sum of elements in :math:`\mathrm{isx}` must equal :math:`\textit{ip}`. .. _g02lb-py2-py-notes: **Notes** Let :math:`X_1` be the mean-centred :math:`n\times m` data matrix :math:`X` of :math:`n` observations on :math:`m` predictor variables. 
Let :math:`Y_1` be the mean-centred :math:`n\times r` data matrix :math:`Y` of :math:`n` observations on :math:`r` response variables. The first of the :math:`k` factors that PLS methods extract from the data predicts both :math:`X_1` and :math:`Y_1` by regressing on :math:`t_1`, a column vector of :math:`n` scores: .. math:: \begin{array}{cc} \hat{X}_1 = t_1 p_1^\mathrm{T} &\\ \hat{Y}_1 = t_1 c_1^\mathrm{T} \text{,} & \text{with } t_1^\mathrm{T} t_1 = 1 \text{,} \end{array} where the column vectors of :math:`m` :math:`x`-loadings :math:`p_1` and :math:`r` :math:`y`-loadings :math:`c_1` are calculated in the least squares sense: .. math:: \begin{array}{c} p_1^\mathrm{T} = t_1^\mathrm{T} X_1 \\ c_1^\mathrm{T} = t_1^\mathrm{T} Y_1 \text{.} \end{array} The :math:`x`-score vector :math:`t_1 = X_1w_1` is the linear combination of predictor data :math:`X_1` that has maximum covariance with the :math:`y`-scores :math:`u_1 = Y_1c_1`, where the :math:`x`-weights vector :math:`w_1` is the normalised first left singular vector of :math:`X_1^\mathrm{T}Y_1`. The method extracts subsequent PLS factors by repeating the above process with the residual matrices: .. math:: \begin{array}{c} X_i = X_{{i-1}} - \hat{X}_{{i-1}} \\ Y_i = Y_{{i-1}} - \hat{Y}_{{i-1}} \text{,}\quad i = 2,3,\ldots,k \text{,} \end{array} and with orthogonal scores: .. math:: t_i^\mathrm{T}t_j = 0\text{,}\quad j = 1,2,\ldots,i-1\text{.} Optionally, in addition to being mean-centred, the data matrices :math:`X_1` and :math:`Y_1` may be scaled by standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy. .. _g02lb-py2-py-references: **References** Wold, H, 1966, `Estimation of principal components and related models by iterative least squares`, In: Multivariate Analysis, (ed P R Krishnaiah), 391--420, Academic Press NY """ raise NotImplementedError
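Wold's iteration computes the :math:`x`-weights as a power-method analogue of taking the first left singular vector of :math:`X_1^\mathrm{T}Y_1`. The sketch below is a hedged illustration, not the library's code; the helper name ``wold_x_weights``, the starting vector and the normalisations are assumptions:

```python
import numpy as np

def wold_x_weights(x1, y1, maxit=200, tau=1.0e-4):
    u = y1[:, 0].copy()                    # start from the first response column
    w = np.zeros(x1.shape[1])
    for _ in range(maxit):
        w_new = x1.T @ u                   # provisional x-weights
        w_new = w_new / np.linalg.norm(w_new)
        if np.linalg.norm(w_new - w) <= tau:   # Euclidean stopping rule
            return w_new
        w = w_new
        t = x1 @ w                         # provisional x-scores
        c = y1.T @ t / (t @ t)             # y-loadings
        u = y1 @ c / (c @ c)               # updated y-scores
    return w
```

For a single response (:math:`\textit{my} = 1`) no iteration is needed, since :math:`w_1` is simply the normalised :math:`X_1^\mathrm{T}y_1`; when the iteration converges, the result matches the SVD-based weights of ``pls_svd`` up to sign.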
[docs]def pls_fit(nfact, p, c, w, rcond, orig, xbar, ybar, iscale, xstd, ystd, vipopt, ycv): r""" ``pls_fit`` calculates parameter estimates for a given number of factors given the output from an orthogonal scores PLS regression (:meth:`pls_svd` or :meth:`pls_wold`). .. _g02lc-py2-py-doc: For full information please refer to the NAG Library document for g02lc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02lcf.html .. _g02lc-py2-py-parameters: **Parameters** **nfact** : int :math:`l`, the number of factors to include in the calculation of parameter estimates. **p** : float, array-like, shape :math:`\left(\textit{ip}, \textit{maxfac}\right)` :math:`x`-loadings as returned from :meth:`pls_svd` and :meth:`pls_wold`. **c** : float, array-like, shape :math:`\left(\textit{my}, \textit{maxfac}\right)` :math:`y`-loadings as returned from :meth:`pls_svd` and :meth:`pls_wold`. **w** : float, array-like, shape :math:`\left(\textit{ip}, \textit{maxfac}\right)` :math:`x`-weights as returned from :meth:`pls_svd` and :meth:`pls_wold`. **rcond** : float Singular values of :math:`P^\mathrm{T}W` less than :math:`\mathrm{rcond}` times the maximum singular value are treated as zero when calculating parameter estimates. If :math:`\mathrm{rcond}` is negative, a value of :math:`0.005` is used. **orig** : int Indicates how parameter estimates are calculated. :math:`\mathrm{orig} = -1` Parameter estimates for the centred, and possibly, scaled data. :math:`\mathrm{orig} = 1` Parameter estimates for the original data. **xbar** : float, array-like, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{orig} = 1`, mean values of predictor variables in the model; otherwise :math:`\mathrm{xbar}` is not referenced. **ybar** : float, array-like, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{orig} = 1`, mean value of each response variable in the model; otherwise :math:`\mathrm{ybar}` is not referenced. 
**iscale** : int If :math:`\mathrm{orig} = 1`, :math:`\mathrm{iscale}` must take the value supplied to either :meth:`pls_svd` or :meth:`pls_wold`; otherwise :math:`\mathrm{iscale}` is not referenced. **xstd** : float, array-like, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{orig} = 1` and :math:`\mathrm{iscale} \neq -1`, the scalings of predictor variables in the model as returned from either :meth:`pls_svd` or :meth:`pls_wold`; otherwise :math:`\mathrm{xstd}` is not referenced. **ystd** : float, array-like, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{orig} = 1` and :math:`\mathrm{iscale} \neq -1`, the scalings of response variables as returned from either :meth:`pls_svd` or :meth:`pls_wold`; otherwise :math:`\mathrm{ystd}` is not referenced. **vipopt** : int A flag that determines variable influence on projections (VIP) options. :math:`\mathrm{vipopt} = 0` VIP are not calculated. :math:`\mathrm{vipopt} = 1` VIP are calculated for predictor variables using the mean explained variance in responses. :math:`\mathrm{vipopt} = \textit{my}` VIP are calculated for predictor variables for each response variable in the model. Note that setting :math:`\mathrm{vipopt} = \textit{my}` when :math:`\textit{my} = 1` gives the same result as setting :math:`\mathrm{vipopt} = 1` directly. **ycv** : float, array-like, shape :math:`\left(:, \textit{my}\right)` Note: the required extent for this argument in dimension 1 is determined as follows: if :math:`\mathrm{vipopt}\neq 0`: :math:`\mathrm{nfact}`; otherwise: :math:`0`. If :math:`\mathrm{vipopt} \neq 0`, :math:`\mathrm{ycv}[\textit{i}-1,\textit{j}-1]` is the cumulative percentage of variance of the :math:`\textit{j}`\ th response variable explained by the first :math:`\textit{i}` factors, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nfact}`; otherwise :math:`\mathrm{ycv}` is not referenced. 
**Returns** **b** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{my}\right)` :math:`\mathrm{b}[\textit{i}-1,\textit{j}-1]` contains the parameter estimate for the :math:`\textit{i}`\ th predictor variable in the model for the :math:`\textit{j}`\ th response variable, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. **ob** : float, ndarray, shape :math:`\left(:, \textit{my}\right)` If :math:`\mathrm{orig} = 1`, :math:`\mathrm{ob}[0,\textit{j}-1]` contains the intercept value for the :math:`\textit{j}`\ th response variable, and :math:`\mathrm{ob}[\textit{i},\textit{j}-1]` contains the parameter estimate on the original scale for the :math:`\textit{i}`\ th predictor variable in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. Otherwise :math:`\mathrm{ob}` is not referenced. **vip** : float, ndarray, shape :math:`\left(:, \mathrm{vipopt}\right)` If :math:`\mathrm{vipopt} = 1`, :math:`\mathrm{vip}[\textit{i}-1,0]` contains the VIP statistic for the :math:`\textit{i}`\ th predictor variable in the model for all response variables, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. If :math:`\mathrm{vipopt} = \textit{my}`, :math:`\mathrm{vip}[\textit{i}-1,\textit{j}-1]` contains the VIP statistic for the :math:`\textit{i}`\ th predictor variable in the model for the :math:`\textit{j}`\ th response variable, for :math:`\textit{j} = 1,2,\ldots,\textit{my}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. Otherwise :math:`\mathrm{vip}` is not referenced. .. _g02lc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} > 1`. (`errno` :math:`1`) On entry, :math:`\textit{my} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{my} \geq 1`. 
(`errno` :math:`1`) On entry, :math:`\mathrm{orig} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{orig} = -1` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{iscale} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{orig} = 1`, :math:`\mathrm{iscale} = -1` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{vipopt} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{my} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{vipopt} = 0`, :math:`1` or :math:`\textit{my}`. (`errno` :math:`2`) On entry, :math:`\textit{maxfac} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \textit{maxfac}\leq \textit{ip}`. (`errno` :math:`2`) On entry, :math:`\mathrm{nfact} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{maxfac} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{nfact}\leq \textit{maxfac}`. (`errno` :math:`2`) On entry, :math:`\textit{ldob} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{orig} = 1`, :math:`\textit{ldob} \geq {\textit{ip}+1}`. (`errno` :math:`2`) On entry, :math:`\textit{ldycv} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{nfact} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{vipopt}\neq 0`, :math:`\textit{ldycv} \geq \mathrm{nfact}`. (`errno` :math:`2`) On entry, :math:`\textit{ldvip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{vipopt}\neq 0`, :math:`\textit{ldvip} \geq \textit{ip}`. .. 
_g02lc-py2-py-notes: **Notes** The parameter estimates :math:`B` for an :math:`l`-factor orthogonal scores PLS model with :math:`m` predictor variables and :math:`r` response variables are given by .. math:: B = W\left(P^\mathrm{T}W\right)^{-1}C^\mathrm{T}\text{,}\quad B \in \mathbb{R}^{{m\times r}}\text{,} where :math:`W` is the :math:`m\times k` (:math:`k\geq l`) matrix of :math:`x`-weights; :math:`P` is the :math:`m\times k` matrix of :math:`x`-loadings; and :math:`C` is the :math:`r\times k` matrix of :math:`y`-loadings for a fitted PLS model. The parameter estimates :math:`B` are for the centred, and possibly scaled, predictor data :math:`X_1` and response data :math:`Y_1`. Parameter estimates may also be given for the original predictor data :math:`X` and response data :math:`Y`. Optionally, ``pls_fit`` will calculate variable influence on projection (VIP) statistics; see Wold (1994). .. _g02lc-py2-py-references: **References** Wold, S, 1994, `PLS for multivariate linear modelling QSAR: chemometric methods in molecular design`, Methods and Principles in Medicinal Chemistry, (ed van de Waterbeemd H), Verlag-Chemie See Also -------- :meth:`naginterfaces.library.examples.correg.pls_ex.main` """ raise NotImplementedError
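The formula :math:`B = W\left(P^\mathrm{T}W\right)^{-1}C^\mathrm{T}` from the notes above can be sketched in plain Python for the simplest, single-factor case, where :math:`P^\mathrm{T}W` is a scalar. This is an illustrative sketch only, not the NAG implementation; the function name and the numbers are made up for illustration.

```python
def pls_coefficients_one_factor(w, p, c):
    """Return B = W (P^T W)^{-1} C^T for a one-factor PLS model.

    w : x-weights (length m), p : x-loadings (length m),
    c : y-loadings (length r); the result is an m x r nested list.
    """
    # With a single factor, P^T W is a 1 x 1 matrix, i.e., a scalar.
    ptw = sum(pi * wi for pi, wi in zip(p, w))
    if abs(ptw) < 1e-12:
        # cf. the rcond cutoff pls_fit applies to small singular values
        raise ValueError("P^T W is effectively singular")
    return [[wi * cj / ptw for cj in c] for wi in w]

# Two predictors, one response:
b = pls_coefficients_one_factor(w=[0.6, 0.8], p=[0.5, 1.0], c=[2.0])
```

For :math:`k > 1` factors the scalar division becomes a (pseudo)inverse of the :math:`k\times k` matrix :math:`P^\mathrm{T}W`, which ``pls_fit`` handles through its singular values and the ``rcond`` threshold.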
[docs]def pls_pred(orig, xbar, ybar, iscale, xstd, ystd, b, isz, z): r""" ``pls_pred`` calculates predictions given the output from an orthogonal scores PLS regression (:meth:`pls_svd` or :meth:`pls_wold`) and :meth:`pls_fit`. .. _g02ld-py2-py-doc: For full information please refer to the NAG Library document for g02ld https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02ldf.html .. _g02ld-py2-py-parameters: **Parameters** **orig** : int Indicates how parameter estimates are supplied. :math:`\mathrm{orig} = 1` Parameter estimates are for the original data. :math:`\mathrm{orig} = -1` Parameter estimates are for the centred, and possibly scaled, data. **xbar** : float, array-like, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{orig} = -1`, :math:`\mathrm{xbar}` must contain mean values of predictor variables in the model; otherwise :math:`\mathrm{xbar}` is not referenced. **ybar** : float, array-like, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{orig} = -1`, :math:`\mathrm{ybar}` must contain the mean value of each response variable in the model; otherwise :math:`\mathrm{ybar}` is not referenced. **iscale** : int If :math:`\mathrm{orig} = -1`, :math:`\mathrm{iscale}` must take the value supplied to either :meth:`pls_svd` or :meth:`pls_wold`; otherwise :math:`\mathrm{iscale}` is not referenced. **xstd** : float, array-like, shape :math:`\left(\textit{ip}\right)` If :math:`\mathrm{orig} = -1` and :math:`\mathrm{iscale} \neq -1`, :math:`\mathrm{xstd}` must contain the scalings of predictor variables in the model as returned from either :meth:`pls_svd` or :meth:`pls_wold`; otherwise :math:`\mathrm{xstd}` is not referenced. **ystd** : float, array-like, shape :math:`\left(\textit{my}\right)` If :math:`\mathrm{orig} = -1` and :math:`\mathrm{iscale} \neq -1`, :math:`\mathrm{ystd}` must contain the scalings of response variables as returned from either :meth:`pls_svd` or :meth:`pls_wold`; otherwise :math:`\mathrm{ystd}` is not referenced. 
**b** : float, array-like, shape :math:`\left(:, \textit{my}\right)` Note: the required extent for this argument in dimension 1 is determined as follows: if :math:`\mathrm{orig}=-1`: :math:`\textit{ip}`; if :math:`\mathrm{orig}=1`: :math:`{1+\textit{ip}}`; otherwise: :math:`0`. If :math:`\mathrm{orig} = -1`, :math:`\mathrm{b}` must contain the parameter estimate for the centred, and possibly scaled, data as returned by :meth:`pls_fit`; otherwise :math:`\mathrm{b}` must contain the parameter estimates for the original data as returned by :meth:`pls_fit`. **isz** : int, array-like, shape :math:`\left(\textit{mz}\right)` Indicates which predictor variables are to be included in the model. Predictor variables included from :math:`\mathrm{z}` must be in the same order as those included in the fitted model. If :math:`\mathrm{isz}[\textit{j}-1] = 1`, the :math:`\textit{j}`\ th predictor variable is included in the model, for :math:`\textit{j} = 1,2,\ldots,\textit{mz}`, otherwise :math:`\mathrm{isz}[j-1] = 0`. **z** : float, array-like, shape :math:`\left(n, \textit{mz}\right)` :math:`\mathrm{z}[\textit{i}-1,\textit{j}-1]` contains the :math:`\textit{i}`\ th observation on the :math:`\textit{j}`\ th available predictor variable, for :math:`\textit{j} = 1,2,\ldots,\textit{mz}`, for :math:`\textit{i} = 1,2,\ldots,n`. **Returns** **yhat** : float, ndarray, shape :math:`\left(n, \textit{my}\right)` :math:`\mathrm{yhat}[i-1,j-1]` contains the :math:`i`\ th predicted value of the :math:`j`\ th :math:`y`-variable in the model. .. _g02ld-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} > 1`. (`errno` :math:`1`) On entry, :math:`\textit{my} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{my} \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{orig} = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\mathrm{orig} = -1` or :math:`1`. (`errno` :math:`1`) On entry, :math:`\mathrm{iscale} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{orig} = -1`, :math:`\mathrm{iscale} = -1`, :math:`1` or :math:`2`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{isz}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isz}[j-1] = 0` or :math:`1`. (`errno` :math:`2`) On entry, :math:`\textit{ldb} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`{\textit{ip}+1} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{orig} = 1`, :math:`\textit{ldb}\geq 1+\textit{ip}`. (`errno` :math:`2`) On entry, :math:`\textit{ldb} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{orig} = -1`, :math:`\textit{ldb} \geq \textit{ip}`. (`errno` :math:`2`) On entry, :math:`\textit{mz} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{mz} \geq \textit{ip}`. (`errno` :math:`3`) On entry, the number of elements of :math:`\mathrm{isz}` equal to :math:`1` is not :math:`\textit{ip}`. .. _g02ld-py2-py-notes: **Notes** ``pls_pred`` calculates the predictions :math:`\hat{Y}` of a PLS model given a set :math:`Z` of test data and a set :math:`B` of parameter estimates as returned by :meth:`pls_fit`. If :meth:`pls_fit` returns parameter estimates for the original data scale, no further information is required. If :meth:`pls_fit` returns parameter estimates for the centred, and possibly scaled, data, further information is required. The means of variables in the fitted model must be supplied. 
In the case of a PLS model fitted by using scaled data, the means and standard deviations of variables in the fitted model must also be supplied. These means and standard deviations are those returned by either :meth:`pls_svd` or :meth:`pls_wold`. See Also -------- :meth:`naginterfaces.library.examples.correg.pls_ex.main` """ raise NotImplementedError
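The back-transformation ``pls_pred`` performs when given centred and scaled parameter estimates can be sketched as follows. This is an illustrative, stdlib-only sketch under stated assumptions (the model was fitted on :math:`\left(x_j-\bar{x}_j\right)/s_{x_j}` and :math:`\left(y-\bar{y}\right)/s_y`); the helper name is hypothetical and not part of ``naginterfaces``.

```python
def predict_one(z_row, b_col, xbar, xstd, ybar, ystd):
    """Predict one response value for one observation.

    z_row : predictor values on the original scale
    b_col : parameter estimates for the centred, scaled data
    xbar, xstd : means and scalings of the predictors
    ybar, ystd : mean and scaling of the response
    """
    # Centre and scale the observation, apply B, then undo the
    # centring and scaling of the response.
    scaled_fit = sum(bj * (zj - mj) / sj
                     for zj, bj, mj, sj in zip(z_row, b_col, xbar, xstd))
    return ybar + ystd * scaled_fit

yhat = predict_one(z_row=[3.0, 5.0], b_col=[0.5, 0.25],
                   xbar=[1.0, 1.0], xstd=[2.0, 4.0], ybar=10.0, ystd=2.0)
# yhat == 11.5
```

When ``orig = 1`` the estimates returned by :meth:`pls_fit` already absorb these means and scalings into ``ob``, so no further information is needed.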
[docs]def lars(mtype, d, y, pred=3, prey=1, isx=None, mnstep=None, ropt=None, io_manager=None): r""" ``lars`` performs Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO). .. _g02ma-py2-py-doc: For full information please refer to the NAG Library document for g02ma https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02maf.html .. _g02ma-py2-py-parameters: **Parameters** **mtype** : int Indicates the type of model to fit. :math:`\mathrm{mtype} = 1` LARS is performed. :math:`\mathrm{mtype} = 2` Forward linear stagewise regression is performed. :math:`\mathrm{mtype} = 3` LASSO model is fit. :math:`\mathrm{mtype} = 4` A positive LASSO model is fit. **d** : float, array-like, shape :math:`\left(n, m\right)` :math:`D`, the data, which along with :math:`\mathrm{pred}` and :math:`\mathrm{isx}`, defines the design matrix :math:`X`. The :math:`\textit{i}`\ th observation for the :math:`\textit{j}`\ th variable must be supplied in :math:`\mathrm{d}[\textit{i}-1,\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the observations on the dependent variable. **pred** : int, optional Indicates the type of data preprocessing to perform on the independent variables supplied in :math:`\mathrm{d}` to comply with the standardized form of the design matrix. :math:`\mathrm{pred} = 0` No preprocessing is performed. :math:`\mathrm{pred} = 1` Each of the independent variables, :math:`x_{\textit{j}}`, for :math:`\textit{j} = 1,2,\ldots,p`, are mean centred prior to fitting the model. The means of the independent variables, :math:`\bar{x}`, are returned in :math:`\mathrm{b}`, with :math:`\bar{x}_{\textit{j}} = \mathrm{b}[\textit{j}-1,\mathrm{nstep}+1]`, for :math:`\textit{j} = 1,2,\ldots,p`. 
:math:`\mathrm{pred} = 2` Each independent variable is normalized, with the :math:`j`\ th variable scaled by :math:`1/\sqrt{x_j^\mathrm{T}x_j}`. The scaling factor used by variable :math:`j` is returned in :math:`\mathrm{b}[\textit{j}-1,\mathrm{nstep}]`. :math:`\mathrm{pred} = 3` As for :math:`\mathrm{pred} = 1` and :math:`2` combined: all of the independent variables are mean centred prior to being normalized. **prey** : int, optional Indicates the type of data preprocessing to perform on the dependent variable supplied in :math:`\mathrm{y}`. :math:`\mathrm{prey} = 0` No preprocessing is performed; this is equivalent to setting :math:`\alpha = 0`. :math:`\mathrm{prey} = 1` The dependent variable, :math:`y`, is mean centred prior to fitting the model, so :math:`\alpha = \bar{y}`. This is equivalent to fitting a non-penalized intercept to the model, and the degrees of freedom etc. are adjusted accordingly. The value of :math:`\alpha` used is returned in :math:`\mathrm{fitsum}[0,\mathrm{nstep}]`. **isx** : None or int, array-like, shape :math:`\left(\textit{lisx}\right)`, optional Indicates which independent variables from :math:`\mathrm{d}` will be included in the design matrix, :math:`X`. If :math:`\mathrm{isx}` is **None**, all variables are included in the design matrix. Otherwise :math:`\mathrm{isx}[\textit{j}-1]` must be set as follows, for :math:`\textit{j} = 1,2,\ldots,m`: :math:`\mathrm{isx}[j-1] = 1` To indicate that the :math:`j`\ th variable, as supplied in :math:`\mathrm{d}`, is included in the design matrix; :math:`\mathrm{isx}[j-1] = 0` To indicate that the :math:`j`\ th variable, as supplied in :math:`\mathrm{d}`, is not included in the design matrix; and :math:`p = \sum_{1}^{m}\mathrm{isx}[\textit{j}-1]`. **mnstep** : None or int, optional Note: if this argument is **None** then a default value will be used, determined as follows: if :math:`\mathrm{mtype} = 1`: :math:`{m}`; otherwise: :math:`{200\times m}`. 
The maximum number of steps to carry out in the model fitting process. If :math:`\mathrm{mtype} = 1`, i.e., a LARS is being performed, the maximum number of steps the algorithm will take is :math:`\mathrm{min}\left(p,n\right)` if :math:`\mathrm{prey} = 0`, otherwise :math:`\mathrm{min}\left(p,n-1\right)`. If :math:`\mathrm{mtype} = 2`, i.e., a forward linear stagewise regression is being performed, the maximum number of steps the algorithm will take is likely to be several orders of magnitude more and is no longer bound by :math:`p` or :math:`n`. If :math:`\mathrm{mtype} = 3` or :math:`4`, i.e., a LASSO or positive LASSO model is being fit, the maximum number of steps the algorithm will take lies somewhere between that of the LARS and forward linear stagewise regression, again it is no longer bound by :math:`p` or :math:`n`. **ropt** : None or float, array-like, shape :math:`\left(\textit{lropt}\right)`, optional Options to control various aspects of the LARS algorithm. The default value will be used for :math:`\mathrm{ropt}[i-1]` if the length of :math:`\mathrm{ropt}` is less than :math:`i`, therefore, to use the default values for all options :math:`\mathrm{ropt}` need not be set and may be **None**. The default value will also be used if an invalid value is supplied for a particular argument, for example, setting :math:`\mathrm{ropt}[i-1] = -1` will use the default value for argument :math:`i`. :math:`\mathrm{ropt}[0]` The minimum step size that will be taken. Default is :math:`100\times \textit{eps}`, where :math:`\textit{eps}` is the machine precision returned by :meth:`machine.precision <naginterfaces.library.machine.precision>`. :math:`\mathrm{ropt}[1]` General tolerance, used amongst other things, for comparing correlations. Default is :math:`\mathrm{ropt}[0]`. :math:`\mathrm{ropt}[2]` If set to :math:`1`, parameter estimates are rescaled before being returned. If set to :math:`0`, no rescaling is performed. 
This argument has no effect when :math:`\mathrm{pred} = 0` or :math:`1`. Default is for the parameter estimates to be rescaled. :math:`\mathrm{ropt}[3]` If set to :math:`1`, it is assumed that the model contains an intercept during the model fitting process and when calculating the degrees of freedom. If set to :math:`0`, no intercept is assumed. This has no effect on the amount of preprocessing performed on :math:`\mathrm{y}`. Default is to treat the model as having an intercept when :math:`\mathrm{prey} = 1` and as not having an intercept when :math:`\mathrm{prey} = 0`. :math:`\mathrm{ropt}[4]` As implemented, the LARS algorithm can either work directly with :math:`y` and :math:`X`, or it can work with the cross-product matrices, :math:`X^\mathrm{T}y` and :math:`X^\mathrm{T}X`. In most cases it is more efficient to work with the cross-product matrices. This flag allows you direct control over which method is used, however, the default value will usually be the best choice. If :math:`\mathrm{ropt}[4] = 1`, :math:`y` and :math:`X` are worked with directly. If :math:`\mathrm{ropt}[4] = 0`, the cross-product matrices are used. Default is :math:`1` when :math:`p\geq 500` and :math:`n < p` and :math:`0` otherwise. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **ip** : int :math:`p`, number of parameter estimates. If :math:`\mathrm{isx}` is **None**, :math:`p = m`, i.e., the number of variables in :math:`\mathrm{d}`. Otherwise :math:`p` is the number of nonzero values in :math:`\mathrm{isx}`. **nstep** : int :math:`K`, the actual number of steps carried out in the model fitting process. **b** : float, ndarray, shape :math:`\left(\mathrm{ip}, \mathrm{nstep}+2\right)` :math:`\beta` the parameter estimates, with :math:`\mathrm{b}[j-1,k-1] = \beta_{{kj}}`, the parameter estimate for the :math:`j`\ th variable, :math:`j = 1,2,\ldots,p` at the :math:`k`\ th step of the model fitting process, :math:`k = 1,2,\ldots,\mathrm{nstep}`. 
By default, when :math:`\mathrm{pred} = 2` or :math:`3` the parameter estimates are rescaled prior to being returned. If the parameter estimates are required on the normalized scale, then this can be overridden via :math:`\mathrm{ropt}`. The values held in the remaining part of :math:`\mathrm{b}` depend on the type of preprocessing performed. If :math:`\mathrm{pred} = 0`, :math:`\begin{array}{lll}\mathrm{b}[j-1,\mathrm{nstep}]& = &1\\\mathrm{b}[j-1,\mathrm{nstep}+1]& = &0\end{array}` If :math:`\mathrm{pred} = 1`, :math:`\begin{array}{lll}\mathrm{b}[j-1,\mathrm{nstep}]& = &1\\\mathrm{b}[j-1,\mathrm{nstep}+1]& = &\bar{x}_j\end{array}` If :math:`\mathrm{pred} = 2`, :math:`\begin{array}{lll}\mathrm{b}[j-1,\mathrm{nstep}]& = & 1/ \sqrt{x_j^\mathrm{T}x_j} \\\mathrm{b}[j-1,\mathrm{nstep}+1]& = & 0 \end{array}` If :math:`\mathrm{pred} = 3`, :math:`\begin{array}{lll}\mathrm{b}[j-1,\mathrm{nstep}]& = & 1/ \sqrt{ \left(x_j-\bar{x}_j\right)^\mathrm{T}\left(x_j-\bar{x}_j\right)} \\\mathrm{b}[j-1,\mathrm{nstep}+1]& = & \bar{x}_j \end{array}` for :math:`j = 1,2,\ldots,p`. **fitsum** : float, ndarray, shape :math:`\left(6, \mathrm{mnstep}+1\right)` Summaries of the model fitting process. When :math:`k = 1,2,\ldots,\mathrm{nstep}`, :math:`\mathrm{fitsum}[0,k-1]` :math:`\left\lVert \beta_k\right\rVert_1`, the sum of the absolute values of the parameter estimates for the :math:`k`\ th step of the model fitting process. If :math:`\mathrm{pred} = 2` or :math:`3`, the scaled parameter estimates are used in the summation. :math:`\mathrm{fitsum}[1,k-1]` :math:`\textit{RSS}_k`, the residual sums of squares for the :math:`k`\ th step, where :math:`\textit{RSS}_k = \left\lVert y-X^\mathrm{T}\beta_k\right\rVert^2`. :math:`\mathrm{fitsum}[2,k-1]` :math:`\nu_k`, approximate degrees of freedom for the :math:`k`\ th step. 
:math:`\mathrm{fitsum}[3,k-1]` :math:`C_p^{\left(k\right)}`, a :math:`C_p`-type statistic for the :math:`k`\ th step, where :math:`C_p^{\left(k\right)} = \frac{\textit{RSS}_k}{\sigma^2}-n+2\nu_k`. :math:`\mathrm{fitsum}[4,k-1]` :math:`\hat{C}_k`, correlation between the residual at step :math:`k-1` and the most correlated variable not yet in the active set :math:`\mathcal{A}`, where the residual at step :math:`0` is :math:`y`. :math:`\mathrm{fitsum}[5,k-1]` :math:`\hat{\gamma }_k`, the step size used at step :math:`k`. In addition :math:`\mathrm{fitsum}[0,\mathrm{nstep}]` :math:`\alpha`, with :math:`\alpha = \bar{y}` if :math:`\mathrm{prey} = 1` and :math:`0` otherwise. :math:`\mathrm{fitsum}[1,\mathrm{nstep}]` :math:`\textit{RSS}_0`, the residual sums of squares for the null model, where :math:`\textit{RSS}_0 = y^\mathrm{T}y` when :math:`\mathrm{prey} = 0` and :math:`\textit{RSS}_0 = \left(y-\bar{y}\right)^\mathrm{T}\left(y-\bar{y}\right)` otherwise. :math:`\mathrm{fitsum}[2,\mathrm{nstep}]` :math:`\nu_0`, the degrees of freedom for the null model, where :math:`\nu_0 = 0` if :math:`\mathrm{prey} = 0` and :math:`\nu_0 = 1` otherwise. :math:`\mathrm{fitsum}[3,\mathrm{nstep}]` :math:`C_p^{\left(0\right)}`, a :math:`C_p`-type statistic for the null model, where :math:`C_p^{\left(0\right)} = \frac{\textit{RSS}_0}{\sigma^2}-n+2\nu_0`. :math:`\mathrm{fitsum}[4,\mathrm{nstep}]` :math:`\sigma^2`, where :math:`\sigma^2 = \frac{\textit{RSS}_K}{n-\nu_K}` and :math:`K = \mathrm{nstep}`. Although the :math:`C_p` statistics described above are returned when :math:`\mathrm{errno}` = 112, they may not be meaningful due to the estimate :math:`\sigma^2` not being based on the saturated model. .. _g02ma-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`\mathrm{mtype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mtype} = 1`, :math:`2`, :math:`3` or :math:`4`. 
(`errno` :math:`21`) On entry, :math:`\mathrm{pred} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{pred} = 0`, :math:`1`, :math:`2` or :math:`3`. (`errno` :math:`31`) On entry, :math:`\mathrm{prey} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{prey} = 0` or :math:`1`. (`errno` :math:`41`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 1`. (`errno` :math:`51`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`81`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isx}[i] = 0` or :math:`1`, for all :math:`i`. (`errno` :math:`82`) On entry, all values of :math:`\mathrm{isx}` are zero. Constraint: at least one value of :math:`\mathrm{isx}` must be nonzero. (`errno` :math:`91`) On entry, :math:`\textit{lisx} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lisx} = 0` or :math:`m`. (`errno` :math:`111`) On entry, :math:`\mathrm{mnstep} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mnstep} \geq 1`. (`errno` :math:`151`) On entry, :math:`\textit{ldb} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\textit{lisx} = 0` then :math:`\textit{ldb} \geq m`. (`errno` :math:`152`) On entry, :math:`\textit{ldb} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`p = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\textit{lisx} = m` then :math:`\textit{ldb}\geq p`. (`errno` :math:`181`) On entry, :math:`\textit{lropt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \textit{lropt}\leq 5`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`112`) Fitting process did not finish in :math:`\mathrm{mnstep}` steps. 
Try increasing the size of :math:`\mathrm{mnstep}` and supplying larger output arrays. All output is returned as documented, up to step :math:`\mathrm{mnstep}`, however, :math:`\sigma` and the :math:`C_p` statistics may not be meaningful. (`errno` :math:`161`) :math:`\sigma^2` is approximately zero and hence the :math:`C_p`-type criterion cannot be calculated. All other output is returned as documented. (`errno` :math:`162`) :math:`\nu_K = n`, therefore, :math:`\sigma` has been set to a large value. Output is returned as documented. (`errno` :math:`163`) Degenerate model, no variables added and :math:`\mathrm{nstep} = 0`. Output is returned as documented. .. _g02ma-py2-py-notes: **Notes** ``lars`` implements the LARS algorithm of Efron `et al.` (2004) as well as the modifications needed to perform forward stagewise linear regression and fit LASSO and positive LASSO models. Given a vector of :math:`n` observed values, :math:`y = \left\{y_i:i = 1,2,\ldots,n\right\}` and an :math:`n\times p` design matrix :math:`X`, where the :math:`j`\ th column of :math:`X`, denoted :math:`x_j`, is a vector of length :math:`n` representing the :math:`j`\ th independent variable :math:`x_j`, standardized such that :math:`\sum_{1}^{n}{x_{{ij}}} = 0`, and :math:`\sum_{1}^{n}{x_{{ij}}^2} = 1` and a set of model parameters :math:`\beta` to be estimated from the observed values, the LARS algorithm can be summarised as: (1) Set :math:`k = 1` and all coefficients to zero, that is :math:`\beta = 0`. (#) Find the variable most correlated with :math:`y`, say :math:`x_{j_1}`. Add :math:`x_{j_1}` to the 'most correlated' set :math:`\mathcal{A}`. If :math:`p = 1` go to \(8). (#) Take the largest possible step in the direction of :math:`x_{j_1}` (i.e., increase the magnitude of :math:`\beta_{j_1}`) until some other variable, say :math:`x_{j_2}`, has the same correlation with the current residual, :math:`y-x_{j_1}\beta_{j_1}`. (#) Increment :math:`k` and add :math:`x_{j_k}` to :math:`\mathcal{A}`. 
(#) If :math:`\left\lvert \mathcal{A}\right\rvert = p` go to \(8). (#) Proceed in the 'least angle direction', that is, the direction which is equiangular between all variables in :math:`\mathcal{A}`, altering the magnitude of the parameter estimates of those variables in :math:`\mathcal{A}`, until the :math:`k`\ th variable, :math:`x_{j_k}`, has the same correlation with the current residual. (#) Go to \(4). (#) Let :math:`K = k`. As well as being a model selection process in its own right, with a small number of modifications the LARS algorithm can be used to fit the LASSO model of Tibshirani (1996), a positive LASSO model, where the independent variables enter the model in their defined direction (i.e., :math:`\beta_{{kj}}\geq 0`), forward stagewise linear regression (Hastie `et al.` (2001)) and forward selection (Weisberg (1985)). Details of the required modifications in each of these cases are given in Efron `et al.` (2004). The LASSO model of Tibshirani (1996) is given by .. math:: \textit{minimize}_{{\alpha,\beta_k \in \mathbb{R}^p}}\left\lVert y-\alpha -X^\mathrm{T}\beta_k\right\rVert^2\quad \text{ subject to }\quad \left\lVert \beta_k\right\rVert_1\leq t_k for all values of :math:`t_k`, where :math:`\alpha = \bar{y} = n^{-1}\sum_{1}^{n}{y_i}`. The positive LASSO model is the same as the standard LASSO model, given above, with the added constraint that .. math:: \beta_{{kj}}\geq 0\text{, }\quad j = 1,2,\ldots,p\text{.} Unlike the standard LARS algorithm, when fitting either of the LASSO models, variables can be dropped as well as added to the set :math:`\mathcal{A}`. Therefore, the total number of steps :math:`K` is no longer bounded by :math:`p`. Forward stagewise linear regression is an iterative procedure of the form: (1) Initialize :math:`k = 1` and the vector of residuals :math:`r_0 = y-\alpha`. (#) For each :math:`j = 1,2,\ldots,p` calculate :math:`c_j = x_j^\mathrm{T}r_{{k-1}}`. 
The value :math:`c_j` is, therefore, proportional to the correlation between the :math:`j`\ th independent variable and the vector of previous residual values, :math:`r_{{k-1}}`. (#) Calculate :math:`j_k = \textit{argmax}_j\left|c_j\right|`, the value of :math:`j` with the largest absolute value of :math:`c_j`. (#) If :math:`\left\lvert c_{j_k}\right\rvert < \epsilon` then go to \(7). (#) Update the residual values, with .. math:: r_k = r_{{k-1}}-\delta \text{ }\mathrm{sign}\left(c_{j_k}\right)x_{j_k} where :math:`\delta` is a small constant and :math:`\mathrm{sign}\left(c_{j_k}\right) = -1` when :math:`c_{j_k} < 0` and :math:`1` otherwise. (#) Increment :math:`k` and go to \(2). (#) Set :math:`K = k`. If the largest possible step were to be taken, that is, :math:`\delta = \left\lvert c_{j_k}\right\rvert`, then forward stagewise linear regression reverts to the standard forward selection method as implemented in :meth:`linregm_fit_onestep`. The LARS procedure results in :math:`K` models, one for each step of the fitting process. In order to aid in choosing which is the most suitable, Efron `et al.` (2004) introduced a :math:`C_p`-type statistic given by .. math:: C_p^{\left(k\right)} = \frac{\left\lVert y-X^\mathrm{T}\beta_k\right\rVert^2}{\sigma^2}-n+2\nu_k\text{,} where :math:`\nu_k` is the approximate degrees of freedom for the :math:`k`\ th step and .. math:: \sigma^2 = \frac{\left\lVert y-X^\mathrm{T}\beta_K\right\rVert^2}{{n-\nu_K}}\text{.} One way of choosing a model is, therefore, to take the one with the smallest value of :math:`C_p^{\left(k\right)}`. ..
_g02ma-py2-py-references: **References** Efron, B, Hastie, T, Johnstone, I and Tibshirani, R, 2004, `Least Angle Regression`, The Annals of Statistics (Volume 32) (2), 407--499 Hastie, T, Tibshirani, R and Friedman, J, 2001, `The Elements of Statistical Learning: Data Mining, Inference and Prediction`, Springer (New York) Tibshirani, R, 1996, `Regression Shrinkage and Selection via the Lasso`, Journal of the Royal Statistical Society, Series B (Methodological) (Volume 58) (1), 267--288 Weisberg, S, 1985, `Applied Linear Regression`, Wiley See Also -------- :meth:`naginterfaces.library.examples.correg.lars_ex.main` """ raise NotImplementedError
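The forward stagewise iteration described in the notes for ``lars`` (``mtype = 2``) can be sketched in a few lines of stdlib Python. This is an illustrative toy, not the NAG algorithm: it assumes the columns of :math:`X` are already standardized and :math:`y` is centred, and the function name and data are made up.

```python
def forward_stagewise(x_cols, y, delta=0.01, eps=1e-6, max_steps=10000):
    """x_cols : list of p standardized predictor columns (length n each);
    y : centred response values. Returns the coefficient vector beta."""
    p, beta = len(x_cols), [0.0] * len(x_cols)
    r = list(y)  # residuals; r_0 = y since alpha = 0 for centred y
    for _ in range(max_steps):
        # c_j = x_j^T r, proportional to the correlation with the residual
        c = [sum(xij * ri for xij, ri in zip(xj, r)) for xj in x_cols]
        jk = max(range(p), key=lambda j: abs(c[j]))
        if abs(c[jk]) < eps:      # converged: no variable is correlated enough
            break
        s = 1.0 if c[jk] >= 0 else -1.0
        beta[jk] += delta * s     # take a small step towards x_{j_k}
        r = [ri - delta * s * xij for ri, xij in zip(r, x_cols[jk])]
    return beta

# One standardized predictor (sum zero, unit sum of squares), centred y;
# beta[0] ends up within one step size of the least-squares solution sqrt(2):
beta = forward_stagewise([[2 ** -0.5, -(2 ** -0.5)]], [1.0, -1.0])
```

Taking the largest possible step instead of a fixed :math:`\delta` recovers forward selection, as the notes observe; the real ``lars`` function additionally handles preprocessing, the LASSO variants and the :math:`C_p` statistics.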
[docs]def lars_xtx(mtype, n, dtd, dty, yty, pred=2, intcpt=1, isx=None, mnstep=None, ropt=None, io_manager=None): r""" ``lars_xtx`` performs Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO) using cross-product matrices. .. _g02mb-py2-py-doc: For full information please refer to the NAG Library document for g02mb https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02mbf.html .. _g02mb-py2-py-parameters: **Parameters** **mtype** : int Indicates the type of model to fit. :math:`\mathrm{mtype} = 1` LARS is performed. :math:`\mathrm{mtype} = 2` Forward linear stagewise regression is performed. :math:`\mathrm{mtype} = 3` LASSO model is fit. :math:`\mathrm{mtype} = 4` A positive LASSO model is fit. **n** : int :math:`n`, the number of observations. **dtd** : float, array-like, shape :math:`\left(:, :\right)` Note: the required extent for this argument in dimension 1 is determined as follows: if :math:`\mathrm{dtd}.\mathrm{shape}[0]=1`: :math:`1`; if :math:`\mathrm{dtd}.\mathrm{shape}[1]=1`: :math:`{ m \times \left(m+1\right) / 2 }`; otherwise: :math:`m`. Note: the required extent for this argument in dimension 2 is determined as follows: if :math:`\mathrm{dtd}.\mathrm{shape}[0]=1`: :math:`{ m \times \left(m+1\right) / 2 }`; if :math:`\mathrm{dtd}.\mathrm{shape}[1]=1`: :math:`1`; otherwise: :math:`m`. :math:`D^\mathrm{T}D`, the cross-product matrix, which along with :math:`\mathrm{isx}`, defines the design matrix cross-product :math:`X^\mathrm{T}X`. If the cross-product matrix is packed into a single row or column of :math:`\mathrm{dtd}`, :math:`\mathrm{dtd}[0,\textit{i}\times \left(\textit{i}-1\right)/2+\textit{j}-1]` or :math:`\mathrm{dtd}[\textit{i}\times \left(\textit{i}-1\right)/2+\textit{j}-1,0]` must contain the cross-product of the :math:`\textit{i}`\ th and :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,m`. 
That is the cross-product stacked by columns as returned by :meth:`ssqmat`, for example. Otherwise :math:`\mathrm{dtd}[\textit{i}-1,\textit{j}-1]` must contain the cross-product of the :math:`\textit{i}`\ th and :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,m`. It should be noted that, even though :math:`D^\mathrm{T}D` is symmetric, the full matrix must be supplied. The matrix specified in :math:`\mathrm{dtd}` must be a valid cross-products matrix. **dty** : float, array-like, shape :math:`\left(m\right)` :math:`D^\mathrm{T}y`, the cross-product between the dependent variable, :math:`y`, and the independent variables :math:`D`. **yty** : float :math:`y^\mathrm{T}y`, the sums of squares of the dependent variable. **pred** : int, optional Indicates the type of preprocessing to perform on the cross-products involving the independent variables, i.e., those supplied in :math:`\mathrm{dtd}` and :math:`\mathrm{dty}`. :math:`\mathrm{pred} = 0` No preprocessing is performed. :math:`\mathrm{pred} = 2` Each independent variable is normalized, with the :math:`j`\ th variable scaled by :math:`1/\sqrt{x_j^\mathrm{T}x_j}`. The scaling factor used by variable :math:`j` is returned in :math:`\mathrm{b}[\textit{j}-1,\mathrm{nstep}]`. **intcpt** : int, optional Indicates the type of data preprocessing that was perform on the dependent variable, :math:`y`, prior to calling this function. :math:`\mathrm{intcpt} = 0` No preprocessing was performed. :math:`\mathrm{intcpt} = 1` The dependent variable, :math:`y`, was mean centred. **isx** : None or int, array-like, shape :math:`\left(\textit{lisx}\right)`, optional Indicates which independent variables from :math:`\mathrm{dtd}` will be included in the design matrix, :math:`X`. If :math:`\mathrm{isx}` is **None**, all variables are included in the design matrix. 
Otherwise :math:`\mathrm{isx}[\textit{j}-1]` must be set as follows, for :math:`\textit{j} = 1,2,\ldots,m`: :math:`\mathrm{isx}[\textit{j}-1] = 1` To indicate that the :math:`j`\ th variable, as supplied in :math:`\mathrm{dtd}`, is included in the design matrix; :math:`\mathrm{isx}[\textit{j}-1] = 0` To indicate that the :math:`j`\ th variable, as supplied in :math:`\mathrm{dtd}`, is not included in the design matrix; and :math:`p = \sum_{1}^{m}\mathrm{isx}[\textit{j}-1]`. **mnstep** : None or int, optional Note: if this argument is **None** then a default value will be used, determined as follows: if :math:`\mathrm{mtype} = 1`: :math:`{ m }`; otherwise: :math:`{ 200\times m }`. The maximum number of steps to carry out in the model fitting process. If :math:`\mathrm{mtype} = 1`, i.e., a LARS is being performed, the maximum number of steps the algorithm will take is :math:`\mathrm{min}\left(p,n\right)` if :math:`\mathrm{intcpt} = 0`, otherwise :math:`\mathrm{min}\left(p,n-1\right)`. If :math:`\mathrm{mtype} = 2`, i.e., a forward linear stagewise regression is being performed, the maximum number of steps the algorithm will take is likely to be several orders of magnitude more and is no longer bound by :math:`p` or :math:`n`. If :math:`\mathrm{mtype} = 3` or :math:`4`, i.e., a LASSO or positive LASSO model is being fit, the maximum number of steps the algorithm will take lies somewhere between that of the LARS and forward linear stagewise regression, again it is no longer bound by :math:`p` or :math:`n`. **ropt** : None or float, array-like, shape :math:`\left(\textit{lropt}\right)`, optional Options to control various aspects of the LARS algorithm. The default value will be used for :math:`\mathrm{ropt}[i-1]` if :math:`\textit{lropt} < i`, therefore, setting :math:`\textit{lropt} = 0` will use the default values for all options and :math:`\mathrm{ropt}` need not be set and may be **None**. 
The default value will also be used if an invalid value is supplied for a particular argument, for example, setting :math:`\mathrm{ropt}[i-1] = -1` will use the default value for argument :math:`i`. :math:`\mathrm{ropt}[0]` The minimum step size that will be taken. Default is :math:`100\times \textit{eps}` is used, where :math:`\textit{eps}` is the machine precision returned by :meth:`machine.precision <naginterfaces.library.machine.precision>`. :math:`\mathrm{ropt}[1]` General tolerance, used amongst other things, for comparing correlations. Default is :math:`\mathrm{ropt}[0]`. :math:`\mathrm{ropt}[2]` If set to :math:`1`, parameter estimates are rescaled before being returned. If set to :math:`0`, no rescaling is performed. This argument has no effect when :math:`\mathrm{pred} = 0`. Default is for the parameter estimates to be rescaled. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **ip** : int :math:`p`, number of parameter estimates. If :math:`\mathrm{isx}` is **None**, :math:`p = m`, i.e., the number of variables in :math:`\mathrm{dtd}`. Otherwise :math:`p` is the number of nonzero values in :math:`\mathrm{isx}`. **nstep** : int :math:`K`, the actual number of steps carried out in the model fitting process. **b** : float, ndarray, shape :math:`\left(\mathrm{ip}, \mathrm{nstep}+1\right)` :math:`\beta` the parameter estimates, with :math:`\mathrm{b}[j-1,k-1] = \beta_{{kj}}`, the parameter estimate for the :math:`j`\ th variable, :math:`j = 1,2,\ldots,p` at the :math:`k`\ th step of the model fitting process, :math:`k = 1,2,\ldots,\mathrm{nstep}`. By default, when :math:`\mathrm{pred} = 2` the parameter estimates are rescaled prior to being returned. If the parameter estimates are required on the normalized scale, then this can be overridden via :math:`\mathrm{ropt}`. The values held in the remaining part of :math:`\mathrm{b}` depend on the type of preprocessing performed. 
:math:`\begin{array}{llll}\text{If }\mathrm{pred} = 0&\mathrm{b}[j-1,\mathrm{nstep}]& = &1\text{,}\\\\\text{if }\mathrm{pred} = 2&\mathrm{b}[j-1,\mathrm{nstep}]& = & 1/ \sqrt{x_j^\mathrm{T}x_j}\text{,} \end{array}` for :math:`j = 1,2,\ldots p`. **fitsum** : float, ndarray, shape :math:`\left(6, \mathrm{mnstep}+1\right)` Summaries of the model fitting process. When :math:`k = 1,2,\ldots,\mathrm{nstep}` :math:`\mathrm{fitsum}[0,k-1]` :math:`\left\lVert \beta_k\right\rVert_1`, the sum of the absolute values of the parameter estimates for the :math:`k`\ th step of the modelling fitting process. If :math:`\mathrm{pred} = 2`, the scaled parameter estimates are used in the summation. :math:`\mathrm{fitsum}[1,k-1]` :math:`\textit{RSS}_k`, the residual sums of squares for the :math:`k`\ th step, where :math:`\textit{RSS}_k = \left\lVert y-X^\mathrm{T}\beta_k\right\rVert^2`. :math:`\mathrm{fitsum}[2,k-1]` :math:`\nu_k`, approximate degrees of freedom for the :math:`k`\ th step. :math:`\mathrm{fitsum}[3,k-1]` :math:`C_p^{\left(k\right)}`, a :math:`C_p`-type statistic for the :math:`k`\ th step, where :math:`C_p^{\left(k\right)} = \frac{\textit{RSS}_k}{\sigma^2}-n+2\nu_k`. :math:`\mathrm{fitsum}[4,k-1]` :math:`\hat{C}_k`, correlation between the residual at step :math:`k-1` and the most correlated variable not yet in the active set :math:`\mathcal{A}`, where the residual at step :math:`0` is :math:`y`. :math:`\mathrm{fitsum}[5,k-1]` :math:`\hat{\gamma }_k`, the step size used at step :math:`k`. In addition :math:`\mathrm{fitsum}[0,\mathrm{nstep}]` :math:`0`. :math:`\mathrm{fitsum}[1,\mathrm{nstep}]` :math:`\textit{RSS}_0`, the residual sums of squares for the null model, where :math:`\textit{RSS}_0 = y^\mathrm{T}y`. :math:`\mathrm{fitsum}[2,\mathrm{nstep}]` :math:`\nu_0`, the degrees of freedom for the null model, where :math:`\nu_0 = 0` if :math:`\mathrm{intcpt} = 0` and :math:`\nu_0 = 1` otherwise. 
:math:`\mathrm{fitsum}[3,\mathrm{nstep}]` :math:`C_p^{\left(0\right)}`, a :math:`C_p`-type statistic for the null model, where :math:`C_p^{\left(0\right)} = \frac{\textit{RSS}_0}{\sigma^2}-n+2\nu_0`. :math:`\mathrm{fitsum}[4,\mathrm{nstep}]` :math:`\sigma^2`, where :math:`\sigma^2 = \frac{{n-\textit{RSS}_K}}{\nu_K}` and :math:`K = \mathrm{nstep}`. Although the :math:`C_p` statistics described above are returned when :math:`\mathrm{errno}` = 122 they may not be meaningful due to the estimate :math:`\sigma^2` not being based on the saturated model. .. _g02mb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`\mathrm{mtype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mtype} = 1`, :math:`2`, :math:`3` or :math:`4`. (`errno` :math:`21`) On entry, :math:`\mathrm{pred} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{pred} = 0` or :math:`2`. (`errno` :math:`31`) On entry, :math:`\mathrm{intcpt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{intcpt} = 0` or :math:`1`. (`errno` :math:`41`) On entry, :math:`\mathrm{n} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{n} \geq 1`. (`errno` :math:`51`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 1`. (`errno` :math:`61`) The cross-product matrix supplied in :math:`\mathrm{dtd}` is not symmetric. (`errno` :math:`62`) On entry, :math:`\mathrm{dtd}[0,\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: diagonal elements of :math:`D^\mathrm{T}D` must be positive. (`errno` :math:`62`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{dtd}[i-1,i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: diagonal elements of :math:`D^\mathrm{T}D` must be positive. 
(`errno` :math:`81`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isx}[i] = 0` or :math:`1`, for all :math:`i`. (`errno` :math:`82`) On entry, all values of :math:`\mathrm{isx}` are zero. Constraint: at least one value of :math:`\mathrm{isx}` must be nonzero. (`errno` :math:`91`) On entry, :math:`\textit{lisx} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lisx} = 0` or :math:`m`. (`errno` :math:`111`) On entry, :math:`\mathrm{yty} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{yty} > 0.0`. (`errno` :math:`112`) A negative value for the residual sums of squares was obtained. Check the values of :math:`\mathrm{dtd}`, :math:`\mathrm{dty}` and :math:`\mathrm{yty}`. (`errno` :math:`121`) On entry, :math:`\mathrm{mnstep} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mnstep} \geq 1`. (`errno` :math:`191`) On entry, :math:`\textit{lropt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \textit{lropt}\leq 3`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`122`) Fitting process did not finished in :math:`\mathrm{mnstep}` steps. Try increasing the size of :math:`\mathrm{mnstep}` and supplying larger output arrays. All output is returned as documented, up to step :math:`\mathrm{mnstep}`, however, :math:`\sigma` and the :math:`C_p` statistics may not be meaningful. (`errno` :math:`171`) :math:`\sigma^2` is approximately zero and hence the :math:`C_p`-type criterion cannot be calculated. All other output is returned as documented. (`errno` :math:`172`) :math:`\nu_K = n`, therefore, sigma has been set to a large value. Output is returned as documented. (`errno` :math:`173`) Degenerate model, no variables added and :math:`\mathrm{nstep} = 0`. Output is returned as documented. .. 
_g02mb-py2-py-notes: **Notes** ``lars_xtx`` implements the LARS algorithm of Efron `et al.` (2004) as well as the modifications needed to perform forward stagewise linear regression and fit LASSO and positive LASSO models. Given a vector of :math:`n` observed values, :math:`y = \left\{y_i:i = 1,2,\ldots,n\right\}` and an :math:`n\times p` design matrix :math:`X`, where the :math:`j`\ th column of :math:`X`, denoted :math:`x_j`, is a vector of length :math:`n` representing the :math:`j`\ th independent variable :math:`x_j`, standardized such that :math:`\sum_{1}^{n}{x_{{ij}}} = 0`, and :math:`\sum_{1}^{n}{x_{{ij}}^2} = 1` and a set of model parameters :math:`\beta` to be estimated from the observed values, the LARS algorithm can be summarised as: (1) Set :math:`k = 1` and all coefficients to zero, that is :math:`\beta = 0`. (#) Find the variable most correlated with :math:`y`, say :math:`x_{j_1}`. Add :math:`x_{j_1}` to the 'most correlated' set :math:`\mathcal{A}`. If :math:`p = 1` go to \(8). (#) Take the largest possible step in the direction of :math:`x_{j_1}` (i.e., increase the magnitude of :math:`\beta_{j_1}`) until some other variable, say :math:`x_{j_2}`, has the same correlation with the current residual, :math:`y-x_{j_1}\beta_{j_1}`. (#) Increment :math:`k` and add :math:`x_{j_k}` to :math:`\mathcal{A}`. (#) If :math:`\left\lvert \mathcal{A}\right\rvert = p` go to \(8). (#) Proceed in the 'least angle direction', that is, the direction which is equiangular between all variables in :math:`\mathcal{A}`, altering the magnitude of the parameter estimates of those variables in :math:`\mathcal{A}`, until the :math:`k`\ th variable, :math:`x_{j_k}`, has the same correlation with the current residual. (#) Go to \(4). (#) Let :math:`K = k`. 
As well as being a model selection process in its own right, with a small number of modifications the LARS algorithm can be used to fit the LASSO model of Tibshirani (1996), a positive LASSO model, where the independent variables enter the model in their defined direction, forward stagewise linear regression (Hastie `et al.` (2001)) and forward selection (Weisberg (1985)). Details of the required modifications in each of these cases are given in Efron `et al.` (2004). The LASSO model of Tibshirani (1996) is given by .. math:: \textit{minimize}_{{\alpha,\beta_k \in \mathbb{R}^p}}\left\lVert y-\alpha -X^\mathrm{T}\beta_k\right\rVert^2\quad \text{ subject to }\quad \left\lVert \beta_k\right\rVert_1\leq t_k for all values of :math:`t_k`, where :math:`\alpha = \bar{y} = n^{-1}\sum_{1}^{n}{y_i}`. The positive LASSO model is the same as the standard LASSO model, given above, with the added constraint that .. math:: \beta_{{kj}}\geq 0\text{, }\quad j = 1,2,\ldots,p\text{.} Unlike the standard LARS algorithm, when fitting either of the LASSO models, variables can be dropped as well as added to the set :math:`\mathcal{A}`. Therefore, the total number of steps :math:`K` is no longer bounded by :math:`p`. Forward stagewise linear regression is an iterative procedure of the form: (1) Initialize :math:`k = 1` and the vector of residuals :math:`r_0 = y-\alpha`. (#) For each :math:`j = 1,2,\ldots,p` calculate :math:`c_j = x_j^\mathrm{T}r_{{k-1}}`. The value :math:`c_j` is, therefore, proportional to the correlation between the :math:`j`\ th independent variable and the vector of previous residual values, :math:`r_k`. (#) Calculate :math:`j_k = \textit{argmax}_j\left|c_j\right|`, the value of :math:`j` with the largest absolute value of :math:`c_j`. (#) If :math:`\left\lvert c_{j_k}\right\rvert < \epsilon` then go to \(7). (#) Update the residual values, with .. 
math:: r_k = r_{{k-1}}+\delta \text{ }\mathrm{sign}\left(c_{j_k}\right)x_{j_k} where :math:`\delta` is a small constant and :math:`\mathrm{sign}\left(c_{j_k}\right) = -1` when :math:`c_{j_k} < 0` and :math:`1` otherwise. (#) Increment :math:`k` and go to \(2). (#) Set :math:`K = k`. If the largest possible step were to be taken, that is :math:`\delta = \left\lvert c_{j_k}\right\rvert` then forward stagewise linear regression reverts to the standard forward selection method as implemented in :meth:`linregm_fit_onestep`. The LARS procedure results in :math:`K` models, one for each step of the fitting process. In order to aid in choosing which is the most suitable Efron `et al.` (2004) introduced a :math:`C_p`-type statistic given by .. math:: C_p^{\left(k\right)} = \frac{\left\lVert y-X^\mathrm{T}\beta_k\right\rVert^2}{\sigma^2}-n+2\nu_k\text{,} where :math:`\nu_k` is the approximate degrees of freedom for the :math:`k`\ th step and .. math:: \sigma^2 = \frac{{n-y^\mathrm{T}y}}{\nu_K}\text{.} One way of choosing a model is, therefore, to take the one with the smallest value of :math:`C_p^{\left(k\right)}`. .. _g02mb-py2-py-references: **References** Efron, B, Hastie, T, Johnstone, I and Tibshirani, R, 2004, `Least Angle Regression`, The Annals of Statistics (Volume 32) (2), 407--499 Hastie, T, Tibshirani, R and Friedman, J, 2001, `The Elements of Statistical Learning: Data Mining, Inference and Prediction`, Springer (New York) Tibshirani, R, 1996, `Regression Shrinkage and Selection via the Lasso`, Journal of the Royal Statistics Society, Series B (Methodological) (Volume 58) (1), 267--288 Weisberg, S, 1985, `Applied Linear Regression`, Wiley See Also -------- :meth:`naginterfaces.library.examples.correg.lars_param_ex.main` """ raise NotImplementedError
[docs]def lars_param(b, fitsum, ktype, nk): r""" ``lars_param`` calculates additional parameter estimates following Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO) as performed by :meth:`lars` and :meth:`lars_xtx`. .. _g02mc-py2-py-doc: For full information please refer to the NAG Library document for g02mc https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02mcf.html .. _g02mc-py2-py-parameters: **Parameters** **b** : float, array-like, shape :math:`\left(\textit{ip}, \textit{nstep}+1\right)` :math:`\beta` the parameter estimates, as returned by :meth:`lars` and :meth:`lars_xtx`, with :math:`\mathrm{b}[\textit{j}-1,k-1] = \beta_{{k\textit{j}}}`, the parameter estimate for the :math:`\textit{j}`\ th variable, for :math:`\textit{j} = 1,2,\ldots,p`, at the :math:`k`\ th step of the model fitting process. **fitsum** : float, array-like, shape :math:`\left(6, \textit{nstep}+1\right)` Summaries of the model fitting process, as returned by :meth:`lars` and :meth:`lars_xtx`. **ktype** : int Indicates what target values are held in :math:`\mathrm{nk}`. :math:`\mathrm{ktype} = 1` :math:`\mathrm{nk}` holds (fractional) LARS step numbers. :math:`\mathrm{ktype} = 2` :math:`\mathrm{nk}` holds values for :math:`L_1` norm of the (scaled) parameters. :math:`\mathrm{ktype} = 3` :math:`\mathrm{nk}` holds ratios with respect to the largest (scaled) :math:`L_1` norm. :math:`\mathrm{ktype} = 4` :math:`\mathrm{nk}` holds values for the :math:`L_1` norm of the (unscaled) parameters. :math:`\mathrm{ktype} = 5` :math:`\mathrm{nk}` holds ratios with respect to the largest (unscaled) :math:`L_1` norm. 
If :meth:`lars` was called with :math:`{\textit{pred}} = 0` or :math:`1` or :meth:`lars_xtx` was called with :math:`{\textit{pred}} = 0` then the model fitting routine did not rescale the independent variables, :math:`X`, prior to fitting the model and, therefore, there is no difference between :math:`\mathrm{ktype} = 2` or :math:`3` and :math:`\mathrm{ktype} = 4` or :math:`5`. **nk** : float, array-like, shape :math:`\left(\textit{lnk}\right)` Target values used for predicting the new set of parameter estimates. **Returns** **nb** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{lnk}\right)` :math:`\tilde{\beta }` the predicted parameter estimates, with :math:`\mathrm{b}[j-1,i-1] = \tilde{\beta }_{{ij}}`, the parameter estimate for variable :math:`j`, :math:`j = 1,2,\ldots,p` at the point in the fitting process associated with :math:`\mathrm{nk}[i-1]`, :math:`i = 1,2,\ldots,\textit{lnk}`. .. _g02mc-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`\textit{nstep} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{nstep} \geq 0`. (`errno` :math:`21`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip} \geq 1`. (`errno` :math:`31`) :math:`\mathrm{b}` has been corrupted since the last call to :meth:`lars` or :meth:`lars_xtx`. (`errno` :math:`51`) :math:`\mathrm{fitsum}` has been corrupted since the last call to :meth:`lars` or :meth:`lars_xtx`. (`errno` :math:`61`) On entry, :math:`\mathrm{ktype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ktype} = 1`, :math:`2`, :math:`3`, :math:`4` or :math:`5`. (`errno` :math:`71`) On entry, :math:`\mathrm{ktype} = 1`, :math:`\mathrm{nk}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\textit{nstep} = \langle\mathit{\boldsymbol{value}}\rangle` Constraint: :math:`0\leq \mathrm{nk}[i]\leq \textit{nstep}`, for all :math:`i`. 
(`errno` :math:`72`) On entry, :math:`\mathrm{ktype} = 2`, :math:`\mathrm{nk}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\textit{nstep} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{fitsum}[0,\textit{nstep}-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{nk}[i]\leq \mathrm{fitsum}[0,\textit{nstep}-1]`, for all :math:`i`. (`errno` :math:`73`) On entry, :math:`\mathrm{ktype} = 3` or :math:`5`, :math:`\mathrm{nk}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{nk}[i]\leq 1`, for all :math:`i`. (`errno` :math:`74`) On entry, :math:`\mathrm{ktype} = 4`, :math:`\mathrm{nk}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\left\lVert \beta_K\right\rVert_1 = \langle\mathit{\boldsymbol{value}}\rangle` Constraint: :math:`0\leq \mathrm{nk}[i]\leq {\left\lVert \beta_K\right\rVert_1}`, for all :math:`i`. (`errno` :math:`81`) On entry, :math:`\textit{lnk} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lnk} \geq 1`. .. _g02mc-py2-py-notes: **Notes** :meth:`lars` and :meth:`lars_xtx` fit either a LARS, forward stagewise linear regression, LASSO or positive LASSO model to a vector of :math:`n` observed values, :math:`y = \left\{y_i:i = 1,2,\ldots,n\right\}` and an :math:`n\times p` design matrix :math:`X`, where the :math:`j`\ th column of :math:`X` is given by the :math:`j`\ th independent variable :math:`x_j`. The models are fit using the LARS algorithm of Efron `et al.` (2004). [figure omitted] The full solution path for all four of these models follow a similar pattern where the parameter estimate for a given variable is piecewise linear. One such path, for a LARS model with six variables :math:`\left(p = 6\right)` can be seen in Figure [label omitted]. 
Both :meth:`lars` and :meth:`lars_xtx` return the vector of :math:`p` parameter estimates, :math:`\beta_k`, at :math:`K` points along this path (so :math:`k = 1,2,\ldots,K`). Each point corresponds to a step of the LARS algorithm. The number of steps taken depends on the model being fitted. In the case of a LARS model, :math:`K = p` and each step corresponds to a new variable being included in the model. In the case of the LASSO models, each step corresponds to either a new variable being included in the model or an existing variable being removed from the model; the value of :math:`K` is, therefore, no longer bound by the number of parameters. For forward stagewise linear regression, each step no longer corresponds to the addition or removal of a variable;, therefore, the number of possible steps is often markedly greater than for a corresponding LASSO model. ``lars_param`` uses the piecewise linear nature of the solution path to predict the parameter estimates, :math:`\tilde{\beta }`, at a different point on this path. The location of the solution can either be defined in terms of a (fractional) step number or a function of the :math:`L_1` norm of the parameter estimates. .. _g02mc-py2-py-references: **References** Efron, B, Hastie, T, Johnstone, I and Tibshirani, R, 2004, `Least Angle Regression`, The Annals of Statistics (Volume 32) (2), 407--499 Hastie, T, Tibshirani, R and Friedman, J, 2001, `The Elements of Statistical Learning: Data Mining, Inference and Prediction`, Springer (New York) Tibshirani, R, 1996, `Regression Shrinkage and Selection via the Lasso`, Journal of the Royal Statistics Society, Series B (Methodological) (Volume 58) (1), 267--288 Weisberg, S, 1985, `Applied Linear Regression`, Wiley See Also -------- :meth:`naginterfaces.library.examples.correg.lars_param_ex.main` """ raise NotImplementedError
[docs]def quantile_linreg_easy(x, y, tau, io_manager=None): r""" ``quantile_linreg_easy`` performs a multiple linear quantile regression, returning the parameter estimates and associated confidence limits based on an assumption of Normal, independent, identically distributed errors. ``quantile_linreg_easy`` is a simplified version of :meth:`quantile_linreg`. .. _g02qf-py2-py-doc: For full information please refer to the NAG Library document for g02qf https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qff.html .. _g02qf-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n, m\right)` :math:`X`, the design matrix, with the :math:`\textit{i}`\ th value for the :math:`\textit{j}`\ th variate supplied in :math:`\mathrm{x}[\textit{i}-1,\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the observations on the dependent variable. **tau** : float, array-like, shape :math:`\left(\textit{ntau}\right)` The vector of quantiles of interest. A separate model is fitted to each quantile. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **df** : float The degrees of freedom given by :math:`n-k`, where :math:`n` is the number of observations and :math:`k` is the rank of the cross-product matrix :math:`X^\mathrm{T}X`. **b** : float, ndarray, shape :math:`\left(m, \textit{ntau}\right)` :math:`\hat{\beta }`, the estimates of the parameters of the regression model, with :math:`\mathrm{b}[j-1,l-1]` containing the coefficient for the variable in column :math:`j` of :math:`\mathrm{x}`, estimated for :math:`\tau = \mathrm{tau}[l-1]`. **bl** : float, ndarray, shape :math:`\left(m, \textit{ntau}\right)` :math:`\hat{\beta }_L`, the lower limit of a :math:`95\%` confidence interval for :math:`\hat{\beta }`, with :math:`\mathrm{bl}[j-1,l-1]` holding the lower limit associated with :math:`\mathrm{b}[j-1,l-1]`. 
**bu** : float, ndarray, shape :math:`\left(m, \textit{ntau}\right)` :math:`\hat{\beta }_U`, the upper limit of a :math:`95\%` confidence interval for :math:`\hat{\beta }`, with :math:`\mathrm{bu}[j-1,l-1]` holding the upper limit associated with :math:`\mathrm{b}[j-1,l-1]`. **info** : int, ndarray, shape :math:`\left(\textit{ntau}\right)` :math:`\mathrm{info}[l]` holds additional information concerning the model fitting and confidence limit calculations when :math:`\tau = \mathrm{tau}[l]`. .. rst-class:: nag-rules-none +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |Code |Warning | +==========+================================================================================================================================================================================+ |:math:`0` |Model fitted and confidence limits calculated successfully. | +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`1` |The function did not converge whilst calculating the parameter estimates. The returned values are based on the estimate at the last iteration. | +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`2` |A singular matrix was encountered during the optimization. The model was not fitted for this value of :math:`\tau`. | +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`8` |The function did not converge whilst calculating the confidence limits. 
The returned limits are based on the estimate at the last iteration. | +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`16`|Confidence limits for this value of :math:`\tau` could not be calculated. The returned upper and lower limits are set to a large positive and large negative value respectively.| +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ It is possible for multiple warnings to be applicable to a single model. In these cases the value returned in :math:`\mathrm{info}` is the sum of the corresponding individual nonzero warning codes. .. _g02qf-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`21`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq m < n`. (`errno` :math:`51`) On entry, :math:`\textit{ntau} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ntau} \geq 1`. (`errno` :math:`61`) On entry, :math:`\mathrm{tau}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\sqrt{\epsilon } < \mathrm{tau}[\textit{l}-1] < 1-\sqrt{\epsilon }` where :math:`\epsilon` is the machine precision returned by :meth:`machine.precision <naginterfaces.library.machine.precision>`, for all :math:`\textit{ntau}`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`111`) A potential problem occurred whilst fitting the model(s). Additional information has been returned in :math:`\mathrm{info}`. .. 
_g02qf-py2-py-notes: **Notes** Given a vector of :math:`n` observed values, :math:`y = \left\{y_i:i = 1,2,\ldots,n\right\}`, an :math:`n\times p` design matrix :math:`X`, a column vector, :math:`x`, of length :math:`p` holding the :math:`i`\ th row of :math:`X` and a quantile :math:`\tau \in \left(0,1\right)`, ``quantile_linreg_easy`` estimates the :math:`p`-element vector :math:`\beta` as the solution to .. math:: \textit{minimize}_{{\beta \in \mathbb{R}^p}}\sum_{1}^{n}{\rho_{\tau }\left(y_i - x_i^\mathrm{T}\beta \right)} where :math:`\rho_{\tau }` is the piecewise linear loss function :math:`\rho_{\tau }\left(z\right) = z\left(\tau -I\left(z < 0\right)\right)`, and :math:`I\left(z < 0\right)` is an indicator function taking the value :math:`1` if :math:`z < 0` and :math:`0` otherwise. ``quantile_linreg_easy`` assumes Normal, independent, identically distributed (IID) errors and calculates the asymptotic covariance matrix from .. math:: \Sigma = \frac{{\tau \left(1-\tau \right)}}{n}\left(s\left(\tau \right)\right)^2 \left(X^\mathrm{T}X\right)^{-1} where :math:`s` is the sparsity function, which is estimated from the residuals, :math:`r_i = y_i - x_i^\mathrm{T}\hat{\beta }` (see Koenker (2005)). Given an estimate of the covariance matrix, :math:`\hat{\Sigma }`, lower, :math:`\hat{\beta }_L`, and upper, :math:`\hat{\beta }_U`, limits for a :math:`95\%` confidence interval are calculated for each of the :math:`p` parameters, via .. math:: \hat{\beta }_{{Li}} = \hat{\beta }_i-t_{{n-p,0.975}}\sqrt{\hat{\Sigma }_{{ii}}},\hat{\beta }_{{Ui}} = \hat{\beta }_i+t_{{n-p,0.975}}\sqrt{\hat{\Sigma }_{{ii}}} where :math:`t_{{n-p,0.975}}` is the :math:`97.5` percentile of the Student's :math:`t` distribution with :math:`n-k` degrees of freedom, where :math:`k` is the rank of the cross-product matrix :math:`X^\mathrm{T}X`. Further details of the algorithms used by ``quantile_linreg_easy`` can be found in the documentation for :meth:`quantile_linreg`. .. 
_g02qf-py2-py-references: **References** Koenker, R, 2005, `Quantile Regression`, Econometric Society Monographs, Cambridge University Press, New York """ raise NotImplementedError
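The check (or "pinball") loss :math:`\rho_{\tau }` minimized by ``quantile_linreg_easy`` is simple to state. The sketch below, in plain Python with no NAG calls, evaluates the loss and the objective for a candidate :math:`\beta`; it illustrates the objective only, not the library's solver.

```python
def rho(z, tau):
    """Check (pinball) loss: rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0.0 else 0.0))

def objective(beta, x_rows, y, tau):
    """Sum of check losses for the linear model y_i ~ x_i . beta."""
    return sum(
        rho(yi - sum(b * xij for b, xij in zip(beta, xi)), tau)
        for xi, yi in zip(x_rows, y)
    )

# For an intercept-only model the minimizer of this objective over the
# data is the sample tau-quantile; tau = 0.5 recovers the median, which
# is what makes minimizing rho_tau a quantile regression.
```

For example, with an intercept-only design and :math:`\tau = 0.5`, the objective is smaller at the median of the data than at other candidate values.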
[docs]def quantile_linreg(sorder, dat, isx, y, tau, intcpt='Y', wt=None, b=None, comm=None, statecomm=None, io_manager=None): r""" ``quantile_linreg`` performs a multiple linear quantile regression. Parameter estimates and, if required, confidence limits, covariance matrices and residuals are calculated. ``quantile_linreg`` may be used to perform a weighted quantile regression. A simplified interface for ``quantile_linreg`` is provided by :meth:`quantile_linreg_easy`. Note: this function uses optional algorithmic parameters, see also: :meth:`optset`, :meth:`optget`. .. _g02qg-py2-py-doc: For full information please refer to the NAG Library document for g02qg https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html .. _g02qg-py2-py-parameters: **Parameters** **sorder** : int Determines the storage order of variates supplied in :math:`\mathrm{dat}`. **dat** : float, array-like, shape :math:`\left(:, :\right)` Note: the required extent for this argument in dimension 1 is determined as follows: if :math:`\mathrm{sorder}=1`: :math:`n`; otherwise: :math:`m`. Note: the required extent for this argument in dimension 2 is determined as follows: if :math:`\mathrm{sorder}=1`: :math:`m`; otherwise: :math:`n`. The :math:`\textit{i}`\ th value for the :math:`\textit{j}`\ th variate, for :math:`\textit{j} = 1,2,\ldots,m`, for :math:`\textit{i} = 1,2,\ldots,n`, must be supplied in :math:`\mathrm{dat}[i-1,j-1]` if :math:`\mathrm{sorder} = 1`, and :math:`\mathrm{dat}[j-1,i-1]` if :math:`\mathrm{sorder} = 2`. The design matrix :math:`X` is constructed from :math:`\mathrm{dat}`, :math:`\mathrm{isx}` and :math:`\mathrm{intcpt}`. **isx** : int, array-like, shape :math:`\left(m\right)` Indicates which independent variables are to be included in the model. :math:`\mathrm{isx}[j-1] = 0` The :math:`j`\ th variate, supplied in :math:`\mathrm{dat}`, is not included in the regression model. 
:math:`\mathrm{isx}[j-1] = 1` The :math:`j`\ th variate, supplied in :math:`\mathrm{dat}`, is included in the regression model. **y** : float, array-like, shape :math:`\left(n\right)` :math:`y`, the observations on the dependent variable. **tau** : float, array-like, shape :math:`\left(\textit{ntau}\right)` The vector of quantiles of interest. A separate model is fitted to each quantile. **intcpt** : str, length 1, optional Indicates whether an intercept will be included in the model. The intercept is included by adding a column of ones as the first column in the design matrix, :math:`X`. :math:`\mathrm{intcpt} = \text{‘Y'}` An intercept will be included in the model. :math:`\mathrm{intcpt} = \text{‘N'}` An intercept will not be included in the model. **wt** : None or float, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{wt}\text{ is not }\mathbf{None}`: :math:`n`; otherwise: :math:`0`. If not **None**, :math:`\mathrm{wt}` must contain the diagonal elements of the weight matrix :math:`W`. If weights are not provided then :math:`\mathrm{wt}` must be set to **None**. When :math:`\text{‘Drop Zero Weights'} = \texttt{'YES'}` If :math:`\mathrm{wt}[i-1] = 0.0`, the :math:`i`\ th observation is not included in the model, in which case the effective number of observations, :math:`n`, is the number of observations with nonzero weights. If :math:`\text{‘Return Residuals'} = \texttt{'YES'}`, the values of :math:`\mathrm{res}` will be set to zero for observations with zero weights. :math:`\text{‘Drop Zero Weights'} = \texttt{'NO'}` All observations are included in the model and the effective number of observations is :math:`n`. 
**b** : None or float, array-like, shape :math:`\left(\textit{ip}, \textit{ntau}\right)`, optional If :math:`\text{‘Calculate Initial Values'} = \texttt{'NO'}`, :math:`\mathrm{b}[\textit{i}-1,\textit{l}-1]` must hold an initial estimate for :math:`\hat{\beta }_{\textit{i}}`, for :math:`\textit{l} = 1,2,\ldots,\textit{ntau}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. If :math:`\text{‘Calculate Initial Values'} = \texttt{'YES'}`, :math:`\mathrm{b}` need not be set. **comm** : None or dict, communication object, optional Communication structure. If not **None**, this argument must have been initialized by a prior call to :meth:`optset`. **statecomm** : None or dict, RNG communication object, optional, modified in place RNG communication structure. If :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}`, this argument must have been initialized by a prior call to :meth:`rand.init_repeat <naginterfaces.library.rand.init_repeat>` or :meth:`rand.init_nonrepeat <naginterfaces.library.rand.init_nonrepeat>`. **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **df** : float The degrees of freedom given by :math:`n-k`, where :math:`n` is the effective number of observations and :math:`k` is the rank of the cross-product matrix :math:`X^\mathrm{T}X`. **b** : float, ndarray, shape :math:`\left(\textit{ip}, \textit{ntau}\right)` :math:`\mathrm{b}[\textit{i}-1,\textit{l}-1]`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`, contains the estimates of the parameters of the regression model, :math:`\hat{\beta }`, estimated for :math:`\tau = \mathrm{tau}[\textit{l}-1]`. If :math:`\mathrm{intcpt} = \text{‘Y'}`, :math:`\mathrm{b}[0,l-1]` will contain the estimate corresponding to the intercept and :math:`\mathrm{b}[i,l-1]` will contain the coefficient of the :math:`j`\ th variate contained in :math:`\mathrm{dat}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th nonzero value in the array :math:`\mathrm{isx}`. 
If :math:`\mathrm{intcpt} = \text{‘N'}`, :math:`\mathrm{b}[i-1,l-1]` will contain the coefficient of the :math:`j`\ th variate contained in :math:`\mathrm{dat}`, where :math:`\mathrm{isx}[j-1]` is the :math:`i`\ th nonzero value in the array :math:`\mathrm{isx}`. **bl** : None or float, ndarray, shape :math:`\left(\textit{ip}, :\right)` If :math:`\text{‘Interval Method'} \neq \texttt{'NONE'}`, :math:`\mathrm{bl}[i-1,l-1]` contains the lower limit of a :math:`\left(100\times \alpha \right)\%` confidence interval for :math:`\mathrm{b}[\textit{i}-1,\textit{l}-1]`, for :math:`\textit{l} = 1,2,\ldots,\textit{ntau}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. If :math:`\text{‘Interval Method'} = \texttt{'NONE'}`, :math:`\mathrm{bl}` is not referenced. The method used for calculating the interval is controlled by the options 'Interval Method' and 'Bootstrap Interval Method'. The size of the interval, :math:`\alpha`, is controlled by the option 'Significance Level'. **bu** : None or float, ndarray, shape :math:`\left(\textit{ip}, :\right)` If :math:`\text{‘Interval Method'} \neq \texttt{'NONE'}`, :math:`\mathrm{bu}[i-1,l-1]` contains the upper limit of a :math:`\left(100\times \alpha \right)\%` confidence interval for :math:`\mathrm{b}[\textit{i}-1,\textit{l}-1]`, for :math:`\textit{l} = 1,2,\ldots,\textit{ntau}`, for :math:`\textit{i} = 1,2,\ldots,\textit{ip}`. If :math:`\text{‘Interval Method'} = \texttt{'NONE'}`, :math:`\mathrm{bu}` is not referenced. The method used for calculating the interval is controlled by the options 'Interval Method' and 'Bootstrap Interval Method'. The size of the interval, :math:`\alpha`, is controlled by the option 'Significance Level'. 
**ch** : None or float, ndarray, shape :math:`\left(\textit{ip}, \textit{ip}, :\right)` Depending on the supplied options, :math:`\mathrm{ch}` will either not be referenced, hold an estimate of the upper triangular part of the covariance matrix, :math:`\Sigma`, or an estimate of the upper triangular parts of :math:`nJ_n` and :math:`n^{-1}H_n^{-1}`. If :math:`\text{‘Interval Method'} = \texttt{'NONE'}` or :math:`\text{‘Matrix Returned'} = \texttt{'NONE'}`, :math:`\mathrm{ch}` is not referenced. If :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}` or :math:`\texttt{'IID'}` and :math:`\text{‘Matrix Returned'} = \texttt{'H INVERSE'}`, :math:`\mathrm{ch}` is not referenced. Otherwise, for :math:`i,j = 1,2,\ldots,\textit{ip}, j\geq i` and :math:`l = 1,2,\ldots,\textit{ntau}`: If :math:`\text{‘Matrix Returned'} = \texttt{'COVARIANCE'}`, :math:`\mathrm{ch}[i-1,j-1,l-1]` holds an estimate of the covariance between :math:`\mathrm{b}[i-1,l-1]` and :math:`\mathrm{b}[j-1,l-1]`. If :math:`\text{‘Matrix Returned'} = \texttt{'H INVERSE'}`, :math:`\mathrm{ch}[i-1,j-1,0]` holds an estimate of the :math:`\left(i, j\right)`\ th element of :math:`nJ_n` and :math:`\mathrm{ch}[i-1,j-1,{l+1}-1]` holds an estimate of the :math:`\left(i, j\right)`\ th element of :math:`n^{-1}H_n^{-1}`, for :math:`\tau = \mathrm{tau}[l-1]`. The method used for calculating :math:`\Sigma` and :math:`H_n^{-1}` is controlled by the option 'Interval Method'. **res** : None or float, ndarray, shape :math:`\left(n, :\right)` If :math:`\text{‘Return Residuals'} = \texttt{'YES'}`, :math:`\mathrm{res}[\textit{i}-1,\textit{l}-1]` holds the (weighted) residuals, :math:`r_{\textit{i}}`, for :math:`\tau = \mathrm{tau}[\textit{l}-1]`, for :math:`\textit{l} = 1,2,\ldots,\textit{ntau}`, for :math:`\textit{i} = 1,2,\ldots,n`. If :math:`\mathrm{wt}\text{ is not }\mathbf{None}` and :math:`\text{‘Drop Zero Weights'} = \texttt{'YES'}`, the value of :math:`\mathrm{res}` will be set to zero for observations with zero weights. 
If :math:`\text{‘Return Residuals'} = \texttt{'NO'}`, :math:`\mathrm{res}` is returned as **None**. **info** : int, ndarray, shape :math:`\left(\textit{ntau}\right)` :math:`\mathrm{info}[i]` holds additional information concerning the model fitting and confidence limit calculations when :math:`\tau = \mathrm{tau}[i]`. .. rst-class:: nag-rules-none +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |Code |Warning | +==========+=================================================================================================================================================================================================================================================================================================+ |:math:`0` |Model fitted and confidence limits (if requested) calculated successfully | +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`1` |The function did not converge. The returned values are based on the estimate at the last iteration. Try increasing 'Iteration Limit' whilst calculating the parameter estimates or relaxing the definition of convergence by increasing 'Tolerance'. | +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`2` |A singular matrix was encountered during the optimization. 
The model was not fitted for this value of :math:`\tau`. | +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`4` |Some truncation occurred whilst calculating the confidence limits for this value of :math:`\tau`. See `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#algdetails>`__ for details. The returned upper and lower limits may be narrower than specified.| +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`8` |The function did not converge whilst calculating the confidence limits. The returned limits are based on the estimate at the last iteration. Try increasing 'Iteration Limit'. | +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |:math:`16`|Confidence limits for this value of :math:`\tau` could not be calculated. The returned upper and lower limits are set to a large positive and large negative value respectively as defined by the option 'Big'. 
| +----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ It is possible for multiple warnings to be applicable to a single model. In these cases the value returned in :math:`\mathrm{info}` is the sum of the corresponding individual nonzero warning codes. .. _g02qg-py2-py-other_params: **Other Parameters** **'Band Width Alpha'** : float Default :math:`\text{} = 1.0` A multiplier used to construct the parameter :math:`\alpha_b` used when calculating the Sheather--Hall bandwidth (see :ref:`Notes <g02qg-py2-py-notes>`), with :math:`\alpha_b = \left(1-\alpha \right)\times \text{‘Band Width Alpha'}`. Here, :math:`\alpha` is the 'Significance Level'. **'Band Width Method'** : str Default :math:`\text{} = \texttt{'SHEATHER HALL'}` The method used to calculate the bandwidth used in the calculation of the asymptotic covariance matrix :math:`\Sigma` and :math:`H^{-1}` if :math:`\text{‘Interval Method'} = \texttt{'HKS'}`, :math:`\texttt{'KERNEL'}` or :math:`\texttt{'IID'}` (see :ref:`Notes <g02qg-py2-py-notes>`). **'Big'** : float Default :math:`\text{} = 10.0^{20}` This argument should be set to something larger than the biggest value supplied in :math:`\mathrm{dat}` and :math:`\mathrm{y}`. **'Bootstrap Interval Method'** : str Default :math:`\text{} = \texttt{'QUANTILE'}` If :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}`, 'Bootstrap Interval Method' controls how the confidence intervals are calculated from the bootstrap estimates. :math:`\text{‘Bootstrap Interval Method'} = \texttt{'T'}` :math:`t` intervals are calculated. 
That is, the covariance matrix, :math:`\Sigma = \left\{\sigma_{{ij}}:i,j = 1,2,\ldots,p\right\}` is calculated from the bootstrap estimates and the limits calculated as :math:`\beta_i\pm t_{\left({n-p}, {\left(1+\alpha \right)/2}\right)}\sigma_{{ii}}` where :math:`t_{\left({n-p}, {\left(1+\alpha \right)/2}\right)}` is the :math:`\left(1+\alpha \right)/2` percentage point from a Student's :math:`t` distribution on :math:`n-p` degrees of freedom, :math:`n` is the effective number of observations and :math:`\alpha` is given by the option 'Significance Level'. :math:`\text{‘Bootstrap Interval Method'} = \texttt{'QUANTILE'}` Quantile intervals are calculated. That is, the upper and lower limits are taken as the :math:`\left(1+\alpha \right)/2` and :math:`\left(1-\alpha \right)/2` quantiles of the bootstrap estimates, as calculated using :meth:`stat.quantiles <naginterfaces.library.stat.quantiles>`. **'Bootstrap Iterations'** : int Default :math:`\text{} = 100` The number of bootstrap samples used to calculate the confidence limits and covariance matrix (if requested) when :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}`. **'Bootstrap Monitoring'** : str Default :math:`\text{} = \texttt{'NO'}` If :math:`\text{‘Bootstrap Monitoring'} = \texttt{'YES'}` and :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}`, the parameter estimates for each of the bootstrap samples are displayed. This information is sent to the unit number specified by 'Unit Number'. **'Calculate Initial Values'** : str Default :math:`\text{} = \texttt{'YES'}` If :math:`\text{‘Calculate Initial Values'} = \texttt{'YES'}` then the initial values for the regression parameters, :math:`\beta`, are calculated from the data. Otherwise they must be supplied in :math:`\mathrm{b}`. **'Defaults'** : valueless This special keyword is used to reset all options to their default values. 
**'Drop Zero Weights'** : str Default :math:`\text{} = \texttt{'YES'}` If a weighted regression is being performed and :math:`\text{‘Drop Zero Weights'} = \texttt{'YES'}` then observations with zero weight are dropped from the analysis. Otherwise such observations are included. **'Epsilon'** : float Default :math:`\text{} = \sqrt{\epsilon }` :math:`\epsilon_u`, the tolerance used when calculating the covariance matrix and the initial values for :math:`u` and :math:`v`. For additional details see `Calculation of Covariance Matrix <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#ad-covariancematrix>`__ and `Additional information <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#ad-additional>`__ respectively. **'Interval Method'** : str Default :math:`\text{} = \texttt{'IID'}` The value of 'Interval Method' controls whether confidence limits are returned in :math:`\mathrm{bl}` and :math:`\mathrm{bu}` and how these limits are calculated. This argument also controls how the matrices returned in :math:`\mathrm{ch}` are calculated. :math:`\text{‘Interval Method'} = \texttt{'NONE'}` No limits are calculated and :math:`\mathrm{bl}`, :math:`\mathrm{bu}` and :math:`\mathrm{ch}` are not referenced. :math:`\text{‘Interval Method'} = \texttt{'KERNEL'}` The Powell Sandwich method with a Gaussian kernel is used. :math:`\text{‘Interval Method'} = \texttt{'HKS'}` The Hendricks--Koenker Sandwich is used. :math:`\text{‘Interval Method'} = \texttt{'IID'}` The errors are assumed to be independent and identically distributed (IID). :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}` A bootstrap method is used, where sampling is done on the pair :math:`\left(y_i, x_i\right)`. The number of bootstrap samples is controlled by the argument 'Bootstrap Iterations' and the type of interval constructed from the bootstrap samples is controlled by 'Bootstrap Interval Method'. 
**'Iteration Limit'** : int Default :math:`\text{} = 100` The maximum number of iterations to be performed by the interior point optimization algorithm. **'Matrix Returned'** : str Default :math:`\text{} = \texttt{'NONE'}` The value of 'Matrix Returned' controls the type of matrices returned in :math:`\mathrm{ch}`. If :math:`\text{‘Interval Method'} = \texttt{'NONE'}`, this argument is ignored and :math:`\mathrm{ch}` is not referenced. Otherwise: :math:`\text{‘Matrix Returned'} = \texttt{'NONE'}` No matrices are returned and :math:`\mathrm{ch}` is not referenced. :math:`\text{‘Matrix Returned'} = \texttt{'COVARIANCE'}` The covariance matrices are returned. :math:`\text{‘Matrix Returned'} = \texttt{'H INVERSE'}` If :math:`\text{‘Interval Method'} = \texttt{'KERNEL'}` or :math:`\texttt{'HKS'}`, the matrices :math:`J` and :math:`H^{-1}` are returned. Otherwise no matrices are returned and :math:`\mathrm{ch}` is not referenced. The matrices returned are calculated as described in :ref:`Notes <g02qg-py2-py-notes>`, with the algorithm used specified by 'Interval Method'. In the case of :math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}` the covariance matrix is calculated directly from the bootstrap estimates. **'Monitoring'** : str Default :math:`\text{} = \texttt{'NO'}` If :math:`\text{‘Monitoring'} = \texttt{'YES'}` then the duality gap is displayed at each iteration of the interior point optimization algorithm. In addition, the final estimates for :math:`\beta` are also displayed. The monitoring information is sent to the unit number specified by 'Unit Number'. **'QR Tolerance'** : float Default :math:`\text{} = \epsilon^{0.9}` The tolerance used to calculate the rank, :math:`k`, of the :math:`p\times p` cross-product matrix, :math:`X^\mathrm{T}X`. 
Letting :math:`Q` be the orthogonal matrix obtained from a :math:`QR` decomposition of :math:`X^\mathrm{T}X`, then the rank is calculated by comparing :math:`Q_{{ii}}` with :math:`Q_{{11}}\times \text{‘QR Tolerance'}`. If the cross-product matrix is rank deficient, the parameter estimates for the :math:`p-k` columns with the smallest values of :math:`Q_{{ii}}` are set to zero, along with the corresponding entries in :math:`\mathrm{bl}`, :math:`\mathrm{bu}` and :math:`\mathrm{ch}`, if returned. This is equivalent to dropping these variables from the model. Details on the :math:`QR` decomposition used can be found in :meth:`lapackeig.dgeqp3 <naginterfaces.library.lapackeig.dgeqp3>`. **'Return Residuals'** : str Default :math:`\text{} = \texttt{'NO'}` If :math:`\text{‘Return Residuals'} = \texttt{'YES'}`, the residuals are returned in :math:`\mathrm{res}`. Otherwise :math:`\mathrm{res}` is not referenced. **'Sigma'** : float Default :math:`\text{} = 0.99995` The scaling factor used when calculating the affine scaling step size (see equation `[equation] <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#qr_gamma_eqn>`__). **'Significance Level'** : float Default :math:`\text{} = 0.95` :math:`\alpha`, the size of the confidence interval whose limits are returned in :math:`\mathrm{bl}` and :math:`\mathrm{bu}`. **'Tolerance'** : float Default :math:`\text{} = \sqrt{\epsilon }` Convergence tolerance. The optimization is deemed to have converged if the duality gap is less than 'Tolerance' (see `Update and convergence <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#ad-update>`__). **'Unit Number'** : int Default :math:`= \text{advisory message unit number}` The unit number to which any monitoring information is sent. .. _g02qg-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`\mathrm{sorder} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{sorder} = 1` or :math:`2`. 
(`errno` :math:`21`) On entry, :math:`\mathrm{intcpt} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{intcpt} = \text{‘N'} \text{ or } \text{‘Y'}`. (`errno` :math:`41`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`51`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m \geq 0`. (`errno` :math:`81`) On entry, :math:`\mathrm{isx}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isx}[i] = 0` or :math:`1`, for all :math:`i`. (`errno` :math:`91`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \textit{ip} < n`. (`errno` :math:`92`) On entry, :math:`\textit{ip}` is not consistent with :math:`\mathrm{isx}` or :math:`\mathrm{intcpt}`: :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\text{expected value} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`111`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wt}[i]\geq 0.0`, for all :math:`i`. (`errno` :math:`112`) On entry, :math:`\text{effective number of observations} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\text{effective number of observations}\geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`121`) On entry, :math:`\textit{ntau} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ntau} \geq 1`. (`errno` :math:`131`) On entry, :math:`\mathrm{tau}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. 
Constraint: :math:`\sqrt{\epsilon } < \mathrm{tau}[\textit{l}-1] < 1-\sqrt{\epsilon }` where :math:`\epsilon` is the machine precision returned by :meth:`machine.precision <naginterfaces.library.machine.precision>`, for :math:`\textit{l} = 1,2,\ldots,\textit{ntau}`. (`errno` :math:`201`) On entry, either the option arrays have not been initialized or they have been corrupted. (`errno` :math:`221`) On entry, :math:`\mathrm{statecomm}`\ ['state'] vector has been corrupted or not initialized. **Warns** **NagAlgorithmicWarning** (`errno` :math:`231`) A potential problem occurred whilst fitting the model(s). Additional information has been returned in :math:`\mathrm{info}`. .. _g02qg-py2-py-notes: **Notes** Given a vector of :math:`n` observed values, :math:`y = \left\{y_i:i = 1,2,\ldots,n\right\}`, an :math:`n\times p` design matrix :math:`X`, a column vector, :math:`x_i`, of length :math:`p` holding the :math:`i`\ th row of :math:`X` and a quantile :math:`\tau \in \left(0,1\right)`, ``quantile_linreg`` estimates the :math:`p`-element vector :math:`\beta` as the solution to .. math:: \textit{minimize}_{{\beta \in \mathbb{R}^p}}\sum_{i = 1}^{n}{\rho_{\tau }\left(y_i - x_i^\mathrm{T}\beta \right)} where :math:`\rho_{\tau }` is the piecewise linear loss function :math:`\rho_{\tau }\left(z\right) = z\left(\tau -I\left(z < 0\right)\right)`, and :math:`I\left(z < 0\right)` is an indicator function taking the value :math:`1` if :math:`z < 0` and :math:`0` otherwise. Weights can be incorporated by replacing :math:`X` and :math:`y` with :math:`WX` and :math:`Wy` respectively, where :math:`W` is an :math:`n\times n` diagonal matrix. Observations with zero weights can either be included or excluded from the analysis; this is in contrast to least squares regression where such observations do not contribute to the objective function and are, therefore, always dropped. 
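The weighting scheme just described, replacing :math:`X` and :math:`y` by :math:`WX` and :math:`Wy` for a diagonal :math:`W`, can be sketched in plain Python. This is an illustration only; the diagonal of :math:`W` is passed as a list of weights, mirroring the role of the :math:`\mathrm{wt}` argument.

```python
def weighted_data(w, x_rows, y):
    """Apply a diagonal weight matrix W: replace X, y by WX, Wy.

    A zero-weight row then has a zero residual, so it contributes
    nothing to the check-loss objective; whether it still counts
    toward the effective number of observations is what the
    'Drop Zero Weights' option controls.
    """
    wx = [[wi * xij for xij in xi] for wi, xi in zip(w, x_rows)]
    wy = [wi * yi for wi, yi in zip(w, y)]
    return wx, wy
```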
``quantile_linreg`` uses the interior point algorithm of Portnoy and Koenker (1997), described briefly in `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#algdetails>`__, to obtain the parameter estimates :math:`\hat{\beta }`, for a given value of :math:`\tau`. Under the assumption of Normally distributed errors, Koenker (2005) shows that the limiting covariance matrix of :math:`\hat{\beta }-\beta` has the form .. math:: \Sigma = \frac{{\tau \left(1-\tau \right)}}{n} {H_n}^{-1}J_n {H_n}^{-1} where :math:`J_n = n^{-1}\sum_{i = 1}^{n}{x_i x_i^\mathrm{T}}` and :math:`H_n` is a function of :math:`\tau`, as described below. Given an estimate of the covariance matrix, :math:`\hat{\Sigma }`, lower (:math:`\hat{\beta }_L`) and upper (:math:`\hat{\beta }_U`) limits for a :math:`\left(100\times \alpha \right)\%` confidence interval can be calculated for each of the :math:`p` parameters, via .. math:: \hat{\beta }_{{Li}} = \hat{\beta }_i-t_{{n-p,\left(1+\alpha \right)/2}}\sqrt{\hat{\Sigma }_{{ii}}},\hat{\beta }_{{Ui}} = \hat{\beta }_i+t_{{n-p,\left(1+\alpha \right)/2}}\sqrt{\hat{\Sigma }_{{ii}}} where :math:`t_{{n-p,\left(1+\alpha \right)/2}}` is the :math:`\left(1+\alpha \right)/2` percentile of the Student's :math:`t` distribution with :math:`n-k` degrees of freedom, where :math:`k` is the rank of the cross-product matrix :math:`X^\mathrm{T}X`. Four methods for estimating the covariance matrix, :math:`\Sigma`, are available: (i) Independent, identically distributed (IID) errors Under an assumption of IID errors the asymptotic relationship for :math:`\Sigma` simplifies to .. math:: \Sigma = \frac{{\tau \left(1-\tau \right)}}{n}\left(s\left(\tau \right)\right)^2 \left(X^\mathrm{T}X\right)^{-1} where :math:`s` is the sparsity function. ``quantile_linreg`` estimates :math:`s\left(\tau \right)` from the residuals, :math:`r_i = y_i - x_i^\mathrm{T}\hat{\beta }` and a bandwidth :math:`h_n`. 
(#) Powell Sandwich Powell (1991) suggested estimating the matrix :math:`H_n` by a kernel estimator of the form .. math:: \hat{H}_n = \left(nc_n\right)^{-1}\sum_{i = 1}^{n}{K\left(\frac{{r_i}}{c_n}\right)x_ix_i^\mathrm{T}} where :math:`K` is a kernel function and :math:`c_n` satisfies :math:`\lim_{{n\rightarrow \infty }}c_n\rightarrow 0` and :math:`\lim_{{n\rightarrow \infty }}\sqrt{n}c_n\rightarrow \infty`. When the Powell method is chosen, ``quantile_linreg`` uses a Gaussian kernel (i.e., :math:`K = \phi`) and sets .. math:: c_n = \mathrm{min}\left(\sigma_r, {\left(q_{{r3}}-q_{{r1}}\right)/1.34}\right)\times \left(\Phi^{-1}\left(\tau +h_n\right)-\Phi^{-1}\left(\tau -h_n\right)\right) where :math:`h_n` is a bandwidth, :math:`\sigma_r,q_{{r1}}` and :math:`q_{{r3}}` are, respectively, the standard deviation and the :math:`25\%` and :math:`75\%` quantiles for the residuals, :math:`r_i`. (#) Hendricks--Koenker Sandwich Koenker (2005) suggested estimating the matrix :math:`H_n` using .. math:: \hat{H}_n = n^{-1}\sum_{i = 1}^{n}{\left[\frac{{2h_n}}{{ x_i^\mathrm{T}\left(\hat{\beta }\left(\tau +h_n\right)-\hat{\beta }\left(\tau -h_n\right)\right)}}\right]x_ix_i^\mathrm{T}} where :math:`h_n` is a bandwidth and :math:`\hat{\beta }\left(\tau +h_n\right)` denotes the parameter estimates obtained from a quantile regression using the :math:`\left(\tau +h_n\right)`\ th quantile. Similarly with :math:`\hat{\beta }\left(\tau -h_n\right)`. (#) Bootstrap The last method uses bootstrapping to either estimate a covariance matrix or obtain confidence intervals for the parameter estimates directly. This method, therefore, does not assume Normally distributed errors. Samples of size :math:`n` are taken from the paired data :math:`\left\{y_i, x_i\right\}` (i.e., the independent and dependent variables are sampled together). A quantile regression is then fitted to each sample resulting in a series of bootstrap estimates for the model parameters, :math:`\beta`. 
A covariance matrix can then be calculated directly from this series of values. Alternatively, confidence limits, :math:`\hat{\beta }_L` and :math:`\hat{\beta }_U`, can be obtained directly from the :math:`\left(1-\alpha \right)/2` and :math:`\left(1+\alpha \right)/2` sample quantiles of the bootstrap estimates. Further details of the algorithms used to calculate the covariance matrices can be found in `Algorithmic Details <https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html#algdetails>`__. All three asymptotic estimates of the covariance matrix require a bandwidth, :math:`h_n`. Two alternative methods for determining this are provided: (i) Sheather--Hall .. math:: h_n = \left(\frac{{1.5\left(\Phi^{-1}\left(\alpha_b\right)\phi \left(\Phi^{-1}\left(\tau \right)\right)\right)^2}}{{n\left(2\left(\Phi^{-1}\left(\tau \right)\right)^2+1\right)}}\right)^{\frac{1}{3}} for a user-supplied value :math:`\alpha_b`, (#) Bofinger .. math:: h_n = \left(\frac{{4.5\left(\phi \left(\Phi^{-1}\left(\tau \right)\right)\right)^4}}{{n\left(2\left(\Phi^{-1}\left(\tau \right)\right)^2+1\right)^2}}\right)^{\frac{1}{5}} ``quantile_linreg`` allows options to be supplied via the :math:`\mathrm{comm}`\ ['iopts'] and :math:`\mathrm{comm}`\ ['opts'] arrays (see :ref:`Other Parameters <g02qg-py2-py-other_params>` for details of the available options). Prior to calling ``quantile_linreg`` the option arrays, :math:`\mathrm{comm}`\ ['iopts'] and :math:`\mathrm{comm}`\ ['opts'], must be initialized by calling :meth:`optset` with :math:`\textit{optstr}` set to 'Initialize = quantile_linreg'. 
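As an illustration of the two bandwidth rules, the sketch below transcribes the Sheather--Hall and Bofinger expressions (with the :math:`\left(\Phi^{-1}\left(\tau \right)\right)^2` term as given in Koenker (2005)) into plain Python, using the standard Normal density :math:`\phi` and quantile function :math:`\Phi^{-1}` from the standard library. It is a sketch of the formulas, not the NAG implementation.

```python
from statistics import NormalDist

_N = NormalDist()  # standard Normal: _N.pdf is phi, _N.inv_cdf is Phi^{-1}

def sheather_hall(tau, n, alpha_b):
    """Sheather--Hall bandwidth h_n for quantile tau, sample size n
    and user-supplied alpha_b (illustrative transcription)."""
    q = _N.inv_cdf(tau)
    num = 1.5 * (_N.inv_cdf(alpha_b) * _N.pdf(q)) ** 2
    return (num / (n * (2.0 * q ** 2 + 1.0))) ** (1.0 / 3.0)

def bofinger(tau, n):
    """Bofinger bandwidth h_n for quantile tau and sample size n
    (illustrative transcription)."""
    q = _N.inv_cdf(tau)
    num = 4.5 * _N.pdf(q) ** 4
    return (num / (n * (2.0 * q ** 2 + 1.0) ** 2)) ** (1.0 / 5.0)
```

Both rules shrink as :math:`n` grows (at rates :math:`n^{-1/3}` and :math:`n^{-1/5}` respectively), which is the behaviour required of :math:`c_n` in the Powell estimator above.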
If bootstrap confidence limits are required (:math:`\text{‘Interval Method'} = \texttt{'BOOTSTRAP XY'}`) then one of the random number initialization functions :meth:`rand.init_repeat <naginterfaces.library.rand.init_repeat>` (for a repeatable analysis) or :meth:`rand.init_nonrepeat <naginterfaces.library.rand.init_nonrepeat>` (for an unrepeatable analysis) must also have been previously called. .. _g02qg-py2-py-references: **References** Koenker, R, 2005, `Quantile Regression`, Econometric Society Monographs, Cambridge University Press, New York Mehrotra, S, 1992, `On the implementation of a primal-dual interior point method`, SIAM J. Optim. (2), 575--601 Nocedal, J and Wright, S J, 2006, `Numerical Optimization`, (2nd Edition), Springer Series in Operations Research, Springer, New York Portnoy, S and Koenker, R, 1997, `The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute error estimators`, Statistical Science (4), 279--300 Powell, J L, 1991, `Estimation of monotonic regression models under quantile restrictions`, `Nonparametric and Semiparametric Methods in Econometrics`, Cambridge University Press, Cambridge See Also -------- :meth:`naginterfaces.library.examples.correg.quantile_linreg_ex.main` """ raise NotImplementedError
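The 'BOOTSTRAP XY' construction described above — resample :math:`\left(y_i, x_i\right)` pairs, refit, then take sample quantiles of the resulting estimates — can be sketched in plain Python. The refitting step is left abstract, and the index arithmetic for the limits is a simple floor/ceiling approximation, not the algorithm of :meth:`stat.quantiles <naginterfaces.library.stat.quantiles>` that the library uses.

```python
import random
from math import ceil, floor

def bootstrap_pairs(x_rows, y, rng):
    """Draw one bootstrap sample of (y_i, x_i) pairs: rows of X and
    elements of y are resampled together, preserving the pairing."""
    n = len(y)
    idx = [rng.randrange(n) for _ in range(n)]
    return [x_rows[i] for i in idx], [y[i] for i in idx]

def quantile_limits(estimates, alpha):
    """Quantile-method limits: the (1 - alpha)/2 and (1 + alpha)/2
    sample quantiles of a series of bootstrap estimates, widened
    outward to the nearest order statistic."""
    s = sorted(estimates)
    n = len(s)
    lo = s[floor(((1.0 - alpha) / 2.0) * (n - 1))]
    hi = s[ceil(((1.0 + alpha) / 2.0) * (n - 1))]
    return lo, hi
```

In the library itself the per-sample refit is a full quantile regression; here only the resampling and the interval construction are shown.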
def optset(optstr, comm):
    r"""
    ``optset`` either initializes or resets the option arrays or sets a single option for supported problem solving functions in submodule ``correg``.

    Currently, only :meth:`quantile_linreg` is supported.

    .. _g02zk-py2-py-doc:

    For full information please refer to the NAG Library document for g02zk

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02zkf.html

    .. _g02zk-py2-py-parameters:

    **Parameters**

    **optstr** : str
        A string identifying the option to be set.

        Initialize = :math:`\textit{function name}`
            Initialize the option arrays :math:`\mathrm{comm}`\ ['iopts'] and :math:`\mathrm{comm}`\ ['opts'] for use with function :math:`\textit{function name}`, where :math:`\textit{function name}` is the name of the problem solving function you wish to use.

        Defaults
            Resets all options to their default values.

        :math:`\textit{option} = \textit{optval}`
            See :ref:`Other Parameters for quantile_linreg <g02qg-py2-py-other_params>` for details of valid values for :math:`\textit{option}` and :math:`\textit{optval}`. The equals sign (:math:`=`) delimiter must be used to separate the :math:`\textit{option}` from its :math:`\textit{optval}` value.

        :math:`\mathrm{optstr}` is case insensitive. Each token in the :math:`\textit{option}` and :math:`\textit{optval}` component must be separated by at least one space.

    **comm** : dict, communication object, modified in place
        Communication structure.

    .. _g02zk-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`11`)
        On entry, the :math:`\textit{option}` supplied in :math:`\mathrm{optstr}` was not recognized: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`12`)
        On entry, the expected delimiter ':math:`=`' was not found in :math:`\mathrm{optstr}`: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`13`)
        On entry, could not convert the specified :math:`\textit{optval}` to an integer: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`13`)
        On entry, could not convert the specified :math:`\textit{optval}` to a real: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`14`)
        On entry, attempting to initialize the option arrays but specified function name was not valid: :math:`\text{name} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`15`)
        On entry, the :math:`\textit{optval}` supplied for the integer option is not valid. :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`16`)
        On entry, the :math:`\textit{optval}` supplied for the real option is not valid. :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`17`)
        On entry, the :math:`\textit{optval}` supplied for the character option is not valid. :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`21`)
        On entry, either the option arrays have not been initialized or they have been corrupted.

    .. _g02zk-py2-py-notes:

    **Notes**

    ``optset`` has three purposes: to initialize option arrays, to reset all options to their default values or to set a single option to a user-supplied value.

    Options and their values are, in general, presented as a character string, :math:`\mathrm{optstr}`, of the form ':math:`\textit{option} = \textit{optval}`'; alphabetic characters can be supplied in either upper or lower case. Both :math:`\textit{option}` and :math:`\textit{optval}` may consist of one or more tokens separated by white space. The tokens that comprise :math:`\textit{optval}` will normally be either an integer, real or character value as defined in the description of the specific optional argument. In addition all options can take an :math:`\textit{optval}` DEFAULT which resets the option to its default value.

    It is imperative that option arrays are initialized before any options are set, before the relevant problem solving function is called and before any options are queried using :meth:`optget`.

    Information relating to available option names and their corresponding valid values is given in :ref:`Other Parameters for quantile_linreg <g02qg-py2-py-other_params>`.

    See Also
    --------
    :meth:`naginterfaces.library.examples.correg.quantile_linreg_ex.main`
    """
    raise NotImplementedError
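As a way of making the ':math:`\textit{option} = \textit{optval}`' convention concrete, here is a minimal, hypothetical parser following the documented rules (case-insensitive option names, ':math:`=`' as delimiter, tokens separated by whitespace). It only illustrates the string format; it is not how ``optset`` itself is implemented, and the example option strings are purely illustrative.

```python
def parse_optstr(optstr):
    """Split an 'option[ = optval]' string into a (name, value) pair.

    Follows the documented conventions: option names are matched
    case-insensitively, '=' separates the option from its value, and
    tokens are separated by whitespace.  Options that take no value
    (e.g. 'Defaults') yield a value of None.
    """
    name, sep, value = optstr.partition("=")
    # Collapse runs of whitespace between tokens; lower-case the name
    # so that comparisons are case insensitive.
    name = " ".join(name.split()).lower()
    if not name:
        raise ValueError("no option name supplied in optstr")
    if not sep:
        return name, None
    return name, " ".join(value.split())
```

For example, ``parse_optstr("Interval Method = IID")`` yields ``("interval method", "IID")``, and ``parse_optstr("Defaults")`` yields ``("defaults", None)``.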
def optget(optstr, comm):
    r"""
    ``optget`` is used to query the value of options available to supported problem solving functions in submodule ``correg``.

    Currently, only :meth:`quantile_linreg` is supported.

    .. _g02zl-py2-py-doc:

    For full information please refer to the NAG Library document for g02zl

    https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02zlf.html

    .. _g02zl-py2-py-parameters:

    **Parameters**

    **optstr** : str
        A string identifying the option whose current value is required. See :ref:`Other Parameters for quantile_linreg <g02qg-py2-py-other_params>` for information on valid options. In addition, the following is a valid option:

        Identify
            ``optget`` returns the function name supplied to :meth:`optset` when the option arrays :math:`\mathrm{comm}`\ ['iopts'] and :math:`\mathrm{comm}`\ ['opts'] were initialized.

    **comm** : dict, communication object
        Communication structure.

        This argument must have been initialized by a prior call to :meth:`quantile_linreg` or :meth:`optset`.

    **Returns**

    **optvalue** : dict
        The option-value ``dict``, with the following keys:

        ``'value'`` : float, int or str
            The value of the requested option.

        ``'annotation'`` : None or str
            Possible additional information about the option value.

    .. _g02zl-py2-py-errors:

    **Raises**

    **NagValueError**

    (`errno` :math:`11`)
        On entry, the :math:`\textit{option}` in :math:`\mathrm{optstr}` was not recognized: :math:`\mathrm{optstr} = \langle\mathit{\boldsymbol{value}}\rangle`.

    (`errno` :math:`61`)
        On entry, either the option arrays have not been initialized or they have been corrupted.

    **Warns**

    **NagAlgorithmicWarning**

    (`errno` :math:`41`)
        On entry, :math:`\mathrm{optstr}` indicates a character option, but :math:`\textit{cvalue}` is too short to hold the stored value. The returned value will be truncated.

    .. _g02zl-py2-py-notes:

    **Notes**

    ``optget`` is used to query the current values of options. It is necessary to initialize the option arrays using :meth:`optset` before any options are queried.

    Information on option names and whether these options are real, integer or character can be found in :ref:`Other Parameters for quantile_linreg <g02qg-py2-py-other_params>`.

    See Also
    --------
    :meth:`naginterfaces.library.examples.correg.lmm_init_combine_ex.main`
    """
    raise NotImplementedError