Source code for naginterfaces.library.smooth

# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 30.3 `smooth` Chapter.

``smooth`` - Smoothing in Statistics

This module is concerned with methods for smoothing data.
Included are methods for density estimation, smoothing time series data, and statistical applications of splines.
These methods may also be viewed as nonparametric modelling.

Functionality Index
-------------------

**Compute smoothed data sequence**

  running median smoothers: :meth:`data_runningmedian`

**Fit cubic smoothing spline**

  smoothing parameter estimated: :meth:`fit_spline_parest`

  smoothing parameter given: :meth:`fit_spline`

**Kernel density estimation**

  Gaussian kernel, thread safe: :meth:`kerndens_gauss`

Reorder data to give ordered distinct observations: :meth:`data_order`

For full information please refer to the NAG Library document

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10intro.html
"""

# NAG Copyright 2017-2024.

[docs]def fit_spline(mode, x, y, rho, c, comm, wt=None): r""" ``fit_spline`` fits a cubic smoothing spline for a given smoothing parameter. .. _g10ab-py2-py-doc: For full information please refer to the NAG Library document for g10ab https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10abf.html .. _g10ab-py2-py-parameters: **Parameters** **mode** : str, length 1 Indicates in which mode the function is to be used. :math:`\mathrm{mode} = \texttt{'P'}` Initialization and fitting is performed. This partial fit can be used in an iterative weighted least squares context where the weights are changing at each call to ``fit_spline`` or when the coefficients are not required. :math:`\mathrm{mode} = \texttt{'Q'}` Fitting only is performed. Initialization must have been performed previously by a call to ``fit_spline`` with :math:`\mathrm{mode} = \texttt{'P'}`. This quick fit may be called repeatedly with different values of :math:`\mathrm{rho}` without re-initialization. :math:`\mathrm{mode} = \texttt{'F'}` Initialization and full fitting is performed and the function coefficients are calculated. **x** : float, array-like, shape :math:`\left(n\right)` The distinct and ordered values :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` The values :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **rho** : float :math:`\rho`, the smoothing parameter. **c** : float, array-like, shape :math:`\left(n-1, 3\right)` If :math:`\mathrm{mode} = \texttt{'Q'}`, :math:`\mathrm{c}` must be unaltered from the previous call to ``fit_spline`` with :math:`\mathrm{mode} = \texttt{'P'}`. Otherwise :math:`\mathrm{c}` need not be set. **comm** : dict, communication object, modified in place Communication structure. `On initial entry`: need not be set. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If :math:`\textit{weight} = \texttt{'W'}`, :math:`\mathrm{wt}` must contain the :math:`n` weights. Otherwise :math:`\mathrm{wt}` is not referenced and unit weights are assumed. **Returns** **yhat** : float, ndarray, shape :math:`\left(n\right)` The fitted values, :math:`\hat{y}_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **c** : float, ndarray, shape :math:`\left(n-1, 3\right)` If :math:`\mathrm{mode} = \texttt{'F'}`, :math:`\mathrm{c}` contains the spline coefficients. More precisely, the value of the spline at :math:`t` is given by :math:`\left(\left(\mathrm{c}[i-1,2]\times d+\mathrm{c}[i-1,1]\right)\times d+\mathrm{c}[i-1,0]\right)\times d+\hat{y}_i`, where :math:`x_i\leq t < x_{{i+1}}` and :math:`d = t-x_i`. If :math:`\mathrm{mode} = \texttt{'P'}` or :math:`\texttt{'Q'}`, :math:`\mathrm{c}` contains information that will be used in a subsequent call to ``fit_spline`` with :math:`\mathrm{mode} = \texttt{'Q'}`. **rss** : float The (weighted) residual sum of squares. **df** : float The residual degrees of freedom. **res** : float, ndarray, shape :math:`\left(n\right)` The (weighted) residuals, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **h** : float, ndarray, shape :math:`\left(n\right)` The leverages, :math:`h_{{\textit{i}\textit{i}}}`, for :math:`\textit{i} = 1,2,\ldots,n`. .. _g10ab-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{mode} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mode} = \texttt{'P'}`, :math:`\texttt{'Q'}` or :math:`\texttt{'F'}`. (`errno` :math:`1`) On entry, :math:`\mathrm{rho} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{rho}\geq 0.0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 3`. (`errno` :math:`2`) On entry, at least one element of :math:`\mathrm{wt}\leq 0.0`. (`errno` :math:`3`) On entry, :math:`\mathrm{x}` is not a strictly ordered array. .. _g10ab-py2-py-notes: **Notes** ``fit_spline`` fits a cubic smoothing spline to a set of :math:`n` observations (:math:`x_i`, :math:`y_i`), for :math:`i = 1,2,\ldots,n`. The spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is unsuitable. Cubic smoothing splines arise as the unique real-valued solution function :math:`f`, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes: .. math:: \sum_{{i = 1}}^nw_i{\left(y_i-f\left(x_i\right)\right)}^2+\rho \int_{{-\infty }}^{\infty }{\left(f^{{\prime \prime }}\left(x\right)\right)}^2{dx}\text{,} where :math:`w_i` is the (optional) weight for the :math:`i`\ th observation and :math:`\rho` is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve. The value of the smoothing parameter :math:`\rho` weights these two aspects; larger values of :math:`\rho` give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967). The fitted values, :math:`\hat{y} = \left(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\right)^\mathrm{T}`, and weighted residuals, :math:`r_i`, can be written as .. math:: \hat{y} = Hy\quad \text{ and }\quad r_i = \sqrt{w_i}\left(y_i-\hat{y}_i\right) for a matrix :math:`H`. The residual degrees of freedom for the spline is :math:`\mathrm{trace}\left(I-H\right)` and the diagonal elements of :math:`H`, :math:`h_{{ii}}`, are the leverages. The parameter :math:`\rho` can be chosen in a number of ways. The fit can be inspected for a number of different values of :math:`\rho`. Alternatively the degrees of freedom for the spline, which determines the value of :math:`\rho`, can be specified, or the (generalized) cross-validation can be minimized to give :math:`\rho`; see :meth:`fit_spline_parest` for further details. ``fit_spline`` requires the :math:`x_i` to be strictly increasing. If two or more observations have the same :math:`x_i`-value then they should be replaced by a single observation with :math:`y_i` equal to the (weighted) mean of the :math:`y` values and weight, :math:`w_i`, equal to the sum of the weights. This operation can be performed by :meth:`data_order`. The computation is split into three phases. (i) Compute matrices needed to fit spline. (#) Fit spline for a given value of :math:`\rho`. (#) Compute spline coefficients. When fitting the spline for several different values of :math:`\rho`, phase \(1) need only be carried out once and then phase \(2) repeated for different values of :math:`\rho`. If the spline is being fitted as part of an iterative weighted least squares procedure phases \(1) and \(2) have to be repeated for each set of weights. In either case, phase \(3) will often only have to be performed after the final fit has been computed. The algorithm is based on Hutchinson (1986). .. _g10ab-py2-py-references: **References** Hastie, T J and Tibshirani, R J, 1990, `Generalized Additive Models`, Chapman and Hall Hutchinson, M F, 1986, `Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines`, ACM Trans. Math. Software (12), 150--153 Hutchinson, M F and de Hoog, F R, 1985, `Smoothing noisy data with spline functions`, Numer. Math. (47), 99--106 Reinsch, C H, 1967, `Smoothing by spline functions`, Numer. Math. (10), 177--183 """ raise NotImplementedError
[docs]def fit_spline_parest(method, x, y, crit, wt=None, u=0.0, tol=0.0, maxcal=0): r""" ``fit_spline_parest`` estimates the values of the smoothing parameter and fits a cubic smoothing spline to a set of data. .. _g10ac-py2-py-doc: For full information please refer to the NAG Library document for g10ac https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10acf.html .. _g10ac-py2-py-parameters: **Parameters** **method** : str, length 1 Indicates whether the smoothing parameter is to be found by minimization of the CV or GCV functions, or by finding the smoothing parameter corresponding to a specified degrees of freedom value. :math:`\mathrm{method} = \texttt{'C'}` Cross-validation is used. :math:`\mathrm{method} = \texttt{'D'}` The degrees of freedom are specified. :math:`\mathrm{method} = \texttt{'G'}` Generalized cross-validation is used. **x** : float, array-like, shape :math:`\left(n\right)` The distinct and ordered values :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` The values :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **crit** : float If :math:`\mathrm{method} = \texttt{'D'}`, the required degrees of freedom for the spline. If :math:`\mathrm{method} = \texttt{'C'}` or :math:`\texttt{'G'}`, :math:`\mathrm{crit}` need not be set. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If :math:`\textit{weight} = \texttt{'W'}`, :math:`\mathrm{wt}` must contain the :math:`n` weights. Otherwise :math:`\mathrm{wt}` is not referenced and unit weights are assumed. **u** : float, optional The upper bound on the smoothing parameter. If :math:`\mathrm{u}\leq \mathrm{tol}`, :math:`\mathrm{u} = 1000.0` will be used instead. See `Further Comments <https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10acf.html#fcomments>`__ for details on how this argument is used. **tol** : float, optional The accuracy to which the smoothing parameter :math:`\mathrm{rho}` is required. :math:`\mathrm{tol}` should preferably be not much less than :math:`\sqrt{\epsilon }`, where :math:`\epsilon` is the machine precision. If :math:`\mathrm{tol} < \epsilon`, :math:`\mathrm{tol} = \sqrt{\epsilon }` will be used instead. **maxcal** : int, optional The maximum number of spline evaluations to be used in finding the value of :math:`\rho`. If :math:`\mathrm{maxcal} < 3`, :math:`\mathrm{maxcal} = 100` will be used instead. **Returns** **yhat** : float, ndarray, shape :math:`\left(n\right)` The fitted values, :math:`\hat{y}_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **c** : float, ndarray, shape :math:`\left(n-1, 3\right)` The spline coefficients. More precisely, the value of the spline approximation at :math:`t` is given by :math:`\left(\left(\mathrm{c}[i-1,2]\times d+\mathrm{c}[i-1,1]\right)\times d+\mathrm{c}[i-1,0]\right)\times d+\hat{y}_i`, where :math:`x_i\leq t < x_{{i+1}}` and :math:`d = t-x_i`. **rss** : float The (weighted) residual sum of squares. **df** : float The residual degrees of freedom. If :math:`\mathrm{method} = \texttt{'D'}` this will be :math:`n-\mathrm{crit}` to the required accuracy. **res** : float, ndarray, shape :math:`\left(n\right)` The (weighted) residuals, :math:`r_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **h** : float, ndarray, shape :math:`\left(n\right)` The leverages, :math:`h_{{\textit{i}\textit{i}}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **crit** : float If :math:`\mathrm{method} = \texttt{'C'}`, the value of the cross-validation, or if :math:`\mathrm{method} = \texttt{'G'}`, the value of the generalized cross-validation function, evaluated at the value of :math:`\rho` returned in :math:`\mathrm{rho}`. **rho** : float The smoothing parameter, :math:`\rho`. .. _g10ac-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{crit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{method} = \texttt{'D'}`, :math:`\mathrm{crit}\leq n`. (`errno` :math:`1`) On entry, :math:`\mathrm{crit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: if :math:`\mathrm{method} = \texttt{'D'}`, :math:`\mathrm{crit} > 2.0`. (`errno` :math:`1`) On entry, :math:`\mathrm{method}` is not valid: :math:`\mathrm{method} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 3`. (`errno` :math:`2`) On entry, at least one element of :math:`\mathrm{wt}\leq 0.0`. (`errno` :math:`3`) On entry, :math:`\mathrm{x}` is not a strictly ordered array. (`errno` :math:`4`) For the specified degrees of freedom, :math:`\mathrm{rho} > \mathrm{u}`: :math:`\mathrm{u} = \langle\mathit{\boldsymbol{value}}\rangle`. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) Accuracy of :math:`\mathrm{tol}` cannot be achieved: :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`6`) :math:`\mathrm{maxcal}` iterations have been performed. (`errno` :math:`7`) Optimum value of :math:`\mathrm{rho}` lies above :math:`\mathrm{u}`: :math:`\mathrm{u} = \langle\mathit{\boldsymbol{value}}\rangle`. .. _g10ac-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` For a set of :math:`n` observations :math:`\left(x_{\textit{i}}, y_{\textit{i}}\right)`, for :math:`\textit{i} = 1,2,\ldots,n`, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable. Cubic smoothing splines arise as the unique real-valued solution function :math:`f`, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes .. math:: \sum_{{i = 1}}^nw_i{\left(y_i-f\left(x_i\right)\right)}^2+\rho \int_{{-\infty }}^{\infty }{\left(f^{{\prime \prime }}\left(x\right)\right)}^2{dx}\text{,} where :math:`w_i` is the (optional) weight for the :math:`i`\ th observation and :math:`\rho` is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing parameter :math:`\rho` weights these two aspects; larger values of :math:`\rho` give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967). The fitted values, :math:`\hat{y} = \left(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\right)^\mathrm{T}`, and weighted residuals, :math:`r_i`, can be written as: .. math:: \hat{y} = Hy\quad \text{ and }\quad r_i = \sqrt{w_i}\left(y_i-\hat{y}_i\right) for a matrix :math:`H`. The residual degrees of freedom for the spline is :math:`\mathrm{trace}\left(I-H\right)` and the diagonal elements of :math:`H` are the leverages. The parameter :math:`\rho` can be estimated in a number of ways. (i) The degrees of freedom for the spline can be specified, i.e., find :math:`\rho` such that :math:`\mathrm{trace}\left(H\right) = \nu_0` for given :math:`\nu_0`. (#) Minimize the cross-validation (CV), i.e., find :math:`\rho` such that the CV is minimized, where .. math:: \mathrm{CV} = \frac{1}{{\sum_{{i = 1}}^nw_i}}{\sum_{{i = 1}}^n}\left[\frac{r_i}{{1-h_{{ii}}}}\right]^2\text{.} (#) Minimize the generalized cross-validation (GCV), i.e., find :math:`\rho` such that the GCV is minimized, where .. math:: \mathrm{GCV} = \frac{n^2}{{{\sum_{{i = 1}}^n}w_i}}\left[\frac{{{\sum_{{i = 1}}^n}r_i^2}}{{\left({\sum_{{i = 1}}^n}\left(1-h_{{ii}}\right)\right)^2}}\right]\text{.} ``fit_spline_parest`` requires the :math:`x_i` to be strictly increasing. If two or more observations have the same :math:`x_i` value then they should be replaced by a single observation with :math:`y_i` equal to the (weighted) mean of the :math:`y` values and weight, :math:`w_i`, equal to the sum of the weights. This operation can be performed by :meth:`data_order`. The algorithm is based on Hutchinson (1986). :meth:`roots.contfn_brent_rcomm <naginterfaces.library.roots.contfn_brent_rcomm>` is used to solve for :math:`\rho` given :math:`\nu_0` and the method of :meth:`opt.one_var_func <naginterfaces.library.opt.one_var_func>` is used to minimize the GCV or CV. .. _g10ac-py2-py-references: **References** Hastie, T J and Tibshirani, R J, 1990, `Generalized Additive Models`, Chapman and Hall Hutchinson, M F, 1986, `Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines`, ACM Trans. Math. Software (12), 150--153 Hutchinson, M F and de Hoog, F R, 1985, `Smoothing noisy data with spline functions`, Numer. Math. (47), 99--106 Reinsch, C H, 1967, `Smoothing by spline functions`, Numer. Math. (10), 177--183 """ raise NotImplementedError
[docs]def kerndens_gauss(x, comm, wtype=2, window=1.0, slo=None, shi=None, ns=512): r""" ``kerndens_gauss`` performs kernel density estimation using a Gaussian kernel. .. _g10bb-py2-py-doc: For full information please refer to the NAG Library document for g10bb https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10bbf.html .. _g10bb-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. On a continuation call, :math:`\mathrm{x}` must be unchanged since the last call to ``kerndens_gauss``. **comm** : dict, communication object, modified in place Communication structure. If not initialized on entry then the values of :math:`Y_l` are to be calculated by this call to ``kerndens_gauss``. Otherwise, this is a continuation call and it is assumed that the values of :math:`Y_l` were calculated by a previous call to this function and the relevant information is stored in :math:`\mathrm{comm}`\ ['rcomm']. **wtype** : int, optional How the window width, :math:`h`, is to be calculated: :math:`\mathrm{wtype} = 1` :math:`h` is supplied in :math:`\mathrm{window}`. :math:`\mathrm{wtype} = 2` :math:`h` is to be calculated from the data, with .. math:: h = m\times \left(\frac{{0.9\times \mathrm{min}\left({q_{75}-q_{25}}, \sigma \right)}}{n^{0.2}}\right) where :math:`q_{75}-q_{25}` is the inter-quartile range and :math:`\sigma` the standard deviation of the sample, :math:`x`, and :math:`m` is a multipler supplied in :math:`\mathrm{window}`. The :math:`25\%` and :math:`75\%` quartiles, :math:`q_{25}` and :math:`q_{75}`, are calculated using :meth:`stat.quantiles <naginterfaces.library.stat.quantiles>`. This is the 'rule-of-thumb' suggested by Silverman (1990). **window** : float, optional If :math:`\mathrm{wtype} = 1`, then :math:`h`, the window width. Otherwise, :math:`m`, the multiplier used in the calculation of :math:`h`. **slo** : None or float, optional Note: if this argument is **None** then a default value will be used, determined as follows: :math:`3.0`. If :math:`\mathrm{slo} < \mathrm{shi}` then :math:`a`, the lower limit of the interval on which the estimate is calculated. Otherwise, :math:`a` and :math:`b`, the lower and upper limits of the interval, are calculated as follows: .. math:: \begin{array}{ccc}a& = &\textit{min}_i\left\{x_i\right\}-\mathrm{slo}\times h\\b& = &\textit{max}_i\left\{x_i\right\}+\mathrm{slo}\times h\end{array} where :math:`h` is the window width. For most applications :math:`a` should be at least three window widths below the lowest data point. On a continuation call, a supplied :math:`\mathrm{slo}` will be ignored and the appropriate value from the previous call to ``kerndens_gauss`` will be extracted from :math:`\mathrm{comm}`\ ['rcomm']. **shi** : None or float, optional Note: if this argument is **None** then a default value will be used, determined as follows: :math:`0.0`. If :math:`\mathrm{slo} < \mathrm{shi}` then :math:`b`, the upper limit of the interval on which the estimate is calculated. Otherwise a value for :math:`b` is calculated from the data as stated in the description of :math:`\mathrm{slo}` and the value supplied in :math:`\mathrm{shi}` is not used. For most applications :math:`b` should be at least three window widths above the highest data point. On a continuation call, a supplied :math:`\mathrm{shi}` will be ignored and the appropriate value from the previous call to ``kerndens_gauss`` will be extracted from :math:`\mathrm{comm}`\ ['rcomm']. **ns** : int, optional :math:`n_s`, the number of points at which the estimate is calculated. On a continuation call, :math:`\mathrm{ns}` must be unchanged since the last call to ``kerndens_gauss``. **Returns** **window** : float :math:`h`, the window width actually used. **slo** : float :math:`a`, the lower limit actually used. **shi** : float :math:`b`, the upper limit actually used. **smooth** : float, ndarray, shape :math:`\left(\mathrm{ns}\right)` :math:`\hat{f}\left(t_{\textit{l}}\right)`, for :math:`\textit{l} = 1,2,\ldots,n_s`, the :math:`n_s` values of the density estimate. **t** : float, ndarray, shape :math:`\left(\mathrm{ns}\right)` :math:`t_{\textit{l}}`, for :math:`\textit{l} = 1,2,\ldots,n_s`, the points at which the estimate is calculated. .. _g10bb-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`11`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 0`. (`errno` :math:`12`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. On entry at previous call, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: on a continuation call, :math:`\textit{n}` must be unchanged since previous call. (`errno` :math:`31`) On entry, :math:`\mathrm{wtype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wtype} = 1` or :math:`2`. (`errno` :math:`41`) On entry, :math:`\mathrm{window} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{window} > 0.0`. (`errno` :math:`51`) On entry, :math:`\mathrm{slo} = \langle\mathit{\boldsymbol{value}}\rangle`. On exit from previous call, :math:`\mathrm{slo} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: on a continuation call, :math:`\mathrm{slo}` must be the same value returned by the previous call. (`errno` :math:`62`) On entry, :math:`\mathrm{shi} = \langle\mathit{\boldsymbol{value}}\rangle`. On exit from previous call, :math:`\mathrm{shi} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: on a continuation call, :math:`\mathrm{shi}` must be the same value returned by the previous call. (`errno` :math:`71`) On entry, :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ns}\geq 2`. (`errno` :math:`74`) On entry, :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. On entry at previous call, :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: on a continuation call, :math:`\mathrm{ns}` must be unchanged since previous call. (`errno` :math:`111`) :math:`\mathrm{comm}`\ ['rcomm'] has been corrupted between calls. **Warns** **NagAlgorithmicWarning** (`errno` :math:`61`) On entry, :math:`\mathrm{slo} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{shi} = \langle\mathit{\boldsymbol{value}}\rangle`. On entry, :math:`\mathrm{min}\left(\mathrm{x}\right) = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{max}\left(\mathrm{x}\right) = \langle\mathit{\boldsymbol{value}}\rangle`. Expected values of at least :math:`\langle\mathit{\boldsymbol{value}}\rangle` and :math:`\langle\mathit{\boldsymbol{value}}\rangle` for :math:`\mathrm{slo}` and :math:`\mathrm{shi}`. .. _g10bb-py2-py-notes: **Notes** Given a sample of :math:`n` observations, :math:`x_1,x_2,\ldots,x_n`, from a distribution with unknown density function, :math:`f\left(x\right)`, an estimate of the density function, :math:`\hat{f}\left(x\right)`, may be required. The simplest form of density estimator is the histogram. This may be defined by: .. math:: \hat{f}\left(x\right) = \frac{1}{{nh}}n_j\text{, }\quad a+\left(j-1\right)h < x < a+jh\text{, }\quad j = 1,2,\ldots,n_s\text{,} where :math:`n_j` is the number of observations falling in the interval :math:`a+\left(j-1\right)h` to :math:`a+jh`, :math:`a` is the lower bound to the histogram, :math:`b = n_sh` is the upper bound and :math:`n_s` is the total number of intervals. The value :math:`h` is known as the window width. To produce a smoother density estimate a kernel method can be used. A kernel function, :math:`K\left(t\right)`, satisfies the conditions: .. math:: \int_{{-\infty }}^{\infty }K\left(t\right){dt} = 1\quad \text{ and }\quad K\left(t\right)\geq 0\text{.} The kernel density estimator is then defined as .. math:: \hat{f}\left(x\right) = \frac{1}{{nh}}\sum_{{i = 1}}^nK\left(\frac{{x-x_i}}{h}\right)\text{.} The choice of :math:`K` is usually not important but to ease the computational burden use can be made of the Gaussian kernel defined as .. math:: K\left(t\right) = \frac{1}{\sqrt{2\pi }}e^{{-t^2/2}}\text{.} The smoothness of the estimator depends on the window width :math:`h`. The larger the value of :math:`h` the smoother the density estimate. The value of :math:`h` can be chosen by examining plots of the smoothed density for different values of :math:`h` or by using cross-validation methods (see Silverman (1990)). Silverman (1982) and Silverman (1990) show how the Gaussian kernel density estimator can be computed using a fast Fourier transform (FFT). In order to compute the kernel density estimate over the range :math:`a` to :math:`b` the following steps are required. (i) Discretize the data to give :math:`n_s` equally spaced points :math:`t_l` with weights :math:`\xi_l` (see Jones and Lotwick (1984)). (#) Compute the FFT of the weights :math:`\xi_l` to give :math:`Y_l`. (#) Compute :math:`\zeta_l = e^{{-\frac{1}{2}h^2s_l^2}}Y_l` where :math:`s_l = 2\pi l/\left(b-a\right)`. (#) Find the inverse FFT of :math:`\zeta_l` to give :math:`\hat{f}\left(x\right)`. To compute the kernel density estimate for further values of :math:`h` only steps \(iii) and (iv) need be repeated. .. _g10bb-py2-py-references: **References** Jones, M C and Lotwick, H W, 1984, `Remark AS R50. A remark on algorithm AS 176. Kernel density estimation using the Fast Fourier Transform`, Appl. Statist. (33), 120--122 Silverman, B W, 1982, `Algorithm AS 176. Kernel density estimation using the fast Fourier transform`, Appl. Statist. (31), 93--99 Silverman, B W, 1990, `Density Estimation`, Chapman and Hall """ raise NotImplementedError
[docs]def data_runningmedian(itype, y): r""" ``data_runningmedian`` computes a smoothed data sequence using running median smoothers. .. _g10ca-py2-py-doc: For full information please refer to the NAG Library document for g10ca https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10caf.html .. _g10ca-py2-py-parameters: **Parameters** **itype** : int Specifies the method to be used. If :math:`\mathrm{itype} = 0`, 4253H,twice is used. If :math:`\mathrm{itype} = 1`, 3RSSH,twice is used. **y** : float, array-like, shape :math:`\left(n\right)` The sample observations. **Returns** **smooth** : float, ndarray, shape :math:`\left(n\right)` Contains the smooth. **rough** : float, ndarray, shape :math:`\left(n\right)` Contains the rough. .. _g10ca-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{itype} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{itype} = 0` or :math:`1`. (`errno` :math:`2`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n > 6`. .. _g10ca-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` Given a sequence of :math:`n` observations recorded at equally spaced intervals, ``data_runningmedian`` fits a smooth curve through the data using one of two smoothers. The two smoothers are based on the use of running medians and averages to summarise overlapping segments. The fit and the residuals are called the smooth and the rough respectively. They obey the following: .. math:: \mathrm{Data} = \mathrm{Smooth}+\mathrm{Rough}\text{.} The two smoothers are: (1) 4253H,twice consisting of a running median of :math:`4`, then :math:`2`, then :math:`5`, then :math:`3` followed by hanning. Hanning is a running weighted average, the weights being :math:`1/4`, :math:`1/2` and :math:`1/4`. The result of this smoothing is then reroughed by computing residuals, applying the same smoother to them and adding the result to the smooth of the first pass. (#) 3RSSH,twice consisting of a running median of :math:`3`, two splitting operations named S to improve the smooth sequence, each of which is followed by a running median of :math:`3`, and finally hanning. The end points are dealt with using the method described by Velleman and Hoaglin (1981). The full smoother 3RSSH,twice is produced by reroughing as described above. The compound smoother 4253H,twice is recommended. The smoother 3RSSH,twice is popular when calculating by hand as it requires simpler computations and is included for comparison purposes. .. _g10ca-py2-py-references: **References** Tukey, J W, 1977, `Exploratory Data Analysis`, Addison--Wesley Velleman, P F and Hoaglin, D C, 1981, `Applications, Basics, and Computing of Exploratory Data Analysis`, Duxbury Press, Boston, MA """ raise NotImplementedError
[docs]def data_order(x, y, wt=None): r""" ``data_order`` orders and weights data which is entered unsequentially, weighted or unweighted. .. _g10za-py2-py-doc: For full information please refer to the NAG Library document for g10za https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g10/g10zaf.html .. _g10za-py2-py-parameters: **Parameters** **x** : float, array-like, shape :math:`\left(n\right)` The values, :math:`x_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **y** : float, array-like, shape :math:`\left(n\right)` The values :math:`y_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. **wt** : None or float, array-like, shape :math:`\left(n\right)`, optional If :math:`\textit{weight} = \texttt{'W'}`, :math:`\mathrm{wt}` must contain the :math:`n` weights. Otherwise :math:`\mathrm{wt}` is not referenced and unit weights are assumed. **Returns** **nord** : int The number of distinct observations. **xord** : float, ndarray, shape :math:`\left(n\right)` The first :math:`\mathrm{nord}` elements contain the ordered and distinct :math:`x_i`. **yord** : float, ndarray, shape :math:`\left(n\right)` The first :math:`\mathrm{nord}` elements contain the values :math:`y^{\prime }` corresponding to the values in :math:`\mathrm{xord}`. **wtord** : float, ndarray, shape :math:`\left(n\right)` The first :math:`\mathrm{nord}` elements contain the values :math:`w^{\prime }` corresponding to the values of :math:`\mathrm{xord}` and :math:`\mathrm{yord}`. **rss** : float The within group sum of squares for tied observations. .. _g10za-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{weight} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{weight} = \texttt{'W'}` or :math:`\texttt{'U'}`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 1`. (`errno` :math:`2`) On entry, all weights are zero. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{wt}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wt}[i-1] \geq 0.0`. .. _g10za-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` Given a set of observations :math:`\left(x_i, y_i\right)`, for :math:`i = 1,2,\ldots,n`, with corresponding weights :math:`w_i`, ``data_order`` rearranges the observations so that the :math:`x_i` are in ascending order. For any equal :math:`x_i` in the ordered set, say :math:`x_j = x_{{j+1}} = \cdots = x_{{j+k}}`, a single observation :math:`x_j` is returned with a corresponding :math:`y^{\prime }` and :math:`w^{\prime }`, calculated as .. math:: w^{\prime } = \sum_{{l = 0}}^kw_{{i+l}} and .. math:: y^{\prime } = \frac{{\sum_{{l = 0}}^kw_{{i+l}}y_{{i+l}}}}{{w^{\prime }}}\text{.} Observations with zero weight are ignored. If no weights are supplied by you, then unit weights are assumed; that is :math:`w_{\textit{i}} = 1`, for :math:`\textit{i} = 1,2,\ldots,n`. In addition, the within group sum of squares is computed for the tied observations using West's algorithm (see West (1979)). .. _g10za-py2-py-references: **References** Draper, N R and Smith, H, 1985, `Applied Regression Analysis`, (2nd Edition), Wiley West, D H D, 1979, `Updating mean and variance estimates: An improved method`, Comm. ACM (22), 532--555 """ raise NotImplementedError