naginterfaces.library.univar.robust_1var_mestim

naginterfaces.library.univar.robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, maxit=50)

robust_1var_mestim computes an $M$-estimate of location with (optional) simultaneous estimation of the scale using Huber’s algorithm.

For full information please refer to the NAG Library document for g07db:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g07/g07dbf.html

Parameters

isigma : int

The value assigned to $\mathrm{isigma}$ determines whether $\hat{σ}$ is to be simultaneously estimated.

$\mathrm{isigma} = 0$

The estimation of $\hat{σ}$ is bypassed and $\mathrm{sigma}$ is set equal to $σ_{c}$.

$\mathrm{isigma} = 1$

$\hat{σ}$ is estimated simultaneously.

x : float, array-like, shape $(n)$

The vector of observations, $x_{1}, x_{2}, \dots, x_{n}$.

ipsi : int

Which $ψ$ function is to be used.

$\mathrm{ipsi} = 0$

$ψ(t) = t$.

$\mathrm{ipsi} = 1$

Huber’s function.

$\mathrm{ipsi} = 2$

Hampel’s piecewise linear function.

$\mathrm{ipsi} = 3$

Andrew’s sine wave.

$\mathrm{ipsi} = 4$

Tukey’s bi-weight.

c : float

If $\mathrm{ipsi} = 1$, $\mathrm{c}$ must specify the parameter, $c$, of Huber’s $ψ$ function. $\mathrm{c}$ is not referenced if $\mathrm{ipsi} \neq 1$.

h1 : float

If $\mathrm{ipsi} = 2$, $\mathrm{h1}$, $\mathrm{h2}$ and $\mathrm{h3}$ must specify the parameters, $h_{1}$, $h_{2}$, and $h_{3}$, of Hampel’s piecewise linear $ψ$ function. $\mathrm{h1}$, $\mathrm{h2}$ and $\mathrm{h3}$ are not referenced if $\mathrm{ipsi} \neq 2$.

h2 : float

If $\mathrm{ipsi} = 2$, $\mathrm{h1}$, $\mathrm{h2}$ and $\mathrm{h3}$ must specify the parameters, $h_{1}$, $h_{2}$, and $h_{3}$, of Hampel’s piecewise linear $ψ$ function. $\mathrm{h1}$, $\mathrm{h2}$ and $\mathrm{h3}$ are not referenced if $\mathrm{ipsi} \neq 2$.

h3 : float

If $\mathrm{ipsi} = 2$, $\mathrm{h1}$, $\mathrm{h2}$ and $\mathrm{h3}$ must specify the parameters, $h_{1}$, $h_{2}$, and $h_{3}$, of Hampel’s piecewise linear $ψ$ function. $\mathrm{h1}$, $\mathrm{h2}$ and $\mathrm{h3}$ are not referenced if $\mathrm{ipsi} \neq 2$.

dchi : float

$d$, the parameter of the $χ$ function. $\mathrm{dchi}$ is not referenced if $\mathrm{ipsi} = 0$.

theta : float

If $\mathrm{sigma} > 0$ then $\mathrm{theta}$ must be set to the required starting value of the estimation of the location parameter $\hat{θ}$. A reasonable initial value for $\hat{θ}$ will often be the sample mean or median.

sigma : float

The role of $\mathrm{sigma}$ depends on the value assigned to $\mathrm{isigma}$, as follows:

if $\mathrm{isigma} = 1$, $\mathrm{sigma}$ must be assigned a value which determines the values of the starting points for the calculations of $\hat{θ}$ and $\hat{σ}$. If $\mathrm{sigma} \leq 0.0$ then robust_1var_mestim will determine the starting points of $\hat{θ}$ and $\hat{σ}$. Otherwise the value assigned to $\mathrm{sigma}$ will be taken as the starting point for $\hat{σ}$, and $\mathrm{theta}$ must be assigned a value before entry, see above;

if $\mathrm{isigma} = 0$, $\mathrm{sigma}$ must be assigned a value which determines the value of $σ_{c}$, which is held fixed during the iterations, and the starting value for the calculation of $\hat{θ}$. If $\mathrm{sigma} \leq 0$, robust_1var_mestim will determine the value of $σ_{c}$ as the median absolute deviation adjusted to reduce bias (see robust_1var_median()) and the starting point for $\hat{θ}$. Otherwise, the value assigned to $\mathrm{sigma}$ will be taken as the value of $σ_{c}$ and $\mathrm{theta}$ must be assigned a relevant value before entry, see above.

tol : float

The relative precision for the final estimates. Convergence is assumed when the increments for $\mathrm{theta}$ and $\mathrm{sigma}$ are less than $\mathrm{tol} \times \max(1.0, σ_{k - 1})$.

maxit : int, optional

The maximum number of iterations that should be used during the estimation.

Returns

theta : float

The $M$-estimate of the location parameter, $\hat{θ}$.

sigma : float

Contains the $M$-estimate of the scale parameter, $\hat{σ}$, if $\mathrm{isigma}$ was assigned the value $1$ on entry, otherwise $\mathrm{sigma}$ will contain the initial fixed value $σ_{c}$.

rs : float, ndarray, shape $(n)$

The Winsorized residuals.

nit : int

The number of iterations that were used during the estimation.

wrk : float, ndarray, shape $(n)$

If $\mathrm{sigma} \leq 0.0$ on entry, $\mathrm{wrk}$ will contain the $n$ observations in ascending order.
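As a rough illustration of how the parameters and return values fit together, the sketch below assembles the arguments for Huber’s $ψ$ with simultaneous scale estimation. The helper names `huber_mestim_args` and `run_mestim`, the sample data, and the constants `c=1.5`, `dchi=1.5` and `tol=1e-4` are illustrative assumptions, not recommended defaults. The call itself needs a working naginterfaces installation, so it is kept inside a function, and reading the results by attribute name assumes the returned object exposes the fields listed under Returns.

```python
def huber_mestim_args(x, c=1.5, dchi=1.5, tol=1e-4, maxit=50):
    """Hypothetical helper: keyword arguments for Huber's psi (ipsi=1)
    with simultaneous scale estimation (isigma=1)."""
    return dict(
        isigma=1,                 # estimate sigma together with theta
        x=list(x),
        ipsi=1,                   # Huber's function
        c=c,
        h1=0.0, h2=0.0, h3=0.0,   # not referenced unless ipsi == 2
        dchi=dchi,
        theta=0.0,                # only needs a real start value when sigma > 0
        sigma=-1.0,               # sigma <= 0: the routine picks the starting points
        tol=tol,
        maxit=maxit,
    )


def run_mestim(x):
    """Call the NAG routine; requires naginterfaces to be installed and licensed."""
    from naginterfaces.library import univar
    result = univar.robust_1var_mestim(**huber_mestim_args(x))
    # Assumption: the result exposes the documented return values by name.
    return result.theta, result.sigma, result.nit


# Made-up observations, used only to show the shape of the arguments.
sample = [13.0, 11.0, 16.0, 5.0, 3.0, 18.0, 9.0, 8.0, 6.0, 27.0, 7.0]
args = huber_mestim_args(sample)
```

Passing `sigma=-1.0` asks the routine to choose its own starting points, which is why `theta` is left at a placeholder value here.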

Raises

NagValueError

(errno $1$)

On entry, $\mathrm{ipsi} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{ipsi} = 0$, $1$, $2$, $3$ or $4$.

(errno $1$)

On entry, $\mathrm{isigma} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{isigma} = 0$ or $1$.

(errno $1$)

On entry, $\mathrm{maxit} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{maxit} > 0$.

(errno $1$)

On entry, $\mathrm{tol} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{tol} > 0.0$.

(errno $1$)

On entry, $n = ⟨\mathit{value}⟩$.

Constraint: $n > 1$.

(errno $2$)

On entry, $\mathrm{dchi} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{dchi} > 0.0$.

(errno $2$)

On entry, $\mathrm{h1} = ⟨\mathit{value}⟩$, $\mathrm{h2} = ⟨\mathit{value}⟩$ and $\mathrm{h3} = ⟨\mathit{value}⟩$.

Constraint: $0 \leq \mathrm{h1} \leq \mathrm{h2} \leq \mathrm{h3}$ and $\mathrm{h3} > 0.0$.

(errno $2$)

On entry, $\mathrm{c} = ⟨\mathit{value}⟩$.

Constraint: $\mathrm{c} > 0.0$.

(errno $3$)

All elements of $\mathrm{x}$ are equal.

(errno $4$)

Current estimate of $\mathrm{sigma}$ is zero or negative: $\mathrm{sigma} = ⟨\mathit{value}⟩$.

(errno $5$)

Number of iterations required exceeds $\mathrm{maxit}$: $\mathrm{maxit} = ⟨\mathit{value}⟩$.

(errno $6$)

All Winsorized residuals are zero.
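The entry constraints behind errno 1 to 3 can be mirrored in a small pre-check, which may help when validating inputs before calling the routine. This is only a sketch: `check_mestim_inputs` is a hypothetical helper, not part of naginterfaces, and it assumes that `c`, `h1`, `h2`, `h3` and `dchi` are checked only when the chosen `ipsi` references them. Errno 4, 5 and 6 arise during the iteration and are not covered.

```python
def check_mestim_inputs(isigma, x, ipsi, c, h1, h2, h3, dchi, tol, maxit=50):
    """Hypothetical pre-check: return the documented errno for the first
    violated entry constraint, or 0 if none is violated."""
    if ipsi not in (0, 1, 2, 3, 4) or isigma not in (0, 1):
        return 1
    if maxit <= 0 or tol <= 0.0 or len(x) <= 1:
        return 1
    # Assumption: parameters are only validated when they are referenced.
    if ipsi != 0 and dchi <= 0.0:
        return 2
    if ipsi == 2 and not (0.0 <= h1 <= h2 <= h3 and h3 > 0.0):
        return 2
    if ipsi == 1 and c <= 0.0:
        return 2
    if min(x) == max(x):
        return 3  # all elements of x are equal
    return 0
```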

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

The data consists of a sample of size $n$, denoted by $x_{1}, x_{2}, \dots, x_{n}$, drawn from a random variable $X$.

The $x_{i}$ are assumed to be independent with an unknown distribution function of the form

$$F\left((x_{i} - θ) / σ\right)$$

where $θ$ is a location parameter, and $σ$ is a scale parameter. $M$-estimators of $θ$ and $σ$ are given by the solution to the following system of equations:

$$\sum_{i=1}^{n} ψ\left((x_{i} - \hat{θ}) / \hat{σ}\right) = 0 \tag{1}$$

$$\sum_{i=1}^{n} χ\left((x_{i} - \hat{θ}) / \hat{σ}\right) = (n - 1) β \tag{2}$$

where $ψ$ and $χ$ are given functions, and $β$ is a constant, such that $\hat{σ}$ is an unbiased estimator when $x_{i}$, for $i = 1, 2, \dots, n$, has a Normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\hat{θ}$ using an assigned value of $σ = σ_{c}$.

The values of $ψ\left(\frac{x_{i} - \hat{θ}}{\hat{σ}}\right) \hat{σ}$ are known as the Winsorized residuals.
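This page does not give a formula for $β$. One common choice, assumed here and not confirmed as what the routine uses, is $β = E[χ(Z)]$ for a standard Normal $Z$. For a quadratic $χ$ capped at $d^{2}/2$, i.e. $χ(t) = \min(t^{2}, d^{2})/2$, this expectation has a closed form, sketched below with the hypothetical helper `beta_capped_quadratic`.

```python
import math


def normal_cdf(z):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def beta_capped_quadratic(d):
    """E[chi(Z)] for Z ~ N(0, 1) and chi(t) = min(t^2, d^2) / 2.
    Assumed choice of beta; the routine's own constant may differ."""
    inside = (2.0 * normal_cdf(d) - 1.0) - 2.0 * d * normal_pdf(d)  # E[Z^2 ; |Z| <= d]
    tail = 2.0 * (1.0 - normal_cdf(d))                               # P(|Z| > d)
    return 0.5 * (inside + d * d * tail)
```

As $d$ grows the cap stops mattering and this value tends to $1/2$, in which case the scale equation reduces to the usual variance estimate with an $(n - 1)$ denominator.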

The following functions are available for $ψ$ and $χ$ in robust_1var_mestim:

Null Weights

$$ψ(t) = t, \qquad χ(t) = \frac{t^{2}}{2}$$

Use of these null functions leads to the mean and standard deviation of the data.

Huber’s Function

$$ψ(t) = \max(-c, \min(c, t))$$

$$χ(t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$$

Hampel’s Piecewise Linear Function

$$ψ_{h_{1}, h_{2}, h_{3}}(t) = -ψ_{h_{1}, h_{2}, h_{3}}(-t)$$

$$ψ_{h_{1}, h_{2}, h_{3}}(t) = \begin{cases} t, & 0 \leq t \leq h_{1} \\ h_{1}, & h_{1} \leq t \leq h_{2} \\ h_{1} (h_{3} - t) / (h_{3} - h_{2}), & h_{2} \leq t \leq h_{3} \\ 0, & t > h_{3} \end{cases}$$

$$χ(t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$$

Andrew’s Sine Wave Function

$$ψ(t) = \begin{cases} \sin(t), & -π \leq t \leq π \\ 0, & \text{otherwise} \end{cases}$$

$$χ(t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$$

Tukey’s Bi-weight

$$ψ(t) = \begin{cases} t (1 - t^{2})^{2}, & |t| \leq 1 \\ 0, & \text{otherwise} \end{cases}$$

$$χ(t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$$

where $c$, $h_{1}$, $h_{2}$, $h_{3}$ and $d$ are constants.
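The five $ψ$ functions, the capped quadratic $χ$, and the Winsorized residuals $ψ((x_{i} - \hat{θ})/\hat{σ})\,\hat{σ}$ can be written out directly in plain Python. The function names below are illustrative and are not part of naginterfaces; the sketch simply transcribes the definitions in this section.

```python
import math


def psi_null(t):
    return t


def psi_huber(t, c):
    return max(-c, min(c, t))


def psi_hampel(t, h1, h2, h3):
    a = abs(t)
    if a <= h1:
        value = a
    elif a <= h2:
        value = h1
    elif a <= h3:
        value = h1 * (h3 - a) / (h3 - h2)  # reached only when h3 > h2
    else:
        value = 0.0
    return math.copysign(value, t)  # odd symmetry: psi(-t) = -psi(t)


def psi_andrews(t):
    return math.sin(t) if -math.pi <= t <= math.pi else 0.0


def psi_tukey(t):
    return t * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def chi_capped(t, d):
    """t^2 / 2 for |t| <= d, and d^2 / 2 otherwise."""
    return min(t * t, d * d) / 2.0


def winsorized_residuals(x, theta, sigma, psi):
    """psi((x_i - theta) / sigma) * sigma for each observation."""
    return [psi((xi - theta) / sigma) * sigma for xi in x]
```

For instance, `winsorized_residuals([0.0, 10.0], 0.0, 2.0, lambda t: psi_huber(t, 1.5))` clips the second standardized residual from 5 to 1.5 before rescaling by `sigma`.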

Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:

$$\hat{σ}_{k} = \sqrt{\frac{1}{β (n - 1)} \left(\sum_{i=1}^{n} χ\left(\frac{x_{i} - \hat{θ}_{k - 1}}{\hat{σ}_{k - 1}}\right)\right) \hat{σ}_{k - 1}^{2}}$$

and

$$\hat{θ}_{k} = \hat{θ}_{k - 1} + \frac{1}{n} \sum_{i=1}^{n} ψ\left(\frac{x_{i} - \hat{θ}_{k - 1}}{\hat{σ}_{k}}\right) \hat{σ}_{k}$$

or

$$\hat{σ}_{k} = σ_{c}, \quad \text{if } σ \text{ is fixed.}$$
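The iteration can be sketched in a few lines of plain Python. This is an illustration of the update formulas and of the convergence test described for `tol`, not the routine’s implementation; `huber_iteration` and its arguments are hypothetical names. The usage at the end applies the null weights with $β = 1/2$, an assumed value chosen because it makes the scale update reproduce the sample standard deviation.

```python
import math


def huber_iteration(x, psi, chi, beta, theta, sigma,
                    tol=1e-8, maxit=50, fix_sigma=False):
    """Sketch of the iteration; returns (theta, sigma, nit)."""
    n = len(x)
    for k in range(1, maxit + 1):
        if fix_sigma:
            sigma_new = sigma  # sigma_k = sigma_c
        else:
            s = sum(chi((xi - theta) / sigma) for xi in x)
            sigma_new = math.sqrt(s * sigma * sigma / (beta * (n - 1)))
        # The location update uses the freshly computed sigma_k.
        step = sum(psi((xi - theta) / sigma_new) for xi in x) * sigma_new / n
        theta_new = theta + step
        limit = tol * max(1.0, sigma)  # tol * max(1.0, sigma_{k-1})
        converged = (abs(theta_new - theta) < limit
                     and abs(sigma_new - sigma) < limit)
        theta, sigma = theta_new, sigma_new
        if converged:
            return theta, sigma, k
    raise RuntimeError("no convergence within maxit iterations")


# Null weights: psi(t) = t, chi(t) = t^2 / 2, with beta = 1/2 (assumed).
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]
theta_hat, sigma_hat, nit = huber_iteration(
    data, psi=lambda t: t, chi=lambda t: t * t / 2.0, beta=0.5,
    theta=4.0, sigma=1.0,
)
```

With these null weights the estimates settle on the sample mean and the $(n - 1)$-denominator standard deviation of `data`, which agrees with the remark about null functions in this section.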

The initial values for $\hat{θ}$ and $\hat{σ}$ may either be user-supplied or calculated within robust_1var_mestim as the sample median and an estimate of $σ$ based on the median absolute deviation respectively.
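A minimal sketch of such default starting values is given below. The divisor 0.6745 is the conventional Normal-consistency constant, approximately $Φ^{-1}(0.75)$; it is an assumption here, since this page only says the median absolute deviation is adjusted to reduce bias. `default_start` is a hypothetical helper name.

```python
import statistics


def default_start(x):
    """Sample median and scaled median absolute deviation (sketch)."""
    theta0 = statistics.median(x)
    mad = statistics.median(abs(xi - theta0) for xi in x)
    sigma0 = mad / 0.6745  # assumed bias adjustment, approx. Phi^{-1}(0.75)
    return theta0, sigma0


theta0, sigma0 = default_start([1.0, 2.0, 3.0, 4.0, 100.0])
```

Both quantities are barely affected by the outlying value 100.0, which is why they make reasonable starting points for a robust iteration.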

robust_1var_mestim is based upon function LYHALG within the ROBETH library, see Marazzi (1987).

References

Hampel, F R, Ronchetti, E M, Rousseeuw, P J and Stahel, W A, 1986, Robust Statistics. The Approach Based on Influence Functions, Wiley

Huber, P J, 1981, Robust Statistics, Wiley

Marazzi, A, 1987, Subroutines for robust estimation of location and scale in ROBETH, Cah. Rech. Doc. IUMSP, No. 3 ROB 1, Institut Universitaire de Médecine Sociale et Préventive, Lausanne
