# NAG C Library Function Document

## 1Purpose

nag_robust_m_estim_1var (g07dbc) computes an $M$-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

## 2Specification

 #include #include
 void nag_robust_m_estim_1var (Nag_SigmaSimulEst sigma_est, Integer n, const double x[], Nag_PsiFun psifun, double c, double h1, double h2, double h3, double dchi, double *theta, double *sigma, Integer maxit, double tol, double rs[], Integer *nit, double sorted_x[], NagError *fail)

## 3Description

The data consists of a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$.
The ${x}_{i}$ are assumed to be independent with an unknown distribution function of the form
 $F x i - θ / σ$
where $\theta$ is a location argument, and $\sigma$ is a scale argument. $M$-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations:
 $∑ i=1 n ψ x i - θ ^ / σ ^ = 0$ (1)
 $∑ i=1 n χ x i - θ ^ / σ ^ = n-1 β$ (2)
where $\psi$ and $\chi$ are given functions, and $\beta$ is a constant, such that $\stackrel{^}{\sigma }$ is an unbiased estimator when ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\stackrel{^}{\theta }$ using an assigned value of $\sigma ={\sigma }_{c}$.
The values of $\psi \left(\frac{{x}_{i}-\stackrel{^}{\theta }}{\stackrel{^}{\sigma }}\right)\stackrel{^}{\sigma }$ are known as the Winsorized residuals.
The following functions are available for $\psi$ and $\chi$ in nag_robust_m_estim_1var (g07dbc):
(a) Null Weights
 $\psi \left(t\right)$ $\text{}=t$ $\chi \left(t\right)=\frac{{t}^{2}}{2}$
Use of these null functions leads to the mean and standard deviation of the data.
(b) Huber's Function
 $\psi \left(t\right)$ $\text{}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(-c,\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(c,t\right)\right)$ $\chi \left(t\right)=\frac{{‖t‖}^{2}}{2}‖t‖\le d$ $\chi \left(t\right)=\frac{{d}^{2}}{2}‖t‖>d$
(c) Hampel's Piecewise Linear Function
 ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)$ $\text{}=-{\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(-t\right)$ $\text{}=t$ $0\le t\le {h}_{1}$ $\chi \left(t\right)=\frac{{\left|t\right|}^{2}}{2}\left|t\right|\le d$ $\text{}={h}_{1}$ ${h}_{1}\le t\le {h}_{2}$ $\text{}={h}_{1}\left({h}_{3}-t\right)/\left({h}_{3}-{h}_{2}\right)$ ${h}_{2}\le t\le {h}_{3}$ $\chi \left(t\right)=\frac{{d}^{2}}{2}\left|t\right|>d$ $\text{}=0$ $t>{h}_{3}$
(d) Andrew's Sine Wave Function
 $\psi \left(t\right)$ $\text{}=\mathrm{sin}t$ $-\pi \le t\le \pi$ $\chi \left(t\right)=\frac{{\left|t\right|}^{2}}{2}\left|t\right|\le d$ $\text{}=0$ otherwise $\chi \left(t\right)=\frac{{d}^{2}}{2}\left|t\right|>d$
(e) Tukey's Bi-weight
 $\psi \left(t\right)$ $\text{}=t{\left(1-{t}^{2}\right)}^{2}$ $\left|t\right|\le 1$ $\chi \left(t\right)=\frac{{\left|t\right|}^{2}}{2}\left|t\right|\le d$ $\text{}=0$ otherwise $\chi \left(t\right)=\frac{{d}^{2}}{2}\left|t\right|>d$
where $c$, ${h}_{1}$, ${h}_{2}$, ${h}_{3}$ and $d$ are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
 $σ ^ k = 1 β n-1 ∑ i=1 n χ x i - θ ^ k-1 σ ^ k-1 σ ^ k-1 2$
and
 $θ ^ k = θ ^ k-1 + 1 n ∑ i=1 n ψ x i - θ ^ k-1 σ ^ k σ ^ k$
or
 $σ ^ k = σ c , if ​ σ ​ is fixed.$
The initial values for $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$ may either be user-supplied or calculated within nag_robust_m_estim_1var (g07dbc) as the sample median and an estimate of $\sigma$ based on the median absolute deviation respectively.
nag_robust_m_estim_1var (g07dbc) is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

## 4References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## 5Arguments

1:    $\mathbf{sigma_est}$Nag_SigmaSimulEstInput
On entry: the value assigned to sigma_est determines whether $\stackrel{^}{\sigma }$ is to be simultaneously estimated.
${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$
The estimation of $\stackrel{^}{\sigma }$ is bypassed and sigma is set equal to ${\sigma }_{c}$;
${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$
$\stackrel{^}{\sigma }$ is estimated simultaneously.
Constraint: ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$ or $\mathrm{Nag_SigmaSimul}$.
2:    $\mathbf{n}$IntegerInput
On entry: the number of observations, $n$.
Constraint: ${\mathbf{n}}>1$.
3:    $\mathbf{x}\left[{\mathbf{n}}\right]$const doubleInput
On entry: the vector of observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$.
4:    $\mathbf{psifun}$Nag_PsiFunInput
On entry: which $\psi$ function is to be used.
${\mathbf{psifun}}=\mathrm{Nag_Lsq}$
 $ψ t = t .$
${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$
Huber's function.
${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$
Hampel's piecewise linear function.
${\mathbf{psifun}}=\mathrm{Nag_AndrewFun}$
Andrew's sine wave.
${\mathbf{psifun}}=\mathrm{Nag_TukeyFun}$
Tukey's bi-weight.
Constraint: ${\mathbf{psifun}}=\mathrm{Nag_Lsq}$, $\mathrm{Nag_HuberFun}$, $\mathrm{Nag_HampelFun}$, $\mathrm{Nag_AndrewFun}$ or $\mathrm{Nag_TukeyFun}$.
5:    $\mathbf{c}$doubleInput
On entry: must specify the argument, $c$, of Huber's $\psi$ function, if ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$. c is not referenced if ${\mathbf{psifun}}\ne \mathrm{Nag_HuberFun}$.
Constraint: ${\mathbf{c}}>0.0$ if ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$.
6:    $\mathbf{h1}$doubleInput
7:    $\mathbf{h2}$doubleInput
8:    $\mathbf{h3}$doubleInput
On entry: h1, h2, and h3 must specify the arguments ${h}_{1}$, ${h}_{2}$, and ${h}_{3}$, of Hampel's piecewise linear $\psi$ function, if ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$. h1, h2, and h3 are not referenced if ${\mathbf{psifun}}\ne \mathrm{Nag_HampelFun}$.
Constraint: $0\le {\mathbf{h1}}\le {\mathbf{h2}}\le {\mathbf{h3}}$ and ${\mathbf{h3}}>0.0$ if ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
9:    $\mathbf{dchi}$doubleInput
On entry: the argument, $d$, of the $\chi$ function. dchi is not referenced if ${\mathbf{psifun}}=\mathrm{Nag_Lsq}$.
Constraint: ${\mathbf{dchi}}>0.0$ if ${\mathbf{psifun}}\ne \mathrm{Nag_Lsq}$.
10:  $\mathbf{theta}$double *Input/Output
On entry: if ${\mathbf{sigma}}>0$ then theta must be set to the required starting value of the estimation of the location argument $\stackrel{^}{\theta }$. A reasonable initial value for $\stackrel{^}{\theta }$ will often be the sample mean or median.
On exit: the $M$-estimate of the location argument, $\stackrel{^}{\theta }$.
11:  $\mathbf{sigma}$double *Input/Output
The role of sigma depends on the value assigned to sigma_est (see above) as follows.
If ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$:
On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. If ${\mathbf{sigma}}\le 0.0$ then nag_robust_m_estim_1var (g07dbc) will determine the starting points of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. Otherwise the value assigned to sigma will be taken as the starting point for $\stackrel{^}{\sigma }$, and theta must be assigned a value before entry, see above.
If ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$:
On entry: sigma must be assigned a value which determines the value of ${\sigma }_{c}$, which is held fixed during the iterations, and the starting value for the calculation of $\stackrel{^}{\theta }$. If ${\mathbf{sigma}}\le 0$, then nag_robust_m_estim_1var (g07dbc) will determine the value of ${\sigma }_{c}$ as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for $\stackrel{^}{\theta }$. Otherwise, the value assigned to sigma will be taken as the value of ${\sigma }_{c}$ and theta must be assigned a relevant value before entry, see above.
On exit: sigma contains the $M$ – estimate of the scale argument, $\stackrel{^}{\sigma }$, if ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$ on entry, otherwise sigma will contain the initial fixed value ${\sigma }_{c}$.
12:  $\mathbf{maxit}$IntegerInput
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: p ${\mathbf{maxit}}=50$.
Constraint: ${\mathbf{maxit}}>0$.
13:  $\mathbf{tol}$doubleInput
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than ${\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1.0,{\sigma }_{k-1}\right)$.
Constraint: ${\mathbf{tol}}>0.0$.
14:  $\mathbf{rs}\left[{\mathbf{n}}\right]$doubleOutput
On exit: the Winsorized residuals.
15:  $\mathbf{nit}$Integer *Output
On exit: the number of iterations that were used during the estimation.
16:  $\mathbf{sorted_x}\left[{\mathbf{n}}\right]$doubleOutput
On exit: if ${\mathbf{sigma}}\le 0.0$ on entry, sorted_x will contain the $n$ observations in ascending order.
17:  $\mathbf{fail}$NagError *Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

## 6Error Indicators and Warnings

NE_2_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{h2}}=〈\mathit{\text{value}}〉$ and ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}\le {\mathbf{h2}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{h3}}=〈\mathit{\text{value}}〉$ and ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}\le {\mathbf{h3}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
On entry, ${\mathbf{h2}}=〈\mathit{\text{value}}〉$, ${\mathbf{h3}}=〈\mathit{\text{value}}〉$ and ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h2}}\le {\mathbf{h3}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_3_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{h2}}=〈\mathit{\text{value}}〉$, ${\mathbf{h3}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}={\mathbf{h2}}$=${\mathbf{h3}}\ne 0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_ALL_ELEMENTS_EQUAL
On entry, all the values in the array x must not be equal.
On entry, argument psifun had an illegal value.
On entry, argument sigma_est had an illegal value.
NE_ESTIM_SIGMA_ZERO
The estimated value of sigma was $\le 0.0$ during an iteration.
NE_INT_ARG_LE
On entry, ${\mathbf{maxit}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{maxit}}>0$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}>1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARG_LE
On entry, tol must not be less than or equal to 0.0: ${\mathbf{tol}}=〈\mathit{\text{value}}〉$.
NE_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{c}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{c}}>0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$.
On entry, ${\mathbf{dchi}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{dchi}}>0.0$, ${\mathbf{psifun}}\ne \mathrm{Nag_Lsq}$.
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}\ge 0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_TOO_MANY
Too many iterations ($〈\mathit{\text{value}}〉$ ).
NE_WINS_RES_ZERO
The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$ option with a redescending $\psi$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of $\sigma$ is too small, then the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{{\sigma }_{c}}$, will be large and all the residuals may fall into the region for which $\psi \left(t\right)=0$. This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of ${\sigma }_{c}$ or with ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$.

## 7Accuracy

On successful exit the accuracy of the results is related to the value of TOL, see Section 4.

## 8Parallelism and Performance

nag_robust_m_estim_1var (g07dbc) is not threaded in any implementation.

When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value of $\sigma$ is chosen then initial values of the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{\sigma }$ will be large. If the redescending $\psi$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).

## 10Example

The following program reads in a set of data consisting of eleven observations of a variable $X$.
For this example, Hampels's Piecewise Linear Function is used (${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$), values for ${h}_{1}$, ${h}_{2}$ and ${h}_{3}$ along with $d$ for the $\chi$ function, being read from the data file.
Using the following starting values various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
 (a) nag_robust_m_estim_1var (g07dbc) determines the starting values, $\sigma$ is estimated simultaneously. (b) You supply the starting values, $\sigma$ is estimated simultaneously. (c) nag_robust_m_estim_1var (g07dbc) determines the starting values, $\sigma$ is fixed. (d) You supply the starting values, $\sigma$ is fixed.

### 10.1Program Text

Program Text (g07dbce.c)

### 10.2Program Data

Program Data (g07dbce.d)

### 10.3Program Results

Program Results (g07dbce.r)