Integer, Intent (In)	::	isigma, n, ipsi, maxit
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	nit
Real (Kind=nag_wp), Intent (In)	::	x(n), c, h1, h2, h3, dchi, tol
Real (Kind=nag_wp), Intent (Inout)	::	theta, sigma
Real (Kind=nag_wp), Intent (Out)	::	rs(n), wrk(n)

C Header Interface

#include <nag.h>

void g07dbf_ (const Integer *isigma, const Integer *n, const double x[], const Integer *ipsi, const double *c, const double *h1, const double *h2, const double *h3, const double *dchi, double *theta, double *sigma, const Integer *maxit, const double *tol, double rs[], Integer *nit, double wrk[], Integer *ifail)

The routine may be called by the names g07dbf or nagf_univar_robust_1var_mestim.

3 Description

The data consists of a sample of size $n$, denoted by $x_{1}, x_{2}, \dots, x_{n}$, drawn from a random variable $X$.

The $x_{i}$ are assumed to be independent with an unknown distribution function of the form $F ((x_{i} - θ) / σ)$, where $θ$ is a location parameter and $σ$ is a scale parameter.

$M$-estimators of $θ$ and $σ$ are given by the solution to the following system of equations:

$\sum_{i = 1}^{n} ψ ((x_{i} - \hat{θ}) / \hat{σ}) = 0$ (1)

$\sum_{i = 1}^{n} χ ((x_{i} - \hat{θ}) / \hat{σ}) = (n - 1) β$ (2)

where $ψ$ and $χ$ are given functions, and $β$ is a constant such that $\hat{σ}$ is an unbiased estimator when $x_{i}$, for $i = 1, 2, \dots, n$, has a Normal distribution. Optionally, the second equation can be omitted and the first equation solved for $\hat{θ}$ using an assigned value of $σ = σ_{c}$.

The values of $ψ (\frac{x_{i} - \hat{θ}}{\hat{σ}}) \hat{σ}$ are known as the Winsorized residuals.

The following functions are available for $ψ$ and $χ$ in g07dbf.

(a) Null Weights

$ψ (t) = t$
$χ (t) = \frac{t^{2}}{2}$

Use of these null functions leads to the mean and standard deviation of the data.

(b) Huber's Function

$ψ (t) = \max (- c, \min (c, t))$
$χ (t) = \frac{{|t|}^{2}}{2}$ for $|t| \leq d$
$χ (t) = \frac{d^{2}}{2}$ for $|t| > d$

(c) Hampel's Piecewise Linear Function

$ψ_{h_{1}, h_{2}, h_{3}} (t) = - ψ_{h_{1}, h_{2}, h_{3}} (- t)$
$ψ_{h_{1}, h_{2}, h_{3}} (t) = t$ for $0 \leq t \leq h_{1}$
$ψ_{h_{1}, h_{2}, h_{3}} (t) = h_{1}$ for $h_{1} \leq t \leq h_{2}$
$ψ_{h_{1}, h_{2}, h_{3}} (t) = h_{1} (h_{3} - t) / (h_{3} - h_{2})$ for $h_{2} \leq t \leq h_{3}$
$ψ_{h_{1}, h_{2}, h_{3}} (t) = 0$ for $t > h_{3}$
$χ (t) = \frac{{|t|}^{2}}{2}$ for $|t| \leq d$
$χ (t) = \frac{d^{2}}{2}$ for $|t| > d$

(d) Andrew's Sine Wave Function

$ψ (t) = \sin t$ for $- π \leq t \leq π$
$ψ (t) = 0$ otherwise
$χ (t) = \frac{{|t|}^{2}}{2}$ for $|t| \leq d$
$χ (t) = \frac{d^{2}}{2}$ for $|t| > d$

(e) Tukey's Bi-weight

$ψ (t) = t {(1 - t^{2})}^{2}$ for $|t| \leq 1$
$ψ (t) = 0$ otherwise
$χ (t) = \frac{{|t|}^{2}}{2}$ for $|t| \leq d$
$χ (t) = \frac{d^{2}}{2}$ for $|t| > d$

where $c$, $h_{1}$, $h_{2}$, $h_{3}$ and $d$ are constants.
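As a minimal sketch (not NAG code), the $ψ$ and $χ$ definitions above can be written out in Python. The function names `psi` and `chi` and their keyword arguments are illustrative assumptions; the integer codes follow the ipsi values listed in Section 5.

```python
import math


def psi(t, ipsi, c=None, h1=None, h2=None, h3=None):
    """Evaluate psi(t) for ipsi = 0 (null), 1 (Huber), 2 (Hampel),
    3 (Andrew's sine wave) or 4 (Tukey's bi-weight)."""
    if ipsi == 0:
        return t
    if ipsi == 1:
        return max(-c, min(c, t))
    if ipsi == 2:
        # Hampel's function is odd, so evaluate on |t| and restore the sign.
        a = abs(t)
        if a <= h1:
            v = a
        elif a <= h2:
            v = h1
        elif a <= h3:
            v = h1 * (h3 - a) / (h3 - h2)
        else:
            v = 0.0
        return math.copysign(v, t)
    if ipsi == 3:
        return math.sin(t) if abs(t) <= math.pi else 0.0
    if ipsi == 4:
        return t * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0
    raise ValueError("ipsi must be 0, 1, 2, 3 or 4")


def chi(t, ipsi, d=None):
    """Evaluate chi(t): t^2/2 for null weights, otherwise t^2/2 capped at d^2/2."""
    if ipsi == 0:
        return 0.5 * t * t
    return 0.5 * min(abs(t), d) ** 2
```

For example, `psi(3.75, 2, h1=1.5, h2=3.0, h3=4.5)` falls on the descending segment of Hampel's function, while any argument beyond `h3` is mapped to zero.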

Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:

${\hat{σ}}_{k} = \sqrt{\frac{1}{β (n - 1)} (\sum_{i = 1}^{n} χ (\frac{x_{i} - {\hat{θ}}_{k - 1}}{{\hat{σ}}_{k - 1}})) {\hat{σ}}_{k - 1}^{2}}$

and

${\hat{θ}}_{k} = {\hat{θ}}_{k - 1} + \frac{1}{n} \sum_{i = 1}^{n} ψ (\frac{x_{i} - {\hat{θ}}_{k - 1}}{{\hat{σ}}_{k}}) {\hat{σ}}_{k}$

If $σ$ is fixed, the first update is replaced by

${\hat{σ}}_{k} = σ_{c}$

The initial values for $\hat{θ}$ and $\hat{σ}$ may either be user-supplied or calculated within g07dbf as the sample median and an estimate of $σ$ based on the median absolute deviation, respectively.
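The iteration can be sketched in Python as follows. This is an illustration of the two update formulas and the convergence test described for tol in Section 5, not the ROBETH or NAG implementation; the function name `m_estimate`, its argument list and the exceptions raised are assumptions made for the example, and `psi`, `chi` and `beta` must be supplied by the caller.

```python
import math


def m_estimate(x, psi, chi, beta, theta0, sigma0,
               tol=1e-6, maxit=50, fix_sigma=False):
    """Iterate the scale and location updates until both increments are
    smaller than tol * max(1.0, previous sigma). Returns (theta, sigma, nit)."""
    n = len(x)
    theta, sigma = theta0, sigma0
    for k in range(1, maxit + 1):
        if fix_sigma:
            sigma_new = sigma  # sigma_k = sigma_c, held fixed
        else:
            s = sum(chi((xi - theta) / sigma) for xi in x)
            sigma_new = math.sqrt(s * sigma * sigma / (beta * (n - 1)))
            if sigma_new <= 0.0:
                raise ArithmeticError("current estimate of sigma is zero")
        step = sum(psi((xi - theta) / sigma_new) for xi in x) * sigma_new / n
        theta_new = theta + step
        limit = tol * max(1.0, sigma)
        done = abs(theta_new - theta) < limit and abs(sigma_new - sigma) < limit
        theta, sigma = theta_new, sigma_new
        if done:
            return theta, sigma, k
    raise RuntimeError("number of iterations required exceeds maxit")
```

With the null weights $ψ (t) = t$ and $χ (t) = t^{2} / 2$, taking `beta = 0.5` reduces the scale update to the usual sample variance with divisor $n - 1$, so the sketch returns the sample mean and standard deviation; that value of `beta` is derived here for the null weights only and is not taken from the routine.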

g07dbf is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

4 References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley

Huber P J (1981) Robust Statistics Wiley

Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

5 Arguments

1: $isigma$ – Integer Input

On entry: the value assigned to isigma determines whether $\hat{σ}$ is to be simultaneously estimated.

$isigma = 0$: The estimation of $\hat{σ}$ is bypassed and sigma is set equal to $σ_{c}$ .
$isigma = 1$: $\hat{σ}$ is estimated simultaneously.

2: $n$ – Integer Input

On entry: $n$, the number of observations.

Constraint: $n > 1$.

3: $x (n)$ – Real (Kind=nag_wp) array Input

On entry: the vector of observations, $x_{1}, x_{2}, \dots, x_{n}$.

4: $ipsi$ – Integer Input

On entry: which $ψ$ function is to be used.

$ipsi = 0$: $ψ (t) = t$.
$ipsi = 1$: Huber's function.
$ipsi = 2$: Hampel's piecewise linear function.
$ipsi = 3$: Andrew's sine wave.
$ipsi = 4$: Tukey's bi-weight.

5: $c$ – Real (Kind=nag_wp) Input

On entry: if $ipsi = 1$, c must specify the parameter, $c$, of Huber's $ψ$ function. c is not referenced if $ipsi \neq 1$.

Constraint: if $ipsi = 1$, $c > 0.0$.

6: $h1$ – Real (Kind=nag_wp) Input

7: $h2$ – Real (Kind=nag_wp) Input

8: $h3$ – Real (Kind=nag_wp) Input

On entry: if $ipsi = 2$, h1, h2 and h3 must specify the parameters, $h_{1}$, $h_{2}$ and $h_{3}$, of Hampel's piecewise linear $ψ$ function. h1, h2 and h3 are not referenced if $ipsi \neq 2$.

Constraint: if $ipsi = 2$, $0 \leq h1 \leq h2 \leq h3$ and $h3 > 0.0$.

9: $dchi$ – Real (Kind=nag_wp) Input

On entry: $d$, the parameter of the $χ$ function. dchi is not referenced if $ipsi = 0$.

Constraint: if $ipsi \neq 0$, $dchi > 0.0$.

10: $theta$ – Real (Kind=nag_wp) Input/Output

On entry: if $sigma > 0$, then theta must be set to the required starting value for the estimate of the location parameter $\hat{θ}$. A reasonable initial value for $\hat{θ}$ will often be the sample mean or median.

On exit: the $M$-estimate of the location parameter, $\hat{θ}$.

11: $sigma$ – Real (Kind=nag_wp) Input/Output

On entry: the role of sigma depends on the value assigned to isigma, as follows:

if $isigma = 1$ , sigma must be assigned a value which determines the values of the starting points for the calculations of $\hat{θ}$ and $\hat{σ}$ . If $sigma \leq 0.0$ then g07dbf will determine the starting points of $\hat{θ}$ and $\hat{σ}$ . Otherwise the value assigned to sigma will be taken as the starting point for $\hat{σ}$ , and theta must be assigned a value before entry, see above;
if $isigma = 0$ , sigma must be assigned a value which determines the value of $σ_{c}$ , which is held fixed during the iterations, and the starting value for the calculation of $\hat{θ}$ . If $sigma \leq 0$ , g07dbf will determine the value of $σ_{c}$ as the median absolute deviation adjusted to reduce bias (see g07daf) and the starting point for $\hat{θ}$ . Otherwise, the value assigned to sigma will be taken as the value of $σ_{c}$ and theta must be assigned a relevant value before entry, see above.

On exit: contains the $M$-estimate of the scale parameter, $\hat{σ}$, if isigma was assigned the value $1$ on entry; otherwise sigma will contain the initial fixed value $σ_{c}$.
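When $sigma \leq 0$ on entry, the starting values are derived from the sample median and the median absolute deviation. A rough Python sketch of such a calculation is given below; the function name `starting_values` is hypothetical, and the bias adjustment shown (division by the upper quartile of the standard Normal distribution, about 0.6745) is an assumed, commonly used choice rather than a statement of what g07dbf or g07daf does internally.

```python
import statistics


def starting_values(x):
    """Return (theta0, sigma0): the sample median and a MAD-based scale estimate."""
    theta0 = statistics.median(x)
    mad = statistics.median([abs(xi - theta0) for xi in x])
    # Assumed adjustment: scale the MAD so that it estimates sigma for Normal data.
    sigma0 = mad / statistics.NormalDist().inv_cdf(0.75)
    return theta0, sigma0
```

If all observations are equal, `sigma0` is zero and no iteration can start from it, which corresponds to the situation reported by the routine as $ifail = 3$.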

12: $maxit$ – Integer Input

On entry: the maximum number of iterations that should be used during the estimation.

Suggested value: $maxit = 50$.

Constraint: $maxit > 0$.

13: $tol$ – Real (Kind=nag_wp) Input

On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta and sigma are less than $tol \times \max (1.0, σ_{k - 1})$.

Constraint: $tol > 0.0$.

14: $rs (n)$ – Real (Kind=nag_wp) array Output

On exit: the Winsorized residuals.

15: $nit$ – Integer Output

On exit: the number of iterations that were used during the estimation.

16: $wrk (n)$ – Real (Kind=nag_wp) array Output

On exit: if $sigma \leq 0.0$ on entry, wrk will contain the $n$ observations in ascending order.

17: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $- 1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $- 1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $- 1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $- 1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $- 1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $ipsi = 〈value〉$ .
Constraint: $ipsi = 0$ , $1$ , $2$ , $3$ or $4$ .

On entry, $isigma = 〈value〉$ .
Constraint: $isigma = 0$ or $1$ .

On entry, $maxit = 〈value〉$ .
Constraint: $maxit > 0$ .

On entry, $n = 〈value〉$ .
Constraint: $n > 1$ .

On entry, $tol = 〈value〉$ .
Constraint: $tol > 0.0$ .

$ifail = 2$: On entry, $c = 〈value〉$ .
Constraint: $c > 0.0$ .

On entry, $dchi = 〈value〉$ .
Constraint: $dchi > 0.0$ .

On entry, $h1 = 〈value〉$ , $h2 = 〈value〉$ and $h3 = 〈value〉$ .
Constraint: $0 \leq h1 \leq h2 \leq h3$ and $h3 > 0.0$ .

$ifail = 3$: All elements of x are equal.

$ifail = 4$: Current estimate of sigma is zero or negative: $sigma = 〈value〉$ . This error exit is very unlikely, although it may be caused by too large an initial value of sigma.

$ifail = 5$: Number of iterations required exceeds maxit: $maxit = 〈value〉$ .

$ifail = 6$: All winsorized residuals are zero. This may occur when using the $isigma = 0$ option with a redescending $ψ$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of $σ$ is too small, the standardized residuals $\frac{x_{i} - {\hat{θ}}_{k}}{σ_{c}}$ will be large and all the residuals may fall into the region for which $ψ (t) = 0$. This may incorrectly terminate the iterations, thus making theta and sigma invalid.
Re-enter the routine with a larger value of $σ_{c}$ or with $isigma = 1$ .

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
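The input checks behind $ifail = 1$ and $ifail = 2$ can be mirrored before calling the routine. The following Python sketch encodes only the constraints stated in Section 5; the function name `check_arguments` and the convention of returning the matching ifail code are assumptions for illustration, and the order in which the routine itself tests the arguments is not documented here.

```python
def check_arguments(isigma, n, ipsi, c, h1, h2, h3, dchi, maxit, tol):
    """Return 0 if the documented constraints hold, otherwise 1 or 2
    to match the ifail value listed for the violated constraint."""
    if (ipsi not in (0, 1, 2, 3, 4) or isigma not in (0, 1)
            or maxit <= 0 or n <= 1 or tol <= 0.0):
        return 1
    if ipsi == 1 and c <= 0.0:
        return 2
    if ipsi == 2 and not (0.0 <= h1 <= h2 <= h3 and h3 > 0.0):
        return 2
    if ipsi != 0 and dchi <= 0.0:
        return 2
    return 0
```

Arguments that are not referenced for the chosen ipsi (for example c when $ipsi \neq 1$) are deliberately left unchecked, as in the argument descriptions.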

7 Accuracy

On successful exit the accuracy of the results is related to the value of tol, see Section 5.

8 Parallelism and Performance

g07dbf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g07dbf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

When you supply the initial values, care has to be taken over the choice of the initial value of $σ$. If too small a value of $σ$ is chosen then initial values of the standardized residuals $\frac{x_{i} - {\hat{θ}}_{k}}{σ}$ will be large. If the redescending $ψ$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of Hampel et al. (1986).
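The effect can be illustrated with a small Python sketch using Tukey's bi-weight from Section 3. The data, the starting value and the two scale values are hypothetical, and `tukey_psi` and `winsorized_residuals` are names chosen for this example only.

```python
def tukey_psi(t):
    """Tukey's bi-weight: t (1 - t^2)^2 for |t| <= 1, zero otherwise."""
    return t * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def winsorized_residuals(x, theta, sigma):
    """Winsorized residuals psi((x_i - theta) / sigma) * sigma."""
    return [tukey_psi((xi - theta) / sigma) * sigma for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical observations
theta0 = 3.0                    # hypothetical starting value

too_small = winsorized_residuals(x, theta0, 0.1)    # sigma far too small
reasonable = winsorized_residuals(x, theta0, 10.0)  # sigma comparable to the spread

print(too_small)
print(reasonable)
```

With the small scale every standardized residual is either zero or outside $[- 1, 1]$, so all Winsorized residuals vanish and the location update leaves the starting value unchanged, which is the false solution described above. With the larger scale the residuals are non-zero and the iteration can move away from the starting value.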

10 Example

The following program reads in a set of data consisting of eleven observations of a variable $X$.

For this example, Hampel's Piecewise Linear Function is used ($ipsi = 2$), with values for $h_{1}$, $h_{2}$ and $h_{3}$, along with $d$ for the $χ$ function, being read from the data file.

Using the following starting values, various estimates of $θ$ and $σ$ are calculated and printed along with the number of iterations used:

(a) g07dbf determines the starting values; $σ$ is estimated simultaneously.
(b) You supply the starting values; $σ$ is estimated simultaneously.
(c) g07dbf determines the starting values; $σ$ is fixed.
(d) You supply the starting values; $σ$ is fixed.


NAG FL Interface g07dbf (robust_1var_mestim)
