

NAG Toolbox: nag_univar_robust_1var_mestim (g07db)

Contents

1 Purpose

2 Syntax

3 Description

4 References

5 Parameters

5.1 Compulsory Input Parameters

5.2 Optional Input Parameters

5.3 Output Parameters

6 Error Indicators and Warnings

7 Accuracy

8 Further Comments

9 Example

Purpose

nag_univar_robust_1var_mestim (g07db) computes an $M$-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

Syntax

[theta, sigma, rs, nit, wrk, ifail] = g07db(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, 'n', n, 'maxit', maxit)

[theta, sigma, rs, nit, wrk, ifail] = nag_univar_robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, 'n', n, 'maxit', maxit)

Description

The data consists of a sample of size $n$, denoted by $x_{1}, x_{2}, \dots, x_{n}$, drawn from a random variable $X$.

The $x_{i}$ are assumed to be independent with an unknown distribution function of the form $F ((x_{i} - θ) / σ)$, where $θ$ is a location argument, and $σ$ is a scale argument.

$M$-estimators of $θ$ and $σ$ are given by the solution to the following system of equations:

$\sum_{i = 1}^{n} ψ ((x_{i} - \hat{θ}) / \hat{σ}) = 0$	(1)

$\sum_{i = 1}^{n} χ ((x_{i} - \hat{θ}) / \hat{σ}) = (n - 1) β$	(2)

where $ψ$ and $χ$ are given functions, and $β$ is a constant, such that $\hat{σ}$ is an unbiased estimator when $x_{i}$, for $i = 1, 2, \dots, n$, has a Normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\hat{θ}$ using an assigned value of $σ = σ_{c}$.

The values of $ψ (\frac{x_{i} - \hat{θ}}{\hat{σ}}) \hat{σ}$ are known as the Winsorized residuals.

The following functions are available for $ψ$ and $χ$ in nag_univar_robust_1var_mestim (g07db).

(a) Null Weights

$ψ (t) = t$	$χ (t) = \frac{t^{2}}{2}$

Use of these null functions leads to the mean and standard deviation of the data.

(b) Huber's Function

$ψ (t) = \max (- c, \min (c, t))$	$χ (t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$

(c) Hampel's Piecewise Linear Function

$ψ_{h_{1}, h_{2}, h_{3}} (t) = - ψ_{h_{1}, h_{2}, h_{3}} (- t)$

$ψ_{h_{1}, h_{2}, h_{3}} (t) = \begin{cases} t, & 0 \leq t \leq h_{1} \\ h_{1}, & h_{1} \leq t \leq h_{2} \\ h_{1} (h_{3} - t) / (h_{3} - h_{2}), & h_{2} \leq t \leq h_{3} \\ 0, & t > h_{3} \end{cases}$	$χ (t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$

(d) Andrews' Sine Wave Function

$ψ (t) = \begin{cases} \sin t, & - π \leq t \leq π \\ 0, & \text{otherwise} \end{cases}$	$χ (t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$

(e) Tukey's Bi-weight

$ψ (t) = \begin{cases} t {(1 - t^{2})}^{2}, & |t| \leq 1 \\ 0, & \text{otherwise} \end{cases}$	$χ (t) = \begin{cases} \frac{|t|^{2}}{2}, & |t| \leq d \\ \frac{d^{2}}{2}, & |t| > d \end{cases}$

where $c$, $h_{1}$, $h_{2}$, $h_{3}$ and $d$ are constants.
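The definitions of $ψ$ and $χ$ translate directly into vectorised function handles. The following is a minimal sketch in plain MATLAB, not the toolbox implementation: the handle names (psiHuber, psiHampel, psiAndrews, psiTukey, chi) are illustrative, the Hampel constants and $d$ are those used in the Example section, and $c = 1.5$ is an arbitrary choice.

```matlab
% Illustrative constants (c is an arbitrary choice; the rest match the Example)
c = 1.5;  h1 = 1.5;  h2 = 3.0;  h3 = 4.5;  d = 1.5;

% (b) Huber: clip t to the interval [-c, c]
psiHuber = @(t) max(-c, min(c, t));

% (c) Hampel: defined piecewise for t >= 0 and extended by odd symmetry
psiHampel = @(t) sign(t) .* ( ...
    abs(t) .* (abs(t) <= h1) + ...
    h1 .* (abs(t) > h1 & abs(t) <= h2) + ...
    h1 * (h3 - abs(t)) / (h3 - h2) .* (abs(t) > h2 & abs(t) <= h3));

% (d) Andrews: sin(t) on [-pi, pi], zero elsewhere
psiAndrews = @(t) sin(t) .* (abs(t) <= pi);

% (e) Tukey: t(1 - t^2)^2 on [-1, 1], zero elsewhere
psiTukey = @(t) t .* (1 - t.^2).^2 .* (abs(t) <= 1);

% chi shared by (b) to (e): t^2/2 for |t| <= d, d^2/2 beyond
chi = @(t) min(abs(t), d).^2 / 2;

% One row per function, one column per evaluation point
t = [-5 -3.75 -1 0 0.5 2 3.75 5];
disp([t; psiHuber(t); psiHampel(t); psiAndrews(t); psiTukey(t); chi(t)]);
```

The three redescending functions return zero for sufficiently large $|t|$, whereas Huber's function levels off at $\pm c$.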

Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:

${\hat{σ}}_{k} = \sqrt{\frac{1}{β (n - 1)} (\sum_{i = 1}^{n} χ (\frac{x_{i} - {\hat{θ}}_{k - 1}}{{\hat{σ}}_{k - 1}})) {\hat{σ}}_{k - 1}^{2}}$

and

${\hat{θ}}_{k} = {\hat{θ}}_{k - 1} + \frac{1}{n} \sum_{i = 1}^{n} ψ (\frac{x_{i} - {\hat{θ}}_{k - 1}}{{\hat{σ}}_{k}}) {\hat{σ}}_{k}$

or

${\hat{σ}}_{k} = σ_{c}$, if $σ$ is fixed.

The initial values for $\hat{θ}$ and $\hat{σ}$ may either be user-supplied or calculated within nag_univar_robust_1var_mestim (g07db) as the sample median and an estimate of $σ$ based on the median absolute deviation, respectively.
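To make the iteration concrete, the sketch below applies the two update formulae with the null weights, for which the result should be the sample mean and standard deviation. It is an illustration in plain MATLAB rather than the algorithm as implemented in g07db. The value $β = 1/2$ is an assumption (it is the value for which equation (2) with $χ (t) = t^{2} / 2$ reproduces the usual $n - 1$ standard deviation), the starting values are user-supplied ones taken from case (b) of the Example, and the stopping rule follows the description of tol; all variable names are illustrative.

```matlab
x = [13; 11; 16; 5; 3; 18; 9; 8; 6; 27; 7];   % data from the Example section
n = numel(x);

psiFun = @(t) t;          % null weights: psi(t) = t
chiFun = @(t) t.^2 / 2;   % null weights: chi(t) = t^2/2
betaC  = 0.5;             % assumed beta for the null weights

theta = 2;                % user-supplied starting value for location
sigma = 7;                % user-supplied starting value for scale
tol   = 1e-10;
maxit = 50;

for nit = 1:maxit
    % Scale update uses the previous theta and sigma
    sigmaNew = sqrt(sum(chiFun((x - theta) / sigma)) * sigma^2 / (betaC * (n - 1)));
    % Location update uses the previous theta and the new sigma
    thetaNew = theta + sum(psiFun((x - theta) / sigmaNew)) * sigmaNew / n;
    % Converged when both increments are below tol * max(1, previous sigma)
    done = abs(thetaNew - theta) < tol * max(1, sigma) && ...
           abs(sigmaNew - sigma) < tol * max(1, sigma);
    theta = thetaNew;
    sigma = sigmaNew;
    if done
        break
    end
end

fprintf('theta = %.4f (mean(x) = %.4f)\n', theta, mean(x));
fprintf('sigma = %.4f (std(x)  = %.4f)\n', sigma, std(x));
```

Substituting another $ψ$ and $χ$ pair requires the matching value of $β$, which this sketch does not compute.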

nag_univar_robust_1var_mestim (g07db) is based upon function LYHALG within the ROBETH library, see Marazzi (1987).

References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions. Wiley

Huber P J (1981) Robust Statistics. Wiley

Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH. Cah. Rech. Doc. IUMSP, No. 3 ROB 1. Institut Universitaire de Médecine Sociale et Préventive, Lausanne

Parameters

Compulsory Input Parameters

1: $isigma$ – int64 scalar

The value assigned to isigma determines whether $\hat{σ}$ is to be simultaneously estimated.

$isigma = 0$: The estimation of $\hat{σ}$ is bypassed and sigma is set equal to $σ_{c}$.
$isigma = 1$: $\hat{σ}$ is estimated simultaneously.

2: $x (n)$ – double array

The vector of observations, $x_{1}, x_{2}, \dots, x_{n}$.

3: $ipsi$ – int64 scalar

Which $ψ$ function is to be used.

$ipsi = 0$: $ψ (t) = t$.
$ipsi = 1$: Huber's function.
$ipsi = 2$: Hampel's piecewise linear function.
$ipsi = 3$: Andrews' sine wave.
$ipsi = 4$: Tukey's bi-weight.

4: $c$ – double scalar

If $ipsi = 1$, c must specify the argument, $c$, of Huber's $ψ$ function. c is not referenced if $ipsi \neq 1$.

Constraint: if $ipsi = 1$, $c > 0.0$.

5: $h1$ – double scalar

6: $h2$ – double scalar

7: $h3$ – double scalar

If $ipsi = 2$, h1, h2 and h3 must specify the arguments, $h_{1}$, $h_{2}$, and $h_{3}$, of Hampel's piecewise linear $ψ$ function. h1, h2 and h3 are not referenced if $ipsi \neq 2$.

Constraint: if $ipsi = 2$, $0 \leq h1 \leq h2 \leq h3$ and $h3 > 0.0$.

8: $dchi$ – double scalar

$d$, the argument of the $χ$ function. dchi is not referenced if $ipsi = 0$.

Constraint: if $ipsi \neq 0$, $dchi > 0.0$.

9: $theta$ – double scalar

If $sigma > 0$ then theta must be set to the required starting value of the estimation of the location argument $\hat{θ}$. A reasonable initial value for $\hat{θ}$ will often be the sample mean or median.

10: $sigma$ – double scalar

The role of sigma depends on the value assigned to isigma, as follows:

if $isigma = 1$, sigma must be assigned a value which determines the values of the starting points for the calculations of $\hat{θ}$ and $\hat{σ}$. If $sigma \leq 0.0$ then nag_univar_robust_1var_mestim (g07db) will determine the starting points of $\hat{θ}$ and $\hat{σ}$. Otherwise the value assigned to sigma will be taken as the starting point for $\hat{σ}$, and theta must be assigned a value before entry, see above;
if $isigma = 0$, sigma must be assigned a value which determines the value of $σ_{c}$, which is held fixed during the iterations, and the starting value for the calculation of $\hat{θ}$. If $sigma \leq 0$, then nag_univar_robust_1var_mestim (g07db) will determine the value of $σ_{c}$ as the median absolute deviation adjusted to reduce bias (see nag_univar_robust_1var_median (g07da)) and the starting point for $\hat{θ}$. Otherwise, the value assigned to sigma will be taken as the value of $σ_{c}$ and theta must be assigned a relevant value before entry, see above.
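The choice of starting values when $sigma \leq 0$ can be sketched as follows. This assumes that the bias adjustment divides the median absolute deviation by $Φ^{-1} (0.75) \approx 0.6745$, which is not stated on this page but is consistent with the value 5.9304 returned for $σ_{c}$ in case (c) of the Example. The variable names (sigmaIn, thetaIn, theta0, sigma0) are illustrative.

```matlab
x = [13; 11; 16; 5; 3; 18; 9; 8; 6; 27; 7];   % data from the Example section

sigmaIn = -1;   % sigma on entry, as in cases (a) and (c) of the Example
thetaIn = 0;    % theta on entry; ignored when sigmaIn <= 0

if sigmaIn <= 0
    % Starting values determined from the data
    theta0 = median(x);
    mad    = median(abs(x - theta0));
    sigma0 = mad / (sqrt(2) * erfinv(0.5));   % assumed adjustment: MAD / Phi^{-1}(0.75)
else
    % Starting values supplied by the caller
    theta0 = thetaIn;
    sigma0 = sigmaIn;
end

fprintf('theta0 = %.4f, sigma0 = %.4f\n', theta0, sigma0);
```

With $isigma = 0$ the value sigma0 would play the role of the fixed $σ_{c}$; with $isigma = 1$ it would only be the starting point for $\hat{σ}$.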

11: $tol$ – double scalar

The relative precision for the final estimates. Convergence is assumed when the increments for theta and sigma are less than $tol \times \max (1.0, σ_{k - 1})$.

Constraint: $tol > 0.0$.

Optional Input Parameters

1: $n$ – int64 scalar: Default: the dimension of the array x.
$n$, the number of observations.

Constraint: $n > 1$.
2: $maxit$ – int64 scalar: Default: $50$
The maximum number of iterations that should be used during the estimation.

Constraint: $maxit > 0$.

Output Parameters

1: $theta$ – double scalar: The $M$-estimate of the location argument, $\hat{θ}$.
2: $sigma$ – double scalar: Contains the $M$-estimate of the scale argument, $\hat{σ}$, if isigma was assigned the value $1$ on entry, otherwise sigma will contain the initial fixed value $σ_{c}$.
3: $rs (n)$ – double array: The Winsorized residuals.
4: $nit$ – int64 scalar: The number of iterations that were used during the estimation.
5: $wrk (n)$ – double array: If $sigma \leq 0.0$ on entry, wrk will contain the $n$ observations in ascending order.
6: $ifail$ – int64 scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

$ifail = 1$

On entry,	$n \leq 1$,
or	$maxit \leq 0$,
or	$tol \leq 0.0$,
or	$isigma \neq 0$ or $1$,
or	$ipsi < 0$,
or	$ipsi > 4$.

$ifail = 2$

On entry,	$c \leq 0.0$ and $ipsi = 1$,
or	$h1 < 0.0$ and $ipsi = 2$,
or	$h1 = h2 = h3 = 0.0$ and $ipsi = 2$,
or	$h1 > h2$ and $ipsi = 2$,
or	$h1 > h3$ and $ipsi = 2$,
or	$h2 > h3$ and $ipsi = 2$,
or	$dchi \leq 0.0$ and $ipsi \neq 0$.

$ifail = 3$: On entry, all elements of the input array x are equal.

$ifail = 4$: sigma, the current estimate of $σ$, is zero or negative. This error exit is very unlikely, although it may be caused by too large an initial value of sigma.

$ifail = 5$: The number of iterations required exceeds maxit.

$ifail = 6$: On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the $isigma = 0$ option with a redescending $ψ$ function, i.e., Hampel's piecewise linear function, Andrews' sine wave, or Tukey's bi-weight.

If the given value of $σ$ is too small, then the standardized residuals $\frac{x_{i} - {\hat{θ}}_{k}}{σ_{c}}$ will be large and all the residuals may fall into the region for which $ψ (t) = 0$. This may incorrectly terminate the iterations, making theta and sigma invalid.

Re-enter the function with a larger value of $σ_{c}$ or with $isigma = 1$.
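The effect is easy to reproduce outside the toolbox. The sketch below evaluates the Winsorized residuals $ψ ((x_{i} - θ) / σ_{c}) σ_{c}$ for Hampel's function on the data of the Example, at a hypothetical location estimate of 10.5, once with a very small $σ_{c}$ and once with $σ_{c} = 7$. The handle psiHampel and the chosen values of theta and sigmaC are illustrative assumptions, not part of g07db.

```matlab
x  = [13; 11; 16; 5; 3; 18; 9; 8; 6; 27; 7];   % data from the Example section
h1 = 1.5;  h2 = 3.0;  h3 = 4.5;                % Hampel constants from the Example

% Hampel's piecewise linear psi, extended to negative t by odd symmetry
psiHampel = @(t) sign(t) .* ( ...
    abs(t) .* (abs(t) <= h1) + ...
    h1 .* (abs(t) > h1 & abs(t) <= h2) + ...
    h1 * (h3 - abs(t)) / (h3 - h2) .* (abs(t) > h2 & abs(t) <= h3));

theta  = 10.5;        % hypothetical current location estimate
sigmaC = [0.1 7.0];   % a too-small and a reasonable fixed scale
nZero  = zeros(size(sigmaC));

for j = 1:numel(sigmaC)
    rs = psiHampel((x - theta) / sigmaC(j)) * sigmaC(j);   % Winsorized residuals
    nZero(j) = sum(rs == 0);
    fprintf('sigma_c = %4.1f: %d of %d Winsorized residuals are zero\n', ...
            sigmaC(j), nZero(j), numel(x));
end
```

With $σ_{c} = 0.1$ every standardized residual exceeds $h_{3}$ in magnitude, so all eleven Winsorized residuals are zero; with $σ_{c} = 7$ none are.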

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.

Accuracy

On successful exit the accuracy of the results is related to the value of tol, see Parameters.

Further Comments

When you supply the initial values, care has to be taken over the choice of the initial value of $σ$. If too small a value of $σ$ is chosen then initial values of the standardized residuals $\frac{x_{i} - {\hat{θ}}_{k}}{σ}$ will be large. If the redescending $ψ$ functions are used, i.e., Hampel's piecewise linear function, Andrews' sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of Hampel et al. (1986).

Example

The following program uses a set of data consisting of eleven observations of a variable $X$. For this example, Hampel's piecewise linear function is used ($ipsi = 2$), with values for $h_{1}$, $h_{2}$ and $h_{3}$, along with $d$ for the $χ$ function, set directly in the program.

Using the following starting values, various estimates of $θ$ and $σ$ are calculated and printed (the number of iterations used is returned in nit but is not printed by this program):

(a)	nag_univar_robust_1var_mestim (g07db) determines the starting values; $σ$ is estimated simultaneously.
(b)	You supply the starting values; $σ$ is estimated simultaneously.
(c)	nag_univar_robust_1var_mestim (g07db) determines the starting values; $σ$ is fixed.
(d)	You supply the starting values; $σ$ is fixed.


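function g07db_example
% Robust M-estimation of location (and optionally scale) for eleven observations.

fprintf('g07db example results\n\n');

% Observations
x = [13; 11; 16;  5;  3; 18;  9;  8;  6; 27;  7];

% Hampel's piecewise linear psi function (ipsi = 2); c is not referenced
ipsi = int64(2);
c  = 0;
h1 = 1.5;
h2 = 3;
h3 = 4.5;
dchi = 1.5;     % d, the argument of the chi function
tol = 0.0001;   % relative precision for the final estimates

% Loop over input values for isigma, sigma and theta (cases (a) to (d))
isigma = int64([ 1  1  0  0]);
sigma  =         [-1  7 -1  7];
theta  =         [ 0  2  0  2];

fprintf('           Input parameters     Output parameters\n');
fprintf(' isigma   sigma   theta   tol    sigma  theta\n');
for j = 1:numel(theta)

  % Echo the inputs for this case
  fprintf('%3d   %8.4f%8.4f%8.4f', isigma(j), sigma(j), theta(j), tol);

  % n and maxit take their default values
  [thetaOut, sigmaOut, rs, nit, wrk, ifail] = ...
  g07db( ...
         isigma(j), x, ipsi, c, h1, h2, h3, dchi, theta(j), sigma(j), tol);

  % Print the resulting estimates of scale and location
  fprintf(' %8.4f%8.4f\n', sigmaOut, thetaOut);
end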

g07db example results

           Input parameters     Output parameters
 isigma   sigma   theta   tol    sigma  theta
  1    -1.0000  0.0000  0.0001   6.3247 10.5487
  1     7.0000  2.0000  0.0001   6.3249 10.5487
  0    -1.0000  0.0000  0.0001   5.9304 10.4896
  0     7.0000  2.0000  0.0001   7.0000 10.6500
