NAG CL Interface
g07dbc (robust_1var_mestim)
1
Purpose
g07dbc computes an M-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.
2
Specification
void g07dbc (Nag_SigmaSimulEst sigma_est,
Integer n,
const double x[],
Nag_PsiFun psifun,
double c,
double h1,
double h2,
double h3,
double dchi,
double *theta,
double *sigma,
Integer maxit,
double tol,
double rs[],
Integer *nit,
double sorted_x[],
NagError *fail)
The function may be called by the names: g07dbc, nag_univar_robust_1var_mestim or nag_robust_m_estim_1var.
3
Description
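The data consists of a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$.
The $x_i$ are assumed to be independent with an unknown distribution function of the form
$$F\left(\frac{x_i - \theta}{\sigma}\right)$$
where $\theta$ is a location argument, and $\sigma$ is a scale argument.
$M$-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations:
$$\sum_{i=1}^{n} \psi\left(\frac{x_i - \hat{\theta}}{\hat{\sigma}}\right) = 0 \qquad (1)$$
$$\sum_{i=1}^{n} \chi\left(\frac{x_i - \hat{\theta}}{\hat{\sigma}}\right) = (n-1)\beta \qquad (2)$$
where $\psi$ and $\chi$ are given functions, and $\beta$ is a constant, such that $\hat{\sigma}$ is an unbiased estimator when $x_i$, for $i = 1, 2, \ldots, n$, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\hat{\theta}$ using an assigned value of $\sigma = \sigma_c$.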
The values of $\psi\left(\frac{x_i - \hat{\theta}}{\hat{\sigma}}\right)\hat{\sigma}$ are known as the Winsorized residuals.
The following functions are available for $\psi$ and $\chi$ in g07dbc:
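-
(a) Null Weights
$$\psi(t) = t, \qquad \chi(t) = \frac{t^2}{2}$$
Use of these null functions leads to the mean and standard deviation of the data.
-
(b) Huber's Function
$$\psi(t) = \max(-c, \min(c, t))$$
$$\chi(t) = \begin{cases} \frac{|t|^2}{2}, & |t| \le d \\ \frac{d^2}{2}, & |t| > d \end{cases}$$
-
(c) Hampel's Piecewise Linear Function
$$\psi_{h_1,h_2,h_3}(t) = -\psi_{h_1,h_2,h_3}(-t)$$
$$\psi_{h_1,h_2,h_3}(t) = \begin{cases} t, & 0 \le t \le h_1 \\ h_1, & h_1 \le t \le h_2 \\ h_1 \frac{h_3 - t}{h_3 - h_2}, & h_2 \le t \le h_3 \\ 0, & h_3 < t \end{cases}$$
$$\chi(t) = \begin{cases} \frac{|t|^2}{2}, & |t| \le d \\ \frac{d^2}{2}, & |t| > d \end{cases}$$
-
(d) Andrew's Sine Wave Function
$$\psi(t) = \begin{cases} \sin t, & -\pi \le t \le \pi \\ 0, & \text{otherwise} \end{cases}$$
$$\chi(t) = \begin{cases} \frac{|t|^2}{2}, & |t| \le d \\ \frac{d^2}{2}, & |t| > d \end{cases}$$
-
(e) Tukey's Bi-weight
$$\psi(t) = \begin{cases} t(1 - t^2)^2, & |t| \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$\chi(t) = \begin{cases} \frac{|t|^2}{2}, & |t| \le d \\ \frac{d^2}{2}, & |t| > d \end{cases}$$
where $c$, $h_1$, $h_2$, $h_3$ and $d$ are constants.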
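Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
$$\hat{\sigma}_k = \sqrt{\frac{1}{\beta(n-1)} \sum_{i=1}^{n} \chi\left(\frac{x_i - \hat{\theta}_{k-1}}{\hat{\sigma}_{k-1}}\right) \hat{\sigma}_{k-1}^2}$$
and
$$\hat{\theta}_k = \hat{\theta}_{k-1} + \frac{1}{n} \sum_{i=1}^{n} \psi\left(\frac{x_i - \hat{\theta}_{k-1}}{\hat{\sigma}_k}\right) \hat{\sigma}_k$$
or
$$\hat{\sigma}_k = \sigma_c, \quad \text{if } \sigma \text{ is fixed.}$$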
The initial values for $\hat{\theta}$ and $\hat{\sigma}$ may either be user-supplied or calculated within g07dbc as the sample median and an estimate of $\sigma$ based on the median absolute deviation respectively.
g07dbc is based upon subroutine LYHALG within the ROBETH library, see
Marazzi (1987).
4
References
Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne
5
Arguments
-
1:
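sigma_est – Nag_SigmaSimulEst
Input
-
On entry: the value assigned to sigma_est determines whether $\hat{\sigma}$ is to be simultaneously estimated.
- sigma_est = Nag_SigmaBypas: the estimation of $\hat{\sigma}$ is bypassed and sigma is set equal to $\sigma_c$;
- sigma_est = Nag_SigmaSimul: $\hat{\sigma}$ is estimated simultaneously.
Constraint:
sigma_est = Nag_SigmaBypas or Nag_SigmaSimul.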
-
2:
n – Integer
Input
-
On entry: $n$, the number of observations.
Constraint:
n > 1.
-
3:
x[n] – const double
Input
-
On entry: the vector of observations, $x_1, x_2, \ldots, x_n$.
-
4:
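psifun – Nag_PsiFun
Input
-
On entry: which $\psi$ function is to be used.
- psifun = Nag_Lsq: $\psi(t) = t$.
- psifun = Nag_HuberFun: Huber's function.
- psifun = Nag_HampelFun: Hampel's piecewise linear function.
- psifun = Nag_AndrewFun: Andrew's sine wave.
- psifun = Nag_TukeyFun: Tukey's bi-weight.
Constraint:
psifun = Nag_Lsq, Nag_HuberFun, Nag_HampelFun, Nag_AndrewFun or Nag_TukeyFun.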
-
5:
c – double
Input
-
On entry: if psifun = Nag_HuberFun, c must specify the argument, $c$, of Huber's $\psi$ function.
c is not referenced if psifun ≠ Nag_HuberFun.
Constraint:
if psifun = Nag_HuberFun, c > 0.0.
-
6:
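h1 – double
Input
-
7:
h2 – double
Input
-
8:
h3 – double
Input
-
On entry: h1, h2 and h3 must specify the arguments, $h_1$, $h_2$ and $h_3$, of Hampel's piecewise linear $\psi$ function, if psifun = Nag_HampelFun.
h1, h2 and h3 are not referenced if psifun ≠ Nag_HampelFun.
Constraint:
if psifun = Nag_HampelFun, $0 \le h_1 \le h_2 \le h_3$ and $h_3 > 0.0$.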
-
9:
dchi – double
Input
-
On entry: the argument, $d$, of the $\chi$ function.
dchi is not referenced if psifun = Nag_Lsq.
Constraint:
if psifun ≠ Nag_Lsq, dchi > 0.0.
-
10:
theta – double *
Input/Output
-
On entry: if sigma > 0.0 then theta must be set to the required starting value of the estimate of the location argument $\hat{\theta}$. A reasonable initial value for $\hat{\theta}$ will often be the sample mean or median.
On exit: the M-estimate of the location argument, $\hat{\theta}$.
-
11:
sigma – double *
Input/Output
-
The role of sigma depends on the value assigned to sigma_est (see above) as follows.
If sigma_est = Nag_SigmaSimul:
On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of $\hat{\theta}$ and $\hat{\sigma}$. If sigma ≤ 0.0 then g07dbc will determine the starting points of $\hat{\theta}$ and $\hat{\sigma}$. Otherwise the value assigned to sigma will be taken as the starting point for $\hat{\sigma}$, and theta must be assigned a value before entry, see above.
If sigma_est = Nag_SigmaBypas:
On entry: sigma must be assigned a value which determines the value of $\sigma_c$, which is held fixed during the iterations, and the starting value for the calculation of $\hat{\theta}$. If sigma ≤ 0.0, then g07dbc will determine the value of $\sigma_c$ as the median absolute deviation adjusted to reduce bias (see g07dac) and the starting point for $\hat{\theta}$. Otherwise, the value assigned to sigma will be taken as the value of $\sigma_c$ and theta must be assigned a relevant value before entry, see above.
On exit: sigma contains the M-estimate of the scale argument, $\hat{\sigma}$, if sigma_est = Nag_SigmaSimul on entry, otherwise sigma will contain the initial fixed value $\sigma_c$.
-
12:
maxit – Integer
Input
-
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value:
maxit = 50.
Constraint:
maxit > 0.
-
13:
tol – double
Input
-
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta and sigma are less than tol × max(1.0, $\hat{\sigma}_{k-1}$).
Constraint:
tol > 0.0.
-
14:
rs[n] – double
Output
-
On exit: the Winsorized residuals.
-
15:
nit – Integer *
Output
-
On exit: the number of iterations that were used during the estimation.
-
16:
sorted_x[n] – double
Output
-
On exit: if sigma ≤ 0.0 on entry, sorted_x will contain the $n$ observations in ascending order.
-
17:
fail – NagError *
Input/Output
-
The NAG error argument (see
Section 7 in the Introduction to the NAG Library CL Interface).
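The argument constraints can be gathered into a single pre-call check. The sketch below is hypothetical. check_args is not a library function, the integer codes 0 to 4 stand in for the five Nag_PsiFun options (null weights, Huber, Hampel, Andrew, Tukey), and the constraint values are the ones assumed in the comments. g07dbc performs its own checks and reports any failure through fail.

```c
#include <stddef.h>

/* Hypothetical pre-call check; returns NULL if the arguments satisfy
   the assumed constraints, otherwise a short message.
   kind: 0 = null weights, 1 = Huber, 2 = Hampel, 3 = Andrew, 4 = Tukey. */
const char *check_args(int n, const double *x, int kind, double c,
                       double h1, double h2, double h3, double dchi,
                       int maxit, double tol)
{
    if (n <= 1) return "n must be > 1";
    int all_equal = 1;
    for (int i = 1; i < n; i++)
        if (x[i] != x[0]) { all_equal = 0; break; }
    if (all_equal) return "the values in x must not all be equal";
    if (kind == 1 && c <= 0.0) return "c must be > 0 for Huber's function";
    if (kind == 2 && !(0.0 <= h1 && h1 <= h2 && h2 <= h3 && h3 > 0.0))
        return "need 0 <= h1 <= h2 <= h3 and h3 > 0 for Hampel's function";
    if (kind != 0 && dchi <= 0.0) return "dchi must be > 0";
    if (maxit <= 0) return "maxit must be > 0";
    if (tol <= 0.0) return "tol must be > 0";
    return NULL;
}
```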
6
Error Indicators and Warnings
- NE_2_REAL_ENUM_ARG_CONS
-
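On entry, h1 = ⟨value⟩, h2 = ⟨value⟩ and psifun = ⟨value⟩. These arguments must satisfy h1 ≤ h2 if psifun = Nag_HampelFun.
On entry, h1 = ⟨value⟩, h3 = ⟨value⟩ and psifun = ⟨value⟩. These arguments must satisfy h1 ≤ h3 if psifun = Nag_HampelFun.
On entry, h2 = ⟨value⟩, h3 = ⟨value⟩ and psifun = ⟨value⟩. These arguments must satisfy h2 ≤ h3 if psifun = Nag_HampelFun.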
- NE_3_REAL_ENUM_ARG_CONS
-
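On entry, h1 = ⟨value⟩, h2 = ⟨value⟩, h3 = ⟨value⟩ and psifun = ⟨value⟩. These arguments must satisfy $0 \le h_1 \le h_2 \le h_3$ and $h_3 > 0.0$ if psifun = Nag_HampelFun.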
- NE_ALL_ELEMENTS_EQUAL
-
On entry, the values in the array x must not all be equal.
- NE_BAD_PARAM
-
On entry, argument
psifun had an illegal value.
On entry, argument
sigma_est had an illegal value.
- NE_ESTIM_SIGMA_ZERO
-
The estimated value of sigma was 0.0 during an iteration.
- NE_INT_ARG_LE
-
On entry, maxit = ⟨value⟩.
Constraint: maxit > 0.
On entry, n = ⟨value⟩.
Constraint: n > 1.
- NE_INTERNAL_ERROR
-
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
- NE_REAL_ARG_LE
-
On entry, tol must not be less than or equal to 0.0: tol = ⟨value⟩.
- NE_REAL_ENUM_ARG_CONS
-
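On entry, c = ⟨value⟩, psifun = ⟨value⟩. These arguments must satisfy c > 0.0 if psifun = Nag_HuberFun.
On entry, dchi = ⟨value⟩, psifun = ⟨value⟩. These arguments must satisfy dchi > 0.0 if psifun ≠ Nag_Lsq.
On entry, h1 = ⟨value⟩, psifun = ⟨value⟩. These arguments must satisfy h1 ≥ 0.0 if psifun = Nag_HampelFun.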
- NE_TOO_MANY
-
Too many iterations (⟨value⟩).
- NE_WINS_RES_ZERO
-
The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the sigma_est = Nag_SigmaBypas option with a redescending $\psi$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave or Tukey's bi-weight.
If the given value of $\sigma$ is too small, then the standardized residuals $\frac{x_i - \hat{\theta}_k}{\sigma_c}$ will be large and all the residuals may fall into the region for which $\psi(t) = 0$. This may incorrectly terminate the iterations, thus making theta and sigma invalid.
Re-enter the function with a larger value of $\sigma_c$ or with sigma_est = Nag_SigmaSimul.
7
Accuracy
On successful exit the accuracy of the results is related to the value of tol, see Section 5.
8
Parallelism and Performance
g07dbc is not threaded in any implementation.
9
Further Comments
When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value of $\sigma$ is chosen then initial values of the standardized residuals $\frac{x_i - \hat{\theta}_0}{\sigma}$ will be large. If the redescending $\psi$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).
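This failure mode can be checked before the function is called. The hypothetical helper below counts how many standardized residuals $(x_i - \theta)/\sigma$ fall outside $[-1, 1]$, where Tukey's bi-weight $\psi$ is zero. If the count equals $n$, every Winsorized residual is zero and $\theta$ cannot move from its starting value.

```c
#include <math.h>

/* Hypothetical diagnostic: number of observations whose standardized
   residual |(x[i] - theta) / sigma| exceeds 1, i.e. lies where Tukey's
   bi-weight psi is zero and the Winsorized residual vanishes. */
int tukey_zero_region_count(const double *x, int n, double theta, double sigma)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (fabs((x[i] - theta) / sigma) > 1.0) count++;
    return count;
}
```

For x = {1, 2, 3, 4, 10} and $\theta = 3.5$, the helper flags all five observations when $\sigma = 0.1$, two when $\sigma = 2$, and none when $\sigma = 10$.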
10
Example
The following program reads in a set of data consisting of eleven observations of a variable $X$.
For this example, Hampel's Piecewise Linear Function is used (psifun = Nag_HampelFun), with values for $h_1$, $h_2$ and $h_3$, along with $d$ for the $\chi$ function, being read from the data file.
Using the following starting values, various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
-
(a) g07dbc determines the starting values, $\sigma$ is estimated simultaneously.
-
(b) You supply the starting values, $\sigma$ is estimated simultaneously.
-
(c) g07dbc determines the starting values, $\sigma$ is fixed.
-
(d) You supply the starting values, $\sigma$ is fixed.
10.1
Program Text
10.2
Program Data
10.3
Program Results