The routine may be called by the names g07dcf or nagf_univar_robust_1var_mestim_wgt.
3Description
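The data consists of a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$.
The $x_i$ are assumed to be independent with an unknown distribution function of the form
$$F\left(\frac{x_i - \theta}{\sigma}\right),$$
where $\theta$ is a location parameter and $\sigma$ is a scale parameter. $M$-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations:
$$\sum_{i=1}^{n} \psi\left(\frac{x_i - \hat\theta}{\hat\sigma}\right) = 0$$
$$\sum_{i=1}^{n} \chi\left(\frac{x_i - \hat\theta}{\hat\sigma}\right) = (n-1)\beta$$
where $\psi$ and $\chi$ are user-supplied weight functions, and $\beta$ is a constant. Optionally the second equation can be omitted and the first equation is solved for $\hat\theta$ using an assigned value of $\sigma = \sigma_c$.
The constant $\beta$ should be chosen so that $\hat\sigma$ is an unbiased estimator when $x_i$, for $i = 1, 2, \ldots, n$, has a Normal distribution. To achieve this the value of $\beta$ is calculated as:
$$\beta = E(\chi) = \int_{-\infty}^{\infty} \chi(z) \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{z^2}{2}\right\} dz$$
The values of $\psi\left(\frac{x_i - \hat\theta}{\hat\sigma}\right)\hat\sigma$ are known as the Winsorized residuals.
The equations are solved by a simple iterative procedure, suggested by Huber:
$$\hat\sigma_k = \sqrt{\frac{1}{\beta(n-1)} \left(\sum_{i=1}^{n} \chi\left(\frac{x_i - \hat\theta_{k-1}}{\hat\sigma_{k-1}}\right)\right) \hat\sigma_{k-1}^2}$$
and
$$\hat\theta_k = \hat\theta_{k-1} + \frac{1}{n} \sum_{i=1}^{n} \psi\left(\frac{x_i - \hat\theta_{k-1}}{\hat\sigma_k}\right) \hat\sigma_k$$
or
$$\hat\sigma_k = \sigma_c$$
if $\sigma$ is fixed.
The initial values for $\hat\theta$ and $\hat\sigma$ may be user-supplied or calculated within g07dcf as the sample median and an estimate of $\sigma$ based on the median absolute deviation respectively.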
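As an illustration of how the constant might be obtained for a particular chi function, the following Python sketch approximates the expectation of chi under a standard Normal distribution by Simpson's rule and compares it with a closed form. The function names, the choice of Huber's chi and the tuning constant d = 1.5 are assumptions made for this sketch; they are not part of g07dcf.

```python
import math

def huber_chi(t, d=1.5):
    """Huber's chi: t^2/2 for |t| < d, d^2/2 otherwise (d is an assumed tuning constant)."""
    return 0.5 * min(t * t, d * d)

def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def beta_by_quadrature(chi, lo=-10.0, hi=10.0, panels=20000):
    """Composite Simpson approximation of E[chi(Z)], Z ~ N(0, 1), truncated to [lo, hi]."""
    h = (hi - lo) / panels  # panels must be even for Simpson's rule
    total = chi(lo) * normal_pdf(lo) + chi(hi) * normal_pdf(hi)
    for j in range(1, panels):
        z = lo + j * h
        total += (4 if j % 2 else 2) * chi(z) * normal_pdf(z)
    return total * h / 3.0

def huber_beta_closed_form(d=1.5):
    """Closed form of E[min(Z^2, d^2)/2] for a standard Normal Z."""
    big_phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return 0.5 * ((2.0 * big_phi - 1.0) - 2.0 * d * normal_pdf(d)
                  + 2.0 * d * d * (1.0 - big_phi))

beta_numeric = beta_by_quadrature(huber_chi)
beta_exact = huber_beta_closed_form()
print(beta_numeric, beta_exact)
```

For a chi function without a convenient closed form, only the quadrature step would be needed; the integration limits and panel count here are arbitrary choices.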
g07dcf is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).
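The iteration described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LYHALG or g07dcf implementation: the function `m_estimate`, its argument names, the Huber psi and chi functions, the data and the starting values are all hypothetical, and the stopping rule simply compares both increments with tol times max(1, previous scale).

```python
import math

def huber_psi(t, c=1.5):
    """Huber's psi: t clipped to [-c, c] (c is an assumed tuning constant)."""
    return max(-c, min(c, t))

def huber_chi(t, d=1.5):
    """Huber's chi: t^2/2 for |t| < d, d^2/2 otherwise."""
    return 0.5 * min(t * t, d * d)

def m_estimate(x, psi, chi, beta, theta, sigma,
               estimate_sigma=True, maxit=50, tol=1e-6):
    """Huber-type iteration for location (and optionally scale)."""
    n = len(x)
    for nit in range(1, maxit + 1):
        sigma_old = sigma
        if estimate_sigma:
            # Scale step: uses the previous location and previous scale.
            s = sum(chi((xi - theta) / sigma_old) for xi in x)
            sigma = math.sqrt(s / (beta * (n - 1))) * sigma_old
            if sigma <= 0.0:
                raise ArithmeticError("scale estimate collapsed to zero")
        # Location step: mean of the Winsorized residuals at the new scale.
        step = sum(psi((xi - theta) / sigma) * sigma for xi in x) / n
        theta += step
        bound = tol * max(1.0, sigma_old)
        if abs(step) < bound and abs(sigma - sigma_old) < bound:
            return theta, sigma, nit
    raise RuntimeError("no convergence within maxit iterations")

# Illustrative data with one gross outlier (not the data from Section 10).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
beta = 0.3892326  # E[chi(Z)] for Huber's chi with d = 1.5

theta_hat, sigma_hat, nit = m_estimate(x, huber_psi, huber_chi, beta, 5.5, 3.7)
theta_fixed, sigma_fixed, nit_fixed = m_estimate(
    x, huber_psi, huber_chi, beta, 5.5, 3.7, estimate_sigma=False)
print(theta_hat, sigma_hat, nit)
print(theta_fixed, sigma_fixed, nit_fixed)
```

The second call holds the scale at its assigned value, which corresponds to omitting the second estimating equation.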
4References
Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions. Wiley
Huber P J (1981) Robust Statistics. Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH. Cah. Rech. Doc. IUMSP, No. 3 ROB 1. Institut Universitaire de Médecine Sociale et Préventive, Lausanne
5Arguments
1: chi – real (Kind=nag_wp) Function, supplied by the user. External Procedure
chi must return the value of the weight function $\chi$ for a given value of its argument. The value of $\chi$ must be non-negative.
On entry: the argument for which chi must be evaluated.
chi must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which g07dcf is called. Arguments denoted as Input must not be changed by this procedure.
Note: chi should not return floating-point NaN (Not a Number) or infinity values, since these are not handled by g07dcf. If your code inadvertently does return any NaNs or infinities, g07dcf is likely to produce unexpected results.
2: psi – real (Kind=nag_wp) Function, supplied by the user. External Procedure
psi must return the value of the weight function $\psi$ for a given value of its argument.
On entry: the argument for which psi must be evaluated.
psi must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which g07dcf is called. Arguments denoted as Input must not be changed by this procedure.
Note: psi should not return floating-point NaN (Not a Number) or infinity values, since these are not handled by g07dcf. If your code inadvertently does return any NaNs or infinities, g07dcf is likely to produce unexpected results.
3: isigma – Integer Input
On entry: the value assigned to isigma determines whether $\hat\sigma$ is to be simultaneously estimated.
isigma = 0: the estimation of $\hat\sigma$ is bypassed and sigma is set equal to $\sigma_c$.
isigma = 1: $\hat\sigma$ is estimated simultaneously.
4: n – Integer Input
On entry: $n$, the number of observations.
Constraint: n > 1.
5: x(n) – Real (Kind=nag_wp) array Input
On entry: the vector of observations, $x_1, x_2, \ldots, x_n$.
6: beta – Real (Kind=nag_wp) Input
On entry: the value of the constant $\beta$ of the chosen chi function.
Constraint: beta > 0.0.
7: theta – Real (Kind=nag_wp) Input/Output
On entry: if sigma > 0.0, theta must be set to the required starting value of the estimate of the location parameter $\hat\theta$. A reasonable initial value for $\hat\theta$ will often be the sample mean or median.
On exit: the $M$-estimate of the location parameter $\hat\theta$.
8: sigma – Real (Kind=nag_wp) Input/Output
On entry: the role of sigma depends on the value assigned to isigma as follows.
If isigma = 1, sigma must be assigned a value which determines the values of the starting points for the calculation of $\hat\theta$ and $\hat\sigma$. If sigma ≤ 0.0, g07dcf will determine the starting points of $\hat\theta$ and $\hat\sigma$. Otherwise, the value assigned to sigma will be taken as the starting point for $\hat\sigma$, and theta must be assigned a relevant value before entry, see above.
If isigma = 0, sigma must be assigned a value which determines the value of $\sigma_c$, which is held fixed during the iterations, and the starting value for the calculation of $\hat\theta$. If sigma ≤ 0.0, g07dcf will determine the value of $\sigma_c$ as the median absolute deviation adjusted to reduce bias (see g07daf) and the starting point for $\hat\theta$. Otherwise, the value assigned to sigma will be taken as the value of $\sigma_c$ and theta must be assigned a relevant value before entry, see above.
On exit: the $M$-estimate of the scale parameter $\hat\sigma$, if isigma was assigned the value 1 on entry, otherwise sigma will contain the initial fixed value $\sigma_c$.
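For orientation, starting values of the kind described above (sample median for location, bias-adjusted median absolute deviation for scale) can be computed as in the Python sketch below. The function name is hypothetical, and the adjustment constant 0.6745 (approximately the 0.75 quantile of the standard Normal distribution) is the common consistency factor; this section does not state the exact adjustment used by g07daf, so treat it as an assumption.

```python
from statistics import median

def starting_values(x, consistency=0.6745):
    """Return (median, MAD / consistency) as rough location and scale starting values.

    The divisor is an assumed Normal-consistency constant, not taken from g07daf.
    """
    theta0 = median(x)
    mad = median([abs(xi - theta0) for xi in x])
    return theta0, mad / consistency

# Illustrative data with one gross outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
theta0, sigma0 = starting_values(x)
print(theta0, sigma0)
```

Both quantities are insensitive to the single outlier, which is why they are reasonable defaults when no better starting point is known.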
9: maxit – Integer Input
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: maxit = 50.
Constraint: maxit > 0.
10: tol – Real (Kind=nag_wp) Input
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta and sigma are less than $\texttt{tol} \times \max(1.0, \sigma_{k-1})$.
Constraint: tol > 0.0.
11: rs(n) – Real (Kind=nag_wp) array Output
On exit: the Winsorized residuals.
12: nit – Integer Output
On exit: the number of iterations that were used during the estimation.
13: wrk(n) – Real (Kind=nag_wp) array Output
On exit: if sigma ≤ 0.0 on entry, wrk will contain the $n$ observations in ascending order.
14: ifail – Integer Input/Output
On entry: ifail must be set to 0, -1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of -1 means that an error message is printed while a value of 1 means that it is not.
If halting is not appropriate, the value -1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value 0 is recommended. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
6Error Indicators and Warnings
If on entry ifail = 0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
All Winsorized residuals are zero. This may occur when using the isigma = 0 option with a redescending $\psi$ function, i.e., Hampel's piecewise linear function, Andrews' sine wave, and Tukey's biweight.
If the given value of $\sigma$ is too small, the standardized residuals $\frac{x_i - \hat\theta_k}{\sigma_c}$ will be large and all the residuals may fall into the region for which $\psi(t) = 0$. This may incorrectly terminate the iterations, thus making theta and sigma invalid.
Re-enter the routine with a larger value of $\sigma_c$ or with isigma = 1.
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7Accuracy
On successful exit the accuracy of the results is related to the value of tol, see Section 5.
8Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g07dcf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g07dcf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9Further Comments
When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value is chosen then the initial values of the standardized residuals $\frac{x_i - \hat\theta_k}{\sigma}$ will be large. If a redescending $\psi$ function is used, i.e., if $\psi(t) = 0$ for $|t| > \tau$, for some positive constant $\tau$, then these large values are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of Hampel et al. (1986).
10Example
The following program reads in a set of data consisting of eleven observations of a variable $X$.
The psi and chi functions used are Hampel's piecewise linear function and Huber's chi function respectively.
Using the following starting values, various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
(a) g07dcf determines the starting values, $\sigma$ is estimated simultaneously.
(b) You must supply the starting values, $\sigma$ is estimated simultaneously.
(c) g07dcf determines the starting values, $\sigma$ is fixed.
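For reference, the two weight functions named above are commonly written as follows. The breakpoints $h_1 \le h_2 \le h_3$ and the constant $d$ are tuning constants whose values are not given in this section, so the symbols below are placeholders rather than the values used by the example program.

```latex
\psi(t) = -\psi(-t), \qquad
\psi(t) =
\begin{cases}
  t, & 0 \le t \le h_1, \\
  h_1, & h_1 \le t \le h_2, \\
  h_1 (h_3 - t)/(h_3 - h_2), & h_2 \le t \le h_3, \\
  0, & t > h_3,
\end{cases}
\qquad
\chi(t) =
\begin{cases}
  t^2/2, & |t| < d, \\
  d^2/2, & |t| \ge d.
\end{cases}
```

Because $\psi(t) = 0$ for $t > h_3$, Hampel's function is redescending, so the caution about small starting values of $\sigma$ applies to it.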