nag_univar_robust_1var_ci (g07ea) computes a rank-based (nonparametric) estimate and confidence interval for the location parameter of a single population.

Syntax

[theta, thetal, thetau, estcl, wlower, wupper, ifail] = g07ea(method, x, clevel, 'n', n)

[theta, thetal, thetau, estcl, wlower, wupper, ifail] = nag_univar_robust_1var_ci(method, x, clevel, 'n', n)

Description

Consider a vector of independent observations, $x = {(x_{1}, x_{2}, \dots, x_{n})}^{T}$, with unknown common symmetric density $f (x_{i} - θ)$. nag_univar_robust_1var_ci (g07ea) computes the Hodges–Lehmann location estimator (see Lehmann (1975)) of the centre of symmetry $θ$, together with an associated confidence interval. The Hodges–Lehmann estimate is defined as

\hat{θ} = median \{\frac{x_{i} + x_{j}}{2}, 1 \leq i \leq j \leq n\} .

Let $m = (n (n + 1)) / 2$ and let $a_{k}$, for $k = 1, 2, \dots, m$, denote the $m$ ordered averages $(x_{i} + x_{j}) / 2$ for $1 \leq i \leq j \leq n$. Then

if $m$ is odd, $\hat{θ} = a_{k}$ where $k = (m + 1) / 2$;
if $m$ is even, $\hat{θ} = (a_{k} + a_{k + 1}) / 2$ where $k = m / 2$.
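
As a minimal sketch of this definition, the following MATLAB fragment forms all $m$ pairwise averages by brute force and picks the middle one. The five-point sample and the variable names are illustrative assumptions. It is not how g07ea works internally; it only makes the definition concrete for a small $n$.

```matlab
% Brute-force Hodges-Lehmann estimate (illustrative data, not the NAG example).
x = [1.3; -0.4; 2.1; 0.7; -1.8];
n = numel(x);
m = n*(n + 1)/2;                        % number of averages with i <= j

pairsum = bsxfun(@plus, x, x.');        % pairsum(i,j) = x(i) + x(j)
mask = triu(ones(n)) == 1;              % keep the pairs with i <= j
a = sort(pairsum(mask)/2);              % ordered averages a_1 <= ... <= a_m

if mod(m, 2) == 1
    theta_hat = a((m + 1)/2);               % m odd: the middle average
else
    theta_hat = (a(m/2) + a(m/2 + 1))/2;    % m even: mean of the two middle averages
end

fprintf('m = %d, theta_hat = %.4f\n', m, theta_hat);
```

For this sample $m = 15$ is odd, so the estimate is the eighth ordered average, and the fragment prints `m = 15, theta_hat = 0.4500`.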

This estimator arises from inverting the one-sample Wilcoxon signed-rank test statistic, $W (x - θ_{0})$, for testing the hypothesis that $θ = θ_{0}$. Effectively $W (x - θ_{0})$ is a monotonically decreasing step function of $θ_{0}$ with

\begin{matrix} mean (W) = μ = \frac{n (n + 1)}{4}, \\ var (W) = σ^{2} = \frac{n (n + 1) (2 n + 1)}{24} . \end{matrix}
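
The step-function behaviour can be seen from a standard identity: when there are no ties or zero differences, the signed-rank statistic equals the number of pairwise averages $(x_{i} + x_{j}) / 2$, $i \leq j$, that exceed $θ_{0}$. The sketch below checks this on a small illustrative sample; the data, the value of `theta0` and the variable names are assumptions made for the illustration.

```matlab
% Signed-rank statistic W(x - theta0) computed two ways (illustrative data).
x = [1.3; -0.4; 2.1; 0.7; -1.8];
theta0 = 0.2;
n = numel(x);

% (a) Rank definition: sum the ranks of |x_i - theta0| over positive differences.
d = x - theta0;
[~, order] = sort(abs(d));
ranks = zeros(n, 1);
ranks(order) = (1:n).';                 % valid only when the |d| have no ties
W_ranks = sum(ranks(d > 0));

% (b) Counting form: number of averages (x_i + x_j)/2, i <= j, above theta0.
pairsum = bsxfun(@plus, x, x.');
a = pairsum(triu(ones(n)) == 1)/2;
W_count = sum(a > theta0);

% Mean and variance of W under the null hypothesis for this sample size.
mu = n*(n + 1)/4;
sigma2 = n*(n + 1)*(2*n + 1)/24;

fprintf('W (ranks) = %d, W (count) = %d, mu = %.2f, sigma^2 = %.2f\n', ...
        W_ranks, W_count, mu, sigma2);
```

Both forms give $W = 8$ here, with $μ = 7.5$ and $σ^{2} = 13.75$ for $n = 5$. As `theta0` increases, fewer averages exceed it, so $W$ can only step downwards.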

The estimate $\hat{θ}$ is the solution to the equation $W (x - \hat{θ}) = μ$; two methods are available for solving this equation. These methods avoid the computation of all the ordered averages $a_{k}$, because for large $n$ both the storage requirements and the computation time would be excessive.

The first is an exact method based on a set partitioning procedure on the set of all ordered averages $(x_{i} + x_{j}) / 2$ for $i \leq j$. This is based on the algorithm proposed by Monahan (1984).

The second is an iterative algorithm, based on the Illinois method, which is a modification of the regula falsi method; see McKean and Ryan (1977). This algorithm has proved suitable for the function $W (x - θ_{0})$, which is asymptotically linear as a function of $θ_{0}$.
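
To illustrate the Illinois modification of regula falsi in isolation, the sketch below applies it to a smooth test function rather than to $W$ itself. The test function, bracket, tolerance and variable names are all assumptions chosen for the illustration; this is a generic textbook form of the method, not the implementation of McKean and Ryan (1977) or of g07ea.

```matlab
% Illinois variant of regula falsi on a smooth test function (not on W itself).
f = @(t) t.^3 - 2*t - 5;        % hypothetical test function, root near 2.0946
a = 2;  b = 3;                  % bracket with f(a) < 0 < f(b)
fa = f(a);  fb = f(b);
tol = 1e-10;  maxit = 100;
side = 0;                       % end replaced last: -1 for b, +1 for a

for it = 1:maxit
    c = (a*fb - b*fa)/(fb - fa);        % false-position (secant) point
    fc = f(c);
    if abs(fc) < tol
        break;
    end
    if fc*fb > 0
        % c is on the same side of the root as b: replace b
        b = c;  fb = fc;
        if side == -1
            fa = fa/2;                  % b replaced twice running: halve stale f(a)
        end
        side = -1;
    else
        % c is on the same side of the root as a: replace a
        a = c;  fa = fc;
        if side == 1
            fb = fb/2;                  % a replaced twice running: halve stale f(b)
        end
        side = 1;
    end
end
root = c;

fprintf('root = %.8f after %d iterations\n', root, it);
```

Halving the retained function value is what distinguishes the Illinois method from plain regula falsi: it stops one end of the bracket from staying fixed indefinitely, which is the usual cause of slow convergence.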

The confidence interval limits are also based on the inversion of the Wilcoxon test statistic.

Given a desired percentage for the confidence interval, $1 - α$, expressed as a proportion between $0$ and $1$, initial estimates for the lower and upper confidence limits of the Wilcoxon statistic are found from

W_{l} = μ - 0.5 + (σ Φ^{- 1} (α / 2))

and

W_{u} = μ + 0.5 + (σ Φ^{- 1} (1 - α / 2)),

where $Φ^{- 1}$ is the inverse cumulative Normal distribution function.

$W_{l}$ and $W_{u}$ are rounded to the nearest integer values. These estimates are then refined using an exact method if $n \leq 80$, and a Normal approximation otherwise, to find $W_{l}$ and $W_{u}$ satisfying

\begin{array}{l} P (W \leq W_{l}) \leq α / 2 \\ P (W \leq W_{l} + 1) > α / 2 \end{array}

and

\begin{array}{l} P (W \geq W_{u}) \leq α / 2 \\ P (W \geq W_{u} - 1) > α / 2 . \end{array}
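
The following sketch evaluates the initial estimates for $n = 40$ and $α = 0.05$, then builds the exact null distribution of $W$ by counting subsets of the ranks $1, \dots, n$ and reads off limits that satisfy the two pairs of conditions above. The use of `erfcinv` for $Φ^{- 1}$, the subset-counting recursion and all variable names are assumptions of this illustration; the source does not say how g07ea carries out its exact refinement.

```matlab
% Initial (Normal) estimates and an exact check of the Wilcoxon limits.
n = 40;  alpha = 0.05;
m = n*(n + 1)/2;
mu = n*(n + 1)/4;
sigma = sqrt(n*(n + 1)*(2*n + 1)/24);

% Inverse Normal CDF via base MATLAB: Phi^{-1}(p) = -sqrt(2)*erfcinv(2p).
phiinv = @(p) -sqrt(2)*erfcinv(2*p);

Wl0 = round(mu - 0.5 + sigma*phiinv(alpha/2));       % initial lower estimate
Wu0 = round(mu + 0.5 + sigma*phiinv(1 - alpha/2));   % initial upper estimate

% Exact null distribution: counts(w+1) = number of subsets of {1..n}
% whose ranks sum to w; each subset has probability 2^(-n).
counts = zeros(1, m + 1);
counts(1) = 1;
for r = 1:n
    counts = counts + [zeros(1, r), counts(1:end - r)];
end
cdf = cumsum(counts)/2^n;                 % cdf(w+1) = P(W <= w)

Wl = find(cdf <= alpha/2, 1, 'last') - 1; % largest w with P(W <= w) <= alpha/2
Wu = m - Wl;                              % by symmetry of W about mu

fprintf('initial: Wl0 = %d, Wu0 = %d;  exact: Wl = %d, Wu = %d\n', ...
        Wl0, Wu0, Wl, Wu);
```

With these inputs the rounded initial estimates are 264 and 556, which agree with the Wilcoxon statistics printed in the Example section. The upper limit follows from the lower one because the null distribution of $W$ is symmetric about $μ$, so $P (W \geq m - w) = P (W \leq w)$.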

Let $W_{u} = m - k$; then $θ_{l} = a_{k + 1}$. This is the largest value $θ_{l}$ such that $W (x - θ_{l}) = W_{u}$.

Let $W_{l} = k$; then $θ_{u} = a_{m - k}$. This is the smallest value $θ_{u}$ such that $W (x - θ_{u}) = W_{l}$.

As in the case of $\hat{θ}$, these equations may be solved using either the exact or the iterative methods to find the values $θ_{l}$ and $θ_{u}$.

Then $(θ_{l}, θ_{u})$ is the confidence interval for $θ$. The confidence interval is thus defined by those values of $θ_{0}$ such that the null hypothesis, $θ = θ_{0}$, is not rejected by the Wilcoxon signed-rank test at the $(100 \times α) \%$ level.
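
As a small worked illustration of the index rules $θ_{l} = a_{k + 1}$ and $θ_{u} = a_{m - k}$, the sketch below sorts all pairwise averages of a six-point sample and picks out the two limits. The data, the nominal 90% level and the variable names are assumptions for the illustration; unlike g07ea, it forms every average explicitly, which is only practical for small $n$.

```matlab
% Brute-force confidence limits from the ordered averages (illustrative data).
x = [1.3; -0.4; 2.1; 0.7; -1.8; 0.2];
n = numel(x);
m = n*(n + 1)/2;                              % m = 21

pairsum = bsxfun(@plus, x, x.');
a = sort(pairsum(triu(ones(n)) == 1)/2);      % a_1 <= ... <= a_m

% For n = 6 the exact null distribution gives P(W <= 2) = 3/64 and
% P(W <= 3) = 5/64, so W_l = 2 for a nominal 90% interval (alpha/2 = 0.05).
k  = 2;
Wl = k;                                       % lower Wilcoxon limit
Wu = m - k;                                   % upper Wilcoxon limit

thetal = a(k + 1);                            % theta_l = a_{k+1}
thetau = a(m - k);                            % theta_u = a_{m-k}

fprintf('Wl = %d, Wu = %d, interval = (%.4f, %.4f)\n', Wl, Wu, thetal, thetau);
```

Here the interval runs from the third smallest to the third largest average, giving $(-0.8, 1.4)$ with $W_{l} = 2$ and $W_{u} = 19$.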

References

Lehmann E L (1975) Nonparametrics: Statistical Methods Based on Ranks Holden–Day

Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

McKean J W and Ryan T A (1977) Algorithm 516: An algorithm for obtaining confidence intervals and point estimates based on ranks in the two-sample location problem ACM Trans. Math. Software 3 183–185

Monahan J F (1984) Algorithm 616: Fast computation of the Hodges–Lehmann location estimator ACM Trans. Math. Software 10 265–270

Parameters

Compulsory Input Parameters

1: $method$ – string (length ≥ 1)

Specifies the method to be used.

$method ='E'$: The exact algorithm is used.
$method ='A'$: The iterative algorithm is used.

Constraint: $method ='E'$ or $'A'$.

2: $x (n)$ – double array

The sample observations, $x_{i}$, for $i = 1, 2, \dots, n$.

3: $clevel$ – double scalar

The confidence level desired for the interval.

For example, for a $95 \%$ confidence interval set $clevel = 0.95$.

Constraint: $0.0 < clevel < 1.0$.

Optional Input Parameters

1: $n$ – int64/int32/nag_int scalar: Default: the dimension of the array x.
$n$, the sample size.

Constraint: $n \geq 2$ .

Output Parameters

1: $theta$ – double scalar: The estimate of the location, $\hat{θ}$ .
2: $thetal$ – double scalar: The estimate of the lower limit of the confidence interval, $θ_{l}$ .
3: $thetau$ – double scalar: The estimate of the upper limit of the confidence interval, $θ_{u}$ .
4: $estcl$ – double scalar: An estimate of the actual confidence level of the interval found, as a proportion in the interval $(0.0, 1.0)$ .
5: $wlower$ – double scalar: The upper value of the Wilcoxon test statistic, $W_{u}$ , corresponding to the lower limit of the confidence interval.
6: $wupper$ – double scalar: The lower value of the Wilcoxon test statistic, $W_{l}$ , corresponding to the upper limit of the confidence interval.
7: $ifail$ – int64/int32/nag_int scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

$ifail = 1$

On entry, $method \neq 'E'$ or $'A'$,
or $n < 2$,
or $clevel \leq 0.0$,
or $clevel \geq 1.0$.

$ifail = 2$: There is not enough information to compute a confidence interval since the whole sample consists of identical values.

$ifail = 3$: For at least one of the estimates $\hat{θ}$ , $θ_{l}$ and $θ_{u}$ , the underlying iterative algorithm (when $method ='A'$ ) failed to converge. This is an unlikely exit but the estimate should still be a reasonable approximation.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.

Accuracy

nag_univar_robust_1var_ci (g07ea) should produce results accurate to five significant figures in the width of the confidence interval; that is, the error for any one of the three estimates should be less than $0.00001 \times (thetau - thetal)$.

Further Comments

The time taken increases with the sample size $n$.

Example

The following program calculates a 95% confidence interval for $θ$, the centre of symmetry, from a sample of $40$ observations.


function g07ea_example

fprintf('g07ea example results\n\n');

% Sample observations (n = 40)
x = [-0.23;  0.35; -0.77;  0.35;  0.27; -0.72;  0.08; -0.40; -0.76;  0.45;
      0.73;  0.74;  0.83; -0.87;  0.21;  0.29; -0.91; -0.04;  0.82; -0.38;
     -0.31;  0.24; -0.47; -0.68; -0.77; -0.86; -0.59;  0.73;  0.39; -0.44;
      0.63; -0.22; -0.07; -0.43; -0.21; -0.31;  0.64; -1.00; -0.86; -0.73];

% Select the exact algorithm and request a 95% confidence interval
method = 'Exact';
clevel = 0.95;

% Compute the location estimate and its confidence interval;
% n is omitted, so it defaults to the dimension of x
[theta, thetal, thetau, estcl, wlower, wupper, ifail] = ...
  g07ea(method, x, clevel);

% Print the estimate, the interval and the corresponding Wilcoxon statistics
fprintf(' Location estimator     Confidence Interval\n\n');
fprintf('%13.4f%12s%7.4f,%7.4f )\n\n', theta, '(', thetal, thetau);
fprintf(' Corresponding Wilcoxon statistics\n\n');
fprintf('  Lower : %8.2f\n', wlower);
fprintf('  Upper : %8.2f\n', wupper);

g07ea example results

 Location estimator     Confidence Interval

      -0.1300           (-0.3300, 0.0350 )

 Corresponding Wilcoxon statistics

  Lower :   556.00
  Upper :   264.00
