NAG Library Function Document

nag_rank_ci_1var (g07eac)

void nag_rank_ci_1var (Nag_RCIMethod method, Integer n, const double x[], double clevel, double *theta, double *thetal, double *thetau, double *estcl, double *wlower, double *wupper, NagError *fail)

3 Description

Consider a vector of independent observations, $x = (x_{1}, x_{2}, \dots, x_{n})^{T}$, with unknown common symmetric density $f (x_{i} - θ)$. nag_rank_ci_1var (g07eac) computes the Hodges–Lehmann location estimator (see Lehmann (1975)) of the centre of symmetry $θ$, together with an associated confidence interval. The Hodges–Lehmann estimate is defined as

$$\hat{θ} = \mathrm{median} \left\{ \frac{x_{i} + x_{j}}{2}, \; 1 \leq i \leq j \leq n \right\} .$$

Let $m = n (n + 1) / 2$ and let $a_{k}$, for $k = 1, 2, \dots, m$, denote the $m$ ordered averages $(x_{i} + x_{j}) / 2$ for $1 \leq i \leq j \leq n$. Then

if $m$ is odd, $\hat{θ} = a_{k}$ where $k = (m + 1) / 2$ ;
if $m$ is even, $\hat{θ} = (a_{k} + a_{k + 1}) / 2$ where $k = m / 2$ .
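For small samples the definition above can be applied literally. The following is a minimal brute-force sketch, not the algorithm used by nag_rank_ci_1var (g07eac): it stores all $m$ averages, sorts them and takes the median. The names hl_brute_force and cmp_double are illustrative and are not part of the NAG Library.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort on doubles. */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *)p, b = *(const double *)q;
    return (a > b) - (a < b);
}

/* Brute-force Hodges-Lehmann estimate: the median of all averages
 * (x[i] + x[j]) / 2 with i <= j. Needs O(n^2) storage, so it is only
 * suitable for small n. Illustrative helper, not a NAG function. */
double hl_brute_force(const double *x, size_t n)
{
    assert(x != NULL && n >= 1);
    size_t m = n * (n + 1) / 2;
    double *a = malloc(m * sizeof *a);
    assert(a != NULL);

    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            a[k++] = 0.5 * (x[i] + x[j]);

    qsort(a, m, sizeof *a, cmp_double);

    /* a[] is 0-based, so a_k in the text is a[k - 1]. */
    double est = (m % 2 == 1) ? a[(m + 1) / 2 - 1]
                              : 0.5 * (a[m / 2 - 1] + a[m / 2]);
    free(a);
    return est;
}
```

For the sample $\{1, 2, 4\}$ the six ordered averages are $1, 1.5, 2, 2.5, 3, 4$, so $m$ is even and the estimate is $(2 + 2.5) / 2 = 2.25$.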

This estimator arises from inverting the one-sample Wilcoxon signed-rank test statistic, $W (x - θ_{0})$, for testing the hypothesis that $θ = θ_{0}$. Effectively $W (x - θ_{0})$ is a monotonically decreasing step function of $θ_{0}$ with

$$\mathrm{mean} (W) = μ = \frac{n (n + 1)}{4}, \qquad \mathrm{var} (W) = σ^{2} = \frac{n (n + 1) (2 n + 1)}{24} .$$

The estimate $\hat{θ}$ is the solution to the equation $W (x - \hat{θ}) = μ$; two methods are available for solving this equation. These methods avoid the computation of all the ordered averages $a_{k}$, because for large $n$ both the storage requirements and the computation time would be excessive.
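The step-function behaviour of $W (x - θ_{0})$ can be checked directly on a small sample. The sketch below assumes there are no ties and no zero differences, in which case the signed-rank statistic (the sum of the ranks of the positive differences) equals the number of pairwise averages $(x_{i} + x_{j}) / 2$, $i \leq j$, that exceed $θ_{0}$. The name signed_rank_w is illustrative and is not part of the NAG Library.

```c
#include <assert.h>
#include <stddef.h>

/* W(x - theta0), computed as the number of pairwise averages
 * (x[i] + x[j]) / 2, i <= j, that exceed theta0.
 * O(n^2) time, no extra storage. Assumes no ties or zero differences. */
size_t signed_rank_w(const double *x, size_t n, double theta0)
{
    assert(x != NULL && n >= 1);
    size_t w = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            if (0.5 * (x[i] + x[j]) > theta0)
                w++;
    return w;
}
```

For the sample $\{1, 2, 4\}$ this gives $W = 6$ at $θ_{0} = 0$, $W = 3 = μ$ at $θ_{0} = 2.25$ and $W = 0$ at $θ_{0} = 5$, so the statistic decreases in steps as $θ_{0}$ passes each ordered average.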

The first is an exact method based on a set partitioning procedure on the set of all ordered averages $(x_{i} + x_{j}) / 2$ for $i \leq j$. This is based on the algorithm proposed by Monahan (1984).

The second is an iterative algorithm based on the Illinois method, a modification of the regula falsi method; see McKean and Ryan (1977). This algorithm has proved suitable for the function $W (x - θ_{0})$, which is asymptotically linear as a function of $θ_{0}$.
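To illustrate the Illinois modification of regula falsi in general terms, the sketch below applies it to an arbitrary scalar function supplied through a callback. It is a generic textbook form of the iteration, not the implementation used by nag_rank_ci_1var (g07eac); the names scalar_fn, illinois_root and example_quadratic are illustrative assumptions, and the smooth test function stands in for the step function $W (x - θ_{0}) - μ$.

```c
#include <assert.h>
#include <math.h>

/* Type of the function whose root is sought; ctx carries user data. */
typedef double (*scalar_fn)(double t, void *ctx);

/* Illinois variant of regula falsi on a bracket [a, b] where f(a) and
 * f(b) have opposite signs. Stops when the bracket is narrower than tol,
 * when an exact zero is hit, or after max_iter steps. */
double illinois_root(scalar_fn f, void *ctx, double a, double b,
                     double tol, int max_iter)
{
    double fa = f(a, ctx), fb = f(b, ctx);
    if (fa == 0.0) return a;
    if (fb == 0.0) return b;
    assert(fa * fb < 0.0);   /* the root must be bracketed */

    double c = a;
    int side = 0;   /* end replaced last: -1 means b, +1 means a */
    for (int it = 0; it < max_iter && fabs(b - a) > tol; it++) {
        c = (a * fb - b * fa) / (fb - fa);   /* regula falsi step */
        double fc = f(c, ctx);
        if (fc == 0.0) break;
        if (fc * fb > 0.0) {                 /* c replaces b */
            b = c; fb = fc;
            if (side == -1) fa *= 0.5;       /* a kept twice: halve f(a) */
            side = -1;
        } else {                             /* c replaces a */
            a = c; fa = fc;
            if (side == +1) fb *= 0.5;       /* b kept twice: halve f(b) */
            side = +1;
        }
    }
    return c;
}

/* Hypothetical smooth test function with a root at sqrt(2). */
double example_quadratic(double t, void *ctx)
{
    (void)ctx;
    return t * t - 2.0;
}
```

Halving the function value at an endpoint that has been retained twice in succession is what distinguishes the Illinois method from plain regula falsi, which can otherwise keep one endpoint fixed indefinitely. For a step function an exact zero need not exist, so a stopping rule on the bracket width, as used here, is the relevant one.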

The confidence interval limits are also based on the inversion of the Wilcoxon test statistic.

Given a desired percentage for the confidence interval, $1 - α$, expressed as a proportion between $0$ and $1$, initial estimates for the lower and upper confidence limits of the Wilcoxon statistic are found from

$$W_{l} = μ - 0.5 + σ Φ^{- 1} (α / 2)$$

and

$$W_{u} = μ + 0.5 + σ Φ^{- 1} (1 - α / 2),$$

where $Φ^{- 1}$ is the inverse cumulative Normal distribution function.

$W_{l}$ and $W_{u}$ are rounded to the nearest integer values. These estimates are then refined using an exact method if $n \leq 80$, and a Normal approximation otherwise, to find $W_{l}$ and $W_{u}$ satisfying

$$P (W \leq W_{l}) \leq α / 2, \qquad P (W \leq W_{l} + 1) > α / 2$$

and

$$P (W \geq W_{u}) \leq α / 2, \qquad P (W \geq W_{u} - 1) > α / 2 .$$
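The exact refinement used for $n \leq 80$ is not specified here. One textbook way to obtain the exact null distribution of $W$, shown as a sketch under that assumption, is to note that each rank $1, \dots, n$ contributes to $W$ independently with probability $1 / 2$, and to build the distribution of the sum one rank at a time. The name lower_critical_w is illustrative and is not part of the NAG Library.

```c
#include <assert.h>
#include <stdlib.h>

/* Largest integer W_l with P(W <= W_l) <= alpha / 2 under the null
 * hypothesis, using the exact distribution of the signed-rank statistic.
 * Returns -1 if even P(W <= 0) exceeds alpha / 2. */
int lower_critical_w(int n, double alpha)
{
    assert(n >= 1 && n <= 80 && alpha > 0.0 && alpha < 1.0);
    int m = n * (n + 1) / 2;

    /* prob[s] = P(sum of the positive ranks among ranks 1..r equals s) */
    double *prob = calloc((size_t)m + 1, sizeof *prob);
    assert(prob != NULL);
    prob[0] = 1.0;
    for (int r = 1; r <= n; r++)
        for (int s = m; s >= 0; s--)   /* descending: prob[s - r] is still old */
            prob[s] = 0.5 * prob[s] + (s >= r ? 0.5 * prob[s - r] : 0.0);

    int wl = -1;
    double cdf = 0.0;
    while (wl + 1 <= m && cdf + prob[wl + 1] <= alpha / 2.0) {
        cdf += prob[wl + 1];
        wl++;
    }
    free(prob);
    return wl;
}
```

For $n = 10$ and $α = 0.05$, $P (W \leq 8) = 25 / 1024$ and $P (W \leq 9) = 33 / 1024$, so $W_{l} = 8$. Because the null distribution of $W$ is symmetric about $μ = m / 2$, the corresponding upper value is $W_{u} = m - W_{l} = 47$.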

Let $W_{u} = m - k$; then $θ_{l} = a_{k + 1}$. This is the largest value $θ_{l}$ such that $W (x - θ_{l}) = W_{u}$.

Let $W_{l} = k$; then $θ_{u} = a_{m - k}$. This is the smallest value $θ_{u}$ such that $W (x - θ_{u}) = W_{l}$.
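For a small sample the two limits can be read off the sorted averages directly. The sketch below is a brute-force illustration of the index relations above, not the method used by nag_rank_ci_1var (g07eac); the names walsh_ci_brute_force and cmp_double are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort on doubles. */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *)p, b = *(const double *)q;
    return (a > b) - (a < b);
}

/* Given k = W_l (so that W_u = m - k), return theta_l = a_{k+1} and
 * theta_u = a_{m-k} from the sorted pairwise averages. O(n^2) storage. */
void walsh_ci_brute_force(const double *x, size_t n, size_t k,
                          double *theta_l, double *theta_u)
{
    size_t m = n * (n + 1) / 2;
    assert(x != NULL && theta_l != NULL && theta_u != NULL);
    assert(n >= 1 && 2 * k < m);   /* ensures k + 1 <= m - k */

    double *a = malloc(m * sizeof *a);
    assert(a != NULL);
    size_t t = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            a[t++] = 0.5 * (x[i] + x[j]);
    qsort(a, m, sizeof *a, cmp_double);

    *theta_l = a[k];           /* a_{k+1} in the 1-based notation of the text */
    *theta_u = a[m - k - 1];   /* a_{m-k} in the 1-based notation of the text */
    free(a);
}
```

For the sample $\{1, 2, 3, 4\}$ the ordered averages are $1, 1.5, 2, 2, 2.5, 2.5, 3, 3, 3.5, 4$, so with $k = 1$ the limits are $θ_{l} = a_{2} = 1.5$ and $θ_{u} = a_{9} = 3.5$.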

As in the case of $\hat{θ}$, these equations may be solved using either the exact or the iterative methods to find the values $θ_{l}$ and $θ_{u}$.

Then $(θ_{l}, θ_{u})$ is the confidence interval for $θ$. The confidence interval is thus defined by those values of $θ_{0}$ such that the null hypothesis, $θ = θ_{0}$, is not rejected by the Wilcoxon signed-rank test at the $(100 \times α) \%$ level.

4 References

Lehmann E L (1975) Nonparametrics: Statistical Methods Based on Ranks Holden–Day

Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

McKean J W and Ryan T A (1977) Algorithm 516: An algorithm for obtaining confidence intervals and point estimates based on ranks in the two-sample location problem ACM Trans. Math. Software 3 183–185

Monahan J F (1984) Algorithm 616: Fast computation of the Hodges–Lehmann location estimator ACM Trans. Math. Software 10 265–270

5 Arguments

1: method – Nag_RCIMethod    Input

On entry: specifies the method to be used.

$method = Nag_RCI_Exact$: The exact algorithm is used.
$method = Nag_RCI_Approx$: The iterative algorithm is used.

Constraint: $method = Nag_RCI_Exact$ or $Nag_RCI_Approx$.

2: n – Integer    Input

On entry: $n$, the sample size.

Constraint: $n \geq 2$.

3: x[n] – const double    Input

On entry: the sample observations, $x_{i}$, for $i = 1, 2, \dots, n$.

4: clevel – double    Input

On entry: the confidence level desired for the interval, expressed as a proportion.

For example, for a $95 \%$ confidence interval set $clevel = 0.95$.

Constraint: $0.0 < clevel < 1.0$.

5: theta – double *    Output

On exit: the estimate of the location, $\hat{θ}$.

6: thetal – double *    Output

On exit: the estimate of the lower limit of the confidence interval, $θ_{l}$.

7: thetau – double *    Output

On exit: the estimate of the upper limit of the confidence interval, $θ_{u}$.

8: estcl – double *    Output

On exit: an estimate of the actual percentage confidence of the interval found, as a proportion in the interval $(0.0, 1.0)$.

9: wlower – double *    Output

On exit: the upper value of the Wilcoxon test statistic, $W_{u}$, corresponding to the lower limit of the confidence interval.

10: wupper – double *    Output

On exit: the lower value of the Wilcoxon test statistic, $W_{l}$, corresponding to the upper limit of the confidence interval.

11: fail – NagError *    Input/Output

The NAG error argument (see Section 3.6 in the Essential Introduction).

6 Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
NE_BAD_PARAM: On entry, argument $⟨value⟩$ had an illegal value.
NE_CONVERGENCE: Warning. The iterative procedure to find an estimate of the lower confidence point had not converged in $100$ iterations.
Warning. The iterative procedure to find an estimate of Theta had not converged in $100$ iterations.
Warning. The iterative procedure to find an estimate of the upper confidence point had not converged in $100$ iterations.
NE_INT: On entry, $n = ⟨value⟩$ .
Constraint: $n \geq 2$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL: On entry, clevel is out of range: $clevel = ⟨value⟩$ .
NE_SAMPLE_IDEN: Not enough information to compute an interval estimate since the whole sample is identical. The common value is returned in theta, thetal and thetau.

7 Accuracy

nag_rank_ci_1var (g07eac) should produce results accurate to five significant figures in the width of the confidence interval; that is, the error for any one of the three estimates should be less than $0.00001 \times (thetau - thetal)$.

8 Parallelism and Performance

nag_rank_ci_1var (g07eac) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

Please consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The time taken increases with the sample size $n$.

10 Example

The following program calculates a 95% confidence interval for $θ$, the centre of symmetry of the distribution from which the sample of $50$ observations is drawn.
