g07eb: Univariate Estimation (NAG Toolbox)

nag_univar_robust_2var_ci (g07eb) finds a point estimate, $\hat{θ}$, of the difference in location $θ$ together with an associated confidence interval. The estimates are based on the ordered differences $y_{j} - x_{i}$. The estimate $\hat{θ}$ is defined by

$$\hat{θ} = \operatorname{median} \{y_{j} - x_{i}, \; i = 1, 2, \dots, n; \; j = 1, 2, \dots, m\}.$$

Let $d_{k}$, for $k = 1, 2, \dots, n m$, denote the $n m$ differences $y_{j} - x_{i}$, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$, arranged in ascending order. Then

if $n m$ is odd, $\hat{θ} = d_{k}$ where $k = (n m + 1) / 2$;
if $n m$ is even, $\hat{θ} = (d_{k} + d_{k + 1}) / 2$ where $k = n m / 2$.
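The definition can be checked by brute force on small samples. The following is a minimal Python sketch (standard library only, so it runs without the NAG Toolbox); the function name `location_shift_estimate` and the toy data are illustrative assumptions, not part of the g07eb interface. It forms every difference explicitly, which is only practical when $n m$ is small.

```python
from statistics import median

def location_shift_estimate(x, y):
    """Return the median of all pairwise differences y_j - x_i.

    Brute force: builds and sorts all n*m differences, so it is only
    suitable for small samples.
    """
    diffs = sorted(yj - xi for xi in x for yj in y)
    return median(diffs), diffs

# Toy data (hypothetical, chosen so the differences are easy to list by hand)
x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 8.0]
theta_hat, d = location_shift_estimate(x, y)
print("ordered differences:", d)
print("estimate:", theta_hat)
```

With 9 differences the median is the single middle value of the sorted list; with an even count `statistics.median` averages the two middle values, matching the two cases above.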

This estimator arises from inverting the two-sample Mann–Whitney rank test statistic, $U(θ_{0})$, for testing the hypothesis that $θ = θ_{0}$. Thus $U(θ_{0})$ is the value of the Mann–Whitney $U$ statistic for the two independent samples $\{(x_{i} + θ_{0}), \text{ for } i = 1, 2, \dots, n\}$ and $\{y_{j}, \text{ for } j = 1, 2, \dots, m\}$. Effectively $U(θ_{0})$ is a monotonically increasing step function of $θ_{0}$ with

$$\begin{aligned} \operatorname{mean}(U) &= μ = \frac{n m}{2}, \\ \operatorname{var}(U) &= σ^{2} = \frac{n m (n + m + 1)}{12}. \end{aligned}$$
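The mean and variance can be confirmed numerically for small samples by enumerating every equally likely ordering of the two samples when there are no ties. This is a hedged Python sketch for illustration only; `u_null_values` is a hypothetical helper name, and the enumeration grows combinatorially, so it is not how a library routine would work.

```python
from itertools import combinations
from statistics import fmean, pvariance

def u_null_values(n, m):
    """U for every arrangement of n x-values and m y-values (no ties).

    Each arrangement is identified by the 0-based positions of the x-values
    in the pooled ordering. An x at position p with k x-values before it
    has p - k y-values before it, and U is the sum of those counts.
    """
    values = []
    for x_positions in combinations(range(n + m), n):
        values.append(sum(p - k for k, p in enumerate(x_positions)))
    return values

n, m = 3, 4
u = u_null_values(n, m)
mu = n * m / 2                      # n m / 2
var = n * m * (n + m + 1) / 12      # n m (n + m + 1) / 12
print("enumerated mean:", fmean(u), "formula:", mu)
print("enumerated variance:", pvariance(u), "formula:", var)
```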

The estimate $\hat{θ}$ is the solution to the equation $U(\hat{θ}) = μ$; two methods are available for solving this equation. These methods avoid the computation of all the ordered differences $d_{k}$, because for large $n$ and $m$ both the storage requirements and the computation time would be high.
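To illustrate how a middle ordered difference can be located without storing all $n m$ differences, here is a generic bisection sketch in Python. It is an assumption-laden illustration and not either of the two methods provided by g07eb: all function names are hypothetical, and the result is accurate only to floating-point bisection precision. It relies on the fact that the number of differences not exceeding a trial value $t$ can be counted from the two sorted samples alone.

```python
from bisect import bisect_right

def count_diffs_at_most(xs_sorted, ys_sorted, t):
    """Number of pairs with y_j - x_i <= t, counted as y_j <= x_i + t."""
    return sum(bisect_right(ys_sorted, xi + t) for xi in xs_sorted)

def kth_difference(xs_sorted, ys_sorted, k, iterations=100):
    """Approximate the k-th smallest difference (1-based) by bisection."""
    lo = ys_sorted[0] - xs_sorted[-1]    # smallest possible difference
    hi = ys_sorted[-1] - xs_sorted[0]    # largest possible difference
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if count_diffs_at_most(xs_sorted, ys_sorted, mid) >= k:
            hi = mid                     # at least k differences are <= mid
        else:
            lo = mid
    return hi

def median_difference(x, y):
    """Median of the differences y_j - x_i using O(n + m) storage."""
    xs, ys = sorted(x), sorted(y)
    nm = len(xs) * len(ys)
    if nm % 2 == 1:
        return kth_difference(xs, ys, (nm + 1) // 2)
    lower = kth_difference(xs, ys, nm // 2)
    upper = kth_difference(xs, ys, nm // 2 + 1)
    return (lower + upper) / 2

x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 8.0]
theta_hat = median_difference(x, y)
print("estimate:", theta_hat)
```

Each counting pass costs about $n \log m$ operations, so the trade-off is repeated passes over the sorted samples in exchange for never materialising the full list of differences.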

Given a desired confidence level, $1 - α$, expressed as a proportion between $0.0$ and $1.0$, initial estimates of the upper and lower confidence limits for the Mann–Whitney $U$ statistic are found:

$$\begin{aligned} U_{l} &= μ - 0.5 + (σ \times Φ^{-1}(α / 2)) \\ U_{u} &= μ + 0.5 + (σ \times Φ^{-1}(1 - α / 2)) \end{aligned}$$

where $Φ^{-1}$ is the inverse cumulative Normal distribution function.
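A short Python sketch of these initial estimates follows. It assumes the upper limit uses the $1 - α/2$ quantile, the mirror image of the $α/2$ quantile used for the lower limit, so that the two limits are symmetric about $μ$. The function name `initial_u_limits` is hypothetical; `clevel` mirrors the argument of the same name in the example program.

```python
import math
from statistics import NormalDist

def initial_u_limits(n, m, clevel):
    """Normal-approximation starting values for the limits on U.

    clevel is the confidence level 1 - alpha, strictly between 0 and 1.
    Returns the unrounded pair (u_lower, u_upper).
    """
    alpha = 1.0 - clevel
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    z_low = std_normal.inv_cdf(alpha / 2)          # negative quantile
    z_high = std_normal.inv_cdf(1 - alpha / 2)     # positive quantile (assumed form)
    u_lower = mu - 0.5 + sigma * z_low
    u_upper = mu + 0.5 + sigma * z_high
    return u_lower, u_upper

# Sample sizes matching the example program (50 and 100 observations)
u_lower, u_upper = initial_u_limits(50, 100, 0.95)
print("initial limits:", u_lower, u_upper)
```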

$U_{l}$ and $U_{u}$ are rounded to the nearest integer values. These estimates are refined using an exact method, without taking ties into account, if $n + m \leq 40$ and $\max(n, m) \leq 30$, and a Normal approximation otherwise, to find $U_{l}$ and $U_{u}$ satisfying

$$\begin{array}{l} P(U \leq U_{l}) \leq α / 2 \\ P(U \leq U_{l} + 1) > α / 2 \end{array}$$

and

$$\begin{array}{l} P(U \geq U_{u}) \leq α / 2 \\ P(U \geq U_{u} - 1) > α / 2. \end{array}$$
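For small samples the two pairs of inequalities can be checked against the exact null distribution of $U$ without ties. The sketch below is a Python illustration under that no-ties assumption, using the standard counting recurrence on the largest pooled observation; it is not claimed to be the routine's internal algorithm, and the names `u_null_pmf` and `exact_u_limits` are hypothetical.

```python
from functools import lru_cache
from math import comb

def u_null_pmf(n, m):
    """Exact null probabilities P(U = u), u = 0..n*m, assuming no ties."""
    @lru_cache(maxsize=None)
    def ways(i, j, u):
        # Number of orderings of i x-values and j y-values giving U = u.
        if u < 0 or u > i * j:
            return 0
        if i == 0 or j == 0:
            return 1                      # only u == 0 is possible here
        # Largest value is an x (all j y-values precede it) or a y (adds nothing).
        return ways(i - 1, j, u - j) + ways(i, j - 1, u)

    total = comb(n + m, n)
    return [ways(n, m, u) / total for u in range(n * m + 1)]

def exact_u_limits(n, m, clevel):
    """Largest U_l with P(U <= U_l) <= alpha/2, and U_u by symmetry."""
    alpha = 1.0 - clevel
    pmf = u_null_pmf(n, m)
    u_lower, cumulative = -1, 0.0         # -1 means no value satisfies the bound
    for u, p in enumerate(pmf):
        cumulative += p
        if cumulative > alpha / 2:
            break
        u_lower = u
    u_upper = n * m - u_lower             # null distribution is symmetric about n*m/2
    return u_lower, u_upper

limits = exact_u_limits(5, 5, 0.95)
print("exact limits for n = m = 5:", limits)
```

The symmetry step uses $P(U \geq n m - u) = P(U \leq u)$ under the null hypothesis, so only the lower tail needs to be accumulated.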

The function $U(θ_{0})$ is a monotonically increasing step function. It is the number of times a score in the second sample, $y_{j}$, precedes a score in the first sample, $x_{i} + θ_{0}$, where we only count a half if a score in the second sample actually equals a score in the first.
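The counting rule can be written down directly. The following Python sketch is illustrative only; `mann_whitney_u` and the toy data are assumptions introduced here and are not part of the NAG interface. It evaluates $U(θ_{0})$ on a grid to show the step behaviour and the half counts at ties.

```python
def mann_whitney_u(x, y, theta0):
    """Count pairs in which y_j precedes x_i + theta0; ties count one half."""
    u = 0.0
    for xi in x:
        shifted = xi + theta0
        for yj in y:
            if yj < shifted:
                u += 1.0
            elif yj == shifted:
                u += 0.5
    return u

# Toy data (hypothetical); n*m = 9, so mu = 4.5
x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 8.0]
grid = [-2.0, 0.0, 1.0, 2.5, 3.0, 4.0, 5.0, 8.0]
steps = [mann_whitney_u(x, y, t) for t in grid]
for t, u in zip(grid, steps):
    print(f"U({t:4.1f}) = {u}")
```

On this data the value at $θ_{0} = 3$ equals $n m / 2$, and 3 is also the median of the nine differences $y_{j} - x_{i}$.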

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

Example

function g07eb_example


fprintf('g07eb example results\n\n');

x = [-0.582;  0.157; -0.523; -0.769;  2.338;  1.664; -0.981;  1.549;
      1.131; -0.460; -0.484;  1.932;  0.306; -0.602; -0.979;  0.132;
      0.256; -0.094;  1.065; -1.084; -0.969; -0.524;  0.239;  1.512;
     -0.782; -0.252; -1.163;  1.376;  1.674;  0.831;  1.478; -1.486;
     -0.808; -0.429; -2.002;  0.482; -1.584; -0.105;  0.429;  0.568;
      0.944;  2.558; -1.801;  0.242;  0.763; -0.461; -1.497; -1.353;
      0.301;  1.941];
y = [ 1.995;  0.007;  0.997;  1.089;  2.004;  0.171;  0.294;  2.448;
      0.214;  0.773;  2.960;  0.025;  0.638;  0.937; -0.568; -0.711;
      0.931;  2.601;  1.121; -0.251; -0.050;  1.341;  2.282;  0.745;
      1.633;  0.944;  2.370;  0.293;  0.895;  0.938;  0.199;  0.812;
      1.253;  0.590;  1.522; -0.685;  1.259;  0.571;  1.579;  0.568;
      0.381;  0.829;  0.277;  0.656;  2.497;  1.779;  1.922; -0.174;
      2.132;  2.793;  0.102;  1.569;  1.267;  0.490;  0.077;  1.366;
      0.056;  0.605;  0.628;  1.650;  0.104;  2.194;  2.869; -0.171;
     -0.598;  2.134;  0.917;  0.630;  0.209;  1.328;  0.368;  0.756;
      2.645;  1.161;  0.347;  0.920;  1.256; -0.052;  1.474;  0.510;
      1.386;  3.550;  1.392; -0.358;  1.938;  1.727; -0.372;  0.911;
      0.499;  0.066;  1.467;  1.898;  1.145;  0.501;  2.230;  0.212;
      0.536;  1.690;  1.086;  0.494];

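% Use the approximate algorithm and request a 95% confidence interval
method = 'Approx';
clevel = 0.95;

% Estimate the location difference, its confidence limits (thetal, thetau)
% and the corresponding Mann-Whitney statistics (ulower, uupper)
[theta, thetal, thetau, estcl, ulower, uupper, ifail] = ...
  g07eb(method, x, y, clevel);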

fprintf(' Location estimator     Confidence Interval\n\n');
fprintf('%13.4f%12s%7.4f,%7.4f )\n\n', theta, '(', thetal, thetau);
fprintf(' Corresponding Mann-Whitney statistics\n\n');
fprintf('  Lower : %8.2f\n', ulower);
fprintf('  Upper : %8.2f\n', uupper);

On entry, $method \neq$ 'E' or 'A',
or $n < 1$,
or $m < 1$,
or $clevel \leq 0.0$,
or $clevel \geq 1.0$.
