g08rb:: Nonparametric Statistics (NAG Toolbox)

Description

Analysis of data can be made by replacing observations by their ranks. The analysis produces inference for the regression model in which the location parameters of the observations, $θ_{i}$, for $i = 1, 2, \dots, n$, are related by $θ = X β$. Here $X$ is an $n \times p$ matrix of explanatory variables and $β$ is a vector of $p$ unknown regression parameters. The observations are replaced by their ranks, and an approximation based on a Taylor series expansion is made to the rank marginal likelihood. For details of the approximation see Pettitt (1982).

An observation is said to be right-censored if we can only observe $Y_{j}^{*}$ with $Y_{j}^{*} \leq Y_{j}$. We rank censored and uncensored observations as follows. Suppose we can observe $Y_{j}$, for $j = 1, 2, \dots, n$, directly but $Y_{j}^{*}$, for $j = n + 1, \dots, q$ and $n \leq q$, are censored on the right. We define the rank $r_{j}$ of $Y_{j}$, for $j = 1, 2, \dots, n$, in the usual way; $r_{j}$ equals $i$ if and only if $Y_{j}$ is the $i$th smallest amongst the $Y_{1}, Y_{2}, \dots, Y_{n}$. The right-censored $Y_{j}^{*}$, for $j = n + 1, n + 2, \dots, q$, has rank $r_{j}$ if and only if $Y_{j}^{*}$ lies in the interval $[Y_{(r_{j})}, Y_{(r_{j} + 1)}]$, with $Y_{(0)} = - \infty$, $Y_{(n + 1)} = + \infty$ and $Y_{(1)} < \dots < Y_{(n)}$ the ordered $Y_{j}$, for $j = 1, 2, \dots, n$.
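The ranking rule just described can be illustrated with a minimal MATLAB sketch. The data values and the variable names (`y_unc`, `y_cen`, `r_unc`, `r_cen`) are hypothetical, the sketch assumes there are no ties, and it resolves a censored value that coincides with an uncensored one by counting it as not smaller (an assumption of this sketch, not necessarily what g08rb does internally).

```matlab
% Hypothetical data: n = 5 uncensored values and q - n = 3 right-censored values.
y_unc = [3.1; 0.7; 5.4; 2.2; 4.0];   % Y_1, ..., Y_n, observed exactly (no ties)
y_cen = [0.5; 2.9; 6.0];             % Y*_{n+1}, ..., Y*_q, lower bounds only

n = numel(y_unc);

% Uncensored ranks: r_j = i when Y_j is the i-th smallest of Y_1, ..., Y_n.
[~, order] = sort(y_unc);
r_unc = zeros(n, 1);
r_unc(order) = (1:n)';

% Censored ranks: r_j is the number of uncensored values not exceeding Y*_j,
% so that Y*_j lies between Y_(r_j) and Y_(r_j + 1),
% with Y_(0) = -Inf and Y_(n+1) = +Inf.
r_cen = zeros(numel(y_cen), 1);
for j = 1:numel(y_cen)
    r_cen(j) = sum(y_unc <= y_cen(j));
end

disp([r_unc' r_cen']);
```

A censored value below every uncensored observation receives rank 0, and one above all of them receives rank $n$.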

The distribution of the $Y$ is assumed to be of the following form. Let $F_{L} (y) = e^{y} / (1 + e^{y})$, the logistic distribution function, and consider the distribution function $F_{γ} (y)$ defined by $1 - F_{γ} (y) = {[1 - F_{L} (y)]}^{1 / γ}$. This distribution function can be thought of either as the distribution function of the minimum, $X_{1, γ}$, of a random sample of size $γ^{- 1}$ from the logistic distribution, or as $F_{γ} (y - \log γ)$ being the distribution function of a random variable having the $F$-distribution with $2$ and $2 γ^{- 1}$ degrees of freedom. This family of generalized logistic distribution functions $[F_{γ} (.); 0 \leq γ < \infty]$ naturally links the symmetric logistic distribution ($γ = 1$) with the skew extreme value distribution ($γ \to 0$) and with the limiting negative exponential distribution ($γ \to \infty$). For this family explicit results are available for right-censored data. See Pettitt (1983) for details.
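As a rough numerical illustration of this family, the sketch below evaluates $F_{γ}$ directly from its definition. The handle names `F_L` and `F_gam` are hypothetical, and the location shift by $\log γ$ used to exhibit the extreme value limit is a normalisation chosen for this sketch only.

```matlab
% Logistic distribution function F_L(y) = e^y / (1 + e^y).
F_L = @(y) exp(y) ./ (1 + exp(y));

% Generalized logistic family: 1 - F_gamma(y) = (1 - F_L(y))^(1/gamma), gamma > 0.
F_gam = @(y, gam) 1 - (1 - F_L(y)).^(1 ./ gam);

y = linspace(-4, 4, 9)';

% gamma = 1 recovers the symmetric logistic distribution.
err_logistic = max(abs(F_gam(y, 1) - F_L(y)));

% gamma = 1/m gives the distribution of the minimum of m logistic variates,
% whose survival function is (1 - F_L(y))^m.
m = 4;
err_minimum = max(abs((1 - F_gam(y, 1/m)) - (1 - F_L(y)).^m));

% For small gamma, after shifting the location by log(gamma) (sketch's own
% normalisation), F_gamma is close to the extreme value form 1 - exp(-e^y).
gam = 1e-3;
err_extreme = max(abs(F_gam(y + log(gam), gam) - (1 - exp(-exp(y)))));

fprintf('%g %g %g\n', err_logistic, err_minimum, err_extreme);
```

The first two discrepancies are at rounding level; the third shrinks as `gam` is reduced.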

Let $l_{R}$ denote the logarithm of the rank marginal likelihood of the observations and define the $q \times 1$ vector $a$ by $a = l_{R}^{'} (θ = 0)$, and let the $q \times q$ diagonal matrix $B$ and $q \times q$ symmetric matrix $A$ be given by $B - A = - l_{R}^{''} (θ = 0)$. Then various statistics can be found from the analysis.

(a)	The score statistic $X^{T} a$. This statistic is used to test the hypothesis $H_{0} : β = 0$ (see (e)).
(b)	The estimated variance-covariance matrix $X^{T} (B - A) X$ of the score statistic in (a).
(c)	The estimate ${\hat{β}}_{R} = M X^{T} a$.
(d)	The estimated variance-covariance matrix $M = {(X^{T} (B - A) X)}^{- 1}$ of the estimate ${\hat{β}}_{R}$.
(e)	The $χ^{2}$ statistic $Q = {\hat{β}}_{R}^{T} M^{- 1} {\hat{β}}_{R} = a^{T} X {(X^{T} (B - A) X)}^{- 1} X^{T} a$, used to test $H_{0} : β = 0$. Under $H_{0}$, $Q$ has an approximate $χ^{2}$-distribution with $p$ degrees of freedom.
(f)	The standard errors $M_{i i}^{1 / 2}$ of the estimates given in (c).
(g)	Approximate $z$-statistics, i.e., $Z_{i} = {\hat{β}}_{R_{i}} / \mathrm{se} ({\hat{β}}_{R_{i}})$ for testing $H_{0} : β_{i} = 0$. For $i = 1, 2, \dots, p$, $Z_{i}$ has an approximate $N (0, 1)$ distribution.
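The relationships between items (a) to (g) can be checked with a short sketch. It starts from the score statistic and its variance as printed in the Example section (4.584 and 7.653, with $p = 1$) and assumes that the printed variance is $X^{T} (B - A) X$. The variable names are illustrative and are not g08rb outputs.

```matlab
% Values copied from the example results (p = 1).
score    = 4.584;   % (a) X' * a
scorevar = 7.653;   % (b) assumed to be X' * (B - A) * X

p = numel(score);

M    = scorevar \ eye(p);    % (d) covariance matrix of the estimate
beta = M * score;            % (c) estimate beta_R = M * X' * a
Q    = score' * M * score;   % (e) chi-squared statistic with p d.f.
se   = sqrt(diag(M));        % (f) standard errors M_ii^(1/2)
z    = beta ./ se;           % (g) approximate z-statistics

fprintf('beta = %.3f, M = %.3f, Q = %.3f, se = %.4f, z = %.4f\n', ...
        beta, M, Q, se, z);
```

Up to rounding of the two inputs, these values agree with the parameter estimate, covariance, chi-squared statistic, standard error and z-statistic printed in the example results.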

In many situations, more than one sample of observations will be available. In this case we assume the model

$h_{k} (Y_{k}) = X_{k}^{T} β + e_{k}, \quad k = 1, 2, \dots, ns,$

where ns is the number of samples. In an obvious manner, $Y_{k}$ and $X_{k}$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation $h_{k}$ can be assumed different for each sample since observations are ranked within the sample.

The earlier analysis can be extended to give a combined estimate of $β$ as $\hat{β} = D d$, where

$D^{- 1} = \sum_{k = 1}^{ns} X_{k}^{T} (B_{k} - A_{k}) X_{k}$

and

$d = \sum_{k = 1}^{ns} X_{k}^{T} a_{k},$

with $a_{k}$, $B_{k}$ and $A_{k}$ defined as $a$, $B$ and $A$ above but for the $k$th sample.

The remaining statistics are calculated as for the one sample case.
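A minimal sketch of the pooling step follows, assuming the per-sample quantities $X_{k}^{T} (B_{k} - A_{k}) X_{k}$ and $X_{k}^{T} a_{k}$ have already been formed. The numbers and the names `info`, `score`, `Dinv` and `d` are hypothetical and serve only to show the arithmetic.

```matlab
% Hypothetical per-sample quantities for ns = 2 samples and p = 2 parameters.
info  = {[4 1; 1 3], [2 0; 0 5]};   % X_k' * (B_k - A_k) * X_k, each p-by-p
score = {[1; 2], [3; -1]};          % X_k' * a_k, each p-by-1

ns = numel(info);
p  = 2;

% Accumulate D^{-1} and d over the samples.
Dinv = zeros(p);
d    = zeros(p, 1);
for k = 1:ns
    Dinv = Dinv + info{k};
    d    = d + score{k};
end

% Combined estimate beta_hat = D * d, computed by solving D^{-1} * beta_hat = d.
beta_hat = Dinv \ d;
disp(beta_hat');
```

Solving the linear system avoids forming $D$ explicitly; $D$ itself is still needed when standard errors are required.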

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Accuracy

Further Comments

Example

function g08rb_example


fprintf('g08rb example results\n\n');

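% Observations, including the right-censored values flagged in icen below
y = [143; 164; 188; 188; 190; 192; 206; 209; 213; 216;
     220; 227; 230; 234; 246; 265; 304; 216; 244; 142;
     156; 163; 198; 205; 232; 232; 233; 233; 233; 233;
     239; 240; 261; 280; 280; 296; 296; 323; 204; 344];
nv = [int64(numel(y))];   % number of observations in the single sample
% One explanatory variable: 0 for observations 1-19, 1 for observations 20-40
x  = zeros(nv,1);
x(20:end) = 1;
% Censoring indicators: nonzero entries mark right-censored observations
icen = zeros(nv,1,'int64');
icen(18:19) = 1;
icen(39:40) = 1;

gamma = 1e-05;        % power parameter of the generalized logistic family
nmax  = int64(nv);    % size of the largest sample
tol   = 1e-05;        % tolerance for treating observations as tied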

ns    = size(y,2);
ip    = size(x,2);
fprintf('Number of samples            = %3d\n', ns);
fprintf('Number of parameters fitted  = %3d\n', ip);
fprintf('Distribution power parameter = %8.1e\n', gamma);
fprintf('Tolerance for ties           = %8.1e\n', tol);

[parvar, irank, zin, eta, vapvec, parest, ifail] = ...
  g08rb( ...
         nv, y, x, icen, gamma, nmax, tol);

% Display results
fprintf('\nScore statistic\n');
fprintf('%9.3f\n', parest(1:ip));
fprintf('\nCovariance matrix of score statistic\n');
for j = 1:ip
  fprintf('%9.3f', parvar(1:j,j));
  fprintf('\n');
end
fprintf('\nParameter estimates\n');
fprintf('%9.3f', parest(ip+1:ip+ip));
fprintf('\n\nCovariance matrix of parameter estimates\n');
for j = 1:ip
  fprintf('%9.3f', parvar(j+1,1:j));
  fprintf('\n');
end

chisq = parest(2*ip+1);
fprintf('\nChi-squared statistic = %8.3f with %2d d.f.\n\n', chisq, ip);

sterr = reshape(parest(2*ip+2:end),[ip,2]);
fprintf('Standard errors of estimates and approximate z-statistics\n');
disp(sterr);

g08rb example results

Number of samples            =   1
Number of parameters fitted  =   1
Distribution power parameter =  1.0e-05
Tolerance for ties           =  1.0e-05

Score statistic
    4.584

Covariance matrix of score statistic
    7.653

Parameter estimates
    0.599

Covariance matrix of parameter estimates
    0.131

Chi-squared statistic =    2.746 with  1 d.f.

Standard errors of estimates and approximate z-statistics
    0.3615    1.6571
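To relate these results to the tests described above, the following sketch converts the printed chi-squared statistic and z-statistic into approximate p-values using only base MATLAB. The `erfc` shortcut for the chi-squared tail is valid for 1 degree of freedom only, and the names `p_chi` and `p_z` are illustrative.

```matlab
% Statistics copied from the example results (1 degree of freedom).
Q = 2.746;
z = 1.6571;

% Upper tail of chi-squared with 1 d.f.: P(chi2_1 > Q) = erfc(sqrt(Q/2)).
p_chi = erfc(sqrt(Q / 2));

% Two-sided p-value for an approximately N(0,1) statistic.
p_z = erfc(abs(z) / sqrt(2));

fprintf('p (chi-squared) = %.4f, p (z, two-sided) = %.4f\n', p_chi, p_z);
```

With a single parameter, $Q$ equals $Z^{2}$ up to rounding, so the two p-values coincide.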

On entry,	$ns < 1$,
or	$tol \leq 0.0$,
or	$nmax \leq ip$,
or	$ldprvr < ip + 1$,
or	$ldx < nsum$,
or	$nmax \neq \max_{1 \leq i \leq ns} (nv (i))$,
or	$nv (i) \leq 0$ for some $i$, $i = 1, 2, \dots, ns$,
or	$nsum \neq \sum_{i = 1}^{ns} nv (i)$,
or	$ip < 1$,
or	$gamma < 0.0$,
or	$lwork < nmax \times (ip + 1)$.

NAG Toolbox: nag_nonpar_rank_regsn_censored (g08rb)
