1 Purpose

g08rbc calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations when some of the observations may be right-censored.

2 Specification

#include <nag.h>

void g08rbc (Nag_OrderType order, Integer ns, const Integer nv[], const double y[], Integer p, const double x[], Integer pdx, const Integer icen[], double gamma, Integer nmax, double tol, double prvr[], Integer pdprvr, Integer irank[], double zin[], double eta[], double vapvec[], double parest[], NagError *fail)

The function may be called by the names: g08rbc, nag_nonpar_rank_regsn_censored or nag_rank_regsn_censored.

3 Description

Data can be analysed by replacing the observations by their ranks. The analysis produces inference for the regression model in which the location parameters of the observations, $θ_{i}$, for $i = 1, 2, \dots, n$, are related by $θ = X β$. Here $X$ is an $n \times p$ matrix of explanatory variables and $β$ is a vector of $p$ unknown regression parameters. The observations are replaced by their ranks, and an approximation, based on a Taylor series expansion, is made to the rank marginal likelihood. For details of the approximation see Pettitt (1982).

An observation is said to be right-censored if we can only observe $Y_{j}^{*}$ with $Y_{j}^{*} \leq Y_{j}$. We rank censored and uncensored observations as follows. Suppose we can observe $Y_{j}$, for $j = 1, 2, \dots, n$, directly but $Y_{j}^{*}$, for $j = n + 1, \dots, q$ and $n \leq q$, are censored on the right. We define the rank $r_{j}$ of $Y_{j}$, for $j = 1, 2, \dots, n$, in the usual way: $r_{j}$ equals $i$ if and only if $Y_{j}$ is the $i$th smallest amongst $Y_{1}, Y_{2}, \dots, Y_{n}$. The right-censored $Y_{j}^{*}$, for $j = n + 1, n + 2, \dots, q$, has rank $r_{j}$ if and only if $Y_{j}^{*}$ lies in the interval $[Y_{(r_{j})}, Y_{(r_{j} + 1)}]$, with $Y_{(0)} = - \infty$, $Y_{(n + 1)} = + \infty$ and $Y_{(1)} < \dots < Y_{(n)}$ the ordered $Y_{j}$, for $j = 1, 2, \dots, n$.
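The ranking rule above can be sketched in C as follows. This is a minimal illustration, not the library's internal code: it assumes there are no ties among the uncensored values (g08rbc itself judges ties using tol), and the function name `censored_ranks` is hypothetical. Because the text writes the interval as closed, a censored value exactly equal to some $Y_{(r)}$ is ambiguous; the sketch assumes it receives rank $r$.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library): assigns ranks
 * following the rule in the text, assuming no ties among the
 * uncensored values.
 *   y[k]    : observation k (k = 0, ..., nobs-1)
 *   icen[k] : 0 if y[k] is uncensored, 1 if right-censored
 *   rank[k] : on exit, the rank r_k
 * An uncensored y[k] gets 1 + the number of uncensored values below it.
 * A censored y[k] gets the number r of uncensored values <= y[k], so it
 * lies between Y_(r) and Y_(r+1); r = 0 means it precedes them all. */
void censored_ranks(size_t nobs, const double y[], const int icen[],
                    int rank[])
{
    for (size_t k = 0; k < nobs; ++k) {
        int r = 0;
        for (size_t m = 0; m < nobs; ++m) {
            if (icen[m] != 0)
                continue;   /* only uncensored values define the ordering */
            if (icen[k] == 0 ? y[m] < y[k] : y[m] <= y[k])
                ++r;
        }
        rank[k] = (icen[k] == 0) ? r + 1 : r;
    }
}
```

For example, with `y = {3.0, 1.0, 2.5, 0.5, 5.0}` and `icen = {0, 0, 1, 1, 0}`, the uncensored values get ranks 2, 1 and 3, the censored 2.5 gets rank 1 (between $Y_{(1)} = 1.0$ and $Y_{(2)} = 3.0$) and the censored 0.5 gets rank 0.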

The distribution of the $Y$ is assumed to be of the following form. Let $F_{L} (y) = e^{y} / (1 + e^{y})$, the logistic distribution function, and consider the distribution function $F_{γ} (y)$ defined by $1 - F_{γ} (y) = {[1 - F_{L} (y)]}^{1 / γ}$. This distribution function can be thought of either as the distribution function of the minimum, $X_{1, γ}$, of a random sample of size $γ^{-1}$ from the logistic distribution, or as $F_{γ} (y - \log γ)$ being the distribution function of a random variable having the $F$-distribution with $2$ and $2 γ^{-1}$ degrees of freedom. This family of generalized logistic distribution functions $[F_{γ} (.); 0 \leq γ < \infty]$ naturally links the symmetric logistic distribution ($γ = 1$) with the skew extreme value distribution (the limit $γ \to 0$) and with the limiting negative exponential distribution (the limit $γ \to \infty$). For this family explicit results are available for right-censored data. See Pettitt (1983) for details.

Let $l_{R}$ denote the logarithm of the rank marginal likelihood of the observations and define the $q \times 1$ vector $a$ by $a = l_{R}^{'} (θ = 0)$, and let the $q \times q$ diagonal matrix $B$ and the $q \times q$ symmetric matrix $A$ be given by $B - A = - l_{R}^{''} (θ = 0)$. Then the following statistics can be found from the analysis.

(a) The score statistic $X^{T} a$. This statistic is used to test the hypothesis $H_{0} : β = 0$ (see (e)).
(b) The estimated variance-covariance matrix of the score statistic in (a).
(c) The estimate ${\hat{β}}_{R} = M X^{T} a$.
(d) The estimated variance-covariance matrix $M = {(X^{T} (B - A) X)}^{-1}$ of the estimate ${\hat{β}}_{R}$.
(e) The $χ^{2}$ statistic $Q = {\hat{β}}_{R}^{T} M^{-1} {\hat{β}}_{R} = a^{T} X {(X^{T} (B - A) X)}^{-1} X^{T} a$, used to test $H_{0} : β = 0$. Under $H_{0}$, $Q$ has an approximate $χ^{2}$-distribution with $p$ degrees of freedom.
(f) The standard errors $M_{i i}^{1 / 2}$ of the estimates given in (c).
(g) Approximate $z$-statistics, i.e., $Z_{i} = {\hat{β}}_{R_{i}} / se({\hat{β}}_{R_{i}})$, for testing $H_{0} : β_{i} = 0$. For $i = 1, 2, \dots, p$, $Z_{i}$ has an approximate $N (0, 1)$ distribution.

In many situations, more than one sample of observations will be available. In this case we assume the model

h_{k} (Y_{k}) = X_{k} β + e_{k}, k = 1, 2, \dots, ns,

where $ns$ is the number of samples. Here $Y_{k}$ and $X_{k}$ are, respectively, the vector of observations and the design matrix for the $k$th sample. Note that the arbitrary transformation $h_{k}$ can be assumed to differ between samples, since observations are ranked within each sample.

The earlier analysis can be extended to give a combined estimate of $β$, $\hat{β} = D d$, where $D^{-1} = \sum_{k = 1}^{ns} X_{k}^{T} (B_{k} - A_{k}) X_{k}$ and $d = \sum_{k = 1}^{ns} X_{k}^{T} a_{k}$, with $a_{k}$, $B_{k}$ and $A_{k}$ defined as $a$, $B$ and $A$ above but for the $k$th sample.

The remaining statistics are calculated as for the one sample case.
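The combined estimate simply accumulates each sample's contribution before solving. A minimal sketch for a single explanatory variable ($p = 1$), so that $D^{-1}$ and $d$ are scalars, is shown below; the function name, the argument layout and the assumption that each $B_{k} - A_{k}$ is already available are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch of the combined estimate for p = 1.
 * For sample k (k = 0, ..., ns-1) with n[k] observations:
 *   x[k] : the n[k] values of the single explanatory variable,
 *   a[k] : the n[k] elements of a_k,
 *   w[k] : B_k - A_k, an n[k] x n[k] matrix stored row by row.
 * Accumulates D^{-1} = sum_k x_k^T (B_k - A_k) x_k and d = sum_k x_k^T a_k,
 * stores D (the estimated variance of beta_hat) in *var and returns
 * beta_hat = D d. */
double combined_estimate(size_t ns, const size_t n[], const double *const x[],
                         const double *const a[], const double *const w[],
                         double *var)
{
    double dinv = 0.0, d = 0.0;
    for (size_t k = 0; k < ns; ++k) {
        for (size_t i = 0; i < n[k]; ++i) {
            d += x[k][i] * a[k][i];                              /* x_k^T a_k */
            for (size_t j = 0; j < n[k]; ++j)
                dinv += x[k][i] * w[k][i * n[k] + j] * x[k][j];  /* x_k^T W_k x_k */
        }
    }
    *var = 1.0 / dinv;
    return d / dinv;
}
```

For two samples with $x_{1} = (1, 2)$, $a_{1} = (1, -1)$, $W_{1} = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}$ and $x_{2} = (1, 1)$, $a_{2} = (2, 0)$, $W_{2} = I$, the contributions to $D^{-1}$ are 11 and 2 and those to $d$ are $-1$ and 2, so $\hat{β} = 1/13$ with variance $1/13$.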

4 References

Kalbfleisch J D and Prentice R L (1980) The Statistical Analysis of Failure Time Data Wiley

Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243

Pettitt A N (1983) Approximate methods using ranks for regression with censored data Biometrika 70 121–132

5 Arguments

1: $order$ – Nag_OrderType Input

On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by $order = Nag_RowMajor$. See Section 3.1.3 in the Introduction to the NAG Library CL Interface for a more detailed explanation of the use of this argument.

Constraint: $order = Nag_RowMajor$ or $Nag_ColMajor$.

2: $ns$ – Integer Input

On entry: the number of samples.

Constraint:

ns \geq 1

3: $nv [ns]$ – const Integer Input

On entry: the number of observations in the $i$th sample, for $i = 1, 2, \dots, ns$.

Constraint: $nv [i - 1] \geq 1$, for $i = 1, 2, \dots, ns$.

4: $y [\dim]$ – const double Input

Note: the dimension, dim, of the array y must be at least $\sum_{i = 1}^{ns} nv [i - 1]$.

On entry: the observations in each sample. Specifically, $y [\sum_{k = 1}^{i - 1} nv [k - 1] + j - 1]$ must contain the $j$th observation in the $i$th sample.
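The index formula for y (which icen shares) amounts to skipping the observations of all earlier samples. A small helper makes this concrete; its name is hypothetical, and `long` stands in for the NAG `Integer` type so that the sketch compiles without nag.h.

```c
#include <stddef.h>

/* Zero-based position in y (and in icen) of the j-th observation
 * (j = 1, ..., nv[i-1]) of the i-th sample (i = 1, ..., ns), i.e.
 * sum_{k=1}^{i-1} nv[k-1] + j - 1.  Hypothetical helper; long stands
 * in for the NAG Integer type. */
size_t obs_index(const long nv[], size_t i, size_t j)
{
    size_t offset = 0;
    for (size_t k = 1; k < i; ++k)
        offset += (size_t) nv[k - 1];   /* skip the earlier samples */
    return offset + j - 1;
}
```

With `nv = {3, 2, 4}`, the first observation of the second sample is `y[3]` and the last observation of the third sample is `y[8]`, the final element of an array of length 9.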

5: $p$ – Integer Input

On entry: the number of parameters to be fitted.

Constraint:

p \geq 1

6: $x [\dim]$ – const double Input

Note: the dimension, dim, of the array x must be at least

$\max (1, pdx \times p)$ when $order = Nag_ColMajor$ ;
$\max (1, (\sum_{i = 1}^{ns} nv [i - 1]) \times pdx)$ when $order = Nag_RowMajor$ .

Where $X (i, j)$ appears in this document, it refers to the array element

$x [(j - 1) \times pdx + i - 1]$ when $order = Nag_ColMajor$ ;
$x [(i - 1) \times pdx + j - 1]$ when $order = Nag_RowMajor$ .

On entry: the design matrices for each sample. Specifically, $X (\sum_{k = 1}^{i - 1} nv [k - 1] + j, l)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.

Constraint: x must not contain a column with all elements equal.
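The mapping from $X (i, j)$ to an element of x, and the constraint that no column be constant, can be checked before calling the function. The sketch below is hypothetical: the enum `storage_order` is a stand-in for `Nag_OrderType` so that it compiles without nag.h, and the function names are illustrative only.

```c
#include <stddef.h>

/* Stand-in for Nag_RowMajor / Nag_ColMajor (hypothetical). */
typedef enum { ROW_MAJOR, COL_MAJOR } storage_order;

/* Zero-based position of X(i, j) (1-based i, j) in the array x. */
size_t x_index(storage_order order, size_t pdx, size_t i, size_t j)
{
    return order == COL_MAJOR ? (j - 1) * pdx + i - 1
                              : (i - 1) * pdx + j - 1;
}

/* Returns the 1-based index of the first column of the nobs x p matrix X
 * whose elements are all equal (violating the constraint on x), or 0 if
 * every column varies. */
size_t first_constant_column(storage_order order, const double x[],
                             size_t pdx, size_t nobs, size_t p)
{
    for (size_t j = 1; j <= p; ++j) {
        size_t i = 2;
        while (i <= nobs && x[x_index(order, pdx, i, j)] ==
                            x[x_index(order, pdx, 1, j)])
            ++i;
        if (i > nobs)
            return j;   /* no element differs from X(1, j) */
    }
    return 0;
}
```

For a $3 \times 2$ matrix whose second column is constant, stored row-major with $pdx = 2$ as `{1.0, 5.0, 2.0, 5.0, 3.0, 5.0}`, `first_constant_column` returns 2, the situation reported by NE_REAL_ARRAY_ELEM_CONS.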

7: $pdx$ – Integer Input

On entry: the stride separating row or column elements (depending on the value of order) in the array x.

Constraints:

if $order = Nag_ColMajor$ , $pdx \geq (\sum_{i = 1}^{ns} nv [i - 1])$ ;
if $order = Nag_RowMajor$ , $pdx \geq p$ .

8: $icen [\dim]$ – const Integer Input

Note: the dimension, dim, of the array icen must be at least $\sum_{i = 1}^{ns} nv [i - 1]$.

On entry: defines the censoring variable for the observations in y.

$icen [i - 1] = 0$: If $y [i - 1]$ is uncensored.
$icen [i - 1] = 1$: If $y [i - 1]$ is censored.

Constraint: $icen [i - 1] = 0$ or $1$, for $i = 1, 2, \dots, \sum_{k = 1}^{ns} nv [k - 1]$.

9: $gamma$ – double Input

On entry: the value of the parameter defining the generalized logistic distribution. For $gamma \leq 0.0001$, the limiting extreme value distribution is assumed.

Constraint:

gamma \geq 0.0

10: $nmax$ – Integer Input

On entry: the value of the largest sample size.

Constraint: $nmax = \max_{1 \leq i \leq ns} (nv [i - 1])$ and $nmax > p$.
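Since nmax must equal the largest sample size exactly, it is safest to compute it from nv rather than supply it by hand. A hypothetical helper that also applies the other constraints on ns, nv and nmax might look like this (`long` stands in for the NAG `Integer` type).

```c
/* Hypothetical helper: returns the value required for nmax, namely
 * max_i nv[i-1], or -1 if ns < 1, some nv[i-1] < 1, or the result
 * would not satisfy nmax > p.  long stands in for NAG's Integer type. */
long required_nmax(long ns, const long nv[], long p)
{
    if (ns < 1)
        return -1;
    long nmax = 0;
    for (long i = 0; i < ns; ++i) {
        if (nv[i] < 1)
            return -1;          /* violates nv[i-1] >= 1 */
        if (nv[i] > nmax)
            nmax = nv[i];
    }
    return nmax > p ? nmax : -1;
}
```

For `nv = {3, 7, 5}` and $p = 1$ this returns 7; with $p = 7$ it returns -1 because the constraint $nmax > p$ fails.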

11: $tol$ – double Input

On entry: the tolerance for judging whether two observations are tied. Thus, observations $Y_{i}$ and $Y_{j}$ are adjudged to be tied if $| Y_{i} - Y_{j} | < tol$.

Constraint:

tol > 0.0

12: $prvr [\dim]$ – double Output

Note: the dimension, dim, of the array prvr must be at least

$\max (1, pdprvr \times p)$ when $order = Nag_ColMajor$ ;
$\max (1, (p + 1) \times pdprvr)$ when $order = Nag_RowMajor$ .

Where $PRVR (i, j)$ appears in this document, it refers to the array element

$prvr [(j - 1) \times pdprvr + i - 1]$ when $order = Nag_ColMajor$ ;
$prvr [(i - 1) \times pdprvr + j - 1]$ when $order = Nag_RowMajor$ .

On exit: the variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus, for $1 \leq i \leq j \leq p$, $PRVR (i, j)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1 \leq j \leq i \leq p$, $PRVR (i + 1, j)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
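Because the two matrices share one $(p + 1) \times p$ array, with the parameter-estimate triangle shifted down by one row, reading them back is easiest through small accessors. The sketch below assumes column-major storage ($order = Nag_ColMajor$, $pdprvr \geq p + 1$); the function names are hypothetical, and symmetry is used to accept indices in either order.

```c
#include <stddef.h>

/* PRVR(r, c) with 1-based r, c, for column-major storage:
 * prvr[(c - 1) * pdprvr + r - 1]. */
static double prvr_elem(const double prvr[], size_t pdprvr, size_t r, size_t c)
{
    return prvr[(c - 1) * pdprvr + r - 1];
}

/* Covariance of the i-th and j-th score statistics: PRVR(i, j), i <= j. */
double score_cov(const double prvr[], size_t pdprvr, size_t i, size_t j)
{
    if (i > j) { size_t t = i; i = j; j = t; }   /* symmetric: use i <= j */
    return prvr_elem(prvr, pdprvr, i, j);
}

/* Covariance of the i-th and j-th parameter estimates: PRVR(i + 1, j),
 * j <= i. */
double param_cov(const double prvr[], size_t pdprvr, size_t i, size_t j)
{
    if (i < j) { size_t t = i; i = j; j = t; }   /* symmetric: use j <= i */
    return prvr_elem(prvr, pdprvr, i + 1, j);
}
```

For $p = 2$ and $pdprvr = 3$, the first column of prvr holds $PRVR(1,1)$ (score variance), $PRVR(2,1)$ and $PRVR(3,1)$ (parameter variance and covariance), while the second column holds the score covariance, the second score variance and the second parameter variance.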

13: $pdprvr$ – Integer Input

On entry: the stride separating row or column elements (depending on the value of order) in the array prvr.

Constraints:

if $order = Nag_ColMajor$ , $pdprvr \geq p + 1$ ;
if $order = Nag_RowMajor$ , $pdprvr \geq p$ .

14: $irank [nmax]$ – Integer Output

On exit: for the one sample case, irank contains the ranks of the observations.

15: $zin [nmax]$ – double Output

On exit: for the one sample case, zin contains the expected values of the function $g (.)$ of the order statistics.

16: $eta [nmax]$ – double Output

On exit: for the one sample case, eta contains the expected values of the function $g' (.)$ of the order statistics.

17: $vapvec [nmax \times (nmax + 1) / 2]$ – double Output

On exit: for the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g (.)$ of the order statistics, stored column-wise.
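With the upper triangle packed column by column, column $j$ contributes $j$ elements, so element $(i, j)$ with $i \leq j$ follows the $j (j - 1) / 2$ elements of the earlier columns. A minimal index helper (hypothetical name) is:

```c
#include <stddef.h>

/* Zero-based position in vapvec of element (i, j) (1-based) of the
 * symmetric nmax x nmax matrix whose upper triangle is packed column
 * by column.  Symmetry is used when i > j.  Hypothetical helper. */
size_t packed_upper_index(size_t i, size_t j)
{
    if (i > j) { size_t t = i; i = j; j = t; }
    return j * (j - 1) / 2 + i - 1;
}
```

For $nmax = 4$ the last element, $(4, 4)$, is at position 9, which matches the array length $nmax (nmax + 1) / 2 = 10$.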

18: $parest [4 \times p + 1]$ – double Output

On exit: the statistics calculated by the function.

The first p components of parest contain the score statistics.

The next p elements contain the parameter estimates.

$parest [2 \times p]$ contains the value of the $χ^{2}$ statistic.

The next p elements of parest contain the standard errors of the parameter estimates.

Finally, the remaining p elements of parest contain the $z$-statistics.
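The layout of the $4p + 1$ elements of parest can be captured once in a small view structure, so that later code does not repeat the offset arithmetic. The struct and function names below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical view of the 4p + 1 elements of parest (0-based offsets). */
typedef struct {
    const double *score;   /* parest[0 .. p-1]     : score statistics      */
    const double *beta;    /* parest[p .. 2p-1]    : parameter estimates   */
    double chi2;           /* parest[2p]           : chi-squared statistic */
    const double *se;      /* parest[2p+1 .. 3p]   : standard errors       */
    const double *z;       /* parest[3p+1 .. 4p]   : z-statistics          */
} parest_view;

parest_view unpack_parest(const double parest[], size_t p)
{
    parest_view v;
    v.score = parest;
    v.beta  = parest + p;
    v.chi2  = parest[2 * p];
    v.se    = parest + 2 * p + 1;
    v.z     = parest + 3 * p + 1;
    return v;
}
```

For $p = 2$, the $χ^{2}$ statistic is `parest[4]` and the second $z$-statistic is `parest[8]`, the last of the nine elements.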

19: $fail$ – NagError * Input/Output

The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM: On entry, argument $⟨ value ⟩$ had an illegal value.
NE_INT: On entry, $ns = ⟨ value ⟩$ .
Constraint: $ns \geq 1$ .

On entry, $p = ⟨ value ⟩$ .
Constraint: $p \geq 1$ .
NE_INT_2: On entry, $nmax = ⟨ value ⟩$ and $p = ⟨ value ⟩$ .
Constraint: $nmax > p$ .

On entry, $pdprvr = ⟨ value ⟩$ and $p = ⟨ value ⟩$ .
Constraint: $pdprvr \geq p + 1$ .

On entry, $pdx = ⟨ value ⟩$ and $p = ⟨ value ⟩$ .
Constraint: $pdx \geq p$ .

On entry, $pdx = ⟨ value ⟩$ and sum $nv [i - 1] = ⟨ value ⟩$ .
Constraint: $pdx \geq$ the sum of $nv [i - 1]$ .
NE_INT_ARRAY_ELEM_CONS: On entry, $⟨ value ⟩$ elements of icen are out of range.
Constraint: $icen [i] = 0$ or $1$ , for all $i$ .

On entry, $⟨ value ⟩$ elements of nv are $< 1$ .
Constraint: $nv [i] \geq 1$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_MAT_ILL_DEFINED: The matrix $X^{T} (B - A) X$ is either ill-conditioned or not positive definite. This error should only occur with extreme rankings of the data.
NE_NO_LICENCE: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_OBSERVATIONS: On entry, all the observations were adjudged to be tied. You are advised to check the value supplied for tol.
NE_REAL: On entry, $gamma = ⟨ value ⟩$ .
Constraint: $gamma \geq 0.0$ .

On entry, $tol = ⟨ value ⟩$ .
Constraint: $tol > 0.0$ .
NE_REAL_ARRAY_ELEM_CONS: On entry, for $j = ⟨ value ⟩$ , $X (i, j) = ⟨ value ⟩$ for all $i$ .
Constraint: $X (i, j) \neq X (i + 1, j)$ for at least one $i$ .
NE_SAMPLE: On entry, $\max_{i} nv [i] = ⟨ value ⟩$ and $nmax = ⟨ value ⟩$ .
Constraint: $\max_{i} nv [i] = nmax$ .

7 Accuracy

The computations are believed to be stable.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.

g08rbc is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g08rbc makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The time taken by g08rbc depends on the number of samples, the total number of observations and the number of parameters fitted.

In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.

10 Example

This example fits a regression model to a single sample of 40 observations using just one explanatory variable.


NAG CL Interface g08rbc (rank_regsn_censored)

