naginterfaces.library.nonpar.rank_regsn_censored

naginterfaces.library.nonpar.rank_regsn_censored(nv, y, x, icen, gamma, nmax, tol)

rank_regsn_censored calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations when some of the observations may be right-censored.

For full information please refer to the NAG Library document for g08rb:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g08/g08rbf.html

Parameters

nv : int, array-like, shape $(ns)$

The number of observations in the $i$th sample, for $i = 1, 2, \dots, ns$.

y : float, array-like, shape $(\mathrm{sum}(nv))$

The observations in each sample. Specifically, $y[\sum_{k = 1}^{i - 1} nv[k - 1] + j - 1]$ must contain the $j$th observation in the $i$th sample.

x : float, array-like, shape $(nsum, ip)$

The design matrices for each sample. Specifically, $x[\sum_{k = 1}^{i - 1} nv[k - 1] + j - 1, l - 1]$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.

icen : int, array-like, shape $(\mathrm{sum}(nv))$

Defines the censoring variable for the observations in $y$.

$icen[i - 1] = 0$

If $y[i - 1]$ is uncensored.

$icen[i - 1] = 1$

If $y[i - 1]$ is censored.

gamma : float

The value of the parameter defining the generalized logistic distribution. For $gamma \leq 0.0001$, the limiting extreme value distribution is assumed.

nmax : int

The value of the largest sample size.

tol : float

The tolerance for judging whether two observations are tied. Thus, observations $Y_{i}$ and $Y_{j}$ are adjudged to be tied if $|Y_{i} - Y_{j}| < tol$.
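The concatenated layout of `nv`, `y`, `x` and `icen` can be easier to follow with a small packing helper. The following is a minimal sketch in plain Python; the input format (a list of samples, each a list of `(y, x_row, censored)` tuples) and the helper names `pack_samples` and `flat_index` are hypothetical and not part of naginterfaces, and the data values are made up for illustration.

```python
def pack_samples(samples):
    """Flatten per-sample data into the concatenated argument layout.

    `samples` is a hypothetical input format: one list per sample, each
    holding (y_value, x_row, censored) tuples.
    """
    nv, y, x, icen = [], [], [], []
    for sample in samples:
        nv.append(len(sample))              # number of observations in this sample
        for y_val, x_row, censored in sample:
            y.append(float(y_val))
            x.append([float(v) for v in x_row])
            icen.append(1 if censored else 0)
    nmax = max(nv)                          # largest sample size
    return nv, y, x, icen, nmax


def flat_index(nv, i, j):
    """0-based position of the j-th observation of the i-th sample (i, j 1-based)."""
    return sum(nv[: i - 1]) + j - 1


# Made-up data: two samples, one explanatory variable.
samples = [
    [(1.2, [0.0], False), (3.4, [1.0], True), (2.2, [0.5], False)],
    [(5.0, [1.0], False), (4.1, [0.0], True)],
]
nv, y, x, icen, nmax = pack_samples(samples)

# The packed values could then be passed as
# rank_regsn_censored(nv, y, x, icen, gamma, nmax, tol),
# with gamma and tol chosen by the caller.
first_of_second_sample = y[flat_index(nv, 2, 1)]
```

Here `flat_index` is a direct transcription of the index expression $\sum_{k = 1}^{i - 1} nv[k - 1] + j - 1$ used in the descriptions of `y` and `x`.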

Returns

prvr : float, ndarray, shape $(ip + 1, ip)$

The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1 \leq i \leq j \leq ip$, $prvr[i - 1, j - 1]$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1 \leq j \leq i \leq ip$, $prvr[i, j - 1]$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
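Because two symmetric matrices share one array, it can help to split `prvr` back into two full matrices. The sketch below assumes the lower triangle occupies rows $1$ to $ip$ of the $(ip + 1) \times ip$ array, so that all $ip$ parameter estimates are covered; the helper name `split_prvr` and the numeric values are made up for illustration.

```python
def split_prvr(prvr):
    """Split the packed (ip + 1) x ip array into two symmetric ip x ip matrices.

    Works on a list of rows or on any 2-D array that supports prvr[i][j].
    """
    ip = len(prvr[0])
    score_cov = [[0.0] * ip for _ in range(ip)]
    param_cov = [[0.0] * ip for _ in range(ip)]
    for i in range(1, ip + 1):
        for j in range(i, ip + 1):
            # Upper triangle: prvr[i-1][j-1] for 1 <= i <= j <= ip.
            value = prvr[i - 1][j - 1]
            score_cov[i - 1][j - 1] = score_cov[j - 1][i - 1] = value
        for j in range(1, i + 1):
            # Lower triangle, one row down: prvr[i][j-1] for 1 <= j <= i <= ip
            # (assumed range, see the note above).
            value = prvr[i][j - 1]
            param_cov[i - 1][j - 1] = param_cov[j - 1][i - 1] = value
    return score_cov, param_cov


# Made-up values for ip = 2, only to show the layout.
prvr_example = [
    [4.0, 1.0],
    [0.5, 2.0],
    [-0.1, 0.3],
]
score_cov, param_cov = split_prvr(prvr_example)
```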

irank : int, ndarray, shape $(nmax)$

For the one sample case, $irank$ contains the ranks of the observations.

zin : float, ndarray, shape $(nmax)$

For the one sample case, $zin$ contains the expected values of the function $g(\cdot)$ of the order statistics.

eta : float, ndarray, shape $(nmax)$

For the one sample case, $eta$ contains the expected values of the function $g'(\cdot)$ of the order statistics.

vapvec : float, ndarray, shape $(nmax \times (nmax + 1) / 2)$

For the one sample case, $vapvec$ contains the upper triangle of the variance-covariance matrix of the function $g(\cdot)$ of the order statistics stored column-wise.
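The column-wise packed upper triangle can be expanded back into a full symmetric matrix. The sketch below assumes the usual packed layout in which column $j$ contributes its first $j$ entries in order, so that entry $(i, j)$ with $i \leq j$ (1-based) sits at 0-based position $j (j - 1) / 2 + i - 1$; the helper names and the test values are hypothetical.

```python
def vapvec_index(i, j):
    """0-based position in the packed vector of entry (i, j), with i, j 1-based."""
    if i > j:
        i, j = j, i                 # use symmetry to stay in the upper triangle
    return (j - 1) * j // 2 + (i - 1)


def unpack_vapvec(vapvec, n):
    """Expand a packed upper triangle of order n into a full symmetric matrix."""
    return [
        [vapvec[vapvec_index(i, j)] for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


# Made-up entries labelled by their (row, column) position, e.g. 23 = entry (2, 3).
packed = [11, 12, 22, 13, 23, 33]
full = unpack_vapvec(packed, 3)
```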

parest : float, ndarray, shape $(4 \times ip + 1)$

The statistics calculated by the function.

The first $ip$ components of $parest$ contain the score statistics.

The next $ip$ elements contain the parameter estimates.

$parest[2 \times ip]$ contains the value of the $χ^{2}$ statistic.

The next $ip$ elements of $parest$ contain the standard errors of the parameter estimates.

Finally, the remaining $ip$ elements of $parest$ contain the $z$-statistics.
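The layout of `parest` described above can be unpacked with plain slicing. This is a minimal sketch; the helper name `split_parest` and the dictionary keys are illustrative choices, not part of the library.

```python
def split_parest(parest, ip):
    """Split the length 4*ip + 1 vector into its five named parts."""
    parest = list(parest)
    if len(parest) != 4 * ip + 1:
        raise ValueError("parest must have length 4 * ip + 1")
    return {
        "score": parest[:ip],                       # score statistics
        "estimate": parest[ip:2 * ip],              # parameter estimates
        "chi2": parest[2 * ip],                     # chi-squared statistic
        "std_err": parest[2 * ip + 1:3 * ip + 1],   # standard errors
        "z": parest[3 * ip + 1:],                   # z-statistics
    }


# Placeholder values 1..9 for ip = 2, only to show which slot goes where.
parts = split_parest([1, 2, 3, 4, 5, 6, 7, 8, 9], ip=2)
```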

Raises

NagValueError

(errno $1$)

On entry, $gamma = ⟨value⟩$.

Constraint: $gamma \geq 0.0$.

(errno $1$)

On entry, $nmax = ⟨value⟩$ and $ip = ⟨value⟩$.

Constraint: $nmax > ip$.

(errno $1$)

On entry, $ip = ⟨value⟩$.

Constraint: $ip \geq 1$.

(errno $1$)

On entry, $tol = ⟨value⟩$.

Constraint: $tol > 0.0$.

(errno $1$)

On entry, $ns = ⟨value⟩$.

Constraint: $ns \geq 1$.

(errno $1$)

On entry, $\max_{i} (nv[i]) = ⟨value⟩$ and $nmax = ⟨value⟩$.

Constraint: $\max_{i} (nv[i]) = nmax$.

(errno $1$)

On entry, $\sum_{i} (nv[i]) = ⟨value⟩$ and $nsum = ⟨value⟩$.

Constraint: $\sum_{i} (nv[i]) = nsum$.

(errno $1$)

On entry, $⟨value⟩$ elements of $nv$ are $< 1$.

Constraint: $nv[i] \geq 1$.

(errno $2$)

On entry, $⟨value⟩$ elements of $icen$ are out of range.

Constraint: $icen[i] = 0$ or $1$, for all $i$.

(errno $3$)

On entry, all the observations were adjudged to be tied.

(errno $4$)

The matrix $X^{T} (B - A) X$ is either ill-conditioned or not positive definite.

(errno $5$)

On entry, for $j = ⟨value⟩$, $x[i, j - 1] = ⟨value⟩$ for all $i$.

Constraint: $x[i, j - 1] \neq x[i + 1, j - 1]$ for at least one $i$.
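Most of the input constraints listed above can be checked before calling the routine. The following is a rough pre-check sketch in plain Python, not the library's own validation: the function name `check_inputs` and its messages are hypothetical, it treats `x` as a list of rows, and it does not attempt the conditions behind errno $3$ and errno $4$, which depend on the computation itself.

```python
def check_inputs(nv, y, x, icen, gamma, nmax, tol):
    """Return a list of violated constraints (empty if none were found)."""
    problems = []
    nsum = len(x)
    ip = len(x[0]) if nsum else 0
    if gamma < 0.0:
        problems.append("gamma must be >= 0.0")
    if tol <= 0.0:
        problems.append("tol must be > 0.0")
    if len(nv) < 1:
        problems.append("ns must be >= 1")
    if ip < 1:
        problems.append("ip must be >= 1")
    if any(n < 1 for n in nv):
        problems.append("every nv[i] must be >= 1")
    if nv and max(nv) != nmax:
        problems.append("max(nv) must equal nmax")
    if sum(nv) != nsum:
        problems.append("sum(nv) must equal nsum")
    if nmax <= ip:
        problems.append("nmax must be > ip")
    if any(c not in (0, 1) for c in icen):
        problems.append("every icen[i] must be 0 or 1")
    for j in range(ip):
        # A column that never changes from one row to the next is constant.
        if all(row[j] == x[0][j] for row in x):
            problems.append("column %d of x is constant" % j)
    return problems


# Made-up inputs: two samples of sizes 3 and 2, one explanatory variable.
nv_ok = [3, 2]
y_ok = [1.2, 3.4, 2.2, 5.0, 4.1]
x_ok = [[0.0], [1.0], [0.5], [1.0], [0.0]]
icen_ok = [0, 1, 0, 0, 1]
no_problems = check_inputs(nv_ok, y_ok, x_ok, icen_ok, 1.0, 3, 1e-5)
some_problems = check_inputs(nv_ok, y_ok, x_ok, [0, 2, 0, 0, 1], -1.0, 4, 0.0)
```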

Notes

Analysis of data can be made by replacing observations by their ranks. The analysis produces inference for the regression model where the location parameters of the observations, $θ_{i}$, for $i = 1, 2, \dots, n$, are related by $θ = X β$. Here $X$ is an $n \times p$ matrix of explanatory variables and $β$ is a vector of $p$ unknown regression parameters. The observations are replaced by their ranks and an approximation, based on a Taylor series expansion, is made to the rank marginal likelihood. For details of the approximation see Pettitt (1982).

An observation is said to be right-censored if we can only observe $Y_{j}^{*}$ with $Y_{j}^{*} \leq Y_{j}$. We rank censored and uncensored observations as follows. Suppose we can observe $Y_{j}$, for $j = 1, 2, \dots, n$, directly but $Y_{j}^{*}$, for $j = n + 1, \dots, q$ and $n \leq q$, are censored on the right. We define the rank $r_{j}$ of $Y_{j}$, for $j = 1, 2, \dots, n$, in the usual way; $r_{j}$ equals $i$ if and only if $Y_{j}$ is the $i$th smallest amongst $Y_{1}, Y_{2}, \dots, Y_{n}$. The right-censored $Y_{j}^{*}$, for $j = n + 1, n + 2, \dots, q$, has rank $r_{j}$ if and only if $Y_{j}^{*}$ lies in the interval $[Y_{(r_{j})}, Y_{(r_{j} + 1)}]$, with $Y_{(0)} = - \infty$, $Y_{(n + 1)} = + \infty$ and $Y_{(1)} < \dots < Y_{(n)}$ the ordered $Y_{j}$, for $j = 1, 2, \dots, n$.
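The ranking rule above can be illustrated with a short sketch. It assumes there are no ties among the uncensored values, resolves a censored value that coincides with an uncensored one by giving it that observation's rank, and only illustrates the definition; it is not claimed to reproduce how the routine fills `irank`. The function name `censored_ranks` and the data are made up.

```python
from bisect import bisect_right


def censored_ranks(y, icen):
    """Rank observations using the uncensored values as the reference ordering.

    An uncensored value gets its 1-based position among the sorted uncensored
    values. A censored value gets the number of uncensored values not
    exceeding it, i.e. the r with Y_(r) <= Y* < Y_(r+1), which is 0 when it
    lies below every uncensored value.
    """
    uncensored = sorted(v for v, c in zip(y, icen) if c == 0)
    # With no ties, bisect_right returns exactly these counts in both cases.
    return [bisect_right(uncensored, v) for v in y]


# Made-up data: 3.5 and 1.0 are right-censored.
y_obs = [2.0, 5.0, 3.5, 1.0, 7.0]
cens = [0, 0, 1, 1, 0]
ranks = censored_ranks(y_obs, cens)
```

In this example the censored value 3.5 falls between the uncensored values 2.0 and 5.0 and so shares rank 1 with 2.0, while the censored value 1.0 lies below all uncensored values and gets rank 0.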

The distribution of $Y$ is assumed to be of the following form. Let $F_{L}(y) = e^{y} / (1 + e^{y})$, the logistic distribution function, and consider the distribution function $F_{γ}(y)$ defined by $1 - F_{γ}(y) = {[1 - F_{L}(y)]}^{1 / γ}$. This distribution function can be thought of either as the distribution function of the minimum, $X_{1, γ}$, of a random sample of size $γ^{- 1}$ from the logistic distribution, or as the $F_{γ}(y - \log γ)$ being the distribution function of a random variable having the $F$-distribution with $2$ and $2 γ^{- 1}$ degrees of freedom. This family of generalized logistic distribution functions $[F_{γ}(\cdot); 0 \leq γ < \infty]$ naturally links the symmetric logistic distribution ($γ = 1$) with the skew extreme value distribution ($γ \to 0$) and with the limiting negative exponential distribution ($γ \to \infty$). For this family explicit results are available for right-censored data. See Pettitt (1983) for details.
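The defining relation $1 - F_{γ}(y) = {[1 - F_{L}(y)]}^{1 / γ}$ is simple to evaluate directly. The sketch below is illustrative only: the function names are made up, it requires $γ > 0$ (the limiting cases are not handled), and it makes no attempt at numerical robustness for extreme $y$.

```python
import math


def logistic_cdf(y):
    """Logistic distribution function F_L(y) = e^y / (1 + e^y)."""
    return 1.0 / (1.0 + math.exp(-y))


def gen_logistic_cdf(y, gamma):
    """Generalized logistic F_gamma(y), from 1 - F_gamma = (1 - F_L)^(1/gamma)."""
    if gamma <= 0.0:
        raise ValueError("this sketch requires gamma > 0")
    return 1.0 - (1.0 - logistic_cdf(y)) ** (1.0 / gamma)


# gamma = 1 recovers the logistic distribution itself.
same_as_logistic = gen_logistic_cdf(0.3, 1.0)
# gamma = 1/2 corresponds to the minimum of a sample of size 2:
# P(min <= 0) = 1 - (1 - 0.5)**2.
min_of_two_at_zero = gen_logistic_cdf(0.0, 0.5)
```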

Let $l_{R}$ denote the logarithm of the rank marginal likelihood of the observations and define the $q \times 1$ vector $a$ by $a = l_{R}'(θ = 0)$, and let the $q \times q$ diagonal matrix $B$ and $q \times q$ symmetric matrix $A$ be given by $B - A = - l_{R}''(θ = 0)$. Then various statistics can be found from the analysis.

(a) The score statistic $X^{T} a$. This statistic is used to test the hypothesis $H_{0} : β = 0$ (see (e)).
(b) The estimated variance-covariance matrix of the score statistic in (a).
(c) The estimate $\hat{β}_{R} = M X^{T} a$.
(d) The estimated variance-covariance matrix $M = {(X^{T} (B - A) X)}^{- 1}$ of the estimate $\hat{β}_{R}$.
(e) The $χ^{2}$ statistic $Q = \hat{β}_{R}^{T} M^{- 1} \hat{β}_{R} = a^{T} X {(X^{T} (B - A) X)}^{- 1} X^{T} a$, used to test $H_{0} : β = 0$. Under $H_{0}$, $Q$ has an approximate $χ^{2}$-distribution with $p$ degrees of freedom.
(f) The standard errors $M_{i i}^{1 / 2}$ of the estimates given in (c).
(g) Approximate $z$-statistics, i.e., $Z_{i} = \hat{β}_{R_{i}} / \mathrm{se}(\hat{β}_{R_{i}})$ for testing $H_{0} : β_{i} = 0$. For $i = 1, 2, \dots, p$, $Z_{i}$ has an approximate $N (0, 1)$ distribution.
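To make the relationships between these statistics concrete, the sketch below works through the single-covariate case ($p = 1$), where every matrix expression reduces to a scalar. The vector `a` and the matrix `w` standing for $B - A$ are made-up numbers, not output of the routine, and the variance of the score statistic is taken here to be $X^{T} (B - A) X$, i.e. $M^{- 1}$, which is an assumption of this sketch since the text does not spell it out.

```python
import math


def one_covariate_statistics(x, a, w):
    """Scalar versions of the listed statistics for a single covariate.

    x: n covariate values, a: n first-derivative values,
    w: n x n matrix standing for B - A (all hypothetical inputs).
    """
    n = len(x)
    score = sum(x[i] * a[i] for i in range(n))              # X^T a
    info = sum(x[i] * w[i][j] * x[j]                        # X^T (B - A) X
               for i in range(n) for j in range(n))
    m = 1.0 / info                                          # M
    beta = m * score                                        # estimate M X^T a
    q = beta * info * beta                                  # chi-squared statistic
    se = math.sqrt(m)                                       # standard error
    z = beta / se                                           # z-statistic
    return {"score": score, "score_var": info, "beta": beta,
            "beta_var": m, "chi2": q, "std_err": se, "z": z}


# Made-up inputs for n = 3 observations.
x_col = [0.0, 1.0, 2.0]
a_vec = [-0.5, 0.1, 0.4]
w_mat = [[0.5, -0.1, -0.1],
         [-0.1, 0.5, -0.1],
         [-0.1, -0.1, 0.5]]
stats = one_covariate_statistics(x_col, a_vec, w_mat)
```

With a single covariate the $χ^{2}$ statistic equals the square of the $z$-statistic, which gives a quick consistency check on the formulas.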

In many situations, more than one sample of observations will be available. In this case we assume the model,

$$h_{k}(Y_{k}) = X_{k}^{T} β + e_{k}, \quad k = 1, 2, \dots, ns,$$

where $ns$ is the number of samples. In an obvious manner, $Y_{k}$ and $X_{k}$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation $h_{k}$ can be assumed different for each sample since observations are ranked within the sample.

The earlier analysis can be extended to give a combined estimate of $β$ as $\hat{β} = D d$, where

$$D^{- 1} = \sum_{k = 1}^{ns} X_{k}^{T} (B_{k} - A_{k}) X_{k}$$

and

$$d = \sum_{k = 1}^{ns} X_{k}^{T} a_{k},$$

with $a_{k}$, $B_{k}$ and $A_{k}$ defined as $a$, $B$ and $A$ above but for the $k$th sample.

The remaining statistics are calculated as for the one sample case.
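The pooling across samples amounts to summing the per-sample score vectors and the per-sample matrices $X_{k}^{T} (B_{k} - A_{k}) X_{k}$ before inverting. A minimal sketch for the single-covariate case, where both quantities are scalars, is shown below; the function name `combine_samples` and the per-sample numbers are hypothetical.

```python
def combine_samples(per_sample):
    """Pool single-covariate samples.

    per_sample: list of (score_k, info_k) pairs, where score_k stands for
    X_k^T a_k and info_k for X_k^T (B_k - A_k) X_k (scalars when p = 1).
    Returns the combined estimate D * d and its variance D.
    """
    d = sum(score for score, _ in per_sample)        # d = sum_k X_k^T a_k
    d_inv = sum(info for _, info in per_sample)      # D^{-1} = sum_k X_k^T (B_k - A_k) X_k
    big_d = 1.0 / d_inv
    return big_d * d, big_d


# Made-up per-sample values for two samples.
beta_hat, beta_var = combine_samples([(0.9, 2.1), (0.3, 0.9)])
```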

References

Kalbfleisch, J D and Prentice, R L, 1980, The Statistical Analysis of Failure Time Data, Wiley

Pettitt, A N, 1982, Inference for the linear model using a likelihood based on ranks, J. Roy. Statist. Soc. Ser. B (44), 234–243

Pettitt, A N, 1983, Approximate methods using ranks for regression with censored data, Biometrika (70), 121–132
