naginterfaces.library.surviv.coxmodel

naginterfaces.library.surviv.coxmodel(ns, z, isz, t, ic, isi, b, ndmax, omega=None, tol=1.1102230246251565e-14, maxit=1000, iprint=0, io_manager=None)

coxmodel returns parameter estimates and other statistics that are associated with the Cox proportional hazards model for fixed covariates.

For full information please refer to the NAG Library document for g12ba

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g12/g12baf.html

Parameters

nsint

The number of strata. If $n s > 0$ then the stratum for each observation must be supplied in $i s i$ .

zfloat, array-like, shape $(n, m)$

The $i$ th row must contain the covariates which are associated with the $i$ th failure time given in $t$ .

iszint, array-like, shape $(m)$

Indicates which subset of covariates is to be included in the model.

$i s z [j - 1] \geq 1$

The $j$ th covariate is included in the model.

$i s z [j - 1] = 0$

The $j$ th covariate is excluded from the model and not referenced.

tfloat, array-like, shape $(n)$

The vector of $n$ failure or censoring times.

icint, array-like, shape $(n)$

The status of the individual at time $t$ given in $t$ .

$i c [i - 1] = 0$

The $i$ th individual has failed at time $t [i - 1]$ .

$i c [i - 1] = 1$

The $i$ th individual has been censored at time $t [i - 1]$ .

isiint, array-like, shape $(:)$

Note: the required length for this argument is determined as follows: if $n s > 0$ : $n$ ; otherwise: $1$ .

If $n s > 0$ , the stratum indicators which also allow data points to be excluded from the analysis.

If $n s = 0$ , $i s i$ is not referenced.

$i s i [i - 1] = k$

The $i$ th data point is in the $k$ th stratum, where $k = 1, 2, \dots, n s$ .

$i s i [i - 1] = 0$

The $i$ th data point is omitted from the analysis.

bfloat, array-like, shape $(ip)$

Initial estimates of the covariate coefficient parameters $β$ . $b [j - 1]$ must contain the initial estimate of the coefficient of the covariate in $z$ corresponding to the $j$ th nonzero value of $i s z$ .

Suggested value: in many cases an initial value of zero for $b [j - 1]$ may be used. For other suggestions see Further Comments.
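The relationship between $i s z$ and the length of $b$ can be sketched in plain Python. The helper name `initial_coefficients` is hypothetical and not part of the library; it only counts the positive entries of $i s z$ to obtain $ip$ and builds a zero starting vector of that length, following the suggested value above.

```python
def initial_coefficients(isz, start=0.0):
    """Return (ip, b) for a covariate-selection vector isz (hypothetical helper)."""
    if any(flag < 0 for flag in isz):
        raise ValueError("every isz entry must be >= 0")
    # 0-based column indices of the covariates that enter the model.
    included = [j for j, flag in enumerate(isz) if flag > 0]
    ip = len(included)
    if ip < 1:
        raise ValueError("at least one covariate must be included")
    # One starting value per included covariate, in column order.
    return ip, [float(start)] * ip


# Columns 1 and 3 of z are included, column 2 is excluded.
ip, b0 = initial_coefficients([1, 0, 1])
```

Here `b0[1]` would be the starting value for the third column of $z$, because that column corresponds to the second nonzero entry of $i s z$.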

ndmaxint

The dimension of the array $t p$ .

The first dimension of the array $s u r$ .
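One way to choose $n d m a x$ is to count the distinct failure times before calling the routine. The sketch below is a plain-Python illustration with a hypothetical helper name. It assumes that distinct failure times are pooled across strata and that observations with $i s i [i - 1] = 0$ do not contribute; both are assumptions about how the routine counts, not statements taken from the library.

```python
def count_distinct_failure_times(t, ic, isi=None):
    """Count distinct times among uncensored, included observations."""
    failure_times = set()
    for i, (time, status) in enumerate(zip(t, ic)):
        if isi is not None and isi[i] == 0:
            continue  # observation omitted from the analysis
        if status == 0:  # ic == 0 marks a failure, ic == 1 a censoring
            failure_times.add(time)
    return len(failure_times)


t = [5.0, 8.0, 8.0, 12.0, 15.0]
ic = [0, 0, 1, 0, 1]
ndmax = count_distinct_failure_times(t, ic)
ndmax_with_omission = count_distinct_failure_times(t, ic, isi=[1, 0, 1, 1, 1])
```

Since there can never be more distinct failure times than observations, passing $n$ is a simpler choice when the extra storage is acceptable.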

omegaNone or float, array-like, shape $(n)$ , optional

If $o m e g a$ is not None, the offset, $ω_{i}$ , for $i = 1, 2, \dots, n$ . Otherwise $o m e g a$ is not referenced.

tolfloat, optional

Indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than $t o l \times (1.0 + C u r r e n t D e v i a n c e)$ . This corresponds approximately to an absolute precision if the deviance is small and a relative precision if the deviance is large.
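The convergence rule described above can be written as a one-line test. This is only an illustration of the stated inequality; the function name is hypothetical and the routine's internal check may differ in detail.

```python
def has_converged(previous_deviance, current_deviance,
                  tol=1.1102230246251565e-14):
    """Apply the documented rule: decrease < tol * (1.0 + current deviance)."""
    decrease = previous_deviance - current_deviance
    return decrease < tol * (1.0 + current_deviance)


# Large deviance: the threshold scales with the deviance (relative precision).
large_ok = has_converged(100.0, 100.0 - 1e-5, tol=1e-6)
large_not_ok = has_converged(100.0, 100.0 - 1e-3, tol=1e-6)
# Small deviance: the threshold is close to tol itself (absolute precision).
small_not_ok = has_converged(0.002, 0.001, tol=1e-6)
```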

maxitint, optional

The maximum number of iterations to be used for computing the estimates. If $m a x i t$ is set to $0$ then the standard errors, score functions, variance-covariance matrix and the survival function are computed for the input value of $β$ in $b$ but $β$ is not updated.

iprintint, optional

Indicates if the printing of information on the iterations is required.

$i p r i n t \leq 0$

No printing.

$i p r i n t \geq 1$

The deviance and the current estimates are printed every $i p r i n t$ iterations. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see FileObjManager).

io_managerFileObjManager, optional

Manager for I/O in this routine.

Returns

devfloat

The deviance, that is $- 2 \times$ (maximized log marginal likelihood).
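For intuition, the deviance can be evaluated directly from the approximate marginal likelihood given in the Notes. The plain-Python sketch below handles a single stratum and reads the offset term in the numerator as the sum of the offsets of the individuals failing at each distinct time; that reading, and the function name `cox_deviance`, are assumptions. It evaluates the deviance at whatever `beta` is supplied and does not maximize anything.

```python
import math


def cox_deviance(t, ic, z, beta, omega=None):
    """Return -2 * log marginal likelihood at beta (single stratum)."""
    n = len(t)
    omega = omega if omega is not None else [0.0] * n
    # Linear predictor z_l^T beta + omega_l for each individual.
    eta = [sum(x * b for x, b in zip(z[l], beta)) + omega[l] for l in range(n)]
    log_lik = 0.0
    for time in sorted({t[l] for l in range(n) if ic[l] == 0}):
        failures = [l for l in range(n) if ic[l] == 0 and t[l] == time]
        risk_set = [l for l in range(n) if t[l] >= time]
        log_denominator = math.log(sum(math.exp(eta[l]) for l in risk_set))
        log_lik += sum(eta[l] for l in failures) - len(failures) * log_denominator
    return -2.0 * log_lik


# Three failures, no covariate effect: L = 1/3 * 1/2 * 1/1.
dev0 = cox_deviance([1.0, 2.0, 3.0], [0, 0, 0], [[0.0], [0.0], [0.0]], [0.0])
```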

bfloat, ndarray, shape $(ip)$

$b [j - 1]$ contains the estimate $\hat{β}_{i}$ , the coefficient of the covariate stored in the $i$ th column of $z$ where $i$ is the $j$ th nonzero value in the array $i s z$ .

sefloat, ndarray, shape $(ip)$

$s e [j - 1]$ is the asymptotic standard error of the estimate contained in $b [j - 1]$ and score function in $s c [j - 1]$ , for $j = 1, 2, \dots, ip$ .

scfloat, ndarray, shape $(ip)$

$s c [j - 1]$ is the value of the score function, $U_{j} (β)$ , for the estimate contained in $b [j - 1]$ .

covfloat, ndarray, shape $(ip \times (ip + 1) / 2)$

The variance-covariance matrix of the parameter estimates in $b$ stored in packed form by column, i.e., the covariance between the parameter estimates given in $b [i - 1]$ and $b [j - 1]$ , $j \geq i$ , is stored in $c o v [j (j - 1) / 2 + i - 1]$ .
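The packed layout can be expanded into a full symmetric matrix with the index formula above. The helper below is a plain-Python sketch with a hypothetical name; it uses the 1-based indices $i$ and $j$ from the description and converts to 0-based positions only when writing into the result.

```python
def unpack_covariance(cov, ip):
    """Expand column-packed upper-triangle storage into a full ip x ip matrix."""
    if len(cov) != ip * (ip + 1) // 2:
        raise ValueError("cov must have ip * (ip + 1) / 2 elements")
    full = [[0.0] * ip for _ in range(ip)]
    for j in range(1, ip + 1):          # 1-based column index
        for i in range(1, j + 1):       # 1-based row index, i <= j
            value = cov[j * (j - 1) // 2 + i - 1]
            full[i - 1][j - 1] = value
            full[j - 1][i - 1] = value  # symmetric counterpart
    return full


packed = [4.0, 1.0, 9.0, 0.5, 2.0, 16.0]
full = unpack_covariance(packed, 3)
# Square roots of the diagonal should correspond to the standard errors.
std_errors = [full[k][k] ** 0.5 for k in range(3)]
```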

resfloat, ndarray, shape $(n)$

The residuals, $r (t_{l})$ , for $l = 1, 2, \dots, n$ .

ndint

The number of distinct failure times.

tpfloat, ndarray, shape $(n d m a x)$

$t p [i - 1]$ contains the $i$ th distinct failure time, for $i = 1, 2, \dots, n d$ .

surfloat, ndarray, shape $(n d m a x, max (n s, 1))$

If $n s = 0$ , $s u r [i - 1, 0]$ contains the estimated survival function for the $i$ th distinct failure time.

If $n s > 0$ , $s u r [i - 1, k - 1]$ contains the estimated survival function for the $i$ th distinct failure time in the $k$ th stratum.
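To show what the survival function and residual outputs represent, the following plain-Python sketch evaluates the estimators described in the Notes for a single stratum and a given coefficient vector. It is an illustration only: the function name is hypothetical, and it takes the number of failures in each term of the cumulative hazard to be the count at the distinct failure time being summed over, which is an interpretation of the formula rather than a description of the library's code.

```python
import math


def baseline_survivor_and_residuals(t, ic, z, beta, omega=None):
    """Return (tp, sur, res) for one stratum at the supplied beta."""
    n = len(t)
    omega = omega if omega is not None else [0.0] * n
    # Relative risk exp(z_l^T beta + omega_l) for each individual.
    risk = [math.exp(sum(x * b for x, b in zip(z[l], beta)) + omega[l])
            for l in range(n)]
    tp = sorted({t[l] for l in range(n) if ic[l] == 0})
    cum_hazard, running = [], 0.0
    for time in tp:
        d = sum(1 for l in range(n) if ic[l] == 0 and t[l] == time)
        at_risk = sum(risk[l] for l in range(n) if t[l] >= time)
        running += d / at_risk
        cum_hazard.append(running)
    sur = [math.exp(-h) for h in cum_hazard]
    res = []
    for l in range(n):
        # Cumulative hazard at the last distinct failure time <= t_l (0 if none).
        h = 0.0
        for time, value in zip(tp, cum_hazard):
            if time <= t[l]:
                h = value
        res.append(h * risk[l])
    return tp, sur, res


tp, sur, res = baseline_survivor_and_residuals(
    [1.0, 2.0, 3.0], [0, 1, 0], [[0.0], [0.0], [0.0]], [0.0])
```

With no covariate effect, the cumulative hazard is 1/3 after the first failure and 4/3 after the second, so the censored individual at time 2 receives a residual of 1/3.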

Raises

NagValueError

(errno $1$ )

On entry, $ip = ⟨ v a l u e ⟩$ .

Constraint: $ip \geq 1$ .

(errno $1$ )

On entry, $t o l = ⟨ v a l u e ⟩$ .

Constraint: $t o l \geq 10 \times$ machine precision.

(errno $1$ )

On entry, $m a x i t = ⟨ v a l u e ⟩$ .

Constraint: $m a x i t \geq 0$ .

(errno $1$ )

On entry, $n s = ⟨ v a l u e ⟩$ .

Constraint: $n s \geq 0$ .

(errno $1$ )

On entry, $n = ⟨ v a l u e ⟩$ .

Constraint: $n \geq 2$ .

(errno $1$ )

On entry, $m = ⟨ v a l u e ⟩$ .

Constraint: $m \geq 1$ .

(errno $2$ )

On entry, there are not $ip$ values of $i s z > 0$ .

(errno $2$ )

On entry, $i = ⟨ v a l u e ⟩$ and $i s z [i - 1] = ⟨ v a l u e ⟩$ .

Constraint: $i s z [i - 1] \geq 0$ .

(errno $2$ )

On entry too few observations included in model.

(errno $2$ )

On entry, $i = ⟨ v a l u e ⟩$ , $i s i [i - 1] = ⟨ v a l u e ⟩$ and $n s = ⟨ v a l u e ⟩$ .

Constraint: $0 \leq i s i [i - 1] \leq n s$ .

(errno $2$ )

On entry, $i = ⟨ v a l u e ⟩$ and $i c [i - 1] = ⟨ v a l u e ⟩$ .

Constraint: $i c [i - 1] = 0$ or $1$ .

(errno $2$ )

On entry, $n d m a x = ⟨ v a l u e ⟩$ and minimum value for $n d m a x = ⟨ v a l u e ⟩$ .

Constraint: $n d m a x \geq$ the number of distinct failure times.

(errno $2$ )

All observations are censored.

(errno $3$ )

The matrix of second partial derivative is singular.

(errno $4$ )

Overflow has been detected in the calculations.

Warns

NagAlgorithmicWarning

(errno $5$ ): Convergence not achieved in $⟨ v a l u e ⟩$ iterations.
(errno $6$ ): Too many step halvings required.
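A call to the routine might be wrapped as in the sketch below, which fits an unstratified model using every covariate column. The positional argument order follows the signature at the top of this page. Everything else is an assumption to be checked against the installed package: that the exception class is importable as `naginterfaces.base.utils.NagValueError`, that algorithmic warnings are issued through the standard `warnings` module, and that nested lists are accepted as array-like input. The helper names `build_arguments` and `fit_cox` are hypothetical.

```python
import warnings


def build_arguments(z, t):
    """Assemble the bookkeeping arguments for an unstratified fit."""
    m = len(z[0])
    return {
        "ns": 0,            # no strata
        "isz": [1] * m,     # include every covariate
        "isi": [0],         # not referenced when ns == 0; length 1 is required
        "b": [0.0] * m,     # suggested starting values
        "ndmax": len(t),    # n is an upper bound on distinct failure times
    }


def fit_cox(z, t, ic, maxit=1000):
    """Call coxmodel, returning (result, list of warning messages)."""
    # Imported lazily so this module can be loaded without the library.
    from naginterfaces.base import utils      # assumed location of exceptions
    from naginterfaces.library import surviv

    a = build_arguments(z, t)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            result = surviv.coxmodel(
                a["ns"], z, a["isz"], t, ic, a["isi"], a["b"], a["ndmax"],
                maxit=maxit,
            )
        except utils.NagValueError as exc:
            raise ValueError(f"coxmodel rejected the input: {exc}") from exc
    return result, [str(w.message) for w in caught]


args = build_arguments([[1.0, 2.0], [0.0, 1.0]], [3.0, 4.0])
```

Recording warnings rather than letting them print makes it easier to decide afterwards whether a non-converged fit (errno 5 or 6) should be retried with different starting values or a larger $m a x i t$ .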

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

The proportional hazards model relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right-censored, that is, the exact time to failure is not known, only that it is greater than a known time.

Let $t_{i}$ , for $i = 1, 2, \dots, n$ , be the failure time or censored time for the $i$ th observation with the vector of $p$ covariates $z_{i}$ . It is assumed that censoring and failure mechanisms are independent. The hazard function, $λ (t, z)$ , is the probability that an individual with covariates $z$ fails at time $t$ given that the individual survived up to time $t$ . In the Cox proportional hazards model (see Cox (1972)) $λ (t, z)$ is of the form:

$$λ(t, z) = λ_{0}(t) \exp(z^{T} β + ω)$$

where $λ_{0}$ is the base-line hazard function, an unspecified function of time, $β$ is a vector of unknown parameters and $ω$ is a known offset.
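A consequence of this form is that the ratio of the hazards of two individuals does not depend on time, because the base-line hazard cancels. The short sketch below computes that ratio; the function name is hypothetical and the coefficient value is chosen only for illustration.

```python
import math


def hazard_ratio(z_a, z_b, beta, omega_a=0.0, omega_b=0.0):
    """Ratio of hazards for covariate vectors z_a and z_b; lambda_0(t) cancels."""
    linear_a = sum(x * b for x, b in zip(z_a, beta)) + omega_a
    linear_b = sum(x * b for x, b in zip(z_b, beta)) + omega_b
    return math.exp(linear_a - linear_b)


# A coefficient of ln(2) on a 0/1 covariate doubles the hazard.
ratio = hazard_ratio([1.0], [0.0], [math.log(2.0)])
```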

Assuming there are ties in the failure times giving $n_{d} < n$ distinct failure times, $t_{(1)} < \dots < t_{(n_{d})}$ such that $d_{i}$ individuals fail at $t_{(i)}$ , it follows that the marginal likelihood for $β$ is well approximated (see Kalbfleisch and Prentice (1980)) by:

$$L = \prod_{i = 1}^{n_{d}} \frac{\exp(s_{i}^{T} β + ω_{i})}{\left[\sum_{l \in R(t_{(i)})} \exp(z_{l}^{T} β + ω_{l})\right]^{d_{i}}} \tag{1}$$

where $s_{i}$ is the sum of the covariates of individuals observed to fail at $t_{(i)}$ and $R (t_{(i)})$ is the set of individuals at risk just prior to $t_{(i)}$ , that is, it is all individuals that fail or are censored at time $t_{(i)}$ along with all individuals that survive beyond time $t_{(i)}$ . The maximum likelihood estimates (MLEs) of $β$ , given by $\hat{β}$ , are obtained by maximizing (1) using a Newton–Raphson iteration technique that includes step halving and utilizes the first and second partial derivatives of (1) which are given by equations (2) and (3) below:

$$U_{j}(β) = \frac{\partial \ln(L)}{\partial β_{j}} = \sum_{i = 1}^{n_{d}} \left[s_{ji} - d_{i} α_{ji}(β)\right] = 0 \tag{2}$$

for $j = 1, 2, \dots, p$ , where $s_{j i}$ is the $j$ th element in the vector $s_{i}$ and

$$α_{ji}(β) = \frac{\sum_{l \in R(t_{(i)})} z_{jl} \exp(z_{l}^{T} β + ω_{l})}{\sum_{l \in R(t_{(i)})} \exp(z_{l}^{T} β + ω_{l})}.$$

Similarly,

$$I_{hj}(β) = -\frac{\partial^{2} \ln(L)}{\partial β_{h} \partial β_{j}} = \sum_{i = 1}^{n_{d}} d_{i} γ_{hji} \tag{3}$$

where

$$γ_{hji} = \frac{\sum_{l \in R(t_{(i)})} z_{hl} z_{jl} \exp(z_{l}^{T} β + ω_{l})}{\sum_{l \in R(t_{(i)})} \exp(z_{l}^{T} β + ω_{l})} - α_{hi}(β) α_{ji}(β), \quad h, j = 1, \dots, p.$$

$U_{j} (β)$ is the $j$ th component of a score vector and $I_{h j} (β)$ is the $(h, j)$ element of the observed information matrix $I (β)$ whose inverse $I {(β)}^{- 1} = {[I_{h j} (β)]}^{- 1}$ gives the variance-covariance matrix of $β$ .
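The score and information formulas can be exercised on a toy problem. The sketch below treats a single covariate with no offset and no strata, and applies plain Newton–Raphson steps until the step size is small. It omits the step halving that the routine performs and uses a different stopping rule, so it illustrates the equations rather than the library's algorithm; both function names are hypothetical.

```python
import math


def score_and_information(t, ic, z, beta):
    """Return (U(beta), I(beta)) for one covariate, no offset, one stratum."""
    n = len(t)
    score = info = 0.0
    for time in sorted({t[l] for l in range(n) if ic[l] == 0}):
        failures = [l for l in range(n) if ic[l] == 0 and t[l] == time]
        risk_set = [l for l in range(n) if t[l] >= time]
        weights = [math.exp(z[l] * beta) for l in risk_set]
        total = sum(weights)
        # Weighted mean and mean square of the covariate over the risk set.
        alpha = sum(z[l] * w for l, w in zip(risk_set, weights)) / total
        second = sum(z[l] ** 2 * w for l, w in zip(risk_set, weights)) / total
        d = len(failures)
        score += sum(z[l] for l in failures) - d * alpha
        info += d * (second - alpha ** 2)
    return score, info


def newton_raphson(t, ic, z, beta=0.0, tol=1e-10, maxit=50):
    """Plain Newton-Raphson on the score equation (no step halving)."""
    for _ in range(maxit):
        score, info = score_and_information(t, ic, z, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


t = [1.0, 2.0, 3.0, 4.0]
ic = [0, 0, 0, 0]
z = [1.0, 0.0, 1.0, 0.0]
beta_hat = newton_raphson(t, ic, z)
final_score, final_info = score_and_information(t, ic, z, beta_hat)
std_error = 1.0 / math.sqrt(final_info)
```

For this data set the score equation reduces to a quadratic in $\exp(β)$ whose positive root is $(1 + \sqrt{17}) / 2$ , which gives a closed-form value to compare against.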

It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time then one or more of the $β_{j}$ ’s will be infinite.

If $λ_{0} (t)$ varies across $ν$ strata, where the number of individuals in the $k$ th stratum is $n_{k}$ , for $k = 1, 2, \dots, ν$ with $n = \sum_{k = 1}^{ν} n_{k}$ , then rather than maximizing (1) to obtain $\hat{β}$ , the following marginal likelihood is maximized:

$$L = \prod_{k = 1}^{ν} L_{k},$$

where $L_{k}$ is the contribution to the likelihood for the $n_{k}$ observations in the $k$ th stratum treated as a single sample in (1). When strata are included the covariate coefficients are constant across strata but each stratum has its own base-line hazard function $λ_{0}$ .

The base-line survivor function associated with a failure time $t_{(i)}$ is estimated as $\exp(-\hat{H}(t_{(i)}))$ , where

$$\hat{H}(t_{(i)}) = \sum_{t_{(j)} \leq t_{(i)}} \left( \frac{d_{j}}{\sum_{l \in R(t_{(j)})} \exp(z_{l}^{T} \hat{β} + ω_{l})} \right),$$

where $d_{j}$ is the number of failures at time $t_{(j)}$ . The residual for the $l$ th observation is computed as:

$$r(t_{l}) = \hat{H}(t_{l}) \exp(z_{l}^{T} \hat{β} + ω_{l})$$

where $\hat{H}(t_{l}) = \hat{H}(t_{(i)})$ for $t_{(i)} \leq t_{l} < t_{(i + 1)}$ . The deviance is defined as $- 2 \times$ (logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with the appropriate $χ^{2}$ -distribution; or the asymptotic normality of the parameter estimates can be used to form $z$ tests, either by dividing the estimates by their standard errors or by using the score function for the model under the null hypothesis.
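Both kinds of test can be sketched with the standard library alone. The first helper forms a $z$ statistic from an estimate and its standard error and converts it to a two-sided p-value under the normal approximation. The second compares two nested models that differ by a single covariate, so the deviance difference is referred to a $χ^{2}$ -distribution with one degree of freedom. The function names and the numerical inputs are illustrative assumptions, not output from the routine.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) under asymptotic normality."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def deviance_test_1df(dev_reduced, dev_full):
    """Return (difference, p-value) for nested models differing by one term."""
    diff = max(dev_reduced - dev_full, 0.0)
    # For one degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)).
    p_value = math.erfc(math.sqrt(diff / 2.0))
    return diff, p_value


z_stat, p_wald = wald_test(1.96, 1.0)
diff, p_dev = deviance_test_1df(13.8415, 10.0)
```

For comparisons involving more than one extra covariate, a general $χ^{2}$ tail probability is needed, which the standard library does not provide.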

References

Cox, D R, 1972, Regression models and life-tables (with discussion), J. Roy. Statist. Soc. Ser. B (34), 187–220

Gross, A J and Clark, V A, 1975, Survival Distributions: Reliability Applications in the Biomedical Sciences, Wiley

Kalbfleisch, J D and Prentice, R L, 1980, The Statistical Analysis of Failure Time Data, Wiley
