PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_surviv_coxmodel (g12ba)
Purpose
nag_surviv_coxmodel (g12ba) returns parameter estimates and other statistics that are associated with the Cox proportional hazards model for fixed covariates.
Syntax
[
dev,
b,
se,
sc,
covar,
res,
nd,
tp,
sur,
ifail] = g12ba(
offset,
ns,
z,
isz,
t,
ic,
omega,
isi,
b,
ndmax,
tol,
maxit,
iprint, 'n',
n, 'm',
m, 'ip',
ip)
[
dev,
b,
se,
sc,
covar,
res,
nd,
tp,
sur,
ifail] = nag_surviv_coxmodel(
offset,
ns,
z,
isz,
t,
ic,
omega,
isi,
b,
ndmax,
tol,
maxit,
iprint, 'n',
n, 'm',
m, 'ip',
ip)
Description
The proportional hazard model relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right-censored, that is the exact time to failure is not known, only that it is greater than a known time.
Let
, for
, be the failure time or censored time for the
th observation with the vector of
covariates
. It is assumed that censoring and failure mechanisms are independent. The hazard function,
, is the probability that an individual with covariates
fails at time
given that the individual survived up to time
. In the Cox proportional hazards model (see
Cox (1972))
is of the form:
where
is the base-line hazard function, an unspecified function of time,
is a vector of unknown arguments and
is a known offset.
Assuming there are ties in the failure times giving
distinct failure times,
such that
individuals fail at
, it follows that the marginal likelihood for
is well approximated (see
Kalbfleisch and Prentice (1980)) by:
where
is the sum of the covariates of individuals observed to fail at
and
is the set of individuals at risk just prior to
, that is, it is all individuals that fail or are censored at time
along with all individuals that survive beyond time
. The maximum likelihood estimates (MLEs) of
, given by
, are obtained by maximizing
(1) using a Newton–Raphson iteration technique that includes step halving and utilizes the first and second partial derivatives of
(1) which are given by equations
(2) and
(3) below:
for
, where
is the
th element in the vector
and
Similarly,
where
is the
th component of a score vector and
is the
element of the observed information matrix
whose inverse
gives the variance-covariance matrix of
.
It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time then one or more of the 's will be infinite.
If
varies across
strata, where the number of individuals in the
th stratum is
, for
with
, then rather than maximizing
(1) to obtain
, the following marginal likelihood is maximized:
where
is the contribution to likelihood for the
observations in the
th stratum treated as a single sample in
(1). When strata are included the covariate coefficients are constant across strata but there is a different base-line hazard function
.
The base-line survivor function associated with a failure time
, is estimated as
, where
where
is the number of failures at time
. The residual for the
th observation is computed as:
where
. The deviance is defined as
(logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with the appropriate
-distribution; or, the asymptotic normality of the parameter estimates can be used to form
tests by dividing the estimates by their standard errors or the score function for the model under the null hypothesis can be used to form
tests.
References
Cox D R (1972) Regression models in life tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220
Gross A J and Clark V A (1975) Survival Distributions: Reliability Applications in the Biomedical Sciences Wiley
Kalbfleisch J D and Prentice R L (1980) The Statistical Analysis of Failure Time Data Wiley
Parameters
Compulsory Input Parameters
- 1:
– string (length ≥ 1)
-
Indicates if an offset is to be used.
- An offset must be included in omega.
- No offset is included in the model.
Constraint:
or .
- 2:
– int64int32nag_int scalar
-
The number of strata. If
then the stratum for each observation must be supplied in
isi.
Constraint:
.
- 3:
– double array
-
ldz, the first dimension of the array, must satisfy the constraint
.
The
th row must contain the covariates which are associated with the
th failure time given in
t.
- 4:
– int64int32nag_int array
-
Indicates which subset of covariates is to be included in the model.
- The th covariate is included in the model.
- The th covariate is excluded from the model and not referenced.
Constraint:
and at least one and at most
elements of
isz must be nonzero where
is the number of observations excluding any with zero value of
isi.
- 5:
– double array
-
The vector of failure censoring times.
- 6:
– int64int32nag_int array
-
The status of the individual at time
given in
t.
- The th individual has failed at time .
- The th individual has been censored at time .
Constraint:
or , for .
- 7:
– double array
-
The dimension of the array
omega
must be at least
if
, and at least
otherwise
If
, the offset,
, for
. Otherwise
omega is not referenced.
- 8:
– int64int32nag_int array
-
The dimension of the array
isi
must be at least
if
, and at least
otherwise
If
, the stratum indicators which also allow data points to be excluded from the analysis.
If
,
isi is not referenced.
- The th data point is in the th stratum, where .
- The th data point is omitted from the analysis.
Constraint:
if
,
and more than
ip values of
, for
.
- 9:
– double array
Suggested value:
in many cases an initial value of zero for
may be used. For other suggestions see
Further Comments.
Initial estimates of the covariate coefficient arguments
.
must contain the initial estimate of the coefficient of the covariate in
z corresponding to the
th nonzero value of
isz.
- 10:
– int64int32nag_int scalar
-
The dimension of the array
tp and the first dimension of the array
sur.
Constraint:
.
- 11:
– double scalar
-
Indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than . This corresponds approximately to an absolute precision if the deviance is small and a relative precision if the deviance is large.
Constraint:
.
- 12:
– int64int32nag_int scalar
-
The maximum number of iterations to be used for computing the estimates. If
maxit is set to
then the standard errors, score functions, variance-covariance matrix and the survival function are computed for the input value of
in
b but
is not updated.
Constraint:
.
- 13:
– int64int32nag_int scalar
-
Indicates if the printing of information on the iterations is required.
- No printing.
- The deviance and the current estimates are printed every iprint iterations. When printing occurs the output is directed to the current advisory message unit (see nag_file_set_unit_advisory (x04ab)).
Optional Input Parameters
- 1:
– int64int32nag_int scalar
-
Default:
the dimension of the arrays
t,
ic and the first dimension of the array
z. (An error is raised if these dimensions are not equal.)
, the number of data points.
Constraint:
.
- 2:
– int64int32nag_int scalar
-
Default:
the dimension of the array
isz and the second dimension of the array
z. (An error is raised if these dimensions are not equal.)
The number of covariates in array
z.
Constraint:
.
- 3:
– int64int32nag_int scalar
-
Default:
the dimension of the array
b.
The number of covariates included in the model as indicated by
isz.
Constraints:
- ;
- .
Output Parameters
- 1:
– double scalar
-
The deviance, that is (maximized log marginal likelihood).
- 2:
– double array
Suggested value:
in many cases an initial value of zero for
may be used. For other suggestions see
Further Comments.
contains the estimate
, the coefficient of the covariate stored in the
th column of
z where
is the
th nonzero value in the array
isz.
- 3:
– double array
-
is the asymptotic standard error of the estimate contained in and score function in , for .
- 4:
– double array
-
is the value of the score function, , for the estimate contained in .
- 5:
– double array
-
The variance-covariance matrix of the parameter estimates in
b stored in packed form by column, i.e., the covariance between the parameter estimates given in
and
,
, is stored in
.
- 6:
– double array
-
The residuals,
, for .
- 7:
– int64int32nag_int scalar
-
The number of distinct failure times.
- 8:
– double array
-
contains the th distinct failure time, for .
- 9:
– double array
-
The second dimension of the array
sur will be
.
If
,
contains the estimated survival function for the
th distinct failure time.
If , contains the estimated survival function for the th distinct failure time in the th stratum.
- 10:
– int64int32nag_int scalar
unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
-
-
On entry, | or , |
or | , |
or | , |
or | , |
or | , |
or | , |
or | , |
or | . |
-
-
On entry, | for some , |
or | the value of ip is incompatible with isz, |
or | or . |
or | or , |
or | number of values of is greater than or equal to , the number of observations excluding any with , |
or | all observations are censored, i.e., for all , |
or | ndmax is too small. |
-
-
The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.
-
-
Overflow has been detected. Try using different starting values.
- W
-
Convergence has not been achieved in
maxit iterations. The progress toward convergence can be examined by using a nonzero value of
iprint. Any non-convergence may be due to a linear combination of covariates being monotonic with time.
Full results are returned.
- W
-
In the current iteration step halvings have been performed without decreasing the deviance from the previous iteration. Convergence is assumed.
-
An unexpected error has been triggered by this routine. Please
contact
NAG.
-
Your licence key may have expired or may not have been installed correctly.
-
Dynamic memory allocation failed.
Accuracy
The accuracy is specified by
tol.
Further Comments
nag_surviv_coxmodel (g12ba) uses mean centering which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence.
If the initial estimates are poor then there may be a problem with overflow in calculating
or there may be non-convergence. Reasonable estimates can often be obtained by fitting an exponential model using
nag_correg_glm_poisson (g02gc).
Example
The data are the remission times for two groups of leukemia patients (see page 242 of
Gross and Clark (1975)). A dummy variable indicates which group they come from. An initial estimate is computed using the exponential model and then the Cox proportional hazard model is fitted and parameter estimates and the survival function are printed.
Open in the MATLAB editor:
g12ba_example
function g12ba_example
fprintf('g12ba example results\n\n');
n = 42;
m = 1;
z = zeros(n,m);
ic = zeros(n,1,'int64');
ic(31:end) = 1;
z(22:end) = 1;
isz = [int64(1)];
t = [ 1; 1; 2; 2; 3; 4; 4;
5; 5; 8; 8; 8; 8; 11;
11; 12; 12; 15; 17; 22; 23;
6; 6; 6; 7; 10; 13; 16;
22; 23; 6; 9; 10; 11; 17;
19; 20; 25; 32; 32; 34; 35];
offset = 'No-offset';
ns = int64(0);
omega = [0];
isi = [int64(0)];
b(1) = [0];
ndmax = int64(n);
tol = 5e-05;
maxit = int64(20);
iprint = int64(0);
[dev, b, se, sc, covar, res, nd, tp, sur, ifail] = ...
g12ba( ...
offset, ns, z, isz, t, ic, omega, isi, b, ndmax, tol, maxit, iprint);
ns = max(ns,1);
fprintf(' Parameter Estimate Standard Error\n\n');
ivar = [1:m]';
fprintf('%6d%18.4f%18.4f\n',[ivar b se]');
fprintf('\n Deviance = %13.3e\n\n', dev);
fprintf(' Time Survivor Function\n\n');
results = [tp(1:nd) sur(1:nd,1:ns)];
fprintf('%7.0f%16.4f\n', results');
g12ba example results
Parameter Estimate Standard Error
1 -1.5092 0.4096
Deviance = 1.728e+02
Time Survivor Function
1 0.9640
2 0.9264
3 0.9065
4 0.8661
5 0.8235
6 0.7566
7 0.7344
8 0.6506
10 0.6241
11 0.5724
12 0.5135
13 0.4785
15 0.4447
16 0.4078
17 0.3727
22 0.2859
23 0.1908
PDF version (NAG web site
, 64-bit version, 64-bit version)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015