NAG Toolbox: nag_nonpar_rank_regsn_censored (g08rb)
Purpose
nag_nonpar_rank_regsn_censored (g08rb) calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations when some of the observations may be right-censored.
Syntax
[prvr, irank, zin, eta, vapvec, parest, ifail] = g08rb(nv, y, x, icen, gamma, nmax, tol, 'ns', ns, 'ip', ip)
[prvr, irank, zin, eta, vapvec, parest, ifail] = nag_nonpar_rank_regsn_censored(nv, y, x, icen, gamma, nmax, tol, 'ns', ns, 'ip', ip)
Description
Analysis of data can be made by replacing observations by their ranks. The analysis produces inference for the regression model where the location parameters of the observations, $\theta_i$, for $i = 1, 2, \ldots, n$, are related by $\theta = X\beta$. Here $X$ is an $n$ by $p$ matrix of explanatory variables and $\beta$ is a vector of $p$ unknown regression parameters. The observations are replaced by their ranks and an approximation, based on Taylor's series expansion, made to the rank marginal likelihood. For details of the approximation see Pettitt (1982).
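An observation is said to be right-censored if we can only observe $Y_j^*$ with $Y_j^* \le Y_j$. We rank censored and uncensored observations as follows. Suppose we can observe $Y_j$, for $j = 1, 2, \ldots, n$, directly but $Y_j^*$, for $j = n+1, \ldots, q$ and $n \le q$, are censored on the right. We define the rank $r_j$ of $Y_j$, for $j = 1, 2, \ldots, n$, in the usual way; $r_j$ equals $i$ if and only if $Y_j$ is the $i$th smallest amongst the $Y_1, Y_2, \ldots, Y_n$. The right-censored $Y_j^*$, for $j = n+1, n+2, \ldots, q$, has rank $r_j$ if and only if $Y_j^*$ lies in the interval $[Y_{(r_j)}, Y_{(r_j+1)}]$, with $Y_{(0)} = -\infty$, $Y_{(n+1)} = +\infty$ and $Y_{(1)} < \cdots < Y_{(n)}$ the ordered $Y_j$, for $j = 1, 2, \ldots, n$.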
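The ranking rule described above can be sketched in MATLAB as follows. This is only an illustration of the definition, assuming there are no ties among the values; the data and the names `cens`, `u` and `r` are hypothetical, and the sketch is not the algorithm that g08rb uses internally.

```matlab
% Hypothetical data: cens(j) = 1 marks a right-censored value.
y    = [3.1 5.4 2.0 4.2 6.0 2.5];
cens = [0   0   0   1   0   1  ];

u = sort(y(cens == 0));      % ordered uncensored values Y_(1) < ... < Y_(n)
r = zeros(size(y));
for j = 1:numel(y)
    % Uncensored value: its position among the ordered uncensored values.
    % Censored value: the index i of the interval between Y_(i) and Y_(i+1)
    % that contains it, with Y_(0) = -Inf, so a censored value below Y_(1)
    % receives rank 0.
    r(j) = sum(u <= y(j));
end
disp(r)
```

Here the censored value 4.2 falls between the second and third ordered uncensored values and so shares rank 2 with the uncensored value 3.1.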
The distribution of the $Y$ is assumed to be of the following form. Let $F_L(y) = e^y / (1 + e^y)$, the logistic distribution function, and consider the distribution function $F_\gamma(y)$ defined by $1 - F_\gamma(y) = [1 - F_L(y)]^{1/\gamma}$. This distribution function can be thought of as either the distribution function of the minimum, $X_{1,\gamma}$, of a random sample of size $\gamma^{-1}$ from the logistic distribution, or, after a location shift of $\log\gamma$, as the distribution function of the logarithm of a random variable having the $F$-distribution with $2$ and $2\gamma^{-1}$ degrees of freedom. This family of generalized logistic distribution functions $[F_\gamma(\cdot);\ 0 \le \gamma < \infty]$ naturally links the symmetric logistic distribution ($\gamma = 1$) with the skew extreme value distribution ($\gamma \to 0$) and with the limiting negative exponential distribution ($\gamma \to \infty$). For this family explicit results are available for right-censored data. See Pettitt (1983) for details.
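As a small numerical illustration of this family, the sketch below assumes the definitions $F_L(y) = e^y/(1+e^y)$ and $1 - F_\gamma(y) = [1 - F_L(y)]^{1/\gamma}$, and checks two of the properties mentioned: $\gamma = 1$ gives the logistic distribution, and $\gamma = 1/m$ gives the distribution of the minimum of $m$ independent logistic variates. The handle names `FL` and `Fg` are illustrative and are not part of the toolbox.

```matlab
FL = @(y) exp(y) ./ (1 + exp(y));            % logistic distribution function
Fg = @(y, g) 1 - (1 - FL(y)).^(1 ./ g);      % generalized logistic, g = gamma > 0

y = linspace(-3, 3, 7);

% gamma = 1 recovers the logistic distribution function.
err_logistic = max(abs(Fg(y, 1) - FL(y)));

% gamma = 1/m: the minimum of m independent logistic variates has
% survival function (1 - F_L(y))^m.
m = 4;
err_minimum = max(abs((1 - Fg(y, 1/m)) - (1 - FL(y)).^m));

fprintf('%g %g\n', err_logistic, err_minimum);
```

Both differences should be at the level of rounding error.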
Let $l_R$ denote the logarithm of the rank marginal likelihood of the observations and define the $q \times 1$ vector $a$ by $a = l_R'(\theta = 0)$, and let the $q$ by $q$ diagonal matrix $B$ and $q$ by $q$ symmetric matrix $A$ be given by $B - A = -l_R''(\theta = 0)$. Then various statistics can be found from the analysis.
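(a) The score statistic $X^{\mathrm{T}} a$. This statistic is used to test the hypothesis $H_0: \beta = 0$ (see (e)).
(b) The estimated variance-covariance matrix of the score statistic in (a).
(c) The estimate $\hat{\beta}_R = M X^{\mathrm{T}} a$.
(d) The estimated variance-covariance matrix $M = (X^{\mathrm{T}}(B - A)X)^{-1}$ of the estimate $\hat{\beta}_R$.
(e) The $\chi^2$ statistic $Q = \hat{\beta}_R^{\mathrm{T}} M^{-1} \hat{\beta}_R = a^{\mathrm{T}} X (X^{\mathrm{T}}(B - A)X)^{-1} X^{\mathrm{T}} a$, used to test $H_0: \beta = 0$. Under $H_0$, $Q$ has an approximate $\chi^2$-distribution with $p$ degrees of freedom.
(f) The standard errors $M_{ii}^{1/2}$ of the estimates given in (c).
(g) Approximate $z$-statistics, i.e., $Z_i = \hat{\beta}_{R_i} / \mathrm{se}(\hat{\beta}_{R_i})$ for testing $H_0: \beta_i = 0$. For $i = 1, 2, \ldots, p$, $Z_i$ has an approximate $N(0,1)$ distribution.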
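The relationships between statistics (a) to (g) can be checked numerically for a single parameter. The sketch below takes as inputs the score statistic and its variance as printed in the Example section (rounded to three decimals) and assumes that the covariance matrix of the estimates is the inverse of the covariance matrix of the score statistic. The variable names are illustrative only.

```matlab
score = 4.584;              % (a) score statistic, taken from the Example output
V     = 7.653;              % (b) its variance, taken from the Example output

M    = inv(V);              % (d) covariance matrix of the estimates
beta = M * score;           % (c) parameter estimate
Q    = score' * M * score;  % (e) chi-squared statistic with p = 1 d.f.
se   = sqrt(diag(M));       % (f) standard errors
z    = beta ./ se;          % (g) approximate z-statistics

fprintf('%.3f %.3f %.3f %.4f %.4f\n', M, beta, Q, se, z);
```

Because the inputs are rounded, the results agree with the values printed in the Example section only to about three decimal places. The same expressions apply unchanged when `score` is a vector and `V` a matrix.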
In many situations, more than one sample of observations will be available. In this case we assume the model
$h_k(Y_k) = X_k \beta + e_k, \quad k = 1, 2, \ldots, \mathrm{ns},$
where ns is the number of samples. In an obvious manner, $Y_k$ and $X_k$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation $h_k$ can be assumed different for each sample since observations are ranked within the sample.
The earlier analysis can be extended to give a combined estimate of $\beta$ as $\hat{\beta} = D d$, where
$D^{-1} = \sum_{k=1}^{\mathrm{ns}} X_k^{\mathrm{T}} (B_k - A_k) X_k$
and
$d = \sum_{k=1}^{\mathrm{ns}} X_k^{\mathrm{T}} a_k,$
with $a_k$, $B_k$ and $A_k$ defined as $a$, $B$ and $A$ above but for the $k$th sample.
The remaining statistics are calculated as for the one sample case.
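The pooling across samples can be sketched as below, assuming the combined estimate is obtained by summing the per-sample terms $X_k^{\mathrm{T}}(B_k - A_k)X_k$ and $X_k^{\mathrm{T}} a_k$ and then solving the resulting linear system. All numerical values are arbitrary placeholders chosen for easy arithmetic; they are not produced by g08rb, which computes $a_k$, $B_k$ and $A_k$ internally from the ranks.

```matlab
% Placeholder per-sample quantities (illustrative only, not g08rb output).
X = {[0; 1; 1], [0; 0; 1; 1]};                    % design matrices, p = 1
a = {[-0.5; 0.2; 0.3], [-0.4; -0.1; 0.1; 0.4]};   % score vectors
B = {eye(3), eye(4)};                             % diagonal matrices
A = {zeros(3), 0.25 * ones(4)};                   % symmetric matrices

p    = size(X{1}, 2);
Dinv = zeros(p);
d    = zeros(p, 1);
for k = 1:numel(X)
    Dinv = Dinv + X{k}' * (B{k} - A{k}) * X{k};   % accumulate D^{-1}
    d    = d + X{k}' * a{k};                      % accumulate d
end
beta = Dinv \ d;                                  % combined estimate D*d
```

Solving with the backslash operator avoids forming $D$ explicitly, which is usually preferable to calling `inv` when only the estimate is needed.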
References
Kalbfleisch J D and Prentice R L (1980) The Statistical Analysis of Failure Time Data Wiley
Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243
Pettitt A N (1983) Approximate methods using ranks for regression with censored data Biometrika 70 121–132
Parameters
Compulsory Input Parameters
- 1: nv(ns) – int64 array
The number of observations in the $i$th sample, for $i = 1, 2, \ldots, \mathrm{ns}$.
Constraint: $\mathrm{nv}(i) \ge 1$, for $i = 1, 2, \ldots, \mathrm{ns}$.
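- 2: y(nsum) – double array
nsum, the dimension of the array, must satisfy the constraint $\mathrm{nsum} = \sum_{i=1}^{\mathrm{ns}} \mathrm{nv}(i)$.
The observations in each sample. Specifically, $\mathrm{y}\left(\sum_{k=1}^{i-1} \mathrm{nv}(k) + j\right)$ must contain the $j$th observation in the $i$th sample.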
- 3: x(ldx, ip) – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathrm{ldx} \ge \mathrm{nsum}$.
The design matrices for each sample. Specifically, $\mathrm{x}\left(\sum_{k=1}^{i-1} \mathrm{nv}(k) + j,\ l\right)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.
Constraint: x must not contain a column with all elements equal.
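- 4: icen(nsum) – int64 array
nsum, the dimension of the array, must satisfy the constraint $\mathrm{nsum} = \sum_{i=1}^{\mathrm{ns}} \mathrm{nv}(i)$.
Defines the censoring variable for the observations in y.
- If $\mathrm{icen}(i) = 0$, $\mathrm{y}(i)$ is uncensored.
- If $\mathrm{icen}(i) = 1$, $\mathrm{y}(i)$ is censored.
Constraint: $\mathrm{icen}(i) = 0$ or $1$, for $i = 1, 2, \ldots, \mathrm{nsum}$.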
- 5: gamma – double scalar
The value of the parameter defining the generalized logistic distribution. For $\mathrm{gamma} \le 0.0001$, the limiting extreme value distribution is assumed.
Constraint: $\mathrm{gamma} \ge 0.0$.
- 6: nmax – int64 scalar
The value of the largest sample size.
Constraint: $\mathrm{nmax} = \max_{1 \le i \le \mathrm{ns}} \mathrm{nv}(i)$ and $\mathrm{nmax} > \mathrm{ip}$.
- 7: tol – double scalar
The tolerance for judging whether two observations are tied. Thus, observations $Y_i$ and $Y_j$ are adjudged to be tied if $|Y_i - Y_j| < \mathrm{tol}$.
Constraint: $\mathrm{tol} > 0.0$.
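For several samples, the compulsory inputs can be assembled along the following lines. The sketch assumes that the samples are stored one after another in y, so that observation $j$ of sample $i$ sits at position $\sum_{k<i} \mathrm{nv}(k) + j$. The cell array `samples`, its values and the helper vector `offset` are hypothetical and are not toolbox arguments.

```matlab
% Hypothetical two-sample data set; each cell holds one sample as a column.
samples = {[143; 164; 188], [156; 163; 198; 205]};

nv     = int64(cellfun(@numel, samples));    % number of observations per sample
y      = vertcat(samples{:});                % samples stored consecutively
nmax   = max(nv);                            % largest sample size
offset = [0, cumsum(double(nv(1:end-1)))];   % y(offset(i) + j): obs j of sample i

% Tie rule described for tol: two observations are tied if |Yi - Yj| < tol.
tol  = 1e-5;
tied = abs(y(1) - y(2)) < tol;
```

With this layout, `y(offset(2) + 1)` is the first observation of the second sample. The rows of x and the entries of icen would follow the same ordering.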
Optional Input Parameters
- 1: ns – int64 scalar
Default: the dimension of the array nv.
The number of samples.
Constraint: $\mathrm{ns} \ge 1$.
- 2: ip – int64 scalar
Default: the second dimension of the array x.
The number of parameters to be fitted.
Constraint: $\mathrm{ip} \ge 1$.
Output Parameters
- 1: prvr(ip+1, ip) – double array
The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1 \le i \le j \le \mathrm{ip}$, $\mathrm{prvr}(i,j)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1 \le j \le i \le \mathrm{ip}$, $\mathrm{prvr}(i+1,j)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
- 2: irank(nmax) – int64 array
For the one sample case, irank contains the ranks of the observations.
- 3: zin(nmax) – double array
For the one sample case, zin contains the expected values of the function $g(\cdot)$ of the order statistics.
- 4: eta(nmax) – double array
For the one sample case, eta contains the expected values of the function $g'(\cdot)$ of the order statistics.
- 5: vapvec(nmax×(nmax+1)/2) – double array
For the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g(\cdot)$ of the order statistics stored column-wise.
- 6: parest(4×ip+1) – double array
The statistics calculated by the function.
The first ip components of parest contain the score statistics.
The next ip elements contain the parameter estimates.
parest(2×ip+1) contains the value of the $\chi^2$ statistic.
The next ip elements of parest contain the standard errors of the parameter estimates.
Finally, the remaining ip elements of parest contain the $z$-statistics.
- 7: ifail – int64 scalar
$\mathrm{ifail} = 0$ unless the function detects an error (see Error Indicators and Warnings).
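The packed outputs prvr and parest can be split into their components along these lines. The sketch assumes the layout described above (score covariances in the upper triangle of prvr, parameter covariances in the lower triangle shifted down by one row, and parest holding ip score statistics, ip estimates, the $\chi^2$ statistic, ip standard errors and ip $z$-statistics, in that order). The input values are those printed in the Example section for ip = 1; the names `covScore` and `covBeta` are illustrative.

```matlab
ip     = 1;
prvr   = [7.653; 0.131];                          % values printed in the Example
parest = [4.584, 0.599, 2.746, 0.3615, 1.6571];   % values printed in the Example

% Covariance matrix of the score statistics: upper triangle of rows 1..ip.
covScore = triu(prvr(1:ip, :));
covScore = covScore + triu(covScore, 1)';         % fill in the lower half

% Covariance matrix of the estimates: lower triangle of rows 2..ip+1.
covBeta = tril(prvr(2:ip+1, :));
covBeta = covBeta + tril(covBeta, -1)';           % fill in the upper half

score = parest(1:ip);
beta  = parest(ip+1:2*ip);
chisq = parest(2*ip+1);
se    = parest(2*ip+2:3*ip+1);
z     = parest(3*ip+2:4*ip+1);
```

The indexing is written in terms of ip, so the same lines should apply when more than one parameter is fitted.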
Error Indicators and Warnings
Errors or warnings detected by the function:
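- ifail = 1
On entry, $\mathrm{ns} < 1$,
or $\mathrm{tol} \le 0.0$,
or $\mathrm{nmax} \le \mathrm{ip}$,
or $\mathrm{ldx} < \mathrm{nsum}$,
or $\mathrm{nmax} \ne \max_{1 \le i \le \mathrm{ns}} \mathrm{nv}(i)$,
or $\mathrm{nv}(i) \le 0$ for some $i$, $i = 1, 2, \ldots, \mathrm{ns}$,
or $\mathrm{nsum} \ne \sum_{i=1}^{\mathrm{ns}} \mathrm{nv}(i)$,
or $\mathrm{ip} < 1$,
or $\mathrm{gamma} < 0.0$.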
- ifail = 2
On entry, $\mathrm{icen}(i) \ne 0$ or $1$, for some $1 \le i \le \mathrm{nsum}$.
- ifail = 3
On entry, all the observations are adjudged to be tied. You are advised to check the value supplied for tol.
- ifail = 4
The matrix $X^{\mathrm{T}}(B - A)X$ is either ill-conditioned or not positive definite. This error should only occur with extreme rankings of the data.
- ifail = 5
On entry, at least one column of the matrix $X$ has all its elements equal.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
Accuracy
The computations are believed to be stable.
Further Comments
The time taken by nag_nonpar_rank_regsn_censored (g08rb) depends on the number of samples, the total number of observations and the number of parameters fitted.
In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See
Pettitt (1982) for further details.
Example
This example fits a regression model to a single sample of observations using just one explanatory variable.
function g08rb_example

fprintf('g08rb example results\n\n');

% Observations for a single sample, stored as one column.
y = [143; 164; 188; 188; 190; 192; 206; 209; 213; 216;
     220; 227; 230; 234; 246; 265; 304; 216; 244; 142;
     156; 163; 198; 205; 232; 232; 233; 233; 233; 233;
     239; 240; 261; 280; 280; 296; 296; 323; 204; 344];
nv = [int64(numel(y))];

% One explanatory variable: 0 for the first 19 observations, 1 for the rest.
x = zeros(nv,1);
x(20:end) = 1;

% Censoring indicators: 1 marks a right-censored observation.
icen = zeros(nv,1,'int64');
icen(18:19) = 1;
icen(39:40) = 1;

gamma = 1e-05;
nmax = int64(nv);
tol = 1e-05;

% Used for display only; the optional arguments ns and ip take their defaults.
ns = size(y,2);
ip = size(x,2);

fprintf('Number of samples = %3d\n', ns);
fprintf('Number of parameters fitted = %3d\n', ip);
fprintf('Distribution power parameter = %8.1e\n', gamma);
fprintf('Tolerance for ties = %8.1e\n', tol);

[parvar, irank, zin, eta, vapvec, parest, ifail] = ...
  g08rb( ...
         nv, y, x, icen, gamma, nmax, tol);

fprintf('\nScore statistic\n');
fprintf('%9.3f\n', parest(1:ip));

% Upper triangle of parvar: covariance matrix of the score statistics.
fprintf('\nCovariance matrix of score statistic\n');
for j = 1:ip
  fprintf('%9.3f', parvar(1:j,j));
  fprintf('\n');
end

fprintf('\nParameter estimates\n');
fprintf('%9.3f', parest(ip+1:ip+ip));

% Lower triangle of parvar (rows 2 to ip+1): covariance matrix of the estimates.
fprintf('\n\nCovariance matrix of parameter estimates\n');
for j = 1:ip
  fprintf('%9.3f', parvar(j+1,1:j));
  fprintf('\n');
end

chisq = parest(2*ip+1);
fprintf('\nChi-squared statistic = %8.3f with %2d d.f.\n\n', chisq, ip);

% Column 1: standard errors; column 2: z-statistics.
sterr = reshape(parest(2*ip+2:end),[ip,2]);
fprintf('Standard errors of estimates and approximate z-statistics\n');
disp(sterr);

g08rb example results

Number of samples = 1
Number of parameters fitted = 1
Distribution power parameter = 1.0e-05
Tolerance for ties = 1.0e-05

Score statistic
4.584

Covariance matrix of score statistic
7.653

Parameter estimates
0.599

Covariance matrix of parameter estimates
0.131

Chi-squared statistic = 2.746 with 1 d.f.

Standard errors of estimates and approximate z-statistics
0.3615 1.6571
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015