NAG C Library Function Document
nag_condl_logistic (g11cac)
1
Purpose
nag_condl_logistic (g11cac) returns parameter estimates for the conditional logistic analysis of stratified data, for example, data from case-control studies and survival analyses.
2
Specification
#include <nag.h> 
#include <nagg11.h> 
void 
nag_condl_logistic (Nag_OrderType order,
Integer n,
Integer m,
Integer ns,
const double z[],
Integer pdz,
const Integer isz[],
Integer p,
const Integer ic[],
const Integer isi[],
double *dev,
double b[],
double se[],
double sc[],
double cov[],
Integer nca[],
Integer nct[],
double tol,
Integer maxit,
Integer iprint,
const char *outfile,
NagError *fail) 

3
Description
In the analysis of binary data, the logistic model is commonly used. This relates the probability of one of the outcomes, say
$y=1$, to
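$p$ explanatory variates or covariates by
$$\mathrm{Prob}\left(y=1\right)=\frac{\mathrm{exp}\left(\alpha +{\beta }^{\mathrm{T}}z\right)}{1+\mathrm{exp}\left(\alpha +{\beta }^{\mathrm{T}}z\right)},$$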
where
$\beta $ is a vector of unknown coefficients for the covariates
$z$ and
$\alpha $ is a constant term. If the observations come from different strata or groups,
$\alpha $ would vary from stratum to stratum. If the observed outcomes are independent then the
$y$s follow a Bernoulli distribution, i.e., a binomial distribution with sample size one, and the model can be fitted as a generalized linear model with binomial errors.
In some situations the number of observations for which
$y=1$ may not be independent. For example, in epidemiological research, case-control studies are widely used in which one or more observed cases are matched with one or more controls. The matching is based on fixed characteristics such as age and sex, and is designed to eliminate the effect of such characteristics in order to determine the effect of other variables more accurately. Each case-control group can be considered as a stratum. In this type of study the binomial model is not appropriate, except if the strata are large, and a conditional logistic model is used. This considers the probability of the cases having the observed vectors of covariates given the set of vectors of covariates in the strata. In the situation of one case per stratum, the conditional likelihood for
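${n}_{\mathrm{s}}$ strata can be written as
$$L=\prod _{i=1}^{{n}_{\mathrm{s}}}\frac{\mathrm{exp}\left({\beta }^{\mathrm{T}}{z}_{i}\right)}{\sum _{l\in {S}_{i}}\mathrm{exp}\left({\beta }^{\mathrm{T}}{z}_{l}\right)},$$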
where
${S}_{i}$ is the set of observations in the
$i$th stratum, with associated vectors of covariates
${z}_{l}$,
$l\in {S}_{i}$, and
${z}_{i}$ is the vector of covariates of the case in the
$i$th stratum. In the general case of
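${c}_{i}$ cases per stratum, the full conditional likelihood is
$$L=\prod _{i=1}^{{n}_{\mathrm{s}}}\frac{\mathrm{exp}\left({\beta }^{\mathrm{T}}{s}_{i}\right)}{\sum _{l\in {C}_{i}}\mathrm{exp}\left({\beta }^{\mathrm{T}}{s}_{l}\right)},$$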
where
${s}_{i}$ is the sum of the vectors of covariates for the cases in the
$i$th stratum and
${s}_{l}$,
$l\in {C}_{i}$, are the sums of the vectors of covariates for all distinct sets of
${c}_{i}$ observations drawn from the
$i$th stratum. The conditional likelihood can be maximized by a Newton–Raphson procedure. The covariances of the parameter estimates can be estimated from the inverse of the matrix of second derivatives of the logarithm of the conditional likelihood, while the first derivatives provide the score function,
${U}_{\mathit{j}}\left(\beta \right)$, for
$\mathit{j}=1,2,\dots ,p$, which can be used for testing the significance of parameters.
If the strata are not small,
${C}_{i}$ can be large, so to improve the speed of computation the algorithm of
Howard (1972), as described by
Krailo and Pike (1984) is used.
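The denominator of the full conditional likelihood sums a product of terms $\mathrm{exp}\left({\beta}^{\mathrm{T}}{z}_{l}\right)$ over every distinct set of ${c}_{i}$ observations in a stratum. Such a sum can be built up one observation at a time without enumerating the sets. The C sketch below shows that kind of recursion in its plainest form; it is offered as an illustration of the idea only, under the assumption that the weights are already computed, and it omits the derivative accumulation and scaling that a production implementation would need. The function name `subset_sum` is hypothetical.

```c
#include <stddef.h>

/* Sum, over all distinct subsets of size c, of the product of the chosen
 * weights w[l] = exp(beta^T z_l), for a stratum with n observations.
 * work must have room for c + 1 doubles.
 * Hypothetical helper: a plain recursion with no overflow scaling. */
double subset_sum(const double w[], size_t n, size_t c, double work[])
{
    work[0] = 1.0;                       /* the empty subset contributes 1 */
    for (size_t k = 1; k <= c; ++k)
        work[k] = 0.0;

    for (size_t l = 0; l < n; ++l) {
        /* Update the largest subset size first so w[l] enters each
         * product at most once. */
        size_t top = (l + 1 < c) ? l + 1 : c;
        for (size_t k = top; k >= 1; --k)
            work[k] += w[l] * work[k - 1];
    }
    return work[c];
}
```

The cost is of order $n\times c$ operations per stratum, compared with the number of distinct subsets, which grows combinatorially.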
A second situation in which the above conditional likelihood arises is in fitting Cox's proportional hazard model (see
nag_surviv_cox_model (g12bac)) in which the strata refer to the risk sets for each failure time and where the failures are cases. When ties are present in the data,
nag_surviv_cox_model (g12bac) uses an approximation. For an exact estimate, the data can be expanded using
nag_surviv_risk_sets (g12zac) to create the risk sets/strata and
nag_condl_logistic (g11cac) used.
4
References
Cox D R (1972) Regression models and life tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220
Cox D R and Hinkley D V (1974) Theoretical Statistics Chapman and Hall
Howard S (1972) Remark on the paper by Cox, D R (1972): Regression methods and life tables J. Roy. Statist. Soc. Ser. B 34 187–220
Krailo M D and Pike M C (1984) Algorithm AS 196. Conditional multivariate logistic analysis of stratified case-control studies Appl. Statist. 33 95–103
Smith P G, Pike M C, Hill P, Breslow N E and Day N E (1981) Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies Appl. Statist. 30 190–197
5
Arguments
 1:
$\mathbf{order}$ – Nag_OrderType (Input)

On entry: the
order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by
${\mathbf{order}}=\mathrm{Nag\_RowMajor}$. See
Section 3.3.1.3 in How to Use the NAG Library and its Documentation for a more detailed explanation of the use of this argument.
Constraint:
${\mathbf{order}}=\mathrm{Nag\_RowMajor}$ or $\mathrm{Nag\_ColMajor}$.
 2:
$\mathbf{n}$ – Integer (Input)

On entry: $n$, the number of observations.
Constraint:
${\mathbf{n}}\ge 2$.
 3:
$\mathbf{m}$ – Integer (Input)

On entry: the number of covariates in array
z.
Constraint:
${\mathbf{m}}\ge 1$.
 4:
$\mathbf{ns}$ – Integer (Input)

On entry: the number of strata, ${n}_{s}$.
Constraint:
${\mathbf{ns}}\ge 1$.
 5:
$\mathbf{z}\left[\mathit{dim}\right]$ – const double (Input)

Note: the dimension,
dim, of the array
z
must be at least
 $\mathrm{max}\left(1,{\mathbf{pdz}}\times {\mathbf{m}}\right)$ when ${\mathbf{order}}=\mathrm{Nag\_ColMajor}$;
 $\mathrm{max}\left(1,{\mathbf{n}}\times {\mathbf{pdz}}\right)$ when ${\mathbf{order}}=\mathrm{Nag\_RowMajor}$.
The
$\left(i,j\right)$th element of the matrix
$Z$ is stored in
 ${\mathbf{z}}\left[\left(j-1\right)\times {\mathbf{pdz}}+i-1\right]$ when ${\mathbf{order}}=\mathrm{Nag\_ColMajor}$;
 ${\mathbf{z}}\left[\left(i-1\right)\times {\mathbf{pdz}}+j-1\right]$ when ${\mathbf{order}}=\mathrm{Nag\_RowMajor}$.
On entry: the $i$th row must contain the covariates which are associated with the $i$th observation.
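The two storage schemes differ only in which index is multiplied by the stride pdz. A small C sketch of the offset calculation follows, with $i$ and $j$ counted from one and the array offset counted from zero. The helper name `z_offset` and its `row_major` flag are hypothetical stand-ins for the order argument.

```c
#include <stddef.h>

/* Offset within z[] of the (i, j)th element of Z, with i and j counted
 * from 1. row_major is nonzero for Nag_RowMajor storage and zero for
 * Nag_ColMajor storage. Hypothetical helper; not part of the NAG Library. */
size_t z_offset(size_t i, size_t j, size_t pdz, int row_major)
{
    return row_major ? (i - 1) * pdz + (j - 1)
                     : (j - 1) * pdz + (i - 1);
}
```

For example, with row-major storage and pdz equal to the number of covariates, the covariates of one observation are contiguous in z.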
 6:
$\mathbf{pdz}$ – Integer (Input)

On entry: the stride separating row or column elements (depending on the value of
order) in the array
z.
Constraints:
 if ${\mathbf{order}}=\mathrm{Nag\_ColMajor}$,
${\mathbf{pdz}}\ge {\mathbf{n}}$;
 if ${\mathbf{order}}=\mathrm{Nag\_RowMajor}$, ${\mathbf{pdz}}\ge {\mathbf{m}}$.
 7:
$\mathbf{isz}\left[{\mathbf{m}}\right]$ – const Integer (Input)

On entry: indicates which covariates are to be included in the model.
If ${\mathbf{isz}}\left[j-1\right]\ge 1$, the $j$th covariate is included in the model.
If ${\mathbf{isz}}\left[j-1\right]=0$, the $j$th covariate is excluded from the model and not referenced.
Constraint:
${\mathbf{isz}}\left[j-1\right]\ge 0$ and at least one value must be nonzero.
 8:
$\mathbf{p}$ – Integer (Input)

On entry:
$p$, the number of covariates included in the model as indicated by
isz.
Constraint:
${\mathbf{p}}\ge 1$ and
${\mathbf{p}}=$ the number of nonzero values of
isz.
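Because p must agree with the contents of isz, it can be convenient to derive it rather than set it by hand. The following C sketch counts the selected covariates and rejects negative entries. The function name `count_selected` is hypothetical, and plain `int` is used as a stand-in for the NAG Integer type.

```c
/* Number of covariates selected by isz[0..m-1], i.e. the number of
 * nonzero entries. Returns -1 if any entry is negative, which would
 * violate the constraint on isz. int stands in for the NAG Integer type.
 * Hypothetical helper; not part of the NAG Library. */
int count_selected(const int isz[], int m)
{
    int p = 0;
    for (int j = 0; j < m; ++j) {
        if (isz[j] < 0)
            return -1;
        if (isz[j] > 0)
            ++p;
    }
    return p;
}
```

A return value of zero indicates that no covariate has been selected, which would also violate the constraints.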
 9:
$\mathbf{ic}\left[{\mathbf{n}}\right]$ – const Integer (Input)

On entry: indicates whether the
$i$th observation is a case or a control.
If ${\mathbf{ic}}\left[i-1\right]=0$, the $i$th observation is a case.
If ${\mathbf{ic}}\left[i-1\right]=1$, the $i$th observation is a control.
Constraint:
${\mathbf{ic}}\left[\mathit{i}-1\right]=0$ or $1$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
 10:
$\mathbf{isi}\left[{\mathbf{n}}\right]$ – const Integer (Input)

On entry: stratum indicators which also allow data points to be excluded from the analysis.
If ${\mathbf{isi}}\left[i-1\right]=k$, the $i$th observation is from the $k$th stratum, where $k=1,2,\dots ,{\mathbf{ns}}$.
If ${\mathbf{isi}}\left[i-1\right]=0$, the $i$th observation is omitted from the analysis.
Constraint:
$0\le {\mathbf{isi}}\left[\mathit{i}-1\right]\le {\mathbf{ns}}$ and more than
p values of
${\mathbf{isi}}\left[\mathit{i}-1\right]>0$, for
$\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
 11:
$\mathbf{dev}$ – double * (Output)

On exit: the deviance, that is, minus twice the maximized log-likelihood.
 12:
$\mathbf{b}\left[{\mathbf{p}}\right]$ – double (Input/Output)

On entry: initial estimates of the covariate coefficient parameters
$\beta $.
${\mathbf{b}}\left[j-1\right]$ must contain the initial estimate of the coefficient of the covariate in
z corresponding to the
$j$th nonzero value of
isz.
Suggested value:
in many cases an initial value of zero for
${\mathbf{b}}\left[j-1\right]$ may be used. For another suggestion see
Section 9.
On exit:
${\mathbf{b}}\left[j-1\right]$ contains the estimate
${\hat{\beta}}_{i}$ of the coefficient of the covariate stored in the
$i$th column of
z where
$i$ is the index of the
$j$th nonzero value in the array
isz.
 13:
$\mathbf{se}\left[{\mathbf{p}}\right]$ – double (Output)

On exit: ${\mathbf{se}}\left[\mathit{j}-1\right]$ is the asymptotic standard error of the estimate contained in ${\mathbf{b}}\left[\mathit{j}-1\right]$ and score function in ${\mathbf{sc}}\left[\mathit{j}-1\right]$, for $\mathit{j}=1,2,\dots ,{\mathbf{p}}$.
 14:
$\mathbf{sc}\left[{\mathbf{p}}\right]$ – double (Output)

On exit: ${\mathbf{sc}}\left[j-1\right]$ is the value of the score function ${U}_{j}\left(\beta \right)$ for the estimate contained in ${\mathbf{b}}\left[j-1\right]$.
 15:
$\mathbf{cov}\left[{\mathbf{p}}\times \left({\mathbf{p}}+1\right)/2\right]$ – double (Output)

On exit: the variance-covariance matrix of the parameter estimates in
b stored in packed form by column, i.e., the covariance between the parameter estimates given in
${\mathbf{b}}\left[i-1\right]$ and
${\mathbf{b}}\left[j-1\right]$,
$j\ge i$, is given in
${\mathbf{cov}}\left[j\left(j-1\right)/2+i-1\right]$.
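Packed storage by column places the upper triangle of the symmetric matrix column after column in a one-dimensional array. The C sketch below computes the position of a covariance in such an array, assuming zero-based C offsets so that the ${\mathbf{p}}\times \left({\mathbf{p}}+1\right)/2$ elements occupy offsets $0$ to ${\mathbf{p}}\times \left({\mathbf{p}}+1\right)/2-1$, with $i$ and $j$ counted from one. The function name `cov_index` is hypothetical.

```c
#include <stddef.h>

/* Zero-based offset in a column-packed upper triangle of the covariance
 * between the i-th and j-th parameter estimates (i and j counted from 1,
 * in either order). Hypothetical helper; not part of the NAG Library. */
size_t cov_index(size_t i, size_t j)
{
    if (i > j) {                         /* use symmetry so that i <= j */
        size_t t = i;
        i = j;
        j = t;
    }
    return j * (j - 1) / 2 + (i - 1);
}
```

Under this assumption the variance of the $j$th estimate sits at offset `cov_index(j, j)`, so the last element of the array holds the variance of the last estimate.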
 16:
$\mathbf{nca}\left[{\mathbf{ns}}\right]$ – Integer (Output)

On exit: ${\mathbf{nca}}\left[\mathit{i}-1\right]$ contains the number of cases in the $\mathit{i}$th stratum, for $\mathit{i}=1,2,\dots ,{\mathbf{ns}}$.
 17:
$\mathbf{nct}\left[{\mathbf{ns}}\right]$ – Integer (Output)

On exit: ${\mathbf{nct}}\left[\mathit{i}-1\right]$ contains the number of controls in the $\mathit{i}$th stratum, for $\mathit{i}=1,2,\dots ,{\mathbf{ns}}$.
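The counts returned in nca and nct follow directly from the coding of ic and isi, so they can be reproduced before the call as a check on the input data. The following C sketch does this, assuming the coding documented for ic (0 for a case, 1 for a control) and isi (0 for an omitted observation). The function name `tally_strata` is hypothetical, and plain `int` stands in for the NAG Integer type.

```c
/* Count cases (ic == 0) and controls (ic == 1) in each of ns strata,
 * skipping observations whose stratum indicator is 0 or out of range.
 * int stands in for the NAG Integer type.
 * Hypothetical helper; not part of the NAG Library. */
void tally_strata(const int ic[], const int isi[], int n, int ns,
                  int nca[], int nct[])
{
    for (int k = 0; k < ns; ++k) {
        nca[k] = 0;
        nct[k] = 0;
    }
    for (int i = 0; i < n; ++i) {
        int k = isi[i];                  /* stratum number, counted from 1 */
        if (k < 1 || k > ns)
            continue;                    /* omitted or invalid observation */
        if (ic[i] == 0)
            ++nca[k - 1];
        else
            ++nct[k - 1];
    }
}
```

A stratum with no cases or no controls contributes no information about the coefficients, so such counts are worth inspecting before fitting.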
 18:
$\mathbf{tol}$ – double (Input)

On entry: indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than ${\mathbf{tol}}\times \left(1.0+\mathrm{CurrentDeviance}\right)$. This corresponds approximately to an absolute accuracy if the deviance is small and a relative accuracy if the deviance is large.
Constraint:
${\mathbf{tol}}\ge 10\times \mathit{machine\ precision}$.
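The convergence rule stated for tol can be written as a one-line test. The C sketch below is an illustration of that rule only; the function name `has_converged` and its argument names are hypothetical, and the library's own iteration logic is not shown.

```c
/* Convergence test as described for tol: the decrease in deviance between
 * successive iterations is compared with tol * (1.0 + current deviance).
 * Returns nonzero when convergence would be assumed.
 * Hypothetical helper; not part of the NAG Library. */
int has_converged(double previous_deviance, double current_deviance,
                  double tol)
{
    double decrease = previous_deviance - current_deviance;
    return decrease < tol * (1.0 + current_deviance);
}
```

When the deviance is much smaller than one, the threshold is close to tol itself, which is the absolute-accuracy behaviour; when the deviance is large, the threshold scales with it, which is the relative-accuracy behaviour.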
 19:
$\mathbf{maxit}$ – Integer (Input)

On entry: the maximum number of iterations required for computing the estimates. If
maxit is set to
$0$ then the standard errors, the score functions and the variance-covariance matrix are computed for the input value of
$\beta $ in
b but
$\beta $ is not updated.
Constraint:
${\mathbf{maxit}}\ge 0$.
 20:
$\mathbf{iprint}$ – Integer (Input)

On entry: indicates if the printing of information on the iterations is required.
 ${\mathbf{iprint}}\le 0$
 No printing.
 ${\mathbf{iprint}}\ge 1$
 The deviance and the current estimates are printed every iprint iterations.
Suggested value:
${\mathbf{iprint}}=0$.
 21:
$\mathbf{outfile}$ – const char * (Input)

On entry: the name of a file to which diagnostic output will be directed. If
outfile is
NULL the diagnostic output will be directed to standard output.
 22:
$\mathbf{fail}$ – NagError * (Input/Output)

The NAG error argument (see
Section 3.7 in How to Use the NAG Library and its Documentation).
6
Error Indicators and Warnings
 NE_ALLOC_FAIL

Dynamic memory allocation failed.
See
Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
 NE_BAD_PARAM

On entry, argument $\langle\mathit{value}\rangle$ had an illegal value.
 NE_CONVERGENCE

Convergence not achieved in
$\langle\mathit{value}\rangle$ iterations. The progress towards convergence can be examined by using a nonzero value of
iprint. Any non-convergence may be due to a linear combination of covariates being monotonic with time. Full results are returned.
 NE_INT

On entry, $i=\langle\mathit{value}\rangle$ and ${\mathbf{ic}}\left[i-1\right]=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{ic}}\left[i-1\right]=0$ or $1$.
On entry, $i=\langle\mathit{value}\rangle$ and ${\mathbf{isz}}\left[i-1\right]<\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{isz}}\left[i-1\right]\ge 0$.
On entry, ${\mathbf{m}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{maxit}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{maxit}}\ge 0$.
On entry, ${\mathbf{n}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{ns}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{ns}}\ge 1$.
On entry, ${\mathbf{p}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{p}}\ge 1$.
On entry, ${\mathbf{pdz}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{pdz}}>0$.
On entry, ${\mathbf{pdz}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{pdz}}\ge {\mathbf{n}}$.
 NE_INT_2

On entry, $i=\langle\mathit{value}\rangle$, ${\mathbf{isi}}\left[i-1\right]=\langle\mathit{value}\rangle$ and ${\mathbf{ns}}=\langle\mathit{value}\rangle$.
Constraint: $0\le {\mathbf{isi}}\left[i-1\right]\le {\mathbf{ns}}$.
On entry, ${\mathbf{pdz}}=\langle\mathit{value}\rangle$ and ${\mathbf{m}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{pdz}}\ge {\mathbf{m}}$.
 NE_INT_ARRAY_ELEM_CONS

On entry, there are not
p values of
${\mathbf{isz}}>0$.
 NE_INTERNAL_ERROR

An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
See
Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
 NE_NO_LICENCE

Your licence key may have expired or may not have been installed correctly.
See
Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
 NE_NOT_CLOSE_FILE

Cannot close file $\langle\mathit{value}\rangle$.
 NE_NOT_WRITE_FILE

Cannot open file $\langle\mathit{value}\rangle$ for writing.
 NE_OBSERVATIONS

On entry, too few observations included in model.
 NE_OVERFLOW

Overflow in calculations. Try using different starting values.
 NE_REAL

On entry, ${\mathbf{tol}}=\langle\mathit{value}\rangle$.
Constraint: ${\mathbf{tol}}\ge 10\times \mathit{machine\ precision}$.
 NE_SINGULAR

The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.
7
Accuracy
The accuracy is specified by
tol.
8
Parallelism and Performance
nag_condl_logistic (g11cac) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
nag_condl_logistic (g11cac) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
x06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
The other models described in
Section 3 can be fitted using the generalized linear modelling functions
nag_glm_binomial (g02gbc) and
nag_glm_poisson (g02gcc).
The case with one case per stratum can be analysed by having a dummy response variable
$y$ such that
$y=1$ for a case and
$y=0$ for a control, and fitting a Poisson generalized linear model with a log link and including a factor with a level for each stratum. These models can be fitted by using
nag_glm_poisson (g02gcc).
nag_condl_logistic (g11cac) uses mean centering, which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence. In order to reduce the risk of the sums computed by Howard's algorithm becoming too large, the scaling factor described in
Krailo and Pike (1984) is used.
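Subtracting a constant vector from every covariate vector multiplies the numerator and every denominator term of a stratum's likelihood factor by the same quantity, so centering leaves the conditional likelihood unchanged while keeping the exponents smaller. The C sketch below shows mean centering of a row-major covariate array in place. It is an illustration only: the function name `center_columns` is hypothetical, it assumes the means are taken over all $n$ rows, and it is not a description of how the library performs the centering internally.

```c
#include <stddef.h>

/* Subtract the column mean from each of the m columns of an n by m
 * matrix stored row by row in z[] with stride pdz (n >= 1, pdz >= m).
 * Hypothetical helper; not part of the NAG Library. */
void center_columns(double z[], size_t n, size_t m, size_t pdz)
{
    for (size_t j = 0; j < m; ++j) {
        double mean = 0.0;
        for (size_t i = 0; i < n; ++i)
            mean += z[i * pdz + j];
        mean /= (double)n;

        for (size_t i = 0; i < n; ++i)
            z[i * pdz + j] -= mean;
    }
}
```

The input array z of nag_condl_logistic (g11cac) is declared const, so a user who wanted to experiment with centering would apply a helper of this kind to a copy of the data.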
If the initial estimates are poor then there may be a problem with overflow in calculating $\mathrm{exp}\left({\beta}^{\mathrm{T}}{z}_{i}\right)$ or there may be nonconvergence. Reasonable estimates can often be obtained by fitting an unconditional model.
10
Example
The data was used for illustrative purposes by
Smith et al. (1981) and consists of two strata and two covariates. The data is input, the model is fitted and the results are printed.
10.1
Program Text
Program Text (g11cace.c)
10.2
Program Data
Program Data (g11cace.d)
10.3
Program Results
Program Results (g11cace.r)