g07 Chapter Contents
g07 Chapter Introduction
NAG Library Manual

# NAG Library Function Documentnag_censored_normal (g07bbc)

## 1  Purpose

nag_censored_normal (g07bbc) computes maximum likelihood estimates and their standard errors for arguments of the Normal distribution from grouped and/or censored data.

## 2  Specification

 #include #include
 void nag_censored_normal (Nag_CEMethod method, Integer n, const double x[], const double xc[], const Integer ic[], double *xmu, double *xsig, double tol, Integer maxit, double *sexmu, double *sexsig, double *corr, double *dev, Integer nobs[], Integer *nit, NagError *fail)

## 3  Description

A sample of size $n$ is taken from a Normal distribution with mean $\mu$ and variance ${\sigma }^{2}$ and consists of grouped and/or censored data. Each of the $n$ observations is known by a pair of values $\left({L}_{i},{U}_{i}\right)$ such that:
 $Li≤xi≤Ui.$
The data is represented as particular cases of this form:
• exactly specified observations occur when ${L}_{i}={U}_{i}={x}_{i}$,
• right-censored observations, known only by a lower bound, occur when ${U}_{i}\to \infty$,
• left-censored observations, known only by a upper bound, occur when ${L}_{i}\to -\infty$,
• and interval-censored observations when ${L}_{i}<{x}_{i}<{U}_{i}$.
Let the set $A$ identify the exactly specified observations, sets $B$ and $C$ identify the observations censored on the right and left respectively, and set $D$ identify the observations confined between two finite limits. Also let there be $r$ exactly specified observations, i.e., the number in $A$. The probability density function for the standard Normal distribution is
 $Zx=12πexp-12x2 , -∞
and the cumulative distribution function is
 $PX= 1-QX=∫-∞XZxdx.$
The log-likelihood of the sample can be written as:
 $L μ,σ =-r log⁡σ - 1 2 ∑ A x i -μ / σ 2 +∑B log Q l i + ∑ C log P u i + ∑ D log p i$
where ${p}_{i}=P\left({u}_{i}\right)-P\left({l}_{i}\right)$ and ${u}_{i}=\left({U}_{i}-\mu \right)/\sigma \text{, }{l}_{i}=\left({L}_{i}-\mu \right)/\sigma$.
Let
 $Sxi=Zxi Qxi , S1li,ui=Zli-Zuipi$
and
 $S2li,ui=uiZui-liZlipi,$
then the first derivatives of the log-likelihood can be written as:
 $∂Lμ,σ ∂μ =L1μ,σ=σ-2∑Axi-μ+σ-1∑BSli-σ-1∑CS-ui+σ-1∑DS1li,ui$
and
 $∂Lμ,σ ∂σ =L2μ,σ=-rσ-1+σ-3∑A xi-μ 2+σ-1∑BliSli-σ-1∑CuiS-ui$
 $-σ-1∑DS2li,ui$
The maximum likelihood estimates, $\stackrel{^}{\mu }$ and $\stackrel{^}{\sigma }$, are the solution to the equations:
 $L1μ^,σ^=0$ (1)
and
 $L2μ^,σ^=0$ (2)
and if the second derivatives $\frac{{\partial }^{2}L}{{\partial }^{2}\mu }$, $\frac{{\partial }^{2}L}{\partial \mu \partial \sigma }$ and $\frac{{\partial }^{2}L}{{\partial }^{2}\sigma }$ are denoted by ${L}_{11}$, ${L}_{12}$ and ${L}_{22}$ respectively, then estimates of the standard errors of $\stackrel{^}{\mu }$ and $\stackrel{^}{\sigma }$ are given by:
 $seμ^=-L22 L11L22-L122 , seσ^=-L11 L11L22-L122$
and an estimate of the correlation of $\stackrel{^}{\mu }$ and $\stackrel{^}{\sigma }$ is given by:
 $L12L12L22 .$
To obtain the maximum likelihood estimates the equations (1) and (2) can be solved using either the Newton–Raphson method or the Expectation-maximization $\left(EM\right)$ algorithm of Dempster et al. (1977).
Newton–Raphson Method
This consists of using approximate estimates $\stackrel{~}{\mu }$ and $\stackrel{~}{\sigma }$ to obtain improved estimates $\stackrel{~}{\mu }+\delta \stackrel{~}{\mu }$ and $\stackrel{~}{\sigma }+\delta \stackrel{~}{\sigma }$ by solving
 $δμ~L11+δσ~L12+L1=0, δμ~L12+δσ~L22+L2=0,$
for the corrections $\delta \stackrel{~}{\mu }$ and $\delta \stackrel{~}{\sigma }$.
EM Algorithm
The expectation step consists of constructing the variable ${w}_{i}$ as follows:
 $if i∈A, wi= xi E ∣ Li< xi< Ui = μ+σ S1 li,ui$ (3)
 $if i∈B, wi= E xi∣ xi> Li =μ+σS li S1 li,ui$ (4)
 $if i∈C, wi= E xi∣ xi< Ui =μ-σS -ui li,ui$ (5)
 $if i∈D, wi= E xi∣ Li< xi< Ui =μ+σ S1 li,ui$ (6)
the maximization step consists of substituting (3), (4), (5) and (6) into (1) and (2) giving:
 $μ^=∑i=1nw^i/n$ (7)
and
 $σ^2=∑i=1nw^i-μ^2/ r+∑BTl^i+∑CT-u^i+∑DT1l^i,u^i$ (8)
where
 $Tx=SxSx-x , T1l,u=S12l,u+S2l,u$
and where ${\stackrel{^}{w}}_{i}$, ${\stackrel{^}{l}}_{i}$ and ${\stackrel{^}{u}}_{i}$ are ${w}_{i}$, ${l}_{i}$ and ${u}_{i}$ evaluated at $\stackrel{^}{\mu }$ and $\stackrel{^}{\sigma }$. Equations (3) to (8) are the basis of the $EM$ iterative procedure for finding $\stackrel{^}{\mu }$ and ${\stackrel{^}{\sigma }}^{2}$. The procedure consists of alternately estimating $\stackrel{^}{\mu }$ and ${\stackrel{^}{\sigma }}^{2}$ using (7) and (8) and estimating $\left\{{\stackrel{^}{w}}_{i}\right\}$ using (3) to (6).
In choosing between the two methods a general rule is that the Newton–Raphson method converges more quickly but requires good initial estimates whereas the $EM$ algorithm converges slowly but is robust to the initial values. In the case of the censored Normal distribution, if only a small proportion of the observations are censored then estimates based on the exact observations should give good enough initial estimates for the Newton–Raphson method to be used. If there are a high proportion of censored observations then the $EM$ algorithm should be used and if high accuracy is required the subsequent use of the Newton–Raphson method to refine the estimates obtained from the $EM$ algorithm should be considered.

## 4  References

Dempster A P, Laird N M and Rubin D B (1977) Maximum likelihood from incomplete data via the $EM$ algorithm (with discussion) J. Roy. Statist. Soc. Ser. B 39 1–38
Swan A V (1969) Algorithm AS 16. Maximum likelihood estimation from grouped and censored normal data Appl. Statist. 18 110–114
Wolynetz M S (1979) Maximum likelihood estimation from confined and censored normal data Appl. Statist. 28 185–195

## 5  Arguments

1:    $\mathbf{method}$Nag_CEMethodInput
On entry: indicates whether the Newton–Raphson or $EM$ algorithm should be used.
If ${\mathbf{method}}=\mathrm{Nag_CE_NR}$, then the Newton–Raphson algorithm is used.
If ${\mathbf{method}}=\mathrm{Nag_CE_EM}$, then the $EM$ algorithm is used.
Constraint: ${\mathbf{method}}=\mathrm{Nag_CE_NR}$ or $\mathrm{Nag_CE_EM}$.
2:    $\mathbf{n}$IntegerInput
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 2$.
3:    $\mathbf{x}\left[{\mathbf{n}}\right]$const doubleInput
On entry: the observations ${x}_{\mathit{i}}$, ${L}_{\mathit{i}}$ or ${U}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
If the observation is exactly specified – the exact value, ${x}_{i}$.
If the observation is right-censored – the lower value, ${L}_{i}$.
If the observation is left-censored – the upper value, ${U}_{i}$.
If the observation is interval-censored – the lower or upper value, ${L}_{i}$ or ${U}_{i}$, (see xc).
4:    $\mathbf{xc}\left[{\mathbf{n}}\right]$const doubleInput
On entry: if the $\mathit{j}$th observation, for $\mathit{j}=1,2,\dots ,n$ is an interval-censored observation then ${\mathbf{xc}}\left[j-1\right]$ should contain the complementary value to ${\mathbf{x}}\left[j-1\right]$, that is, if ${\mathbf{x}}\left[j-1\right]<{\mathbf{xc}}\left[j-1\right]$, then ${\mathbf{xc}}\left[j-1\right]$ contains upper value, ${U}_{i}$, and if ${\mathbf{x}}\left[j-1\right]>{\mathbf{xc}}\left[j-1\right]$, then ${\mathbf{xc}}\left[j-1\right]$ contains lower value, ${L}_{i}$. Otherwise if the $j$th observation is exact or right- or left-censored ${\mathbf{xc}}\left[j-1\right]$ need not be set.
Note: if ${\mathbf{x}}\left[j-1\right]={\mathbf{xc}}\left[j-1\right]$ then the observation is ignored.
5:    $\mathbf{ic}\left[{\mathbf{n}}\right]$const IntegerInput
On entry: ${\mathbf{ic}}\left[\mathit{i}-1\right]$ contains the censoring codes for the $\mathit{i}$th observation, for $\mathit{i}=1,2,\dots ,n$.
If ${\mathbf{ic}}\left[i-1\right]=0$, the observation is exactly specified.
If ${\mathbf{ic}}\left[i-1\right]=1$, the observation is right-censored.
If ${\mathbf{ic}}\left[i-1\right]=2$, the observation is left-censored.
If ${\mathbf{ic}}\left[i-1\right]=3$, the observation is interval-censored.
Constraint: ${\mathbf{ic}}\left[\mathit{i}-1\right]=0$, $1$, $2$ or $3$, for $\mathit{i}=1,2,\dots ,n$.
6:    $\mathbf{xmu}$double *Input/Output
On entry: if ${\mathbf{xsig}}>0.0$ the initial estimate of the mean, $\mu$; otherwise xmu need not be set.
On exit: the maximum likelihood estimate, $\stackrel{^}{\mu }$, of $\mu$.
7:    $\mathbf{xsig}$double *Input/Output
On entry: specifies whether an initial estimate of $\mu$ and $\sigma$ are to be supplied.
${\mathbf{xsig}}>0.0$
xsig is the initial estimate of $\sigma$ and xmu must contain an initial estimate of $\mu$.
${\mathbf{xsig}}\le 0.0$
Initial estimates of xmu and xsig are calculated internally from:
 (a) the exact observations, if the number of exactly specified observations is $\text{}\ge 2$; or (b) the interval-censored observations; if the number of interval-censored observations is $\text{}\ge 1$; or (c) they are set to $0.0$ and $1.0$ respectively.
On exit: the maximum likelihood estimate, $\stackrel{^}{\sigma }$, of $\sigma$.
8:    $\mathbf{tol}$doubleInput
On entry: the relative precision required for the final estimates of $\mu$ and $\sigma$. Convergence is assumed when the absolute relative changes in the estimates of both $\mu$ and $\sigma$ are less than tol.
If ${\mathbf{tol}}=0.0$, then a relative precision of $0.000005$ is used.
Constraint:  or ${\mathbf{tol}}=0.0$.
9:    $\mathbf{maxit}$IntegerInput
On entry: the maximum number of iterations.
If ${\mathbf{maxit}}\le 0$, then a value of $25$ is used.
10:  $\mathbf{sexmu}$double *Output
On exit: the estimate of the standard error of $\stackrel{^}{\mu }$.
11:  $\mathbf{sexsig}$double *Output
On exit: the estimate of the standard error of $\stackrel{^}{\sigma }$.
12:  $\mathbf{corr}$double *Output
On exit: the estimate of the correlation between $\stackrel{^}{\mu }$ and $\stackrel{^}{\sigma }$.
13:  $\mathbf{dev}$double *Output
On exit: the maximized log-likelihood, $L\left(\stackrel{^}{\mu },\stackrel{^}{\sigma }\right)$.
14:  $\mathbf{nobs}\left[4\right]$IntegerOutput
On exit: the number of the different types of each observation;
${\mathbf{nobs}}\left[0\right]$ contains number of right-censored observations.
${\mathbf{nobs}}\left[1\right]$ contains number of left-censored observations.
${\mathbf{nobs}}\left[2\right]$ contains number of interval-censored observations.
${\mathbf{nobs}}\left[3\right]$ contains number of exactly specified observations.
15:  $\mathbf{nit}$Integer *Output
On exit: the number of iterations performed.
16:  $\mathbf{fail}$NagError *Input/Output
The NAG error argument (see Section 2.7 in How to Use the NAG Library and its Documentation).

## 6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_CONVERGENCE
Method has not converged in $〈\mathit{\text{value}}〉$ iterations.
NE_DIVERGENCE
Process has diverged.
NE_EM_PROCESS
The EM process has failed.
NE_INT
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 2$.
NE_INT_ARRAY
On entry, ${\mathbf{ic}}\left[〈\mathit{\text{value}}〉\right]$ is invalid, it contains $〈\mathit{\text{value}}〉$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_OBSERVATIONS
On entry, effective number of observations $\text{}<2$.
NE_REAL
On entry, tol is invalid: ${\mathbf{tol}}=〈\mathit{\text{value}}〉$.
NE_STANDARD_ERRORS
Standard errors cannot be computed.

## 7  Accuracy

The accuracy is controlled by the argument tol.
If high precision is requested with the $EM$ algorithm then there is a possibility that, due to the slow convergence, before the correct solution has been reached the increments of $\stackrel{^}{\mu }$ and $\stackrel{^}{\sigma }$ may be smaller than tol and the process will prematurely assume convergence.

## 8  Parallelism and Performance

nag_censored_normal (g07bbc) is not threaded in any implementation.

The process is deemed divergent if three successive increments of $\mu$ or $\sigma$ increase.

## 10  Example

A sample of $18$ observations and their censoring codes are read in and the Newton–Raphson method used to compute the estimates.

### 10.1  Program Text

Program Text (g07bbce.c)

### 10.2  Program Data

Program Data (g07bbce.d)

### 10.3  Program Results

Program Results (g07bbce.r)