NAG FL Interface
g07bbf (estim_normal)

1 Purpose

g07bbf computes maximum likelihood estimates and their standard errors for parameters of the Normal distribution from grouped and/or censored data.

2 Specification

Fortran Interface
Subroutine g07bbf ( method, n, x, xc, ic, xmu, xsig, tol, maxit, sexmu, sexsig, corr, dev, nobs, nit, wk, ifail)
Integer, Intent (In) :: n, ic(n), maxit
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: nobs(4), nit
Real (Kind=nag_wp), Intent (In) :: x(n), xc(n), tol
Real (Kind=nag_wp), Intent (Inout) :: xmu, xsig
Real (Kind=nag_wp), Intent (Out) :: sexmu, sexsig, corr, dev, wk(2*n)
Character (1), Intent (In) :: method
C Header Interface
#include <nag.h>
void  g07bbf_ (const char *method, const Integer *n, const double x[], const double xc[], const Integer ic[], double *xmu, double *xsig, const double *tol, const Integer *maxit, double *sexmu, double *sexsig, double *corr, double *dev, Integer nobs[], Integer *nit, double wk[], Integer *ifail, const Charlen length_method)
The routine may be called by the names g07bbf or nagf_univar_estim_normal.

3 Description

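A sample of size $n$ is taken from a Normal distribution with mean $\mu$ and variance $\sigma^2$ and consists of grouped and/or censored data. Each of the $n$ observations is known by a pair of values $(L_i, U_i)$ such that:
$$L_i \le x_i \le U_i.$$
The data is represented as particular cases of this form:
exactly specified observations occur when $L_i = U_i = x_i$,
right-censored observations, known only by a lower bound, occur when $U_i \to \infty$,
left-censored observations, known only by an upper bound, occur when $L_i \to -\infty$,
and interval-censored observations when $L_i < x_i < U_i$.
Let the set $A$ identify the exactly specified observations, sets $B$ and $C$ identify the observations censored on the right and left respectively, and set $D$ identify the observations confined between two finite limits. Also let there be $r$ exactly specified observations, i.e., the number in $A$. The probability density function for the standard Normal distribution is
$$Z(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}x^2\right), \quad -\infty < x < \infty$$
and the cumulative distribution function is
$$P(X) = 1 - Q(X) = \int_{-\infty}^{X} Z(x)\,dx.$$
The log-likelihood of the sample can be written as:
$$L(\mu,\sigma) = -r\log\sigma - \frac{1}{2}\sum_{A}\left\{(x_i-\mu)/\sigma\right\}^2 + \sum_{B}\log(Q(l_i)) + \sum_{C}\log(P(u_i)) + \sum_{D}\log(p_i)$$
where $p_i = P(u_i) - P(l_i)$ and $u_i = (U_i-\mu)/\sigma$, $l_i = (L_i-\mu)/\sigma$.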
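The log-likelihood above can be evaluated directly from its definition. The following Python sketch is illustrative only and is not the routine's implementation: the function names are hypothetical, the data layout borrows the x, xc and ic conventions described in Section 5, and $P$ and $Q$ are obtained from the standard library's `erfc`.

```python
import math


def Z(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Q(x):
    """Upper tail probability, Q(x) = 1 - P(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def P(x):
    """Standard Normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_likelihood(mu, sigma, x, xc, ic):
    """Evaluate L(mu, sigma) term by term.

    Censoring codes (assumed to match ic in Section 5):
    0 exact, 1 right-censored, 2 left-censored, 3 interval-censored.
    """
    total = 0.0
    for xi, xci, code in zip(x, xc, ic):
        if code == 0:    # set A: exact observation
            total += -math.log(sigma) - 0.5 * ((xi - mu) / sigma) ** 2
        elif code == 1:  # set B: only the lower bound L_i is known
            total += math.log(Q((xi - mu) / sigma))
        elif code == 2:  # set C: only the upper bound U_i is known
            total += math.log(P((xi - mu) / sigma))
        else:            # set D: confined between two finite limits
            lo, hi = min(xi, xci), max(xi, xci)
            if lo == hi:
                continue  # equal bounds: observation is ignored
            total += math.log(P((hi - mu) / sigma) - P((lo - mu) / sigma))
    return total
```

For example, a single observation right-censored at the mean contributes $\log Q(0) = \log 0.5$.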
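Let
$$S(x_i) = \frac{Z(x_i)}{Q(x_i)}, \quad S_1(l_i,u_i) = \frac{Z(l_i) - Z(u_i)}{p_i}$$
and
$$S_2(l_i,u_i) = \frac{u_i Z(u_i) - l_i Z(l_i)}{p_i},$$
then the first derivatives of the log-likelihood can be written as:
$$\frac{\partial L(\mu,\sigma)}{\partial\mu} = L_1(\mu,\sigma) = \sigma^{-2}\sum_{A}(x_i-\mu) + \sigma^{-1}\sum_{B}S(l_i) - \sigma^{-1}\sum_{C}S(-u_i) + \sigma^{-1}\sum_{D}S_1(l_i,u_i)$$
and
$$\frac{\partial L(\mu,\sigma)}{\partial\sigma} = L_2(\mu,\sigma) = -r\sigma^{-1} + \sigma^{-3}\sum_{A}(x_i-\mu)^2 + \sigma^{-1}\sum_{B}l_i S(l_i) - \sigma^{-1}\sum_{C}u_i S(-u_i) - \sigma^{-1}\sum_{D}S_2(l_i,u_i)$$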
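The two first derivatives can be accumulated observation by observation. The sketch below is a hedged illustration in Python, not the library code: `score` and its helpers are hypothetical names, and the censoring codes are assumed to follow the ic argument of Section 5.

```python
import math


def Z(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Q(x):
    """Upper tail probability, Q(x) = 1 - P(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def P(x):
    """Standard Normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def S(x):
    """S(x) = Z(x) / Q(x); Q(x) underflows for very large x."""
    return Z(x) / Q(x)


def score(mu, sigma, x, xc, ic):
    """Return (L1, L2), the derivatives of L with respect to mu and sigma."""
    l1 = 0.0
    l2 = 0.0
    for xi, xci, code in zip(x, xc, ic):
        if code == 0:    # exact
            l1 += (xi - mu) / sigma ** 2
            l2 += -1.0 / sigma + (xi - mu) ** 2 / sigma ** 3
        elif code == 1:  # right-censored at L_i
            l = (xi - mu) / sigma
            l1 += S(l) / sigma
            l2 += l * S(l) / sigma
        elif code == 2:  # left-censored at U_i
            u = (xi - mu) / sigma
            l1 -= S(-u) / sigma
            l2 -= u * S(-u) / sigma
        else:            # interval-censored
            lo, hi = min(xi, xci), max(xi, xci)
            if lo == hi:
                continue  # equal bounds: observation is ignored
            l = (lo - mu) / sigma
            u = (hi - mu) / sigma
            p = P(u) - P(l)
            l1 += (Z(l) - Z(u)) / p / sigma            # S1 term
            l2 -= (u * Z(u) - l * Z(l)) / p / sigma    # S2 term
    return l1, l2
```

A quick sanity check is that, for exact data only, both derivatives vanish at the sample mean and the maximum likelihood standard deviation.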
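The maximum likelihood estimates, $\hat{\mu}$ and $\hat{\sigma}$, are the solution to the equations:
$$L_1(\hat{\mu},\hat{\sigma}) = 0 \tag{1}$$
and
$$L_2(\hat{\mu},\hat{\sigma}) = 0 \tag{2}$$
and if the second derivatives $\dfrac{\partial^2 L}{\partial\mu^2}$, $\dfrac{\partial^2 L}{\partial\mu\,\partial\sigma}$ and $\dfrac{\partial^2 L}{\partial\sigma^2}$ are denoted by $L_{11}$, $L_{12}$ and $L_{22}$ respectively, then estimates of the standard errors of $\hat{\mu}$ and $\hat{\sigma}$ are given by:
$$se(\hat{\mu}) = \sqrt{\frac{-L_{22}}{L_{11}L_{22} - L_{12}^2}}, \quad se(\hat{\sigma}) = \sqrt{\frac{-L_{11}}{L_{11}L_{22} - L_{12}^2}}$$
and an estimate of the correlation of $\hat{\mu}$ and $\hat{\sigma}$ is given by:
$$\frac{L_{12}}{\sqrt{L_{11}L_{22}}}.$$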
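These quantities follow from inverting the negated matrix of second derivatives. As a hedged illustration, the Python sketch below computes them from given values of $L_{11}$, $L_{12}$ and $L_{22}$; the function name is hypothetical, and the correlation is taken as the off-diagonal element of that inverse divided by the product of the two standard errors, which simplifies to $L_{12}/\sqrt{L_{11}L_{22}}$.

```python
import math


def standard_errors(l11, l12, l22):
    """Return (se_mu, se_sigma, corr) from the second derivatives of L.

    Assumes the Hessian is negative definite, as at a proper maximum,
    so that det > 0 and l11, l22 < 0.
    """
    det = l11 * l22 - l12 * l12
    se_mu = math.sqrt(-l22 / det)
    se_sigma = math.sqrt(-l11 / det)
    corr = l12 / math.sqrt(l11 * l22)
    return se_mu, se_sigma, corr
```

For a complete (uncensored) sample of size $n$ the second derivatives at the maximum are $L_{11} = -n/\sigma^2$, $L_{12} = 0$ and $L_{22} = -2n/\sigma^2$, which recovers the familiar $\sigma/\sqrt{n}$ and $\sigma/\sqrt{2n}$.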
To obtain the maximum likelihood estimates the equations (1) and (2) can be solved using either the Newton–Raphson method or the Expectation-maximization (EM) algorithm of Dempster et al. (1977).
Newton–Raphson Method
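This consists of using approximate estimates $\tilde{\mu}$ and $\tilde{\sigma}$ to obtain improved estimates $\tilde{\mu} + \delta\tilde{\mu}$ and $\tilde{\sigma} + \delta\tilde{\sigma}$ by solving
$$\delta\tilde{\mu}\,L_{11} + \delta\tilde{\sigma}\,L_{12} + L_1 = 0, \qquad \delta\tilde{\mu}\,L_{12} + \delta\tilde{\sigma}\,L_{22} + L_2 = 0,$$
for the corrections $\delta\tilde{\mu}$ and $\delta\tilde{\sigma}$.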
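The pair of linear equations for the corrections is a 2 by 2 system and can be solved in closed form. The following Python sketch uses Cramer's rule; the function name is hypothetical and this is only one way the solve might be written, not necessarily how the routine does it.

```python
def newton_step(l1, l2, l11, l12, l22):
    """Solve for the Newton-Raphson corrections (d_mu, d_sigma).

    The system solved is
        d_mu * l11 + d_sigma * l12 + l1 = 0
        d_mu * l12 + d_sigma * l22 + l2 = 0
    using Cramer's rule on the symmetric 2x2 matrix.
    """
    det = l11 * l22 - l12 * l12
    if det == 0.0:
        raise ZeroDivisionError("second-derivative matrix is singular")
    d_mu = (-l1 * l22 + l2 * l12) / det
    d_sigma = (-l2 * l11 + l1 * l12) / det
    return d_mu, d_sigma
```

The improved estimates are then obtained by adding `d_mu` and `d_sigma` to the current values and re-evaluating the derivatives.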
EM Algorithm
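The expectation step consists of constructing the variable $w_i$ as follows:
$$\text{if } i \in A, \quad w_i = x_i \tag{3}$$
$$\text{if } i \in B, \quad w_i = E(x_i \mid x_i > L_i) = \mu + \sigma S(l_i) \tag{4}$$
$$\text{if } i \in C, \quad w_i = E(x_i \mid x_i < U_i) = \mu - \sigma S(-u_i) \tag{5}$$
$$\text{if } i \in D, \quad w_i = E(x_i \mid L_i < x_i < U_i) = \mu + \sigma S_1(l_i,u_i) \tag{6}$$
The maximization step consists of substituting (3), (4), (5) and (6) into (1) and (2) giving:
$$\hat{\mu} = \sum_{i=1}^{n} \hat{w}_i / n \tag{7}$$
and
$$\hat{\sigma}^2 = \sum_{i=1}^{n} (\hat{w}_i - \hat{\mu})^2 \Big/ \left\{ r + \sum_{B} T(\hat{l}_i) + \sum_{C} T(-\hat{u}_i) + \sum_{D} T_1(\hat{l}_i,\hat{u}_i) \right\} \tag{8}$$
where
$$T(x) = S(x)\{S(x) - x\}, \quad T_1(l,u) = S_1^2(l,u) + S_2(l,u)$$
and where $\hat{w}_i$, $\hat{l}_i$ and $\hat{u}_i$ are $w_i$, $l_i$ and $u_i$ evaluated at $\hat{\mu}$ and $\hat{\sigma}$. Equations (3) to (8) are the basis of the EM iterative procedure for finding $\hat{\mu}$ and $\hat{\sigma}^2$. The procedure consists of alternately estimating $\hat{\mu}$ and $\hat{\sigma}^2$ using (7) and (8) and estimating $\{\hat{w}_i\}$ using (3) to (6).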
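One pass of the procedure built on equations (3) to (8) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the routine's code: `em_step` is a hypothetical name, the censoring codes are assumed to follow the ic argument of Section 5, and the $T$ terms are evaluated at the current estimates before the update.

```python
import math


def Z(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def P(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def S(x):
    return Z(x) / Q(x)


def em_step(mu, sigma, x, xc, ic):
    """Return updated (mu, sigma) from one expectation/maximization pass."""
    w = []        # expected values w_i from (3) to (6)
    denom = 0.0   # denominator of (8)
    for xi, xci, code in zip(x, xc, ic):
        if code == 0:    # (3): exact observation
            w.append(xi)
            denom += 1.0
        elif code == 1:  # (4): right-censored at L_i
            l = (xi - mu) / sigma
            s = S(l)
            w.append(mu + sigma * s)
            denom += s * (s - l)          # T(l)
        elif code == 2:  # (5): left-censored at U_i
            u = (xi - mu) / sigma
            s = S(-u)
            w.append(mu - sigma * s)
            denom += s * (s + u)          # T(-u)
        else:            # (6): interval-censored
            lo, hi = min(xi, xci), max(xi, xci)
            if lo == hi:
                continue  # equal bounds: observation is ignored
            l = (lo - mu) / sigma
            u = (hi - mu) / sigma
            p = P(u) - P(l)
            s1 = (Z(l) - Z(u)) / p
            s2 = (u * Z(u) - l * Z(l)) / p
            w.append(mu + sigma * s1)
            denom += s1 * s1 + s2         # T1(l, u)
    new_mu = sum(w) / len(w)                                   # (7)
    new_var = sum((wi - new_mu) ** 2 for wi in w) / denom      # (8)
    return new_mu, math.sqrt(new_var)
```

Repeated calls, feeding each result back in as the next starting point, give the iteration; with exact data only, a single call already returns the sample mean and the maximum likelihood standard deviation.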
In choosing between the two methods, a general rule is that the Newton–Raphson method converges more quickly but requires good initial estimates, whereas the EM algorithm converges slowly but is robust to the initial values. In the case of the censored Normal distribution, if only a small proportion of the observations are censored then estimates based on the exact observations should give good enough initial estimates for the Newton–Raphson method to be used. If there is a high proportion of censored observations then the EM algorithm should be used and, if high accuracy is required, the subsequent use of the Newton–Raphson method to refine the estimates obtained from the EM algorithm should be considered.

4 References

Dempster A P, Laird N M and Rubin D B (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion) J. Roy. Statist. Soc. Ser. B 39 1–38
Swan A V (1969) Algorithm AS 16. Maximum likelihood estimation from grouped and censored normal data Appl. Statist. 18 110–114
Wolynetz M S (1979) Maximum likelihood estimation from confined and censored normal data Appl. Statist. 28 185–195

5 Arguments

1: method Character(1) Input
On entry: indicates whether the Newton–Raphson or EM algorithm should be used.
If method='N', the Newton–Raphson algorithm is used.
If method='E', the EM algorithm is used.
Constraint: method='N' or 'E'.
2: n Integer Input
On entry: n, the number of observations.
Constraint: n ≥ 2.
3: x(n) Real (Kind=nag_wp) array Input
On entry: the observations $x_i$, $L_i$ or $U_i$, for i = 1, 2, …, n.
If the observation is exactly specified – the exact value, xi.
If the observation is right-censored – the lower value, Li.
If the observation is left-censored – the upper value, Ui.
If the observation is interval-censored – the lower or upper value, Li or Ui, (see xc).
4: xc(n) Real (Kind=nag_wp) array Input
On entry: if the jth observation, for j = 1, 2, …, n, is an interval-censored observation then xc(j) should contain the complementary value to x(j); that is, if x(j) < xc(j), then xc(j) contains the upper value, $U_j$, and if x(j) > xc(j), then xc(j) contains the lower value, $L_j$. Otherwise, if the jth observation is exact or right- or left-censored, xc(j) need not be set.
Note: if x(j)=xc(j) then the observation is ignored.
5: ic(n) Integer array Input
On entry: ic(i) contains the censoring code for the ith observation, for i = 1, 2, …, n.
If ic(i)=0, the observation is exactly specified.
If ic(i)=1, the observation is right-censored.
If ic(i)=2, the observation is left-censored.
If ic(i)=3, the observation is interval-censored.
Constraint: ic(i)=0, 1, 2 or 3, for i = 1, 2, …, n.
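The way x, xc and ic jointly encode an observation can be illustrated with a small helper that converts (lower, upper) bounds into the three arrays. This Python sketch is hypothetical and not part of the library: `encode_observations` is an invented name, `None` stands for an unbounded side, and 0.0 is used as an arbitrary placeholder where xc need not be set.

```python
def encode_observations(bounds):
    """Convert (lower, upper) pairs into x, xc and ic lists.

    None marks a missing (infinite) bound; equal bounds mean an exact value.
    """
    x, xc, ic = [], [], []
    for lower, upper in bounds:
        if lower is None and upper is None:
            raise ValueError("an observation needs at least one finite bound")
        if lower is None:        # left-censored: only U_i is known
            x.append(upper)
            xc.append(0.0)
            ic.append(2)
        elif upper is None:      # right-censored: only L_i is known
            x.append(lower)
            xc.append(0.0)
            ic.append(1)
        elif lower == upper:     # exactly specified
            x.append(lower)
            xc.append(0.0)
            ic.append(0)
        else:                    # interval-censored: both limits are used
            x.append(min(lower, upper))
            xc.append(max(lower, upper))
            ic.append(3)
    return x, xc, ic
```

For instance, `encode_observations([(1.5, 1.5), (2.0, None), (None, 0.5), (1.0, 3.0)])` yields one observation of each type, with the interval stored as x = 1.0 and xc = 3.0.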
6: xmu Real (Kind=nag_wp) Input/Output
On entry: if xsig > 0.0, the initial estimate of the mean, $\mu$; otherwise xmu need not be set.
On exit: the maximum likelihood estimate, $\hat{\mu}$, of $\mu$.
7: xsig Real (Kind=nag_wp) Input/Output
On entry: specifies whether initial estimates of $\mu$ and $\sigma$ are to be supplied.
xsig > 0.0
xsig is the initial estimate of $\sigma$ and xmu must contain an initial estimate of $\mu$.
xsig ≤ 0.0
Initial estimates of xmu and xsig are calculated internally from:
  (a) the exact observations, if the number of exactly specified observations is ≥ 2; or
  (b) the interval-censored observations, if the number of interval-censored observations is ≥ 1; or
  (c) they are set to 0.0 and 1.0 respectively.
On exit: the maximum likelihood estimate, $\hat{\sigma}$, of $\sigma$.
8: tol Real (Kind=nag_wp) Input
On entry: the relative precision required for the final estimates of $\mu$ and $\sigma$. Convergence is assumed when the absolute relative changes in the estimates of both $\mu$ and $\sigma$ are less than tol.
If tol=0.0, a relative precision of 0.000005 is used.
Constraint: machine precision < tol ≤ 1.0 or tol=0.0.
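The stopping rule can be pictured with a short check. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and the exact scaling the routine uses for the relative change (here, division by the magnitude of the new estimate, falling back to the absolute change when that is zero) is not stated in this document.

```python
def has_converged(mu_old, mu_new, sigma_old, sigma_new, tol=0.0):
    """True when the relative changes in both estimates are below tol."""
    if tol == 0.0:
        tol = 0.000005  # default relative precision documented for tol=0.0

    def rel_change(old, new):
        # Assumed scaling: relative to the new value.
        scale = abs(new)
        return abs(new - old) / scale if scale > 0.0 else abs(new - old)

    return (rel_change(mu_old, mu_new) < tol
            and rel_change(sigma_old, sigma_new) < tol)
```

Both parameters must satisfy the test; a small change in $\mu$ alone is not enough.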
9: maxit Integer Input
On entry: the maximum number of iterations.
If maxit ≤ 0, a value of 25 is used.
10: sexmu Real (Kind=nag_wp) Output
On exit: the estimate of the standard error of $\hat{\mu}$.
11: sexsig Real (Kind=nag_wp) Output
On exit: the estimate of the standard error of $\hat{\sigma}$.
12: corr Real (Kind=nag_wp) Output
On exit: the estimate of the correlation between $\hat{\mu}$ and $\hat{\sigma}$.
13: dev Real (Kind=nag_wp) Output
On exit: the maximized log-likelihood, $L(\hat{\mu},\hat{\sigma})$.
14: nobs(4) Integer array Output
On exit: the number of observations of each type:
nobs(1) contains the number of right-censored observations.
nobs(2) contains the number of left-censored observations.
nobs(3) contains the number of interval-censored observations.
nobs(4) contains the number of exactly specified observations.
15: nit Integer Output
On exit: the number of iterations performed.
16: wk(2×n) Real (Kind=nag_wp) array Workspace
17: ifail Integer Input/Output
On entry: ifail must be set to 0, -1 or 1 to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of 0 causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of -1 means that an error message is printed while a value of 1 means that it is not.
If halting is not appropriate, the value -1 or 1 is recommended. If message printing is undesirable, then the value 1 is recommended. Otherwise, the value 0 is recommended. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
On entry, effective number of observations <2.
On entry, i=value and ic(i)=value.
Constraint: ic(i)=0, 1, 2 or 3.
On entry, method=value.
Constraint: method='N' or 'E'.
On entry, n=value.
Constraint: n ≥ 2.
On entry, tol=value.
Constraint: machine precision < tol ≤ 1.0 or tol=0.0.
ifail=2
The chosen method has not converged in value iterations. You should either increase tol or maxit or, if using the EM algorithm, try the Newton–Raphson method with the values returned by the current call to g07bbf as initial values. All returned values will be reasonable approximations to the correct results if maxit is not very small.
ifail=3
The EM process has failed. Different initial values should be tried.
The process has diverged. Different initial values should be tried.
ifail=4
Standard errors cannot be computed. This can be caused by the method starting to diverge when the maximum number of iterations was reached.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

The accuracy is controlled by the argument tol.
If high precision is requested with the EM algorithm then there is a possibility that, due to the slow convergence, the increments of $\hat{\mu}$ and $\hat{\sigma}$ become smaller than tol before the correct solution has been reached, and the process will prematurely assume convergence.

8 Parallelism and Performance

g07bbf is not threaded in any implementation.

9 Further Comments

The process is deemed divergent if three successive increments of μ or σ increase.
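One possible reading of this rule is sketched below in Python. It is an interpretation, not the routine's actual test: the function name is hypothetical, and "three successive increments increase" is assumed to mean that the size of the correction to a parameter grows on three consecutive iterations.

```python
def is_divergent(increments, run=3):
    """True if the increment size grows on `run` consecutive iterations.

    increments: successive corrections applied to one parameter
    (mu or sigma), oldest first.
    """
    growth = 0
    for prev, cur in zip(increments, increments[1:]):
        if abs(cur) > abs(prev):
            growth += 1
            if growth >= run:
                return True
        else:
            growth = 0  # a shrinking step resets the count
    return False
```

Under this reading the check would be applied to the increments of $\mu$ and of $\sigma$ separately, and divergence in either would stop the process.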

10 Example

A sample of 18 observations and their censoring codes are read in and the Newton–Raphson method used to compute the estimates.

10.1 Program Text

Program Text (g07bbfe.f90)

10.2 Program Data

Program Data (g07bbfe.d)

10.3 Program Results

Program Results (g07bbfe.r)