PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_univar_estim_normal (g07bb)
Purpose
nag_univar_estim_normal (g07bb) computes maximum likelihood estimates and their standard errors for arguments of the Normal distribution from grouped and/or censored data.
Syntax
[
xmu,
xsig,
sexmu,
sexsig,
corr,
dev,
nobs,
nit,
ifail] = g07bb(
method,
x,
xc,
ic,
xmu,
xsig,
tol,
maxit, 'n',
n)
[
xmu,
xsig,
sexmu,
sexsig,
corr,
dev,
nobs,
nit,
ifail] = nag_univar_estim_normal(
method,
x,
xc,
ic,
xmu,
xsig,
tol,
maxit, 'n',
n)
Description
A sample of size
is taken from a Normal distribution with mean
and variance
and consists of grouped and/or censored data. Each of the
observations is known by a pair of values
such that:
The data is represented as particular cases of this form:
- exactly specified observations occur when ,
- right-censored observations, known only by a lower bound, occur when ,
- left-censored observations, known only by a upper bound, occur when ,
- and interval-censored observations when .
Let the set
identify the exactly specified observations, sets
and
identify the observations censored on the right and left respectively, and set
identify the observations confined between two finite limits. Also let there be
exactly specified observations, i.e., the number in
. The probability density function for the standard Normal distribution is
and the cumulative distribution function is
The log-likelihood of the sample can be written as:
where
and
.
Let
and
then the first derivatives of the log-likelihood can be written as:
and
The maximum likelihood estimates,
and
, are the solution to the equations:
and
and if the second derivatives
,
and
are denoted by
,
and
respectively, then estimates of the standard errors of
and
are given by:
and an estimate of the correlation of
and
is given by:
To obtain the maximum likelihood estimates the equations
(1) and
(2) can be solved using either the Newton–Raphson method or the Expectation-maximization
algorithm of
Dempster et al. (1977).
Newton–Raphson Method
This consists of using approximate estimates
and
to obtain improved estimates
and
by solving
for the corrections
and
.
EM Algorithm
The expectation step consists of constructing the variable
as follows:
the maximization step consists of substituting
(3),
(4),
(5) and
(6) into
(1) and
(2) giving:
and
where
and where
,
and
are
,
and
evaluated at
and
. Equations
(3) to
(8) are the basis of the
iterative procedure for finding
and
. The procedure consists of alternately estimating
and
using
(7) and
(8) and estimating
using
(3) to
(6).
In choosing between the two methods a general rule is that the Newton–Raphson method converges more quickly but requires good initial estimates whereas the algorithm converges slowly but is robust to the initial values. In the case of the censored Normal distribution, if only a small proportion of the observations are censored then estimates based on the exact observations should give good enough initial estimates for the Newton–Raphson method to be used. If there are a high proportion of censored observations then the algorithm should be used and if high accuracy is required the subsequent use of the Newton–Raphson method to refine the estimates obtained from the algorithm should be considered.
References
Dempster A P, Laird N M and Rubin D B (1977) Maximum likelihood from incomplete data via the algorithm (with discussion) J. Roy. Statist. Soc. Ser. B 39 1–38
Swan A V (1969) Algorithm AS 16. Maximum likelihood estimation from grouped and censored normal data Appl. Statist. 18 110–114
Wolynetz M S (1979) Maximum likelihood estimation from confined and censored normal data Appl. Statist. 28 185–195
Parameters
Compulsory Input Parameters
- 1:
– string (length ≥ 1)
-
Indicates whether the Newton–Raphson or
algorithm should be used.
If , then the Newton–Raphson algorithm is used.
If , then the algorithm is used.
Constraint:
or .
- 2:
– double array
-
The observations
,
or
, for
.
If the observation is exactly specified – the exact value, .
If the observation is right-censored – the lower value, .
If the observation is left-censored – the upper value, .
If the observation is interval-censored – the lower or upper value,
or
, (see
xc).
- 3:
– double array
-
If the
th observation, for is an interval-censored observation then should contain the complementary value to , that is, if , then contains upper value, , and if , then contains lower value, . Otherwise if the th observation is exact or right- or left-censored need not be set.
Note: if then the observation is ignored.
- 4:
– int64int32nag_int array
-
contains the censoring codes for the
th observation, for
.
If , the observation is exactly specified.
If , the observation is right-censored.
If , the observation is left-censored.
If , the observation is interval-censored.
Constraint:
, , or , for .
- 5:
– double scalar
-
If
the initial estimate of the mean,
; otherwise
xmu need not be set.
- 6:
– double scalar
-
Specifies whether an initial estimate of
and
are to be supplied.
- xsig is the initial estimate of and xmu must contain an initial estimate of .
- Initial estimates of xmu and xsig are calculated internally from:
(a) |
the exact observations, if the number of exactly specified observations is ; or |
(b) |
the interval-censored observations; if the number of interval-censored observations is ; or |
(c) |
they are set to and respectively. |
- 7:
– double scalar
-
The relative precision required for the final estimates of
and
. Convergence is assumed when the absolute relative changes in the estimates of both
and
are less than
tol.
If , then a relative precision of is used.
Constraint:
or .
- 8:
– int64int32nag_int scalar
-
The maximum number of iterations.
If , then a value of is used.
Optional Input Parameters
- 1:
– int64int32nag_int scalar
-
Default:
the dimension of the arrays
x,
xc,
ic. (An error is raised if these dimensions are not equal.)
, the number of observations.
Constraint:
.
Output Parameters
- 1:
– double scalar
-
The maximum likelihood estimate, , of .
- 2:
– double scalar
-
The maximum likelihood estimate, , of .
- 3:
– double scalar
-
The estimate of the standard error of .
- 4:
– double scalar
-
The estimate of the standard error of .
- 5:
– double scalar
-
The estimate of the correlation between and .
- 6:
– double scalar
-
The maximized log-likelihood, .
- 7:
– int64int32nag_int array
-
The number of the different types of each observation;
contains number of right-censored observations.
contains number of left-censored observations.
contains number of interval-censored observations.
contains number of exactly specified observations.
- 8:
– int64int32nag_int scalar
-
The number of iterations performed.
- 9:
– int64int32nag_int scalar
unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
-
-
On entry, | or , |
or | , |
or | , , or , for some , |
or | , |
or | , |
or | . |
-
-
The chosen method failed to converge in
maxit iterations. You should either increase
tol or
maxit or, if using the
algorithm try using the Newton–Raphson method with initial values those returned by the current call to
nag_univar_estim_normal (g07bb). All returned values will be reasonable approximations to the correct results if
maxit is not very small.
-
-
The chosen method is diverging. This will be due to poor initial values. You should try different initial values.
-
-
nag_univar_estim_normal (g07bb) was unable to calculate the standard errors. This can be caused by the method starting to diverge when the maximum number of iterations was reached.
-
An unexpected error has been triggered by this routine. Please
contact
NAG.
-
Your licence key may have expired or may not have been installed correctly.
-
Dynamic memory allocation failed.
Accuracy
The accuracy is controlled by the argument
tol.
If high precision is requested with the
algorithm then there is a possibility that, due to the slow convergence, before the correct solution has been reached the increments of
and
may be smaller than
tol and the process will prematurely assume convergence.
Further Comments
The process is deemed divergent if three successive increments of or increase.
Example
A sample of observations and their censoring codes are read in and the Newton–Raphson method used to compute the estimates.
Open in the MATLAB editor:
g07bb_example
function g07bb_example
fprintf('g07bb example results\n\n');
x = [4.5; 5.4; 3.9; 5.1; 4.6; 4.8;
2.9; 6.3; 5.5; 4.6; 4.1; 5.2;
3.2; 4; 3.1; 5.1; 3.8; 2.2];
n = numel(x);
xc = zeros(n,1);
ic = zeros(n,1,'int64');
xc(n,1) = 2.5;
ic(n-5:n,1) = [1; 1; 1; 2; 2; 3];
method = 'N';
xmu = 4;
xsig = 1;
tol = 5e-05;
maxit = int64(50);
[xmu, xsig, sexmu, sexsig, corr, dev, nobs, nit, ifail] = ...
g07bb( ...
method, x, xc, ic, xmu, xsig, tol, maxit);
fprintf(' Mean = %8.4f\n', xmu);
fprintf(' Standard deviation = %8.4f\n', xsig);
fprintf(' Standard error of mean = %8.4f\n', sexmu);
fprintf(' Standard error of sigma = %8.4f\n', sexsig);
fprintf(' Correlation coefficient = %8.4f\n', corr);
fprintf(' Number of right censored observations = %3d\n', nobs(1));
fprintf(' Number of left censored observations = %3d\n', nobs(2));
fprintf(' Number of interval censored observations = %3d\n', nobs(3));
fprintf(' Number of exactly specified observations = %3d\n', nobs(4));
fprintf(' Number of iterations = %3d\n', nit);
fprintf(' Log-likelihood = %8.4f\n', dev);
g07bb example results
Mean = 4.4924
Standard deviation = 1.0196
Standard error of mean = 0.2606
Standard error of sigma = 0.1940
Correlation coefficient = 0.0160
Number of right censored observations = 3
Number of left censored observations = 2
Number of interval censored observations = 1
Number of exactly specified observations = 12
Number of iterations = 5
Log-likelihood = -22.2817
PDF version (NAG web site
, 64-bit version, 64-bit version)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015