NAG Toolbox: nag_correg_glm_predict (g02gp)
Purpose
nag_correg_glm_predict (g02gp) allows prediction from a generalized linear model fitted by, for example, nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd).
Syntax
[eta, seeta, pred, sepred, ifail] = g02gp(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
[eta, seeta, pred, sepred, ifail] = nag_correg_glm_predict(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: wt, off, s and a were made optional; weight and offset were removed from the interface; t was made optional (defaulting to a vector of 1s).
Description
A generalized linear model consists of the following elements:
(i) A suitable distribution for the dependent variable $y$.
(ii) A linear model, with linear predictor $\eta = X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of $p$ parameters.
(iii) A link function $g(\cdot)$ between the expected value of $y$ and the linear predictor, that is $g(\mu) = \eta$, where $\mu = E(y)$.
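In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable $y$ given a set of independent variables $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd). The predicted variable and its standard error can then be obtained from:
$$\hat{y} = g^{-1}(\eta), \qquad \mathrm{se}(\hat{y}) = \sqrt{\left(\left.\frac{\partial g^{-1}(x)}{\partial x}\right|_{x=\eta}\right)^{2} \mathrm{se}(\eta)^{2} + I_{\mathrm{fobs}}\,\mathrm{Var}(y)}$$
where
$$\eta = o + X\beta, \qquad \mathrm{se}(\eta) = \sqrt{\mathrm{diag}\left(X C X^{T}\right)},$$
$o$ is a vector of offsets, and $I_{\mathrm{fobs}} = 0$ if the variance of future observations is not taken into account and $I_{\mathrm{fobs}} = 1$ otherwise. Here $\mathrm{diag}(A)$ indicates the diagonal elements of matrix $A$.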
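The arithmetic behind the linear predictor and its standard error can be sketched in plain MATLAB, without calling the toolbox. The sketch assumes the usual form $\eta = o + X\beta$ with $\mathrm{se}(\eta)$ taken from the diagonal of $X C X^{T}$, a log link, and no future-observation variance; all numerical values are hypothetical and chosen only to make the result easy to check by hand.

```matlab
% Hypothetical inputs, chosen only to illustrate the arithmetic.
X   = [1 2; 1 3];                 % two observations; first column is the intercept
b   = [0.5; 0.25];                % parameter estimates (beta)
C   = [0.04 -0.01; -0.01 0.01];   % variance-covariance matrix of b
off = zeros(2, 1);                % offsets (none here)

eta   = off + X*b;                % linear predictor
seeta = sqrt(sum((X*C) .* X, 2)); % sqrt of diag(X*C*X') without forming the full product

% Assumed log link: g(mu) = log(mu), so g^{-1}(eta) = exp(eta) and its
% derivative with respect to eta is also exp(eta).
pred   = exp(eta);
sepred = exp(eta) .* seeta;       % delta method, future-observation variance excluded

fprintf('%10.5f%10.5f%10.5f%10.5f\n', [eta, seeta, pred, sepred]');
```

Computing only the row-wise sums avoids building the full n-by-n product when just its diagonal is needed.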
If required, the variance for the $i$th future observation, $\mathrm{Var}(y_i)$, can be calculated as:
$$\mathrm{Var}(y_i) = \frac{\phi\, V(\theta)}{w_i}$$
where $w_i$ is a weight, $\phi$ is the scale (or dispersion) parameter, and $V(\theta)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$, with:
Poisson: $V(\theta) = \mu_i$, $\phi = 1$
binomial: $V(\theta) = \mu_i (t_i - \mu_i) / t_i$, $\phi = 1$
Normal: $V(\theta) = 1$
gamma: $V(\theta) = \mu_i^{2}$
In the cases of a Normal and gamma error structure, the scale parameter ($\phi$) is supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $\phi = \hat{\sigma}^{2}$, i.e., the estimated variance.
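As a small illustration of the future-observation variance, the sketch below evaluates $\phi V / w$ for each error structure. It assumes the standard variance functions (mean for Poisson, $\mu(t-\mu)/t$ for binomial counts, 1 for Normal, $\mu^2$ for gamma) and a scale of 1 for Poisson and binomial errors; the numbers are hypothetical.

```matlab
% Hypothetical values for a single future observation.
mu  = 3.0;    % predicted mean
w   = 2.0;    % weight
t   = 10.0;   % binomial denominator (used only in the binomial case)
phi = 0.5;    % user-supplied scale, used only for Normal and gamma errors

var_poisson  = 1.0 * mu / w;                 % V = mu,            scale fixed at 1
var_binomial = 1.0 * (mu*(t - mu)/t) / w;    % V = mu(t - mu)/t,  scale fixed at 1
var_normal   = phi * 1.0 / w;                % V = 1
var_gamma    = phi * mu^2 / w;               % V = mu^2

fprintf('%8.4f%8.4f%8.4f%8.4f\n', var_poisson, var_binomial, var_normal, var_gamma);
```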
References
McCullagh P and Nelder J A (1983) Generalized Linear Models Chapman and Hall
Parameters
Compulsory Input Parameters
- 1: errfn – string (length ≥ 1)
Indicates the distribution used to model the dependent variable, $y$.
- errfn = 'B': The binomial distribution is used.
- errfn = 'G': The gamma distribution is used.
- errfn = 'N': The Normal (Gaussian) distribution is used.
- errfn = 'P': The Poisson distribution is used.
Constraint: errfn = 'B', 'G', 'N' or 'P'.
- 2: link – string (length ≥ 1)
Indicates which link function is to be used.
- link = 'C': A complementary log-log link is used.
- link = 'E': An exponent link is used.
- link = 'G': A logistic link is used.
- link = 'I': An identity link is used.
- link = 'L': A log link is used.
- link = 'P': A probit link is used.
- link = 'R': A reciprocal link is used.
- link = 'S': A square root link is used.
Details on the functional form of the different links can be found in the G02 Chapter Introduction.
Constraints:
- if errfn = 'B', link = 'C', 'G' or 'P';
- otherwise link = 'E', 'I', 'L', 'R' or 'S'.
- 3: mean_p – string (length ≥ 1)
Indicates if a mean term is to be included.
- mean_p = 'M': A mean term (intercept) will be included in the model.
- mean_p = 'Z': The model will pass through the origin (zero-point).
Constraint: mean_p = 'M' or 'Z'.
- 4: x – double array
The first dimension of the array x must be at least n.
The second dimension of the array x must be at least m.
x(i,j) must contain the $i$th observation for the $j$th independent variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.
- 5: isx(m) – int64, int32 or nag_int array
Indicates which independent variables are to be included in the model.
If isx(j) > 0, the $j$th independent variable is included in the regression model.
Constraints:
- isx(j) ≥ 0, for $j = 1, 2, \ldots, m$;
- if mean_p = 'M', exactly ip − 1 values of isx must be > 0;
- if mean_p = 'Z', exactly ip values of isx must be > 0.
- 6: b(ip) – double array
The model parameters, $\beta$.
If mean_p = 'M', b(1) must contain the mean parameter and b(i+1) the coefficient of the variable contained in the $j$th column of x, where isx(j) is the $i$th positive value in the array isx.
If mean_p = 'Z', b(i) must contain the coefficient of the variable contained in the $j$th column of x, where isx(j) is the $i$th positive value in the array isx.
- 7: covar(ip×(ip+1)/2) – double array
The upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters $\beta_i$ and $\beta_j$, that is the values stored in b(i) and b(j), should be supplied in covar(j×(j−1)/2+i), for $i = 1, 2, \ldots, \mathrm{ip}$ and $j = i, \ldots, \mathrm{ip}$.
Constraint: the matrix represented in covar must be a valid variance-covariance matrix.
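The packed-by-column layout can be illustrated with a short plain-MATLAB sketch. It assumes the index rule that element (i, j) of the upper triangle, with i ≤ j, is stored at position j(j−1)/2 + i; the 3-by-3 matrix is hypothetical. In practice covar would normally be taken directly from the output of the fitting function rather than built by hand.

```matlab
% Hypothetical symmetric variance-covariance matrix for ip = 3 parameters.
C  = [4.0 1.0 0.5;
      1.0 3.0 0.2;
      0.5 0.2 2.0];
ip = size(C, 1);

% Pack the upper triangle column by column: C(i,j), i <= j, goes to
% position j*(j-1)/2 + i of the packed vector.
covar = zeros(ip*(ip+1)/2, 1);
for j = 1:ip
    for i = 1:j
        covar(j*(j-1)/2 + i) = C(i, j);
    end
end

% The same packing via a logical mask (column-major order does the work).
covar_mask = C(triu(true(ip)));

disp(covar');
```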
- 8: vfobs – logical scalar
If vfobs = true, the variance of future observations is included in the standard error of the predicted variable (i.e., $I_{\mathrm{fobs}} = 1$), otherwise $I_{\mathrm{fobs}} = 0$.
Optional Input Parameters
- 1: n – int64, int32 or nag_int scalar
Default: the dimension of the arrays t, off, wt and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: n ≥ 1.
- 2: m – int64, int32 or nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
$m$, the total number of independent variables.
Constraint: m ≥ 1.
- 3: ip – int64, int32 or nag_int scalar
Default: the dimension of the array b.
The number of independent variables in the model, including the mean or intercept if present.
Constraint: ip > 0.
- 4: t(:) – double array
The dimension of the array t must be at least n if errfn = 'B', and at least 1 otherwise.
If errfn = 'B', t(i) must contain the binomial denominator, $t_i$, for the $i$th observation.
Otherwise t is not referenced.
Constraint: if errfn = 'B', t(i) ≥ 0.0, for $i = 1, 2, \ldots, n$.
- 5: off(:) – double array
The dimension of the array off must be at least n if offsets are used, and at least 1 otherwise.
If offsets are used, off(i) must contain the offset, $o_i$, for the $i$th observation.
Otherwise off is not referenced.
- 6: wt(:) – double array
The dimension of the array wt must be at least n if weights are used and vfobs = true, and at least 1 otherwise.
If weights are used and vfobs = true, wt(i) must contain the weight, $w_i$, for the $i$th observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if vfobs = true and weights are used, wt(i) ≥ 0.0, for $i = 1, 2, \ldots, n$.
- 7: s – double scalar
Default:
If errfn = 'N' or 'G' and vfobs = true, the scale parameter, $\phi$.
Otherwise s is not referenced and $\phi = 1$.
Constraint: if errfn = 'N' or 'G' and vfobs = true, s > 0.0.
- 8: a – double scalar
Default:
If link = 'E', a must contain the power of the exponential.
If link ≠ 'E', a is not referenced.
Constraint: if link = 'E', a ≠ 0.0.
Output Parameters
- 1: eta(n) – double array
The linear predictor, $\eta$.
- 2: seeta(n) – double array
The standard error of the linear predictor, $\mathrm{se}(\eta)$.
- 3: pred(n) – double array
The predicted value, $\hat{y}$.
- 4: sepred(n) – double array
The standard error of the predicted value, $\mathrm{se}(\hat{y})$. If pred(i) could not be calculated, then nag_correg_glm_predict (g02gp) reports a warning through ifail, and sepred(i) is set to a flag value.
- 5: ifail – int64, int32 or nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Note: nag_correg_glm_predict (g02gp) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
- On entry, errfn is invalid.
- On entry, errfn and link combination is invalid.
  On entry, link is invalid.
- On entry, mean_p is invalid.
- On entry, offset is invalid.
- On entry, weight is invalid.
- Constraint: n ≥ 1.
- Constraint: the first dimension of x must be at least n.
- Constraint: m ≥ 1.
- On entry, isx not consistent with ip.
- Constraint: ip > 0.
- Constraint: t(i) ≥ 0.0, for all i.
- Constraint: wt(i) ≥ 0.0, for all i.
- Constraint: s > 0.0.
- On entry, a = 0.0.
- On entry, covar has a negative value for at least one diagonal element.
- W: At least one predicted value could not be calculated as required. sepred is set to a flag value for affected predicted values.
- ifail = −99: An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = −399: Your licence key may have expired or may not have been installed correctly.
- ifail = −999: Dynamic memory allocation failed.
Accuracy
Not applicable.
Further Comments
None.
Example
The model $y = \dfrac{1}{\beta_1 + \beta_2 x} + \varepsilon$ is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.
function g02gp_example

fprintf('g02gp example results\n\n');

% Training data: five observations of a single independent variable
x = [ 1; 2; 3; 4; 5];
y = [25; 10; 6; 4; 3];
isx = [int64(1)];
ip = int64(2);      % intercept plus one coefficient
link = 'R';         % reciprocal link
mean_p = 'M';       % include a mean (intercept) term
s = 0;              % scale parameter; 0 asks g02ga to estimate it

% Fit the model with Normal errors
[s, rss, idf, b, irank, se, covar, v, ifail] = ...
    g02ga( ...
        link, mean_p, x, isx, ip, y, s);

fprintf('\nResidual sum of squares = %12.4e\n', rss);
fprintf('Degrees of freedom = %d\n', idf);
fprintf('\n Estimate Standard error\n');
for i = 1:ip
    fprintf('%14.4f %14.4f\n', b(i), se(i));
end

% New observations to predict
x = [32; 18];
errfn = 'N';
vfobs = true;       % include the variance of future observations

[eta, seeta, pred, sepred, ifail] = ...
    g02gp( ...
        errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

fprintf('\n i eta se(eta) predicted se(predicted)\n');
% Loop over the predictions, not over the ip model parameters
for i = 1:numel(pred)
    fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), ...
            pred(i), sepred(i));
end
g02gp example results
Residual sum of squares = 3.8717e-01
Degrees of freedom = 3
Estimate Standard error
-0.0239 0.0028
0.0638 0.0026
i eta se(eta) predicted se(predicted)
1 2.01807 0.08168 0.49552 0.35981
2 1.12472 0.04476 0.88911 0.36098
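As a sanity check, the predicted values and their standard errors can be reproduced by hand from the printed figures. The sketch below is plain MATLAB and does not call the toolbox. It assumes that the scale returned by the fit equals the residual sum of squares divided by the degrees of freedom, that the weights are all 1, and that the standard error combines the delta-method term for the reciprocal link with the Normal future-observation variance; the inputs are the rounded values from the output above, so agreement is only to about four decimal places.

```matlab
% Values copied from the example output (rounded as printed).
eta   = [2.01807; 1.12472];
seeta = [0.08168; 0.04476];
rss   = 3.8717e-01;
idf   = 3;
phi   = rss / idf;          % assumed: scale estimate = RSS / degrees of freedom

pred   = 1 ./ eta;          % inverse of the reciprocal link
dmu    = -1 ./ eta.^2;      % derivative of 1/eta with respect to eta
% Normal errors with unit weights: Var(y) = phi
sepred = sqrt((dmu .* seeta).^2 + phi);

fprintf('%13.5f%13.5f\n', [pred, sepred]');
```

Almost all of each standard error comes from the future-observation variance: the delta-method term contributes only about 0.0004 and 0.0013 to the two squared standard errors.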
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015