A generalized linear model consists of the following elements:
(i) A suitable distribution for the dependent variable $y$.
(ii) A linear model, with linear predictor $\eta = X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of parameters.
(iii) A link function $g(\cdot)$ between the expected value of $y$ and the linear predictor, that is $g(\mu) = \eta$, where $\mu = E(y)$.
In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables, $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_glm_normal (g02gac), nag_glm_binomial (g02gbc), nag_glm_poisson (g02gcc) or nag_glm_gamma (g02gdc). The predicted variable and its standard error can then be obtained from:
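$$\hat{y} = g^{-1}(\eta), \qquad \mathrm{se}(\hat{y}) = \sqrt{\left[\left(\frac{d g^{-1}(x)}{d x}\right)_{\eta} \mathrm{se}(\eta)\right]^{2} + I_{\mathrm{fobs}} \mathrm{Var}(y)}$$
where $\eta = o + X\beta$, $\mathrm{se}(\eta) = \sqrt{\mathrm{diag}(X C X^{T})}$, $o$ is a vector of offsets and $I_{\mathrm{fobs}} = 0$, if the variance of future observations is not taken into account, and $I_{\mathrm{fobs}} = 1$ otherwise. Here $\mathrm{diag}\,A$ indicates the diagonal elements of matrix $A$.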
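If required, the variance for the $i$th future observation, $\mathrm{Var}(y_i)$, can be calculated as:
$$\mathrm{Var}(y_i) = \frac{\phi V(\theta)}{w_i}$$
where $w_i$ is a weight, $\phi$ is the scale (or dispersion) parameter, and $V(\theta)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for the $y_i$, with:
Poisson | $V(\theta) = \mu_i$, $\phi = 1$ |
binomial | $V(\theta) = \mu_i (t_i - \mu_i) / t_i$, $\phi = 1$ |
Normal | $V(\theta) = 1$ |
gamma | $V(\theta) = \mu_i^2$ |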
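As a rough illustration of the variance of a future observation, the sketch below divides the scale times the variance function by the weight, using the standard variance functions for the four distributions. The enum and function names are hypothetical and are not the library's own types; the binomial case assumes the mean is on the count scale with denominator `t`.

```c
#include <assert.h>

/* Hypothetical distribution tags for this sketch; not the library's enum. */
typedef enum { DIST_POISSON, DIST_BINOMIAL, DIST_NORMAL, DIST_GAMMA } dist_t;

/* Variance function for fitted mean mu; t is the binomial denominator
   and is ignored for the other distributions. */
double variance_function(dist_t d, double mu, double t)
{
    switch (d) {
    case DIST_POISSON:  return mu;
    case DIST_BINOMIAL: return mu * (t - mu) / t;
    case DIST_NORMAL:   return 1.0;
    case DIST_GAMMA:    return mu * mu;
    }
    return 0.0; /* not reached for valid tags */
}

/* Var(y_i) = phi * V / w_i. The scale is fixed at 1 for the Poisson
   and binomial distributions, so any supplied phi is ignored there. */
double future_variance(dist_t d, double mu, double t, double phi, double w)
{
    assert(w > 0.0);
    if (d == DIST_POISSON || d == DIST_BINOMIAL)
        phi = 1.0;
    return phi * variance_function(d, mu, t) / w;
}
```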
- 1:
errfn – Nag_Distributions Input
On entry: indicates the distribution used to model the dependent variable, $y$.
- errfn = Nag_Binomial: The binomial distribution is used.
- errfn = Nag_Gamma: The gamma distribution is used.
- errfn = Nag_Normal: The Normal (Gaussian) distribution is used.
- errfn = Nag_Poisson: The Poisson distribution is used.
Constraint:
errfn = Nag_Binomial, Nag_Gamma, Nag_Normal or Nag_Poisson.
- 2:
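link – Nag_Link Input
On entry: indicates which link function is to be used.
- link = Nag_Compl: A complementary log-log link is used.
- link = Nag_Expo: An exponent link is used.
- link = Nag_Logistic: A logistic link is used.
- link = Nag_Iden: An identity link is used.
- link = Nag_Log: A log link is used.
- link = Nag_Probit: A probit link is used.
- link = Nag_Reci: A reciprocal link is used.
- link = Nag_Sqrt: A square root link is used.
Details on the functional form of the different links can be found in the g02 Chapter Introduction.
Constraints:
- if errfn = Nag_Binomial, link = Nag_Compl, Nag_Logistic or Nag_Probit;
- otherwise link = Nag_Expo, Nag_Iden, Nag_Log, Nag_Reci or Nag_Sqrt.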
- 3:
mean – Nag_IncludeMean Input
On entry: indicates if a mean term is to be included.
- mean = Nag_MeanInclude: A mean term (intercept) will be included in the model.
- mean = Nag_MeanZero: The model will pass through the origin (zero-point).
Constraint:
mean = Nag_MeanInclude or Nag_MeanZero.
- 4:
n – Integer Input
On entry: $n$, the number of observations.
Constraint:
$\mathrm{n} \ge 1$.
- 5:
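x[n × tdx] – const double Input
On entry: $\mathrm{x}[(i-1) \times \mathrm{tdx} + j - 1]$ must contain the $i$th observation for the $j$th independent variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.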
- 6:
tdx – Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint:
$\mathrm{tdx} \ge \mathrm{m}$.
- 7:
m – Integer Input
On entry: $m$, the total number of independent variables.
Constraint:
$\mathrm{m} \ge 1$.
- 8:
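sx[m] – const Integer Input
On entry: indicates which independent variables are to be included in the model.
If $\mathrm{sx}[j-1] > 0$, the $j$th independent variable is included in the regression model.
Constraints:
- $\mathrm{sx}[j-1] \ge 0$, for $j = 1, 2, \ldots, m$;
- if mean = Nag_MeanInclude, exactly $\mathrm{ip} - 1$ values of sx must be $> 0$;
- if mean = Nag_MeanZero, exactly ip values of sx must be $> 0$.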
- 9:
ip – Integer Input
On entry: the number of independent variables in the model, including the mean or intercept if present.
Constraint:
$\mathrm{ip} > 0$.
- 10:
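binom_t[n] – const double Input
On entry: if errfn = Nag_Binomial, $\mathrm{binom\_t}[i-1]$ must contain the binomial denominator, $t_i$, for the $i$th observation.
Otherwise binom_t is not referenced and may be NULL.
Constraint:
if errfn = Nag_Binomial, $\mathrm{binom\_t}[i-1] \ge 0.0$, for $i = 1, 2, \ldots, n$.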
- 11:
offset[n] – const double Input
On entry: if an offset is required then $\mathrm{offset}[i-1]$ must contain the value of the offset, $o_i$, for the $i$th observation. Otherwise offset must be supplied as NULL.
- 12:
wt[n] – const double Input
On entry: if weighted estimates are required then $\mathrm{wt}[i-1]$ must contain the weight, $w_i$, for the $i$th observation. Otherwise wt must be supplied as NULL.
If $\mathrm{wt}[i-1] = 0.0$, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with positive weights.
If wt is NULL, then the effective number of observations is $n$.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint:
if wt is not NULL and vfobs = Nag_TRUE, $\mathrm{wt}[i-1] \ge 0.0$, for $i = 1, 2, \ldots, n$.
- 13:
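scale – double Input
On entry: if errfn = Nag_Normal or Nag_Gamma and vfobs = Nag_TRUE, the scale parameter, $\phi$.
Otherwise scale is not referenced and $\phi = 1$.
Constraint:
if errfn = Nag_Normal or Nag_Gamma and vfobs = Nag_TRUE, $\mathrm{scale} > 0.0$.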
- 14:
ex_power – double Input
On entry: if link = Nag_Expo, ex_power must contain the power of the exponential.
If link is not Nag_Expo, ex_power is not referenced.
Constraint:
if link = Nag_Expo, $\mathrm{ex\_power} \ne 0.0$.
- 15:
b[ip] – const double Input
On entry: the model parameters, $\beta$.
If mean = Nag_MeanInclude, $\mathrm{b}[0]$ must contain the mean parameter and $\mathrm{b}[i]$ the coefficient of the $j$th independent variable in x, where $\mathrm{sx}[j-1]$ is the $i$th positive value in the array sx.
If mean = Nag_MeanZero, $\mathrm{b}[i-1]$ must contain the coefficient of the $j$th independent variable in x, where $\mathrm{sx}[j-1]$ is the $i$th positive value in the array sx.
- 16:
cov[ip × (ip + 1)/2] – const double Input
On entry: the upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters $\beta_i$ and $\beta_j$, that is the values stored in $\mathrm{b}[i-1]$ and $\mathrm{b}[j-1]$, should be supplied in $\mathrm{cov}[j \times (j-1)/2 + i - 1]$, for $i = 1, 2, \ldots, \mathrm{ip}$ and $j = i, \ldots, \mathrm{ip}$.
Constraint:
the matrix represented in cov must be a valid variance-covariance matrix.
- 17:
vfobs – Nag_Boolean Input
On entry: if vfobs = Nag_TRUE, the variance of future observations is included in the standard error of the predicted variable (i.e., $I_{\mathrm{fobs}} = 1$), otherwise $I_{\mathrm{fobs}} = 0$.
- 18:
eta[n] – double Output
On exit: the linear predictor, $\eta$.
- 19:
seeta[n] – double Output
On exit: the standard error of the linear predictor, $\mathrm{se}(\eta)$.
- 20:
pred[n] – double Output
On exit: the predicted value, $\hat{y}$.
- 21:
sepred[n] – double Output
On exit: the standard error of the predicted value, $\mathrm{se}(\hat{y})$. If $\mathrm{pred}[i-1]$ could not be calculated, then nag_glm_predict (g02gpc) returns NE_INVALID_PRED, and $\mathrm{sepred}[i-1]$ is set to $-99.0$.
- 22:
fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
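The storage conventions used by the array arguments can be sketched with a few helper functions. These are illustrative only: the helper names are hypothetical, indices `i` and `j` are 1-based to match the text, a plain `int` stands in for the library's Integer type, and a flag `has_mean` stands in for the mean argument.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of x, 1-based as in the text, with row stride tdx. */
double x_elem(const double *x, size_t tdx, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1 && j <= tdx);
    return x[(i - 1) * tdx + (j - 1)];
}

/* Number of parameters implied by sx: one per positive entry,
   plus one when a mean (intercept) term is present. */
size_t implied_ip(const int *sx, size_t m, int has_mean)
{
    size_t count = has_mean ? 1 : 0;
    for (size_t j = 0; j < m; ++j)
        if (sx[j] > 0)
            ++count;
    return count;
}

/* Covariance of parameters i and j (1-based) from the upper triangle
   packed by column: cov[j*(j-1)/2 + i - 1] for i <= j. */
double cov_elem(const double *cov, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1);
    if (i > j) { size_t tmp = i; i = j; j = tmp; } /* use symmetry */
    return cov[j * (j - 1) / 2 + i - 1];
}

/* Linear predictor for observation i (1-based): the offset, the mean
   parameter if present, then one coefficient per selected column, taken
   from b in the order in which positive entries appear in sx. */
double eta_obs(const double *x, size_t tdx, size_t m, const int *sx,
               const double *b, int has_mean, double offset, size_t i)
{
    size_t k = 0;
    double eta = offset;
    if (has_mean)
        eta += b[k++];
    for (size_t j = 1; j <= m; ++j)
        if (sx[j - 1] > 0)
            eta += b[k++] * x_elem(x, tdx, i, j);
    return eta;
}
```

Comparing `implied_ip` with the supplied ip before calling the function is one way to catch an inconsistent sx early.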
- NE_ALLOC_FAIL
-
Dynamic memory allocation failed.
- NE_BAD_PARAM
-
On entry, argument ⟨value⟩ had an illegal value.
On entry, the error type and link function combination supplied is invalid.
- NE_INT
-
On entry, ip = ⟨value⟩.
Constraint: $\mathrm{ip} > 0$.
On entry, m = ⟨value⟩.
Constraint: $\mathrm{m} \ge 1$.
On entry, n = ⟨value⟩.
Constraint: $\mathrm{n} \ge 1$.
- NE_INT_2
-
On entry, tdx = ⟨value⟩ and m = ⟨value⟩.
Constraint: $\mathrm{tdx} \ge \mathrm{m}$.
- NE_INT_ARRAY_CONS
-
On entry, sx not consistent with ip: values ⟨value⟩, expected ⟨value⟩.
- NE_INTERNAL_ERROR
-
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_INVALID_PRED
-
At least one predicted value could not be calculated as required. sepred is set to $-99.0$ for affected predicted values.
- NE_REAL
-
On entry, ex_power = 0.0.
On entry, scale = ⟨value⟩.
Constraint: $\mathrm{scale} > 0.0$.
- NE_REAL_ARRAY_CONS
-
On entry, for at least one diagonal element: , .
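On entry, $\mathrm{binom\_t}[i]$ = ⟨value⟩ and $i$ = ⟨value⟩.
Constraint: $\mathrm{binom\_t}[i] \ge 0.0$, for all $i$.
On entry, $\mathrm{wt}[i]$ = ⟨value⟩ and $i$ = ⟨value⟩.
Constraint: $\mathrm{wt}[i] \ge 0.0$, for all $i$.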
Not applicable.
nag_glm_predict (g02gpc) is not threaded by NAG in any implementation.
nag_glm_predict (g02gpc) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the Users' Note for your implementation for any additional implementation-specific information.
None.
The model is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.