NAG Library Function Document

nag_glm_predict (g02gpc)

2 Specification

void nag_glm_predict (Nag_Distributions errfn, Nag_Link link, Nag_IncludeMean mean, Integer n, const double x[], Integer tdx, Integer m, const Integer sx[], Integer ip, const double binom_t[], const double offset[], const double wt[], double scale, double ex_power, const double b[], const double cov[], Nag_Boolean vfobs, double eta[], double seeta[], double pred[], double sepred[], NagError *fail)

3 Description

A generalized linear model consists of the following elements:

(i)	A suitable distribution for the dependent variable $y$ .
(ii)	A linear model, with linear predictor $η = X β$ , where $X$ is a matrix of independent variables and $β$ a column vector of $p$ parameters.
(iii)	A link function $g (.)$ between the expected value of $y$ and the linear predictor, that is $E (y) = μ$ with $g (μ) = η$ .

In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables, $X$, the matrix $X$ must be supplied, along with values for the parameters $β$ and their associated variance-covariance matrix, $C$. Suitable values for $β$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_glm_normal (g02gac), nag_glm_binomial (g02gbc), nag_glm_poisson (g02gcc) or nag_glm_gamma (g02gdc). The predicted variable and its standard error can then be obtained from:

\hat{y} = g^{-1} (η), \quad se (\hat{y}) = \sqrt{{\left(\frac{δ g^{-1} (x)}{δ x}\right)}_{η}^{2} {se (η)}^{2} + I_{fobs} Var (y)}

where

η = o + X β, \quad se (η) = \sqrt{diag (X C X^{T})},

$o$ is a vector of offsets, and $I_{fobs} = 0$ if the variance of future observations is not taken into account and $1$ otherwise. Here $diag A$ indicates the diagonal elements of matrix $A$, and the square root is taken element by element.

If required, the variance for the $i$th future observation, $Var (y_{i})$, can be calculated as:

Var (y_{i}) = \frac{ϕ V (θ)}{w_{i}}

where $w_{i}$ is a weight, $ϕ$ is the scale (or dispersion) parameter, and $V (θ)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$, with:

Poisson	$V (θ) = μ_{i}$ , $ϕ = 1$
binomial	$V (θ) = \frac{μ_{i} (t_{i} - μ_{i})}{t_{i}}$ , $ϕ = 1$
Normal	$V (θ) = 1$
gamma	$V (θ) = μ_{i}^{2}$

In the cases of a Normal and gamma error structure, the scale parameter ($ϕ$) is supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $ϕ = {\hat{σ}}^{2}$, i.e., the estimated variance.
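As a hedged illustration of the variance formula and the table of variance functions, the following C sketch evaluates $Var (y_{i})$ for each distribution. The enum `sketch_dist` and the function `future_variance` are hypothetical names introduced here; the library itself uses the type Nag_Distributions.

```c
/* Hypothetical stand-in for Nag_Distributions, used only in this sketch. */
typedef enum {
    SKETCH_POISSON,
    SKETCH_BINOMIAL,
    SKETCH_NORMAL,
    SKETCH_GAMMA
} sketch_dist;

/* Var(y_i) = phi * V(theta) / w_i.
 * mu  : predicted mean for the observation
 * t   : binomial denominator (ignored unless dist is SKETCH_BINOMIAL)
 * phi : scale parameter, used only for Normal and gamma; forced to 1 otherwise
 * w   : weight, assumed to be strictly positive */
double future_variance(sketch_dist dist, double mu, double t,
                       double phi, double w)
{
    double v;

    switch (dist) {
    case SKETCH_POISSON:
        v = mu;
        phi = 1.0;
        break;
    case SKETCH_BINOMIAL:
        v = mu * (t - mu) / t;
        phi = 1.0;
        break;
    case SKETCH_NORMAL:
        v = 1.0;
        break;
    default: /* SKETCH_GAMMA */
        v = mu * mu;
        break;
    }
    return phi * v / w;
}
```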

4 References

McCullagh P and Nelder J A (1983) Generalized Linear Models Chapman and Hall

5 Arguments

1: $errfn$ – Nag_Distributions Input

On entry: indicates the distribution used to model the dependent variable, $y$.

$errfn = Nag_Binomial$: The binomial distribution is used.
$errfn = Nag_Gamma$: The gamma distribution is used.
$errfn = Nag_Normal$: The Normal (Gaussian) distribution is used.
$errfn = Nag_Poisson$: The Poisson distribution is used.

Constraint: $errfn = Nag_Binomial$, $Nag_Gamma$, $Nag_Normal$ or $Nag_Poisson$.

2: $link$ – Nag_Link Input

On entry: indicates which link function is to be used.

$link = Nag_Compl$: A complementary log-log link is used.
$link = Nag_Expo$: An exponent link is used.
$link = Nag_Logistic$: A logistic link is used.
$link = Nag_Iden$: An identity link is used.
$link = Nag_Log$: A log link is used.
$link = Nag_Probit$: A probit link is used.
$link = Nag_Reci$: A reciprocal link is used.
$link = Nag_Sqrt$: A square root link is used.

Details on the functional form of the different links can be found in the g02 Chapter Introduction.

Constraints:

if $errfn = Nag_Binomial$ , $link = Nag_Compl$ , $Nag_Logistic$ or $Nag_Probit$ ;
otherwise $link = Nag_Expo$ , $Nag_Iden$ , $Nag_Log$ , $Nag_Reci$ or $Nag_Sqrt$ .
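The constraint on the combination of errfn and link can be checked before the call. The sketch below is an assumption-laden illustration: the enums `sketch_err` and `sketch_link` and the function `link_is_valid` are hypothetical stand-ins for Nag_Distributions and Nag_Link, chosen so that the example compiles without the library headers.

```c
/* Hypothetical stand-ins for Nag_Distributions and Nag_Link. */
typedef enum {
    SK_BINOMIAL, SK_GAMMA, SK_NORMAL, SK_POISSON
} sketch_err;

typedef enum {
    SK_COMPL, SK_EXPO, SK_LOGISTIC, SK_IDEN,
    SK_LOG, SK_PROBIT, SK_RECI, SK_SQRT
} sketch_link;

/* Returns 1 if the pair satisfies the documented constraint, 0 otherwise:
 * binomial errors require a complementary log-log, logistic or probit link;
 * every other distribution requires one of the remaining five links. */
int link_is_valid(sketch_err errfn, sketch_link link)
{
    int binomial_link = (link == SK_COMPL || link == SK_LOGISTIC ||
                         link == SK_PROBIT);

    return (errfn == SK_BINOMIAL) ? binomial_link : !binomial_link;
}
```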

3: $mean$ – Nag_IncludeMean Input

On entry: indicates if a mean term is to be included.

$mean = Nag_MeanInclude$: A mean term, intercept, will be included in the model.
$mean = Nag_MeanZero$: The model will pass through the origin, zero-point.

Constraint: $mean = Nag_MeanInclude$ or $Nag_MeanZero$.

4: $n$ – Integer Input

On entry: $n$, the number of observations.

Constraint: $n \geq 1$.

5: $x [\dim]$ – const double Input

Note: the dimension, dim, of the array x must be at least $n \times tdx$.

On entry: $x [(i - 1) \times tdx + j - 1]$ must contain the $i$th observation for the $j$th independent variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

6: $tdx$ – Integer Input

On entry: the stride separating matrix column elements in the array x.

Constraint: $tdx \geq m$.
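The storage rule for x, with tdx as the row stride, can be illustrated with a small accessor. The function name `x_elem` is hypothetical and is introduced only for this sketch.

```c
#include <stddef.h>

/* Value of the j-th independent variable for the i-th observation,
 * with i and j counted from 1 as in the text above.
 * x is stored row by row, and consecutive rows are tdx elements apart. */
double x_elem(const double *x, size_t tdx, size_t i, size_t j)
{
    return x[(i - 1) * tdx + (j - 1)];
}
```

Choosing tdx larger than m leaves unused padding at the end of each row, which is why the constraint is $tdx \geq m$ rather than equality.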

7: $m$ – Integer Input

On entry: $m$, the total number of independent variables.

Constraint: $m \geq 1$.

8: $sx [m]$ – const Integer Input

On entry: indicates which independent variables are to be included in the model.

If $sx [j - 1] > 0$, the $j$th independent variable is included in the regression model.

Constraints:

$sx [j - 1] \geq 0$ , for $j = 1, 2, \dots, m$ ;
if $mean = Nag_MeanInclude$ , exactly $ip - 1$ values of sx must be $> 0$ ;
if $mean = Nag_MeanZero$ , exactly ip values of sx must be $> 0$ .
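The consistency rules between sx, ip and mean can be checked ahead of the call, as in the sketch below. The function `sx_matches_ip` is a hypothetical helper, and `long` is used as an assumed stand-in for the library's Integer type, whose actual width depends on the implementation.

```c
#include <stddef.h>

/* Returns 1 if sx and ip satisfy the documented constraints, 0 otherwise.
 * include_mean is nonzero when a mean (intercept) term is in the model. */
int sx_matches_ip(const long *sx, size_t m, long ip, int include_mean)
{
    long selected = 0;

    for (size_t j = 0; j < m; ++j) {
        if (sx[j] < 0)
            return 0;           /* violates sx[j-1] >= 0 */
        if (sx[j] > 0)
            ++selected;         /* variable j+1 is in the model */
    }
    /* The intercept, if present, counts towards ip. */
    return selected + (include_mean ? 1 : 0) == ip;
}
```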

9: $ip$ – Integer Input

On entry: the number of independent variables in the model, including the mean or intercept if present.

Constraint: $ip > 0$.

10: $binom_t [n]$ – const double Input

On entry: if $errfn = Nag_Binomial$, $binom_t [i - 1]$ must contain the binomial denominator, $t_{i}$, for the $i$th observation.

Otherwise binom_t is not referenced and may be NULL.

Constraint: if $errfn = Nag_Binomial$, $binom_t [i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.

11: $offset [n]$ – const double Input

On entry: if an offset is required then $offset [i - 1]$ must contain the value of the offset $o_{i}$, for the $i$th observation. Otherwise offset must be supplied as NULL.

12: $wt [n]$ – const double Input

On entry: if weighted estimates are required then $wt [i - 1]$ must contain the weight, $w_{i}$, for the $i$th observation. Otherwise wt must be supplied as NULL.

If $wt [i - 1] = 0.0$, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with positive weights.

If wt = NULL, then the effective number of observations is $n$.

If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.

Constraint: if wt is not NULL and $vfobs = Nag_TRUE$, $wt [i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.

13: $scale$ – double Input

On entry: if $errfn = Nag_Normal$ or $Nag_Gamma$ and $vfobs = Nag_TRUE$, the scale parameter, $ϕ$.

Otherwise scale is not referenced and $ϕ = 1$.

Constraint: if $errfn = Nag_Normal$ or $Nag_Gamma$ and $vfobs = Nag_TRUE$, $scale > 0.0$.

14: $ex_power$ – double Input

On entry: if $link = Nag_Expo$, ex_power must contain the power of the exponential.

If $link \neq Nag_Expo$, ex_power is not referenced.

Constraint: if $link = Nag_Expo$, $ex_power \neq 0.0$.

15: $b [ip]$ – const double Input

On entry: the model parameters, $β$.

If $mean = Nag_MeanInclude$, $b [0]$ must contain the mean parameter and $b [i]$ the coefficient of the $j$th independent variable in x, where $sx [j - 1]$ is the $i$th positive value in the array sx.

If $mean = Nag_MeanZero$, $b [i - 1]$ must contain the coefficient of the $j$th independent variable in x, where $sx [j - 1]$ is the $i$th positive value in the array sx.
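The correspondence between the entries of b and the variables selected by sx is easiest to see in code. The following sketch walks one row of x and accumulates the linear predictor without an offset. The function `row_eta` is hypothetical, and `long` is again an assumed stand-in for Integer.

```c
#include <stddef.h>

/* Linear predictor (without offset) for one observation.
 * row          : the m values of the independent variables
 * sx           : selection flags; variable j+1 is used when sx[j] > 0
 * b            : coefficients in the documented order, with the mean
 *                parameter first when include_mean is nonzero
 * include_mean : nonzero when the model has a mean (intercept) term */
double row_eta(const double *row, const long *sx, size_t m,
               const double *b, int include_mean)
{
    size_t k = 0;       /* next unused entry of b */
    double eta = 0.0;

    if (include_mean)
        eta = b[k++];
    for (size_t j = 0; j < m; ++j) {
        if (sx[j] > 0)
            eta += b[k++] * row[j];
    }
    return eta;
}
```

Variables with a zero entry in sx are skipped entirely, so they consume no entry of b.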

16: $cov [ip \times (ip + 1) / 2]$ – const double Input

On entry: the upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters $β_{i}$ and $β_{j}$, that is the values stored in $b [i - 1]$ and $b [j - 1]$, should be supplied in $cov [j \times (j - 1) / 2 + i - 1]$, for $i = 1, 2, \dots, ip$ and $j = i, \dots, ip$.

Constraint: the matrix represented in cov must be a valid variance-covariance matrix.
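The packed storage rule can be sketched as an index function together with a routine that packs a full symmetric matrix. Both `cov_index` and `pack_upper` are hypothetical helpers written for this illustration, and the full matrix is assumed to be stored in row-major order.

```c
#include <stddef.h>

/* Position in cov of the covariance between parameters i and j
 * (both counted from 1), following cov[j*(j-1)/2 + i-1] for i <= j.
 * The arguments are swapped if i > j, since the matrix is symmetric. */
size_t cov_index(size_t i, size_t j)
{
    if (i > j) {
        size_t t = i;
        i = j;
        j = t;
    }
    return j * (j - 1) / 2 + i - 1;
}

/* Copies the upper triangle of a full ip-by-ip row-major matrix into
 * cov, which must have room for ip*(ip+1)/2 elements. */
void pack_upper(const double *full, size_t ip, double *cov)
{
    for (size_t j = 1; j <= ip; ++j)
        for (size_t i = 1; i <= j; ++i)
            cov[cov_index(i, j)] = full[(i - 1) * ip + (j - 1)];
}
```

For $ip = 3$ the packed order is $C_{11}, C_{12}, C_{22}, C_{13}, C_{23}, C_{33}$, that is, one column of the upper triangle after another.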

17: $vfobs$ – Nag_Boolean Input

On entry: if $vfobs = Nag_TRUE$, the variance of future observations is included in the standard error of the predicted variable (i.e., $I_{fobs} = 1$), otherwise $I_{fobs} = 0$.

18: $eta [n]$ – double Output

On exit: the linear predictor, $η$.

19: $seeta [n]$ – double Output

On exit: the standard error of the linear predictor, $se (η)$.

20: $pred [n]$ – double Output

On exit: the predicted value, $\hat{y}$.

21: $sepred [n]$ – double Output

On exit: the standard error of the predicted value, $se (\hat{y})$. If $pred [i - 1]$ could not be calculated, then nag_glm_predict (g02gpc) returns $fail.code = NE_INVALID_PRED$, and $sepred [i - 1]$ is set to $- 99.0$.
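After a call that returns NE_INVALID_PRED, the affected observations can be located by scanning sepred for the sentinel value. The sketch below uses a hypothetical helper, `count_invalid`, and relies on the assumption that a genuine standard error is never negative, so any negative entry is treated as the $- 99.0$ marker.

```c
#include <stddef.h>

/* Counts the entries of sepred that carry the failure marker.
 * A valid standard error is non-negative, so a negative value is
 * taken to mean that the corresponding prediction was not computed. */
size_t count_invalid(const double *sepred, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; ++i) {
        if (sepred[i] < 0.0)
            ++bad;
    }
    return bad;
}
```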

22: $fail$ – NagError * Input/Output

The NAG error argument (see Section 3.6 in the Essential Introduction).

6 Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
See Section 3.2.1.2 in the Essential Introduction for further information.
NE_BAD_PARAM: On entry, argument $〈value〉$ had an illegal value.

On entry, the error type and link function combination supplied is invalid.
NE_INT: On entry, $ip = 〈value〉$ .
Constraint: $ip > 0$ .

On entry, $m = 〈value〉$ .
Constraint: $m \geq 1$ .

On entry, $n = 〈value〉$ .
Constraint: $n \geq 1$ .
NE_INT_2: On entry, $tdx = 〈value〉$ and $m = 〈value〉$ .
Constraint: $tdx \geq m$ .
NE_INT_ARRAY_CONS: On entry, sx not consistent with ip: $〈value〉$ values $> 0$ , expected $〈value〉$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

An unexpected error has been triggered by this function. Please contact NAG.
See Section 3.6.6 in the Essential Introduction for further information.
NE_INVALID_PRED: At least one predicted value could not be calculated as required. sepred is set to $- 99.0$ for affected predicted values.
NE_NO_LICENCE: Your licence key may have expired or may not have been installed correctly.
See Section 3.6.5 in the Essential Introduction for further information.
NE_REAL: On entry, $ex_power = 0.0$ .

On entry, $scale = 〈value〉$ .
Constraint: $scale > 0.0$ .
NE_REAL_ARRAY_CONS: On entry, $cov [i - 1] < 0.0$ for at least one diagonal element: $i = 〈value〉$ , $cov [i - 1] = 〈value〉$ .

On entry, $i = 〈value〉$ and $binom_t [i - 1] = 〈value〉$ .
Constraint: $binom_t [i - 1] \geq 0.0$ , for all $i$ .

On entry, $i = 〈value〉$ and $wt [i - 1] = 〈value〉$ .
Constraint: $wt [i - 1] \geq 0.0$ , for all $i$ .

7 Accuracy

Not applicable.

8 Parallelism and Performance

nag_glm_predict (g02gpc) is not threaded by NAG in any implementation.

nag_glm_predict (g02gpc) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

None.

10 Example

The model

y = \frac{1}{β_{1} + β_{2} x} + ε

is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.
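For the model in this example the link is the reciprocal, so the prediction step reduces to inverting the linear predictor. The sketch below is not the example program itself: `predict_reciprocal` is a hypothetical function, and any coefficient values passed to it are made up for illustration rather than taken from the fitted model.

```c
/* Prediction for the model y = 1 / (b1 + b2 * x) + error.
 * Returns 0 and stores the predicted value in *pred on success.
 * Returns -1 if the linear predictor is exactly zero, in which case
 * the reciprocal is undefined and *pred is left unchanged. */
int predict_reciprocal(double b1, double b2, double x, double *pred)
{
    double eta = b1 + b2 * x;   /* linear predictor */

    if (eta == 0.0)
        return -1;
    *pred = 1.0 / eta;          /* inverse of the reciprocal link */
    return 0;
}
```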
