[eta, seeta, pred, sepred, ifail] = g02gp(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)

[eta, seeta, pred, sepred, ifail] = nag_correg_glm_predict(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)

Note: the interface to this routine has changed since earlier releases of the toolbox:

At Mark 23:

wt, off, s and a were made optional; weight and offset were removed from the interface; t was made optional (defaults to a vector of 1s).

Description

A generalized linear model consists of the following elements:

(i)	A suitable distribution for the dependent variable $y$ .
(ii)	A linear model, with linear predictor $η = X β$ , where $X$ is a matrix of independent variables and $β$ a column vector of $p$ parameters.
(iii)	A link function $g (.)$ between the expected value of $y$ and the linear predictor, that is $g (μ) = η$ with $μ = E (y)$, so that $μ = g^{-1} (η)$ .

In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables, $X$, the matrix $X$ must be supplied, along with values for the parameters, $β$, and their associated variance-covariance matrix, $C$. Suitable values for $β$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using, for example, nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd). The predicted variable and its standard error can then be obtained from:

\hat{y} = g^{-1}(η), \qquad se(\hat{y}) = \sqrt{\left(\left.\frac{\partial g^{-1}(x)}{\partial x}\right|_{η}\right)^{2} se(η)^{2} + I_{fobs} Var(y)}

where

η = o + X β, \qquad se(η) = \sqrt{diag(X C X^{T})},

$o$ is a vector of offsets, and $I_{fobs} = 0$ if the variance of future observations is not taken into account and $I_{fobs} = 1$ otherwise. Here $diag A$ denotes the vector of diagonal elements of the matrix $A$, and the square root is taken elementwise.
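A minimal sketch of these formulas in plain MATLAB (no NAG call), assuming a reciprocal link, no offsets and $I_{fobs} = 0$, so the future-observation variance term drops out. The design matrix `X` is assumed to already contain the column of 1s for the mean term; `X`, `b`, `C`, `ginv` and `dginv` are hypothetical names holding made-up values, not output from g02ga.

```matlab
% Prediction formulas with I_fobs = 0, for a reciprocal link.
% All numbers are hypothetical illustration values, not NAG output.
X = [1 32;                    % column of 1s for the mean term,
     1 18];                   % then one independent variable
b = [-0.02; 0.06];            % hypothetical parameter estimates, beta
C = [ 1.0e-5  -2.0e-6;        % hypothetical variance-covariance matrix, C
     -2.0e-6   1.0e-6];
o = zeros(2, 1);              % no offsets

% Reciprocal link: eta = 1/mu, so g^{-1}(x) = 1/x with derivative -1/x^2
ginv  = @(x) 1 ./ x;
dginv = @(x) -1 ./ x.^2;

eta    = o + X * b;                         % linear predictor
seeta  = sqrt(diag(X * C * X'));            % se(eta), elementwise square root
pred   = ginv(eta);                         % y-hat = g^{-1}(eta)
sepred = sqrt(dginv(eta).^2 .* seeta.^2);   % se(y-hat) with I_fobs = 0

disp([eta seeta pred sepred]);
```

With $I_{fobs} = 0$ the square root reduces to $|\partial g^{-1}/\partial x| \, se(η)$, which for the reciprocal link is $se(η)/η^{2}$.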

If required, the variance for the $i$th future observation, $Var(y_{i})$, can be calculated as:

Var(y_{i}) = \frac{ϕ V(θ)}{w_{i}}

where $w_{i}$ is a weight, $ϕ$ is the scale (or dispersion) parameter, and $V(θ)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$, with:

Poisson	$V (θ) = μ_{i}$ , $ϕ = 1$
binomial	$V (θ) = \frac{μ_{i} (t_{i} - μ_{i})}{t_{i}}$ , $ϕ = 1$
Normal	$V (θ) = 1$
gamma	$V (θ) = μ_{i}^{2}$

For Normal and gamma error structures, the scale parameter, $ϕ$, is supplied by you (via s). This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $ϕ = \hat{σ}^{2}$, i.e., the estimated variance.
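The variance of a future observation can be sketched the same way. This plain-MATLAB example assumes hypothetical means, weights and scale parameter, stores the variance functions from the list above in an illustrative struct `vfun` (not part of the NAG interface), and then forms the full $se(\hat{y})$ with $I_{fobs} = 1$ for a reciprocal link and a made-up $se(η)$.

```matlab
% Var(y_i) = phi * V(theta) / w_i, then se(y-hat) with I_fobs = 1.
% All inputs are hypothetical; vfun is an illustrative helper, not a NAG API.
mu  = [0.5; 0.9];          % hypothetical predicted means, mu_i
w   = [1; 2];              % hypothetical weights, w_i
t   = [10; 10];            % binomial denominators t_i (binomial case only)
phi = 0.13;                % user-supplied scale parameter (Normal/gamma)

vfun.P = @(mu, t) mu;                    % Poisson:  V = mu
vfun.B = @(mu, t) mu .* (t - mu) ./ t;   % binomial: V = mu (t - mu) / t
vfun.N = @(mu, t) ones(size(mu));        % Normal:   V = 1
vfun.G = @(mu, t) mu.^2;                 % gamma:    V = mu^2

errfn = 'N';
if any(errfn == 'PB')
  scale = 1;               % phi = 1 for Poisson and binomial errors
else
  scale = phi;             % phi supplied by the user for Normal and gamma errors
end
vary = scale .* vfun.(errfn)(mu, t) ./ w;

% Full standard error for a reciprocal link (mu = 1/eta), with vfobs = true
ifobs  = 1;
eta    = 1 ./ mu;
seeta  = [0.08; 0.04];     % hypothetical se(eta)
sepred = sqrt((-1 ./ eta.^2).^2 .* seeta.^2 + ifobs .* vary);
```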


Parameters

Compulsory Input Parameters

1: $errfn$ – string (length ≥ 1)

Indicates the distribution used to model the dependent variable, $y$.

$errfn ='B'$: The binomial distribution is used.
$errfn ='G'$: The gamma distribution is used.
$errfn ='N'$: The Normal (Gaussian) distribution is used.
$errfn ='P'$: The Poisson distribution is used.

Constraint: $errfn ='B'$ , $'G'$ , $'N'$ or $'P'$ .

2: $link$ – string (length ≥ 1)

Indicates which link function is to be used.

$link ='C'$: A complementary log-log link is used.
$link ='E'$: An exponent link is used.
$link ='G'$: A logistic link is used.
$link ='I'$: An identity link is used.
$link ='L'$: A log link is used.
$link ='P'$: A probit link is used.
$link ='R'$: A reciprocal link is used.
$link ='S'$: A square root link is used.

Details on the functional form of the different links can be found in the G02 Chapter Introduction.

Constraints:

if $errfn ='B'$ , $link ='C'$ , $'G'$ or $'P'$ ;
otherwise $link ='E'$ , $'I'$ , $'L'$ , $'R'$ or $'S'$ .
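The derivative $\partial g^{-1}(x)/\partial x$ that enters $se(\hat{y})$ depends on the link. The sketch below assumes the standard forms of the non-binomial links (identity $η = μ$, log $η = \log μ$, reciprocal $η = 1/μ$, square root $η = \sqrt{μ}$, exponent $η = μ^{a}$); check them against the G02 Chapter Introduction before relying on them. The binomial links, which also involve $t_{i}$, are omitted, and the structs `ginv` and `dginv` are hypothetical.

```matlab
% Inverse links mu = g^{-1}(eta) and their derivatives d g^{-1} / d eta,
% assuming the standard forms of the non-binomial links (hypothetical helpers).
a = 2;   % hypothetical power for the exponent link (link = 'E')

ginv.I  = @(eta) eta;                      % identity:    eta = mu
dginv.I = @(eta) ones(size(eta));
ginv.L  = @(eta) exp(eta);                 % log:         eta = log(mu)
dginv.L = @(eta) exp(eta);
ginv.R  = @(eta) 1 ./ eta;                 % reciprocal:  eta = 1/mu
dginv.R = @(eta) -1 ./ eta.^2;
ginv.S  = @(eta) eta.^2;                   % square root: eta = sqrt(mu)
dginv.S = @(eta) 2 .* eta;
ginv.E  = @(eta) eta.^(1 / a);             % exponent:    eta = mu^a
dginv.E = @(eta) (1 / a) .* eta.^(1 / a - 1);

eta = 2.0;
for link = 'ILRSE'
  fprintf('%s: mu = %8.4f  dmu/deta = %8.4f\n', link, ...
          ginv.(link)(eta), dginv.(link)(eta));
end
```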

3: $mean_p$ – string (length ≥ 1)

Indicates if a mean term is to be included.

$mean_p ='M'$: A mean term, intercept, will be included in the model.
$mean_p ='Z'$: The model will pass through the origin, zero-point.

Constraint: $mean_p ='M'$ or $'Z'$ .

4: $x (ldx :)$ – double array

The first dimension of the array x must be at least $n$. The second dimension of the array x must be at least $m$.

$x (i, j)$ must contain the $i$th observation for the $j$th independent variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

5: $isx (m)$ – int64/int32/nag_int array

Indicates which independent variables are to be included in the model.

If $isx (j) > 0$, the $j$th independent variable is included in the regression model.

Constraints:

$isx (j) \geq 0$ , for $j = 1, 2, \dots, m$ ;
if $mean_p ='M'$ , exactly $ip - 1$ values of isx must be $> 0$ ;
if $mean_p ='Z'$ , exactly ip values of isx must be $> 0$ .
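A hypothetical pre-call check of these constraints in plain MATLAB. It only counts the positive entries of isx and compares the count with ip; the routine itself reports a mismatch as ifail = 10. The values are made up for illustration.

```matlab
% Hypothetical check of the isx/ip constraints before calling g02gp
% (the routine reports a mismatch as ifail = 10).
isx    = int64([1 0 1 0]);   % hypothetical: include columns 1 and 3 of x
mean_p = 'M';
ip     = int64(3);           % mean term plus two selected variables

nsel = sum(isx > 0);         % number of independent variables selected
if mean_p == 'M'
  ok = all(isx >= 0) && nsel == ip - 1;
else
  ok = all(isx >= 0) && nsel == ip;
end
fprintf('isx consistent with ip: %d\n', ok);
```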

6: $b (ip)$ – double array

The model parameters, $β$.

If $mean_p ='M'$, $b (1)$ must contain the mean parameter and $b (i + 1)$ the coefficient of the variable contained in the $j$th column of x, where $isx (j)$ is the $i$th positive value in the array isx.

If $mean_p ='Z'$, $b (i)$ must contain the coefficient of the variable contained in the $j$th column of x, where $isx (j)$ is the $i$th positive value in the array isx.
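To make the mapping between b and the columns of x concrete, this plain-MATLAB sketch (hypothetical data) selects the columns flagged by isx, prepends a column of 1s when mean_p = 'M', and forms $η = o + X β$. It mirrors the description above rather than the routine's internal code.

```matlab
% How b lines up with the columns of x (hypothetical data, no NAG call).
x      = [1 10 100;
          2 20 200];          % n = 2 observations, m = 3 variables
isx    = [1 0 1];             % include columns 1 and 3
mean_p = 'M';
b      = [0.5; 2; -0.01];     % b(1) = mean, b(2) -> x(:,1), b(3) -> x(:,3)
off    = [0; 0];              % no offsets

cols = find(isx > 0);         % ith positive isx entry pairs with b(i+1) here
Xm   = x(:, cols);
if mean_p == 'M'
  Xm = [ones(size(x, 1), 1) Xm];   % prepend a column of 1s for the mean
end
eta = off + Xm * b;           % linear predictor
```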

7: $covar (ip \times (ip + 1) / 2)$ – double array

The upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters $β_{i}$ and $β_{j}$, that is, the values stored in $b (i)$ and $b (j)$, should be supplied in $covar (j \times (j - 1) / 2 + i)$, for $i = 1, 2, \dots, ip$ and $j = i, \dots, ip$.

Constraint: the matrix represented in covar must be a valid variance-covariance matrix.
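The packed-by-column storage can be sketched as follows, assuming a hypothetical 3-by-3 symmetric matrix `C`. The index $j(j-1)/2 + i$ walks the upper triangle column by column; the second loop unpacks covar back into a full symmetric matrix.

```matlab
% Packing a symmetric variance-covariance matrix into covar (upper
% triangle, column by column) and unpacking it again. C is hypothetical.
C = [4   1   0.5;
     1   9   2;
     0.5 2   16];
ip = size(C, 1);

covar = zeros(ip * (ip + 1) / 2, 1);
for j = 1:ip
  for i = 1:j
    covar(j * (j - 1) / 2 + i) = C(i, j);   % element (i, j) with i <= j
  end
end

% Unpack back to a full symmetric matrix
Cfull = zeros(ip);
for j = 1:ip
  for i = 1:j
    Cfull(i, j) = covar(j * (j - 1) / 2 + i);
    Cfull(j, i) = Cfull(i, j);
  end
end
```

The one-liner `covar = C(triu(true(ip)))` gives the same vector, because MATLAB's logical indexing visits the upper triangle in column-major order.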

8: $vfobs$ – logical scalar

If $vfobs = true$, the variance of future observations is included in the standard error of the predicted variable (i.e., $I_{fobs} = 1$); otherwise $I_{fobs} = 0$.

Optional Input Parameters

1: $n$ – int64/int32/nag_int scalar: Default: the dimension of the arrays t, off, wt and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
$n$ , the number of observations.

Constraint: $n \geq 1$ .
2: $m$ – int64/int32/nag_int scalar: Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
$m$ , the total number of independent variables.

Constraint: $m \geq 1$ .
3: $ip$ – int64/int32/nag_int scalar: Default: the dimension of the array b.
The number of independent variables in the model, including the mean or intercept if present.

Constraint: $ip > 0$ .
4: $t (:)$ – double array: The dimension of the array must be at least $n$ if $errfn ='B'$ , and at least $1$ otherwise

If $errfn ='B'$ , $t (i)$ must contain the binomial denominator, $t_{i}$ , for the $i$ th observation.
Otherwise t is not referenced.

Constraint: if $errfn ='B'$ , $t (i) \geq 0.0$ , for $i = 1, 2, \dots, n$ .
5: $off (:)$ – double array: The dimension of the array must be at least $n$ if $offset ='Y'$ , and at least $1$ otherwise

If $offset ='Y'$ , $off (i)$ must contain the offset $o_{i}$ , for the $i$ th observation.
Otherwise off is not referenced.
6: $wt (:)$ – double array: The dimension of the array must be at least $n$ if $weight ='W'$ and $vfobs = true$ , and at least $1$ otherwise

If $weight ='W'$ and $vfobs = true$ , $wt (i)$ must contain the weight, $w_{i}$ , for the $i$ th observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.

Constraint: if $vfobs = true$ and $weight ='W'$ , $wt (i) \geq 0.0$ , for $i = 1, 2, \dots, n$ .
7: $s$ – double scalar: Default: $0$
If $errfn ='N'$ or $'G'$ and $vfobs = true$ , the scale parameter, $ϕ$ .
Otherwise s is not referenced and $ϕ = 1$ .

Constraint: if $errfn ='N'$ or $'G'$ and $vfobs = true$ , $s > 0.0$ .
8: $a$ – double scalar: Default: $0$
If $link ='E'$ , a must contain the power of the exponential.
If $link \neq'E'$ , a is not referenced.

Constraint: if $link ='E'$ , $a \neq 0.0$ .

Output Parameters

1: $eta (n)$ – double array: The linear predictor, $η$ .
2: $seeta (n)$ – double array: The standard error of the linear predictor, $se (η)$ .
3: $pred (n)$ – double array: The predicted value, $\hat{y}$ .
4: $sepred (n)$ – double array: The standard error of the predicted value, $se (\hat{y})$ . If $pred (i)$ could not be calculated, then nag_correg_glm_predict (g02gp) returns $ifail = 22$ , and $sepred (i)$ is set to $- 99.0$ .
5: $ifail$ – int64/int32/nag_int scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).
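A hypothetical post-processing step for the ifail = 22 warning: entries whose sepred was set to -99.0 are treated as missing. The arrays below are made-up stand-ins for g02gp outputs so that the snippet runs without the NAG Toolbox; in practice they would come from the call shown in the syntax line.

```matlab
% Hypothetical handling of g02gp outputs when ifail = 22.
% The arrays are made-up stand-ins for the routine's outputs.
ifail  = int64(22);
pred   = [0.5; 0.9; 0.0];
sepred = [0.36; 0.36; -99.0];

failed = (sepred == -99.0);     % entries flagged by the routine
if ifail == 22
  pred(failed)   = NaN;         % treat affected predictions as missing
  sepred(failed) = NaN;
end
fprintf('%d prediction(s) could not be calculated\n', nnz(failed));
```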

Error Indicators and Warnings

Note: nag_correg_glm_predict (g02gp) may return useful information for one or more of the following detected errors or warnings.

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

$ifail = 1$: On entry, errfn is invalid.

$ifail = 2$: On entry, errfn and link combination is invalid.

On entry, link is invalid.

$ifail = 3$: On entry, mean_p is invalid.

$ifail = 4$: On entry, offset is invalid.

$ifail = 5$: On entry, weight is invalid.

$ifail = 6$: Constraint: $n \geq 1$ .

$ifail = 8$: Constraint: $ldx \geq n$ .

$ifail = 9$: Constraint: $m \geq 1$ .

$ifail = 10$: On entry, isx not consistent with ip.

$ifail = 11$: Constraint: $ip > 0$ .

$ifail = 12$: Constraint: $t (i) \geq 0.0$ , for all $i$ .

$ifail = 14$: Constraint: $wt (i) \geq 0.0$ , for all $i$ .

$ifail = 15$: Constraint: $s > 0.0$ .

$ifail = 16$: On entry, $a = 0.0$ .

$ifail = 18$: On entry, $covar (i) < 0.0$ for at least one diagonal element.

W $ifail = 22$: At least one predicted value could not be calculated as required. sepred is set to $- 99.0$ for affected predicted values.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.

Accuracy

Not applicable.

Further Comments

None.

Example

The model

y = \frac{1}{β_{1} + β_{2} x} + ε

is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.


function g02gp_example


fprintf('g02gp example results\n\n');

x = [ 1;  2; 3; 4; 5];
y = [25; 10; 6; 4; 3];

isx = [int64(1)];
ip  = int64(2);

link   = 'R';
mean_p = 'M';
s      = 0;

% Fit generalized linear model, with Normal errors to training data
[s, rss, idf, b, irank, se, covar, v, ifail] = ...
  g02ga( ...
         link, mean_p, x, isx, ip, y, s);

% Display parameter estimates for training data
fprintf('\nResidual sum of squares =  %12.4e\n', rss);
fprintf('Degrees of freedom      =  %d\n', idf);
fprintf('\n      Estimate     Standard error\n');
for i = 1:ip
  fprintf('%14.4f %14.4f\n', b(i), se(i));
end

% Prediction data
x = [32; 18];

% Compute predicted values
errfn  = 'N';
vfobs = true;
[eta, seeta, pred, sepred, ifail] = ...
  g02gp( ...
         errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

% Display predicted values
fprintf('\n  i      eta          se(eta)      predicted    se(predicted)\n');
for i = 1:numel(pred)   % one row per prediction observation
  fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), ...
          pred(i), sepred(i));
end

g02gp example results


Residual sum of squares =    3.8717e-01
Degrees of freedom      =  3

      Estimate     Standard error
       -0.0239         0.0028
        0.0638         0.0026

  i      eta          se(eta)      predicted    se(predicted)
  1      2.01807      0.08168      0.49552      0.35981
  2      1.12472      0.04476      0.88911      0.36098
