g02da: Correlation and Regression Analysis (NAG Toolbox)

nag_correg_linregm_fit (g02da) performs a general multiple linear regression when the independent variables may be linearly dependent. Parameter estimates, standard errors, residuals and influence statistics are computed. nag_correg_linregm_fit (g02da) may be used to perform a weighted regression.

Syntax

Description

The general linear regression model is defined by

y = X β + ε,

where

$y$ is a vector of $n$ observations on the dependent variable,
$X$ is an $n$ by $p$ matrix of the independent variables of column rank $k$,
$β$ is a vector of length $p$ of unknown parameters, and
$ε$ is a vector of length $n$ of unknown random errors such that $\mathrm{var}(ε) = V σ^{2}$, where $V$ is a known diagonal matrix.

In the weighted case, $W$ denotes the diagonal weight matrix, which is proportional to $V^{-1}$.

nag_correg_linregm_fit (g02da) finds a $QR$ decomposition of $X$ (or $W^{1 / 2} X$ in the weighted case), i.e.,

X = Q R^{*} (or W^{1 / 2} X = Q R^{*}),

where

R^{*} = (\begin{matrix} R \\ 0 \end{matrix})

and $R$ is a $p$ by $p$ upper triangular matrix and $Q$ is an $n$ by $n$ orthogonal matrix. If $R$ is of full rank, then $\hat{β}$ is the solution to

R \hat{β} = c_{1},

where $c = Q^{T} y$ (or $Q^{T} W^{1 / 2} y$) and $c_{1}$ is the first $p$ elements of $c$. If $R$ is not of full rank, a solution is obtained by means of a singular value decomposition (SVD) of $R$,

R = Q_{*} (\begin{matrix} D & 0 \\ 0 & 0 \end{matrix}) P^{T},

where $D$ is a $k$ by $k$ diagonal matrix with nonzero diagonal elements, $k$ being the rank of $R$, and $Q_{*}$ and $P$ are $p$ by $p$ orthogonal matrices. This gives the solution

\hat{β} = P_{1} D^{- 1} Q_{*_{1}}^{T} c_{1},

$P_{1}$ being the first $k$ columns of $P$, i.e., $P = (\begin{matrix} P_{1} & P_{0} \end{matrix})$, and $Q_{*_{1}}$ being the first $k$ columns of $Q_{*}$.

Details of the SVD are made available in the form of the matrix $P^{*}$:

P^{*} = (\begin{matrix} D^{- 1} P_{1}^{T} \\ P_{0}^{T} \end{matrix}) .

This will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using nag_correg_linregm_constrain (g02dk) after using nag_correg_linregm_fit (g02da). Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions.
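The two-stage solution described above can be sketched with MATLAB's built-in qr and svd. This is an illustration under stated assumptions, not the routine's internal code: the rank tolerance below is an arbitrary choice, the treatment vector t is a compact hypothetical encoding of the dummy variables in the Example section, and the mean column is added by hand.

```matlab
% Sketch only: MATLAB built-ins stand in for the routine's internal steps.
t = [1; 4; 2; 3; 4; 2; 4; 1; 3; 1; 3; 2];        % treatment of each observation
y = [33.63; 39.62; 38.18; 41.46; 38.02; 35.83; ...
     35.99; 36.58; 42.92; 37.80; 40.43; 37.89];
n = numel(y);
X = [ones(n,1), double(bsxfun(@eq, t, 1:4))];    % mean term + 4 dummies: p = 5, rank 4
p = size(X, 2);

[Q, Rstar] = qr(X);              % X = Q*Rstar with Q an n-by-n orthogonal matrix
R  = Rstar(1:p, :);              % leading p-by-p upper triangular block
c  = Q' * y;
c1 = c(1:p);                     % first p elements of c

[Qs, S, P] = svd(R);             % R = Qs*S*P'
s   = diag(S);
tol = 1e-10 * max(s);            % assumed rank tolerance, not the routine's tol
k   = sum(s > tol);              % numerical rank of R

P1   = P(:, 1:k);                % first k columns of P
Qs1  = Qs(:, 1:k);               % first k columns of Q_*
beta = P1 * ((Qs1' * c1) ./ s(1:k));   % P1 * inv(D) * Qs1' * c1

res = y - X*beta;
rss = res' * res;
fprintf('rank = %d, rss = %.4f\n', k, rss);
```

Because the SVD-based formula applies the pseudo-inverse of $R$ to $c_{1}$, the vector it returns is the minimum-norm least squares solution, which is why it is only one of the possible solutions.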

The fit of the model can be examined by considering the residuals, $r_{i} = y_{i} - \hat{y}_{i}$, where $\hat{y} = X \hat{β}$ are the fitted values. The fitted values can be written as $H y$ for an $n$ by $n$ matrix $H$. The $i$th diagonal element of $H$, $h_{i}$, gives a measure of the influence of the $i$th values of the independent variables on the fitted regression model. The values $h_{i}$ are sometimes known as leverages. Both $r_{i}$ and $h_{i}$ are provided by nag_correg_linregm_fit (g02da).
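As a rough illustration of residuals and leverages, the sketch below computes both with MATLAB's built-in economy-size qr. It assumes a design of full column rank, so it uses only the four dummy columns of the Example section without the mean term; the fitted values depend only on the column space of $X$, so $r_{i}$ and $h_{i}$ are the same for either parameterisation. The treatment vector t is a hypothetical shorthand for those dummy columns.

```matlab
% Sketch only: residuals and leverages from an economy-size QR factorisation.
t = [1; 4; 2; 3; 4; 2; 4; 1; 3; 1; 3; 2];        % treatment of each observation
y = [33.63; 39.62; 38.18; 41.46; 38.02; 35.83; ...
     35.99; 36.58; 42.92; 37.80; 40.43; 37.89];
X = double(bsxfun(@eq, t, 1:4));   % four dummy columns, full column rank

[Q1, R] = qr(X, 0);            % X = Q1*R, Q1 is n-by-p with orthonormal columns
beta = R \ (Q1' * y);          % solves R*beta = c1
yhat = X * beta;               % fitted values, equal to H*y with H = Q1*Q1'
r = y - yhat;                  % residuals
h = sum(Q1.^2, 2);             % diag(Q1*Q1'), without forming the n-by-n matrix H
disp([r, h]);
```

With three observations per treatment every leverage equals $1/3$, which agrees with the H column of the example output.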

In many linear regression models the first term is taken as a mean term or an intercept, i.e., $X_{i, 1} = 1$, for $i = 1, 2, \dots, n$. This is provided as an option. Also, only some of the possible independent variables may be required in a given model; a facility to select the variables to be included in the model is provided.
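The effect of these two options on the design matrix can be pictured with a short sketch. It assumes that a positive entry of a selection vector marks a variable as included; the flag include_mean and the treatment vector t are hypothetical names used only for illustration, and the routine itself performs the equivalent selection internally.

```matlab
% Sketch only: include_mean and the positive-means-selected rule are assumptions.
t = [1; 4; 2; 3; 4; 2; 4; 1; 3; 1; 3; 2];   % treatment of each observation
x = double(bsxfun(@eq, t, 1:4));            % all candidate independent variables
isx = [1; 0; 1; 1];                         % select variables 1, 3 and 4
include_mean = true;                        % add a column of ones as the first term

cols = find(isx > 0)';                      % indices of the selected columns
X = x(:, cols);
if include_mean
  X = [ones(size(x, 1), 1), X];             % X(i,1) = 1 for every observation
end
ip = size(X, 2);                            % number of terms in the fitted model
fprintf('%d columns in the fitted design\n', ip);
```

Here the model has one mean term plus three selected variables, so the number of terms is four.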

Details of the $QR$ decomposition and, if used, the SVD are made available. These allow the regression to be updated by adding or deleting an observation using nag_correg_linregm_obs_edit (g02dc), adding or deleting a variable using nag_correg_linregm_var_add (g02de) and nag_correg_linregm_var_del (g02df) or estimating and testing an estimable function using nag_correg_linregm_estfunc (g02dn).

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

Example

Data from an experiment with four treatments and three observations per treatment are read in. The treatments are represented by dummy ($0$-$1$) variables. An unweighted model is fitted with a mean included in the model. nag_correg_ssqmat (g02bu) is then called to calculate the total sum of squares, and the coefficient of determination ($R^{2}$), adjusted $R^{2}$ and Akaike's information criterion (AIC) are calculated.
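For reference, the summary statistics computed by the example program can be written out explicitly. The symbols $n_{e}$ (the effective number of observations, formed in the program as idf + irank), RSS and TSS are notational choices made here rather than names from the routine, and the AIC shown is the form used in the example program, which may differ by constant terms from other definitions.

```latex
\begin{aligned}
R^{2} &= 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \\
\bar{R}^{2} &= 1 - \frac{n_{e} - 1}{n_{e} - k}\,(1 - R^{2}), \\
\mathrm{AIC} &= n_{e} \ln\!\left(\frac{\mathrm{RSS}}{n_{e}}\right) + 2k .
\end{aligned}
```

With the values reported in the example output ($n_{e} = 12$, $k = 4$, $\mathrm{RSS} = 22.227$, $R^{2} = 0.70042$), these give $\bar{R}^{2} = 1 - (11/8)(0.29958) \approx 0.58808$ and $\mathrm{AIC} = 12 \ln(22.227/12) + 8 \approx 15.397$.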

function g02da_example


fprintf('g02da example results\n\n');

x = [1, 0, 0, 0;
     0, 0, 0, 1;
     0, 1, 0, 0;
     0, 0, 1, 0;
     0, 0, 0, 1;
     0, 1, 0, 0;
     0, 0, 0, 1;
     1, 0, 0, 0;
     0, 0, 1, 0;
     1, 0, 0, 0;
     0, 0, 1, 0;
     0, 1, 0, 0];
y = [33.63;     39.62;     38.18;     41.46;     38.02;     35.83;
     35.99;     36.58;     42.92;     37.80;     40.43;     37.89];

[n,m]  = size(x);
isx    = ones(m,1,'int64');
mean_p = 'M';
ip     = int64(m+1);

% Fit general linear regression model
[rss, idf, b, se, covar, res, h, q, svd, irank, p, wk, ifail] = ...
  g02da(mean_p, x, isx, ip, y);

% Calculate total sums of squares about mean
[sw, wmean, c, ifail] = g02bu(y);

% Effective number of observations
en = double(idf + irank);
% Calculate R-squared, corrected R-squared and AIC
rsq = 1 - rss/c(1);
mult = (en-1)/double(idf);
arsq = 1 - mult*(1-rsq);
aic = en*log(rss/en) + 2*double(irank);

% Display results
if svd
  fprintf('Model not of full rank, rank = %4d\n\n', irank);
end
fprintf('Residual sum of squares = %12.4e\n', rss);
fprintf('Degrees of freedom      = %4d\n', idf);
fprintf('R-squared               = %12.4e\n', rsq);
fprintf('Adjusted R-squared      = %12.4e\n', arsq);
fprintf('AIC                     = %12.4e\n', aic);
fprintf('\nVariable   Parameter estimate   Standard error\n\n');
ivar = double([1:ip]');
fprintf('%6d%20.4e%20.4e\n',[ivar b se]');
fprintf('\n   Obs          Residuals              H\n\n');
ivar = double([1:n]');
fprintf('%6d%20.4e%20.4e\n',[ivar res h]');

g02da example results

Model not of full rank, rank =    4

Residual sum of squares =   2.2227e+01
Degrees of freedom      =    8
R-squared               =   7.0042e-01
Adjusted R-squared      =   5.8808e-01
AIC                     =   1.5397e+01

Variable   Parameter estimate   Standard error

     1          3.0557e+01          3.8494e-01
     2          5.4467e+00          8.3896e-01
     3          6.7433e+00          8.3896e-01
     4          1.1047e+01          8.3896e-01
     5          7.3200e+00          8.3896e-01

   Obs          Residuals              H

     1         -2.3733e+00          3.3333e-01
     2          1.7433e+00          3.3333e-01
     3          8.8000e-01          3.3333e-01
     4         -1.4333e-01          3.3333e-01
     5          1.4333e-01          3.3333e-01
     6         -1.4700e+00          3.3333e-01
     7         -1.8867e+00          3.3333e-01
     8          5.7667e-01          3.3333e-01
     9          1.3167e+00          3.3333e-01
    10          1.7967e+00          3.3333e-01
    11         -1.1733e+00          3.3333e-01
    12          5.9000e-01          3.3333e-01

On entry, $n < 2$,
or $m < 1$,
or $ldx < n$,
or $ldq < n$,
or $tol < 0.0$,
or $ip \leq 0$,
or $ip > n$.

On entry, mean_p $\neq$ 'M' or 'Z',
or weight $\neq$ 'W' or 'U'.

On entry, a value of isx $< 0$,
or the value of ip is incompatible with the values of mean_p and isx,
or ip is greater than the effective number of observations.
