g02de:: Correlation and Regression Analysis (NAG Toolbox)

A linear regression model may be built up by adding new independent variables to an existing model. nag_correg_linregm_var_add (g02de) updates the $QR$ decomposition used in the computation of the linear regression model. The $QR$ decomposition may come from nag_correg_linregm_fit (g02da) or a previous call to nag_correg_linregm_var_add (g02de). The general linear regression model is defined by

y = X β + ε,

where	$y$ is a vector of $n$ observations on the dependent variable,
	$X$ is an $n$ by $p$ matrix of the independent variables of column rank $k$ ,
	$β$ is a vector of length $p$ of unknown parameters,
and	$ε$ is a vector of length $n$ of unknown random errors such that $\mathrm{var}(ε) = V σ^{2}$ , where $V$ is a known diagonal matrix.

The parameter estimates may be found by computing a $QR$ decomposition of $X$ (or $W^{\frac{1}{2}} X$ in the weighted case), i.e.,

X = Q R^{*} (or W^{\frac{1}{2}} X = Q R^{*}),

where

R^{*} = \left(\begin{array}{c} R \\ 0 \end{array}\right)

and $R$ is a $p$ by $p$ upper triangular matrix and $Q$ is an $n$ by $n$ orthogonal matrix.

To add a new independent variable, $x_{p + 1}$, both $R$ and $c$ have to be updated, where $c = Q^{T} y$ (or $Q^{T} W^{\frac{1}{2}} y$ in the weighted case). The matrix $Q_{p + 1}$ is found such that $Q_{p + 1}^{T} [R : Q^{T} x_{p + 1}]$ (or $Q_{p + 1}^{T} [R : Q^{T} W^{\frac{1}{2}} x_{p + 1}]$) is upper triangular. The vector $c$ is then updated by multiplying by $Q_{p + 1}^{T}$.
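The update described above can be illustrated without the Toolbox. The following is a minimal sketch in base MATLAB for the unweighted case. It uses the built-in qr and qrinsert functions in place of nag_correg_linregm_var_add (g02de), so it is not a statement of how g02de is implemented internally. The data and all variable names (X, y, xnew, c1, b_update and so on) are made up for the illustration.

```matlab
% Illustrative only: base MATLAB qr/qrinsert, not the NAG Toolbox routines.
X    = [ones(4,1), [1; 2; 4; 3]];   % existing model, n = 4, p = 2 (made-up data)
y    = [1; 3; 2; 5];                % dependent variable (made-up data)
xnew = [1; 4; 9; 16];               % new independent variable x_{p+1}
[n, p] = size(X);

% Full QR decomposition: Q is n-by-n, R is n-by-p (R^* in the text)
[Q, R] = qr(X);
c = Q' * y;
rss_old = sum(c(p+1:n).^2);         % RSS of the p-variable model (full rank)

% Append xnew as column p+1 and update the factorization
[Q1, R1] = qrinsert(Q, R, p+1, xnew);
c1 = Q1' * y;                       % updated c
rss_new = sum(c1(p+2:n).^2);        % RSS of the (p+1)-variable model

% Compare with a direct least squares fit of the augmented model
Xa = [X, xnew];
b_update = R1(1:p+1, 1:p+1) \ c1(1:p+1);
b_direct = Xa \ y;
fprintf('max |b_update - b_direct| = %g\n', max(abs(b_update - b_direct)));
fprintf('RSS before = %g, RSS after = %g\n', rss_old, rss_new);
```

The two sets of estimates should agree to rounding error, and the residual sum of squares cannot increase when a variable is added. In the weighted case the same sketch would apply after multiplying X, y and xnew by $W^{\frac{1}{2}}$.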

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

It should be noted that the residual sum of squares produced by nag_correg_linregm_var_add (g02de) may not be correct if the model to which the new independent variable is added is not of full rank. In such a case nag_correg_linregm_update (g02dd) should be used to calculate the residual sum of squares.
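A short derivation shows why full rank matters. It is a sketch under one assumption of notation that is not in the text above: $c = Q^{T} y$ is partitioned into $c_1$ (its first $p$ elements) and $c_2$ (the remaining $n - p$ elements). Only the unweighted case is shown.

```latex
\|y - X\beta\|^{2} = \|Q^{T} y - R^{*}\beta\|^{2}
                   = \|c_1 - R\beta\|^{2} + \|c_2\|^{2},
\qquad
\mathrm{RSS} = \min_{\beta} \|c_1 - R\beta\|^{2} + \|c_2\|^{2}.
```

When $R$ has full rank, $R β = c_1$ can be solved exactly, the first term vanishes, and the residual sum of squares is $\|c_2\|^{2}$. When $R$ is rank deficient the first term need not vanish, so a value based on $c_2$ alone can underestimate the residual sum of squares.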

Example

A dataset consisting of $12$ observations is read in. The four independent variables are stored in the array x while the dependent variable is read into the first column of q. If the character variable mean indicates that a mean should be included in the model, a variable taking the value $1.0$ for all observations is set up and fitted. Subsequently, one variable at a time is selected to enter the model as indicated by the input value of indx. After the variable has been added the parameter estimates are calculated by nag_correg_linregm_update (g02dd) and the results printed. This is repeated until the input value of indx is $0$.

function g02de_example


fprintf('g02de example results\n\n');

x = [1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0;
     1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5;
     0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0;
     0.0  0.0  0.0  0.0  0.0  0.0  4.0  4.5  5.0  5.5  6.0  6.5;
     1.4  2.2  4.5  6.1  7.1  7.7  8.3  8.6  8.8  9.0  9.3  9.2];
[m,n] = size(x);
y = [4.32; 5.21; 6.49; 7.10; 7.94; 8.53;
     8.84; 9.02; 9.27; 9.43; 9.68; 9.83]; 

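% The first column of q holds the dependent variable y
q = zeros(n,m+1);
q(:,1) = y;
% p is passed to and returned by g02de on each call
p = zeros(m*(m+2),1);
% Number of variables currently in the model
ip = int64(0);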

% Add variables to model one at a time
for j = 1:m
  [q, p, rss, ifail] = g02de( ...
                              ip, q, p, x(j,1:n));
  ip = ip + 1;
  fprintf('\nVariable %4d added\n',ip);

  % Calculate parameter estimates
  rsst = 0;
  [rsst, idf, b, se, covar, svd, irank, p2, ifail] = ...
  g02dd(int64(n), ip, q, rsst);

  if svd
    fprintf('Model not of full rank\n\n');
  end
  fprintf('Residual sum of squares = %12.4e\n', rsst);
  fprintf('Degrees of freedom      = %4d\n', idf);
  fprintf('\nVariable   Parameter estimate   Standard error\n\n');
  ivar = double([1:ip]');
  fprintf('%6d%20.4e%20.4e\n',[ivar b se]');
end

g02de example results


Variable    1 added
Residual sum of squares =   3.6267e+01
Degrees of freedom      =   11

Variable   Parameter estimate   Standard error

     1          7.9717e+00          5.2416e-01

Variable    2 added
Residual sum of squares =   4.0164e+00
Degrees of freedom      =   10

Variable   Parameter estimate   Standard error

     1          4.4100e+00          4.3756e-01
     2          9.4979e-01          1.0599e-01

Variable    3 added
Residual sum of squares =   3.8872e+00
Degrees of freedom      =    9

Variable   Parameter estimate   Standard error

     1          4.2236e+00          5.6734e-01
     2          1.0554e+00          2.2217e-01
     3         -4.1962e-01          7.6695e-01

Variable    4 added
Residual sum of squares =   1.8702e-01
Degrees of freedom      =    8

Variable   Parameter estimate   Standard error

     1          2.7605e+00          1.7592e-01
     2          1.7057e+00          7.3100e-02
     3          4.4575e+00          4.2676e-01
     4         -1.3006e+00          1.0338e-01

Variable    5 added
Residual sum of squares =   8.4066e-02
Degrees of freedom      =    7

Variable   Parameter estimate   Standard error

     1          3.1440e+00          1.8181e-01
     2          9.0748e-01          2.7761e-01
     3          2.0790e+00          8.6804e-01
     4         -6.1589e-01          2.4530e-01
     5          2.9224e-01          9.9810e-02

On entry,	$n < 1$ ,
or	$ip < 0$ ,
or	$ip \geq n$ ,
or	$ldq < n$ ,
or	$tol \leq 0.0$ ,
or	$weight \neq'U'$ or $'W'$ .

NAG Toolbox: nag_correg_linregm_var_add (g02de)

