NAG C Library Function Document

1 Purpose

nag_regsn_mult_linear_newyvar (g02dgc) calculates the estimates of the parameters of a general linear regression model for a new dependent variable after a call to nag_regsn_mult_linear (g02dac).

2 Specification

#include <nag.h>

#include <nagg02.h>

void nag_regsn_mult_linear_newyvar (Integer n, const double wt[], double *rss, Integer ip, Integer rank, double cov[], double q[], Integer tdq, Nag_Boolean svd, const double p[], const double y[], double b[], double se[], double res[], const double com_ar[], NagError *fail)

3 Description

nag_regsn_mult_linear_newyvar (g02dgc) uses the results given by nag_regsn_mult_linear (g02dac) to fit the same set of independent variables to a new dependent variable.

nag_regsn_mult_linear (g02dac) computes a $Q R$ decomposition of the matrix of $p$ independent variables and also, if the model is not of full rank, a singular value decomposition (SVD). These results can be used to compute estimates of the parameters for a general linear model with a new dependent variable. The $Q R$ decomposition leads to the formation of an upper triangular $p$ by $p$ matrix $R$ and an $n$ by $n$ orthogonal matrix $Q$. In addition the vector $c = Q^{T} y$ (or $Q^{T} W^{1 / 2} y$) is computed. For a new dependent variable, $y_{new}$, nag_regsn_mult_linear_newyvar (g02dgc) computes a new value of $c = Q^{T} y_{new}$ or $Q^{T} W^{1 / 2} y_{new}$.

If $R$ is of full rank, then the least squares parameter estimates, $\hat{β}$, are the solution to $R \hat{β} = c_{1}$, where $c_{1}$ is the first $p$ elements of $c$.
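Because $R$ is upper triangular, the full-rank system $R \hat{β} = c_{1}$ can be solved by back-substitution. The following is a minimal sketch of that step, not the library's internal code; the function name `back_substitute` and the row-major storage of `r` with leading dimension `ldr` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Solve R * beta = c1 for an upper triangular p-by-p matrix R.
 * r is stored row-major with leading dimension ldr (assumed layout).
 * Returns 0 on success, -1 if a zero diagonal element is met,
 * i.e. R is not of full rank. */
int back_substitute(size_t p, const double *r, size_t ldr,
                    const double *c1, double *beta)
{
    assert(ldr >= p);
    /* Work upwards from the last row. */
    for (size_t i = p; i-- > 0;) {
        double sum = c1[i];
        for (size_t j = i + 1; j < p; ++j)
            sum -= r[i * ldr + j] * beta[j];
        if (r[i * ldr + i] == 0.0)
            return -1;
        beta[i] = sum / r[i * ldr + i];
    }
    return 0;
}
```

A nonzero return signals the rank-deficient case, for which the SVD route is needed instead.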

If $R$ is not of full rank, then nag_regsn_mult_linear (g02dac) will have computed the SVD of $R$,

$R = Q_{*} \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} P^{T}$

where $D$ is a $k$ by $k$ diagonal matrix with nonzero diagonal elements, $k$ being the rank of $R$, and $Q_{*}$ and $P$ are $p$ by $p$ orthogonal matrices. This gives the solution

$\hat{β} = P_{1} D^{- 1} Q_{*_{1}}^{T} c_{1}$,

$P_{1}$ being the first $k$ columns of $P$, i.e., $P = (P_{1} P_{0})$, and $Q_{*_{1}}$ being the first $k$ columns of $Q_{*}$. Details of the SVD are made available by nag_regsn_mult_linear (g02dac) in the form of the matrix $P^{*}$:

$P^{*} = \begin{pmatrix} D^{- 1} P_{1}^{T} \\ P_{0}^{T} \end{pmatrix}$.

The matrix $Q_{*}$ is made available through the com_ar argument of nag_regsn_mult_linear (g02dac).
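Since the first $k$ rows of $P^{*}$ hold $D^{- 1} P_{1}^{T}$, the rank-deficient solution can be formed as the transpose of those rows applied to $Q_{*_{1}}^{T} c_{1}$. The sketch below illustrates this under stated assumptions: the function name `svd_solution` is hypothetical, `pstar` is a $p$ by $p$ array stored by rows, and `qstar` is taken to be a plain row-major $p$ by $p$ copy of $Q_{*}$ (the actual layout inside com_ar is not described here).

```c
#include <assert.h>
#include <stddef.h>

/* beta = P1 * D^{-1} * Qs1^T * c1, using the first k rows of P*,
 * which hold D^{-1} * P1^T.
 * pstar: p-by-p, stored by rows.
 * qstar: p-by-p, row-major (assumed layout, for illustration only).
 * t:     caller-supplied workspace of length k. */
void svd_solution(size_t p, size_t k, const double *pstar,
                  const double *qstar, const double *c1,
                  double *t, double *beta)
{
    assert(k >= 1 && k <= p);
    /* t = Qs1^T c1, using the first k columns of Q_*. */
    for (size_t i = 0; i < k; ++i) {
        t[i] = 0.0;
        for (size_t m = 0; m < p; ++m)
            t[i] += qstar[m * p + i] * c1[m];
    }
    /* beta = (D^{-1} P1^T)^T t. */
    for (size_t j = 0; j < p; ++j) {
        beta[j] = 0.0;
        for (size_t i = 0; i < k; ++i)
            beta[j] += pstar[i * p + j] * t[i];
    }
}
```

Only the first $k$ rows of `pstar` and the first $k$ columns of `qstar` are read, which mirrors the role of $P_{1}$ and $Q_{*_{1}}$ in the formula.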

In addition to parameter estimates, the new residuals are computed and the variance-covariance matrix of the parameter estimates is found by scaling the variance-covariance matrix for the original regression.
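One way to picture the scaling step: if both fits use the same independent variables and the same residual degrees of freedom, the two variance-covariance matrices differ only by the ratio of the residual sums of squares. The sketch below rests on that assumption and is not taken from the library source; `scale_packed_cov` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Rescale a packed upper-triangular covariance matrix in place.
 * Assumption: both fits share the same design and residual degrees of
 * freedom, so the covariance changes only through the factor
 * rss_new / rss_old. Returns -1 if rss_old is not positive. */
int scale_packed_cov(size_t ip, double *cov, double rss_old, double rss_new)
{
    assert(cov != NULL);
    if (rss_old <= 0.0)
        return -1;
    const double factor = rss_new / rss_old;
    const size_t len = ip * (ip + 1) / 2;   /* packed length */
    for (size_t k = 0; k < len; ++k)
        cov[k] *= factor;
    return 0;
}
```

This also suggests why the original residual sum of squares must be strictly positive on entry.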

4 References

Golub G H and Van Loan C F (1996) Matrix Computations (3rd Edition) Johns Hopkins University Press, Baltimore

Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25

Searle S R (1971) Linear Models Wiley

5 Arguments

1: $n$ – Integer Input

On entry: the number of observations, $n$.

Constraint: $n \geq 2$.

2: $wt [n]$ – const double Input

On entry: optionally, the weights to be used in the weighted regression.

If $wt [i - 1] = 0.0$, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. The values of res and h will be set to zero for observations with zero weights (see nag_regsn_mult_linear (g02dac)).

If weights are not provided then wt must be set to NULL and the effective number of observations is n.

Constraint: if wt is not NULL, $wt [i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.
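The rule for the effective number of observations can be sketched as a small helper. This is an illustration only; `effective_obs` is a hypothetical name and is not part of the NAG Library.

```c
#include <assert.h>
#include <stddef.h>

/* Number of observations that enter the fit: all n when wt is NULL,
 * otherwise the count of nonzero weights.
 * Returns -1 if a negative weight is found. */
int effective_obs(int n, const double *wt)
{
    assert(n >= 2);               /* documented constraint on n */
    if (wt == NULL)
        return n;
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (wt[i] < 0.0)
            return -1;            /* violates wt[i] >= 0.0 */
        if (wt[i] != 0.0)
            ++count;
    }
    return count;
}
```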

3: $rss$ – double * Input/Output

On entry: the residual sum of squares for the original dependent variable.

On exit: the residual sum of squares for the new dependent variable.

4: $ip$ – Integer Input

On entry: the number $p$ of independent variables in the model (including the mean if fitted).

Constraint: $1 \leq ip \leq n$.

5: $rank$ – Integer Input

On entry: the rank of the independent variables, as given by nag_regsn_mult_linear (g02dac).

Constraint: $rank > 0$, and if $svd = Nag_FALSE$, then $rank = ip$, otherwise $rank \leq ip$.

6: $cov [ip \times (ip + 1) / 2]$ – double Input/Output

On entry: the covariance matrix of the parameter estimates as given by nag_regsn_mult_linear (g02dac).

On exit: the upper triangular part of the variance-covariance matrix of the ip parameter estimates given in b. They are stored packed by column, i.e., the covariance between the parameter estimate given in $b [i]$ and the parameter estimate given in $b [j]$, $j \geq i$, is stored in $cov [j (j + 1) / 2 + i]$, for $i = 0, 1, \dots, ip - 1$ and $j = i, i + 1, \dots, ip - 1$.
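The packed-by-column indexing can be wrapped in a small accessor so that callers need not repeat the formula. The helper names `cov_index` and `cov_get` below are hypothetical and serve only to illustrate the storage scheme described above.

```c
#include <assert.h>
#include <stddef.h>

/* Position in the packed array of the covariance between parameters
 * i and j (0-based). The matrix is symmetric, so the arguments are
 * reordered to satisfy j >= i before applying j (j + 1) / 2 + i. */
size_t cov_index(size_t i, size_t j)
{
    if (i > j) {
        size_t tmp = i;
        i = j;
        j = tmp;
    }
    return j * (j + 1) / 2 + i;
}

/* Read one covariance from a packed array holding ip parameters. */
double cov_get(const double *cov, size_t ip, size_t i, size_t j)
{
    assert(i < ip && j < ip);
    return cov[cov_index(i, j)];
}
```

For example, with three parameters the variance of the last one, `cov_get(cov, 3, 2, 2)`, reads element 5, the final entry of the six-element packed array.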

7: $q [n \times tdq]$ – double Input/Output

Note: the $(i, j)$th element of the matrix $Q$ is stored in $q [(i - 1) \times tdq + j - 1]$.

On entry: the results of the $Q R$ decomposition as returned by nag_regsn_mult_linear (g02dac).

On exit: the first column of q contains the new values of $c$; the remainder of q will be unchanged.
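Given the storage rule in the note, the first column of q occupies every tdq-th element starting at index 0. The following sketch copies the first ip of those values, that is $c_{1}$, into a separate array; `copy_c1` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the first ip entries of column 1 of q into c1.
 * Element (i, 1) of Q (1-based) lives at q[(i - 1) * tdq]. */
void copy_c1(int ip, const double *q, int tdq, double *c1)
{
    assert(ip >= 1);
    assert(tdq >= ip + 1);        /* documented constraint on tdq */
    for (int i = 0; i < ip; ++i)
        c1[i] = q[(size_t)i * (size_t)tdq];
}
```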

8: $tdq$ – Integer Input

On entry: the stride separating matrix column elements in the array q.

Constraint: $tdq \geq ip + 1$.

9: $svd$ – Nag_Boolean Input

On entry: indicates if a singular value decomposition was used by nag_regsn_mult_linear (g02dac).

$svd = Nag_TRUE$: A singular value decomposition was used by nag_regsn_mult_linear (g02dac).
$svd = Nag_FALSE$: A singular value decomposition was not used by nag_regsn_mult_linear (g02dac).

10: $p [2 \times ip + ip \times ip]$ – const double Input

On entry: details of the $Q R$ decomposition and SVD, if used, as returned in array p by nag_regsn_mult_linear (g02dac).

If $svd = Nag_FALSE$, only the first ip elements of p are used; these contain details of the Householder vector in the $Q R$ decomposition (Sections 2.2.1 and 3.3.6 in the f08 Chapter Introduction).

If $svd = Nag_TRUE$, the first ip elements of p contain details of the Householder vector in the $Q R$ decomposition (Sections 2.2.1 and 3.3.6 in the f08 Chapter Introduction) and the next ip elements of p contain singular values. The following ip by ip elements contain the matrix $P^{*}$ stored by rows.
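The three regions of p can be made explicit with a small view structure. This is a sketch of the layout described above; the type `PView` and the function `split_p` are hypothetical names introduced for illustration, and `svd_used` stands in for the svd flag.

```c
#include <assert.h>
#include <stddef.h>

/* Read-only view of the regions of the array p. */
typedef struct {
    const double *householder;     /* first ip elements                 */
    const double *singular_values; /* next ip elements, or NULL         */
    const double *pstar;           /* ip * ip elements by rows, or NULL */
} PView;

/* Split p into its documented regions. When svd_used is zero only the
 * Householder details are meaningful, so the other pointers are NULL. */
PView split_p(const double *p, int ip, int svd_used)
{
    assert(ip >= 1);
    PView v;
    v.householder = p;
    v.singular_values = svd_used ? p + ip : NULL;
    v.pstar = svd_used ? p + 2 * (size_t)ip : NULL;
    return v;
}
```

Element $(r, c)$ of $P^{*}$ (0-based) is then `v.pstar[r * ip + c]`.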

11: $y [n]$ – const double Input

On entry: the new dependent variable, $y_{new}$.

12: $b [ip]$ – double Output

On exit: $b [i]$, for $i = 0, 1, \dots, ip - 1$, contain the least squares estimates of the parameters of the regression model, $\hat{β}$.

13: $se [ip]$ – double Output

On exit: $se [i]$, for $i = 0, 1, \dots, ip - 1$, contain the standard errors of the ip parameter estimates given in b.

14: $res [n]$ – double Output

On exit: the residuals for the new regression model.

15: $com_ar [ip + ip \times ip]$ – const double Input

On entry: if $svd = Nag_TRUE$, com_ar must be unaltered from the previous call to nag_regsn_mult_linear (g02dac).

16: $fail$ – NagError * Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6 Error Indicators and Warnings

NE_2_INT_ARG_LT: On entry, $n = 〈value〉$ while $ip = 〈value〉$. These arguments must satisfy $n \geq ip$.
On entry, $tdq = 〈value〉$ while $ip + 1 = 〈value〉$. These arguments must satisfy $tdq \geq ip + 1$.
NE_INT_ARG_LE: On entry, $rank = 〈value〉$.
Constraint: $rank > 0$.
NE_INT_ARG_LT: On entry, $ip = 〈value〉$.
Constraint: $ip \geq 1$.
NE_REAL_ARG_LE: On entry, rss must not be less than or equal to 0.0: $rss = 〈value〉$.
NE_REAL_ARG_LT: On entry, $wt [〈value〉]$ must not be less than 0.0: $wt [〈value〉] = 〈value〉$.
NE_SVD_RANK_GT_IP: On entry, the Boolean variable, svd, is Nag_TRUE and rank must not be greater than ip: $rank = 〈value〉$, $ip = 〈value〉$.
NE_SVD_RANK_NE_IP: On entry, the Boolean variable, svd, is Nag_FALSE and rank must be equal to ip: $rank = 〈value〉$, $ip = 〈value〉$.

7 Accuracy

The same accuracy as nag_regsn_mult_linear (g02dac) is obtained.

8 Parallelism and Performance

nag_regsn_mult_linear_newyvar (g02dgc) is not threaded in any implementation.

9 Further Comments

The values of the leverages, $h_{i}$, are unaltered by a change in the dependent variable, so a call to nag_regsn_std_resid_influence (g02fac) can be made using the value of h from nag_regsn_mult_linear (g02dac).

9.1 Internal Changes

Internal changes have been made to this function as follows:

At Mark 26.1: The documented minimum length of the array argument com_ar was too large. The documented minimum length was given as $ip \times ip \times 5 \times (ip - 1)$ but the actual minimum length is $ip \times ip + ip$, which is much less for non-trivial cases, $ip > 1$.
In addition, provided example programs that called nag_regsn_mult_linear_newyvar (g02dgc) allocated lengths of $ip \times ip + 5 \times (ip - 1)$ for the array argument, which was also larger than necessary for non-trivial problems.

The nag_regsn_mult_linear_newyvar (g02dgc) routine document has been updated to document the actual minimum length requirement for com_ar, and those example programs that call nag_regsn_mult_linear_newyvar (g02dgc) have been updated to allocate the actual minimum length required for com_ar.

For details of all known issues which have been reported for the NAG Library please refer to the Known Issues list.

10 Example

A dataset consisting of 12 observations with four independent variables and two dependent variables is read in. A model with all four independent variables is fitted to the first dependent variable by nag_regsn_mult_linear (g02dac) and the results printed. The model is then fitted to the second dependent variable by nag_regsn_mult_linear_newyvar (g02dgc) and those results printed.
