NAG CL Interface: g02dcc (linregm_obs_edit)

1 Purpose

g02dcc adds or deletes an observation from a general regression model fitted by g02dac.

2 Specification

 #include <nag.h>
 void g02dcc (Nag_UpdateObserv update, Nag_IncludeMean mean, Integer m, const Integer sx[], double q[], Integer tdq, Integer ip, const double x[], Integer nr, Integer tdx, Integer ix, double y, const double wt[], double *rss, NagError *fail)
The function may be called by the names: g02dcc, nag_correg_linregm_obs_edit or nag_regsn_mult_linear_addrem_obs.

3 Description

g02dac fits a general linear regression model to a dataset. You may wish to change the model by either adding or deleting an observation from the dataset. g02dcc takes the results from g02dac and makes the required changes to the vector $c$ and the upper triangular matrix $R$ produced by g02dac. The regression coefficients, standard errors and the variance-covariance matrix of the regression coefficients can be obtained from g02ddc after all required changes to the dataset have been made.
g02dac performs a $QR$ decomposition on the (weighted) $X$ matrix of independent variables. To add a new observation to a model with $p$ parameters, the upper triangular matrix $R$ and the vector ${c}_{1}$ (the first $p$ elements of $c$) are augmented by the new observation on the independent variables in ${x}^{\mathrm{T}}$ and the dependent variable $y$. Givens rotations are then used to restore the upper triangular form:
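 $\left(\begin{array}{cc}R& {c}_{1}\\ {x}^{\mathrm{T}}& y\end{array}\right)\longrightarrow \left(\begin{array}{cc}{R}^{*}& {c}_{1}^{*}\\ 0& {y}^{*}\end{array}\right)$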
To delete an observation Givens rotations are applied to give:
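 $\left(\begin{array}{cc}R& {c}_{1}\end{array}\right)\longrightarrow \left(\begin{array}{cc}{R}^{*}& {c}_{1}^{*}\\ {x}^{\mathrm{T}}& y\end{array}\right)$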
Note: only $R$ and the upper part of $c$ are updated; the remainder of the $Q$ matrix is unchanged.

4 References

Golub G H and Van Loan C F (1996) Matrix Computations (3rd Edition) Johns Hopkins University Press, Baltimore
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25

5 Arguments

1: $\mathbf{update}$Nag_UpdateObserv Input
On entry: indicates if an observation is to be added or deleted.
${\mathbf{update}}=\mathrm{Nag_ObservAdd}$
The observation is added.
${\mathbf{update}}=\mathrm{Nag_ObservDel}$
The observation is deleted.
Constraint: ${\mathbf{update}}=\mathrm{Nag_ObservAdd}$ or $\mathrm{Nag_ObservDel}$.
2: $\mathbf{mean}$Nag_IncludeMean Input
On entry: indicates if a mean has been used in the model.
${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$
A mean term or intercept will have been included in the model by g02dac.
${\mathbf{mean}}=\mathrm{Nag_MeanZero}$
A model with no mean term or intercept will have been fitted by g02dac.
Constraint: ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$ or $\mathrm{Nag_MeanZero}$.
3: $\mathbf{m}$Integer Input
On entry: the total number of independent variables in the dataset.
Constraint: ${\mathbf{m}}\ge 1$.
4: $\mathbf{sx}\left[{\mathbf{m}}\right]$const Integer Input
On entry: if ${\mathbf{sx}}\left[\mathit{j}\right]$ is greater than $0$, then the value contained in ${\mathbf{x}}\left[{\mathbf{tdx}}×\left({\mathbf{ix}}-1\right)+\mathit{j}\right]$ is to be included as a value of ${x}^{\mathrm{T}}$, an observation on an independent variable, for $\mathit{j}=0,1,\dots ,{\mathbf{m}}-1$.
Constraint: if ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$, then exactly ${\mathbf{ip}}-1$ elements of sx must be $>0$ and if ${\mathbf{mean}}=\mathrm{Nag_MeanZero}$, then exactly ip elements of sx must be $>0$.
5: $\mathbf{q}\left[{\mathbf{ip}}×{\mathbf{tdq}}\right]$double Input/Output
Note: the $\left(i,j\right)$th element of the matrix $Q$ is stored in ${\mathbf{q}}\left[\left(i-1\right)×{\mathbf{tdq}}+j-1\right]$.
On entry: q must be the array q as output by g02dac, g02dec, g02dfc, or a previous call to g02dcc.
On exit: the first ip elements of the first column of q will contain ${c}_{1}^{*}$ and the upper triangular part of columns 2 to ${\mathbf{ip}}+1$ will contain ${R}^{*}$; the remainder is unchanged.
6: $\mathbf{tdq}$Integer Input
On entry: the stride separating matrix column elements in the array q.
Constraint: ${\mathbf{tdq}}\ge {\mathbf{ip}}+1$.
7: $\mathbf{ip}$Integer Input
On entry: the number of linear terms in the general linear regression model (including the mean if there is one).
Constraint: ${\mathbf{ip}}\ge 1$.
8: $\mathbf{x}\left[{\mathbf{nr}}×{\mathbf{tdx}}\right]$const double Input
On entry: the ip values for the independent variables of the observation to be added or deleted, ${x}^{\mathrm{T}}$. The positions of the values extracted from x depend on ix and tdx.
9: $\mathbf{nr}$Integer Input
On entry: the number of rows of the notional two-dimensional array x.
Constraint: ${\mathbf{nr}}\ge 1$.
10: $\mathbf{tdx}$Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
11: $\mathbf{ix}$Integer Input
On entry: the row of the notional two-dimensional array x that contains the values for the independent variables of the observation to be added or deleted.
Constraint: $1\le {\mathbf{ix}}\le {\mathbf{nr}}$.
12: $\mathbf{y}$double Input
On entry: the value of the dependent variable for the observation to be added or deleted, $y$.
13: $\mathbf{wt}\left[1\right]$const double Input
On entry: if the new observation is to be weighted, then wt must contain the weight to be used with the new observation. If ${\mathbf{wt}}\left[0\right]=0.0$, then the observation is not included in the model. If the new observation is to be unweighted, then wt must be supplied as NULL.
Constraint: if the new observation is to be weighted, ${\mathbf{wt}}\left[0\right]\ge 0.0$.
14: $\mathbf{rss}$double * Input/Output
On entry: the value of the residual sum of squares for the original set of observations.
Constraint: ${\mathbf{rss}}\ge 0.0$.
On exit: the updated value of the residual sum of squares.
Note: this will only be valid if the model is of full rank.
15: $\mathbf{fail}$NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
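
The indexing rules given for sx, x and q above can be summarized in a short sketch. The helper names below are hypothetical and are not part of the NAG Library, and plain `int` stands in for the NAG `Integer` type; the index expressions themselves are taken from the argument descriptions.

```c
#include <assert.h>

/* All helpers are hypothetical, for illustration only; `int` stands in
 * for the NAG Integer type. */

/* Number of independent variables selected by sx (elements > 0). */
int count_selected(int m, const int sx[])
{
    int n = 0;
    for (int j = 0; j < m; ++j)
        if (sx[j] > 0)
            ++n;
    return n;
}

/* Mirrors the documented constraint linking sx, ip and the mean term:
 * ip - 1 selected variables with a mean, ip without. */
int sx_consistent_with_ip(int m, const int sx[], int mean_included, int ip)
{
    return count_selected(m, sx) == (mean_included ? ip - 1 : ip);
}

/* Copies the selected values of row ix (1-based) of x into xt,
 * using x[tdx * (ix - 1) + j]. Returns the number of values copied. */
int gather_observation(int m, const int sx[], const double x[],
                       int tdx, int ix, double xt[])
{
    assert(m >= 1 && tdx >= m && ix >= 1);
    int k = 0;
    for (int j = 0; j < m; ++j)
        if (sx[j] > 0)
            xt[k++] = x[tdx * (ix - 1) + j];
    return k;
}

/* Element i (0-based) of c1, held in the first column of q. */
double q_c1(const double q[], int tdq, int i)
{
    return q[i * tdq];
}

/* Element (i, j) (0-based, i <= j) of R, held in columns 2 to ip+1 of q. */
double q_r(const double q[], int tdq, int i, int j)
{
    assert(i <= j);
    return q[i * tdq + j + 1];
}
```

A check such as `sx_consistent_with_ip` before the call corresponds to the condition reported as NE_IP_INCOMP_WITH_SX, and `q_c1` and `q_r` show one way to read ${c}_{1}^{*}$ and ${R}^{*}$ back from q on exit.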

6 Error Indicators and Warnings

NE_2_INT_ARG_GT
On entry, ${\mathbf{ix}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{nr}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{ix}}\le {\mathbf{nr}}$.
NE_2_INT_ARG_LT
On entry, ${\mathbf{tdq}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{ip}}+1=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdq}}\ge {\mathbf{ip}}+1$.
On entry, ${\mathbf{tdx}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, mean had an illegal value.
On entry, update had an illegal value.
NE_INT_ARG_LT
On entry, ${\mathbf{ip}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ip}}\ge 1$.
On entry, ${\mathbf{ix}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ix}}\ge 1$.
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{nr}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nr}}\ge 1$.
NE_IP_INCOMP_WITH_SX
On entry, for ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$, number of nonzero values of sx must be equal to ${\mathbf{ip}}-1$: number of nonzero values of ${\mathbf{sx}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{ip}}-1=⟨\mathit{\text{value}}⟩$.
On entry, for ${\mathbf{mean}}=\mathrm{Nag_MeanZero}$, number of nonzero values of sx must be equal to ip: number of nonzero values of ${\mathbf{sx}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{ip}}=⟨\mathit{\text{value}}⟩$.
NE_MAT_NOT_UPD
The $R$ matrix could not be updated: either an attempt was made to delete a nonexistent observation, or an observation was to be added to an $R$ matrix with a zero diagonal element.
NE_REAL_ARG_LT
On entry, ${\mathbf{rss}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{rss}}\ge 0.0$.
On entry, ${\mathbf{wt}}\left[0\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{wt}}\left[0\right]\ge 0.0$.
The rss could not be updated because the input rss was less than the calculated decrease in rss when the new observation was deleted.

7 Accuracy

Updating the $R$ matrix achieves higher accuracy than the traditional method of updating ${X}^{\mathrm{T}}X$.
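
One way to see why, stated here as a standard numerical linear algebra result rather than anything specific to this function: for $X$ of full column rank with $X=QR$, the matrix $R$ has the same 2-norm condition number as $X$, whereas forming the cross-product matrix squares it.

```latex
\kappa_2\left(X^{\mathrm{T}}X\right) = \kappa_2(X)^2,
\qquad
\kappa_2(R) = \kappa_2(X)
```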

8 Parallelism and Performance

g02dcc is not threaded in any implementation.

9 Further Comments

Care should be taken with the use of this function.
(a) It is possible to delete observations which were not included in the original model.
(b) If several additions/deletions have been performed, you are advised to recompute the regression using g02dac.
(c) Adding or deleting observations can alter the rank of the model. Such changes will only be detected when a call to g02ddc has been made. g02ddc should also be used to compute the new residual sum of squares when the model is not of full rank.
g02dcc may also be used after g02dec and g02dfc.

10 Example

A dataset consisting of 12 observations with four independent variables is read in, a general linear regression model is fitted by g02dac, and the parameter estimates are printed. The last observation is then dropped, and the parameter estimates are recalculated using g02ddc and printed.

10.1 Program Text

Program Text (g02dcce.c)

10.2 Program Data

Program Data (g02dcce.d)

10.3 Program Results

Program Results (g02dcce.r)