# NAG CL Interface g02cac (linregs_const)

## 1 Purpose

g02cac performs a simple linear regression with or without a constant term. The data is optionally weighted.

## 2 Specification

 #include <nag.h>
 void g02cac (Nag_SumSquare mean, Integer n, const double x[], const double y[], const double wt[], double *a, double *b, double *a_serr, double *b_serr, double *rsq, double *rss, double *df, NagError *fail)
The function may be called by the names: g02cac, nag_correg_linregs_const or nag_simple_linear_regression.

## 3 Description

g02cac fits a straight line model of the form,
 $E (y) = a + bx ,$
where $E\left(y\right)$ is the expected value of the variable $y$, to the data points
 $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n),$
such that
 $y_i = a + b x_i + e_i, \quad i = 1, 2, \dots, n \quad (n > 2),$
where the ${e}_{i}$ values are independent random errors. The $i$th data point may have an associated weight ${w}_{i}$. Weights may be used when $\mathrm{var}\left({e}_{i}\right)={\sigma }^{2}/{w}_{i}$, when observations are to be removed from the regression by giving them zero weight, or when observations have been recorded with frequency ${w}_{i}$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
 $\sum_{i=1}^{n} w_i e_i^2,$
if the weights option is not selected then ${w}_{i}=1.0$.
The following statistics are computed:
• the estimate of regression constant $\stackrel{^}{a}=\overline{y}-\stackrel{^}{b}\overline{x}$,
• the estimate of regression coefficient $\stackrel{^}{b}=\frac{\sum {w}_{i}\left({x}_{i}-\overline{x}\right)\left({y}_{i}-\overline{y}\right)}{\sum {w}_{i}{\left({x}_{i}-\overline{x}\right)}^{2}}$,
• the residual sum of squares $rss=\sum {w}_{i}{\left({y}_{i}-{\stackrel{^}{y}}_{i}\right)}^{2}$,
where the weighted means $\overline{x}$ and $\overline{y}$ are
 $\overline{x} = \frac{\sum w_i x_i}{\sum w_i} \quad \text{and} \quad \overline{y} = \frac{\sum w_i y_i}{\sum w_i}.$
The number of degrees of freedom associated with $rss$ is
• $df=\sum {w}_{i}-2$ where ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$
• $df=\sum {w}_{i}-1$ where ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$
Note: the weights should be scaled to give the correct degrees of freedom in the case $\mathrm{var}\left({e}_{i}\right)={\sigma }^{2}/{w}_{i}$.
The ${R}^{2}$ value, or coefficient of determination, is
 $R^2 = \frac{\sum w_i (\hat{y}_i - \overline{y})^2}{\sum w_i (y_i - \overline{y})^2} = \frac{\sum w_i (y_i - \overline{y})^2 - rss}{\sum w_i (y_i - \overline{y})^2}.$
This measures the proportion of the total variation about the mean $\overline{y}$ that can be explained by the regression.
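The standard error for the regression constant $\stackrel{^}{a}$ is
 $\mathrm{a\_serr} = \sqrt{\frac{rss}{df} \left( \frac{1}{\sum w_i} + \frac{\overline{x}^2}{\sum w_i (x_i - \overline{x})^2} \right)} = \sqrt{\frac{rss}{df} \, \frac{1}{\sum w_i} \, \frac{\sum w_i x_i^2}{\sum w_i (x_i - \overline{x})^2}}.$
The standard error for the regression coefficient $\stackrel{^}{b}$ is
 $\mathrm{b\_serr} = \sqrt{\frac{rss}{df \sum w_i (x_i - \overline{x})^2}}.$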
Similar formulae can be derived for the case when the line goes through the origin, that is $a=0$.
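As a sketch of that through-origin case (derived here by minimizing $\sum w_i e_i^2$ with $a=0$; the exact expressions used internally by g02cac are not given in this document), the slope estimate, residual sum of squares and standard error take the form
 $\hat{b} = \frac{\sum w_i x_i y_i}{\sum w_i x_i^2}, \quad rss = \sum w_i (y_i - \hat{b} x_i)^2, \quad \mathrm{b\_serr} = \sqrt{\frac{rss}{df \sum w_i x_i^2}},$
with $df = \sum w_i - 1$, as stated above for ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$.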

## 4 References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley

## 5 Arguments

1: $\mathbf{mean}$ Nag_SumSquare Input
On entry: indicates whether g02cac is to include a constant term in the regression.
${\mathbf{mean}}=\mathrm{Nag_AboutMean}$
The regression constant $a$ is included.
${\mathbf{mean}}=\mathrm{Nag_AboutZero}$
The regression constant $a$ is not included, i.e., $a=0$.
Constraint: ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$ or $\mathrm{Nag_AboutZero}$.
2: $\mathbf{n}$ Integer Input
On entry: $n$, the number of observations.
Constraints:
• if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$, ${\mathbf{n}}\ge 2$;
• if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$, ${\mathbf{n}}\ge 1$.
3: $\mathbf{x}\left[{\mathbf{n}}\right]$ const double Input
On entry: the values of the independent variable with the $\mathit{i}$th value stored in ${\mathbf{x}}\left[\mathit{i}-1\right]$, for $\mathit{i}=1,2,\dots ,n$.
Constraint: the values of x must not all be identical.
4: $\mathbf{y}\left[{\mathbf{n}}\right]$ const double Input
On entry: the values of the dependent variable with the $\mathit{i}$th value stored in ${\mathbf{y}}\left[\mathit{i}-1\right]$, for $\mathit{i}=1,2,\dots ,n$.
Constraint: the values of y must not all be identical.
5: $\mathbf{wt}\left[{\mathbf{n}}\right]$ const double Input
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Usually ${\mathbf{wt}}\left[i-1\right]$ will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights, therefore, represents the effective total number of observations used to create the regression line.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
6: $\mathbf{a}$ double * Output
On exit: if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$ then a is the regression constant $\stackrel{^}{a}$, otherwise a is set to zero.
7: $\mathbf{b}$ double * Output
On exit: the regression coefficient $\stackrel{^}{b}$.
8: $\mathbf{a\_serr}$ double * Output
On exit: the standard error of the regression constant $\stackrel{^}{a}$.
9: $\mathbf{b\_serr}$ double * Output
On exit: the standard error of the regression coefficient $\stackrel{^}{b}$.
10: $\mathbf{rsq}$ double * Output
On exit: the coefficient of determination, ${R}^{2}$.
11: $\mathbf{rss}$ double * Output
On exit: the sum of squares of the residuals about the regression.
12: $\mathbf{df}$ double * Output
On exit: the degrees of freedom associated with the residual sum of squares.
13: $\mathbf{fail}$ NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
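The frequency interpretation of wt described above can be illustrated with a small stand-alone sketch in plain C. It does not call g02cac; the helper names `weighted_slope` and `effective_n` are hypothetical and exist only for this illustration. Under the weighted formulas of Section 3, a point with weight 2 should contribute exactly as two identical unweighted points, and a point with weight 0 should contribute nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: effective number of observations, i.e. the sum
 * of the weights, or n when wt is NULL (unit weights). */
double effective_n(size_t n, const double wt[])
{
    double sw = 0.0;
    for (size_t i = 0; i < n; i++)
        sw += wt ? wt[i] : 1.0;
    return sw;
}

/* Hypothetical helper: weighted least squares slope about the weighted
 * means, following the formula for the regression coefficient. */
double weighted_slope(size_t n, const double x[], const double y[],
                      const double wt[])
{
    double sw = 0.0, swx = 0.0, swy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
    }
    assert(sw > 0.0);               /* need some positive weight */
    double xbar = swx / sw, ybar = swy / sw;

    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
        sxy += w * (x[i] - xbar) * (y[i] - ybar);
    }
    assert(sxx > 0.0);              /* x values must not all coincide */
    return sxy / sxx;
}
```

For instance, the points x = {1, 2, 3, 5}, y = {1, 3, 2, 9} with weights {2, 1, 0, 1} should give the same slope as the unweighted data x = {1, 1, 2, 5}, y = {1, 1, 3, 9}, and both data sets have an effective number of observations of 4.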

## 6 Error Indicators and Warnings

On entry, argument mean had an illegal value.
NE_INT_ARG_LT
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 1$ if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$.
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 2$ if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_SW_LOW
On entry, the sum of elements of wt must be greater than $1.0$ if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$ or greater than $2.0$ if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$.
NE_WT_LOW
On entry, wt must contain at least 1 positive element if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$ or at least 2 positive elements if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$.
NE_X_OR_Y_IDEN
On entry, all elements of x and/or y are equal.
NE_ZERO_DOF_RESID
On entry, the degrees of freedom for the residual are zero, i.e., the designated number of arguments $=$ the effective number of observations.
Residual sum of squares is zero, i.e., a perfect fit was obtained.
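To make the input-related conditions above concrete, the following plain C sketch mirrors them as a pre-check that a caller might run on their own data. It is not the library's validation code: the function `check_inputs`, the `CHK_*` codes and the order in which the checks are made are all assumptions for illustration, and the zero degrees of freedom and zero residual conditions are not covered.

```c
#include <stddef.h>

/* Hypothetical result codes, loosely named after the documented errors. */
typedef enum {
    CHK_OK,
    CHK_INT_ARG_LT,    /* n too small for the chosen model        */
    CHK_NEG_WEIGHT,    /* a weight is negative                    */
    CHK_WT_LOW,        /* too few positive weights                */
    CHK_SW_LOW,        /* sum of weights too small                */
    CHK_X_OR_Y_IDEN    /* all x values or all y values are equal  */
} CheckResult;

/* about_mean is nonzero when a constant term is fitted (Nag_AboutMean)
 * and zero for a line through the origin (Nag_AboutZero). */
CheckResult check_inputs(int about_mean, long n, const double x[],
                         const double y[], const double wt[])
{
    long min_n = about_mean ? 2 : 1;
    if (n < min_n)
        return CHK_INT_ARG_LT;

    if (wt != NULL) {
        long npos = 0;
        double sw = 0.0;
        for (long i = 0; i < n; i++) {
            if (wt[i] < 0.0)
                return CHK_NEG_WEIGHT;
            if (wt[i] > 0.0)
                npos++;
            sw += wt[i];
        }
        if (npos < min_n)
            return CHK_WT_LOW;
        if (sw <= (double)min_n)    /* sum must exceed 1.0 or 2.0 */
            return CHK_SW_LOW;
    }

    /* Assumption: the comparison uses every element regardless of its
     * weight, and is skipped when there is only one observation. */
    if (n > 1) {
        int x_same = 1, y_same = 1;
        for (long i = 1; i < n; i++) {
            if (x[i] != x[0]) x_same = 0;
            if (y[i] != y[0]) y_same = 0;
        }
        if (x_same || y_same)
            return CHK_X_OR_Y_IDEN;
    }
    return CHK_OK;
}
```

With three observations and weights {1.0, 0.5, 0.5}, for example, the sum of weights is exactly 2.0, so this sketch reports `CHK_SW_LOW` when a constant term is fitted but `CHK_OK` for a line through the origin.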

## 7 Accuracy

The computations are believed to be stable.

## 8 Parallelism and Performance

g02cac is not threaded in any implementation.

## 9 Further Comments

The time taken by the function depends on $n$. The function uses a two-pass algorithm.

## 10 Example

This example calculates the regression constant $\stackrel{^}{a}$ and the regression coefficient $\stackrel{^}{b}$, their standard errors, the coefficient of determination and the degrees of freedom about the regression.

### 10.1 Program Text

Program Text (g02cace.c)

### 10.2 Program Data

Program Data (g02cace.d)

### 10.3 Program Results

Program Results (g02cace.r)