NAG C Library Function Document

nag_simple_linear_regression (g02cac)

1 Purpose

nag_simple_linear_regression (g02cac) performs a simple linear regression with or without a constant term. The data is optionally weighted.

2 Specification

#include <nag.h>
#include <nagg02.h>
void nag_simple_linear_regression (Nag_SumSquare mean, Integer n,
     const double x[], const double y[], const double wt[],
     double *a, double *b, double *a_serr, double *b_serr,
     double *rsq, double *rss, double *df, NagError *fail)

3 Description

nag_simple_linear_regression (g02cac) fits a straight-line model of the form
$$E(y) = a + bx,$$
where $E(y)$ is the expected value of the variable $y$, to the data points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
such that
$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n \quad (n > 2),$$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$. Weights may be used when $\operatorname{var}(e_i) = \sigma^2 / w_i$, to remove observations from the regression by giving them zero weight, or to represent observations that have been observed with frequency $w_i$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
$$\sum_{i=1}^{n} w_i e_i^2;$$
if the weights option is not selected then $w_i = 1.0$.
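The following statistics are computed (all sums are over $i = 1, 2, \ldots, n$).
The estimate of the regression coefficient:
$$\hat{b} = \frac{\sum w_i (x_i - \bar{x})(y_i - \bar{y})}{\sum w_i (x_i - \bar{x})^2}.$$
The estimate of the regression constant:
$$\hat{a} = \bar{y} - \hat{b}\bar{x},$$
where the weighted means $\bar{x}$ and $\bar{y}$ are
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} \quad\text{and}\quad \bar{y} = \frac{\sum w_i y_i}{\sum w_i}.$$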
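As an illustration of the estimates above, the following C sketch computes $\bar{x}$, $\bar{y}$, $\hat{a}$ and $\hat{b}$ in two passes over the data. It is not the library's implementation. The helper name `wls_fit`, its signature and its return convention are hypothetical. It assumes non-negative weights, and a NULL weight array means unit weights, mirroring the wt argument.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the NAG API): weighted least-squares
   fit of y = a + b*x. Pass w == NULL for unit weights.
   Returns 0 on success, -1 if sum of weights or the weighted spread of x
   is zero (estimates undefined). */
int wls_fit(size_t n, const double x[], const double y[], const double w[],
            double *a, double *b)
{
    double sw = 0.0, sx = 0.0, sy = 0.0;
    /* First pass: weighted sums for the means xbar and ybar. */
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        sw += wi;
        sx += wi * x[i];
        sy += wi * y[i];
    }
    if (sw <= 0.0)
        return -1;
    double xbar = sx / sw, ybar = sy / sw;

    /* Second pass: centred sum of squares and cross-products. */
    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        double dx = x[i] - xbar;
        sxx += wi * dx * dx;
        sxy += wi * dx * (y[i] - ybar);
    }
    if (sxx == 0.0)
        return -1;
    *b = sxy / sxx;          /* b-hat */
    *a = ybar - *b * xbar;   /* a-hat */
    return 0;
}
```

Each weight multiplies its observation's contribution to every sum. Giving a point weight 2 therefore yields the same estimates as listing that point twice, which is the frequency interpretation of $w_i$.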
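The residual sum of squares is
$$\text{rss} = \sum w_i (y_i - \hat{a} - \hat{b} x_i)^2.$$
The number of degrees of freedom associated with rss is
$$\text{df} = \sum w_i - 2$$
if mean = Nag_AboutMean, and $\text{df} = \sum w_i - 1$ if mean = Nag_AboutZero.
Note: in the case $\operatorname{var}(e_i) = \sigma^2 / w_i$, the weights should be scaled to give the correct degrees of freedom.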
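The residual sum of squares and its degrees of freedom can be sketched in C as follows. This sketch assumes the degrees of freedom are the sum of the weights minus the number of estimated parameters: 2 with a constant term, 1 without. The function `wls_rss_df` and its `include_constant` flag are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the NAG API): weighted residual sum of
   squares for the fitted line yhat = a + b*x, plus the residual degrees of
   freedom, taken here as (sum of weights) - (number of parameters).
   Pass w == NULL for unit weights. */
void wls_rss_df(size_t n, const double x[], const double y[],
                const double w[], double a, double b,
                int include_constant, double *rss, double *df)
{
    double sw = 0.0, s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        double r = y[i] - (a + b * x[i]);   /* residual for point i */
        sw += wi;
        s += wi * r * r;
    }
    *rss = s;
    *df = sw - (include_constant ? 2.0 : 1.0);
}
```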
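The $R^2$ value, or coefficient of determination, is
$$R^2 = \frac{\sum w_i (\hat{y}_i - \bar{y})^2}{\sum w_i (y_i - \bar{y})^2} = \frac{\sum w_i (y_i - \bar{y})^2 - \text{rss}}{\sum w_i (y_i - \bar{y})^2},$$
where $\hat{y}_i = \hat{a} + \hat{b} x_i$ is the $i$th fitted value. This measures the proportion of the total variation about the mean $\bar{y}$ that can be explained by the regression.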
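A minimal C sketch of the coefficient of determination for the model with a constant term, using the second form of the formula:
$$R^2 = \frac{\sum w_i (y_i - \bar{y})^2 - \text{rss}}{\sum w_i (y_i - \bar{y})^2}.$$
The function `wls_rsq` is hypothetical. It takes rss as already computed and assumes the $y$ values are not all identical, so the denominator is nonzero.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the NAG API): R^2 = (Syy - rss) / Syy,
   where Syy = sum w_i (y_i - ybar)^2 is the weighted total variation about
   the weighted mean ybar. Pass w == NULL for unit weights. */
double wls_rsq(size_t n, const double y[], const double w[], double rss)
{
    double sw = 0.0, sy = 0.0;
    for (size_t i = 0; i < n; i++) {       /* pass 1: weighted mean of y */
        double wi = w ? w[i] : 1.0;
        sw += wi;
        sy += wi * y[i];
    }
    double ybar = sy / sw, syy = 0.0;
    for (size_t i = 0; i < n; i++) {       /* pass 2: total variation */
        double wi = w ? w[i] : 1.0;
        double d = y[i] - ybar;
        syy += wi * d * d;
    }
    return (syy - rss) / syy;
}
```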
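The standard error for the regression constant $\hat{a}$ is
$$\text{a\_serr} = \sqrt{\frac{\text{rss}}{\text{df}} \left( \frac{1}{\sum w_i} + \frac{\bar{x}^2}{\sum w_i (x_i - \bar{x})^2} \right)} = \sqrt{\frac{\text{rss}}{\text{df}} \cdot \frac{\sum w_i x_i^2}{\left(\sum w_i\right) \sum w_i (x_i - \bar{x})^2}}.$$
The standard error for the regression coefficient $\hat{b}$ is
$$\text{b\_serr} = \sqrt{\frac{\text{rss}}{\text{df}} \cdot \frac{1}{\sum w_i (x_i - \bar{x})^2}}.$$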
Similar formulae can be derived for the case when the line goes through the origin, that is, $a = 0$.

4 References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley

5 Arguments

1:     mean     Nag_SumSquare     Input
On entry: indicates whether nag_simple_linear_regression (g02cac) is to include a constant term in the regression.
mean = Nag_AboutMean
The regression constant $a$ is included.
mean = Nag_AboutZero
The regression constant $a$ is not included, i.e., $a = 0$.
Constraint: mean = Nag_AboutMean or Nag_AboutZero.
2:     n     Integer     Input
On entry: $n$, the number of observations.
Constraints:
  • if mean = Nag_AboutMean, $n \geq 2$;
  • if mean = Nag_AboutZero, $n \geq 1$.
3:     x[n]     const double     Input
On entry: the values of the independent variable, with the $i$th value stored in $\mathrm{x}[i-1]$, for $i = 1, 2, \ldots, n$.
Constraint: the values of x must not all be identical.
4:     y[n]     const double     Input
On entry: the values of the dependent variable, with the $i$th value stored in $\mathrm{y}[i-1]$, for $i = 1, 2, \ldots, n$.
Constraint: the values of y must not all be identical.
5:     wt[n]     const double     Input
On entry: if weighted estimates are required, wt must contain the weights to be used in the weighted regression. Usually $\mathrm{wt}[i-1]$ will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line.
If weights are not provided, wt must be set to NULL and the effective number of observations is $n$.
Constraint: if wt is not NULL, $\mathrm{wt}[i-1] \geq 0.0$, for $i = 1, 2, \ldots, n$.
6:     a     double *     Output
On exit: if mean = Nag_AboutMean, a is the regression constant $\hat{a}$; otherwise a is set to zero.
7:     b     double *     Output
On exit: the regression coefficient $\hat{b}$.
8:     a_serr     double *     Output
On exit: the standard error of the regression constant $\hat{a}$.
9:     b_serr     double *     Output
On exit: the standard error of the regression coefficient $\hat{b}$.
10:   rsq     double *     Output
On exit: the coefficient of determination, $R^2$.
11:   rss     double *     Output
On exit: the sum of squares of the residuals about the regression.
12:   df     double *     Output
On exit: the degrees of freedom associated with the residual sum of squares.
13:   fail     NagError *     Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6 Error Indicators and Warnings

NE_BAD_PARAM
On entry, argument mean had an illegal value.
NE_INT_ARG_LT
On entry, $n = \langle\mathit{value}\rangle$.
Constraint: $n \geq 1$ if mean = Nag_AboutZero.
On entry, $n = \langle\mathit{value}\rangle$.
Constraint: $n \geq 2$ if mean = Nag_AboutMean.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_SW_LOW
On entry, the sum of elements of wt must be greater than 1.0 if mean=Nag_AboutZero or greater than 2.0 if mean=Nag_AboutMean.
NE_WT_LOW
On entry, wt must contain at least 1 positive element if mean=Nag_AboutZero or at least 2 positive elements if mean=Nag_AboutMean.
NE_X_OR_Y_IDEN
On entry, all elements of x and/or y are equal.
NE_ZERO_DOF_RESID
On entry, the degrees of freedom for the residual are zero, i.e., the number of parameters to be estimated equals the effective number of observations.
NW_RSS_EQ_ZERO
The residual sum of squares is zero, i.e., a perfect fit was obtained.
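The weight-related error exits above can be anticipated before the call. The sketch below mirrors the documented conditions for NE_NEG_WEIGHT, NE_WT_LOW and NE_SW_LOW. The enum, the function `check_weights` and the order in which the checks are made are all hypothetical choices, because the library's own checking order is not specified here.

```c
#include <stddef.h>

/* Hypothetical status codes mirroring some documented error exits;
   these names are illustrative and are not part of the NAG API. */
enum wt_status { WT_OK, WT_NEG_WEIGHT, WT_TOO_FEW_POSITIVE, WT_SUM_TOO_LOW };

/* Pre-check a weight array before calling the regression function.
   p is the number of parameters: 2 with a constant (Nag_AboutMean),
   1 without (Nag_AboutZero). */
enum wt_status check_weights(size_t n, const double wt[], int p)
{
    size_t npos = 0;
    double sum = 0.0;
    if (wt == NULL)
        return WT_OK;          /* unit weights: only the n constraint applies */
    for (size_t i = 0; i < n; i++) {
        if (wt[i] < 0.0)
            return WT_NEG_WEIGHT;          /* cf. NE_NEG_WEIGHT */
        if (wt[i] > 0.0)
            npos++;
        sum += wt[i];
    }
    if (npos < (size_t)p)
        return WT_TOO_FEW_POSITIVE;        /* cf. NE_WT_LOW */
    if (sum <= (double)p)
        return WT_SUM_TOO_LOW;             /* cf. NE_SW_LOW */
    return WT_OK;
}
```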

7 Accuracy

The computations are believed to be stable.

8 Parallelism and Performance

nag_simple_linear_regression (g02cac) is not threaded in any implementation.

9 Further Comments

The time taken by the function depends on n . The function uses a two-pass algorithm.
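A small experiment shows one reason two-pass algorithms are commonly preferred. The sketch below is a general numerical illustration, not a description of the library's internals. It compares the one-pass formula $\sum w_i x_i^2 - (\sum w_i x_i)^2 / \sum w_i$ with the two-pass centred sum $\sum w_i (x_i - \bar{x})^2$. Both function names are hypothetical, and unlike the wt argument these helpers require a non-NULL weight array.

```c
#include <stddef.h>

/* One-pass form: S_xx = sum w x^2 - (sum w x)^2 / sum w.
   Subtracting two large, nearly equal numbers can cancel most of the
   significant digits. */
double sxx_one_pass(size_t n, const double x[], const double w[])
{
    double sw = 0.0, sx = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sw += w[i];
        sx += w[i] * x[i];
        sxx += w[i] * x[i] * x[i];
    }
    return sxx - sx * sx / sw;
}

/* Two-pass form: compute the weighted mean first, then sum the squared
   deviations, which are small and involve no large cancellation. */
double sxx_two_pass(size_t n, const double x[], const double w[])
{
    double sw = 0.0, sx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sw += w[i];
        sx += w[i] * x[i];
    }
    double xbar = sx / sw, s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - xbar;
        s += w[i] * d * d;
    }
    return s;
}
```

Take $x = 10^9 + 1, 10^9 + 2, 10^9 + 3$ with unit weights. The two-pass sum is exactly 2. In IEEE double arithmetic without extended-precision intermediates, the one-pass result cannot be 2. The two terms being subtracted are each about $3 \times 10^{18}$, and doubles of that size are multiples of 512, so their difference is a multiple of 512 as well.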

10 Example

A program to calculate the regression constant $\hat{a}$ and the regression coefficient $\hat{b}$, their standard errors, the coefficient of determination and the degrees of freedom about the regression.

10.1 Program Text

Program Text (g02cace.c)

10.2 Program Data

Program Data (g02cace.d)

10.3 Program Results

Program Results (g02cace.r)