NAG Library Function Document
nag_simple_linear_regression (g02cac)
1 Purpose
nag_simple_linear_regression (g02cac) performs a simple linear regression with or without a constant term. The data is optionally weighted.
2 Specification
#include <nag.h>
#include <nagg02.h>
void nag_simple_linear_regression (Nag_SumSquare mean, Integer n,
    const double x[], const double y[], const double wt[],
    double *a, double *b, double *a_serr, double *b_serr,
    double *rsq, double *rss, double *df, NagError *fail)
3 Description
nag_simple_linear_regression (g02cac) fits a straight line model of the form
$$E(y) = a + bx,$$
where $E(y)$ is the expected value of the variable $y$, to the data points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
such that
$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n,$$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$. Weights may be used when $\mathrm{var}(e_i) = \sigma^2 / w_i$, when observations have to be removed from the regression by giving them zero weight, or when observations have been observed with frequency $w_i$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
$$\sum_{i=1}^{n} w_i e_i^2.$$
If the weights option is not selected then $w_i = 1$.
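As an illustration of the frequency interpretation of the weights, the following minimal sketch evaluates the weighted sum of squared errors $\sum w_i e_i^2$ for a candidate line $y = a + bx$. The helper name `weighted_sse` is hypothetical and is not part of the NAG Library; a NULL weight array is assumed to mean unit weights.

```c
#include <stddef.h>

/* Weighted sum of squared errors for the candidate line y = a + b*x.
 * wt may be NULL, in which case every weight is taken as 1.
 * weighted_sse is a hypothetical helper, not a NAG routine. */
double weighted_sse(size_t n, const double x[], const double y[],
                    const double wt[], double a, double b)
{
    double sse = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = (wt != NULL) ? wt[i] : 1.0;
        double e = y[i] - (a + b * x[i]);   /* error e_i for this point */
        sse += w * e * e;
    }
    return sse;
}
```

With this objective, giving a point weight 2 yields the same value as listing that point twice with unit weight, and giving it weight 0 is the same as leaving it out, so the minimizing line is the same in both formulations.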
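The following statistics are computed:
- the estimate of the regression constant, $\hat{a} = \bar{y} - \hat{b}\bar{x}$,
- the estimate of the regression coefficient, $\hat{b} = \dfrac{\sum w_i (x_i - \bar{x})(y_i - \bar{y})}{\sum w_i (x_i - \bar{x})^2}$,
- the residual sum of squares, $rss = \sum w_i (y_i - \hat{a} - \hat{b} x_i)^2$,
where the weighted means $\bar{x}$ and $\bar{y}$ are
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} \quad \text{and} \quad \bar{y} = \frac{\sum w_i y_i}{\sum w_i}.$$
The number of degrees of freedom associated with $rss$ is
- $df = \sum w_i - 2$ where mean = Nag_AboutMean,
- $df = \sum w_i - 1$ where mean = Nag_AboutZero.
Note: the weights should be scaled to give the correct degrees of freedom in the case $\mathrm{var}(e_i) = \sigma^2 / w_i$.
The $R^2$ value or coefficient of determination is
$$R^2 = 1 - \frac{rss}{\sum w_i (y_i - \bar{y})^2}.$$
This measures the proportion of the total variation about the mean that can be explained by the regression.
The standard error for the regression constant $\hat{a}$ is
$$\mathrm{se}(\hat{a}) = \sqrt{\frac{rss}{df}\left(\frac{1}{\sum w_i} + \frac{\bar{x}^2}{\sum w_i (x_i - \bar{x})^2}\right)}.$$
The standard error for the regression coefficient $\hat{b}$ is
$$\mathrm{se}(\hat{b}) = \sqrt{\frac{rss}{df \sum w_i (x_i - \bar{x})^2}}.$$
Similar formulae can be derived for the case when the line goes through the origin, that is $a = 0$.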
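For reference, the line through the origin can be worked out as follows. These are the standard weighted least squares results obtained by minimizing $\sum w_i (y_i - b x_i)^2$ with $a = 0$. They are given as a sketch and are not taken from the function's implementation. In particular, the definition of $R^2$ used in this case is not stated in this document and is therefore omitted.

```latex
\begin{aligned}
\hat{b} &= \frac{\sum w_i x_i y_i}{\sum w_i x_i^2}, \\
rss &= \sum w_i \left(y_i - \hat{b} x_i\right)^2, \\
df &= \sum w_i - 1, \\
\mathrm{se}(\hat{b}) &= \sqrt{\frac{rss}{df \sum w_i x_i^2}}.
\end{aligned}
```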
4 References
Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
5 Arguments
- 1: mean – Nag_SumSquare Input
On entry: indicates whether nag_simple_linear_regression (g02cac) is to include a constant term in the regression.
- mean = Nag_AboutMean: the regression constant $a$ is included.
- mean = Nag_AboutZero: the regression constant is not included, i.e., $a = 0$.
Constraint: mean = Nag_AboutMean or Nag_AboutZero.
- 2: n – Integer Input
On entry: $n$, the number of observations.
Constraints:
- if mean = Nag_AboutMean, n ≥ 2;
- if mean = Nag_AboutZero, n ≥ 1.
- 3: x[n] – const double Input
On entry: the values of the independent variable, with the $i$th value stored in x[$i-1$], for $i = 1, 2, \ldots, n$.
Constraint: the values of x must not all be identical.
- 4: y[n] – const double Input
On entry: the values of the dependent variable, with the $i$th value stored in y[$i-1$], for $i = 1, 2, \ldots, n$.
Constraint: the values of y must not all be identical.
- 5: wt[n] – const double Input
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Usually wt[$i-1$] will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, wt[$i-1$] ≥ 0.0, for $i = 1, 2, \ldots, n$.
- 6: a – double * Output
On exit: if mean = Nag_AboutMean then a is the regression constant $\hat{a}$, otherwise a is set to zero.
- 7: b – double * Output
On exit: the regression coefficient $\hat{b}$.
- 8: a_serr – double * Output
On exit: the standard error of the regression constant $\hat{a}$.
- 9: b_serr – double * Output
On exit: the standard error of the regression coefficient $\hat{b}$.
- 10: rsq – double * Output
On exit: the coefficient of determination, $R^2$.
- 11: rss – double * Output
On exit: the sum of squares of the residuals about the regression.
- 12: df – double * Output
On exit: the degrees of freedom associated with the residual sum of squares.
- 13: fail – NagError * Input/Output
The NAG error argument (see Section 2.7 in How to Use the NAG Library and its Documentation).
6 Error Indicators and Warnings
- NE_BAD_PARAM
On entry, argument mean had an illegal value.
- NE_INT_ARG_LT
On entry, n = ⟨value⟩.
Constraint: n ≥ 1 if mean = Nag_AboutZero.
On entry, n = ⟨value⟩.
Constraint: n ≥ 2 if mean = Nag_AboutMean.
- NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
- NE_SW_LOW
On entry, the sum of elements of wt must be greater than 1.0 if mean = Nag_AboutZero or greater than 2.0 if mean = Nag_AboutMean.
- NE_WT_LOW
On entry, wt must contain at least 1 positive element if mean = Nag_AboutZero or at least 2 positive elements if mean = Nag_AboutMean.
- NE_X_OR_Y_IDEN
On entry, all elements of x and/or y are equal.
- NE_ZERO_DOF_RESID
On entry, the degrees of freedom for the residual are zero, i.e., the designated number of arguments = the effective number of observations.
-
Residual sum of squares is zero, i.e., a perfect fit was obtained.
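The conditions on n and wt listed above can be pre-checked by the caller. The following is a minimal sketch only: the `check_status` codes and the `check_inputs` helper are hypothetical names that mirror the documented indicators, and the order in which the checks are applied is an assumption, not the library's documented behaviour.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch. They mirror the names of the
 * documented error indicators but are not the NAG definitions. */
typedef enum {
    CHECK_OK,
    CHECK_N_TOO_SMALL, /* cf. NE_INT_ARG_LT */
    CHECK_NEG_WEIGHT,  /* cf. NE_NEG_WEIGHT */
    CHECK_WT_LOW,      /* cf. NE_WT_LOW */
    CHECK_SW_LOW       /* cf. NE_SW_LOW */
} check_status;

/* with_constant is nonzero for a fit with a constant term (Nag_AboutMean)
 * and zero for a fit through the origin (Nag_AboutZero).
 * wt may be NULL, meaning no weights are supplied. */
check_status check_inputs(int with_constant, size_t n, const double wt[])
{
    size_t min_points = with_constant ? 2 : 1;

    if (n < min_points)
        return CHECK_N_TOO_SMALL;
    if (wt == NULL)
        return CHECK_OK;

    size_t positive = 0;   /* number of strictly positive weights */
    double sum = 0.0;      /* effective number of observations */
    for (size_t i = 0; i < n; i++) {
        if (wt[i] < 0.0)
            return CHECK_NEG_WEIGHT;
        if (wt[i] > 0.0)
            positive++;
        sum += wt[i];
    }
    if (positive < min_points)
        return CHECK_WT_LOW;
    if (sum <= (double)min_points)   /* sum must exceed 1.0 or 2.0 */
        return CHECK_SW_LOW;
    return CHECK_OK;
}
```

For example, the weights {1, 1, 0} with a constant term pass the positive-element check but fail the sum check, because the sum of the weights is exactly 2.0 and must be greater than 2.0.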
7 Accuracy
The computations are believed to be stable.
8 Parallelism and Performance
nag_simple_linear_regression (g02cac) is not threaded in any implementation.
9 Further Comments
The time taken by the function depends on n. The function uses a two-pass algorithm.
10 Example
A program to calculate the regression constants, $\hat{a}$ and $\hat{b}$, the standard errors of the regression constants, the coefficient of determination and the degrees of freedom about the regression.
10.1 Program Text
Program Text (g02cace.c)
10.2 Program Data
Program Data (g02cace.d)
10.3 Program Results
Program Results (g02cace.r)