nag_simple_linear_regression (g02cac) performs a simple linear regression with or without a constant term. The data is optionally weighted.
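nag_simple_linear_regression (g02cac) fits a straight line model of the form
$$E(y) = a + bx,$$
where $E(y)$ is the expected value of the variable $y$, to the data points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
such that
$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n,$$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$. Weights may be used when $\operatorname{var}(e_i) = \sigma^2 / w_i$, when observations have to be removed from the regression by giving them zero weight, or when observations have been recorded with frequency $w_i$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
$$\sum_{i=1}^{n} w_i e_i^2;$$
if the weights option is not selected then $w_i = 1$.
The following statistics are computed:
- the estimate of regression constant, $a = \bar{y} - b\bar{x}$,
- the estimate of regression coefficient, $b = \dfrac{\sum_{i=1}^{n} w_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2}$,
- the residual sum of squares, $\mathit{rss} = \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$, where $\hat{y}_i = a + b x_i$,
where the weighted means $\bar{x}$ and $\bar{y}$ are
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}.$$
The number of degrees of freedom, $\mathit{df}$, associated with $\mathit{rss}$ is
- $\sum_{i=1}^{n} w_i - 2$ where mean = Nag_AriMean,
- $\sum_{i=1}^{n} w_i - 1$ where mean = Nag_AriZero.
Note: the weights should be scaled to give the correct degrees of freedom in the case $\operatorname{var}(e_i) = \sigma^2 / w_i$.
The $R^2$ value or coefficient of determination is
$$R^2 = \frac{\sum_{i=1}^{n} w_i (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2} = 1 - \frac{\mathit{rss}}{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2}.$$
The standard error for the regression constant $a$ is
$$\mathit{a\_serr} = \sqrt{\frac{\mathit{rss}}{\mathit{df}} \left( \frac{1}{\sum_{i=1}^{n} w_i} + \frac{\bar{x}^2}{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2} \right)}.$$
The standard error for the regression coefficient $b$ is
$$\mathit{b\_serr} = \sqrt{\frac{\mathit{rss}}{\mathit{df} \sum_{i=1}^{n} w_i (x_i - \bar{x})^2}}.$$
Similar formulae can be derived for the case when the line goes through the origin, that is $a = 0$.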
- 1:
mean – Nag_SumSquare Input
On entry: indicates whether nag_simple_linear_regression (g02cac) is to include a constant term in the regression.
- mean = Nag_AriMean: the regression constant is included.
- mean = Nag_AriZero: the regression constant is not included, i.e., $a = 0$.
Constraint: mean = Nag_AriMean or Nag_AriZero.
- 2:
n – Integer Input
On entry: $n$, the number of observations.
Constraints:
- if mean = Nag_AriMean, $n \geq 2$;
- if mean = Nag_AriZero, $n \geq 1$.
- 3:
x[n] – const double Input
On entry: the values of the independent variable with the $i$th value stored in x[$i-1$], for $i = 1, 2, \ldots, n$.
Constraint: the values of x must not all be identical.
- 4:
y[n] – const double Input
On entry: the values of the dependent variable with the $i$th value stored in y[$i-1$], for $i = 1, 2, \ldots, n$.
Constraint: the values of y must not all be identical.
- 5:
wt[n] – const double Input
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Usually wt[$i-1$] will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, wt[$i-1$] $\geq 0.0$, for $i = 1, 2, \ldots, n$.
- 6:
a – double * Output
On exit: if mean = Nag_AriMean then a is the regression constant $a$, otherwise a is set to zero.
- 7:
b – double * Output
On exit: the regression coefficient $b$.
- 8:
a_serr – double * Output
On exit: the standard error of the regression constant $a$.
- 9:
b_serr – double * Output
On exit: the standard error of the regression coefficient $b$.
- 10:
rsq – double * Output
On exit: the coefficient of determination, $R^2$.
- 11:
rss – double * Output
On exit: the sum of squares of the residuals about the regression.
- 12:
df – double * Output
On exit: the degrees of freedom associated with the residual sum of squares.
- 13:
fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
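The frequency interpretation of wt can be checked numerically: giving a point an integral weight should produce the same estimates and degrees of freedom as repeating that point, and a zero weight should be equivalent to dropping it. The sketch below illustrates this for the case with a constant term. The helper `weighted_line` is hypothetical and is not a NAG routine; it only evaluates the weighted least squares estimates of the regression constant and coefficient.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG library): weighted least
   squares estimates of the constant a and coefficient b, plus the
   degrees of freedom sum(w) - 2. wt == NULL means unit weights.
   Returns 0 on success, -1 if the estimates are not defined. */
static int weighted_line(size_t n, const double x[], const double y[],
                         const double wt[], double *a, double *b,
                         double *df)
{
    double sw = 0.0, swx = 0.0, swy = 0.0, sxx = 0.0, sxy = 0.0;
    double xbar, ybar;
    size_t i;

    assert(x != NULL && y != NULL && a != NULL && b != NULL && df != NULL);

    for (i = 0; i < n; ++i) {
        double w = wt ? wt[i] : 1.0;
        if (w < 0.0) return -1;      /* weights must be non-negative */
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
    }
    if (sw <= 0.0) return -1;        /* no effective observations */
    xbar = swx / sw;
    ybar = swy / sw;

    for (i = 0; i < n; ++i) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
        sxy += w * (x[i] - xbar) * (y[i] - ybar);
    }
    if (sxx <= 0.0) return -1;       /* all effective x values identical */

    *b = sxy / sxx;
    *a = ybar - *b * xbar;
    *df = sw - 2.0;
    return 0;
}
```

For example, the points `x = {1, 2, 3, 4}`, `y = {1, 3, 2, 5}` with weights `{2, 0, 1, 1}` give the same `a`, `b` and `df` (up to rounding) as the unweighted points `x = {1, 1, 3, 4}`, `y = {1, 1, 2, 5}`: the first observation counts twice and the second is ignored.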
The computations are believed to be stable.
Not applicable.