The function may be called by the names: g02cac, nag_correg_linregs_const or nag_simple_linear_regression.
3 Description
g02cac fits a straight line model of the form
$$E(y) = a + bx,$$
where $E(y)$ is the expected value of the variable $y$, to the data points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
such that
$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n,$$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$. Weights may be used when $\mathrm{var}(e_i) = \sigma^2 / w_i$, when observations have to be removed from the regression by giving them zero weight, or when a data point has been observed with frequency $w_i$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
$$\sum_{i=1}^{n} w_i e_i^2;$$
if the weights option is not selected then $w_i = 1$.
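The following statistics are computed:
the estimate of the regression constant, $\hat{a} = \bar{y} - \hat{b}\bar{x}$,
the estimate of the regression coefficient, $\hat{b} = \dfrac{\sum_{i=1}^{n} w_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2}$,
the residual sum of squares, $rss = \sum_{i=1}^{n} w_i (y_i - \hat{a} - \hat{b} x_i)^2$,
where the weighted means $\bar{x}$ and $\bar{y}$ are
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \quad \text{and} \quad \bar{y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}.$$
The number of degrees of freedom associated with $rss$ is
$$df = \sum_{i=1}^{n} w_i - 2 \quad \text{where mean} = \text{Nag\_AboutMean},$$
$$df = \sum_{i=1}^{n} w_i - 1 \quad \text{where mean} = \text{Nag\_AboutZero}.$$
Note: the weights should be scaled to give the correct degrees of freedom in the case $\mathrm{var}(e_i) = \sigma^2 / w_i$.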
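The $R^2$ value, or coefficient of determination, is
$$R^2 = \frac{\sum_{i=1}^{n} w_i (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2} = 1 - \frac{rss}{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2},$$
where $\hat{y}_i = \hat{a} + \hat{b} x_i$ is the fitted value for the $i$th data point.
This measures the proportion of the total variation about the mean $\bar{y}$ that can be explained by the regression.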
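The standard error for the regression constant $\hat{a}$ is
$$\mathrm{se}(\hat{a}) = \sqrt{\frac{rss}{df}\left(\frac{1}{\sum_{i=1}^{n} w_i} + \frac{\bar{x}^2}{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2}\right)}.$$
The standard error for the regression coefficient $\hat{b}$ is
$$\mathrm{se}(\hat{b}) = \sqrt{\frac{rss}{df \sum_{i=1}^{n} w_i (x_i - \bar{x})^2}}.$$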
Similar formulae can be derived for the case when the line goes through the origin, that is, $a = 0$.
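As a sketch of the through-origin case, minimizing $\sum w_i (y_i - b x_i)^2$ with respect to $b$ alone leads to the expressions below. These are the standard least squares results for a line with no constant term, stated here as an assumption rather than quoted from the g02cac documentation; one degree of freedom is used because only $b$ is estimated.

$$\hat{b} = \frac{\sum_{i=1}^{n} w_i x_i y_i}{\sum_{i=1}^{n} w_i x_i^2}, \qquad rss = \sum_{i=1}^{n} w_i (y_i - \hat{b} x_i)^2, \qquad df = \sum_{i=1}^{n} w_i - 1,$$
$$\mathrm{se}(\hat{b}) = \sqrt{\frac{rss}{df \sum_{i=1}^{n} w_i x_i^2}}.$$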
4 References
Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
5 Arguments
1: mean – Nag_SumSquare Input
On entry: indicates whether g02cac is to include a constant term in the regression.
mean = Nag_AboutMean
The regression constant $a$ is included.
mean = Nag_AboutZero
The regression constant $a$ is not included, i.e., $a = 0$.
Constraint: mean = Nag_AboutMean or Nag_AboutZero.
2: n – Integer Input
On entry: $n$, the number of observations.
Constraints:
if mean = Nag_AboutMean, $n \geq 2$;
if mean = Nag_AboutZero, $n \geq 1$.
3: x[n] – const double Input
On entry: the values of the independent variable, with the $i$th value stored in x[$i-1$], for $i = 1, 2, \ldots, n$.
Constraint: the values of x must not all be identical.
4: y[n] – const double Input
On entry: the values of the dependent variable, with the $i$th value stored in y[$i-1$], for $i = 1, 2, \ldots, n$.
Constraint: the values of y must not all be identical.
5: wt[n] – const double Input
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Usually wt[$i-1$] will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights, therefore, represents the effective total number of observations used to create the regression line.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, wt[$i-1$] $\geq 0.0$, for $i = 1, 2, \ldots, n$.
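The interpretation of integral weights as observation counts can be checked with a small sketch. The helper names `effective_n` and `wls_slope` are hypothetical and are not part of the NAG interface; the code assumes the usual weighted least squares slope and simply shows that a weight of 2 behaves like a repeated observation and a weight of 0 like a deleted one.

```c
#include <assert.h>
#include <stddef.h>

/* Effective number of observations: the sum of the weights,
   or n when no weights are supplied (wt == NULL). */
double effective_n(size_t n, const double wt[])
{
    double sw = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sw += wt ? wt[i] : 1.0;
    return sw;
}

/* Weighted least squares slope for a line with a constant term. */
double wls_slope(size_t n, const double x[], const double y[],
                 const double wt[])
{
    double sw = 0.0, swx = 0.0, swy = 0.0, sxx = 0.0, sxy = 0.0;
    double xbar, ybar;
    size_t i;

    /* First pass: weighted means. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        assert(w >= 0.0);
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
    }
    assert(sw > 0.0);
    xbar = swx / sw;
    ybar = swy / sw;

    /* Second pass: sums of squares and products about the means. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
        sxy += w * (x[i] - xbar) * (y[i] - ybar);
    }
    assert(sxx > 0.0);
    return sxy / sxx;
}
```

With `x = {1, 2, 3}`, `y = {1, 3, 2}` and `wt = {2, 1, 0}`, the slope should agree (up to rounding) with an unweighted fit to the expanded data `x = {1, 1, 2}`, `y = {1, 1, 3}`, and `effective_n` returns 3 in both cases.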
6: a – double * Output
On exit: if mean = Nag_AboutMean then a is the regression constant $\hat{a}$, otherwise a is set to zero.
7: b – double * Output
On exit: the regression coefficient $\hat{b}$.
8: a_serr – double * Output
On exit: the standard error of the regression constant $\hat{a}$.
9: b_serr – double * Output
On exit: the standard error of the regression coefficient $\hat{b}$.
10: rsq – double * Output
On exit: the coefficient of determination, $R^2$.
11: rss – double * Output
On exit: the sum of squares of the residuals about the regression.
12: df – double * Output
On exit: the degrees of freedom associated with the residual sum of squares.
13: fail – NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
On entry, the degrees of freedom for the residual are zero, i.e., the designated number of arguments $=$ the effective number of observations.
NW_RSS_EQ_ZERO
Residual sum of squares is zero, i.e., a perfect fit was obtained.
7 Accuracy
The computations are believed to be stable.
8 Parallelism and Performance
Background information on multithreading can be found in the Multithreading documentation.
g02cac is not threaded in any implementation.
9 Further Comments
The time taken by the function depends on $n$. The function uses a two-pass algorithm.
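The text does not say what the two passes are. A minimal sketch, assuming the usual meaning (a first pass over the data for the weighted mean, then a second pass for the sum of squares about that mean), is given below; `weighted_sxx` is a hypothetical helper, not a NAG function. Each pass visits every observation once, so the work grows linearly with $n$.

```c
#include <assert.h>
#include <stddef.h>

/* Two-pass weighted sum of squares of x about its weighted mean.
   wt may be NULL, in which case every weight is taken as 1. */
double weighted_sxx(size_t n, const double x[], const double wt[])
{
    double sw = 0.0, swx = 0.0, sxx = 0.0, xbar;
    size_t i;

    assert(n >= 1);

    /* Pass 1: weighted mean. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        swx += w * x[i];
    }
    assert(sw > 0.0);
    xbar = swx / sw;

    /* Pass 2: squared deviations from the mean. Subtracting the
       mean before squaring avoids the cancellation that a one-pass
       formula (sum of w*x*x minus sw*xbar*xbar) can suffer when the
       data have a large common offset. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        double d = x[i] - xbar;
        sxx += w * d * d;
    }
    return sxx;
}
```

For data with a large offset, such as `x = {1000000001, 1000000002, 1000000003}`, the deviations from the mean are small numbers that are represented exactly, which is the usual reason for preferring two passes.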
10 Example
A program to calculate the regression constants, $\hat{a}$ and $\hat{b}$, the standard errors of the regression constants, the coefficient of determination and the degrees of freedom about the regression.