g02cbc performs a simple linear regression with or without a constant term. The data is optionally weighted, and confidence intervals are calculated for the predicted and average values of y at a given x.
The function may be called by the names: g02cbc, nag_correg_linregs_noconst or nag_regress_confid_interval.
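3 Description
g02cbc fits a straight line model of the form
$$E(y) = a + bx,$$
where $E(y)$ is the expected value of the variable $y$, to the data points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
such that
$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n,$$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$. The values of $a$ and $b$ are estimated by minimizing $\sum_{i=1}^{n} w_i e_i^2$ (if the weights option is not selected then $w_i = 1$). The fitted values $\hat{y}_i$ are calculated using
$$\hat{y}_i = \hat{a} + \hat{b} x_i,$$
where
$$\hat{a} = \bar{y} - \hat{b}\bar{x}, \qquad \hat{b} = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2},$$
and the weighted means $\bar{x}$ and $\bar{y}$ are given by
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \quad \text{and} \quad \bar{y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}.$$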
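The estimates above can be sketched in plain C. This is an illustrative sketch of the weighted least squares formulas, not the library's implementation; the type `LineFit` and the function `fit_line` are hypothetical names introduced here, and a `NULL` weight array is assumed to mean unit weights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of the NAG Library. */
typedef struct {
    double a;     /* intercept estimate */
    double b;     /* slope estimate */
    double xbar;  /* weighted mean of x */
    double ybar;  /* weighted mean of y */
    double sw;    /* sum of the weights */
    double sxx;   /* sum of w_i * (x_i - xbar)^2 */
} LineFit;

/* Weighted least squares estimates for E(y) = a + b*x.
   Pass wt == NULL for unit weights (an assumption of this sketch). */
static LineFit fit_line(size_t n, const double x[], const double y[],
                        const double wt[])
{
    LineFit f = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
    double sxy = 0.0;
    size_t i;

    assert(n >= 2);

    /* First pass: weighted means. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        f.sw += w;
        f.xbar += w * x[i];
        f.ybar += w * y[i];
    }
    assert(f.sw > 0.0);
    f.xbar /= f.sw;
    f.ybar /= f.sw;

    /* Second pass: centred sums of squares and products. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        double dx = x[i] - f.xbar;
        f.sxx += w * dx * dx;
        sxy += w * dx * (y[i] - f.ybar);
    }
    assert(f.sxx > 0.0);  /* the x values must not all be identical */

    f.b = sxy / f.sxx;
    f.a = f.ybar - f.b * f.xbar;
    return f;
}
```

Two passes are used so that the sums are centred on the weighted means, which tends to lose less precision than expanding the squares.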
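The residuals of the regression are calculated using
$$\mathrm{res}_i = y_i - \hat{y}_i$$
and the residual mean square about the regression, $\mathrm{rms}$, is determined using
$$\mathrm{rms} = \frac{\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2}{df},$$
where $df$ (the number of degrees of freedom) has the following values:
$df = \sum_{i=1}^{n} w_i - 2$ where mean = Nag_AboutMean,
$df = \sum_{i=1}^{n} w_i - 1$ where mean = Nag_AboutZero.
Note: the weights should be scaled to give the required degrees of freedom.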
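The note on scaling matters because the degrees of freedom come from the sum of the weights, not from the number of points. The sketch below illustrates this; `residual_mean_square` and its `with_constant` flag are hypothetical names, not NAG identifiers, and `NULL` weights are assumed to mean unit weights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: given estimates a and b, fill
   res[] with y_i - (a + b*x_i) and return the residual mean square.
   Set with_constant to 0 for a line through the origin. */
static double residual_mean_square(size_t n, const double x[],
                                   const double y[], const double wt[],
                                   double a, double b, int with_constant,
                                   double res[])
{
    double sw = 0.0, rss = 0.0, df;
    size_t i;

    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        res[i] = y[i] - (a + b * x[i]);
        sw += w;
        rss += w * res[i] * res[i];
    }

    /* Degrees of freedom use the sum of the weights, not n. */
    df = sw - (with_constant ? 2.0 : 1.0);
    assert(df > 0.0);
    return rss / df;
}
```

For instance, doubling every weight leaves the fitted line unchanged but raises the degrees of freedom from $n - 2$ to $2n - 2$, so the residual mean square changes.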
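The function calculates predicted $y$ estimates for a value of $x$; the prediction at $x_i$ is given by
$$\hat{y}_i = \hat{a} + \hat{b} x_i.$$
This prediction has a standard error
$$\mathrm{serr}_{\mathrm{pred}} = \sqrt{\mathrm{rms}} \, \sqrt{1 + \frac{1}{\sum_{j=1}^{n} w_j} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} w_j (x_j - \bar{x})^2}}.$$
The $(1 - \alpha)$ confidence interval for this estimate of $y$ is given by
$$\hat{y}_i \pm t_{df}(1 - \alpha/2) \cdot \mathrm{serr}_{\mathrm{pred}},$$
where $t_{df}(1 - \alpha/2)$ refers to the $(1 - \alpha/2)$ point of the $t$ distribution with $df$ degrees of freedom (e.g., when $df = 20$ and $\alpha = 0.05$, $t_{20}(0.975) = 2.086$). If you specify the probability clp then the lower limit of this interval is
$$\mathrm{yl}[i-1] = \hat{y}_i - t_{df}((1 + \mathrm{clp})/2) \cdot \mathrm{serr}_{\mathrm{pred}}$$
and the upper limit is
$$\mathrm{yu}[i-1] = \hat{y}_i + t_{df}((1 + \mathrm{clp})/2) \cdot \mathrm{serr}_{\mathrm{pred}}.$$
The mean value of $y$ at $x_i$ is estimated by the fitted value $\hat{y}_i$. This has a standard error of
$$\mathrm{serr}_{\mathrm{mean}} = \sqrt{\mathrm{rms}} \, \sqrt{\frac{1}{\sum_{j=1}^{n} w_j} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} w_j (x_j - \bar{x})^2}}$$
and a $(1 - \alpha)$ confidence interval is given by
$$\hat{y}_i \pm t_{df}(1 - \alpha/2) \cdot \mathrm{serr}_{\mathrm{mean}}.$$
For example, if you specify the probability clm then the lower limit of this interval is
$$\mathrm{yml}[i-1] = \hat{y}_i - t_{df}((1 + \mathrm{clm})/2) \cdot \mathrm{serr}_{\mathrm{mean}}$$
and the upper limit is
$$\mathrm{ymu}[i-1] = \hat{y}_i + t_{df}((1 + \mathrm{clm})/2) \cdot \mathrm{serr}_{\mathrm{mean}}.$$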
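The leverage, $h_i$, is a measure of the influence a value $x_i$ has on the fitted line at that point, $\hat{y}_i$. The leverage is given by
$$h_i = \frac{w_i}{\sum_{j=1}^{n} w_j} + \frac{w_i (x_i - \bar{x})^2}{\sum_{j=1}^{n} w_j (x_j - \bar{x})^2},$$
so it can be seen that, for $w_i > 0$, the standard error of the fitted mean at $x_i$ is $\sqrt{\mathrm{rms}} \, \sqrt{h_i / w_i}$ and the standard error of the prediction is $\sqrt{\mathrm{rms}} \, \sqrt{1 + h_i / w_i}$.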
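As an illustration, the sketch below computes leverages under the assumption that the weighted definition $h_i = w_i \left( 1/\sum w_j + (x_i - \bar{x})^2 / \sum w_j (x_j - \bar{x})^2 \right)$ applies. The function name `leverages` is hypothetical, and `NULL` weights are assumed to mean unit weights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: fill h[] with the leverage of
   each point for a weighted straight-line fit with a constant term. */
static void leverages(size_t n, const double x[], const double wt[],
                      double h[])
{
    double sw = 0.0, xbar = 0.0, sxx = 0.0;
    size_t i;

    /* Weighted mean of x. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        xbar += w * x[i];
    }
    assert(sw > 0.0);
    xbar /= sw;

    /* Weighted sum of squares about the mean. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
    }
    assert(sxx > 0.0);

    /* Points far from xbar, or with large weight, get high leverage. */
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        double dx = x[i] - xbar;
        h[i] = w * (1.0 / sw + dx * dx / sxx);
    }
}
```

Under this definition the leverages sum to 2, the number of fitted parameters, which gives a quick sanity check on any computed set of values.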
Similar formulae can be derived for the case when the line goes through the origin, that is $a = 0$.
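For the through-origin case, minimizing $\sum w_i (y_i - b x_i)^2$ gives the slope $\hat{b} = \sum w_i x_i y_i / \sum w_i x_i^2$, with no means subtracted. A minimal sketch follows; `fit_through_origin` is a hypothetical name, and `NULL` weights are assumed to mean unit weights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: weighted least squares slope
   for the model E(y) = b*x (no constant term). On return *sxx0 holds
   the uncentred sum of w_i * x_i^2. */
static double fit_through_origin(size_t n, const double x[],
                                 const double y[], const double wt[],
                                 double *sxx0)
{
    double sxx = 0.0, sxy = 0.0;
    size_t i;

    assert(n >= 1);
    for (i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * x[i] * x[i];  /* uncentred: no mean is subtracted */
        sxy += w * x[i] * y[i];
    }
    assert(sxx > 0.0);  /* at least one weighted x must be non-zero */

    *sxx0 = sxx;
    return sxy / sxx;
}
```

With only one parameter estimated, the residual degrees of freedom are $\sum w_i - 1$, and the corresponding leverage of point $i$ would be $w_i x_i^2 / \sum w_j x_j^2$.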
4 References
Snedecor G W and Cochran W G (1967) Statistical Methods Iowa State University Press
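5 Arguments
1: mean – Nag_SumSquare Input
On entry: indicates whether g02cbc is to include a constant term in the regression.
mean = Nag_AboutMean
The constant term, $a$, is included.
mean = Nag_AboutZero
The constant term, $a$, is not included, i.e., $a = 0$.
Constraint: mean = Nag_AboutMean or Nag_AboutZero.
2: n – Integer Input
On entry: $n$, the number of observations.
Constraints:
if mean = Nag_AboutMean, n $\ge 2$;
if mean = Nag_AboutZero, n $\ge 1$.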
3: x[n] – const double Input
On entry: observations on the independent variable, $x$.
Constraint: the values of x must not all be identical.
4: y[n] – const double Input
On entry: observations on the dependent variable, $y$.
5: wt[n] – const double Input
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Usually wt[$i-1$] will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights, therefore, represents the effective total number of observations used to create the regression line.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, wt[$i-1$] $\ge 0.0$, for $i = 1, 2, \ldots, n$.
6: clm – double Input
On entry: the confidence level for the confidence intervals for the mean.
Constraint: $0.0 <$ clm $< 1.0$.
7: clp – double Input
On entry: the confidence level for the prediction intervals.
Constraint: $0.0 <$ clp $< 1.0$.
8: yhat[n] – double Output
On exit: the fitted values, $\hat{y}_i$.
9: yml[n] – double Output
On exit: yml[$i-1$] contains the lower limit of the confidence interval for the regression line at $x_i$.
10: ymu[n] – double Output
On exit: ymu[$i-1$] contains the upper limit of the confidence interval for the regression line at $x_i$.
11: yl[n] – double Output
On exit: yl[$i-1$] contains the lower limit of the confidence interval for the individual y value at $x_i$.
12: yu[n] – double Output
On exit: yu[$i-1$] contains the upper limit of the confidence interval for the individual y value at $x_i$.
13: h[n] – double Output
On exit: the leverage of each observation on the regression.
14: res[n] – double Output
On exit: the residuals of the regression.
15: rms – double * Output
On exit: the residual mean square about the regression.
16: fail – NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
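The constraints listed for the input arguments can be checked before calling the function. The sketch below is illustrative only: `ArgStatus`, its values and `check_args` are hypothetical and are not NAG error codes, and the `with_constant` flag stands in for the choice between Nag_AboutMean and Nag_AboutZero. Applying the identical-values check only when there is more than one point is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; not NAG error codes. */
typedef enum {
    ARGS_OK = 0,
    ARGS_TOO_FEW_POINTS,
    ARGS_NEGATIVE_WEIGHT,
    ARGS_X_IDENTICAL,
    ARGS_BAD_LEVEL
} ArgStatus;

/* Check the documented input constraints. with_constant != 0 means a
   constant term is fitted. Pass wt == NULL when no weights are used. */
static ArgStatus check_args(int with_constant, size_t n, const double x[],
                            const double wt[], double clm, double clp)
{
    size_t i;
    int x_varies = 0;

    if (n < (with_constant ? 2u : 1u))
        return ARGS_TOO_FEW_POINTS;

    /* Both confidence levels must lie strictly between 0 and 1. */
    if (!(clm > 0.0 && clm < 1.0) || !(clp > 0.0 && clp < 1.0))
        return ARGS_BAD_LEVEL;

    for (i = 0; i < n; i++) {
        if (wt && wt[i] < 0.0)
            return ARGS_NEGATIVE_WEIGHT;
        if (x[i] != x[0])
            x_varies = 1;
    }

    /* Assumption: a single point cannot violate this constraint. */
    if (n > 1 && !x_varies)
        return ARGS_X_IDENTICAL;

    return ARGS_OK;
}
```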
6 Error Indicators and Warnings
Residual mean sum of squares is zero, i.e., a perfect fit was obtained.
7 Accuracy
The computations are believed to be stable.
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g02cbc is not threaded in any implementation.
9 Further Comments
None.
10 Example
This example calculates the fitted values of $y$ and the upper and lower limits of the confidence intervals for the regression line and for the individual $y$ values.