NAG FL Interface
g02chf (linregm_coeffs_noconst)
1
Purpose
g02chf performs a multiple linear regression with no constant on a set of variables whose sums of squares and cross-products about zero and correlation-like coefficients are given.
2
Specification
Fortran Interface
Subroutine g02chf (n, k1, k, sspz, ldsspz, rz, ldrz, result, coef, ldcoef, rznv, ldrznv, cz, ldcz, wkz, ldwkz, ifail)
Integer, Intent (In) :: n, k1, k, ldsspz, ldrz, ldcoef, ldrznv, ldcz, ldwkz
Integer, Intent (Inout) :: ifail
Real (Kind=nag_wp), Intent (In) :: sspz(ldsspz,k+1)
Real (Kind=nag_wp), Intent (Inout) :: rz(ldrz,k+1), coef(ldcoef,3), rznv(ldrznv,k), cz(ldcz,k), wkz(ldwkz,k)
Real (Kind=nag_wp), Intent (Out) :: result(13)
C Header Interface
#include <nag.h>
void g02chf_ (const Integer *n, const Integer *k1, const Integer *k, const double sspz[], const Integer *ldsspz, double rz[], const Integer *ldrz, double result[], double coef[], const Integer *ldcoef, double rznv[], const Integer *ldrznv, double cz[], const Integer *ldcz, double wkz[], const Integer *ldwkz, Integer *ifail)
C++ Header Interface
#include <nag.h>
extern "C" {
void g02chf_ (const Integer &n, const Integer &k1, const Integer &k, const double sspz[], const Integer &ldsspz, double rz[], const Integer &ldrz, double result[], double coef[], const Integer &ldcoef, double rznv[], const Integer &ldrznv, double cz[], const Integer &ldcz, double wkz[], const Integer &ldwkz, Integer &ifail)
}
The routine may be called by the names g02chf or nagf_correg_linregm_coeffs_noconst.
3
Description
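g02chf fits a curve of the form
$$y = b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
to the data points
$$(x_{11}, x_{21}, \ldots, x_{k1}, y_1),\ (x_{12}, x_{22}, \ldots, x_{k2}, y_2),\ \ldots,\ (x_{1n}, x_{2n}, \ldots, x_{kn}, y_n)$$
such that
$$y_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i, \quad i = 1, 2, \ldots, n.$$
The routine calculates the regression coefficients, $b_1, b_2, \ldots, b_k$, (and various other statistical quantities) by minimizing
$$\sum_{i=1}^{n} e_i^2.$$
The actual data values $(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ are not provided as input to the routine. Instead, input to the routine consists of: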
(i) The number of cases, $n$, on which the regression is based.
(ii) The total number of variables, dependent and independent, in the regression, $(k+1)$.
(iii) The number of independent variables in the regression, $k$.
(iv) The $(k+1)$ by $(k+1)$ matrix $[\tilde{S}_{ij}]$ of sums of squares and cross-products about zero of all the variables in the regression; the terms involving the dependent variable, $y$, appear in the $(k+1)$th row and column.
(v) The $(k+1)$ by $(k+1)$ matrix $[\tilde{R}_{ij}]$ of correlation-like coefficients for all the variables in the regression; the correlations involving the dependent variable, $y$, appear in the $(k+1)$th row and column.
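As an illustration of inputs (iv) and (v), the following minimal sketch builds both matrices from raw data in plain Python. It assumes that a correlation-like coefficient is the cross-product about zero divided by the square root of the product of the two corresponding sums of squares about zero. The function names and the data are hypothetical and are not part of the NAG Library.

```python
import math

def sspz_matrix(columns):
    """Sums of squares and cross-products about zero.

    columns[j] holds the n observations of variable j; the dependent
    variable is expected to be the last entry.
    """
    m = len(columns)
    return [[sum(a * b for a, b in zip(columns[i], columns[j]))
             for j in range(m)] for i in range(m)]

def rz_matrix(sspz):
    """Correlation-like coefficients, S_ij / sqrt(S_ii * S_jj) (assumed definition)."""
    m = len(sspz)
    return [[sspz[i][j] / math.sqrt(sspz[i][i] * sspz[j][j])
             for j in range(m)] for i in range(m)]

# Hypothetical data: two independent variables and y = 2*x1 + 3*x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

sspz = sspz_matrix([x1, x2, y])   # dependent variable in the last row and column
rz = rz_matrix(sspz)
```

Both matrices are symmetric, and the diagonal of the correlation-like matrix is 1 by construction.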
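The quantities calculated are:
(a) The inverse of the $k$ by $k$ partition of the matrix of correlation-like coefficients, $[\tilde{R}_{ij}]$, involving only the independent variables. The inverse is obtained using an accurate method which assumes that this sub-matrix is positive definite (see Section 9).
(b) The modified matrix, $C = [c_{ij}]$, where
$$c_{ij} = \frac{\tilde{R}_{ij}\,\tilde{r}^{ij}}{\tilde{S}_{ij}}, \quad i, j = 1, 2, \ldots, k,$$
where $\tilde{r}^{ij}$ is the $(i,j)$th element of the inverse matrix of $[\tilde{R}_{ij}]$ as described in (a) above. Each element of $C$ is thus the corresponding element of the matrix of correlation-like coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products about zero.
(c) The regression coefficients:
$$b_i = \sum_{j=1}^{k} c_{ij}\,\tilde{S}_{j(k+1)}, \quad i = 1, 2, \ldots, k,$$
where $\tilde{S}_{j(k+1)}$ is the sum of cross-products about zero for the independent variable $x_j$ and the dependent variable $y$.
(d) The sum of squares attributable to the regression, $SSR$, the sum of squares of deviations about the regression, $SSD$, and the total sum of squares, $SST$:
- $SST = \tilde{S}_{(k+1)(k+1)}$, the sum of squares about zero for the dependent variable, $y$;
- $SSR = \sum_{j=1}^{k} b_j\,\tilde{S}_{j(k+1)}$;
- $SSD = SST - SSR$.
(e) The degrees of freedom attributable to the regression, $DFR$, the degrees of freedom of deviations about the regression, $DFD$, and the total degrees of freedom, $DFT$:
$$DFR = k; \quad DFD = n - k; \quad DFT = n.$$
(f) The mean square attributable to the regression, $MSR$, and the mean square of deviations about the regression, $MSD$:
$$MSR = SSR / DFR; \quad MSD = SSD / DFD.$$
(g) The $F$ value for the analysis of variance:
$$F = MSR / MSD.$$
(h) The standard error estimate:
$$s = \sqrt{MSD}.$$
(i) The coefficient of multiple correlation, $R$, the coefficient of multiple determination, $R^2$, and the coefficient of multiple determination corrected for the degrees of freedom, $\bar{R}^2$:
$$R = \sqrt{1 - \frac{SSD}{SST}}; \quad R^2 = 1 - \frac{SSD}{SST}; \quad \bar{R}^2 = 1 - \frac{SSD \times DFT}{SST \times DFD}.$$
(j) The standard error of the regression coefficients:
$$se(b_i) = \sqrt{MSD \times c_{ii}}, \quad i = 1, 2, \ldots, k.$$
(k) The $t$ values for the regression coefficients:
$$t(b_i) = \frac{b_i}{se(b_i)}, \quad i = 1, 2, \ldots, k.$$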
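The chain of quantities (a) to (k) can be followed end to end in a short sketch. It is an illustration under stated assumptions rather than the routine's implementation: the inverse is computed by plain Gauss-Jordan elimination as a stand-in for the accurate method the routine uses, the usual no-constant least squares definitions are assumed (for example, deviation degrees of freedom equal to the number of cases minus the number of independent variables), and all names and data are hypothetical.

```python
import math

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (stand-in, not NAG's method)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def regress_noconst(n, k, sspz, rz):
    """Quantities (a)-(k) from the (k+1) x (k+1) matrices sspz and rz."""
    rznv = invert([row[:k] for row in rz[:k]])                       # (a)
    cz = [[rz[i][j] * rznv[i][j] / sspz[i][j] for j in range(k)]     # (b)
          for i in range(k)]
    b = [sum(cz[i][j] * sspz[j][k] for j in range(k))                # (c)
         for i in range(k)]
    ssr = sum(b[j] * sspz[j][k] for j in range(k))                   # (d)
    sst = sspz[k][k]
    ssd = sst - ssr
    dfr, dfd, dft = k, n - k, n                                      # (e)
    msr, msd = ssr / dfr, ssd / dfd                                  # (f)
    f_value = msr / msd                                              # (g)
    s = math.sqrt(msd)                                               # (h)
    r2 = 1.0 - ssd / sst                                             # (i)
    r2_adj = 1.0 - (ssd * dft) / (sst * dfd)
    se = [math.sqrt(msd * cz[i][i]) for i in range(k)]               # (j)
    t = [b[i] / se[i] for i in range(k)]                             # (k)
    return {"b": b, "se": se, "t": t, "SSR": ssr, "SSD": ssd, "SST": sst,
            "DFR": dfr, "DFD": dfd, "DFT": dft, "MSR": msr, "MSD": msd,
            "F": f_value, "s": s, "R": math.sqrt(r2), "R2": r2,
            "R2adj": r2_adj, "cz": cz, "rznv": rznv}

# Hypothetical input: 4 cases, 2 independent variables, dependent variable last.
sspz = [[30.0, 4.0, 76.0],
        [4.0, 2.0, 14.0],
        [76.0, 14.0, 203.0]]
rz = [[sspz[i][j] / math.sqrt(sspz[i][i] * sspz[j][j]) for j in range(3)]
      for i in range(3)]

out = regress_noconst(4, 2, sspz, rz)
print(out["b"], out["F"])
```

The element-wise formula in (b) divides by the corresponding cross-product, so this sketch assumes that none of those entries is zero.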
4
References
Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
5
Arguments
1: n – Integer
Input
On entry: $n$, the number of cases used in calculating the sums of squares and cross-products and correlation-like coefficients.
2: k1 – Integer
Input
On entry: k1 is no longer required by g02chf but is retained for backwards compatibility.
3: k – Integer
Input
On entry: $k$, the number of independent variables in the regression.
Constraint: k $\geq 1$.
4: sspz(ldsspz, k+1) – Real (Kind=nag_wp) array
Input
On entry: sspz(i,j) must be set to $\tilde{S}_{ij}$, the sum of cross-products about zero for the $i$th and $j$th variables, for $i = 1, 2, \ldots, k+1$ and $j = 1, 2, \ldots, k+1$; terms involving the dependent variable appear in row $k+1$ and column $k+1$.
5: ldsspz – Integer
Input
On entry: the first dimension of the array sspz as declared in the (sub)program from which g02chf is called.
Constraint: ldsspz $\geq$ k $+ 1$.
6: rz(ldrz, k+1) – Real (Kind=nag_wp) array
Input
On entry: rz(i,j) must be set to $\tilde{R}_{ij}$, the correlation-like coefficient for the $i$th and $j$th variables, for $i = 1, 2, \ldots, k+1$ and $j = 1, 2, \ldots, k+1$; coefficients involving the dependent variable appear in row $k+1$ and column $k+1$.
7: ldrz – Integer
Input
On entry: the first dimension of the array rz as declared in the (sub)program from which g02chf is called.
Constraint: ldrz $\geq$ k $+ 1$.
8: result(13) – Real (Kind=nag_wp) array
Output
On exit: the following information:
| result(1) | $SSR$, the sum of squares attributable to the regression; |
| result(2) | $DFR$, the degrees of freedom attributable to the regression; |
| result(3) | $MSR$, the mean square attributable to the regression; |
| result(4) | $F$, the $F$ value for the analysis of variance; |
| result(5) | $SSD$, the sum of squares of deviations about the regression; |
| result(6) | $DFD$, the degrees of freedom of deviations about the regression; |
| result(7) | $MSD$, the mean square of deviations about the regression; |
| result(8) | $SST$, the total sum of squares; |
| result(9) | $DFT$, the total degrees of freedom; |
| result(10) | $s$, the standard error estimate; |
| result(11) | $R$, the coefficient of multiple correlation; |
| result(12) | $R^2$, the coefficient of multiple determination; |
| result(13) | $\bar{R}^2$, the coefficient of multiple determination corrected for the degrees of freedom. |
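When the 13 output values are passed on to other code, giving them names avoids index mistakes. The sketch below assumes the values have been copied, in the order listed above, into a Python sequence; the field names and the helper are hypothetical, and the numbers in the example are placeholders that only show which position maps to which name.

```python
# Field names in the order the 13 values are listed above (names are hypothetical).
RESULT_FIELDS = (
    "ssr", "dfr", "msr", "f",      # regression: sum of squares, d.f., mean square, F
    "ssd", "dfd", "msd",           # deviations: sum of squares, d.f., mean square
    "sst", "dft",                  # total: sum of squares, d.f.
    "s", "r", "r2", "r2_adj",      # standard error estimate and R statistics
)

def name_result(result):
    """Map a 13-element result sequence to a dict keyed by field name."""
    if len(result) != len(RESULT_FIELDS):
        raise ValueError("expected 13 values, got %d" % len(result))
    return dict(zip(RESULT_FIELDS, result))

# Placeholder values 1.0 .. 13.0, so each value equals its 1-based position.
named = name_result([float(i) for i in range(1, 14)])
print(named["f"], named["r2_adj"])
```

Note that the Fortran array is indexed from 1 while the Python sequence is indexed from 0, so the first listed value is element 0 of the sequence.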
9: coef(ldcoef, 3) – Real (Kind=nag_wp) array
Output
On exit: for $i = 1, 2, \ldots, k$, the following information:
- coef(i,1): $b_i$, the regression coefficient for the $i$th variable.
- coef(i,2): $se(b_i)$, the standard error of the regression coefficient for the $i$th variable.
- coef(i,3): $t(b_i)$, the $t$ value of the regression coefficient for the $i$th variable.
10: ldcoef – Integer
Input
On entry: the first dimension of the array coef as declared in the (sub)program from which g02chf is called.
Constraint: ldcoef $\geq$ k.
11: rznv(ldrznv, k) – Real (Kind=nag_wp) array
Output
On exit: the inverse of the matrix of correlation-like coefficients for the independent variables; that is, the inverse of the matrix consisting of the first $k$ rows and columns of rz.
12: ldrznv – Integer
Input
On entry: the first dimension of the array rznv as declared in the (sub)program from which g02chf is called.
Constraint: ldrznv $\geq$ k.
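13: cz(ldcz, k) – Real (Kind=nag_wp) array
Output
On exit: the modified inverse matrix, $C$, where
cz(i,j) = rz(i,j) $\times$ rznv(i,j) / sspz(i,j), for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, k$.
14: ldcz – Integer
Input
On entry: the first dimension of the array cz as declared in the (sub)program from which g02chf is called.
Constraint: ldcz $\geq$ k.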
15: wkz(ldwkz, k) – Real (Kind=nag_wp) array
Workspace
16: ldwkz – Integer
Input
On entry: the first dimension of the array wkz as declared in the (sub)program from which g02chf is called.
Constraint: ldwkz $\geq$ k.
17: ifail – Integer
Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended.
When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.
On exit: ifail $= 0$ unless the routine detects an error or a warning has been flagged (see Section 6).
6
Error Indicators and Warnings
If on entry ifail $= 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
-
On entry, k = ⟨value⟩.
Constraint: k $\geq 1$.
-
On entry, and .
Constraint: .
-
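On entry, ldcoef = ⟨value⟩ and k = ⟨value⟩.
Constraint: ldcoef $\geq$ k.
On entry, ldcz = ⟨value⟩ and k = ⟨value⟩.
Constraint: ldcz $\geq$ k.
On entry, ldrz = ⟨value⟩ and k = ⟨value⟩.
Constraint: ldrz $\geq$ k $+ 1$.
On entry, ldrznv = ⟨value⟩ and k = ⟨value⟩.
Constraint: ldrznv $\geq$ k.
On entry, ldsspz = ⟨value⟩ and k = ⟨value⟩.
Constraint: ldsspz $\geq$ k $+ 1$.
On entry, ldwkz = ⟨value⟩ and k = ⟨value⟩.
Constraint: ldwkz $\geq$ k.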
-
The $k$ by $k$ partition of rz which requires inversion is not positive definite.
-
The refinement following the actual inversion has failed. This indicates that the $k$ by $k$ partition of the matrix held in rz, which is to be inverted, is ill-conditioned. The use of g02daf, which employs a different numerical technique, may avoid the difficulty.
ifail $= -99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail $= -399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail $= -999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
The accuracy of g02chf is almost entirely dependent on the accuracy of the matrix inversion method used. As g02chf works with the matrix of correlation coefficients rather than that of the sums of squares and cross-products of deviations from means, all terms in the matrix being inverted are of a similar order, and therefore the scope for computational error is reduced. An alternative, and potentially more numerically reliable, routine is g02daf. g02daf works directly with the data matrix and therefore avoids explicitly performing a matrix inversion. However, g02daf does not handle missing values, nor does it provide the same output as this routine.
If, in calculating $F$ or any of the $t(b_i)$ (see Section 3), the numbers involved are such that the result would be outside the range of numbers which can be stored by the machine, then the answer is set to the largest quantity which can be stored as a real variable, by means of a call to x02alf.
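The saturation behaviour described above can be pictured with a small sketch. Here `sys.float_info.max` stands in for the value returned by x02alf, and the helper name is hypothetical. How the routine detects the condition internally, and how it treats the sign of the result, are not stated in this document, so the sketch simply returns the largest positive value.

```python
import sys

def saturating_ratio(numerator, denominator, largest=sys.float_info.max):
    """Return numerator/denominator, or `largest` if it cannot be represented.

    Assumption: the sign of an overflowing result is ignored and the largest
    positive value is returned.
    """
    if denominator == 0.0:
        return largest
    quotient = numerator / denominator   # float overflow yields inf here
    if abs(quotient) > largest:
        return largest
    return quotient

print(saturating_ratio(6.0, 3.0))       # an ordinary ratio is returned unchanged
print(saturating_ratio(1.0, 0.0))       # a zero denominator saturates
```

A mean square of deviations of zero, as produced by a perfect fit, is the typical way such a ratio arises in the variance ratio and in the values for the coefficients.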
8
Parallelism and Performance
g02chf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02chf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
The time taken by g02chf depends on $k$.
This routine assumes that the matrix of correlation-like coefficients for the independent variables in the regression is positive definite; it fails if this is not the case.
This correlation matrix will in fact be positive definite whenever the correlation-like matrix and the sums of squares and cross-products (about zero) matrix have been formed either without regard to missing values, or by eliminating completely any cases involving missing values for any variable. If, however, these matrices are formed by eliminating cases with missing values from only those calculations involving the variables for which the values are missing, no such statement can be made, and the correlation-like matrix may or may not be positive definite. You should be aware of the possible dangers of using correlation matrices formed in this way (see the G02 Chapter Introduction), but if you nevertheless wish to carry out regressions using such matrices, this routine is capable of handling the inversion of such matrices, provided they are positive definite.
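Before passing a matrix built with pairwise deletion of missing values to the routine, its positive definiteness can be checked independently. The sketch below attempts a Cholesky factorization in plain Python; this is a stand-in check chosen for illustration, not the test the routine itself applies, and the function name and matrices are hypothetical.

```python
import math

def is_positive_definite(a):
    """True if the symmetric matrix a admits a Cholesky factorization."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:          # a non-positive pivot means not positive definite
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

# A consistent correlation-like matrix.
good = [[1.0, 0.5],
        [0.5, 1.0]]

# Pairwise coefficients that no single complete data set could produce.
bad = [[1.0, 0.9, -0.9],
       [0.9, 1.0, 0.9],
       [-0.9, 0.9, 1.0]]

print(is_positive_definite(good), is_positive_definite(bad))
```

The second matrix illustrates the danger: each pairwise coefficient is individually valid, yet together they are mutually inconsistent.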
If a matrix is positive definite, its subsequent re-organisation by either of g02cef or g02cff will not affect this property and the new matrix can safely be used in this routine. Thus correlation matrices produced by any of g02bdf, g02bef, g02bkf or g02blf, even if subsequently modified by either g02cef or g02cff, can be handled by this routine.
It should be noted that the routine requires the dependent variable to be the last of the $k+1$ variables whose statistics are provided as input to the routine. If this variable is not correctly positioned in the original data, the means, standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients can be manipulated by using g02cef or g02cff to reorder the variables as necessary.
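The effect of such a reordering on a square matrix of statistics can be sketched as a simultaneous permutation of rows and columns. The helper below is hypothetical and does not reproduce the interface of g02cef or g02cff; it only shows the intended result when the dependent variable is moved to the last position.

```python
def move_to_last(matrix, dep):
    """Return a copy of a square matrix with variable `dep` (0-based) moved last.

    Rows and columns are permuted together, so symmetry is preserved.
    """
    m = len(matrix)
    order = [i for i in range(m) if i != dep] + [dep]
    return [[matrix[i][j] for j in order] for i in order]

# Hypothetical sums of squares and cross-products with the dependent variable first.
sspz_y_first = [[186.0, 72.0, 14.0],
                [72.0, 30.0, 4.0],
                [14.0, 4.0, 2.0]]

sspz_y_last = move_to_last(sspz_y_first, 0)
print(sspz_y_last)
```

The same permutation must be applied to the matrix of correlation-like coefficients so that both inputs describe the variables in the same order.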
10
Example
This example reads in the sums of squares and cross-products about zero, and correlation-like coefficients for three variables. A multiple linear regression with no constant is then performed with the third and final variable as the dependent variable. Finally the results are printed.
10.1
Program Text
10.2
Program Data
10.3
Program Results