NAG Library Routine Document

G02CHF

Note:  before using this routine, please read the Users' Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent details.

1  Purpose

G02CHF performs a multiple linear regression with no constant on a set of variables whose sums of squares and cross-products about zero and correlation-like coefficients are given.

2  Specification

SUBROUTINE G02CHF ( N, K1, K, SSPZ, LDSSPZ, RZ, LDRZ, RESULT, COEF, LDCOEF, RZNV, LDRZNV, CZ, LDCZ, WKZ, LDWKZ, IFAIL)
INTEGER  N, K1, K, LDSSPZ, LDRZ, LDCOEF, LDRZNV, LDCZ, LDWKZ, IFAIL
REAL (KIND=nag_wp)  SSPZ(LDSSPZ,K1), RZ(LDRZ,K1), RESULT(13), COEF(LDCOEF,3), RZNV(LDRZNV,K), CZ(LDCZ,K), WKZ(LDWKZ,K)

3  Description

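G02CHF fits a curve of the form
$$ y = b_1 x_1 + b_2 x_2 + \cdots + b_k x_k $$
to the data points
$$ (x_{11}, x_{21}, \ldots, x_{k1}, y_1), \; (x_{12}, x_{22}, \ldots, x_{k2}, y_2), \; \ldots, \; (x_{1n}, x_{2n}, \ldots, x_{kn}, y_n) $$
such that
$$ y_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i, \quad i = 1, 2, \ldots, n. $$
The routine calculates the regression coefficients, $b_1, b_2, \ldots, b_k$, (and various other statistical quantities) by minimizing
$$ \sum_{i=1}^{n} e_i^2. $$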
The actual data values $x_{1i}, x_{2i}, \ldots, x_{ki}, y_i$ are not provided as input to the routine. Instead, input to the routine consists of:
(i) The number of cases, $n$, on which the regression is based.
(ii) The total number of variables, dependent and independent, in the regression, $k+1$.
(iii) The number of independent variables in the regression, $k$.
(iv) The $(k+1)$ by $(k+1)$ matrix $[\tilde{S}_{ij}]$ of sums of squares and cross-products about zero of all the variables in the regression; the terms involving the dependent variable, $y$, appear in the $(k+1)$th row and column.
(v) The $(k+1)$ by $(k+1)$ matrix $[\tilde{R}_{ij}]$ of correlation-like coefficients for all the variables in the regression; the correlations involving the dependent variable, $y$, appear in the $(k+1)$th row and column.
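As a rough illustration of inputs (iv) and (v), the following Python sketch builds both matrices from raw data. It assumes the usual definition of a correlation-like coefficient, $\tilde{R}_{ij} = \tilde{S}_{ij} / \sqrt{\tilde{S}_{ii}\tilde{S}_{jj}}$, which is not stated in this document; the function name `sspz_and_rz` and the data values are hypothetical and are not part of the NAG Library.

```python
import math

def sspz_and_rz(columns):
    """Return (S, R) for a list of variables, each a list of n observations.

    S[i][j] is the sum of cross-products about zero of variables i and j.
    R[i][j] = S[i][j] / sqrt(S[i][i] * S[j][j]) is the assumed definition of
    the correlation-like coefficient. The dependent variable must come last.
    """
    m = len(columns)
    s = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(m)]
         for i in range(m)]
    r = [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(m)]
         for i in range(m)]
    return s, r

# Hypothetical data: two independent variables, then the dependent variable y.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [3.0, 4.0, 8.0, 8.0]
S, R = sspz_and_rz([x1, x2, y])
print(S)
print(R)
```

A variable that is identically zero has a zero sum of squares, so the sketch would divide by zero for it; such a variable would need to be removed first.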
The quantities calculated are:
(a) The inverse of the $k$ by $k$ partition of the matrix of correlation-like coefficients, $[\tilde{R}_{ij}]$, involving only the independent variables. The inverse is obtained using an accurate method which assumes that this sub-matrix is positive definite (see Section 9).
(b) The modified matrix, $C = [c_{ij}]$, where
$$ c_{ij} = \frac{\tilde{R}_{ij}\,\tilde{r}_{ij}}{\tilde{S}_{ij}}, \quad i, j = 1, 2, \ldots, k, $$
where $\tilde{r}_{ij}$ is the $(i,j)$th element of the inverse matrix of $[\tilde{R}_{ij}]$ as described in (a) above. Each element of $C$ is thus the corresponding element of the matrix of correlation-like coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products about zero.
(c) The regression coefficients:
$$ b_i = \sum_{j=1}^{k} c_{ij}\,\tilde{S}_{j(k+1)}, \quad i = 1, 2, \ldots, k, $$
where $\tilde{S}_{j(k+1)}$ is the sum of cross-products about zero for the independent variable $x_j$ and the dependent variable $y$.
(d) The sum of squares attributable to the regression, $\mathrm{SSR}$, the sum of squares of deviations about the regression, $\mathrm{SSD}$, and the total sum of squares, $\mathrm{SST}$:
  • $\mathrm{SST} = \tilde{S}_{(k+1)(k+1)}$, the sum of squares about zero for the dependent variable, $y$;
  • $\mathrm{SSR} = \sum_{j=1}^{k} b_j \tilde{S}_{j(k+1)}$;  $\mathrm{SSD} = \mathrm{SST} - \mathrm{SSR}$.
(e) The degrees of freedom attributable to the regression, $\mathrm{DFR}$, the degrees of freedom of deviations about the regression, $\mathrm{DFD}$, and the total degrees of freedom, $\mathrm{DFT}$:
$$ \mathrm{DFR} = k; \quad \mathrm{DFD} = n - k; \quad \mathrm{DFT} = n. $$
(f) The mean square attributable to the regression, $\mathrm{MSR}$, and the mean square of deviations about the regression, $\mathrm{MSD}$:
$$ \mathrm{MSR} = \mathrm{SSR}/\mathrm{DFR}; \quad \mathrm{MSD} = \mathrm{SSD}/\mathrm{DFD}. $$
(g) The $F$ value for the analysis of variance:
$$ F = \mathrm{MSR}/\mathrm{MSD}. $$
(h) The standard error estimate:
$$ s = \sqrt{\mathrm{MSD}}. $$
(i) The coefficient of multiple correlation, $R$, the coefficient of multiple determination, $R^2$, and the coefficient of multiple determination corrected for the degrees of freedom, $\bar{R}^2$:
$$ R = \sqrt{1 - \frac{\mathrm{SSD}}{\mathrm{SST}}}; \quad R^2 = 1 - \frac{\mathrm{SSD}}{\mathrm{SST}}; \quad \bar{R}^2 = 1 - \frac{\mathrm{SSD} \times \mathrm{DFT}}{\mathrm{SST} \times \mathrm{DFD}}. $$
(j) The standard error of the regression coefficients:
$$ se(b_i) = \sqrt{\mathrm{MSD} \times c_{ii}}, \quad i = 1, 2, \ldots, k. $$
(k) The $t$ values for the regression coefficients:
$$ t(b_i) = \frac{b_i}{se(b_i)}, \quad i = 1, 2, \ldots, k. $$
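The quantities (a) to (k) can be traced in a short Python sketch. It is an illustration only, not the library's implementation: a plain Cholesky factorization without iterative refinement stands in for the inversion method, and the names `cholesky`, `spd_inverse` and `regress_no_constant` are hypothetical. It also assumes $\tilde{R}_{ij} = \tilde{S}_{ij}/\sqrt{\tilde{S}_{ii}\tilde{S}_{jj}}$, under which $c_{ij}$ equals $\tilde{r}_{ij}/\sqrt{\tilde{S}_{ii}\tilde{S}_{jj}}$; the code uses that form so that a zero cross-product does not cause a division by zero.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T; raises ValueError if a is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            t = a[i][j] - sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                if t <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(t)
            else:
                l[i][j] = t / l[j][j]
    return l

def spd_inverse(a):
    """Inverse of a symmetric positive definite matrix, one column at a time."""
    n = len(a)
    l = cholesky(a)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Forward substitution: L z = e_col.
        z = [0.0] * n
        for i in range(n):
            rhs = 1.0 if i == col else 0.0
            z[i] = (rhs - sum(l[i][p] * z[p] for p in range(i))) / l[i][i]
        # Back substitution: L^T x = z.
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (z[i] - sum(l[p][i] * x[p] for p in range(i + 1, n))) / l[i][i]
        for i in range(n):
            inv[i][col] = x[i]
    return inv

def regress_no_constant(n, s, r):
    """Quantities (a)-(k) from the (k+1) by (k+1) matrices s (SSP about zero) and r."""
    k = len(s) - 1
    rinv = spd_inverse([row[:k] for row in r[:k]])                    # (a)
    # (b): c_ij = R_ij r_ij / S_ij, written as r_ij / sqrt(S_ii S_jj) (assumed equivalent).
    c = [[rinv[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(k)]
         for i in range(k)]
    b = [sum(c[i][j] * s[j][k] for j in range(k)) for i in range(k)]  # (c)
    sst = s[k][k]                                                     # (d)
    ssr = sum(b[j] * s[j][k] for j in range(k))
    ssd = sst - ssr
    dfr, dfd, dft = k, n - k, n                                       # (e)
    msr, msd = ssr / dfr, ssd / dfd                                   # (f)
    se = [math.sqrt(msd * c[i][i]) for i in range(k)]                 # (j)
    return {
        "b": b, "se": se,
        "t": [b[i] / se[i] for i in range(k)],                        # (k)
        "SSR": ssr, "SSD": ssd, "SST": sst, "MSR": msr, "MSD": msd,
        "F": msr / msd,                                               # (g)
        "s": math.sqrt(msd),                                          # (h)
        "R": math.sqrt(1.0 - ssd / sst),                              # (i)
        "R2": 1.0 - ssd / sst,
        "R2adj": 1.0 - (ssd * dft) / (sst * dfd),
    }

# Hypothetical input: n = 4 cases, k = 2 independent variables, y last.
S = [[30.0, 4.0, 67.0],
     [4.0, 2.0, 11.0],
     [67.0, 11.0, 153.0]]
R = [[S[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(3)] for i in range(3)]
out = regress_no_constant(4, S, R)
print(out["b"], out["F"])
```

The sketch does no overflow protection, so an exact fit (SSD equal to zero) raises a division error here rather than returning a capped value.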

4  References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley

5  Parameters

1:     N – INTEGER    Input
On entry: $n$, the number of cases used in calculating the sums of squares and cross-products and correlation-like coefficients.
2:     K1 – INTEGER    Input
On entry: the total number of variables, independent and dependent, $(k+1)$, in the regression.
Constraint: 2 ≤ K1 ≤ N.
3:     K – INTEGER    Input
On entry: the number of independent variables, $k$, in the regression.
Constraint: K = K1 - 1.
4:     SSPZ(LDSSPZ,K1) – REAL (KIND=nag_wp) array    Input
On entry: SSPZ(i,j) must be set to $\tilde{S}_{ij}$, the sum of cross-products about zero for the $i$th and $j$th variables, for $i = 1, 2, \ldots, k+1$ and $j = 1, 2, \ldots, k+1$; terms involving the dependent variable appear in row $k+1$ and column $k+1$.
5:     LDSSPZ – INTEGER    Input
On entry: the first dimension of the array SSPZ as declared in the (sub)program from which G02CHF is called.
Constraint: LDSSPZ ≥ K1.
6:     RZ(LDRZ,K1) – REAL (KIND=nag_wp) array    Input
On entry: RZ(i,j) must be set to $\tilde{R}_{ij}$, the correlation-like coefficient for the $i$th and $j$th variables, for $i = 1, 2, \ldots, k+1$ and $j = 1, 2, \ldots, k+1$; coefficients involving the dependent variable appear in row $k+1$ and column $k+1$.
7:     LDRZ – INTEGER    Input
On entry: the first dimension of the array RZ as declared in the (sub)program from which G02CHF is called.
Constraint: LDRZ ≥ K1.
8:     RESULT(13) – REAL (KIND=nag_wp) array    Output
On exit: the following information:
RESULT(1)    SSR, the sum of squares attributable to the regression;
RESULT(2)    DFR, the degrees of freedom attributable to the regression;
RESULT(3)    MSR, the mean square attributable to the regression;
RESULT(4)    $F$, the $F$ value for the analysis of variance;
RESULT(5)    SSD, the sum of squares of deviations about the regression;
RESULT(6)    DFD, the degrees of freedom of deviations about the regression;
RESULT(7)    MSD, the mean square of deviations about the regression;
RESULT(8)    SST, the total sum of squares;
RESULT(9)    DFT, the total degrees of freedom;
RESULT(10)   $s$, the standard error estimate;
RESULT(11)   $R$, the coefficient of multiple correlation;
RESULT(12)   $R^2$, the coefficient of multiple determination;
RESULT(13)   $\bar{R}^2$, the coefficient of multiple determination corrected for the degrees of freedom.
9:     COEF(LDCOEF,3) – REAL (KIND=nag_wp) array    Output
On exit: for $i = 1, 2, \ldots, k$, the following information:
COEF(i,1)
$b_i$, the regression coefficient for the $i$th variable.
COEF(i,2)
$se(b_i)$, the standard error of the regression coefficient for the $i$th variable.
COEF(i,3)
$t(b_i)$, the $t$ value of the regression coefficient for the $i$th variable.
10:   LDCOEF – INTEGER    Input
On entry: the first dimension of the array COEF as declared in the (sub)program from which G02CHF is called.
Constraint: LDCOEF ≥ K.
11:   RZNV(LDRZNV,K) – REAL (KIND=nag_wp) array    Output
On exit: the inverse of the matrix of correlation-like coefficients for the independent variables; that is, the inverse of the matrix consisting of the first $k$ rows and columns of RZ.
12:   LDRZNV – INTEGER    Input
On entry: the first dimension of the array RZNV as declared in the (sub)program from which G02CHF is called.
Constraint: LDRZNV ≥ K.
13:   CZ(LDCZ,K) – REAL (KIND=nag_wp) array    Output
On exit: the modified inverse matrix, $C$, where
$$ \mathrm{CZ}(i,j) = \frac{\mathrm{RZ}(i,j) \times \mathrm{RZNV}(i,j)}{\mathrm{SSPZ}(i,j)}, \quad i, j = 1, 2, \ldots, k. $$
14:   LDCZ – INTEGER    Input
On entry: the first dimension of the array CZ as declared in the (sub)program from which G02CHF is called.
Constraint: LDCZ ≥ K.
15:   WKZ(LDWKZ,K) – REAL (KIND=nag_wp) array    Workspace
16:   LDWKZ – INTEGER    Input
On entry: the first dimension of the array WKZ as declared in the (sub)program from which G02CHF is called.
Constraint: LDWKZ ≥ K.
17:   IFAIL – INTEGER    Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).
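For readers post-processing the output outside Fortran, the layout of RESULT (parameter 8) can be captured as a small lookup. The Python sketch below is only a convenience illustration; the names `RESULT_NAMES` and `label_result` are hypothetical, and the sample values are placeholders rather than output from G02CHF.

```python
# Short names for RESULT(1), ..., RESULT(13), in the documented order.
RESULT_NAMES = ("SSR", "DFR", "MSR", "F", "SSD", "DFD", "MSD",
                "SST", "DFT", "s", "R", "R2", "R2adj")

def label_result(result):
    """Pair the 13 RESULT entries with short names, keeping the documented order."""
    if len(result) != len(RESULT_NAMES):
        raise ValueError("RESULT must have exactly 13 elements")
    return dict(zip(RESULT_NAMES, result))

# Placeholder values 1.0, ..., 13.0 so that each name maps to its Fortran index.
labelled = label_result([float(i) for i in range(1, 14)])
print(labelled["F"], labelled["SST"])
```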

6  Error Indicators and Warnings

If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
IFAIL=1
On entry, K1 < 2.
IFAIL=2
On entry, K1 ≠ K+1.
IFAIL=3
On entry, N < K1.
IFAIL=4
On entry, LDSSPZ < K1,
or LDRZ < K1,
or LDCOEF < K,
or LDRZNV < K,
or LDCZ < K,
or LDWKZ < K.
IFAIL=5
This indicates that the k by k partition of the matrix held in RZ, which is to be inverted, is not positive definite.
IFAIL=6
This indicates that the refinement following the actual inversion fails, indicating that the k by k partition of the matrix held in RZ, which is to be inverted, is ill-conditioned. The use of G02DAF, which employs a different numerical technique, may avoid the difficulty.
IFAIL=7
Unexpected error in F04ABF.
IFAIL=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.8 in the Essential Introduction for further information.
IFAIL=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.7 in the Essential Introduction for further information.
IFAIL=-999
Dynamic memory allocation failed.
See Section 3.6 in the Essential Introduction for further information.
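The argument checks behind IFAIL=1 to IFAIL=4 can be mirrored before calling the routine. The Python sketch below is a hypothetical helper, not NAG code; it assumes the checks are applied in the order of their error numbers, which this document does not state.

```python
def check_arguments(n, k1, k, ldsspz, ldrz, ldcoef, ldrznv, ldcz, ldwkz):
    """Return the IFAIL value (1 to 4) of the first failed check, or 0 if all pass."""
    if k1 < 2:
        return 1
    if k1 != k + 1:
        return 2
    if n < k1:
        return 3
    if (ldsspz < k1 or ldrz < k1 or ldcoef < k
            or ldrznv < k or ldcz < k or ldwkz < k):
        return 4
    return 0

# Three variables (two independent), four cases, tightly dimensioned arrays.
print(check_arguments(4, 3, 2, 3, 3, 2, 2, 2, 2))
```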

7  Accuracy

The accuracy of any regression routine is almost entirely dependent on the accuracy of the matrix inversion method used. In G02CHF, it is the matrix of correlation-like coefficients rather than that of the sums of squares and cross-products about zero that is inverted; this means that all terms in the matrix for inversion are of a similar order, and reduces the scope for computational error. For details on absolute accuracy, the relevant section of the document describing the inversion routine used, F04ABF, should be consulted. G02DAF uses a different method, based on F04AMF, and that routine may well prove more reliable numerically. However, G02DAF does not handle missing values, nor does it provide the same output as this routine.
If, in calculating $F$ or any of the $t(b_i)$ (see Section 3), the numbers involved are such that the result would be outside the range of numbers which can be stored by the machine, then the answer is set to the largest quantity which can be stored as a real variable, by means of a call to X02ALF.
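The capping behaviour described above can be imitated as follows. This Python sketch is an assumption-laden illustration: `sys.float_info.max` stands in for the value returned by X02ALF, the function name `saturating_divide` is hypothetical, and both the preservation of sign and the treatment of a zero denominator as overflow are choices made for the sketch rather than documented behaviour.

```python
import math
import sys

def saturating_divide(num, den):
    """Return num / den, clamped to +/- the largest finite float on overflow."""
    largest = sys.float_info.max  # stands in for the value returned by X02ALF
    if den == 0.0:
        # Assumption: a zero denominator is treated like overflow.
        return math.copysign(largest, num)
    q = num / den
    if math.isinf(q):
        # The quotient exceeded the floating-point range; cap it, keeping its sign.
        return math.copysign(largest, q)
    return q

print(saturating_divide(6.0, 3.0))
print(saturating_divide(1e308, 1e-308))
```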

8  Parallelism and Performance

G02CHF is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
G02CHF makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9  Further Comments

The time taken by G02CHF depends on k.
This routine assumes that the matrix of correlation-like coefficients for the independent variables in the regression is positive definite; it fails if this is not the case.
This correlation matrix will in fact be positive definite whenever the correlation-like matrix and the sums of squares and cross-products (about zero) matrix have been formed either without regard to missing values, or by eliminating completely any cases involving missing values for any variable. If, however, these matrices are formed by eliminating cases with missing values from only those calculations involving the variables for which the values are missing, no such statement can be made, and the correlation-like matrix may or may not be positive definite. You should be aware of the possible dangers of using correlation matrices formed in this way (see the G02 Chapter Introduction), but if you nevertheless wish to carry out regressions using such matrices, this routine is capable of handling the inversion of such matrices, provided they are positive definite.
If a matrix is positive definite, its subsequent re-organisation by either of G02CEF or G02CFF will not affect this property and the new matrix can safely be used in this routine. Thus correlation matrices produced by any of G02BDF, G02BEF, G02BKF or G02BLF, even if subsequently modified by either G02CEF or G02CFF, can be handled by this routine.
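To make the positive definiteness caveat concrete, the Python sketch below tests a matrix by attempting a Cholesky factorization. The function name `is_positive_definite` and both matrices are hypothetical. The first matrix shows the kind of result that can arise when each coefficient is computed from a different subset of cases: every entry is a plausible coefficient on its own, yet the matrix as a whole is not positive definite.

```python
import math

def is_positive_definite(a):
    """True if the symmetric matrix a admits a Cholesky factorization."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            t = a[i][j] - sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                if t <= 0.0:
                    return False  # non-positive pivot: not positive definite
                l[i][i] = math.sqrt(t)
            else:
                l[i][j] = t / l[j][j]
    return True

# Mutually inconsistent coefficients, as pairwise treatment of missing values can produce.
pairwise = [[1.0, 0.9, -0.9],
            [0.9, 1.0, 0.9],
            [-0.9, 0.9, 1.0]]
# A matrix with mutually consistent coefficients.
consistent = [[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]]
print(is_positive_definite(pairwise), is_positive_definite(consistent))
```

A check of this kind on the partition for the independent variables indicates in advance whether the condition reported as IFAIL=5 is likely.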
It should be noted that the routine requires the dependent variable to be the last of the k+1 variables whose statistics are provided as input to the routine. If this variable is not correctly positioned in the original data, the means, standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients can be manipulated by using G02CEF or G02CFF to reorder the variables as necessary.
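The reordering itself amounts to applying the same permutation to the rows and columns of each matrix. The Python sketch below illustrates the idea only; `move_variable_last` is a hypothetical helper and does not reproduce the interface of G02CEF or G02CFF.

```python
def move_variable_last(matrix, index):
    """Return a copy of a square matrix with variable `index` moved to the last row and column."""
    m = len(matrix)
    order = [i for i in range(m) if i != index] + [index]
    return [[matrix[i][j] for j in order] for i in order]

# Hypothetical sums of squares and cross-products with the dependent variable stored first.
s_y_first = [[153.0, 67.0, 11.0],
             [67.0, 30.0, 4.0],
             [11.0, 4.0, 2.0]]
s_y_last = move_variable_last(s_y_first, 0)
print(s_y_last)
```

The same call would be applied to the matrix of correlation-like coefficients so that both inputs use the same variable order.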

10  Example

This example reads in the sums of squares and cross-products about zero, and correlation-like coefficients for three variables. A multiple linear regression with no constant is then performed with the third and final variable as the dependent variable. Finally the results are printed.

10.1  Program Text

Program Text (g02chfe.f90)

10.2  Program Data

Program Data (g02chfe.d)

10.3  Program Results

Program Results (g02chfe.r)


© The Numerical Algorithms Group Ltd, Oxford, UK. 2015