naginterfaces.library.correg.linregm_fit
- naginterfaces.library.correg.linregm_fit(x, isx, y, mean='M', wt=None, tol=1e-06)
linregm_fit performs a general multiple linear regression when the independent variables may be linearly dependent. Parameter estimates, standard errors, residuals and influence statistics are computed. linregm_fit may be used to perform a weighted regression.
For full information please refer to the NAG Library document for g02da:
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02daf.html
- Parameters
- x : float, array-like, shape (n, m)
x[i-1, j-1] must contain the i-th observation for the j-th independent variable, for i = 1, 2, ..., n, for j = 1, 2, ..., m.
- isx : int, array-like, shape (m)
Indicates which independent variables are to be included in the model.
isx[j-1] > 0: the variable contained in the j-th column of x is included in the regression model.
- y : float, array-like, shape (n)
The observations on the dependent variable.
- mean : str, length 1, optional
Indicates if a mean term is to be included.
mean = 'M': a mean term (intercept) will be included in the model.
mean = 'Z': the model will pass through the origin (zero-point).
- wt : None or float, array-like, shape (n), optional
If provided, wt must contain the weights to be used with the model.
If wt[i-1] = 0.0, the i-th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
The values of res and h will be set to zero for observations with zero weights.
If wt is not provided, the effective number of observations is n.
- tol : float, optional
The value of tol is used to decide if the independent variables are of full rank and, if not, what the rank of the independent variables is. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If tol = 0.0, the singular value decomposition will never be used; this may cause run-time errors or inaccurate results if the independent variables are not of full rank.
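The selection and weighting rules above can be illustrated without calling the library. The sketch below is plain Python; the helper names `selected_columns` and `effective_observations` are hypothetical and are not part of naginterfaces. It only mirrors the documented conventions that a positive isx entry includes a column and that a zero weight excludes an observation.

```python
def selected_columns(isx):
    """Return the 0-based indices of the columns of x with a positive isx entry."""
    return [j for j, flag in enumerate(isx) if flag > 0]


def effective_observations(n, wt=None):
    """Count the observations that take part in the fit.

    With no weights every observation counts; otherwise only those
    with a nonzero weight do.
    """
    if wt is None:
        return n
    if len(wt) != n:
        raise ValueError("wt must have one weight per observation")
    return sum(1 for w in wt if w != 0.0)


# Illustrative inputs (made up for this sketch).
isx = [1, 0, 1, 1]                # the second column of x is left out
wt = [1.0, 0.0, 2.5, 1.0, 0.0]    # two observations carry zero weight

cols = selected_columns(isx)
n_eff = effective_observations(5, wt)
print("columns used:", cols)
print("effective number of observations:", n_eff)
```

A check of this kind can be useful before calling linregm_fit, since the number of parameters must not exceed the effective number of observations.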
- Returns
- rss : float
The residual sum of squares for the regression.
- idf : int
The degrees of freedom associated with the residual sum of squares.
- b : float, ndarray, shape (ip)
b[i-1], for i = 1, 2, ..., ip, contains the least squares estimates of the parameters of the regression model, $\hat{\beta}$. Here ip denotes the number of parameters in the model.
If mean = 'M', b[0] will contain the estimate of the mean parameter and b[i] will contain the coefficient of the variable contained in column j of x, where isx[j-1] is the i-th positive value in the array isx.
If mean = 'Z', b[i-1] will contain the coefficient of the variable contained in column j of x, where isx[j-1] is the i-th positive value in the array isx.
- se : float, ndarray, shape (ip)
se[i-1], for i = 1, 2, ..., ip, contains the standard errors of the parameter estimates given in b.
- cov : float, ndarray, shape (ip*(ip+1)/2)
The first ip*(ip+1)/2 elements of cov contain the upper triangular part of the variance-covariance matrix of the ip parameter estimates given in b. They are stored packed by column, i.e., the covariance between the parameter estimate given in b[i-1] and the parameter estimate given in b[j-1], j >= i, is stored in cov[j*(j-1)/2 + i - 1].
- res : float, ndarray, shape (n)
The (weighted) residuals, $r_i$, for i = 1, 2, ..., n.
- h : float, ndarray, shape (n)
The diagonal elements of $H$, $h_i$, for i = 1, 2, ..., n.
- q : float, ndarray, shape (n, ip+1)
The results of the QR decomposition:
the first column of q contains $c$;
the upper triangular part of columns 2 to ip+1 contains the $R$ matrix;
the strictly lower triangular part of columns 2 to ip+1 contains details of the $Q$ matrix.
- svd : bool
If a singular value decomposition has been performed then svd will be True, otherwise svd will be False.
- irank : int
The rank of the independent variables.
If svd = False, irank = ip.
If svd = True, irank is an estimate of the rank of the independent variables.
irank is calculated as the number of singular values greater than tol × (largest singular value).
It is possible for the SVD to be carried out but for irank to be returned as ip.
- p : float, ndarray, shape (2*ip+ip*ip)
Details of the QR decomposition and SVD if used.
If svd = False, only the first ip elements of p are used; these will contain the zeta values for the QR decomposition (see lapackeig.dgeqrf for details).
If svd = True, the first ip elements of p will contain the zeta values for the QR decomposition (see lapackeig.dgeqrf for details) and the next ip elements of p contain singular values. The following ip by ip elements contain the matrix $P^*$ stored by columns.
- wk : float, ndarray, shape (max(2, 5*(ip-1)+ip*ip))
If on exit svd = True, wk contains information which is needed by linregm_fit_newvar(); otherwise wk is used as workspace.
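The storage conventions for b, cov and irank described above can be sketched in plain Python. This is an illustration under stated assumptions, not library code: the helper names `coefficient_columns`, `unpack_cov` and `estimate_rank` are hypothetical, and the numbers are made up. The sketch assumes upper-triangle, column-packed storage for cov and the "singular values greater than tol times the largest" rule for the rank.

```python
def coefficient_columns(isx, mean="M"):
    """Map each position in b to the column of x it refers to.

    Returns one entry per parameter: None for the mean term (only when
    mean == 'M'), otherwise the 0-based column index of x.
    """
    if mean not in ("M", "Z"):
        raise ValueError("mean must be 'M' or 'Z'")
    cols = [j for j, flag in enumerate(isx) if flag > 0]
    return ([None] if mean == "M" else []) + cols


def unpack_cov(cov, ip):
    """Expand a column-packed upper triangle into a full symmetric matrix."""
    full = [[0.0] * ip for _ in range(ip)]
    for j in range(ip):            # 0-based column
        for i in range(j + 1):     # 0-based row, i <= j
            value = cov[j * (j + 1) // 2 + i]
            full[i][j] = value
            full[j][i] = value
    return full


def estimate_rank(singular_values, tol):
    """Count the singular values greater than tol times the largest one."""
    largest = max(singular_values)
    return sum(1 for s in singular_values if s > tol * largest)


# Illustrative values only.
labels = coefficient_columns([1, 0, 1], mean="M")
full_cov = unpack_cov([1.0, 0.2, 2.0, 0.3, 0.4, 3.0], ip=3)
rank = estimate_rank([3.0, 1.0, 1e-9], tol=1e-6)

print(labels)
for row in full_cov:
    print(row)
print(rank)
```

Expanding cov into a full matrix trades memory for convenience; for large models it may be preferable to index the packed array directly.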
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, and .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, .
Constraint: , for .
- (errno )
On entry, .
Constraint: , for .
- (errno )
On entry, .
Constraint: must be compatible with the number of nonzero elements in .
- (errno )
SVD solution failed to converge.
- Warns
- NagAlgorithmicWarning
- (errno )
The degrees of freedom for the residuals are zero, i.e., the designated number of parameters is equal to the effective number of observations. In this case the parameter estimates will be returned along with the diagonal elements of $H$, but neither standard errors nor the variance-covariance matrix will be calculated.
- Notes
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.
The general linear regression model is defined by
$$y = X\beta + \epsilon,$$
where
$y$ is a vector of $n$ observations on the dependent variable,
$X$ is an $n$ by $p$ matrix of the independent variables of column rank $k$,
$\beta$ is a vector of length $p$ of unknown parameters, and
$\epsilon$ is a vector of length $n$ of unknown random errors such that $\mathrm{var}\,\epsilon = V\sigma^2$, where $V$ is a known diagonal matrix.
If $V = I$, the identity matrix, then least squares estimation is used. If $V \neq I$, then for a given weight matrix $W \propto V^{-1}$, weighted least squares estimation is used.
The least squares estimates $\hat{\beta}$ of the parameters $\beta$ minimize $(y - X\beta)^T (y - X\beta)$ while the weighted least squares estimates minimize $(y - X\beta)^T W (y - X\beta)$.
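linregm_fit finds a $QR$ decomposition of $X$ (or $W^{1/2}X$ in the weighted case), i.e.,
$$X = QR^* \quad (\text{or } W^{1/2}X = QR^*),$$
where $R^* = \begin{pmatrix} R \\ 0 \end{pmatrix}$, $R$ is a $p$ by $p$ upper triangular matrix and $Q$ is an $n$ by $n$ orthogonal matrix. If $R$ is of full rank, then $\hat{\beta}$ is the solution to
$$R\hat{\beta} = c_1,$$
where $c = Q^T y$ (or $Q^T W^{1/2} y$) and $c_1$ is the first $p$ elements of $c$. If $R$ is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of $R$,
$$R = Q^* \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} P^T,$$
where $D$ is a $k$ by $k$ diagonal matrix with nonzero diagonal elements, $k$ being the rank of $R$, and $Q^*$ and $P$ are $p$ by $p$ orthogonal matrices. This gives the solution
$$\hat{\beta} = P_1 D^{-1} Q_1^{*T} c_1,$$
$P_1$ being the first $k$ columns of $P$, i.e., $P = (P_1 \; P_0)$, and $Q_1^*$ being the first $k$ columns of $Q^*$.
Details of the SVD are made available in the form of the matrix $P^*$:
$$P^* = \begin{pmatrix} D^{-1} P_1^T \\ P_0^T \end{pmatrix}.$$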
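The full-rank case described above, in which the estimates are obtained by solving a triangular system after a QR factorization, can be sketched in plain Python. This is a minimal illustration and not the library's implementation: it uses modified Gram-Schmidt to build a thin factorization rather than the Householder-based factorization referred to via lapackeig.dgeqrf, it does not handle weights or rank deficiency, and the function names `thin_qr`, `back_substitute` and `least_squares` are hypothetical.

```python
def thin_qr(X):
    """Modified Gram-Schmidt: X = Q R, Q (n x p) with orthonormal columns,
    R (p x p) upper triangular. Raises if a column becomes exactly zero."""
    n, p = len(X), len(X[0])
    Q = [[float(X[i][j]) for j in range(p)] for i in range(n)]  # working copy
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        norm = sum(Q[i][j] ** 2 for i in range(n)) ** 0.5
        if norm == 0.0:
            raise ValueError("X is not of full column rank")
        R[j][j] = norm
        for i in range(n):
            Q[i][j] /= norm
        # Remove the component along column j from every later column.
        for k in range(j + 1, p):
            R[j][k] = sum(Q[i][j] * Q[i][k] for i in range(n))
            for i in range(n):
                Q[i][k] -= R[j][k] * Q[i][j]
    return Q, R


def back_substitute(R, c):
    """Solve the upper triangular system R beta = c."""
    p = len(c)
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = c[i] - sum(R[i][k] * beta[k] for k in range(i + 1, p))
        beta[i] = s / R[i][i]
    return beta


def least_squares(X, y):
    """Least squares estimates via R beta = c1, with c1 = Q^T y."""
    Q, R = thin_qr(X)
    n, p = len(X), len(X[0])
    c1 = [sum(Q[i][j] * y[i] for i in range(n)) for j in range(p)]
    return back_substitute(R, c1)


# Made-up data lying exactly on y = 1 + 2 x; the first column is the mean term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(X, y)
print(beta)
```

For this exact-fit data the estimates should agree with the intercept 1 and slope 2 up to rounding error.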
This will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using linregm_constrain() after using linregm_fit. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions.
The fit of the model can be examined by considering the residuals, $r_i = y_i - \hat{y}_i$, where $\hat{y} = X\hat{\beta}$ are the fitted values. The fitted values can be written as $Hy$ for an $n$ by $n$ matrix $H$. The $i$th diagonal element of $H$, $h_i$, gives a measure of the influence of the $i$th values of the independent variables on the fitted regression model. The values $h_i$ are sometimes known as leverages. Both $r_i$ and $h_i$ are provided by linregm_fit.
The output of linregm_fit also includes $\hat{\beta}$, the residual sum of squares and associated degrees of freedom, $(n - p)$, the standard errors of the parameter estimates and the variance-covariance matrix of the parameter estimates.
In many linear regression models the first term is taken as a mean term or an intercept, i.e., $X_{i,1} = 1$, for $i = 1, 2, \ldots, n$. This is provided as an option. Also, only some of the possible independent variables need to be included in a model; a facility to select the variables to be included is provided.
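The residuals, leverages, residual sum of squares and degrees of freedom discussed above can be made concrete for the simplest case, an unweighted regression on one variable with a mean term. The sketch below uses the standard closed-form results for that special case (in particular, leverage equal to 1/n plus the squared deviation of the variable from its mean divided by the sum of squared deviations) rather than the QR-based computation used by linregm_fit. The function name `simple_regression_diagnostics` and the data are hypothetical.

```python
def simple_regression_diagnostics(x, y):
    """Fit y = a + b x by least squares and return basic diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Diagonal of the hat matrix for a straight-line fit with intercept.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    rss = sum(r * r for r in residuals)
    dof = n - 2  # two parameters: mean term and slope
    return intercept, slope, residuals, leverages, rss, dof


# Made-up observations for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, residuals, leverages, rss, dof = simple_regression_diagnostics(x, y)
print("intercept, slope:", intercept, slope)
print("leverages:", leverages)
print("rss, degrees of freedom:", rss, dof)
```

Two quick sanity checks follow from the theory: the leverages sum to the number of parameters (here 2), and with a mean term in the model the residuals sum to zero up to rounding.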
Details of the $QR$ decomposition and, if used, the SVD, are made available. These allow the regression to be updated by adding or deleting an observation using linregm_obs_edit(), adding or deleting a variable using linregm_var_add() and linregm_var_del(), or estimating and testing an estimable function using linregm_estfunc(). For the same matrix of independent variables, a new set of parameter estimates can be quickly calculated from a new vector of dependent variables using linregm_fit_newvar(). The details of the factorizations held in q, p and wk are only for use by this suite of functions and cannot be used by other functions that use such factorizations, e.g., lapackeig.dormqr, since these will expect a different storage scheme for the input factorization.
- References
Cook, R D and Weisberg, S, 1982, Residuals and Influence in Regression, Chapman and Hall
Draper, N R and Smith, H, 1985, Applied Regression Analysis, (2nd Edition), Wiley
Golub, G H and Van Loan, C F, 1996, Matrix Computations, (3rd Edition), Johns Hopkins University Press, Baltimore
Hammarling, S, 1985, The singular value decomposition in multivariate statistics, SIGNUM Newsl. (20(3)), 2–25
McCullagh, P and Nelder, J A, 1983, Generalized Linear Models, Chapman and Hall
Searle, S R, 1971, Linear Models, Wiley