naginterfaces.library.correg.linregm_fit

naginterfaces.library.correg.linregm_fit(x, isx, y, mean='M', wt=None, tol=1e-06)

linregm_fit performs a general multiple linear regression when the independent variables may be linearly dependent. Parameter estimates, standard errors, residuals and influence statistics are computed. linregm_fit may be used to perform a weighted regression.

For full information please refer to the NAG Library document for g02da

https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02daf.html

Parameters
x : float, array-like, shape (n, m)

x[i-1, j-1] must contain the $i$th observation for the $j$th independent variable, for $i = 1, \ldots, n$, for $j = 1, \ldots, m$.

isx : int, array-like, shape (m)

Indicates which independent variables are to be included in the model.

If isx[j-1] > 0, the variable contained in the $j$th column of x is included in the regression model.

y : float, array-like, shape (n)

$y$, the observations on the dependent variable.

mean : str, length 1, optional

Indicates if a mean term is to be included.

mean = 'M': a mean term, intercept, will be included in the model.

mean = 'Z': the model will pass through the origin, zero-point.

wt : None or float, array-like, shape (n), optional

If provided, wt must contain the weights to be used with the model.

If wt[i-1] = 0.0, the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.

The values of res and h will be set to zero for observations with zero weights.

If wt is not provided, the effective number of observations is $n$.
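The counting rule for weighted fits can be sketched in a few lines of plain Python. The helper name `effective_observations` is hypothetical and is not part of naginterfaces; it only restates the rule described above.

```python
def effective_observations(n, wt=None):
    """Count the observations that take part in the fit.

    Hypothetical helper for illustration; not part of naginterfaces.
    """
    if wt is None:
        return n  # unweighted fit: every observation counts
    if len(wt) != n:
        raise ValueError("wt must hold one weight per observation")
    # Observations with a zero weight are excluded from the model.
    return sum(1 for w in wt if w != 0.0)


print(effective_observations(5))                             # 5
print(effective_observations(5, [1.0, 0.0, 2.5, 0.0, 1.0]))  # 3
```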

tol : float, optional

The value of tol is used to decide if the independent variables are of full rank and, if not, what is the rank of the independent variables. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If tol = 0.0, the singular value decomposition will never be used; this may cause run time errors or inaccurate results if the independent variables are not of full rank.

Returns
rss : float

The residual sum of squares for the regression.

idf : int

The degrees of freedom associated with the residual sum of squares.

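b : float, ndarray, shape (ip)

b[i-1], for $i = 1, \ldots, \mathrm{ip}$, contains the least squares estimates of the parameters of the regression model, $\hat{\beta}$. Here ip is the number of parameters in the model: the number of positive elements of isx, plus one if mean = 'M'.

If mean = 'M', b[0] will contain the estimate of the mean parameter and b[i] will contain the coefficient of the variable contained in column $j$ of x, where isx[j-1] is the $i$th positive value in the array isx.

If mean = 'Z', b[i-1] will contain the coefficient of the variable contained in column $j$ of x, where isx[j-1] is the $i$th positive value in the array isx.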
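To make the ordering of the parameter estimates concrete, the following sketch lists which term each entry of b refers to for a given isx and mean. The helper `parameter_layout` is hypothetical (it is not part of naginterfaces) and column labels use 0-based Python indexing.

```python
def parameter_layout(isx, mean="M"):
    """Return one label per entry of b, in the order described above.

    Hypothetical helper for illustration; not part of naginterfaces.
    """
    # With mean='M' the first entry of b is the mean (intercept) term.
    labels = ["mean"] if mean == "M" else []
    # Columns with a positive isx entry follow, in column order.
    labels += [f"x[:, {j}]" for j, flag in enumerate(isx) if flag > 0]
    return labels


layout = parameter_layout([1, 0, 1, 1], mean="M")
print(len(layout), layout)
# 4 ['mean', 'x[:, 0]', 'x[:, 2]', 'x[:, 3]']
```

The length of the returned list is the number of model parameters, which is also the length of b and se.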

se : float, ndarray, shape (ip)

se[i-1], for $i = 1, \ldots, \mathrm{ip}$, contains the standard errors of the ip parameter estimates given in b.

cov : float, ndarray, shape (ip*(ip+1)/2)

The first ip*(ip+1)/2 elements of cov contain the upper triangular part of the variance-covariance matrix of the ip parameter estimates given in b. They are stored packed by column, i.e., the covariance between the parameter estimate given in b[i-1] and the parameter estimate given in b[j-1], $j \ge i$, is stored in cov[j*(j-1)/2 + i - 1].
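The packed, column-by-column storage of an upper triangle can be unpacked with a short index calculation. The sketch below is plain Python; `packed_index` and `unpack_cov` are hypothetical helper names, and the numbers in `cov` are made up purely to show the layout.

```python
def packed_index(i, j):
    """0-based position in the packed array of element (i, j), with i, j 1-based."""
    if i > j:
        i, j = j, i  # the matrix is symmetric, so swap into the upper triangle
    return j * (j - 1) // 2 + i - 1


def unpack_cov(cov, ip):
    """Expand a packed upper triangle into a full symmetric ip-by-ip matrix."""
    full = [[0.0] * ip for _ in range(ip)]
    for j in range(1, ip + 1):
        for i in range(1, j + 1):
            value = cov[packed_index(i, j)]
            full[i - 1][j - 1] = value
            full[j - 1][i - 1] = value
    return full


# Made-up packed values for three parameters:
# order is (1,1), (1,2), (2,2), (1,3), (2,3), (3,3).
cov = [4.0, 1.0, 9.0, 0.5, 2.0, 16.0]
for row in unpack_cov(cov, 3):
    print(row)
# [4.0, 1.0, 0.5]
# [1.0, 9.0, 2.0]
# [0.5, 2.0, 16.0]
```

The diagonal of the unpacked matrix holds the variances of the parameter estimates, so their square roots correspond to the standard errors.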

res : float, ndarray, shape (n)

The (weighted) residuals, $r_i$, for $i = 1, \ldots, n$.

h : float, ndarray, shape (n)

The diagonal elements of $H$, $h_i$, for $i = 1, \ldots, n$.

q : float, ndarray, shape (n, ip+1)

The results of the $QR$ decomposition, where ip is the number of parameters in the model:

the first column of q contains $c$;

the upper triangular part of columns 2 to ip+1 contains the $R$ matrix;

the strictly lower triangular part of columns 2 to ip+1 contains details of the $Q$ matrix.

svd : bool

If a singular value decomposition has been performed then svd will be True, otherwise svd will be False.

irank : int

The rank of the independent variables.

If svd = False, irank = ip, the number of parameters in the model.

If svd = True, irank is an estimate of the rank of the independent variables.

irank is calculated as the number of singular values greater than tol $\times$ (largest singular value).

It is possible for the SVD to be carried out but irank to be returned as ip.
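The rank rule stated above is easy to mirror in plain Python. This is only a restatement of the documented criterion, not the library's internal code; `estimate_rank` is a hypothetical name and the singular values are made up.

```python
def estimate_rank(singular_values, tol=1e-06):
    """Count the singular values greater than tol * (largest singular value)."""
    threshold = tol * max(singular_values)
    return sum(1 for s in singular_values if s > threshold)


# The third value is negligible relative to the largest, so the rank is 2.
print(estimate_rank([3.2, 1.1, 4.0e-9]))             # 2
# A smaller tol lowers the threshold and the same values count as full rank.
print(estimate_rank([3.2, 1.1, 4.0e-9], tol=1e-12))  # 3
```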

p : float, ndarray, shape (2*ip + ip*ip)

Details of the $QR$ decomposition and SVD if used, where ip is the number of parameters in the model.

If svd = False, only the first ip elements of p are used; these will contain the zeta values for the $QR$ decomposition (see lapackeig.dgeqrf for details).

If svd = True, the first ip elements of p will contain the zeta values for the $QR$ decomposition (see lapackeig.dgeqrf for details) and the next ip elements of p contain singular values.

The following ip by ip elements contain the matrix $P^*$ stored by columns.

wk : float, ndarray, shape (max(2, 5*(ip-1) + ip*ip))

If on exit svd = True, wk contains information which is needed by linregm_fit_newvar(); otherwise wk is used as workspace.

Raises
NagValueError
(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, and .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: or .

(errno )

On entry, .

Constraint: or .

(errno )

On entry, .

Constraint: , for .

(errno )

On entry, .

Constraint: , for .

(errno )

On entry, .

Constraint: ip must be compatible with the number of nonzero elements in isx.

(errno )

SVD solution failed to converge.

Warns
NagAlgorithmicWarning
(errno )

The degrees of freedom for the residuals are zero, i.e., the designated number of parameters is equal to the effective number of observations. In this case the parameter estimates will be returned along with the diagonal elements of $H$, but neither standard errors nor the variance-covariance matrix will be calculated.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

The general linear regression model is defined by

$$y = X\beta + \epsilon,$$

where

$y$ is a vector of $n$ observations on the dependent variable,

$X$ is an $n \times p$ matrix of the independent variables of column rank $k$,

$\beta$ is a vector of length $p$ of unknown parameters, and

$\epsilon$ is a vector of length $n$ of unknown random errors such that $\operatorname{var}\epsilon = V\sigma^2$, where $V$ is a known diagonal matrix.

If $V = I$, the identity matrix, then least squares estimation is used. If $V \ne I$, then for a given weight matrix $W \propto V^{-1}$, weighted least squares estimation is used.

The least squares estimates $\hat{\beta}$ of the parameters $\beta$ minimize $(y - X\beta)^T (y - X\beta)$ while the weighted least squares estimates minimize $(y - X\beta)^T W (y - X\beta)$.
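As a small worked illustration of the two criteria, the sketch below fits a straight line with a mean term by weighted least squares in plain Python. It solves the 2-by-2 normal equations in closed form, which is a swapped-in textbook technique and not the decomposition-based method used by linregm_fit; the function names and the data are made up for illustration.

```python
def weighted_line_fit(x, y, wt):
    """Fit y = b0 + b1*x by weighted least squares via the normal equations."""
    sw = sum(wt)
    swx = sum(w * xi for w, xi in zip(wt, x))
    swy = sum(w * yi for w, yi in zip(wt, y))
    swxx = sum(w * xi * xi for w, xi in zip(wt, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(wt, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def weighted_rss(x, y, wt, b0, b1):
    """Evaluate the weighted criterion sum_i w_i * (y_i - b0 - b1*x_i)**2."""
    return sum(w * (yi - b0 - b1 * xi) ** 2 for w, xi, yi in zip(wt, x, y))


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]  # the last point is a gross outlier
ones = [1.0, 1.0, 1.0, 1.0]  # unit weights: ordinary least squares
wt = [1.0, 1.0, 1.0, 0.0]    # a zero weight removes the outlier from the fit

b0_u, b1_u = weighted_line_fit(x, y, ones)
b0_w, b1_w = weighted_line_fit(x, y, wt)
print(b0_u, b1_u)  # roughly -17.6 and 29.9: the outlier dominates
print(b0_w, b1_w)  # 1.0 and 2.0: the line through the first three points
print(weighted_rss(x, y, wt, b0_w, b1_w))  # 0.0
```

With unit weights the two criteria coincide, which is why the same routine covers both cases.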

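linregm_fit finds a $QR$ decomposition of $X$ (or $W^{1/2}X$ in the weighted case), i.e.,

$$X = QR^* \quad (\text{or } W^{1/2}X = QR^*),$$

where $R^* = \begin{pmatrix} R \\ 0 \end{pmatrix}$ and $R$ is a $p \times p$ upper triangular matrix and $Q$ is an $n \times n$ orthogonal matrix. If $R$ is of full rank, then $\hat{\beta}$ is the solution to

$$R\hat{\beta} = c_1,$$

where $c = Q^T y$ (or $Q^T W^{1/2} y$) and $c_1$ is the first $p$ elements of $c$. If $R$ is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of $R$,

$$R = Q_* \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} P^T,$$

where $D$ is a $k \times k$ diagonal matrix with nonzero diagonal elements, $k$ being the rank of $R$, and $Q_*$ and $P$ are $p \times p$ orthogonal matrices. This gives the solution

$$\hat{\beta} = P_1 D^{-1} Q_{*1}^T c_1,$$

$P_1$ being the first $k$ columns of $P$, i.e., $P = (P_1 \; P_0)$, and $Q_{*1}$ being the first $k$ columns of $Q_*$.

Details of the SVD are made available in the form of the matrix $P^*$:

$$P^* = \begin{pmatrix} D^{-1} P_1^T \\ P_0^T \end{pmatrix}.$$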
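A short derivation may help show where the rank-deficient solution comes from. It assumes the SVD of the triangular factor, $R = Q_* \,\mathrm{diag}(D, 0)\, P^T$ with $D$ of order $k$, and introduces two symbols that are not in the text above: $\gamma = P^T\beta$, split as $\gamma_1 = P_1^T\beta$ and $\gamma_0 = P_0^T\beta$, and $Q_{*0}$ for the last $p - k$ columns of $Q_*$.

```latex
\begin{aligned}
\lVert c_1 - R\beta \rVert^2
  &= \left\lVert Q_*^T c_1
     - \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}
       \begin{pmatrix} \gamma_1 \\ \gamma_0 \end{pmatrix} \right\rVert^2
     && \text{since } Q_* \text{ is orthogonal} \\
  &= \lVert Q_{*1}^T c_1 - D\gamma_1 \rVert^2 + \lVert Q_{*0}^T c_1 \rVert^2 \\
\text{minimized by}\quad
  \gamma_1 &= D^{-1} Q_{*1}^T c_1, \qquad \gamma_0 \text{ arbitrary}, \\
\gamma_0 = 0 \;\Longrightarrow\;
  \hat{\beta} &= P_1\gamma_1 + P_0\gamma_0 = P_1 D^{-1} Q_{*1}^T c_1 .
\end{aligned}
```

Because $P$ is orthogonal, $\lVert\beta\rVert = \lVert\gamma\rVert$, so the choice $\gamma_0 = 0$ picks the minimizer of smallest Euclidean norm; any other $\gamma_0$ gives an equally good fit.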

This will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using linregm_constrain() after using linregm_fit. Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions.

The fit of the model can be examined by considering the residuals, $r_i = y_i - \hat{y}_i$, where $\hat{y} = X\hat{\beta}$ are the fitted values. The fitted values can be written as $Hy$ for an $n \times n$ matrix $H$. The $i$th diagonal element of $H$, $h_i$, gives a measure of the influence of the $i$th values of the independent variables on the fitted regression model. The values $h_i$ are sometimes known as leverages. Both $r_i$ and $h_i$ are provided by linregm_fit.
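For an unweighted straight-line model with a mean term, residuals and leverages have a simple closed form, which the plain-Python sketch below uses. This is a textbook special case chosen for illustration, not the general computation performed by linregm_fit; the function name and data are made up.

```python
def line_fit_diagnostics(x, y):
    """Residuals and leverages for the unweighted model y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Residual: observed value minus fitted value.
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Leverage: diagonal of the hat matrix for this two-parameter model.
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return res, h


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.9, 5.2, 6.9]
res, h = line_fit_diagnostics(x, y)
print([round(r, 6) for r in res])  # residuals, summing to zero up to rounding
print([round(v, 6) for v in h])    # the end points have the largest leverage
print(round(sum(h), 6))            # equals the number of parameters, 2
```

Points far from the mean of x have the largest leverage, and the leverages of a full-rank model sum to the number of parameters.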

The output of linregm_fit also includes $\hat{\beta}$, the residual sum of squares and associated degrees of freedom, $(n - p)$, the standard errors of the parameter estimates and the variance-covariance matrix of the parameter estimates.

In many linear regression models the first term is taken as a mean term or an intercept, i.e., $X_{i,1} = 1$, for $i = 1, 2, \ldots, n$. This is provided as an option. Also, only some of the possible independent variables may be required in a model, so a facility to select the variables to be included is provided.
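The effect of the mean option and of variable selection on the design matrix can be sketched as follows. The helper `design_matrix` is hypothetical and is not part of naginterfaces; it only illustrates which columns end up in the model.

```python
def design_matrix(x, isx, mean="M"):
    """Build the model matrix from the rows of x.

    Hypothetical helper for illustration; not part of naginterfaces.
    """
    selected = [j for j, flag in enumerate(isx) if flag > 0]
    # A mean term contributes a leading column of ones.
    lead = [1.0] if mean == "M" else []
    return [lead + [row[j] for j in selected] for row in x]


x = [[1.0, 10.0, 5.0],
     [2.0, 20.0, 6.0]]
print(design_matrix(x, [1, 0, 1], mean="M"))
# [[1.0, 1.0, 5.0], [1.0, 2.0, 6.0]]
print(design_matrix(x, [1, 0, 1], mean="Z"))
# [[1.0, 5.0], [2.0, 6.0]]
```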

Details of the $QR$ decomposition and, if used, the SVD, are made available. These allow the regression to be updated by adding or deleting an observation using linregm_obs_edit(), adding or deleting a variable using linregm_var_add() and linregm_var_del(), or estimating and testing an estimable function using linregm_estfunc(). For the same matrix of independent variables, a new set of parameter estimates can be quickly calculated from a new vector of dependent variables using linregm_fit_newvar(). The details of the factorizations returned by linregm_fit are only for use by this suite of functions and cannot be used by other functions that use such factorizations, e.g., lapackeig.dormqr, since these will expect a different storage scheme for the input factorization.

References

Cook, R D and Weisberg, S, 1982, Residuals and Influence in Regression, Chapman and Hall

Draper, N R and Smith, H, 1985, Applied Regression Analysis, (2nd Edition), Wiley

Golub, G H and Van Loan, C F, 1996, Matrix Computations, (3rd Edition), Johns Hopkins University Press, Baltimore

Hammarling, S, 1985, The singular value decomposition in multivariate statistics, SIGNUM Newsl. 20(3), 2–25

McCullagh, P and Nelder, J A, 1983, Generalized Linear Models, Chapman and Hall

Searle, S R, 1971, Linear Models, Wiley