naginterfaces.library.correg.quantile_linreg
- naginterfaces.library.correg.quantile_linreg(sorder, dat, isx, y, tau, intcpt='Y', wt=None, b=None, comm=None, statecomm=None, io_manager=None)
quantile_linreg performs a multiple linear quantile regression. Parameter estimates and, if required, confidence limits, covariance matrices and residuals are calculated. quantile_linreg may be used to perform a weighted quantile regression. A simplified interface for quantile_linreg is provided by quantile_linreg_easy().
Note: this function uses optional algorithmic parameters, see also: optset(), optget().
For full information please refer to the NAG Library document for g02qg
https://support.nag.com/numeric/nl/nagdoc_30.2/flhtml/g02/g02qgf.html
- Parameters
- sorder : int
Determines the storage order of variates supplied in dat.
- dat : float, array-like, shape
Note: the required extent for this argument in dimension 1 is determined as follows: if : ; otherwise: .
Note: the required extent for this argument in dimension 2 is determined as follows: if : ; otherwise: .
The th value for the th variate, for , for , must be supplied in
if , and
if .
The design matrix is constructed from dat, isx and intcpt.
- isx : int, array-like, shape
Indicates which independent variables are to be included in the model.
If the entry of isx for a variate is 0, that variate, supplied in dat, is not included in the regression model.
If the entry of isx for a variate is 1, that variate, supplied in dat, is included in the regression model.
- y : float, array-like, shape
The observations on the dependent variable.
- tau : float, array-like, shape
The vector of quantiles of interest. A separate model is fitted to each quantile.
- intcpt : str, length 1, optional
Indicates whether an intercept will be included in the model. The intercept is included by adding a column of ones as the first column in the design matrix.
intcpt = 'Y': an intercept will be included in the model.
intcpt = 'N': an intercept will not be included in the model.
- wt : None or float, array-like, shape , optional
Note: the required length for this argument is determined as follows: if : ; otherwise: .
If not None, wt must contain the diagonal elements of the weight matrix.
If weights are not provided then wt must be set to None.
When ‘Drop Zero Weights’ is enabled, an observation with zero weight is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. If residuals are requested via ‘Return Residuals’, the values of res will be set to zero for observations with zero weights.
Otherwise, all observations are included in the model and the effective number of observations is the total number of observations.
- b : None or float, array-like, shape , optional
If ‘Calculate Initial Values’ is disabled, b must hold initial estimates of the regression parameters for each quantile in tau. Otherwise, b need not be set.
- comm : None or dict, communication object, optional
Communication structure.
If not None, this argument must have been initialized by a prior call to optset().
- statecomm : None or dict, RNG communication object, optional, modified in place
RNG communication structure.
If bootstrap confidence limits are requested, this argument must have been initialized by a prior call to rand.init_repeat or rand.init_nonrepeat.
- io_manager : FileObjManager, optional
Manager for I/O in this routine.
- Returns
- df : float
The degrees of freedom given by $n - k$, where $n$ is the effective number of observations and $k$ is the rank of the cross-product matrix $X^T X$.
- b : float, ndarray, shape
b contains the estimates of the parameters of the regression model, estimated for each quantile in tau.
If an intercept is included, the first entry for each quantile contains the estimate corresponding to the intercept and the following entries contain the coefficients of the variates in dat whose entries in isx are nonzero, in order.
If no intercept is included, the entries for each quantile contain the coefficients of the variates in dat whose entries in isx are nonzero, in order.
- bl : None or float, ndarray, shape
If confidence limits are requested (see ‘Interval Method’), bl contains the lower limits of the confidence intervals for the parameter estimates returned in b, for each quantile in tau.
Otherwise, bl is not referenced.
The method used for calculating the interval is controlled by the options ‘Interval Method’ and ‘Bootstrap Interval Method’.
The size of the interval is controlled by the option ‘Significance Level’.
- bu : None or float, ndarray, shape
If confidence limits are requested (see ‘Interval Method’), bu contains the upper limits of the confidence intervals for the parameter estimates returned in b, for each quantile in tau.
Otherwise, bu is not referenced.
The method used for calculating the interval is controlled by the options ‘Interval Method’ and ‘Bootstrap Interval Method’.
The size of the interval is controlled by the option ‘Significance Level’.
- ch : None or float, ndarray, shape
Depending on the supplied options, will either not be referenced, hold an estimate of the upper triangular part of the covariance matrix, , or an estimate of the upper triangular parts of and .
If or , is not referenced.
If or and , is not referenced.
Otherwise, for and :
If , holds an estimate of the covariance between and .
If , holds an estimate of the th element of and holds an estimate of the th element of , for .
The method used for calculating and is controlled by the option ‘Interval Method’.
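- res : None or float, ndarray, shape
If residuals are requested via ‘Return Residuals’, res holds the (weighted) residuals for each observation and each quantile in tau.
If, in addition, weights are supplied and ‘Drop Zero Weights’ is enabled, the value of res will be set to zero for observations with zero weights.
If residuals are not requested, res is returned as None.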
- info : int, ndarray, shape
Each element of info holds additional information concerning the model fitting and confidence limit calculations for the corresponding quantile in tau.
Code
Warning
Model fitted and confidence limits (if requested) calculated successfully
The function did not converge. The returned values are based on the estimate at the last iteration. Try increasing ‘Iteration Limit’ whilst calculating the parameter estimates or relaxing the definition of convergence by increasing ‘Tolerance’.
A singular matrix was encountered during the optimization. The model was not fitted for this value of tau.
Some truncation occurred whilst calculating the confidence limits for this value of tau. See Algorithmic Details for details. The returned upper and lower limits may be narrower than specified.
The function did not converge whilst calculating the confidence limits. The returned limits are based on the estimate at the last iteration. Try increasing ‘Iteration Limit’.
Confidence limits for this value of tau could not be calculated. The returned upper and lower limits are set to a large positive and large negative value respectively as defined by the option ‘Big’.
It is possible for multiple warnings to be applicable to a single model.
In these cases the value returned in info is the sum of the corresponding individual nonzero warning codes.
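Because a single info value may be a sum of several warning codes, it can be split back into its parts if the codes are distinct powers of two. The sketch below assumes exactly that. The numeric codes 1, 2, 4, 8 and 16, and their assignment to the warnings in listing order, are assumptions (the codes are not shown in the table above), so they should be checked against the NAG Library document for g02qg. `decode_info` is a hypothetical helper name.

```python
# ASSUMED code values: powers of two in the order the warnings are listed.
ASSUMED_WARNINGS = {
    1: "parameter estimation did not converge",
    2: "singular matrix encountered; model not fitted",
    4: "truncation occurred in the confidence limits",
    8: "confidence limit calculation did not converge",
    16: "confidence limits could not be calculated",
}


def decode_info(value, warnings=ASSUMED_WARNINGS):
    """Return the warning texts whose (assumed) codes sum to `value`."""
    return [text for code, text in sorted(warnings.items()) if value & code]


for value in (0, 4, 5):
    print(value, decode_info(value))
```

A value of 0 decodes to an empty list, meaning the fit for that quantile completed without warnings.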
- Other Parameters
- ‘Band Width Alpha’ : float
Default
A multiplier used to construct the parameter used when calculating the Sheather–Hall bandwidth (see Notes), with . Here, is the ‘Significance Level’.
- ‘Band Width Method’ : str
Default
The method used to calculate the bandwidth used in the calculation of the asymptotic covariance matrix and if , or (see Notes).
- ‘Big’ : float
Default
This argument should be set to something larger than the biggest value supplied in dat and y.
- ‘Bootstrap Interval Method’ : str
Default
If ‘Interval Method’ selects the bootstrap, ‘Bootstrap Interval Method’ controls how the confidence intervals are calculated from the bootstrap estimates.
Student’s t intervals are calculated. That is, the covariance matrix is calculated from the bootstrap estimates and the limits are calculated from it using a percentage point of the Student’s t distribution, with degrees of freedom based on the effective number of observations and the interval size given by the option ‘Significance Level’.
Quantile intervals are calculated. That is, the upper and lower limits are taken as sample quantiles of the bootstrap estimates, as calculated using stat.quantiles.
- ‘Bootstrap Iterations’ : int
Default
The number of bootstrap samples used to calculate the confidence limits and covariance matrix (if requested) when ‘Interval Method’ selects the bootstrap.
- ‘Bootstrap Monitoring’ : str
Default
If this option is enabled and ‘Interval Method’ selects the bootstrap, the parameter estimates for each of the bootstrap samples are displayed. This information is sent to the unit number specified by ‘Unit Number’.
- ‘Calculate Initial Values’ : str
Default
If this option is enabled, the initial values for the regression parameters are calculated from the data. Otherwise they must be supplied in b.
- ‘Defaults’ : valueless
This special keyword is used to reset all options to their default values.
- ‘Drop Zero Weights’ : str
Default
If a weighted regression is being performed and this option is enabled, observations with zero weight are dropped from the analysis. Otherwise such observations are included.
- ‘Epsilon’ : float
Default
, the tolerance used when calculating the covariance matrix and the initial values for and . For additional details see Calculation of Covariance Matrix and Additional information respectively.
- ‘Interval Method’ : str
Default
The value of ‘Interval Method’ controls whether confidence limits are returned in bl and bu and how these limits are calculated. This argument also controls how the matrices returned in ch are calculated.
No limits are calculated and bl, bu and ch are not referenced.
The Powell Sandwich method with a Gaussian kernel is used.
The Hendricks–Koenker Sandwich is used.
The errors are assumed to be independent and identically distributed.
A bootstrap method is used, where sampling is done on the paired data (the independent and dependent variables are sampled together). The number of bootstrap samples is controlled by the argument ‘Bootstrap Iterations’ and the type of interval constructed from the bootstrap samples is controlled by ‘Bootstrap Interval Method’.
- ‘Iteration Limit’ : int
Default
The maximum number of iterations to be performed by the interior point optimization algorithm.
- ‘Matrix Returned’ : str
Default
The value of ‘Matrix Returned’ controls the type of matrices returned in ch. If ‘Interval Method’ requests no confidence limits, this argument is ignored and ch is not referenced. Otherwise:
No matrices are returned and ch is not referenced.
The covariance matrices are returned.
If or , the matrices and are returned. Otherwise no matrices are returned and is not referenced.
The matrices returned are calculated as described in Notes, with the algorithm used specified by ‘Interval Method’. In the case of the covariance matrix is calculated directly from the bootstrap estimates.
- ‘Monitoring’ : str
Default
If this option is enabled, the duality gap is displayed at each iteration of the interior point optimization algorithm. In addition, the final estimates for the regression parameters are also displayed.
The monitoring information is sent to the unit number specified by ‘Unit Number’.
- ‘QR Tolerance’ : float
Default
The tolerance used to calculate the rank, , of the cross-product matrix, . Letting be the orthogonal matrix obtained from a decomposition of , then the rank is calculated by comparing with .
If the cross-product matrix is rank deficient, the parameter estimates for the columns with the smallest values of are set to zero, along with the corresponding entries in , and , if returned. This is equivalent to dropping these variables from the model. Details on the decomposition used can be found in
lapackeig.dgeqp3.
- ‘Return Residuals’ : str
Default
If this option is enabled, the residuals are returned in res. Otherwise res is not referenced.
- ‘Sigma’ : float
Default
The scaling factor used when calculating the affine scaling step size.
- ‘Significance Level’ : float
Default
The size of the confidence interval whose limits are returned in bl and bu.
- ‘Tolerance’ : float
Default
Convergence tolerance. The optimization is deemed to have converged if the duality gap is less than ‘Tolerance’ (see Update and convergence).
- ‘Unit Number’ : int
Default
The unit number to which any monitoring information is sent.
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: or , for all .
- (errno )
On entry, and .
Constraint: .
- (errno )
On entry, is not consistent with or : , .
- (errno )
On entry, .
Constraint: , for all .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: where is the machine precision returned by machine.precision, for all .
- (errno )
On entry, either the option arrays have not been initialized or they have been corrupted.
- (errno )
On entry, the statecomm[‘state’] vector has been corrupted or not initialized.
- Warns
- NagAlgorithmicWarning
- (errno )
A potential problem occurred whilst fitting the model(s).
Additional information has been returned in info.
- Notes
Given a vector of $n$ observed values, $y = \{y_i : i = 1, 2, \ldots, n\}$, an $n \times p$ design matrix $X$, a column vector, $x_i$, of length $p$ holding the $i$th row of $X$ and a quantile $\tau \in (0, 1)$, quantile_linreg estimates the $p$-element vector $\beta$ as the solution to
$$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^T \beta\right)$$
where $\rho_\tau$ is the piecewise linear loss function $\rho_\tau(z) = z\left(\tau - I(z < 0)\right)$, and $I(z < 0)$ is an indicator function taking the value $1$ if $z < 0$ and $0$ otherwise. Weights can be incorporated by replacing $X$ and $y$ with $WX$ and $Wy$ respectively, where $W$ is an $n \times n$ diagonal matrix. Observations with zero weights can either be included or excluded from the analysis; this is in contrast to least squares regression where such observations do not contribute to the objective function and are, therefore, always dropped.
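The objective described above can be written out in a few lines of plain Python. This sketch uses the standard check-loss definition, rho_tau(z) = z * (tau - I(z < 0)), and applies weights by scaling each residual, which is equivalent to replacing the design matrix and the observations by their weighted versions. The function names `check_loss` and `objective` are illustrative only and are not part of the library; the routine itself minimizes this objective with an interior point method rather than by direct evaluation.

```python
def check_loss(z, tau):
    """Piecewise linear loss: z * (tau - 1) for z < 0, z * tau otherwise."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def objective(beta, X, y, tau, weights=None):
    """Sum of check losses over the (optionally weighted) residuals."""
    total = 0.0
    for i, (row, yi) in enumerate(zip(X, y)):
        w = 1.0 if weights is None else weights[i]
        resid = w * yi - sum(w * xij * bj for xij, bj in zip(row, beta))
        total += check_loss(resid, tau)
    return total


# Intercept-only model: at tau = 0.5 the median beats the mean.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
X = [[1.0] for _ in y]
print(objective([3.0], X, y, 0.5))    # evaluated at the sample median
print(objective([22.0], X, y, 0.5))   # evaluated at the sample mean
```

With a zero weight on the last observation its residual is scaled to zero, so it still appears in the sum but contributes nothing; whether such observations are dropped entirely is what ‘Drop Zero Weights’ controls.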
quantile_linreg uses the interior point algorithm of Portnoy and Koenker (1997), described briefly in Algorithmic Details, to obtain the parameter estimates for a given value of $\tau$.
Under the assumption of Normally distributed errors, Koenker (2005) shows that the limiting covariance matrix of the parameter estimates has the form
where and is a function of , as described below. Given an estimate of the covariance matrix, , lower () and upper () limits for an confidence interval can be calculated for each of the parameters, via
where is the percentile of the Student’s distribution with degrees of freedom, where is the rank of the cross-product matrix .
Four methods for estimating the covariance matrix, , are available:
Independent, identically distributed (IID) errors
Under an assumption of IID errors the asymptotic relationship for simplifies to
where is the sparsity function.
quantile_linreg estimates the sparsity function from the residuals and a bandwidth.
Powell Sandwich
Powell (1991) suggested estimating the matrix by a kernel estimator of the form
where is a kernel function and satisfies and . When the Powell method is chosen,
quantile_linreg uses a Gaussian kernel and sets
where is a bandwidth, and are, respectively, the standard deviation and the and quantiles for the residuals, .
Hendricks–Koenker Sandwich
Koenker (2005) suggested estimating the matrix using
where is a bandwidth and denotes the parameter estimates obtained from a quantile regression using the th quantile. Similarly with .
Bootstrap
The last method uses bootstrapping to either estimate a covariance matrix or obtain confidence intervals for the parameter estimates directly. This method, therefore, does not assume Normally distributed errors. Samples of size are taken from the paired data (i.e., the independent and dependent variables are sampled together). A quantile regression is then fitted to each sample resulting in a series of bootstrap estimates for the model parameters, . A covariance matrix can then be calculated directly from this series of values. Alternatively, confidence limits, and , can be obtained directly from the and sample quantiles of the bootstrap estimates.
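The paired bootstrap described above can be sketched with the standard library alone. Everything here is illustrative: `fit` stands in for whatever quantile regression fit is applied to each resample (the example uses an intercept-only median), the helper names are hypothetical, the quantile levels (1 - level) / 2 and (1 + level) / 2 are an assumption about how a two-sided interval of a given size is formed, and the interpolation rule in `sample_quantile` is just one common convention that may differ from stat.quantiles.

```python
import random


def paired_resample(X, y, rng):
    """Draw n (row, response) pairs with replacement, keeping pairs intact."""
    n = len(y)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


def sample_quantile(values, q):
    """Linear-interpolation sample quantile (one of several conventions)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def bootstrap_limits(X, y, fit, n_boot, level, seed=0):
    """Percentile limits from n_boot paired-bootstrap refits."""
    rng = random.Random(seed)  # seeded, so the analysis is repeatable
    draws = [fit(*paired_resample(X, y, rng)) for _ in range(n_boot)]
    return (sample_quantile(draws, (1 - level) / 2),
            sample_quantile(draws, (1 + level) / 2))


def fit_median(X, y):
    """Stand-in estimator: intercept-only quantile regression at tau = 0.5."""
    return sample_quantile(y, 0.5)


y = [float(v) for v in range(1, 22)]
X = [[1.0] for _ in y]
lower, upper = bootstrap_limits(X, y, fit_median, n_boot=200, level=0.95)
print(lower, upper)
```

Seeding the generator plays the role that rand.init_repeat plays for the routine; an unseeded generator would correspond to a non-repeatable analysis.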
Further details of the algorithms used to calculate the covariance matrices can be found in Algorithmic Details.
All three asymptotic estimates of the covariance matrix require a bandwidth. Two alternative methods for determining this are provided:
Sheather–Hall
for a user-supplied value ,
Bofinger
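For orientation, the two bandwidth rules can be sketched using their textbook forms, as commonly given in the quantile regression literature (for example Koenker, 2005). These forms are an assumption here: the exact expressions used by quantile_linreg are not reproduced in this text, and in particular the way the user-supplied value enters the Sheather–Hall rule (see ‘Band Width Alpha’) may differ from the `alpha` argument below. The function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard Normal distribution


def bofinger_bandwidth(tau, n):
    """Textbook Bofinger bandwidth; shrinks like n**(-1/5)."""
    x0 = _STD.inv_cdf(tau)   # Normal quantile at tau
    f0 = _STD.pdf(x0)        # Normal density at that quantile
    return n ** (-1 / 5) * ((4.5 * f0 ** 4) / (2 * x0 ** 2 + 1) ** 2) ** (1 / 5)


def hall_sheather_bandwidth(tau, n, alpha=0.05):
    """Textbook Hall-Sheather bandwidth; shrinks like n**(-1/3).

    `alpha` is an assumed tuning value entering through a Normal quantile.
    """
    x0 = _STD.inv_cdf(tau)
    f0 = _STD.pdf(x0)
    z = _STD.inv_cdf(1 - alpha / 2)
    return (n ** (-1 / 3) * z ** (2 / 3)
            * ((1.5 * f0 ** 2) / (2 * x0 ** 2 + 1)) ** (1 / 3))


for n in (100, 1000):
    print(n, bofinger_bandwidth(0.5, n), hall_sheather_bandwidth(0.5, n))
```

Both bandwidths shrink as the number of observations grows and are largest near the median.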
quantile_linreg allows options to be supplied via the comm[‘iopts’] and comm[‘opts’] arrays (see Other Parameters for details of the available options). Prior to calling quantile_linreg the option arrays, comm[‘iopts’] and comm[‘opts’], must be initialized by calling optset(). If bootstrap confidence limits are required then one of the random number initialization functions rand.init_repeat (for a repeatable analysis) or rand.init_nonrepeat (for an unrepeatable analysis) must also have been previously called.
- References
Koenker, R, 2005, Quantile Regression, Econometric Society Monographs, Cambridge University Press, New York
Mehrotra, S, 1992, On the implementation of a primal-dual interior point method, SIAM J. Optim. (2), 575–601
Nocedal, J and Wright, S J, 2006, Numerical Optimization, (2nd Edition), Springer Series in Operations Research, Springer, New York
Portnoy, S and Koenker, R, 1997, The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute error estimators, Statistical Science (4), 279–300
Powell, J L, 1991, Estimation of monotonic regression models under quantile restrictions, Nonparametric and Semiparametric Methods in Econometrics, Cambridge University Press, Cambridge