PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_correg_quantile_linreg (g02qg)
Purpose
nag_correg_quantile_linreg (g02qg) performs a multiple linear quantile regression. Parameter estimates and, if required, confidence limits, covariance matrices and residuals are calculated.
nag_correg_quantile_linreg (g02qg) may be used to perform a weighted quantile regression. A simplified interface for
nag_correg_quantile_linreg (g02qg) is provided by
nag_correg_quantile_linreg_easy (g02qf).
Syntax
[
df,
b,
bl,
bu,
ch,
res,
state,
info,
ifail] = g02qg(
sorder,
intcpt,
weight,
dat,
isx,
y,
tau,
b,
iopts,
opts,
state, 'n',
n, 'm',
m, 'ip',
ip, 'wt',
wt, 'ntau',
ntau)
[
df,
b,
bl,
bu,
ch,
res,
state,
info,
ifail] = nag_correg_quantile_linreg(
sorder,
intcpt,
weight,
dat,
isx,
y,
tau,
b,
iopts,
opts,
state, 'n',
n, 'm',
m, 'ip',
ip, 'wt',
wt, 'ntau',
ntau)
Description
Given a vector of
observed values,
, an
design matrix
, a column vector,
, of length
holding the
th row of
and a quantile
,
nag_correg_quantile_linreg (g02qg) estimates the
-element vector
as the solution to
where
is the piecewise linear loss function
, and
is an indicator function taking the value
if
and
otherwise. Weights can be incorporated by replacing
and
with
and
respectively, where
is an
diagonal matrix. Observations with zero weights can either be included or excluded from the analysis; this is in contrast to least squares regression where such observations do not contribute to the objective function and are therefore always dropped.
nag_correg_quantile_linreg (g02qg) uses the interior point algorithm of
Portnoy and Koenker (1997), described briefly in
Algorithmic Details, to obtain the parameter estimates
, for a given value of
.
Under the assumption of Normally distributed errors,
Koenker (2005) shows that the limiting covariance matrix of
has the form
where
and
is a function of
, as described below. Given an estimate of the covariance matrix,
, lower (
) and upper (
) limits for an
confidence interval can be calculated for each of the
parameters, via
where
is the
percentile of the Student's
distribution with
degrees of freedom, where
is the rank of the cross-product matrix
.
Four methods for estimating the covariance matrix,
, are available:
(i) |
Independent, identically distributed (IID) errors
Under an assumption of IID errors the asymptotic relationship for simplifies to
where is the sparsity function. nag_correg_quantile_linreg (g02qg) estimates from the residuals,
and a bandwidth .
|
(ii) |
Powell Sandwich
Powell (1991) suggested estimating the matrix by a kernel estimator of the form
where is a kernel function and satisfies and . When the Powell method is chosen, nag_correg_quantile_linreg (g02qg) uses a Gaussian kernel (i.e., ) and sets
where is a bandwidth, and are, respectively, the standard deviation and the and quantiles for the residuals, .
|
(iii) |
Hendricks–Koenker Sandwich
Koenker (2005) suggested estimating the matrix using
where is a bandwidth and denotes the parameter estimates obtained from a quantile regression using the th quantile. Similarly with .
|
(iv) |
Bootstrap
The last method uses bootstrapping to either estimate a covariance matrix or obtain confidence intervals for the parameter estimates directly. This method therefore does not assume Normally distributed errors. Samples of size are taken from the paired data (i.e., the independent and dependent variables are sampled together). A quantile regression is then fitted to each sample resulting in a series of bootstrap estimates for the model parameters, . A covariance matrix can then be calculated directly from this series of values. Alternatively, confidence limits, and , can be obtained directly from the and sample quantiles of the bootstrap estimates. |
Further details of the algorithms used to calculate the covariance matrices can be found in
Algorithmic Details.
All three asymptotic estimates of the covariance matrix require a bandwidth,
. Two alternative methods for determining this are provided:
(i) |
Sheather–Hall
for a user-supplied value ,
|
(ii) |
Bofinger
|
nag_correg_quantile_linreg (g02qg) allows optional arguments to be supplied via the
iopts and
opts arrays (see
Optional Parameters for details of the available options).
Prior
to calling
nag_correg_quantile_linreg (g02qg) the optional parameter arrays,
iopts and
opts
must be initialized by calling
nag_correg_optset (g02zk) with
optstr set to
(see
Optional Parameters for details on the available options). If bootstrap confidence limits are required (
) then one of the random number initialization functions
nag_rand_init_repeat (g05kf) (for a repeatable analysis) or
nag_rand_init_nonrepeat (g05kg) (for an unrepeatable analysis) must also have been previously called.
References
Koenker R (2005) Quantile Regression Econometric Society Monographs, Cambridge University Press, New York
Mehrotra S (1992) On the implementation of a primal-dual interior point method SIAM J. Optim. 2 575–601
Nocedal J and Wright S J (1999) Numerical Optimization Springer Series in Operations Research, Springer, New York
Portnoy S and Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute error estimators Statistical Science 4 279–300
Powell J L (1991) Estimation of monotonic regression models under quantile restrictions Nonparametric and Semiparametric Methods in Econometrics Cambridge University Press, Cambridge
Parameters
Compulsory Input Parameters
- 1:
– int64int32nag_int scalar
-
Determines the storage order of variates supplied in
dat.
Constraint:
or .
- 2:
– string (length ≥ 1)
-
Indicates whether an intercept will be included in the model. The intercept is included by adding a column of ones as the first column in the design matrix,
.
- An intercept will be included in the model.
- An intercept will not be included in the model.
Constraint:
or .
- 3:
– string (length ≥ 1)
-
Indicates if weights are to be used.
- A weighted regression model is fitted to the data using weights supplied in array wt.
- An unweighted regression model is fitted to the data and array wt is not referenced.
Constraint:
or .
- 4:
– double array
-
The first dimension,
, of the array
dat must satisfy
- if , ;
- otherwise .
The second dimension of the array
dat must be at least
if
and at least
if
.
The
th value for the
th variate, for
and
, must be supplied in
- if , and
- if .
The design matrix
is constructed from
dat,
isx and
intcpt.
- 5:
– int64int32nag_int array
-
Indicates which independent variables are to be included in the model.
- The th variate, supplied in dat, is not included in the regression model.
- The th variate, supplied in dat, is included in the regression model.
Constraints:
- or , for ;
- if , exactly values of isx must be set to ;
- if , exactly ip values of isx must be set to .
- 6:
– double array
-
, the observations on the dependent variable.
- 7:
– double array
-
The vector of quantiles of interest. A separate model is fitted to each quantile.
Constraint:
where
is the
machine precision returned by
nag_machine_precision (x02aj), for
.
- 8:
– double array
-
If
,
must hold an initial estimates for
, for
and
. If
,
b need not be set.
- 9:
– int64int32nag_int array
-
Note: the dimension of this array is dictated by the requirements of associated functions that must have been previously called. This array
must be the same array passed as argument
iopts in the previous call to
nag_correg_optset (g02zk).
Optional parameter array, as initialized by a call to
nag_correg_optset (g02zk).
- 10:
– double array
-
Note: the dimension of this array is dictated by the requirements of associated functions that must have been previously called. This array
must be the same array passed as argument
opts in the previous call to
nag_correg_optset (g02zk).
Optional parameter array, as initialized by a call to
nag_correg_optset (g02zk).
- 11:
– int64int32nag_int array
-
Note: the actual argument supplied
must be the array
state supplied to the initialization routines
nag_rand_init_repeat (g05kf) or
nag_rand_init_nonrepeat (g05kg).
If
,
state contains information about the selected random number generator. Otherwise
state is not referenced.
Optional Input Parameters
- 1:
– int64int32nag_int scalar
-
Default:
the dimension of the array
y and the first dimension of the array
dat. (An error is raised if these dimensions are not equal.)
The total number of observations in the dataset. If no weights are supplied, or no zero weights are supplied or observations with zero weights are included in the model then . Otherwise the number of observations with zero weights.
Constraint:
.
- 2:
– int64int32nag_int scalar
-
Default:
the dimension of the array
isx and the first dimension of the array
dat. (An error is raised if these dimensions are not equal.)
, the total number of variates in the dataset.
Constraint:
.
- 3:
– int64int32nag_int scalar
-
Default:
the first dimension of the array
b.
, the number of independent variables in the model, including the intercept, see
intcpt, if present.
Constraints:
- ;
- if , ;
- if , .
- 4:
– double array
-
The dimension of the array
wt
must be at least
if
If
,
wt must contain the diagonal elements of the weight matrix
. Otherwise
wt is not referenced.
When
- If , the th observation is not included in the model, in which case the effective number of observations, , is the number of observations with nonzero weights. If , the values of res will be set to zero for observations with zero weights.
- All observations are included in the model and the effective number of observations is n, i.e., .
Constraints:
- If , , for ;
- The effective number of observations .
- 5:
– int64int32nag_int scalar
-
Default:
the dimension of the array
tau and the second dimension of the array
b. (An error is raised if these dimensions are not equal.)
The number of quantiles of interest.
Constraint:
.
Output Parameters
- 1:
– double scalar
-
The degrees of freedom given by , where is the effective number of observations and is the rank of the cross-product matrix .
- 2:
– double array
-
, for
, contains the estimates of the parameters of the regression model,
, estimated for
.
If
,
will contain the estimate corresponding to the intercept and
will contain the coefficient of the
th variate contained in
dat, where
is the
th nonzero value in the array
isx.
If
,
will contain the coefficient of the
th variate contained in
dat, where
is the
th nonzero value in the array
isx.
- 3:
– double array
-
The second dimension of the array
bl will be
if
.
If
,
contains the lower limit of an
confidence interval for
, for
and
.
If
,
bl is not referenced.
The method used for calculating the interval is controlled by the optional parameters
Interval Method and
Bootstrap Interval Method. The size of the interval,
, is controlled by the optional parameter
Significance Level.
- 4:
– double array
-
The second dimension of the array
bu will be
if
.
If
,
contains the upper limit of an
confidence interval for
, for
and
.
If
,
bu is not referenced.
The method used for calculating the interval is controlled by the optional parameters
Interval Method and
Bootstrap Interval Method. The size of the interval,
is controlled by the optional parameter
Significance Level.
- 5:
– double array
-
The last dimension of the array
ch will be
if
and
and at least
if
,
or
and
Depending on the supplied optional parameters,
ch will either not be referenced, hold an estimate of the upper triangular part of the covariance matrix,
, or an estimate of the upper triangular parts of
and
.
If
or
,
ch is not referenced.
If
or
and
,
ch is not referenced.
Otherwise, for
and
:
- If , holds an estimate of the covariance between and .
- If , holds an estimate of the th element of and holds an estimate of the th element of , for .
The method used for calculating
and
is controlled by the optional parameter
Interval Method.
- 6:
– double array
-
The second dimension of the array
res will be
if
.
If
,
holds the (weighted) residuals,
, for
, for
and
.
If
and
, the value of
res will be set to zero for observations with zero weights.
If
,
res is not referenced.
- 7:
– int64int32nag_int array
-
- 8:
– int64int32nag_int array
-
holds additional information concerning the model fitting and confidence limit calculations when
.
Code | Warning |
| Model fitted and confidence limits (if requested) calculated successfully |
| The function did not converge. The returned values are based on the estimate at the last iteration. Try increasing Iteration Limit whilst calculating the parameter estimates or relaxing the definition of convergence by increasing Tolerance. |
| A singular matrix was encountered during the optimization. The model was not fitted for this value of . |
| Some truncation occurred whilst calculating the confidence limits for this value of . See Algorithmic Details for details. The returned upper and lower limits may be narrower than specified. |
| The function did not converge whilst calculating the confidence limits. The returned limits are based on the estimate at the last iteration. Try increasing Iteration Limit. |
| Confidence limits for this value of could not be calculated. The returned upper and lower limits are set to a large positive and large negative value respectively as defined by the optional parameter Big. |
It is possible for multiple warnings to be applicable to a single model. In these cases the value returned in
info is the sum of the corresponding individual nonzero warning codes.
- 9:
– int64int32nag_int scalar
unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
-
-
Constraint: or .
-
-
On entry, was an illegal value.
-
-
On entry,
weight had an illegal value.
-
-
Constraint: .
-
-
Constraint: .
-
-
Constraint: .
-
-
Constraint: .
-
-
Constraint: or for all .
-
-
Constraint: .
-
-
On entry,
ip is not consistent with
isx or
intcpt.
-
-
Constraint: for all .
-
-
Constraint: .
-
-
Constraint: .
-
-
On entry is invalid.
-
-
On entry, either the option arrays have not been initialized or they have been corrupted.
-
-
On entry,
state vector has been corrupted or not initialized.
-
-
A potential problem occurred whilst fitting the model(s).
Additional information has been returned in
info.
-
An unexpected error has been triggered by this routine. Please
contact
NAG.
-
Your licence key may have expired or may not have been installed correctly.
-
Dynamic memory allocation failed.
Accuracy
Not applicable.
Further Comments
nag_correg_quantile_linreg (g02qg) allocates internally approximately the following elements of double storage:
. If then a further
elements are required, and this increases by
if . Where possible, any user-supplied output arrays are used as workspace and so the amount actually allocated may be less. If , , and an internal copy of the input data is avoided and the amount of locally allocated memory is reduced by .
Example
A quantile regression model is fitted to Engels 1857 study of household expenditure on food. The model regresses the dependent variable, household food expenditure, against two explanatory variables, a column of ones and household income. The model is fit for five different values of and the covariance matrix is estimated assuming Normal IID errors. Both the covariance matrix and the residuals are returned.
Open in the MATLAB editor:
g02qg_example
function g02qg_example
fprintf('g02qg example results\n\n');
sorder = int64(1);
c1 = 'y';
weight = 'u';
dat = [ 420.1577; 541.4117; 901.1575; 639.0802; 750.8756; 945.7989;
829.3979; 979.1648; 1309.8789; 1492.3987; 502.8390; 616.7168;
790.9225; 555.8786; 713.4412; 838.7561; 535.0766; 596.4408;
924.5619; 487.7583; 692.6397; 997.8770; 506.9995; 654.1587;
933.9193; 433.6813; 587.5962; 896.4746; 454.4782; 584.9989;
800.7990; 502.4369; 713.5197; 906.0006; 880.5969; 796.8289;
854.8791; 1167.3716; 523.8000; 670.7792; 377.0584; 851.5430;
1121.0937; 625.5179; 805.5377; 558.5812; 884.4005; 1257.4989;
2051.1789; 1466.3330; 730.0989; 2432.3910; 940.9218; 1177.8547;
1222.5939; 1519.5811; 687.6638; 953.1192; 953.1192; 953.1192;
939.0418; 1283.4025; 1511.5789; 1342.5821; 511.7980; 689.7988;
1532.3074; 1056.0808; 387.3195; 387.3195; 410.9987; 499.7510;
832.7554; 614.9986; 887.4658; 1595.1611; 1807.9520; 541.2006;
1057.6767; 800.7990; 1245.6964; 1201.0002; 634.4002; 956.2315;
1148.6010; 1768.8236; 2822.5330; 922.3548; 2293.1920; 627.4726;
889.9809; 1162.2000; 1197.0794; 530.7972; 1142.1526; 1088.0039;
484.6612; 1536.0201; 678.8974; 671.8802; 690.4683; 860.6948;
873.3095; 894.4598; 1148.6470; 926.8762; 839.0414; 829.4974;
1264.0043; 1937.9771; 698.8317; 920.4199; 1897.5711; 891.6824;
889.6784; 1221.4818; 544.5991; 1031.4491; 1462.9497; 830.4353;
975.0415; 1337.9983; 867.6427; 725.7459; 989.0056; 1525.0005;
672.1960; 923.3977; 472.3215; 590.7601; 831.7983; 1139.4945;
507.5169; 576.1972; 696.5991; 650.8180; 949.5802; 497.1193;
570.1674; 724.7306; 408.3399; 638.6713; 1225.7890; 715.3701;
800.4708; 975.5974; 1613.7565; 608.5019; 958.6634; 835.9426;
1024.8177; 1006.4353; 726.0000; 494.4174; 776.5958; 415.4407;
581.3599; 643.3571; 2551.6615; 1795.3226; 1165.7734; 815.6212;
1264.2066; 1095.4056; 447.4479; 1178.9742; 975.8023; 1017.8522;
423.8798; 558.7767; 943.2487; 1348.3002; 2340.6174; 587.1792;
1540.9741; 1115.8481; 1044.6843; 1389.7929; 2497.7860; 1585.3809;
1862.0438; 2008.8546; 697.3099; 571.2517; 598.3465; 461.0977;
977.1107; 883.9849; 718.3594; 543.8971; 1587.3480; 4957.8130;
969.6838; 419.9980; 561.9990; 689.5988; 1398.5203; 820.8168;
875.1716; 1392.4499; 1256.3174; 1362.8590; 1999.2552; 1209.4730;
1125.0356; 1827.4010; 1014.1540; 880.3944; 873.7375; 951.4432;
473.0022; 601.0030; 713.9979; 829.2984; 959.7953; 1212.9613;
958.8743; 1129.4431; 1943.0419; 539.6388; 463.5990; 562.6400;
736.7584; 1415.4461; 2208.7897; 636.0009; 759.4010; 1078.8382;
748.6413; 987.6417; 788.0961; 1020.0225; 1230.9235; 440.5174;
743.0772];
y = [ 255.8394; 310.9587; 485.6800; 402.9974; 495.5608; 633.7978;
630.7566; 700.4409; 830.9586; 815.3602; 338.0014; 412.3613;
520.0006; 452.4015; 512.7201; 658.8395; 392.5995; 443.5586;
640.1164; 333.8394; 466.9583; 543.3969; 317.7198; 424.3209;
518.9617; 338.0014; 419.6412; 476.3200; 386.3602; 423.2783;
503.3572; 354.6389; 497.3182; 588.5195; 654.5971; 550.7274;
528.3770; 640.4813; 401.3204; 435.9990; 276.5606; 588.3488;
664.1978; 444.8602; 462.8995; 377.7792; 553.1504; 810.8962;
1067.9541; 1049.8788; 522.7012; 1424.8047; 517.9196; 830.9586;
925.5795; 1162.0024; 383.4580; 621.1173; 621.1173; 621.1173;
548.6002; 745.2353; 837.8005; 795.3402; 418.5976; 508.7974;
883.2780; 742.5276; 242.3202; 242.3202; 266.0010; 408.4992;
614.7588; 385.3184; 515.6200; 1138.1620; 993.9630; 299.1993;
750.3202; 572.0807; 907.3969; 811.5776; 427.7975; 649.9985;
860.6002; 1143.4211; 2032.6792; 590.6183; 1570.3911; 483.4800;
600.4804; 696.2021; 774.7962; 390.5984; 612.5619; 708.7622;
296.9192; 1071.4627; 496.5976; 503.3974; 357.6411; 430.3376;
624.6990; 582.5413; 580.2215; 543.8807; 588.6372; 627.9999;
712.1012; 968.3949; 482.5816; 593.1694; 1033.5658; 693.6795;
693.6795; 761.2791; 361.3981; 628.4522; 771.4486; 757.1187;
821.5970; 1022.3202; 679.4407; 538.7491; 679.9981; 977.0033;
561.2015; 728.3997; 372.3186; 361.5210; 620.8006; 819.9964;
360.8780; 395.7608; 442.0001; 404.0384; 670.7993; 297.5702;
353.4882; 383.9376; 284.8008; 431.1000; 801.3518; 448.4513;
577.9111; 570.5210; 865.3205; 444.5578; 680.4198; 576.2779;
708.4787; 734.2356; 433.0010; 327.4188; 485.5198; 305.4390;
468.0008; 459.8177; 863.9199; 831.4407; 534.7610; 392.0502;
934.9752; 813.3081; 263.7100; 769.0838; 630.5863; 645.9874;
319.5584; 348.4518; 614.5068; 662.0096; 1504.3708; 406.2180;
692.1689; 588.1371; 511.2609; 700.5600; 1301.1451; 879.0660;
912.8851; 1509.7812; 484.0605; 399.6703; 444.1001; 248.8101;
527.8014; 500.6313; 436.8107; 374.7990; 726.3921; 1827.2000;
523.4911; 334.9998; 473.2009; 581.2029; 929.7540; 591.1974;
637.5483; 674.9509; 776.7589; 959.5170; 1250.9643; 737.8201;
810.6772; 983.0009; 708.8968; 633.1200; 631.7982; 608.6419;
300.9999; 377.9984; 397.0015; 588.5195; 681.7616; 807.3603;
696.8011; 811.1962; 1305.7201; 442.0001; 353.6013; 468.0008;
526.7573; 890.2390; 1318.8033; 331.0005; 416.4015; 596.8406;
429.0399; 619.6408; 400.7990; 775.0209; 772.7611; 306.5191;
522.6019];
isx = [int64(1)];
tau = [0.10; 0.25; 0.50; 0.75; 0.90];
state = zeros(1, 1, 'int64');
ip = 2;
b = zeros(2, 5);
iopts = zeros(100, 1, 'int64');
opts = zeros(100, 1);
[iopts, opts, ifail] = g02zk( ...
'Initialize = g02qg', iopts, opts);
[iopts, opts, ifail] = g02zk( ...
'Return Residuals = Yes', iopts, opts);
[iopts, opts, ifail] = g02zk( ...
'Matrix Returned = Covariance', iopts, opts);
[iopts, opts, ifail] = g02zk( ...
'Interval Method = IID', iopts, opts);
[df, b, bl, bu, ch, res, state, info, ifail] = ...
g02qg( ...
sorder, c1, weight, dat, isx, y, tau, b, iopts, opts, state);
t = '\tau';
fig1 = figure;
hold on;
plot(dat,y,'+r');
tt{1} = 'data';
for l=1:numel(tau)
fprintf('\nQuantile: %6.3f\n\n', tau(l));
fprintf(' Lower Parameter Upper\n');
fprintf(' Limit Estimate Limit\n');
for j=1:2
fprintf('%3d %7.3f %7.3f %7.3f\n', j, bl(j,l), b(j,l), bu(j,l));
end
fprintf('\nCovariance matrix\n');
for i=1:ip
fprintf('%10.3e ', ch(1:i, i, l));
fprintf('\n');
end
fprintf('\n');
plot([0 (2000-b(1,l))/b(2,l)],[b(1,l) 2000]);
tt{l+1} = sprintf('%s = %4.2f',t,tau(l));
end
legend(tt,'Location','SouthEast')
xlabel('Household income');
ylabel('Household food expenditure');
title({'Quantile Regression', ...
' Study of Household Expenditure on Food', ...
'Engels 1857'});
axis([0 5000 0 2000]);
hold off;
if (numel(res) > 0)
fprintf('First 10 Residuals\n');
fprintf(' Quantile\n');
fprintf('Obs. %6.3f %6.3f %6.3f %6.3f %6.3f\n', tau);
for i=1:10
fprintf(' %3d %10.5f %10.5f %10.5f %10.5f %10.5f\n', i, res(i, 1:5));
end
else
fprintf('Residuals not returned\n');
end
g02qg example results
Quantile: 0.100
Lower Parameter Upper
Limit Estimate Limit
1 74.946 110.142 145.337
2 0.370 0.402 0.433
Covariance matrix
3.191e+02
-2.541e-01 2.587e-04
Quantile: 0.250
Lower Parameter Upper
Limit Estimate Limit
1 64.232 95.483 126.735
2 0.446 0.474 0.502
Covariance matrix
2.516e+02
-2.004e-01 2.039e-04
Quantile: 0.500
Lower Parameter Upper
Limit Estimate Limit
1 55.399 81.482 107.566
2 0.537 0.560 0.584
Covariance matrix
1.753e+02
-1.396e-01 1.421e-04
Quantile: 0.750
Lower Parameter Upper
Limit Estimate Limit
1 41.372 62.396 83.421
2 0.625 0.644 0.663
Covariance matrix
1.139e+02
-9.068e-02 9.230e-05
Quantile: 0.900
Lower Parameter Upper
Limit Estimate Limit
1 26.829 67.351 107.873
2 0.650 0.686 0.723
Covariance matrix
4.230e+02
-3.369e-01 3.429e-04
First 10 Residuals
Quantile
Obs. 0.100 0.250 0.500 0.750 0.900
1 -23.10718 -38.84219 -61.00711 -77.14462 -99.86551
2 -16.70358 -41.20981 -73.81193 -100.11463 -127.96277
3 13.48419 -37.04518 -100.61322 -157.07478 -200.13481
4 36.09526 4.52393 -36.48522 -70.97584 -102.95390
5 83.74310 44.08476 -6.54743 -50.41028 -87.11562
6 143.66660 89.90799 22.49734 -37.70668 -82.65437
7 187.39134 142.05288 84.66171 34.21603 -5.80963
8 196.90443 140.73220 70.44951 7.44831 -38.91027
9 194.55254 114.45726 15.70761 -75.01861 -135.36147
10 105.62394 12.32563 -102.13482 -208.16238 -276.22311
Algorithmic Details
By the addition of slack variables the minimization
(1) can be reformulated into the linear programming problem
and its associated dual
where
is a vector of
s. Setting
gives the equivalent formulation
The algorithm introduced by
Portnoy and Koenker (1997) and used by
nag_correg_quantile_linreg (g02qg), uses the primal-dual formulation expressed in equations
(2) and
(4) along with a logarithmic barrier function to obtain estimates for
. The algorithm is based on the predictor-corrector algorithm of
Mehrotra (1992) and further details can be obtained from
Portnoy and Koenker (1997) and
Koenker (2005). A good description of linear programming, interior point algorithms, barrier functions and Mehrotra's predictor-corrector algorithm can be found in
Nocedal and Wright (1999).
Interior Point Algorithm
In this section a brief description of the interior point algorithm used to estimate the model parameters is presented. It should be noted that there are some differences in the equations given here – particularly
(7) and
(9) – compared to those given in
Koenker (2005) and
Portnoy and Koenker (1997).
Central path
Rather than optimize
(4) directly, an additional slack variable
is added and the constraint
is replaced with
, for
.
The positivity constraint on
and
is handled using the logarithmic barrier function
The primal-dual form of the problem is used giving the Lagrangian
whose central path is described by the following first order conditions
where
denotes the diagonal matrix with diagonal elements given by
, similarly with
and
. By enforcing the inequalities on
and
strictly, i.e.,
and
for all
we ensure that
and
are positive definite diagonal matrices and hence
and
exist.
Rather than applying Newton's method to the system of equations given in
(5) to obtain the step directions
and
, Mehrotra substituted the steps directly into
(5) giving the augmented system of equations
where
and
denote the diagonal matrices with diagonal elements given by
and
respectively.
Affine scaling step
The affine scaling step is constructed by setting
in
(5) and applying Newton's method to obtain an intermediate set of step directions
where
.
Initial step sizes for the primal (
) and dual (
) parameters are constructed as
where
is a user-supplied scaling factor. If
then the nonlinearity adjustment, described in
Nonlinearity Adjustment, is not made and the model parameters are updated using the current step size and directions.
Nonlinearity Adjustment
In the nonlinearity adjustment step a new estimate of
is obtained by letting
and estimating
as
This estimate, along with the nonlinear terms (
,
,
and
) from
(6) are calculated using the values of
and
obtained from the affine scaling step.
Given an updated estimate for
and the nonlinear terms the system of equations
are solved and updated values for
and
calculated.
Update and convergence
At each iteration the model parameters
are updated using step directions,
and step lengths
.
Convergence is assessed using the duality gap, that is, the differences between the objective function in the primal and dual formulations. For any feasible point
the duality gap can be calculated from equations
(2) and
(3) as
and the optimization terminates if the duality gap is smaller than the tolerance supplied in the optional parameter
Tolerance.
Additional information
Initial values are required for the parameters
and
. If not supplied by the user, initial values for
are calculated from a least squares regression of
on
. This regression is carried out by first constructing the cross-product matrix
and then using a pivoted
decomposition as performed by
nag_lapack_dgeqp3 (f08bf). In addition, if the cross-product matrix is not of full rank, a rank reduction is carried out and, rather than using the full design matrix,
, a matrix formed from the first
-rank columns of
is used instead, where
is the pivot matrix used during the
decomposition. Parameter estimates, confidence intervals and the rows and columns of the matrices returned in the argument
ch (if any) are set to zero for variables dropped during the rank-reduction. The rank reduction step is performed irrespective of whether initial values are supplied by the user.
Once initial values have been obtained for
, the initial values for
and
are calculated from the residuals. If
then a value of
is used instead, where
is supplied in the optional parameter
Epsilon. The initial values for the
and
are always set to
and
respectively.
The solution for
in both
(7) and
(9) is obtained using a Bunch–Kaufman decomposition, as implemented in
nag_lapack_dsytrf (f07md).
Calculation of Covariance Matrix
nag_correg_quantile_linreg (g02qg) supplies four methods to calculate the covariance matrices associated with the parameter estimates for
. This section gives some additional detail on three of the algorithms, the fourth, (which uses bootstrapping), is described in
Description.
(i) |
Independent, identically distributed (IID) errors
When assuming IID errors, the covariance matrices depend on the sparsity, , which nag_correg_quantile_linreg (g02qg) estimates as follows:
(a) |
Let denote the residuals from the original quantile regression, that is
.
|
(b) |
Drop any residual where is less than , supplied in the optional parameter Epsilon. |
(c) |
Sort and relabel the remaining residuals in ascending order, by absolute value, so that
.
|
(d) |
Select the first values where , for some bandwidth . |
(e) |
Sort and relabel these residuals again, so that
and regress them against a design matrix with two columns () and rows given by
using quantile regression with .
|
(f) |
Use the resulting estimate of the slope as an estimate of the sparsity. |
|
(ii) |
Powell Sandwich
When using the Powell Sandwich to estimate the matrix , the quantity
is calculated. Dependent on the value of and the method used to calculate the bandwidth ( ), it is possible for the quantities to be too large or small, compared to machine precision ( ). More specifically, when , or , a warning flag is raised in info, the value is truncated to or respectively and the covariance matrix calculated as usual. |
(iii) |
Hendricks–Koenker Sandwich
The Hendricks–Koenker Sandwich requires the calculation of the quantity
.
As with the Powell Sandwich, in cases where , or , a warning flag is raised in info, the value truncated to or respectively and the covariance matrix calculated as usual.
In addition, it is required that , in this method. Hence, instead of using
in the calculation of ,
is used instead, where is supplied in the optional parameter Epsilon.
|
Optional Parameters
Several optional parameters in
nag_correg_quantile_linreg (g02qg) control aspects of the optimization algorithm, methodology used, logic or output. Their values are contained in the arrays
iopts and
opts; these must be initialized before calling
nag_correg_quantile_linreg (g02qg) by first calling
nag_correg_optset (g02zk) with
optstr set to
.
Each optional parameter has an associated default value; to set any of them to a non-default value, use
nag_correg_optset (g02zk). The current value of an optional parameter can be queried using
nag_correg_optget (g02zl).
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The following is a list of the optional parameters available. A full description of each optional parameter is provided in
Description of the s.
Description of the Optional Parameters
For each option, we give a summary line, a description of the optional parameter and details of constraints.
The summary line contains:
- the keywords, where the minimum abbreviation of each keyword is underlined (if no characters of an optional qualifier are underlined, the qualifier may be omitted);
- a parameter value,
where the letters , and denote options that take character, integer and real values respectively;
- the default value, where the symbol is a generic notation for machine precision (see nag_machine_precision (x02aj)).
Keywords and character values are case and white space insensitive.
Band Width Alpha Default
A multiplier used to construct the parameter
used when calculating the Sheather–Hall bandwidth (see
Description), with
. Here,
is the
Significance Level.
Constraint:
.
Band Width Method Default
The method used to calculate the bandwidth used in the calculation of the asymptotic covariance matrix
and
if
,
or
(see
Description).
Constraint:
or .
Big Default
This parameter should be set to something larger than the biggest value supplied in
dat and
y.
Constraint:
.
Bootstrap Interval Method Default
If
,
Bootstrap Interval Method controls how the confidence intervals are calculated from the bootstrap estimates.
- intervals are calculated. That is, the covariance matrix, is calculated from the bootstrap estimates and the limits calculated as where is the percentage point from a Student's distribution on degrees of freedom, is the effective number of observations and is given by the optional parameter Significance Level.
- Quantile intervals are calculated. That is, the upper and lower limits are taken as the and quantiles of the bootstrap estimates, as calculated using nag_stat_quantiles (g01am).
Constraint:
or .
Bootstrap Iterations Default
The number of bootstrap samples used to calculate the confidence limits and covariance matrix (if requested) when .
Constraint:
.
Bootstrap Monitoring Default
If
and
, then the parameter estimates for each of the bootstrap samples are displayed. This information is sent to the unit number specified by
Unit Number.
Constraint:
or .
Calculate Initial Values Default
If
then the initial values for the regression parameters,
, are calculated from the data. Otherwise they must be supplied in
b.
Constraint:
or .
Defaults
This special keyword is used to reset all optional parameters to their default values.
Drop Zero Weights Default
If a weighted regression is being performed and then observations with zero weight are dropped from the analysis. Otherwise such observations are included.
Constraint:
or .
Epsilon Default
, the tolerance used when calculating the covariance matrix and the initial values for
and
. For additional details see
Calculation of Covariance Matrix and
Additional information respectively.
Constraint:
.
Interval Method Default
The value of
Interval Method controls whether confidence limits are returned in
bl and
bu and how these limits are calculated. This parameter also controls how the matrices returned in
ch are calculated.
- No limits are calculated and bl, bu and ch are not referenced.
- The Powell Sandwich method with a Gaussian kernel is used.
- The Hendricks–Koenker Sandwich is used.
- The errors are assumed to be identical, and independently distributed.
- A bootstrap method is used, where sampling is done on the pair . The number of bootstrap samples is controlled by the parameter Bootstrap Iterations and the type of interval constructed from the bootstrap samples is controlled by Bootstrap Interval Method.
Constraint:
, , , or .
Iteration Limit Default
The maximum number of iterations to be performed by the interior point optimization algorithm.
Constraint:
.
Matrix Returned Default
The value of
Matrix Returned controls the type of matrices returned in
ch. If
, this parameter is ignored and
ch is not referenced. Otherwise:
- No matrices are returned and ch is not referenced.
- The covariance matrices are returned.
- If or , the matrices and are returned. Otherwise no matrices are returned and ch is not referenced.
The matrices returned are calculated as described in
Description, with the algorithm used specified by
Interval Method. In the case of
the covariance matrix is calculated directly from the bootstrap estimates.
Constraint:
, or .
Monitoring Default
If then the duality gap is displayed at each iteration of the interior point optimization algorithm. In addition, the final estimates for are also displayed.
The monitoring information is sent to the unit number specified by
Unit Number.
Constraint:
or .
QR Tolerance Default
The tolerance used to calculate the rank, , of the cross-product matrix, . Letting be the orthogonal matrix obtained from a decomposition of , then the rank is calculated by comparing with .
If the cross-product matrix is rank deficient, then the parameter estimates for the
columns with the smallest values of
are set to zero, along with the corresponding entries in
bl,
bu and
ch, if returned. This is equivalent to dropping these variables from the model. Details on the
decomposition used can be found in
nag_lapack_dgeqp3 (f08bf).
Constraint:
.
Return Residuals Default
If
, the residuals are returned in
res. Otherwise
res is not referenced.
Constraint:
or .
Sigma Default
The scaling factor used when calculating the affine scaling step size (see equation
(8)).
Constraint:
.
Significance Level Default
, the size of the confidence interval whose limits are returned in
bl and
bu.
Constraint:
.
Tolerance Default
Convergence tolerance. The optimization is deemed to have converged if the duality gap is less than
Tolerance (see
Update and convergence).
Constraint:
.
Unit Number Default taken from nag_file_set_unit_advisory (x04ab)
The unit number to which any monitoring information is sent.
Constraint:
.
Description of Monitoring Information
See the description of the optional argument
Monitoring.
PDF version (NAG web site
, 64-bit version, 64-bit version)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015