library.correg Submodule

Module Summary

Interfaces for the NAG Mark 30.3 correg Chapter.

correg - Correlation and Regression Analysis

This module is concerned with two techniques

  1. correlation analysis and

  2. regression modelling,

both of which examine the inter-relationships among two or more variables.

Other modules of the NAG Library which cover similar problems are submodule fit and submodule opt. Submodule fit functions may be used to fit linear models by criteria other than least squares and also for polynomial regression; submodule opt functions may be used to fit nonlinear models and linearly constrained linear models.
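The two techniques are closely linked: for a simple linear regression, the least squares slope equals the Pearson correlation coefficient scaled by the ratio of the standard deviations. A plain-Python sketch of both quantities (a conceptual illustration, not the NAG interfaces themselves):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(x, y):
    """Least squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
r = pearson(x, y)                           # 1.0: y is an exact linear function of x
slope, intercept = simple_regression(x, y)  # 2.0, 1.0: y = 1 + 2x
```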

See Also

naginterfaces.library.examples.correg :

This subpackage contains examples for the correg module. See also the Examples subsection.

Functionality Index

Correlation-like coefficients

all variables

casewise treatment of missing values: coeffs_zero_miss_case()

no missing values: coeffs_zero()

pairwise treatment of missing values: coeffs_zero_miss_pair()

subset of variables

casewise treatment of missing values: coeffs_zero_subset_miss_case()

no missing values: coeffs_zero_subset()

pairwise treatment of missing values: coeffs_zero_subset_miss_pair()
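The casewise/pairwise distinction above can be illustrated in plain Python (a conceptual sketch, not the NAG API): casewise treatment discards every observation with a missing value in any variable, while pairwise treatment discards an observation only for those coefficients whose pair of variables is affected, so different coefficients may be based on different numbers of observations.

```python
rows = [
    [1.0, 2.0, 3.0],
    [2.0, None, 4.0],   # missing in variable 1
    [3.0, 4.0, 5.0],
    [4.0, 5.0, None],   # missing in variable 2
    [5.0, 6.0, 7.0],
]

# Casewise: keep only observations complete in every variable.
casewise = [r for r in rows if None not in r]

# Pairwise for variables 0 and 1: keep observations complete in just that pair.
pairwise_01 = [(r[0], r[1]) for r in rows
               if r[0] is not None and r[1] is not None]

len(casewise)      # 3 observations survive casewise deletion
len(pairwise_01)   # 4 observations are usable for the (0, 1) coefficient
```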

Generalized linear models

binomial errors: glm_binomial()

computes estimable function: glm_estfunc()

gamma errors: glm_gamma()

Normal errors: glm_normal()

Poisson errors: glm_poisson()

prediction: glm_predict()

transform model parameters: glm_constrain()

Hierarchical mixed effects regression

initiation: mixeff_hier_init()

using maximum likelihood: mixeff_hier_ml()

using restricted maximum likelihood: mixeff_hier_reml()

Least angle regression (includes LASSO)

Additional parameter calculation: lars_param()

Model fitting

Cross-product matrix: lars_xtx()

Raw data: lars()

Linear mixed effects regression

fitting (via REML or ML): lmm_fit()

initiation: lmm_init()

initiation, combine: lmm_init_combine()

via maximum likelihood (ML): mixeff_ml()

via restricted maximum likelihood (REML): mixeff_reml()

Multiple linear regression

from correlation coefficients: linregm_coeffs_const()

from correlation-like coefficients: linregm_coeffs_noconst()

Multiple linear regression/General linear model

add/delete observation from model: linregm_obs_edit()

add independent variable to model: linregm_var_add()

computes estimable function: linregm_estfunc()

delete independent variable from model: linregm_var_del()

general linear regression model: linregm_fit()

regression for new dependent variable: linregm_fit_newvar()

regression parameters from updated model: linregm_update()

transform model parameters: linregm_constrain()

Nearest correlation matrix

fixed elements: corrmat_fixed()

fixed submatrix: corrmat_shrinking()

k-factor structure: corrmat_nearest_kfactor()

method of Qi and Sun

element-wise weights: corrmat_h_weight()

unweighted, unbounded: corrmat_nearest()

weighted norm: corrmat_nearest_bounded()

rank-constrained: corrmat_nearest_rank()

shrinkage method: corrmat_target()

Non-parametric rank correlation (Kendall and/or Spearman)

missing values

casewise treatment of missing values

overwriting input data: coeffs_kspearman_miss_case_overwrite()

preserving input data: coeffs_kspearman_miss_case()

pairwise treatment of missing values: coeffs_kspearman_miss_pair()

no missing values

overwriting input data: coeffs_kspearman_overwrite()

preserving input data: coeffs_kspearman()
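The two rank coefficients above have simple definitions for tie-free data, sketched here in plain Python (conceptual only; the NAG routines additionally handle ties and missing values):

```python
def spearman(x, y):
    """Spearman rank correlation for tie-free data:
    1 - 6*sum(d^2) / (n*(n^2 - 1)) over the rank differences d."""
    n = len(x)
    rx = {v: i + 1 for i, v in enumerate(sorted(x))}
    ry = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def kendall(x, y):
    """Kendall rank correlation for tie-free data:
    (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            score += 1 if (x[i] < x[j]) == (y[i] < y[j]) else -1
    return score / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
spearman(x, y)   # 0.8
kendall(x, y)    # 0.6
```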

Partial least squares

calculates predictions given an estimated PLS model: pls_pred()

fits a PLS model for a given number of factors: pls_fit()

orthogonal scores using SVD: pls_svd()

orthogonal scores using Wold’s method: pls_wold()

Product-moment correlation

correlation coefficients, all variables

casewise treatment of missing values: coeffs_pearson_miss_case()

no missing values: coeffs_pearson()

pairwise treatment of missing values: coeffs_pearson_miss_pair()

correlation coefficients, subset of variables

casewise treatment of missing values: coeffs_pearson_subset_miss_case()

no missing values: coeffs_pearson_subset()

pairwise treatment of missing values: coeffs_pearson_subset_miss_pair()

correlation matrix

compute correlation and covariance matrices: corrmat()

compute from sum of squares matrix: ssqmat_to_corrmat()

compute partial correlation and covariance matrices: corrmat_partial()

sum of squares matrix

combine: ssqmat_combine()

compute: ssqmat()

update: ssqmat_update()
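Combining sums of squares computed on separate blocks of data, as ssqmat_combine() does, can be sketched in plain Python using the standard pairwise update for counts, means, and corrected sums of squares (a conceptual illustration, not the NAG routine itself):

```python
def block_stats(xs):
    """Count, mean, and corrected sum of squares of one block."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return n, mean, m2

def combine(a, b):
    """Merge two (count, mean, M2) summaries without revisiting the data."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

left = block_stats([1.0, 2.0, 3.0])
right = block_stats([4.0, 5.0, 6.0, 7.0])
n, mean, m2 = combine(left, right)   # matches block_stats([1, ..., 7]): 7, 4.0, 28.0
```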

Quantile regression

linear

comprehensive interface: quantile_linreg()

simple interface: quantile_linreg_easy()
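Quantile regression replaces the squared-error criterion with the asymmetric "check" (pinball) loss, rho_tau(u) = u*(tau - 1[u < 0]); minimising it over a linear predictor estimates the conditional tau-quantile. A plain-Python sketch of the loss itself (conceptual, not the NAG routine):

```python
def check_loss(residual, tau):
    """Asymmetric check (pinball) loss used in quantile regression."""
    return residual * (tau - (1 if residual < 0 else 0))

# At tau = 0.5 the check loss is half the absolute error, so the
# minimiser is the conditional median.
check_loss(2.0, 0.5)    # 1.0
check_loss(-2.0, 0.5)   # 1.0
# At tau = 0.9, under-prediction (positive residual) is penalised
# nine times as heavily as over-prediction.
check_loss(1.0, 0.9)    # 0.9
check_loss(-1.0, 0.9)   # 0.1
```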

Residuals

Durbin–Watson test: linregm_stat_durbwat()

standardized residuals and influence statistics: linregm_stat_resinf()
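The Durbin–Watson statistic computed by linregm_stat_durbwat() is d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_t e_t^2 over the residuals e; values near 2 suggest no first-order autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation. A plain-Python sketch:

```python
def durbin_watson(e):
    """Durbin-Watson statistic of a residual series."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r * r for r in e)
    return num / den

durbin_watson([1.0, 1.0, 1.0, 1.0])     # 0.0: strongly positively autocorrelated
durbin_watson([1.0, -1.0, 1.0, -1.0])   # 3.0: alternating, negative autocorrelation
```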

Ridge regression

ridge parameter(s) supplied: ridge()

ridge parameter optimized: ridge_opt()
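For a single centred predictor with no intercept, the ridge estimate minimising sum (y - x*beta)^2 + lambda*beta^2 has the closed form beta(lambda) = sum(x*y) / (sum(x^2) + lambda), which shrinks the least squares slope toward zero as the ridge parameter grows. A plain-Python sketch of this one-predictor case (conceptual; the NAG routines handle the general multi-predictor problem):

```python
def ridge_slope(x, y, lam):
    """Ridge estimate for one centred predictor, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
ridge_slope(x, y, 0.0)   # 2.0: ordinary least squares when lambda = 0
ridge_slope(x, y, 14.0)  # 1.0: slope shrunk as lambda grows
```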

Robust correlation

Huber’s method: robustm_corr_huber()

user-supplied weight function only: robustm_corr_user()

user-supplied weight function plus derivatives: robustm_corr_user_deriv()

Robust regression

compute weights for use with robustm_user(): robustm_wts()

standard M-estimates: robustm()

user-supplied weight functions: robustm_user()

variance-covariance matrix following robustm_user(): robustm_user_varmat()
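Huber's M-estimation, which underlies the robust routines above, bounds the influence of large residuals: the psi-function is linear inside [-c, c] and constant outside, giving weights w(u) = min(1, c/|u|). A plain-Python sketch of the weight and psi functions (conceptual; the robustm* routines iterate such weights to convergence):

```python
def huber_weight(u, c=1.345):
    """Huber weight: 1 inside [-c, c], c/|u| outside."""
    return 1.0 if abs(u) <= c else c / abs(u)

def huber_psi(u, c=1.345):
    """Huber psi-function: u clipped to [-c, c]."""
    return u * huber_weight(u, c)

huber_weight(1.0)    # 1.0: small residuals keep full weight
huber_weight(2.69)   # 0.5: large residuals are downweighted
huber_psi(10.0)      # 1.345: influence is bounded at c
```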

Selecting regression model

all possible regressions: linregm_rssq()

forward selection: linregm_fit_onestep()

R^2 and Cp statistics: linregm_rssq_stat()

Service functions

for multiple linear regression

reorder elements from vectors and matrices: linregm_service_reorder()

select elements from vectors and matrices: linregm_service_select()

general option getting function: optget()

general option setting function: optset()

Simple linear regression

no intercept: linregs_noconst()

no intercept with missing values: linregs_noconst_miss()

with intercept: linregs_const()

with intercept and with missing values: linregs_const_miss()
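The distinction between the linregs_const* and linregs_noconst* entries is whether the fitted model is y = a + b*x or is forced through the origin, y = b*x. A plain-Python sketch of both fits (conceptual, not the NAG API):

```python
def fit_with_intercept(x, y):
    """Least squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

def fit_no_intercept(x, y):
    """Least squares slope for y = b*x through the origin."""
    return sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
fit_with_intercept(x, y)   # (1.0, 2.0): y = 1 + 2x fits exactly
fit_no_intercept(x, y)     # 7/3: a larger slope compensates for the missing intercept
```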

Stepwise linear regression

Clarke’s sweep algorithm: linregm_fit_stepwise()

For full information please refer to the NAG Library document

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02intro.html

Examples

naginterfaces.library.examples.correg.coeffs_kspearman_miss_case_ex.main()

Example for naginterfaces.library.correg.coeffs_kspearman_miss_case().

Kendall and Spearman rank correlation coefficients.

>>> main()
naginterfaces.library.correg.coeffs_kspearman_miss_case Python Example
  Results.
Kendall and Spearman rank correlation coefficients.
Observations:
[
   1.70,  1.00,  0.50
   2.80,  4.00,  3.00
   0.60,  6.00,  2.50
   1.80,  9.00,  6.00
   0.99,  4.00,  2.50
   1.40,  2.00,  5.50
   1.80,  9.00,  7.50
   2.50,  7.00,  0.00
   0.99,  5.00,  3.00
]
Correlation coefficients:
[
      1.0000,     0.2941,     0.4058
      0.1429,     1.0000,     0.7537
      0.2760,     0.5521,     1.0000
]
naginterfaces.library.examples.correg.corrmat_nearest_ex.main()

Example for naginterfaces.library.correg.corrmat_nearest().

Find a nearest correlation matrix.

>>> main()
naginterfaces.library.correg.corrmat_nearest Python Example Results.
The Frobenius-nearest correlation matrix to a given square matrix.
Symmetric nearest correlation matrix X:
[
    1.00e+00
   -8.08e-01,   1.00e+00
    1.92e-01,  -6.56e-01,   1.00e+00
    1.07e-01,   1.92e-01,  -8.08e-01,   1.00e+00
]
naginterfaces.library.examples.correg.glm_binomial_ex.main()

Example for naginterfaces.library.correg.glm_binomial().

Use k-fold cross validation to estimate the true positive and negative rates of a prediction from a logistic regression model.

The data used in this example was simulated.

>>> main()
naginterfaces.library.correg.glm_binomial Python Example Results.
Use k-fold cross validation to estimate the true positive and
negative rates of a prediction from a logistic regression model.
                      Observed
            --------------------------
Predicted | Negative  Positive   Total
--------------------------------------
Negative  |    19         6        25
Positive  |     3        12        15
Total     |    22        18        40
True Positive Rate (Sensitivity):  0.67
True Negative Rate (Specificity):  0.86
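The two rates printed above follow directly from the confusion matrix; a minimal plain-Python check using the counts from the table:

```python
# Counts from the cross-validated confusion matrix above.
tn, fp = 19, 3    # observed negative, predicted negative/positive
fn, tp = 6, 12    # observed positive, predicted negative/positive

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

round(sensitivity, 2)   # 0.67
round(specificity, 2)   # 0.86
```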
naginterfaces.library.examples.correg.glm_normal_ex.main()

Example for naginterfaces.library.correg.glm_normal().

Fits a generalized linear model with Normal errors.

>>> main()
naginterfaces.library.correg.glm_normal Python Example Results.
Fits a generalized linear model with Normal errors.
Fitted model summary:
RSS is 3.872e-01
Degrees of freedom 3
Term         Estimate   Standard Error
Variable:  0 -2.387e-02   2.779e-03
Variable:  1 6.381e-02   2.638e-03
naginterfaces.library.examples.correg.lars_ex.main()

Example for naginterfaces.library.correg.lars().

Least angle regression.

>>> main()
naginterfaces.library.correg.lars Python Example Results.
Least angle regression.
Step               Parameter Estimate
-----------------------------------------------------------------
  1  0.000  0.000  3.125  0.000  0.000  0.000
  2  0.000  0.000  3.792  0.000  0.000 -0.713
  3 -0.446  0.000  3.998  0.000  0.000 -1.151
  4 -0.628 -0.295  4.098  0.000  0.000 -1.466
  5 -1.060 -1.056  4.110 -0.864  0.000 -1.948
  6 -1.073 -1.132  4.118 -0.935 -0.059 -1.981
-----------------------------------------------------------------
alpha: -50.037
-----------------------------------------------------------------
Step     Sum      RSS       df       Cp       Ck     Step Size
-----------------------------------------------------------------
  1    72.446  8929.855      2    13.355   123.227    72.446
  2   103.385  6404.701      3     7.054    50.781    24.841
  3   126.243  5258.247      4     5.286    30.836    16.225
  4   145.277  4657.051      5     5.309    19.319    11.587
  5   198.223  3959.401      6     5.016    12.266    24.520
  6   203.529  3954.571      7     7.000     0.910     2.198
-----------------------------------------------------------------
sigma^2:   304.198
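The Cp column in the table above is Mallows' statistic, Cp = RSS/sigma^2 - n + 2*df. With sigma^2 = 304.198 from the output, the printed values can be reproduced in plain Python; note that n = 20 below is inferred from the printed figures rather than stated in the output:

```python
def mallows_cp(rss, sigma2, n, df):
    """Mallows' Cp statistic: RSS/sigma^2 - n + 2*df."""
    return rss / sigma2 - n + 2 * df

sigma2, n = 304.198, 20   # n inferred from the printed table, not stated above

round(mallows_cp(8929.855, sigma2, n, 2), 3)   # 13.355, step 1 of the table
round(mallows_cp(3954.571, sigma2, n, 7), 3)   # 7.0, step 6 of the table
```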
naginterfaces.library.examples.correg.lars_param_ex.main()

Example for naginterfaces.library.correg.lars_param().

Least angle regression, additional parameter estimates.

>>> main()
naginterfaces.library.correg.lars_param Python Example Results.
Parameter Estimates from lars_xtx
Step               Parameter Estimate
-----------------------------------------------------------------
  1  0.000  0.000  3.125  0.000  0.000  0.000
  2  0.000  0.000  3.792  0.000  0.000 -0.713
  3 -0.446  0.000  3.998  0.000  0.000 -1.151
  4 -0.628 -0.295  4.098  0.000  0.000 -1.466
  5 -1.060 -1.056  4.110 -0.864  0.000 -1.948
  6 -1.073 -1.132  4.118 -0.935 -0.059 -1.981
Additional Parameter Estimates from lars_param
  nk                 Parameter Estimate
-----------------------------------------------------------------
  0.2  0.000  0.000  0.625  0.000  0.000  0.000
  1.2  0.000  0.000  3.258  0.000  0.000 -0.143
  3.2 -0.483 -0.059  4.018  0.000  0.000 -1.214
  4.5 -0.844 -0.676  4.104 -0.432  0.000 -1.707
  5.2 -1.062 -1.071  4.112 -0.878 -0.012 -1.955
naginterfaces.library.examples.correg.linregm_fit_ex.main()

Example for naginterfaces.library.correg.linregm_fit().

Fit a general (multiple) linear regression model.

>>> main()
naginterfaces.library.correg.linregm_fit Python Example Results.
Fit a general (multiple) linear regression model.
Fitted model summary:
Model is not of full rank
Rank: 4
RSS is 2.223e+01
Degrees of freedom 8
Term         Estimate  Standard Error
Intercept:   3.056e+01  3.849e-01
Variable:  1 5.447e+00  8.390e-01
Variable:  2 6.743e+00  8.390e-01
Variable:  3 1.105e+01  8.390e-01
Variable:  4 7.320e+00  8.390e-01
naginterfaces.library.examples.correg.linregm_fit_stepwise_ex.main()

Example for naginterfaces.library.correg.linregm_fit_stepwise().

Stepwise linear regression.

>>> main()
naginterfaces.library.correg.linregm_fit_stepwise Python Example Results.
Stepwise linear regression.
 Starting Stepwise Selection

 Forward Selection
 Variable    1 Variance ratio =    1.260E+01
 Variable    2 Variance ratio =    2.196E+01
 Variable    3 Variance ratio =    4.403E+00
 Variable    4 Variance ratio =    2.280E+01

 Adding variable    4 to model

 Backward Selection
 Variable    4 Variance ratio =    2.280E+01

 Keeping all current variables

 Forward Selection
 Variable    1 Variance ratio =    1.082E+02
 Variable    2 Variance ratio =    1.725E-01
 Variable    3 Variance ratio =    4.029E+01

 Adding variable    1 to model

 Backward Selection
 Variable    1 Variance ratio =    1.082E+02
 Variable    4 Variance ratio =    1.593E+02

 Keeping all current variables

 Forward Selection
 Variable    2 Variance ratio =    5.026E+00
 Variable    3 Variance ratio =    4.236E+00

 Adding variable    2 to model

 Backward Selection
 Variable    1 Variance ratio =    1.540E+02
 Variable    2 Variance ratio =    5.026E+00
 Variable    4 Variance ratio =    1.863E+00

 Dropping variable    4 from model

 Forward Selection
 Variable    3 Variance ratio =    1.832E+00
 Variable    4 Variance ratio =    1.863E+00

 Finished Stepwise Selection
Fitted model summary:
Term         Estimate  Standard Error
Intercept:   5.258e+01  2.294e+00
Variable:  1 1.468e+00  1.213e-01
Variable:  2 6.623e-01  4.585e-02
RMS is 5.790e+00
naginterfaces.library.examples.correg.lmm_init_combine_ex.main()

Example for naginterfaces.library.correg.lmm_init_combine() including calls to naginterfaces.library.blgm.lm_submodel(), naginterfaces.library.blgm.lm_describe_data(), naginterfaces.library.correg.lmm_init(), naginterfaces.library.correg.lmm_fit() and naginterfaces.library.blgm.handle_free().

Multi-level linear mixed effects regression model using restricted maximum likelihood.

The data used in this example was simulated.

>>> main()
naginterfaces.library.correg.lmm_init_combine Python Example Results.
Linear mixed effects regression model using REML.
Random Parameter Estimates
==========================
 Estimate Standard    Label
           Error
   0.683   0.506   V7 . V12 (Lvl 1)
...
   0.504   2.693   V4 (Lvl 3) . V11 (Lvl 3) . V10 (Lvl 2) . V12 (Lvl 3)
Fixed Parameter Estimates
=========================
 Estimate Standard    Label
           Error
   1.643   2.460   Intercept
  -1.622   0.855   V1 (Lvl 2)
  -2.482   1.142   V2 (Lvl 2)
   0.462   1.214   V2 (Lvl 3)
Variance Components
===================
  Estimate      Label
    0.563   V7 . V12
    5.820   V8 . V12
   10.860   V9 . V12
   19.628   V5 . V11 . V12
   40.534   V6 . V11 . V12
   36.323   V3 . V11 . V10 . V12
   12.451   V4 . V11 . V10 . V12
Sigma^2           =           0.003
-2 Log Likelihood =         608.195
naginterfaces.library.examples.correg.pls_ex.main()

Example for naginterfaces.library.correg.pls().

Partial least squares (PLS) regression using singular value decomposition, parameter estimates, predictions.

>>> main()
naginterfaces.library.correg.pls Python Example Results.
PLS regression using SVD; param. estimates; predictions.
Begin regression.
Begin estimation.
Begin prediction.
Predictions:
[
  0.2133
  0.5153
  0.1438
  0.4460
  0.1716
  2.4808
  0.0963
  1.4476
  -0.1546
  -0.5492
  0.5393
  0.2685
  -1.1333
  1.7974
  0.4972
]
naginterfaces.library.examples.correg.quantile_linreg_ex.main()

Example for naginterfaces.library.correg.quantile_linreg().

Multiple linear quantile regression (comprehensive interface).

>>> main()
naginterfaces.library.correg.quantile_linreg Python Example Results.
Quantile regression model fitted to Engels' 1857 study of
household expenditure on food.
Quantile:  0.100
        Lower   Parameter   Upper
        Limit   Estimate    Limit
  0    74.946   110.142   145.337
  1     0.370     0.402     0.433
Covariance matrix:
[
   3.191e+02
  -2.541e-01,  2.587e-04
]
Quantile:  0.250
        Lower   Parameter   Upper
        Limit   Estimate    Limit
  0    64.232    95.483   126.735
  1     0.446     0.474     0.502
Covariance matrix:
[
   2.516e+02
  -2.004e-01,  2.039e-04
]
Quantile:  0.500
        Lower   Parameter   Upper
        Limit   Estimate    Limit
  0    55.399    81.482   107.566
  1     0.537     0.560     0.584
Covariance matrix:
[
   1.753e+02
  -1.396e-01,  1.421e-04
]
Quantile:  0.750
        Lower   Parameter   Upper
        Limit   Estimate    Limit
  0    41.372    62.396    83.421
  1     0.625     0.644     0.663
Covariance matrix:
[
   1.139e+02
  -9.068e-02,  9.230e-05
]
Quantile:  0.900
        Lower   Parameter   Upper
        Limit   Estimate    Limit
  0    26.829    67.351   107.873
  1     0.650     0.686     0.723
Covariance matrix:
[
   4.230e+02
  -3.369e-01,  3.429e-04
]
First 10 residuals:
                              Quantile
Obs.     0.10000    0.25000    0.50000    0.75000    0.90000
 1     -23.10718  -38.84219  -61.00711  -77.14462  -99.86551
 2     -16.70358  -41.20981  -73.81193 -100.11463 -127.96277
 3      13.48419  -37.04518 -100.61322 -157.07478 -200.13481
 4      36.09526    4.52393  -36.48522  -70.97584 -102.95390
 5      83.74310   44.08476   -6.54743  -50.41028  -87.11562
 6     143.66660   89.90799   22.49734  -37.70668  -82.65437
 7     187.39134  142.05288   84.66171   34.21603   -5.80963
 8     196.90443  140.73220   70.44951    7.44831  -38.91027
 9     194.55254  114.45726   15.70761  -75.01861 -135.36147
10     105.62394   12.32563 -102.13482 -208.16238 -276.22311
naginterfaces.library.examples.correg.ridge_opt_ex.main()

Example for naginterfaces.library.correg.ridge_opt().

Ridge regression, optimizing a ridge regression parameter.

>>> main()
naginterfaces.library.correg.ridge_opt Python Example Results.
Ridge regression optimizing GCV prediction error for a body fat model.
Value of ridge parameter:     0.0712
Sum of squares of residuals:  1.0917e+02
Degrees of freedom: 16
Number of effective parameters:     2.9059
Parameter estimates
1     20.1950
2      9.7934
3      9.9576
4     -2.0125
Number of iterations: 6
Ridge parameter minimises GCV
Estimated prediction errors:
GCV    =     7.4718
UEV    =     6.3862
FPE    =     7.3141
BIC    =     8.2380
LOO CV =     7.5495
Residuals
1     -1.9894
2      3.5469
3     -3.0392
4     -3.0309
5     -0.1899
6     -0.3146
7      0.9775
8      4.0157
9      2.5332
10     -2.3560
11      0.5446
12      2.3989
13     -4.0876
14      3.2778
15      0.2894
16      0.7330
17     -0.7116
18     -0.6092
19     -2.9995
20      1.0110
Variance inflation factors
1      0.2928
2      0.4162
3      0.8089