nag_cp_stat (g02ecc) : NAG Library, Mark 24

3 Description

When selecting a linear regression model for a set of $n$ observations, a balance has to be found between the number of independent variables in the model and the fit, as measured by the residual sum of squares. The more variables included, the smaller the residual sum of squares will be. Two statistics can help in selecting the best model.

(a) $R^{2}$ represents the proportion of variation in the dependent variable that is explained by the independent variables.

$R^{2} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$,

where $\text{Total Sum of Squares} = tss = \sum {(y - \bar{y})}^{2}$ (if the mean is fitted, otherwise $tss = \sum y^{2}$) and $\text{Regression Sum of Squares} = RegSS = tss - rss$, where $rss = \text{residual sum of squares} = \sum {(y - \hat{y})}^{2}$.

The $R^{2}$-values can be examined to find a model with a high $R^{2}$-value but with a small number of independent variables.

(b) $C_{p}$ statistic.

$C_{p} = \frac{rss}{{\hat{\sigma}}^{2}} - (n - 2p)$,

where $p$ is the number of parameters (including the mean) in the model and ${\hat{\sigma}}^{2}$ is an estimate of the true variance of the errors. This estimate can often be obtained from fitting the full model.
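As a worked illustration of the two formulas, using made-up values that are not taken from the example data: suppose $n = 20$, $tss = 100$, ${\hat{\sigma}}^{2} = 2$, and a model with a mean and two independent variables (so $p = 3$) gives $rss = 34$. Then:

```latex
R^{2} = \frac{tss - rss}{tss} = \frac{100 - 34}{100} = 0.66,
\qquad
C_{p} = \frac{rss}{{\hat{\sigma}}^{2}} - (n - 2p)
      = \frac{34}{2} - (20 - 2 \times 3) = 17 - 14 = 3 .
```

For these hypothetical values $C_{p}$ equals $p$.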

A well fitting model will have $C_{p} \simeq p$. $C_{p}$ is often plotted against $p$ to see which models are closest to the $C_{p} = p$ line.
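The formulas for $R^{2}$ and $C_{p}$ can be sketched in plain C as follows. This is only an illustration of the arithmetic, not the NAG implementation: the function name `cp_stat_sketch` and its signature are hypothetical, and the rule $p = \text{nterms} + 1$ when a mean is fitted is an assumption drawn from the argument descriptions (nterms excludes the mean, $p$ includes it).

```c
#include <stddef.h>

/* Illustrative helper, NOT part of the NAG Library.
 * Computes R^2 and C_p for nmod candidate models.
 *   include_mean : nonzero if a mean (intercept) term is fitted
 *   n            : number of observations
 *   sigsq        : estimate of the error variance (assumed > 0)
 *   tss          : total sum of squares (assumed > 0)
 *   nterms[i]    : independent variables in model i, mean not counted
 *   rss[i]       : residual sum of squares of model i
 * Results are written to rsq[i] and cp[i]. */
void cp_stat_sketch(int include_mean, int n, double sigsq, double tss,
                    size_t nmod, const int nterms[], const double rss[],
                    double rsq[], double cp[])
{
    for (size_t i = 0; i < nmod; ++i) {
        /* Assumption: p counts the mean as one parameter when it is fitted. */
        int p = nterms[i] + (include_mean ? 1 : 0);

        /* R^2 = (tss - rss) / tss */
        rsq[i] = (tss - rss[i]) / tss;

        /* C_p = rss / sigsq - (n - 2p) */
        cp[i] = rss[i] / sigsq - (double)(n - 2 * p);
    }
}
```

Calling it with `n = 20`, `sigsq = 2.0`, `tss = 100.0`, a mean term, and a model with two variables and `rss = 34.0` gives a $C_{p}$ of 3, which matches $p = 3$ for that model.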

nag_cp_stat (g02ecc) may be called after nag_all_regsn (g02eac) which calculates the residual sums of squares for all possible linear regression models.

5 Arguments

1: mean – Nag_IncludeMean    Input

On entry: indicates if a mean term is to be included.

$mean = Nag_MeanInclude$: A mean term, intercept, will be included in the model.
$mean = Nag_MeanZero$: The model will pass through the origin, zero-point.

Constraint: $mean = Nag_MeanInclude$ or $Nag_MeanZero$.

2: n – Integer    Input

On entry: $n$, the number of observations used in the regression model.

Constraint: $n$ must be greater than $2 \times p_{\max}$, where $p_{\max}$ is the largest number of independent variables fitted (including the mean if fitted).

3: sigsq – double    Input

On entry: the best estimate of the true variance of the errors, ${\hat{\sigma}}^{2}$.

Constraint: $sigsq > 0.0$.

4: tss – double    Input

On entry: the total sum of squares for the regression model.

Constraint: $tss > 0.0$.

5: nmod – Integer    Input

On entry: the number of regression models.

Constraint: $nmod > 0$.

6: nterms[nmod] – const Integer    Input

On entry: $nterms[i - 1]$ must contain the number of independent variables (not counting the mean) fitted to the $i$th model, for $i = 1, 2, \dots, nmod$.

7: rss[nmod] – const double    Input

On entry: $rss[i - 1]$ must contain the residual sum of squares for the $i$th model.

Constraint: $rss[i - 1] \leq tss$, for $i = 1, 2, \dots, nmod$.

8: rsq[nmod] – double    Output

On exit: $rsq[i - 1]$ contains the $R^{2}$-value for the $i$th model, for $i = 1, 2, \dots, nmod$.

9: cp[nmod] – double    Output

On exit: $cp[i - 1]$ contains the $C_{p}$-value for the $i$th model, for $i = 1, 2, \dots, nmod$.

10: fail – NagError *    Input/Output

The NAG error argument (see Section 3.6 in the Essential Introduction).
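The input constraints listed for the arguments above can be checked before a call with a small routine such as the one below. It is a sketch under stated assumptions: the function `check_cp_inputs` and the `check_status` codes are hypothetical names invented for this illustration and are not NAG error codes, and the mean is assumed to add one parameter to each model when it is fitted.

```c
/* Hypothetical status codes for this sketch; they are NOT the NAG error codes. */
typedef enum {
    CHECK_OK = 0,
    CHECK_BAD_NMOD,     /* nmod <= 0 */
    CHECK_BAD_SIGSQ,    /* sigsq <= 0.0 */
    CHECK_BAD_TSS,      /* tss <= 0.0 */
    CHECK_N_TOO_SMALL,  /* n <= 2 * p for some model */
    CHECK_RSS_GT_TSS    /* rss[i] > tss for some model */
} check_status;

/* Illustrative pre-call check of the documented constraints.
 * include_mean is nonzero when a mean term is fitted. */
check_status check_cp_inputs(int include_mean, int n, double sigsq, double tss,
                             int nmod, const int nterms[], const double rss[])
{
    if (nmod <= 0) return CHECK_BAD_NMOD;
    if (!(sigsq > 0.0)) return CHECK_BAD_SIGSQ;
    if (!(tss > 0.0)) return CHECK_BAD_TSS;

    for (int i = 0; i < nmod; ++i) {
        /* Assumption: the mean counts as one fitted parameter. */
        int p = nterms[i] + (include_mean ? 1 : 0);

        /* n must exceed 2 * p for every model, hence for p_max. */
        if (n <= 2 * p) return CHECK_N_TOO_SMALL;

        /* The residual sum of squares cannot exceed the total. */
        if (rss[i] > tss) return CHECK_RSS_GT_TSS;
    }
    return CHECK_OK;
}
```

Checking every model against $n > 2p$ is equivalent to checking the largest model only, since the constraint is tightest for $p_{\max}$.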

6 Error Indicators and Warnings

NE_BAD_PARAM

On entry, argument ⟨value⟩ had an illegal value.

NE_INT

On entry, nmod = ⟨value⟩.
Constraint: $nmod > 0$.

NE_INTERNAL_ERROR

An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

NE_MODEL_PARAMETERS

On entry, number of parameters for model ⟨value⟩ is too large for n. n = ⟨value⟩, number of parameters = ⟨value⟩.

NE_REAL

On entry, sigsq = ⟨value⟩.
Constraint: $sigsq > 0.0$.

On entry, tss = ⟨value⟩.
Constraint: $tss > 0.0$.

NE_REAL_ARRAY_ELEM_CONS

On entry, cp[⟨value⟩] = ⟨value⟩.
Constraint: $cp[i] \geq 0.0$, for all $i$.

On entry, rss[⟨value⟩] = ⟨value⟩ and tss = ⟨value⟩.
Constraint: $rss[i] \leq tss$, for all $i$.

10 Example

The data, from an oxygen uptake experiment, is given by Weisberg (1985). The independent and dependent variables are read and the residual sums of squares for all possible models computed using nag_all_regsn (g02eac). The values of $R^{2}$ and $C_{p}$ are then computed and printed along with the names of variables in the models.

NAG Library Function Document

nag_cp_stat (g02ecc)

Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results
