naginterfaces.library.anova.dummyvars

naginterfaces.library.anova.dummyvars(typ, levels, ifact, v)

dummyvars computes orthogonal polynomial or dummy variables for a factor or classification variable.

For full information please refer to the NAG Library document for g04ea

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g04/g04eaf.html

Parameters
typ : str, length 1

The type of dummy variable to be computed.

If typ = 'P', an orthogonal Polynomial representation is computed.

If typ = 'H', a Helmert matrix representation is computed.

If typ = 'F', the contrasts relative to the First level are computed.

If typ = 'L', the contrasts relative to the Last level are computed.

If typ = 'C', a Complete set of dummy variables is computed.

levels : int

$k$, the number of levels of the factor.

ifact : int, array-like, shape $(n)$

The $n$ values of the factor.

v : float, array-like, shape $(:)$

Note: the required length for this argument is determined as follows: if typ = 'P': levels; otherwise: 0.

If typ = 'P', the $k$ distinct values of the underlying variable for which the orthogonal polynomial is to be computed.

If typ ≠ 'P', v is not referenced.

Returns
x : float, ndarray, shape $(n, k^*)$

The $n$ by $k^*$ matrix of dummy variables, where $k^* = k-1$ if typ = 'P', 'H', 'F' or 'L' and $k^* = k$ if typ = 'C'.

rep : float, ndarray, shape (levels)

The number of replications for each level of the factor, $r_i$, for $i = 1, 2, \ldots, k$.

Raises
NagValueError
(errno )

On entry, typ = ⟨value⟩.

Constraint: typ = 'P', 'H', 'F', 'L' or 'C'.

(errno )

On entry, n = ⟨value⟩ and levels = ⟨value⟩.

Constraint: n ≥ levels.

(errno )

On entry, levels = ⟨value⟩.

Constraint: levels ≥ 2.

(errno )

On entry, not all values of v are distinct.

(errno )

On entry, not all levels are present in ifact.

(errno )

On entry, i = ⟨value⟩, ifact[i-1] = ⟨value⟩ and levels = ⟨value⟩.

Constraint: 1 ≤ ifact[i-1] ≤ levels.

Warns
NagAlgorithmicWarning
(errno )

The polynomial has all elements zero.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

In the analysis of an experimental design using a general linear model the factors or classification variables that specify the design have to be coded as dummy variables. dummyvars computes dummy variables that can then be used in the fitting of the general linear model using correg.linregm_fit.

If the factor $f$ of length $n$ has $k$ levels then the simplest representation is to define $k$ dummy variables, $v_j$, such that $v_j = 1$ if the factor is at level $j$ and $v_j = 0$ otherwise, for $j = 1, 2, \ldots, k$. However, there is usually a mean included in the model, and the sum of the dummy variables will be aliased with the mean. To avoid the extra redundant parameter, $k-1$ dummy variables can be defined as the contrasts between one level of the factor, the reference level, and the remaining levels. If the reference level is the first level then the dummy variables can be defined as $v_j = 1$ if the factor is at level $j$ and $v_j = 0$ otherwise, for $j = 2, 3, \ldots, k$. Alternatively, the last level can be used as the reference level.
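The complete and reference-level codings described above can be sketched in plain Python. This is only an illustration of the definitions, not the NAG implementation: the helper names `complete_dummies` and `reference_dummies` and the sample factor are hypothetical, and levels are assumed to be numbered 1 to k.

```python
def complete_dummies(ifact, levels):
    """One 0/1 column per level, so k columns in total."""
    return [[1.0 if f == j else 0.0 for j in range(1, levels + 1)]
            for f in ifact]


def reference_dummies(ifact, levels, reference="first"):
    """k - 1 columns of 0/1 codes; the reference level has no column."""
    if reference == "first":
        kept = range(2, levels + 1)   # columns for levels 2..k
    elif reference == "last":
        kept = range(1, levels)       # columns for levels 1..k-1
    else:
        raise ValueError("reference must be 'first' or 'last'")
    return [[1.0 if f == j else 0.0 for j in kept] for f in ifact]


ifact = [1, 2, 3, 1, 2, 3]            # hypothetical factor with k = 3 levels
complete = complete_dummies(ifact, 3)
first_ref = reference_dummies(ifact, 3, "first")
last_ref = reference_dummies(ifact, 3, "last")

# Each row of the complete set sums to 1, i.e. the columns add up to the
# constant (mean) column, which is the aliasing mentioned in the text.
row_sums = [sum(row) for row in complete]
```

Observations at the reference level are coded as a row of zeros, so their effect is absorbed by the mean term of the model.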

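A second way of defining the $k-1$ dummy variables is to use a Helmert matrix in which levels $2, 3, \ldots, k$ are compared with the average effect of the previous levels. For example, if $k = 4$ then the contrasts (shown for equal replication of the levels) would be:

$$
\begin{array}{c|rrr}
\text{level} & v_1 & v_2 & v_3 \\ \hline
1 & -1 & -1 & -1 \\
2 & 1 & -1 & -1 \\
3 & 0 & 2 & -1 \\
4 & 0 & 0 & 3
\end{array}
$$

Thus variable $j$, for $j = 1, 2, \ldots, k-1$, is given by

$$
v_j = \begin{cases}
-1 & \text{if the factor is at a level less than } j+1, \\
\sum_{i=1}^{j} r_i / r_{j+1} & \text{if the factor is at level } j+1, \\
0 & \text{if the factor is at a level greater than } j+1,
\end{cases}
$$

where $r_j$ is the number of replicates of level $j$.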
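A minimal sketch of the Helmert coding follows, assuming the rule that dummy variable j takes the value -1 below level j+1, the ratio (r_1 + ... + r_j) / r_{j+1} at level j+1, and 0 above it. The function name `helmert_dummies` and the sample data are hypothetical, and the code is not taken from the NAG implementation.

```python
from collections import Counter


def helmert_dummies(ifact, levels):
    """Return (rows, rep): Helmert-coded rows and replication counts."""
    counts = Counter(ifact)
    rep = [counts[level] for level in range(1, levels + 1)]
    if min(rep) == 0:
        raise ValueError("every level must appear at least once")
    rows = []
    for f in ifact:
        row = []
        for j in range(1, levels):            # dummy variable j = 1..k-1
            if f < j + 1:
                row.append(-1.0)
            elif f == j + 1:
                # (r_1 + ... + r_j) / r_{j+1}; rep is 0-based
                row.append(sum(rep[:j]) / rep[j])
            else:
                row.append(0.0)
        rows.append(row)
    return rows, rep


# Equal replication with k = 4: one observation per level.
balanced, _ = helmert_dummies([1, 2, 3, 4], 4)

# Unequal replication: level 2 occurs twice, so its code in variable 1
# is the fraction r_1 / r_2 = 1 / 2.
unbalanced, rep_u = helmert_dummies([1, 2, 2, 3], 3)
column_sums = [sum(col) for col in zip(*unbalanced)]
```

With this weighting each dummy variable sums to zero over the observations, whatever the replication pattern.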

If the factor can be considered as a set of values from an underlying continuous variable then the factor can be represented by a set of $k-1$ orthogonal polynomials representing the linear, quadratic, etc. effects of the underlying variable. The orthogonal polynomial is computed using Forsythe's algorithm (Forsythe (1957), see also Cooper (1968)). The values of the underlying continuous variable represented by the factor levels have to be supplied to the function.

The orthogonal polynomials are standardized so that the sum of squares for each dummy variable is one. For the other methods integer ($\pm 1$) representations are retained, except that in the Helmert representation the code of level $j+1$ in dummy variable $j$ will be a fraction.
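The orthogonal polynomial coding can be illustrated with the classical three-term recurrence p_{j+1}(x) = (x - a_{j+1}) p_j(x) - b_j p_{j-1}(x) associated with Forsythe's method, followed by scaling each column to unit sum of squares. This is a sketch under those assumptions rather than the NAG code: the name `orthogonal_poly_dummies` and the sample data are hypothetical, and the signs or exact values returned by dummyvars may differ.

```python
import math


def orthogonal_poly_dummies(ifact, levels, v):
    """Return n rows of k - 1 standardized orthogonal polynomial codes."""
    x = [v[f - 1] for f in ifact]        # underlying value per observation
    n = len(x)
    prev = [0.0] * n                     # p_{-1}, identically zero
    curr = [1.0] * n                     # p_0, the constant (not returned)
    prev_ss = 1.0                        # placeholder; prev is zero at first
    columns = []
    for _ in range(levels - 1):
        curr_ss = sum(p * p for p in curr)
        alpha = sum(xi * p * p for xi, p in zip(x, curr)) / curr_ss
        beta = curr_ss / prev_ss
        nxt = [(xi - alpha) * p - beta * q
               for xi, p, q in zip(x, curr, prev)]
        columns.append(nxt)
        prev, curr, prev_ss = curr, nxt, curr_ss
    scaled = []
    for col in columns:
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            raise ValueError("a polynomial has all elements zero")
        scaled.append([c / norm for c in col])   # unit sum of squares
    return [list(row) for row in zip(*scaled)]


# Hypothetical factor with 3 levels at underlying values 1.0, 2.0, 3.0.
x_poly = orthogonal_poly_dummies([1, 2, 3, 1, 2, 3], 3, [1.0, 2.0, 3.0])
linear, quadratic = zip(*x_poly)
```

Summing over observations rather than over levels means that unequal replication is handled implicitly, since each level contributes once per occurrence.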

References

Cooper, B E, 1968, Algorithm AS 10. The use of orthogonal polynomials, Appl. Statist. (17), 283–287

Forsythe, G E, 1957, Generation and use of orthogonal polynomials for data fitting with a digital computer, J. Soc. Indust. Appl. Math. (5), 74–88