Integer, Intent (In)	::	n, levels, ifact(n), ldx
Integer, Intent (Inout)	::	ifail
Real (Kind=nag_wp), Intent (In)	::	v(*)
Real (Kind=nag_wp), Intent (Inout)	::	x(ldx,*)
Real (Kind=nag_wp), Intent (Out)	::	rep(levels)
Character (1), Intent (In)	::	typ

C Header Interface

#include <nag.h>

void	g04eaf_ (const char typ, const Integer n, const Integer levels, const Integer ifact[], double x[], const Integer ldx, const double v[], double rep[], Integer *ifail, const Charlen length_typ)

The routine may be called by the names g04eaf or nagf_anova_dummyvars.

3 Description

In the analysis of an experimental design using a general linear model the factors or classification variables that specify the design have to be coded as dummy variables. g04eaf computes dummy variables that can then be used in the fitting of the general linear model using g02daf.

If the factor of length $n$ has $k$ levels then the simplest representation is to define $k$ dummy variables, $X_{j}$, such that $X_{j} = 1$ if the factor is at level $j$ and $0$ otherwise, for $j = 1, 2, \dots, k$. However, there is usually a mean included in the model and the sum of the dummy variables will be aliased with the mean. To avoid the extra redundant parameter, $k - 1$ dummy variables can be defined as the contrasts between one level of the factor, the reference level, and the remaining levels. If the reference level is the first level then the dummy variables can be defined as $X_{j} = 1$ if the factor is at level $j$ and $0$ otherwise, for $j = 2, 3, \dots, k$. Alternatively, the last level can be used as the reference level.
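The indicator codings described above can be sketched in a few lines of Python. This is an illustration only, not the NAG implementation: the function name `indicator_dummies` and its `reference` argument are hypothetical, and levels are assumed to be coded $1, 2, \dots, k$ as for ifact.

```python
def indicator_dummies(ifact, levels, reference=None):
    """Return rows of 0/1 dummy variables for a factor (hypothetical helper).

    ifact     : factor values, each in 1..levels
    levels    : number of levels k
    reference : None for the complete set of k columns, or the level
                (for example 1 or levels) dropped as the reference level
    """
    # Columns are kept for every level except the reference level, if any.
    kept = [j for j in range(1, levels + 1) if j != reference]
    return [[1.0 if f == j else 0.0 for j in kept] for f in ifact]


ifact = [1, 2, 3, 1, 2, 3]
complete = indicator_dummies(ifact, 3)            # k columns
first = indicator_dummies(ifact, 3, reference=1)  # k - 1 columns, first level as reference
last = indicator_dummies(ifact, 3, reference=3)   # k - 1 columns, last level as reference
```

In the complete set every row sums to one, which is the aliasing with the mean noted above; dropping the reference level removes it.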

A second way of defining the $k - 1$ dummy variables is to use a Helmert matrix in which levels $2, 3, \dots, k$ are compared with the average effect of the previous levels. For example, if $k = 4$ then the contrasts would be:

$$\begin{array}{rrrr} 1 & -1 & -1 & -1 \\ 2 & 1 & -1 & -1 \\ 3 & 0 & 2 & -1 \\ 4 & 0 & 0 & 3 \end{array}$$

Thus variable $j$, for $j = 1, 2, \dots, k - 1$, is given by

$X_{j} = -1$ if factor is at level less than $j + 1$
$X_{j} = \sum_{i = 1}^{j} r_{i} / r_{j + 1}$ if factor is at level $j + 1$
$X_{j} = 0$ if factor is at level greater than $j + 1$

where $r_{j}$ is the number of replicates of level $j$.
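The Helmert rule above can be written out directly. The following Python sketch is illustrative only and is not the NAG implementation; `helmert_dummies` is a hypothetical name, and every level is assumed to occur at least once so that each $r_{j + 1}$ is non-zero.

```python
def helmert_dummies(ifact, levels):
    """Return (rows, rep) for the Helmert coding of a factor (hypothetical helper).

    rows[i][j-1] is X_j for observation i; rep[j-1] is r_j, the number
    of replicates of level j. Assumes every level 1..levels is present.
    """
    rep = [0] * levels
    for f in ifact:
        rep[f - 1] += 1

    rows = []
    for f in ifact:
        row = []
        for j in range(1, levels):
            if f < j + 1:
                row.append(-1.0)                     # level less than j + 1
            elif f == j + 1:
                row.append(sum(rep[:j]) / rep[j])    # (r_1 + ... + r_j) / r_{j+1}
            else:
                row.append(0.0)                      # level greater than j + 1
        rows.append(row)
    return rows, rep


# One observation per level with k = 4 reproduces the contrasts shown above.
rows, rep = helmert_dummies([1, 2, 3, 4], 4)
```

With unequal replication the entry for level $j + 1$ is generally not an integer, but each column still sums to zero over the observations.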

If the factor can be considered as a set of values from an underlying continuous variable then the factor can be represented by a set of $k - 1$ orthogonal polynomials representing the linear, quadratic, etc. effects of the underlying variable. The orthogonal polynomial is computed using Forsythe's algorithm (Forsythe (1957), see also Cooper (1968)). The values of the underlying continuous variable represented by the factor levels have to be supplied to the routine.

The orthogonal polynomials are standardized so that the sum of squares for each dummy variable is one. For the other methods integer ($\pm 1$) representations are retained except that in the Helmert representation the code of level $j + 1$ in dummy variable $j$ will be a fraction.
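To make the standardization concrete, the sketch below builds $k - 1$ polynomial dummy variables in Python. It deliberately uses plain Gram-Schmidt orthogonalization of the powers of the underlying variable, not Forsythe's recurrence, so it illustrates the properties of the result rather than the routine's method. It assumes the polynomials are orthogonal over the $n$ observations and to the constant term; the name `orthopoly_dummies` is hypothetical, and the signs of the columns need not match those returned by g04eaf.

```python
import math


def orthopoly_dummies(ifact, v):
    """Return rows of k - 1 standardized orthogonal polynomial values.

    ifact : factor values, each in 1..k
    v     : the k distinct values of the underlying variable
    Uses Gram-Schmidt (an assumption of this sketch), not Forsythe's algorithm.
    """
    n, k = len(ifact), len(v)
    z = [v[f - 1] for f in ifact]       # underlying value for each observation
    basis = [[1.0] * n]                 # constant column, used only for orthogonalization
    for degree in range(1, k):
        col = [zi ** degree for zi in z]
        for _ in range(2):              # second pass limits rounding error
            for b in basis:
                proj = sum(c * bi for c, bi in zip(col, b)) / sum(bi * bi for bi in b)
                col = [c - proj * bi for c, bi in zip(col, b)]
        norm = math.sqrt(sum(c * c for c in col))
        basis.append([c / norm for c in col])   # sum of squares becomes one
    # Drop the constant column and return one row per observation.
    return [[basis[d][i] for d in range(1, k)] for i in range(n)]


x = orthopoly_dummies([1, 2, 3, 1, 2, 3], [1.0, 2.0, 3.0])
```

For this balanced example the linear column takes the values $-0.5, 0, 0.5$ at the three levels, and each column has zero sum and unit sum of squares.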

4 References

Cooper B E (1968) Algorithm AS 10. The use of orthogonal polynomials Appl. Statist. 17 283–287

Forsythe G E (1957) Generation and use of orthogonal polynomials for data fitting with a digital computer J. Soc. Indust. Appl. Math. 5 74–88

5 Arguments

1: $typ$ – Character(1) Input

On entry: the type of dummy variable to be computed.

If $typ ='P'$ , an orthogonal Polynomial representation is computed.
If $typ ='H'$ , a Helmert matrix representation is computed.
If $typ ='F'$ , the contrasts relative to the First level are computed.
If $typ ='L'$ , the contrasts relative to the Last level are computed.
If $typ ='C'$ , a Complete set of dummy variables is computed.

Constraint: $typ ='P'$, $'H'$, $'F'$, $'L'$ or $'C'$.

2: $n$ – Integer Input

On entry: $n$, the number of observations for which the dummy variables are to be computed.

Constraint: $n \geq levels$.

3: $levels$ – Integer Input

On entry: $k$, the number of levels of the factor.

Constraint: $levels \geq 2$.

4: $ifact (n)$ – Integer array Input

On entry: the $n$ values of the factor.

Constraint: $1 \leq ifact (i) \leq levels$, for $i = 1, 2, \dots, n$.

5: $x (ldx, *)$ – Real (Kind=nag_wp) array Output

Note: the second dimension of the array x must be at least $levels - 1$ if $typ ='P'$, $'H'$, $'F'$ or $'L'$ and at least $levels$ if $typ ='C'$.

On exit: the $n \times k^{*}$ matrix of dummy variables, where $k^{*} = k - 1$ if $typ ='P'$, $'H'$, $'F'$ or $'L'$ and $k^{*} = k$ if $typ ='C'$.

6: $ldx$ – Integer Input

On entry: the first dimension of the array x as declared in the (sub)program from which g04eaf is called.

Constraint: $ldx \geq n$.

7: $v (*)$ – Real (Kind=nag_wp) array Input

Note: the dimension of the array v must be at least $levels$ if $typ ='P'$, and at least $1$ otherwise.

On entry: if $typ ='P'$, the $k$ distinct values of the underlying variable for which the orthogonal polynomial is to be computed.

If $typ \neq 'P'$, v is not referenced.

Constraint: if $typ ='P'$, the $k$ values of v must be distinct.

8: $rep (levels)$ – Real (Kind=nag_wp) array Output

On exit: the number of replications for each level of the factor, $r_{i}$, for $i = 1, 2, \dots, k$.

9: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $ldx = ⟨ value ⟩$ and $n = ⟨ value ⟩$ .
Constraint: $ldx \geq n$ .

On entry, $levels = ⟨ value ⟩$ .
Constraint: $levels \geq 2$ .

On entry, $n = ⟨ value ⟩$ and $levels = ⟨ value ⟩$ .
Constraint: $n \geq levels$ .

On entry, $typ = ⟨ value ⟩$ .
Constraint: $typ ='P'$ , $'H'$ , $'F'$ , $'L'$ or $'C'$ .

$ifail = 2$: On entry, $i = ⟨ value ⟩$ , $ifact (i) = ⟨ value ⟩$ and $levels = ⟨ value ⟩$ .
Constraint: $1 \leq ifact (i) \leq levels$ .

On entry, not all levels are present in ifact.

On entry, not all values of v are distinct.

$ifail = 3$: The $⟨ value ⟩$ polynomial has all elements zero. This will be due to some values of v being very close together.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
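The input constraints behind $ifail = 1$ and $ifail = 2$ can be checked before calling the routine. The following Python sketch mirrors the conditions listed above; it is not part of the NAG Library, the name `check_arguments` is hypothetical, the order of the checks is an assumption, and typ is assumed to be supplied in upper case. The condition reported as $ifail = 3$ arises during the computation and is not covered.

```python
def check_arguments(typ, n, levels, ifact, ldx, v=None):
    """Return 0 if the documented input constraints hold, otherwise 1 or 2.

    Hypothetical pre-check mirroring the ifail = 1 and ifail = 2 conditions.
    """
    # ifail = 1: scalar constraints.
    if ldx < n or levels < 2 or n < levels or typ not in ("P", "H", "F", "L", "C"):
        return 1
    values = ifact[:n]
    # ifail = 2: each ifact(i) must lie in 1..levels ...
    if any(f < 1 or f > levels for f in values):
        return 2
    # ... every level must be present ...
    if set(values) != set(range(1, levels + 1)):
        return 2
    # ... and, for typ = 'P', the first levels values of v must be distinct.
    if typ == "P" and len(set(v[:levels])) != levels:
        return 2
    return 0


status = check_arguments("H", 4, 4, [1, 2, 3, 4], 4)
```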

7 Accuracy

The computations are stable.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.

g04eaf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g04eaf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

Other routines for fitting polynomials can be found in Chapter E02.

10 Example

Data are read in from an experiment with four treatments and three observations per treatment with the treatment coded as a factor. g04eaf is used to compute the required dummy variables and the model is then fitted by g02daf.

