g04bb:: Analysis of Variance (NAG Toolbox)

In a completely randomized design, experimental material is divided into a number of units, or plots, to which a treatment can be applied. In a randomized block design the units are grouped into blocks so that the variation within blocks is less than the variation between blocks. If every treatment is applied to one plot in each block it is a complete block design. If there are fewer plots per block than treatments then the design will be an incomplete block design and may be balanced or partially balanced.

For a completely randomized design with $t$ treatments and $n_j$ plots for the $j$th treatment, the linear model is

$$y_{ij} = \mu + \tau_j + e_{ij}, \quad j = 1, 2, \dots, t \text{ and } i = 1, 2, \dots, n_j,$$

where $y_{ij}$ is the $i$th observation for the $j$th treatment, $\mu$ is the overall mean, $\tau_j$ is the effect of the $j$th treatment and $e_{ij}$ is the random error term.

For a randomized block design with $t$ treatments and $b$ blocks of $k$ plots, the linear model is

$$y_{ij(l)} = \mu + \beta_i + \tau_l + e_{ij}, \quad i = 1, 2, \dots, b, \; j = 1, 2, \dots, k \text{ and } l = 1, 2, \dots, t,$$

where $\beta_i$ is the effect of the $i$th block and the $ij(l)$ notation indicates that the $l$th treatment is applied to the $j$th plot in the $i$th block.

The completely randomized design gives rise to a one-way analysis of variance. The treatments do not have to be equally replicated, i.e., they do not have to occur the same number of times. First the overall mean, $\hat{\mu}$, is computed and subtracted from the observations to give $y'_{ij} = y_{ij} - \hat{\mu}$. The estimated treatment effects, $\hat{\tau}_j$, are then computed as the treatment means of the mean-adjusted observations, $y'_{ij}$. The treatment sum of squares is computed by squaring each treatment total of the $y'_{ij}$, dividing by the number of observations in that total, $n_j$, and summing over treatments. The final residuals are computed as $r_{ij} = y'_{ij} - \hat{\tau}_j$, and the residual sum of squares is calculated from them.
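The one-way computation described above can be sketched in plain MATLAB. This is only an illustration of the arithmetic, not the internals of g04bb; the data values and the variable names (`yp`, `n_j`, `totals`, `ss_trt`, `ss_res`) are hypothetical.

```matlab
% Hypothetical data: 3 treatments with unequal replication (illustrative values only)
y  = [4; 6; 5; 9; 11; 7; 7];
it = [1; 1; 1; 2; 2; 3; 3];          % treatment label of each observation

mu_hat = mean(y);                    % overall mean
yp     = y - mu_hat;                 % mean-adjusted observations y'
n_j    = accumarray(it, 1);          % number of observations per treatment
totals = accumarray(it, yp);         % treatment totals of y'
tau    = totals ./ n_j;              % estimated treatment effects
ss_trt = sum(totals.^2 ./ n_j);      % treatment sum of squares
res    = yp - tau(it);               % final residuals
ss_res = sum(res.^2);                % residual sum of squares

fprintf('SS treatments = %g, SS residual = %g\n', ss_trt, ss_res);
```

For these values the two sums of squares add up to the total sum of squares of the mean-adjusted observations, which is a quick consistency check on any implementation.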

For the randomized block design the mean is computed and subtracted from the observations to give $y'_{ij(l)} = y_{ij(l)} - \hat{\mu}$. The estimated block effects, ignoring treatment effects, $\hat{\beta}_i$, are then computed as the block means of the $y'_{ij(l)}$, and the unadjusted block sum of squares is computed as the sum of squared block totals of the $y'_{ij(l)}$ divided by the number of plots per block, $k$. The block-adjusted observations are then computed as $y''_{ij(l)} = y'_{ij(l)} - \hat{\beta}_i$. In the case of the complete block design, with the same replication for each treatment within each block, the blocks and treatments are orthogonal, and so the treatment effects are estimated as the treatment means of the block-adjusted observations, $y''_{ij(l)}$. The treatment sum of squares is computed as the sum of squared treatment totals of the $y''_{ij(l)}$ divided by the number of replicates of each treatment, $r = bk/t$. Finally the residuals, and hence the residual sum of squares, are given by $r_{ij(l)} = y''_{ij(l)} - \hat{\tau}_l$.
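As a minimal sketch of the complete block case, the following plain MATLAB follows the steps in the order given: remove the mean, remove the unadjusted block effects, then take treatment means of what remains. The data and the names `blk`, `ypp`, `ss_blk`, `ss_trt` and `ss_res` are hypothetical and chosen only for illustration.

```matlab
% Hypothetical complete block design: b = 3 blocks, t = 3 treatments, k = 3 plots per block
y   = [5; 7; 9;  6; 9; 9;  4; 8; 6];
blk = [1; 1; 1;  2; 2; 2;  3; 3; 3];   % block of each plot
it  = [1; 2; 3;  1; 2; 3;  1; 2; 3];   % treatment applied to each plot
b = 3; t = 3; k = 3;

yp     = y - mean(y);                  % mean-adjusted observations y'
btot   = accumarray(blk, yp);          % block totals of y'
beta   = btot / k;                     % unadjusted block effects
ss_blk = sum(btot.^2) / k;             % unadjusted block sum of squares
ypp    = yp - beta(blk);               % block-adjusted observations y''
r      = b * k / t;                    % replicates per treatment
ttot   = accumarray(it, ypp);          % treatment totals of y''
tau    = ttot / r;                     % treatment effects
ss_trt = sum(ttot.^2) / r;             % treatment sum of squares
res    = ypp - tau(it);                % residuals
ss_res = sum(res.^2);                  % residual sum of squares
```

Because blocks and treatments are orthogonal here, `ss_blk + ss_trt + ss_res` equals the total sum of squares `sum(yp.^2)`.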

For a design without the same replication for each treatment within each block the treatments and the blocks will not be orthogonal, so the treatments adjusted for blocks need to be computed. The adjusted treatment effects are found as the solution to the equations

$$(R - N N^T / k) \hat{\tau} = q,$$

where $q$ is the vector of the treatment totals for the block-adjusted observations, $y''_{ij(l)}$, $R$ is a diagonal matrix with $R_{ll}$ equal to the number of times the $l$th treatment is replicated, and $N$ is the $t$ by $b$ incidence matrix, with $N_{lj}$ equal to the number of times treatment $l$ occurs in block $j$. The solution to the equations can be written as

$$\hat{\tau} = \Omega q,$$

where $\Omega$ is a generalized inverse of $(R - N N^T / k)$. The solution is found from the eigenvalue decomposition of $(R - N N^T / k)$. The residuals are first calculated by subtracting the estimated treatment effects from the block-adjusted observations to give $r'_{ij(l)} = y''_{ij(l)} - \hat{\tau}_l$. However, since only the unadjusted block effects have been removed and blocks and treatments are not orthogonal, the block means of the $r'_{ij(l)}$ have to be subtracted to give the correct residuals, $r_{ij(l)}$, and the residual sum of squares.
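The non-orthogonal case can be sketched with base MATLAB functions (`accumarray`, `eig`) using the data from the Example section. This is an illustration under stated assumptions, not the routine's actual implementation: the generalized inverse is built here by inverting the eigenvalues above a hypothetical threshold `ztol`, the treatment sum of squares is taken as $\hat{\tau}^T q$, and the variable names are invented for this sketch.

```matlab
% Data and treatment labels copied from the Example section (10 blocks of 3 plots)
y  = [1; 5; 4; 5; 10; 6; 2; 9; 3; 4; 8; 6; 2; 4; 7; ...
      6; 7; 5; 5; 7; 2; 7; 2; 4; 8; 4; 2; 10; 8; 7];
it = [1; 2; 3; 1; 2; 4; 1; 3; 5; 1; 4; 6; 1; 5; 6; ...
      2; 3; 6; 2; 4; 5; 2; 5; 6; 3; 4; 5; 3; 4; 6];
b = 10; k = 3; t = 6;
blk = kron((1:b)', ones(k, 1));        % consecutive plots form a block

yp   = y - mean(y);                    % mean-adjusted observations y'
beta = accumarray(blk, yp) / k;        % unadjusted block effects
ypp  = yp - beta(blk);                 % block-adjusted observations y''
q    = accumarray(it, ypp);            % treatment totals of y''

N = accumarray([it blk], 1, [t b]);    % t-by-b incidence matrix
R = diag(sum(N, 2));                   % replication of each treatment
A = R - N * N' / k;

[V, D] = eig((A + A') / 2);            % symmetrize before decomposing
lam    = diag(D);
ztol   = 1e-8;                         % assumed threshold for "zero" eigenvalues
keep   = lam > ztol * max(lam);
Omega  = V(:, keep) * diag(1 ./ lam(keep)) * V(:, keep)';

tau = Omega * q;                       % adjusted treatment effects
r1  = ypp - tau(it);                   % first-stage residuals r'
bm  = accumarray(blk, r1) / k;         % block means of r'
res = r1 - bm(blk);                    % corrected residuals

ss_trt = tau' * q;                     % adjusted treatment sum of squares
ss_res = sum(res.^2);                  % residual sum of squares
fprintf('SS treatments = %.2f, SS residual = %.2f\n', ss_trt, ss_res);
```

For these data the two sums of squares should agree with the Treatments and Residual rows of the ANOVA table in the Example section (101.78 and 20.89), and `mean(y) + tau` should agree with the printed treatment means.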

The mean squares are computed as the sums of squares divided by the degrees of freedom. The degrees of freedom for the unadjusted blocks is $b - 1$. For the completely randomized and the complete block designs the degrees of freedom for the treatments is $t - 1$; in the general case it is the rank of the matrix $\Omega$. The $F$-statistic, given by the ratio of the treatment mean square to the residual mean square, tests the hypothesis

$$H_0 : \tau_1 = \tau_2 = \dots = \tau_t = 0.$$
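As a small worked check, the mean squares and $F$-statistics can be recomputed from the sums of squares and degrees of freedom printed in the ANOVA table of the Example section. The variable names are illustrative; the tail probabilities are not recomputed because that would need a distribution function outside base MATLAB.

```matlab
% Values copied from the ANOVA table in the Example section
ss = [60.00; 101.78; 20.89];     % blocks, treatments, residual
df = [9; 5; 15];

ms       = ss ./ df;             % mean squares
F_blocks = ms(1) / ms(3);        % block mean square over residual mean square
F_trt    = ms(2) / ms(3);        % treatment mean square over residual mean square

fprintf('MS: %6.2f %6.2f %6.2f\n', ms);
fprintf('F (blocks) = %5.2f, F (treatments) = %5.2f\n', F_blocks, F_trt);
```

To two decimal places these ratios match the F column of that table.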

The standard errors for the difference in treatment effects, or treatment means, for the completely randomized or the complete block designs are given by

$$\mathrm{se}(\hat{\tau}_j - \hat{\tau}_{j^*}) = \sqrt{\left(\frac{1}{n_j} + \frac{1}{n_{j^*}}\right) s^2},$$

where $s^2$ is the residual mean square and $n_j = n_{j^*} = b$ in the complete block design. In the general case the variances of the treatment effects are given by

$$\mathrm{Var}(\hat{\tau}) = \Omega s^2,$$

from which the standard errors of the differences between treatment effects can be calculated.
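For the general case, a standard error of a difference can be sketched from a variance matrix of the form $\Omega s^2$ by using $\mathrm{Var}(\hat{\tau}_j - \hat{\tau}_l) = s^2 (\Omega_{jj} + \Omega_{ll} - 2\Omega_{jl})$. The sketch below assumes the balanced incomplete block design of the Example section, for which $R - N N^T / k$ works out to $4I - \tfrac{2}{3}J$ (5 replicates, 3 plots per block, each pair of treatments sharing 2 blocks), and it uses `pinv` as one possible generalized inverse. Both choices are assumptions of this illustration.

```matlab
t     = 6;
A     = 4 * eye(t) - (2/3) * ones(t);   % R - N*N'/k for the Example design (assumed form)
Omega = pinv(A);                        % one choice of generalized inverse
s2    = 20.89 / 15;                     % residual mean square from the ANOVA table

d  = diag(Omega);
v  = d * ones(1, t) + ones(t, 1) * d' - 2 * Omega;   % Omega_jj + Omega_ll - 2*Omega_jl
se = sqrt(s2 * max(v, 0));              % standard errors of pairwise differences

fprintf('se of difference between treatments 1 and 2: %.3f\n', se(2, 1));
```

Every off-diagonal entry of `se` is the same here because the design is balanced, which is consistent with the constant table of standard errors printed in the Example section.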

In the complete block design all the information on the treatment effects is given by the within-block analysis. In other designs there may be a loss of information due to the non-orthogonality of treatments and blocks. The efficiency of the within-block analysis in these cases is given by the (canonical) efficiency factors: the nonzero eigenvalues of the matrix $(R - N N^T / k)$, divided by the number of replicates in the case of equal replication, or by the mean of the number of replicates in the unequally replicated case; see John (1987). If more than one eigenvalue is zero then the design is said to be disconnected and some treatments can only be compared using a between-block analysis.
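The efficiency factors can be illustrated directly from the treatment labels of the Example section. This is a sketch in base MATLAB with invented variable names; it assumes, as in that example, that consecutive groups of $k$ plots form the blocks.

```matlab
% Treatment labels copied from the Example section (10 blocks of 3 plots)
it = [1; 2; 3; 1; 2; 4; 1; 3; 5; 1; 4; 6; 1; 5; 6; ...
      2; 3; 6; 2; 4; 5; 2; 5; 6; 3; 4; 5; 3; 4; 6];
b = 10; k = 3; t = 6;
blk = kron((1:b)', ones(k, 1));         % block of each plot

N   = accumarray([it blk], 1, [t b]);   % incidence matrix
rep = sum(N, 2);                        % replication of each treatment
A   = diag(rep) - N * N' / k;

lam = sort(eig((A + A') / 2));          % eigenvalues in ascending order
ef  = lam / mean(rep);                  % divide by the (mean) replication

fprintf('%8.4f', ef); fprintf('\n');
```

One eigenvalue is zero (up to rounding) and the remaining five give an efficiency factor of 0.8, matching the Efficiency Factors line of the example output. A single zero eigenvalue indicates a connected design.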

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

To estimate missing values the Healy and Westmacott procedure or its derivatives may be used; see John and Quenouille (1977). This is an iterative procedure in which estimates of the missing values are adjusted by subtracting the corresponding values of the residuals. The new estimates are then used in the analysis of variance. This process is repeated until convergence. A suitable initial value may be the grand mean, $\hat{\mu}$. When using this procedure irdf should be set to the number of missing values plus one to obtain the correct degrees of freedom for the residual sum of squares.
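The iteration can be sketched as follows for a small complete block design with one missing plot. The data are hypothetical, and a hand-written additive fit stands in for the call to g04bb, so this shows only the shape of the loop (estimate, compute residuals, subtract the residual at the missing plot, repeat), not a recommended implementation. The stopping tolerance and iteration cap are arbitrary choices.

```matlab
% Hypothetical complete block design with one missing plot (marked NaN)
y   = [5; NaN; 9;  6; 9; 9;  4; 8; 6];
blk = [1; 1; 1;  2; 2; 2;  3; 3; 3];
it  = [1; 2; 3;  1; 2; 3;  1; 2; 3];

miss    = isnan(y);
y(miss) = mean(y(~miss));               % initial value: grand mean of the observed plots

for iter = 1:200
    mu   = mean(y);
    beta = accumarray(blk, y - mu) ./ accumarray(blk, 1);   % block effects
    ypp  = y - mu - beta(blk);                              % block-adjusted observations
    tau  = accumarray(it, ypp) ./ accumarray(it, 1);        % treatment effects
    res  = ypp - tau(it);                                   % residuals
    y(miss) = y(miss) - res(miss);      % adjust the estimates by their residuals
    if max(abs(res(miss))) < 1e-10      % stop once the residuals vanish
        break;
    end
end

fprintf('Estimated missing value: %.4f after %d iterations\n', y(miss), iter);
```

At convergence the residual at the missing plot is zero, so the estimate no longer contributes to the residual sum of squares; the lost degrees of freedom are then accounted for through irdf as described above.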

For designs such as Graeco–Latin squares, one or more of the blocking factors has to be removed in a preliminary analysis before the final analysis, using calls to nag_anova_random (g04bb) or nag_anova_rowcol (g04bc). The residuals from the preliminary analysis are then input to nag_anova_random (g04bb). In these cases irdf should be set to the difference between n and the residual degrees of freedom from the preliminary analysis. Care should be taken when using this approach as there is no check on the orthogonality of the two analyses.

For analysis of covariance the residuals are obtained from an analysis of variance of both the response variable and the covariates. The residuals from the response variable are then regressed on the residuals from the covariates using, say, nag_correg_linregs_noconst (g02cb) or nag_correg_linregm_fit (g02da). The results from those functions can be used to test for the significance of the covariates. To test the significance of the treatment effects after fitting the covariate, the residual sum of squares from the regression should be compared with the residual sum of squares obtained from the equivalent regression but using the residuals from fitting blocks only.

Example

The data, given by John and Quenouille (1977), are for a balanced incomplete block design with 10 blocks and 6 treatments and with 3 plots per block. The observations are the degree of pain experienced and the treatments are penicillin of different potency. The data are input and the analysis of variance table and treatment means are printed.

function g04bb_example


fprintf('g04bb example results\n\n');

% Data
y = [1;  5;  4;  5; 10;  6;  2;  9;  3;  4;  8;
     6;  2;  4;  7;  6;  7;  5;  5;  7;  2;  7;
     2;  4;  8;  4;  2; 10;  8;  7];

iblock = int64(10);
% Plot information
nt = int64(6);
it = [int64(1);  2;  3;  1;  2;  4;  1;  3;  5;  1;
              4;   6;  1;  5;  6;  2;  3;  6;  2;  4;
              5;   2;  5;  6;  3;  4;  5;  3;  4;  6];

tol = 5e-06;
irdf = int64(0);

% Calculate ANOVA table
[gmean, bmean, tmean, table, c, irep, r, ef, ifail] = ...
  g04bb( ...
         y, iblock, nt, it, tol, irdf);

% Display results
fprintf('ANOVA table\n\n');
fprintf(' Source        df         SS          MS          F        Prob\n\n');
fmt5 = '%s%5.0f%12.2f%12.2f%12.2f%11.4f\n';
fmt3 = '%s%5.0f%12.2f%12.2f\n';
fmt2 = '%s%5.0f%12.2f\n';
fprintf(fmt5, 'Blocks      ', table(1,1:5));
fprintf(fmt5, 'Treatments  ', table(2,1:5));
fprintf(fmt3, 'Residual    ', table(3,1:3));
fprintf(fmt2, 'Total       ', table(4,1:2));
fprintf('\nEfficiency Factors\n\n');
for j = 1:8:nt
  fprintf('%8.4f', ef(j:min(j+7,nt)));
  fprintf('\n');
end
fprintf('\n Grand Mean %10.2f\n\n', gmean);
fprintf('Treatment Means\n\n');
for j = 1:8:nt
  fprintf('%8.4f', tmean(j:min(j+7,nt)));
  fprintf('\n');
end
fprintf('\n');
[ifail] = x04ca( ...
                 'Lower', 'B', c, ...
                 'Standard errors of differences between means');

g04bb example results

ANOVA table

 Source        df         SS          MS          F        Prob

Blocks          9       60.00        6.67        4.79     0.0039
Treatments      5      101.78       20.36       14.62     0.0000
Residual       15       20.89        1.39
Total          29      182.67

Efficiency Factors

  0.0000  0.8000  0.8000  0.8000  0.8000  0.8000

 Grand Mean       5.33

Treatment Means

  2.5000  7.2500  8.0833  5.9167  2.9167  5.3333

 Standard errors of differences between means
          1       2       3       4       5       6
 1
 2   0.8344
 3   0.8344  0.8344
 4   0.8344  0.8344  0.8344
 5   0.8344  0.8344  0.8344  0.8344
 6   0.8344  0.8344  0.8344  0.8344  0.8344

On entry, $n < 2$,
or $nt \leq 0$,
or $nt = 1$ and $abs(iblock) \leq 1$,
or $ldtabl < 4$,
or $ldc < nt$,
or $tol < 0.0$,
or $irdf < 0$.

On entry, $it(i) < 1$ or $it(i) > nt$ for some $i$ when $nt \geq 2$,
or no value of $it = j$ for some $j = 1, 2, \dots, nt$, when $nt \geq 2$.
