nag_anova_hier2 (g04ag) performs an analysis of variance for a two-way hierarchical classification with subgroups of possibly unequal size, and also computes the treatment group and subgroup means. A fixed effects model is assumed.

Syntax

[ngp, gbar, sgbar, gm, ss, idf, f, fp, ifail] = g04ag(y, lsub, nobs, 'n', n, 'k', k, 'l', l)

[ngp, gbar, sgbar, gm, ss, idf, f, fp, ifail] = nag_anova_hier2(y, lsub, nobs, 'n', n, 'k', k, 'l', l)

Description

In a two-way hierarchical classification, there are $k$ ($\geq 2$) treatment groups, the $i$th of which is subdivided into $l_{i}$ treatment subgroups. The $j$th subgroup of group $i$ contains $n_{i j}$ observations, which may be denoted by $y_{1 i j}, y_{2 i j}, \dots, y_{n_{i j} i j}$.

The general observation is denoted by $y_{m i j}$, being the $m$th observation in subgroup $j$ of group $i$, for $1 \leq i \leq k$, $1 \leq j \leq l_{i}$, $1 \leq m \leq n_{i j}$.

The following quantities are computed:

(i) The subgroup means

${\bar{y}}_{. i j} = \frac{\sum_{m = 1}^{n_{i j}} y_{m i j}}{n_{i j}}$

(ii) The group means

${\bar{y}}_{. i .} = \frac{\sum_{j = 1}^{l_{i}} \sum_{m = 1}^{n_{i j}} y_{m i j}}{\sum_{j = 1}^{l_{i}} n_{i j}}$

(iii) The grand mean

${\bar{y}}_{\dots} = \frac{\sum_{i = 1}^{k} \sum_{j = 1}^{l_{i}} \sum_{m = 1}^{n_{i j}} y_{m i j}}{\sum_{i = 1}^{k} \sum_{j = 1}^{l_{i}} n_{i j}}$

(iv) The number of observations in each group

$n_{i .} = \sum_{j = 1}^{l_{i}} n_{i j}$

(v) Sums of squares

Between groups:	${SS}_{g} = \sum_{i = 1}^{k} n_{i .} {({\bar{y}}_{. i .} - {\bar{y}}_{\dots})}^{2}$
Between subgroups within groups:	${SS}_{s g} = \sum_{i = 1}^{k} \sum_{j = 1}^{l_{i}} n_{i j} {({\bar{y}}_{. i j} - {\bar{y}}_{. i .})}^{2}$
Residual (within subgroups):	${SS}_{res} = \sum_{i = 1}^{k} \sum_{j = 1}^{l_{i}} \sum_{m = 1}^{n_{i j}} {(y_{m i j} - {\bar{y}}_{. i j})}^{2} = {SS}_{tot} - {SS}_{g} - {SS}_{s g}$
Corrected total:	${SS}_{tot} = \sum_{i = 1}^{k} \sum_{j = 1}^{l_{i}} \sum_{m = 1}^{n_{i j}} {(y_{m i j} - {\bar{y}}_{\dots})}^{2}$

(vi) Degrees of freedom of variance components

Between groups:	$k - 1$
Subgroups within groups:	$l - k$
Residual:	$n - l$
Total:	$n - 1$

where $l = \sum_{i = 1}^{k} l_{i}$ and $n = \sum_{i = 1}^{k} n_{i .}$.

(vii) $F$ ratios. These are the ratios of the group and subgroup mean squares to the residual mean square.

Groups	$F_{1} = \frac{\text{Between groups sum of squares} / (k - 1)}{\text{Residual sum of squares} / (n - l)} = \frac{{SS}_{g} / (k - 1)}{{SS}_{res} / (n - l)}$
Subgroups	$F_{2} = \frac{\text{Between subgroups (within groups) sum of squares} / (l - k)}{\text{Residual sum of squares} / (n - l)} = \frac{{SS}_{s g} / (l - k)}{{SS}_{res} / (n - l)}$

If either $F$ ratio exceeds $9999.0$, the value $9999.0$ is assigned instead.

(viii) $F$ significances. These are the probabilities of obtaining a value from the appropriate $F$-distribution which exceeds the computed mean square ratio.

Groups	$p_{1} = \text{Prob} (F_{(k - 1), (n - l)} > F_{1})$
Subgroups	$p_{2} = \text{Prob} (F_{(l - k), (n - l)} > F_{2})$

where $F_{\nu_{1}, \nu_{2}}$ denotes the central $F$-distribution with degrees of freedom $\nu_{1}$ and $\nu_{2}$. If any $F_{i} = 9999.0$, then $p_{i}$ is set to zero, $i = 1, 2$.
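The quantities (i) to (viii) can be reproduced in a few lines of base MATLAB, which may help when checking the definitions above. The following is a minimal sketch, not the routine's implementation. It uses the data from the Example section, and every variable name that is not a documented parameter name is illustrative. The upper-tail probability is obtained from the regularized incomplete beta function betainc through the standard identity $P(F_{\nu_{1}, \nu_{2}} > f) = I_{\nu_{2} / (\nu_{2} + \nu_{1} f)}(\nu_{2} / 2, \nu_{1} / 2)$, which is a choice made here for illustration only. repelem requires MATLAB R2015a or later.

```matlab
% Sketch only: recompute quantities (i)-(viii) without calling g04ag.
y    = [2.1 2.4 2.0 2.0 2.0, 2.4 2.1 2.2, 2.4 2.2 2.6, 2.4 2.4 2.5, 1.9 1.7, ...
        2.1 1.5 2.0, 1.9 1.7 1.9 1.9 1.9, 2.0 2.1 2.3];
lsub = [5 3];                  % l_i: number of subgroups in each group
nobs = [5 3 3 3 2 3 5 3];      % n_ij: number of observations in each subgroup

k = numel(lsub);  l = numel(nobs);  n = numel(y);

% Illustrative index vectors: subgroup of each observation, group of each subgroup
sg_of_obs = repelem(1:l, nobs);
g_of_sg   = repelem(1:k, lsub);
g_of_obs  = g_of_sg(sg_of_obs);

% (i)-(iv) subgroup means, group sizes, group means, grand mean
sgbar = accumarray(sg_of_obs(:), y(:)) ./ nobs(:);
ngp   = accumarray(g_of_sg(:), nobs(:));
gbar  = accumarray(g_of_obs(:), y(:)) ./ ngp;
gm    = sum(y) / n;

% (v) sums of squares
ss_g   = sum(ngp .* (gbar - gm).^2);
ss_sg  = sum(nobs(:) .* (sgbar - gbar(g_of_sg(:))).^2);
ss_res = sum((y(:) - sgbar(sg_of_obs(:))).^2);
ss_tot = sum((y(:) - gm).^2);

% (vi) degrees of freedom: groups, subgroups within groups, residual, total
idf = [k - 1, l - k, n - l, n - 1];

% (vii) F ratios against the residual mean square, capped at 9999.0
f = [ss_g / idf(1), ss_sg / idf(2)] / (ss_res / idf(3));
f = min(f, 9999.0);

% (viii) upper-tail probabilities of the central F-distribution
fp = betainc(idf(3) ./ (idf(3) + idf(1:2) .* f), idf(3) / 2, idf(1:2) / 2);
fp(f == 9999.0) = 0;

fprintf('SS: %.3f %.3f %.3f %.3f\n', ss_g, ss_sg, ss_res, ss_tot);
fprintf('F : %.2f %.2f\n', f);
```

For these data the sums of squares round to 0.475, 0.816, 0.559 and 1.850, in agreement with the table in the Example section. The sketch does not guard against a zero residual sum of squares, which the routine reports as ifail = 7.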

References

Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin

Moore P G, Shirley E A and Edwards D E (1972) Standard Statistical Calculations Pitman

Parameters

Compulsory Input Parameters

1: $y (n)$ – double array

The elements of y must contain the observations $y_{m i j}$ in the following order:

$y_{111}, y_{211}, \dots, y_{n_{11} 11}, y_{112}, y_{212}, \dots, y_{n_{12} 12}, \dots, y_{11 l_{1}}, \dots, y_{n_{1 l_{1}} 1 l_{1}}, \dots, y_{1 i j}, \dots, y_{n_{i j} i j}, \dots, y_{1 k l_{k}}, \dots, y_{n_{k l_{k}} k l_{k}}$.

In words, the ordering is by group, and within each group is by subgroup, the members of each subgroup being in consecutive locations in y.

2: $lsub (k)$ – int64/int32/nag_int array

The number of subgroups within group $i$, $l_{i}$, for $i = 1, 2, \dots, k$.

Constraint: $lsub (i) > 0$, for $i = 1, 2, \dots, k$.

3: $nobs (l)$ – int64/int32/nag_int array

The numbers of observations in each subgroup, $n_{i j}$, in the following order:

$n_{11}, n_{12}, \dots, n_{1 l_{1}}, n_{21}, \dots, n_{2 l_{2}}, \dots, n_{k 1}, \dots, n_{k l_{k}}$.

Constraint: $n = \sum_{i = 1}^{k} \sum_{j = 1}^{l_{i}} n_{i j}$, that is $n = \sum_{i = 1}^{l} nobs (i)$, and $nobs (i) > 0$, for $i = 1, 2, \dots, l$.
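The three compulsory arrays describe one nested data set, so it can be convenient to build them together from a single structure. The following is a minimal sketch under the assumption that the observations are held in a nested cell array; the name data and its layout are hypothetical and are not part of the documented interface. The integer arrays are converted with int64, as in the Example section.

```matlab
% Sketch only: data{i}{j} is assumed to hold the observations of subgroup j
% in group i (hypothetical layout, values taken from the Example section).
data = { {[2.1 2.4 2.0 2.0 2.0], [2.4 2.1 2.2], [2.4 2.2 2.6], ...
          [2.4 2.4 2.5], [1.9 1.7]}, ...
         {[2.1 1.5 2.0], [1.9 1.7 1.9 1.9 1.9], [2.0 2.1 2.3]} };

lsub      = int64(cellfun(@numel, data(:)));       % l_i, one entry per group
subgroups = [data{:}];                             % all subgroups, group by group
nobs      = int64(cellfun(@numel, subgroups(:)));  % n_ij, one entry per subgroup
y         = [subgroups{:}];                        % observations, subgroup by subgroup

% The documented constraints linking the three arrays
assert(numel(nobs) == sum(lsub));
assert(numel(y) == sum(nobs));
```

Building y by concatenating the subgroups in order gives the required ordering by group and then by subgroup, with the members of each subgroup in consecutive locations.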

Optional Input Parameters

1: $n$ – int64/int32/nag_int scalar

Default: the dimension of the array y.

$n$, the total number of observations.

2: $k$ – int64/int32/nag_int scalar

Default: the dimension of the array lsub.

$k$, the number of groups.

Constraint: $k \geq 2$.

3: $l$ – int64/int32/nag_int scalar

Default: the dimension of the array nobs.

$l$, the total number of subgroups.

Constraint: $l = \sum_{i = 1}^{k} lsub (i)$.

Output Parameters

1: $ngp (k)$ – int64/int32/nag_int array

The total number of observations in group $i$, $n_{i .}$, for $i = 1, 2, \dots, k$.

2: $gbar (k)$ – double array

The mean for group $i$, ${\bar{y}}_{. i .}$, for $i = 1, 2, \dots, k$.

3: $sgbar (l)$ – double array

The subgroup means, ${\bar{y}}_{. i j}$, in the following order:

${\bar{y}}_{.11}, {\bar{y}}_{.12}, \dots, {\bar{y}}_{.1 l_{1}}, {\bar{y}}_{.21}, {\bar{y}}_{.22}, \dots, {\bar{y}}_{.2 l_{2}}, \dots, {\bar{y}}_{. k 1}, {\bar{y}}_{. k 2}, \dots, {\bar{y}}_{. k l_{k}}$.

4: $gm$ – double scalar

The grand mean, ${\bar{y}}_{\dots}$.

5: $ss (4)$ – double array

Contains the sums of squares for the analysis of variance, as follows:

$ss (1) =$ Between groups sum of squares, ${SS}_{g}$,
$ss (2) =$ Between subgroups within groups sum of squares, ${SS}_{s g}$,
$ss (3) =$ Residual sum of squares, ${SS}_{res}$,
$ss (4) =$ Corrected total sum of squares, ${SS}_{tot}$.

6: $idf (4)$ – int64/int32/nag_int array

Contains the degrees of freedom attributable to each sum of squares in the analysis of variance, as follows:

$idf (1) =$ Degrees of freedom for between group sum of squares,
$idf (2) =$ Degrees of freedom for between subgroup within groups sum of squares,
$idf (3) =$ Degrees of freedom for residual sum of squares,
$idf (4) =$ Degrees of freedom for corrected total sum of squares.

7: $f (2)$ – double array

Contains the mean square ratios, $F_{1}$ and $F_{2}$, for the between groups variation and the between subgroups within groups variation, respectively, each with respect to the residual.

8: $fp (2)$ – double array

Contains the significances of the mean square ratios, $p_{1}$ and $p_{2}$ respectively.

9: $ifail$ – int64/int32/nag_int scalar

$ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

$ifail = 1$: On entry, $k \leq 1$.

$ifail = 2$: On entry, $lsub (i) \leq 0$, for some $i = 1, 2, \dots, k$.

$ifail = 3$: On entry, $l \neq \sum_{i = 1}^{k} lsub (i)$.

$ifail = 4$: On entry, $nobs (i) \leq 0$, for some $i = 1, 2, \dots, l$.

$ifail = 5$: On entry, $n \neq \sum_{i = 1}^{l} nobs (i)$.

W $ifail = 6$: The total corrected sum of squares is zero, indicating that all the data values are equal. The means returned are therefore all equal, and the sums of squares are zero. No assignments are made to idf, f, and fp.

W $ifail = 7$: The residual sum of squares is zero. This arises when either each subgroup contains exactly one observation, or the observations within each subgroup are equal. The means, sums of squares, and degrees of freedom are computed, but no assignments are made to f and fp.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.
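The input conditions behind ifail = 1 to 5 can be checked in plain MATLAB before the routine is called, which may make a failing data set easier to diagnose. The following is a minimal sketch that mirrors the documented conditions. The variable name code, the deliberately inconsistent sample data, and the order in which the checks are applied are assumptions made for illustration, and the routine itself may test its inputs differently.

```matlab
% Sketch only: mirror the documented input conditions for ifail = 1..5.
y    = [2.1 2.4 2.0, 2.4 2.1, 1.9 1.7];   % 7 observations
lsub = [2 2];                             % claims 4 subgroups in total
nobs = [3 2 2];                           % but only 3 subgroup sizes are given

k = numel(lsub);   % default for k: the dimension of lsub
l = numel(nobs);   % default for l: the dimension of nobs
n = numel(y);      % default for n: the dimension of y

if k <= 1
    code = 1;      % fewer than two groups
elseif any(lsub <= 0)
    code = 2;      % a group with no subgroups
elseif l ~= sum(lsub)
    code = 3;      % number of subgroups disagrees with lsub
elseif any(nobs <= 0)
    code = 4;      % a subgroup with no observations
elseif n ~= sum(nobs)
    code = 5;      % number of observations disagrees with nobs
else
    code = 0;      % inputs satisfy the documented constraints
end
fprintf('Documented ifail value matching these inputs: %d\n', code);
```

With the sample data the third condition is the first to fail, because lsub sums to 4 while nobs has only 3 entries. The warning cases ifail = 6 and 7 depend on the computed sums of squares and are not covered here.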

Accuracy

The computations are believed to be stable.

Further Comments

The time taken by nag_anova_hier2 (g04ag) increases approximately linearly with the total number of observations, $n$.

Example

This example has two groups, the first of which consists of five subgroups, and the second of three subgroups. The numbers of observations in each subgroup are not equal. The data represent the percentage stretch in the length of samples of sack kraft drawn from consignments (subgroups) received over two years (groups). For details see Moore et al. (1972).


function g04ag_example


fprintf('g04ag example results\n\n');

% Observations in two groups with 5 and 3 sub-groups respectively
y = [2.1     2.4     2       2       2   ...
     2.4     2.1     2.2     ...
     2.4     2.2     2.6     ...
     2.4     2.4     2.5     ...
     1.9     1.7     ...
     ...
     2.1     1.5     2       ...
     1.9     1.7     1.9     1.9     1.9  ...
     2       2.1     2.3];
k    = 2;
lsub = [int64(5);                 3];
nobs = [int64(5); 3; 3; 3; 2;     3;5;3];
n = sum(nobs);

fprintf('Data values\n\n Group  Subgroup  Observations\n');
nsub = 0;
nlo  = 1;
for i = 1:k
  for j = 1:lsub(i)
    nsub = nsub + 1;
    nhi = nlo + nobs(nsub) - 1;
    fprintf('%5d%9d   ', i, j);
    fprintf('%4.1f',y(nlo:nhi));
    fprintf('\n');
    nlo = nhi + 1;
  end
end

% Perform ANOVA
[ngp, gbar, sgbar, gm, ss, idf, f, fp, ifail] = ...
  g04ag(y, lsub, nobs);

% Display results
fprintf('\nSubgroup means\n\n');
fprintf('   Group  Subgroup  Mean\n');
ii = 0;
for i = 1:k
  for j = 1:lsub(i)
    ii = ii + 1;
    fprintf('%6d%8d%10.2f\n', i, j, sgbar(ii));
  end
end
fprintf('\n');
fprintf('%s%5.2f%s%2d%s\n', '    Group 1 mean =', gbar(1), ...
	'   (', ngp(1), ' observations)');
fprintf('%s%5.2f%s%2d%s\n', '    Group 2 mean =', gbar(2), ...
	'   (', ngp(2), ' observations)');
fprintf('%s%5.2f%s%2d%s\n', '    Grand mean   =', gm, ...
	'   (', n, ' observations)');
fprintf('\nAnalysis of variance table\n\n');
fprintf('   Source                SS    DF  F ratio  Sig\n\n');
fprintf('%s%6.3f%5d%7.2f%8.3f\n', 'Between groups        ', ...
	ss(1), idf(1), f(1), fp(1));
fprintf('%s%6.3f%5d%7.2f%8.3f\n', 'Between subgroups     ', ...
	ss(2), idf(2), f(2), fp(2));
fprintf('%s%6.3f%5d\n', 'Residual              ', ss(3), idf(3));
fprintf('\n%s%6.3f%5d\n', 'Total                 ', ss(4), idf(4));

g04ag example results

Data values

 Group  Subgroup  Observations
    1        1    2.1 2.4 2.0 2.0 2.0
    1        2    2.4 2.1 2.2
    1        3    2.4 2.2 2.6
    1        4    2.4 2.4 2.5
    1        5    1.9 1.7
    2        1    2.1 1.5 2.0
    2        2    1.9 1.7 1.9 1.9 1.9
    2        3    2.0 2.1 2.3

Subgroup means

   Group  Subgroup  Mean
     1       1      2.10
     1       2      2.23
     1       3      2.40
     1       4      2.43
     1       5      1.80
     2       1      1.87
     2       2      1.86
     2       3      2.13

    Group 1 mean = 2.21   (16 observations)
    Group 2 mean = 1.94   (11 observations)
    Grand mean   = 2.10   (27 observations)

Analysis of variance table

   Source                SS    DF  F ratio  Sig

Between groups         0.475    1  16.15   0.001
Between subgroups      0.816    6   4.63   0.005
Residual               0.559   19

Total                  1.850   26
