void g04gac (Nag_ICCModelType mtype, Nag_ICCReliabilityType rtype, Integer nrep, Integer nsubj, Integer nrater, const double score[], Nag_MissingType mscore, double smiss, double alpha, double *icc, double *lci, double *uci, double *fstat, double *df1, double *df2, double *pvalue, NagError *fail)

The function may be called by the names: g04gac or nag_anova_icc.

3 Description

Many scientific investigations involve assigning a value (score) to a number of objects of interest (subjects). In most instances, the method used to score the subjects is affected by measurement error, which can in turn affect the analysis and interpretation of the data. When the score is based on the subjective opinion of one or more individuals (raters), the measurement error can be high; it is therefore important to be able to assess its magnitude. One way of doing this is to run a reliability study and calculate the intraclass correlation (ICC).

In a typical reliability study, each of a random sample of $n_s$ subjects is scored, independently, by $n_r$ raters. Each rater scores the same subject $m$ times (i.e., there are $m$ replicate scores). The scores, $y_{ijk}$, for $i = 1, 2, \dots, n_s$, $j = 1, 2, \dots, n_r$ and $k = 1, 2, \dots, m$, can be arranged into $m$ data tables, with the $n_s$ rows of each table, labelled $1, 2, \dots, n_s$, corresponding to the subjects and the $n_r$ columns, labelled $1, 2, \dots, n_r$, corresponding to the raters. For example, the following data, taken from Shrout and Fleiss (1979), show a typical situation where four raters ($n_r = 4$) have scored six subjects ($n_s = 6$) once, i.e., there has been no replication ($m = 1$):

Subject	Rater 1	Rater 2	Rater 3	Rater 4
1	9	2	5	8
2	6	1	3	2
3	8	4	6	8
4	7	1	2	6
5	10	5	6	9
6	6	2	4	7

The term intraclass correlation is a general one and can mean either a measure of interrater reliability, i.e., a measure of how similar the raters are, or intrarater reliability, i.e., a measure of how consistent each rater is.

There are numerous different versions of the ICC, six of which can be calculated using g04gac. The different versions can lead to different conclusions when applied to the same data; it is, therefore, essential to choose the most appropriate one based on the design of the reliability study and on whether inter- or intrarater reliability is of interest. The six measures of the ICC are split into three different types of study, denoted $ICC(1, 1)$, $ICC(2, 1)$ and $ICC(3, 1)$. This notation follows that used by Shrout and Fleiss (1979). Each type of study results in two forms of the ICC, depending on whether inter- or intrarater reliability is of interest.

3.1 $ICC (1, 1)$ : One-Factor Design

The one-factor designs differ, depending on whether inter- or intrarater reliability is of interest:

3.1.1 Interrater reliability

In a one-factor design to measure interrater reliability, each subject is scored by a different set of raters, randomly selected from a larger population of raters. Therefore, even though the same set of rater labels is used throughout, each row of the data table is associated with a different set of raters.

A model of the following form is assumed:

$$y_{ijk} = \mu + s_i + \varepsilon_{ijk}$$

where $s_i$ is the subject effect and $\varepsilon_{ijk}$ is the error term, with $s_i \sim N(0, \sigma_s^2)$ and $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$.

The measure of the interrater reliability, $\rho$, is then given by:

$$\rho = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_s^2 + \hat{\sigma}_\varepsilon^2}$$

where $\hat{\sigma}_s$ and $\hat{\sigma}_\varepsilon$ are the estimated values of $\sigma_s$ and $\sigma_\varepsilon$ respectively.
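For complete, unreplicated data ($m = 1$) the variance components of the one-factor model can be estimated with the classical one-way analysis of variance. The C sketch below assumes those method-of-moments estimators, $\hat{\sigma}_\varepsilon^2 = \text{WMS}$ and $\hat{\sigma}_s^2 = (\text{BMS} - \text{WMS})/n_r$, where BMS and WMS are the between-subject and within-subject mean squares. It illustrates the formula for $\rho$ only; it is not the algorithm used inside g04gac (which follows Gwet (2014) and also handles replicates and missing data), and the name icc11_inter is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ICC(1,1) interrater reliability,
   rho = s2_s / (s2_s + s2_e), for complete data with m = 1.
   y is stored row by row: y[i * nrater + j] is the score given to
   subject i by rater j (0-based).
   Assumption: one-way ANOVA (method-of-moments) estimates
       s2_e = WMS,  s2_s = (BMS - WMS) / nrater.
   The result is 0/0 (NaN) if every score is identical. */
double icc11_inter(const double *y, size_t nsubj, size_t nrater)
{
    assert(nsubj >= 2 && nrater >= 2);
    const double n = (double)nsubj, k = (double)nrater;

    double grand = 0.0;
    for (size_t t = 0; t < nsubj * nrater; ++t)
        grand += y[t];
    grand /= n * k;

    double ss_between = 0.0, ss_within = 0.0;
    for (size_t i = 0; i < nsubj; ++i) {
        double mean_i = 0.0;                      /* mean score of subject i */
        for (size_t j = 0; j < nrater; ++j)
            mean_i += y[i * nrater + j];
        mean_i /= k;
        ss_between += k * (mean_i - grand) * (mean_i - grand);
        for (size_t j = 0; j < nrater; ++j) {
            const double d = y[i * nrater + j] - mean_i;
            ss_within += d * d;
        }
    }

    const double bms = ss_between / (n - 1.0);      /* between subjects */
    const double wms = ss_within / (n * (k - 1.0)); /* within subjects  */

    double s2_s = (bms - wms) / k;
    if (s2_s < 0.0)        /* negative estimates are truncated at zero */
        s2_s = 0.0;
    const double s2_e = wms;
    return s2_s / (s2_s + s2_e);
}
```

With the Shrout and Fleiss table stored row by row, this sketch gives $\rho \approx 0.166$.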

3.1.2 Intrarater reliability

In a one-factor design to measure intrarater reliability, each rater scores a different set of subjects. Therefore, even though the same set of subject labels is used throughout, each column of the data table is associated with a different set of subjects.

A model of the following form is assumed:

$$y_{ijk} = \mu + r_j + \varepsilon_{ijk}$$

where $r_j$ is the rater effect and $\varepsilon_{ijk}$ is the error term, with $r_j \sim N(0, \sigma_r^2)$ and $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$.

The measure of the intrarater reliability, $\gamma$, is then given by:

$$\gamma = \frac{\hat{\sigma}_r^2}{\hat{\sigma}_r^2 + \hat{\sigma}_\varepsilon^2}$$

where $\hat{\sigma}_r$ and $\hat{\sigma}_\varepsilon$ are the estimated values of $\sigma_r$ and $\sigma_\varepsilon$ respectively.
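The same one-way analysis of variance can be applied column by column for the intrarater measure. This sketch assumes complete data with $m = 1$ and the moment estimators $\hat{\sigma}_\varepsilon^2$ = within-rater mean square and $\hat{\sigma}_r^2$ = (between-rater mean square minus within-rater mean square)$/n_s$. It is not the algorithm used inside g04gac, and icc11_intra is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ICC(1,1) intrarater reliability,
   gamma = s2_r / (s2_r + s2_e), for complete data with m = 1.
   y is stored row by row: y[i * nrater + j] is row i, column (rater) j.
   Each column is treated as one group of nsubj scores.
   Assumption: one-way ANOVA estimates by column,
       s2_e = within-rater MS,  s2_r = (between-rater MS - s2_e) / nsubj. */
double icc11_intra(const double *y, size_t nsubj, size_t nrater)
{
    assert(nsubj >= 2 && nrater >= 2);
    const double n = (double)nsubj, k = (double)nrater;

    double grand = 0.0;
    for (size_t t = 0; t < nsubj * nrater; ++t)
        grand += y[t];
    grand /= n * k;

    double ss_between = 0.0, ss_within = 0.0;
    for (size_t j = 0; j < nrater; ++j) {
        double mean_j = 0.0;                      /* mean score of rater j */
        for (size_t i = 0; i < nsubj; ++i)
            mean_j += y[i * nrater + j];
        mean_j /= n;
        ss_between += n * (mean_j - grand) * (mean_j - grand);
        for (size_t i = 0; i < nsubj; ++i) {
            const double d = y[i * nrater + j] - mean_j;
            ss_within += d * d;
        }
    }

    const double rms = ss_between / (k - 1.0);      /* between raters */
    const double wms = ss_within / (k * (n - 1.0)); /* within raters  */

    double s2_r = (rms - wms) / n;
    if (s2_r < 0.0)        /* negative estimates are truncated at zero */
        s2_r = 0.0;
    return s2_r / (s2_r + wms);
}
```

The only change from the interrater case is which index defines the groups: here the columns (raters) play the role that the rows (subjects) play for $\rho$.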

3.2 $ICC (2, 1)$ : Random Factorial Design

In a random factorial design, each subject is scored by the same set of raters, which has been randomly selected from a larger population of raters.

A model of the following form is assumed:

$$y_{ijk} = \mu + s_i + r_j + (sr)_{ij} + \varepsilon_{ijk}$$

where $s_i$ is the subject effect, $r_j$ is the rater effect, $(sr)_{ij}$ is the subject-rater interaction effect and $\varepsilon_{ijk}$ is the error term, with $s_i \sim N(0, \sigma_s^2)$, $r_j \sim N(0, \sigma_r^2)$, $(sr)_{ij} \sim N(0, \sigma_{sr}^2)$ and $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$.

3.2.1 Interrater reliability

The measure of the interrater reliability, $\rho$, is given by:

$$\rho = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_s^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_\varepsilon^2}$$

where $\hat{\sigma}_s$, $\hat{\sigma}_r$, $\hat{\sigma}_{sr}$ and $\hat{\sigma}_\varepsilon$ are the estimated values of $\sigma_s$, $\sigma_r$, $\sigma_{sr}$ and $\sigma_\varepsilon$ respectively.
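For complete, unreplicated data the two-way analysis of variance gives the classical moment estimates used in the sketch below. With subject, rater and residual mean squares BMS, JMS and EMS, it takes $\hat{\sigma}_s^2 = (\text{BMS} - \text{EMS})/n_r$ and $\hat{\sigma}_r^2 = (\text{JMS} - \text{EMS})/n_s$, while EMS estimates $\hat{\sigma}_{sr}^2 + \hat{\sigma}_\varepsilon^2$ jointly, because with $m = 1$ the interaction and the error cannot be separated. These estimators are an assumption for illustration (g04gac follows Gwet (2014)), and icc21_inter is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ICC(2,1) interrater reliability,
   rho = s2_s / (s2_s + s2_r + s2_sr + s2_e), for complete data, m = 1.
   y[i * nrater + j] is the score given to subject i by rater j.
   Assumption: two-way ANOVA (method-of-moments) estimates
       s2_s = (BMS - EMS) / nrater,  s2_r = (JMS - EMS) / nsubj,
   with EMS standing in for s2_sr + s2_e (inseparable when m = 1). */
double icc21_inter(const double *y, size_t nsubj, size_t nrater)
{
    assert(nsubj >= 2 && nrater >= 2);
    const double n = (double)nsubj, k = (double)nrater;

    double grand = 0.0;
    for (size_t t = 0; t < nsubj * nrater; ++t)
        grand += y[t];
    grand /= n * k;

    double ss_tot = 0.0, ss_s = 0.0, ss_r = 0.0;
    for (size_t t = 0; t < nsubj * nrater; ++t)
        ss_tot += (y[t] - grand) * (y[t] - grand);
    for (size_t i = 0; i < nsubj; ++i) {          /* subject (row) means */
        double m = 0.0;
        for (size_t j = 0; j < nrater; ++j)
            m += y[i * nrater + j];
        m /= k;
        ss_s += k * (m - grand) * (m - grand);
    }
    for (size_t j = 0; j < nrater; ++j) {         /* rater (column) means */
        double m = 0.0;
        for (size_t i = 0; i < nsubj; ++i)
            m += y[i * nrater + j];
        m /= n;
        ss_r += n * (m - grand) * (m - grand);
    }

    const double bms = ss_s / (n - 1.0);
    const double jms = ss_r / (k - 1.0);
    const double ems = (ss_tot - ss_s - ss_r) / ((n - 1.0) * (k - 1.0));

    double s2_s = (bms - ems) / k;
    double s2_r = (jms - ems) / n;
    if (s2_s < 0.0) s2_s = 0.0;   /* truncate negative estimates */
    if (s2_r < 0.0) s2_r = 0.0;
    return s2_s / (s2_s + s2_r + ems);
}
```

On the Shrout and Fleiss table this sketch gives $\rho \approx 0.290$.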

3.2.2 Intrarater reliability

The measure of the intrarater reliability, $\gamma$, is given by:

$$\gamma = \frac{\hat{\sigma}_s^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{sr}^2}{\hat{\sigma}_s^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_\varepsilon^2}$$

where $\hat{\sigma}_s$, $\hat{\sigma}_r$, $\hat{\sigma}_{sr}$ and $\hat{\sigma}_\varepsilon$ are the estimated values of $\sigma_s$, $\sigma_r$, $\sigma_{sr}$ and $\sigma_\varepsilon$ respectively.

3.3 $ICC (3, 1)$ : Mixed Factorial Design

In a mixed factorial design, each subject is scored by the same set of raters and these are the only raters of interest.

A model of the following form is assumed:

$$y_{ijk} = \mu + s_i + r_j + (sr)_{ij} + \varepsilon_{ijk}$$

where $s_i$ is the subject effect, $r_j$ is the fixed rater effect, $(sr)_{ij}$ is the subject-rater interaction effect and $\varepsilon_{ijk}$ is the error term, with

$$s_i \sim N(0, \sigma_s^2), \quad \sum_{j=1}^{n_r} r_j = 0, \quad (sr)_{ij} \sim N(0, \sigma_{sr}^2), \quad \sum_{j=1}^{n_r} (sr)_{ij} = 0$$

and $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$.

3.3.1 Interrater reliability

The measure of the interrater reliability, $\rho$, is then given by:

$$\rho = \frac{\hat{\sigma}_s^2 - \hat{\sigma}_{sr}^2/(n_r - 1)}{\hat{\sigma}_s^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_\varepsilon^2}$$

where $\hat{\sigma}_s$, $\hat{\sigma}_{sr}$ and $\hat{\sigma}_\varepsilon$ are the estimated values of $\sigma_s$, $\sigma_{sr}$ and $\sigma_\varepsilon$ respectively, and $n_r$ is the number of raters.

3.3.2 Intrarater reliability

The measure of the intrarater reliability, $\gamma$, is then given by:

$$\gamma = \frac{\hat{\sigma}_s^2 + \hat{\sigma}_{sr}^2}{\hat{\sigma}_s^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_\varepsilon^2}$$

where $\hat{\sigma}_s$, $\hat{\sigma}_{sr}$ and $\hat{\sigma}_\varepsilon$ are the estimated values of $\sigma_s$, $\sigma_{sr}$ and $\sigma_\varepsilon$ respectively.
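Given estimates of the variance components, the two mixed-design measures are direct transcriptions of the formulae in this section. The sketch below assumes the estimates are already available (how they are obtained is described in Gwet (2014) and is not reproduced here) and takes the divisor in the interrater formula to be $n_r - 1$, with $n_r$ the number of raters. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for the estimated variance components. */
typedef struct {
    double s2_s;   /* estimate of sigma_s^2       */
    double s2_sr;  /* estimate of sigma_sr^2      */
    double s2_e;   /* estimate of sigma_epsilon^2 */
} icc3_components;

/* ICC(3,1) interrater reliability, rho; nrater is n_r. */
double icc31_inter(icc3_components v, int nrater)
{
    assert(nrater >= 2);
    const double num = v.s2_s - v.s2_sr / (double)(nrater - 1);
    return num / (v.s2_s + v.s2_sr + v.s2_e);
}

/* ICC(3,1) intrarater reliability, gamma. */
double icc31_intra(icc3_components v)
{
    return (v.s2_s + v.s2_sr) / (v.s2_s + v.s2_sr + v.s2_e);
}
```

Since both measures share a denominator and $\hat{\sigma}_{sr}^2 \geq 0$, the interrater value computed this way never exceeds the intrarater value for the same estimates.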

As well as an estimate of the ICC, g04gac returns an approximate $100(1 - \alpha)\%$ confidence interval for the ICC, together with an $F$-statistic, $f$, its associated degrees of freedom ($\nu_1$ and $\nu_2$) and a p-value, $p$, for testing the null hypothesis that the ICC is zero.
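For the one-factor interrater case with complete, unreplicated data, the classical test of Shrout and Fleiss (1979) compares the between-subject and within-subject mean squares: $f = \text{BMS}/\text{WMS}$ with $\nu_1 = n_s - 1$ and $\nu_2 = n_s(n_r - 1)$. The sketch below assumes that setting only; g04gac uses the formulae of Gwet (2014), which also cover the other designs, replicates and missing data. The names f_test and icc11_inter_ftest are hypothetical, and the final step (the upper tail probability of the $F$ distribution) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the F-test of ICC = 0. */
typedef struct {
    double f;    /* F-statistic, BMS / WMS       */
    double df1;  /* nu_1 = nsubj - 1             */
    double df2;  /* nu_2 = nsubj * (nrater - 1)  */
} f_test;

/* One-factor interrater design, complete data, m = 1.
   y[i * nrater + j] is the score given to subject i by rater j. */
f_test icc11_inter_ftest(const double *y, size_t nsubj, size_t nrater)
{
    assert(nsubj >= 2 && nrater >= 2);
    const double n = (double)nsubj, k = (double)nrater;

    double grand = 0.0;
    for (size_t t = 0; t < nsubj * nrater; ++t)
        grand += y[t];
    grand /= n * k;

    double ss_between = 0.0, ss_within = 0.0;
    for (size_t i = 0; i < nsubj; ++i) {
        double mean_i = 0.0;                      /* mean score of subject i */
        for (size_t j = 0; j < nrater; ++j)
            mean_i += y[i * nrater + j];
        mean_i /= k;
        ss_between += k * (mean_i - grand) * (mean_i - grand);
        for (size_t j = 0; j < nrater; ++j) {
            const double d = y[i * nrater + j] - mean_i;
            ss_within += d * d;
        }
    }

    f_test out;
    out.df1 = n - 1.0;
    out.df2 = n * (k - 1.0);
    out.f = (ss_between / out.df1) / (ss_within / out.df2);  /* BMS / WMS */
    return out;
}
```

For the Shrout and Fleiss table this gives $f \approx 1.795$ with $\nu_1 = 5$ and $\nu_2 = 18$.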

Details of the formulae used to calculate the confidence interval, $f$, $\nu_1$, $\nu_2$, $\hat{\sigma}_s^2$, $\hat{\sigma}_r^2$, $\hat{\sigma}_{sr}^2$ and $\hat{\sigma}_\varepsilon^2$ are given in Gwet (2014). When there are no missing data, these should agree with the formulae presented in Shrout and Fleiss (1979).

In some circumstances, the formulae presented in Gwet (2014) for calculating $\hat{\sigma}_s^2$, $\hat{\sigma}_r^2$, $\hat{\sigma}_{sr}^2$ and $\hat{\sigma}_\varepsilon^2$ can result in a negative estimate. In such instances, the offending estimate is set to zero, the calculations continue as normal and the function returns with fail.code = NW_POTENTIAL_PROBLEM.
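The truncation rule described above can be sketched as follows: each negative estimate is replaced by zero, and a returned flag, standing in for the NW_POTENTIAL_PROBLEM warning, records that this happened. The function name is hypothetical, and how g04gac stores its variance components internally is not documented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the documented rule: negative
   variance-component estimates are set to zero and the caller is told,
   so it can raise a warning (the analogue of
   fail.code = NW_POTENTIAL_PROBLEM) and carry on with the calculations. */
bool clamp_variance_components(double *comp, size_t ncomp)
{
    assert(comp != NULL || ncomp == 0);
    bool clamped = false;
    for (size_t i = 0; i < ncomp; ++i) {
        if (comp[i] < 0.0) {
            comp[i] = 0.0;     /* truncate the offending estimate */
            clamped = true;
        }
    }
    return clamped;
}
```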

It should be noted that Shrout and Fleiss (1979) also present methods for calculating the ICC based on average scores, denoted $ICC(1, k)$, $ICC(2, k)$ and $ICC(3, k)$. These are not supplied here: because g04gac allows multiple replications ($m > 1$), there is no need to average the scores prior to calculating the ICC.

4 References

Gwet K L (2014) Handbook of Inter-rater Reliability (4th Edition) Advanced Analytics, LLC

Shrout P E and Fleiss J L (1979) Intraclass correlations: uses in assessing rater reliability Psychological Bulletin 86(2) 420–428

5 Arguments

1: $mtype$ – Nag_ICCModelType Input

On entry: indicates which model is to be used.

$mtype = Nag_ICC_1$: The reliability study is a one-factor design, $ICC (1, 1)$ .
$mtype = Nag_ICC_2$: The reliability study is a random factorial design, $ICC (2, 1)$ .
$mtype = Nag_ICC_3$: The reliability study is a mixed factorial design, $ICC (3, 1)$ .

Constraint: $mtype = Nag_ICC_1$, $Nag_ICC_2$ or $Nag_ICC_3$.

2: $rtype$ – Nag_ICCReliabilityType Input

On entry: indicates which type of reliability is required.

$rtype = Nag_Inter$: Interrater reliability is required.
$rtype = Nag_Intra$: Intrarater reliability is required.

Constraint: $rtype = Nag_Inter$ or $Nag_Intra$.

3: $nrep$ – Integer Input

On entry: $m$, the number of replicates.

Constraints:

if $mtype = Nag_ICC_2$ or $Nag_ICC_3$ and $rtype = Nag_Intra$ , $nrep \geq 2$ ;
otherwise $nrep \geq 1$ .

4: $nsubj$ – Integer Input

On entry: $n_s$, the number of subjects.

Constraint: $nsubj \geq 2$.

5: $nrater$ – Integer Input

On entry: $n_r$, the number of raters.

Constraint: $nrater \geq 2$.

6: $score [\dim]$ – const double Input

Note: the dimension, dim, of the array score must be at least $nrep \times nsubj \times nrater$.

Where $SCORE(k, i, j)$ appears in this document, it refers to the array element $score[(j - 1) \times nrep \times nsubj + (i - 1) \times nrep + k - 1]$.

On entry: the matrix of scores, with $SCORE(k, i, j)$ being the score given to the $i$th subject by the $j$th rater in the $k$th replicate.

If rater $j$ did not rate subject $i$ at replication $k$, the corresponding element of score, $SCORE(k, i, j)$, should be set to smiss.
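The index formula for $SCORE(k, i, j)$ stores the scores with the replicate index $k$ varying fastest, then the subject index $i$, then the rater index $j$. The sketch below (plain C, no NAG calls, hypothetical helper names) shows that mapping and uses it to copy an unreplicated table stored row by row, such as the Shrout and Fleiss data in Section 3, into the layout expected for score with $nrep = 1$.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based offset of SCORE(k, i, j) (1-based k, i, j) in the score array,
   using the documented formula
   (j - 1) * nrep * nsubj + (i - 1) * nrep + k - 1. */
static size_t score_index(size_t k, size_t i, size_t j, size_t nrep, size_t nsubj)
{
    assert(k >= 1 && i >= 1 && j >= 1);
    return (j - 1) * nrep * nsubj + (i - 1) * nrep + (k - 1);
}

/* Hypothetical helper: copy an unreplicated table stored row by row,
   table[(i - 1) * nrater + (j - 1)] = score of subject i by rater j,
   into score layout with nrep = 1. */
void table_to_score(const double *table, size_t nsubj, size_t nrater, double *score)
{
    for (size_t i = 1; i <= nsubj; ++i)
        for (size_t j = 1; j <= nrater; ++j)
            score[score_index(1, i, j, 1, nsubj)] = table[(i - 1) * nrater + (j - 1)];
}
```

With $nrep = 1$ the result is simply the table stored column by column: all of rater 1's scores first, then rater 2's, and so on.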

7: $mscore$ – Nag_MissingType Input

On entry: indicates how missing scores are handled.

$mscore = Nag_NoMissing$: There are no missing scores.
$mscore = Nag_DropMissing$: Missing scores in score have been set to smiss.

Constraint: $mscore = Nag_NoMissing$ or $Nag_DropMissing$.

8: $smiss$ – double Input

On entry: the value used to indicate a missing score.

If $mscore = Nag_NoMissing$ , smiss is not referenced and need not be set.
If $mscore = Nag_DropMissing$ , the value used to indicate a missing score.

Care should be taken in the selection of the value used to indicate a missing score. g04gac treats any score in the inclusive range $(1 \pm 0.1^{(nag\_decimal\_digits - 2)}) \times smiss$ as missing. Alternatively, a NaN (Not a Number) can be used to indicate missing values, in which case the value of smiss and any missing values of score can be set through a call to x07bbc.
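The missing-value rule above can be sketched as a small predicate. Two assumptions are made: the C constant DBL_DIG from <float.h> is used in place of nag_decimal_digits (the value NAG uses may differ), and a NaN score is treated as missing only when smiss is itself a NaN. The function name is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical predicate: is score x to be treated as missing?
   Assumption: DBL_DIG stands in for nag_decimal_digits.
   x is missing if it lies in the inclusive range
   (1 +/- 0.1^(digits - 2)) * smiss, i.e.
   |x - smiss| <= 0.1^(digits - 2) * |smiss|.
   If smiss is a NaN, only NaN scores are treated as missing. */
bool score_is_missing(double x, double smiss)
{
    if (isnan(smiss))
        return isnan(x) != 0;
    double tol = fabs(smiss);
    for (int d = 0; d < DBL_DIG - 2; ++d)   /* tol = 0.1^(digits - 2) * |smiss| */
        tol *= 0.1;
    return fabs(x - smiss) <= tol;
}
```

The relative tolerance means a sentinel such as smiss = -999.0 only captures values extremely close to -999.0, while smiss = 0.0 captures exact zeros only.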

9: $alpha$ – double Input

On entry: $\alpha$, the significance level used in the construction of the confidence intervals for icc.

Constraint: $0.0 < alpha < 1.0$.

10: $icc$ – double * Output

On exit: an estimate of the intraclass correlation to measure either the interrater reliability, $\rho$, or the intrarater reliability, $\gamma$, as specified by mtype and rtype.

11: $lci$ – double * Output

On exit: an approximate lower limit for the $100(1 - \alpha)\%$ confidence interval for the ICC.

12: $uci$ – double * Output

On exit: an approximate upper limit for the $100(1 - \alpha)\%$ confidence interval for the ICC.

In some circumstances it is possible for the estimate of the intraclass correlation to fall outside the approximate confidence interval. In these cases g04gac returns all calculated values, but raises the warning fail.code = NW_POTENTIAL_PROBLEM.

13: $fstat$ – double * Output

On exit: $f$, the $F$-statistic associated with icc.

14: $df1$ – double * Output

15: $df2$ – double * Output

On exit: $\nu_1$ and $\nu_2$, the degrees of freedom associated with $f$.

16: $pvalue$ – double * Output

On exit: $P(F \geq f : \nu_1, \nu_2)$, the upper tail probability from an $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom.

17: $fail$ – NagError * Input/Output

The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM: On entry, argument $⟨ value ⟩$ had an illegal value.
NE_DEGENERATE: On entry, after adjusting for missing data, $nrater = ⟨ value ⟩$ .
Constraint: $nrater \geq 2$ .

On entry, after adjusting for missing data, $nrep = ⟨ value ⟩$ .
Constraint: $nrep \geq 1$ .

On entry, after adjusting for missing data, $nrep = ⟨ value ⟩$ .
Constraint: when $mtype = Nag_ICC_2$ or $Nag_ICC_3$ and $rtype = Nag_Intra$ , $nrep \geq 2$ .

On entry, after adjusting for missing data, $nsubj = ⟨ value ⟩$ .
Constraint: $nsubj \geq 2$ .

Unable to calculate the ICC due to a division by zero.
This is often due to degenerate data, for example all scores being the same.
NE_INT: On entry, $nrater = ⟨ value ⟩$ .
Constraint: $nrater \geq 2$ .

On entry, $nrep = ⟨ value ⟩$ .
Constraint: $nrep \geq 1$ .

On entry, $nrep = ⟨ value ⟩$ .
Constraint: when $mtype = Nag_ICC_2$ or $Nag_ICC_3$ and $rtype = Nag_Intra$ , $nrep \geq 2$ .

On entry, $nsubj = ⟨ value ⟩$ .
Constraint: $nsubj \geq 2$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_REAL: On entry, $alpha = ⟨ value ⟩$ .
alpha is too close to either zero or one.
This error is unlikely to occur.

On entry, $alpha = ⟨ value ⟩$ .
Constraint: $0.0 < alpha < 1.0$ .
NW_POTENTIAL_PROBLEM: icc does not fall into the interval $[lci, uci]$ .
All output quantities have been calculated.

On entry, a replicate, subject or rater contained all missing data.
All output quantities have been calculated using the reduced problem size.

The estimate of at least one variance component was negative.
Negative estimates were set to zero and all output quantities calculated as documented.

7 Accuracy

Not applicable.

8 Parallelism and Performance

g04gac is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

g04gac makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

None.

10 Example

This example calculates and displays the measure of interrater reliability, $\rho$, for a one-factor design, $ICC(1, 1)$. In addition, the $95\%$ confidence interval, $F$-statistic, degrees of freedom and p-value are presented.

The data are taken from Table 2 of Shrout and Fleiss (1979), which has four raters scoring six subjects.

NAG CL Interface g04gac (icc)
