NAG Library Routine Document
g04gaf (icc)
1
Purpose
g04gaf calculates the intraclass correlation (ICC).
2
Specification
Fortran Interface
Subroutine g04gaf (mtype, rtype, nrep, nsubj, nrater, score, mscore, smiss, alpha, &
                   icc, lci, uci, fstat, df1, df2, pvalue, ifail)
Integer, Intent (In) :: mtype, rtype, nrep, nsubj, nrater, mscore
Integer, Intent (Inout) :: ifail
Real (Kind=nag_wp), Intent (In) :: score(nrep,nsubj,nrater), smiss, alpha
Real (Kind=nag_wp), Intent (Out) :: icc, lci, uci, fstat, df1, df2, pvalue

C Header Interface
#include <nagmk26.h>
void 
g04gaf_ (const Integer *mtype, const Integer *rtype, const Integer *nrep, const Integer *nsubj, const Integer *nrater, const double score[], const Integer *mscore, const double *smiss, const double *alpha, double *icc, double *lci, double *uci, double *fstat, double *df1, double *df2, double *pvalue, Integer *ifail) 

3
Description
Many scientific investigations involve assigning a value (score) to a number of objects of interest (subjects). In most instances the method used to score the subject will be affected by measurement error which can affect the analysis and interpretation of the data. When the score is based on the subjective opinion of one or more individuals (raters) the measurement error can be high and therefore it is important to be able to assess its magnitude. One way of doing this is to run a reliability study and calculate the intraclass correlation (ICC).
In a typical reliability study each of a random sample of
${n}_{s}$ subjects is scored, independently, by
${n}_{r}$ raters. Each rater scores the same subject
$m$ times (i.e., there are
$m$ replicate scores). The scores,
${y}_{\mathit{i}\mathit{j}\mathit{k}}$, for
$\mathit{i}=1,2,\dots ,{n}_{s}$,
$\mathit{j}=1,2,\dots ,{n}_{r}$ and
$\mathit{k}=1,2,\dots ,m$ can be arranged into
$m$ data tables, with the
${n}_{s}$ rows of the table, labelled
$1,2,\dots ,{n}_{s}$, corresponding to the subjects and the
${n}_{r}$ columns of the table, labelled
$1,2,\dots ,{n}_{r}$, to the raters. For example the following data, taken from
Shrout and Fleiss (1979), shows a typical situation where four raters (
${n}_{r}=4$) have scored six subjects (
${n}_{s}=6$) once, i.e., there has been no replication (
$m=1$).
 Rater 
Subject  $1$  $2$  $3$  $4$ 
$1$  $9$  $2$  $5$  $8$ 
$2$  $6$  $1$  $3$  $2$ 
$3$  $8$  $4$  $6$  $8$ 
$4$  $7$  $1$  $2$  $6$ 
$5$  $10$  $5$  $6$  $9$ 
$6$  $6$  $2$  $4$  $7$ 
The term intraclass correlation is a general one and can mean either a measure of interrater reliability, i.e., a measure of how similar the raters are, or intrarater reliability, i.e., a measure of how consistent each rater is.
There are numerous versions of the ICC, six of which can be calculated using
g04gaf. Different versions of the ICC can lead to different conclusions when applied to the same data; it is therefore essential to choose the most appropriate version based on the design of the reliability study and whether inter- or intrarater reliability is of interest. The six measures of the ICC are split into three different types of studies, denoted:
$\text{ICC}\left(1,1\right)$,
$\text{ICC}\left(2,1\right)$ and
$\text{ICC}\left(3,1\right)$. This notation ties up with that used by
Shrout and Fleiss (1979). Each class of study results in two forms of the ICC, depending on whether inter- or intrarater reliability is of interest.
3.1
$\text{ICC}\left(1,1\right)$: One-Factor Design
The one-factor designs differ, depending on whether inter- or intrarater reliability is of interest:
3.1.1
Interrater reliability
In a one-factor design to measure interrater reliability, each subject is scored by a different set of raters randomly selected from a larger population of raters. Therefore, even though they use the same set of labels, each row of the data table is associated with a different set of raters.
A model of the following form is assumed:
${y}_{ijk}=\mu +{s}_{i}+{\epsilon}_{ijk}$
where
$\mu $ is the overall mean,
${s}_{i}$ is the subject effect and
${\epsilon}_{ijk}$ is the error term, with
${s}_{i}\sim N\left(0,{\sigma}_{s}^{2}\right)$ and
${\epsilon}_{ijk}\sim N\left(0,{\sigma}_{\epsilon}^{2}\right)$.
The measure of the interrater reliability,
$\rho $, is then given by:
$\rho =\frac{{\hat{\sigma}}_{s}^{2}}{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{\epsilon}^{2}}$
where
${\hat{\sigma}}_{s}$ and
${\hat{\sigma}}_{\epsilon}$ are the estimated values of
${\sigma}_{s}$ and
${\sigma}_{\epsilon}$ respectively.
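As an illustration of the one-factor interrater measure, the following C sketch computes $\rho $ for a complete table with no replication ($m=1$) from a one-way analysis of variance. It assumes the usual estimators ${\hat{\sigma}}_{\epsilon}^{2}=\text{WMS}$ and ${\hat{\sigma}}_{s}^{2}=\left(\text{BMS}-\text{WMS}\right)/{n}_{r}$, with $\rho ={\hat{\sigma}}_{s}^{2}/\left({\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{\epsilon}^{2}\right)$. The function and array names are hypothetical, and this is not the code used inside g04gaf, which also handles replication and missing data.

```c
#include <stddef.h>

/* Scores from the Shrout and Fleiss (1979) example: 6 subjects (rows)
   by 4 raters (columns), stored row by row. */
const double shrout_fleiss_scores[6 * 4] = {
     9, 2, 5, 8,
     6, 1, 3, 2,
     8, 4, 6, 8,
     7, 1, 2, 6,
    10, 5, 6, 9,
     6, 2, 4, 7
};

/* Hypothetical helper (not part of the NAG Library): one-factor interrater
   ICC for a complete ns-by-nr table with m = 1, via one-way ANOVA. */
double icc11_interrater(const double *y, size_t ns, size_t nr)
{
    double grand = 0.0, ssb = 0.0, ssw = 0.0;
    double bms, wms, var_s, var_e;
    size_t i, j;

    for (i = 0; i < ns * nr; i++)
        grand += y[i];
    grand /= (double)(ns * nr);

    for (i = 0; i < ns; i++) {
        double mean = 0.0;
        for (j = 0; j < nr; j++)
            mean += y[i * nr + j];
        mean /= (double)nr;
        ssb += (double)nr * (mean - grand) * (mean - grand);
        for (j = 0; j < nr; j++) {
            double d = y[i * nr + j] - mean;
            ssw += d * d;
        }
    }

    bms = ssb / (double)(ns - 1);         /* between-subjects mean square */
    wms = ssw / (double)(ns * (nr - 1));  /* within-subjects mean square */

    var_e = wms;                          /* estimate of the error variance */
    var_s = (bms - wms) / (double)nr;     /* estimate of the subject variance */
    if (var_s < 0.0)
        var_s = 0.0;                      /* negative estimates are set to zero */

    /* Degenerate data (all scores equal) gives 0/0 here. */
    return var_s / (var_s + var_e);
}
```

For the table above, `icc11_interrater(shrout_fleiss_scores, 6, 4)` evaluates to approximately 0.166.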
3.1.2
Intrarater reliability
In a one-factor design to measure intrarater reliability, each rater scores a different set of subjects. Therefore, even though they use the same set of labels, each column of the data table is associated with a different set of subjects.
A model of the following form is assumed:
${y}_{ijk}=\mu +{r}_{j}+{\epsilon}_{ijk}$
where
$\mu $ is the overall mean,
${r}_{j}$ is the rater effect and
${\epsilon}_{ijk}$ is the error term, with
${r}_{j}\sim N\left(0,{\sigma}_{r}^{2}\right)$ and
${\epsilon}_{ijk}\sim N\left(0,{\sigma}_{\epsilon}^{2}\right)$.
The measure of the intrarater reliability,
$\gamma $, is then given by:
$\gamma =\frac{{\hat{\sigma}}_{r}^{2}}{{\hat{\sigma}}_{r}^{2}+{\hat{\sigma}}_{\epsilon}^{2}}$
where
${\hat{\sigma}}_{r}$ and
${\hat{\sigma}}_{\epsilon}$ are the estimated values of
${\sigma}_{r}$ and
${\sigma}_{\epsilon}$ respectively.
3.2
$\text{ICC}\left(2,1\right)$: Random Factorial Design
In a random factorial design, each subject is scored by the same set of raters. The set of raters has been randomly selected from a larger population of raters.
A model of the following form is assumed:
${y}_{ijk}=\mu +{s}_{i}+{r}_{j}+{\left(sr\right)}_{ij}+{\epsilon}_{ijk}$
where
$\mu $ is the overall mean,
${s}_{i}$ is the subject effect,
${r}_{j}$ is the rater effect,
${\left(sr\right)}_{ij}$ is the subject-rater interaction effect and
${\epsilon}_{ijk}$ is the error term, with
${s}_{i}\sim N\left(0,{\sigma}_{s}^{2}\right)$,
${r}_{j}\sim N\left(0,{\sigma}_{r}^{2}\right)$,
${\left(sr\right)}_{ij}\sim N\left(0,{\sigma}_{sr}^{2}\right)$ and
${\epsilon}_{ijk}\sim N\left(0,{\sigma}_{\epsilon}^{2}\right)$.
3.2.1
Interrater reliability
The measure of the interrater reliability,
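$\rho $, is given by:
$\rho =\frac{{\hat{\sigma}}_{s}^{2}}{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{r}^{2}+{\hat{\sigma}}_{sr}^{2}+{\hat{\sigma}}_{\epsilon}^{2}}$
where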
${\hat{\sigma}}_{s}$,
${\hat{\sigma}}_{r}$,
${\hat{\sigma}}_{sr}$ and
${\hat{\sigma}}_{\epsilon}$ are the estimated values of
${\sigma}_{s}$,
${\sigma}_{r}$,
${\sigma}_{sr}$ and
${\sigma}_{\epsilon}$ respectively.
3.2.2
Intrarater reliability
The measure of the intrarater reliability,
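$\gamma $, is given by:
$\gamma =\frac{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{r}^{2}+{\hat{\sigma}}_{sr}^{2}}{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{r}^{2}+{\hat{\sigma}}_{sr}^{2}+{\hat{\sigma}}_{\epsilon}^{2}}$
where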
${\hat{\sigma}}_{s}$,
${\hat{\sigma}}_{r}$,
${\hat{\sigma}}_{sr}$ and
${\hat{\sigma}}_{\epsilon}$ are the estimated values of
${\sigma}_{s}$,
${\sigma}_{r}$,
${\sigma}_{sr}$ and
${\sigma}_{\epsilon}$ respectively.
3.3
$\text{ICC}\left(3,1\right)$: Mixed Factorial Design
In a mixed factorial design, each subject is scored by the same set of raters and these are the only raters of interest.
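A model of the following form is assumed:
${y}_{ijk}=\mu +{s}_{i}+{r}_{j}+{\left(sr\right)}_{ij}+{\epsilon}_{ijk}$
where
$\mu $ is the overall mean,
${s}_{i}$ is the subject effect,
${r}_{j}$ is the fixed rater effect,
${\left(sr\right)}_{ij}$ is the subject-rater interaction effect and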
${\epsilon}_{ijk}$ is the error term, with
${s}_{i}\sim N\left(0,{\sigma}_{s}^{2}\right)$,
${\Sigma}_{j=1}^{{n}_{r}}{r}_{j}=0$,
${\left(sr\right)}_{ij}\sim N\left(0,{\sigma}_{sr}^{2}\right)$,
${\Sigma}_{j=1}^{{n}_{r}}{\left(sr\right)}_{ij}=0$ and
${\epsilon}_{ijk}\sim N\left(0,{\sigma}_{\epsilon}^{2}\right)$.
3.3.1
Interrater reliability
The measure of the interrater reliability,
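$\rho $, is then given by:
$\rho =\frac{{\hat{\sigma}}_{s}^{2}-{\hat{\sigma}}_{sr}^{2}/\left({n}_{r}-1\right)}{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{sr}^{2}+{\hat{\sigma}}_{\epsilon}^{2}}$
where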
${\hat{\sigma}}_{s}$,
${\hat{\sigma}}_{sr}$ and
${\hat{\sigma}}_{\epsilon}$ are the estimated values of
${\sigma}_{s}$,
${\sigma}_{sr}$ and
${\sigma}_{\epsilon}$ respectively.
3.3.2
Intrarater reliability
The measure of the intrarater reliability,
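$\gamma $, is then given by:
$\gamma =\frac{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{sr}^{2}}{{\hat{\sigma}}_{s}^{2}+{\hat{\sigma}}_{sr}^{2}+{\hat{\sigma}}_{\epsilon}^{2}}$
where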
${\hat{\sigma}}_{s}$,
${\hat{\sigma}}_{sr}$ and
${\hat{\sigma}}_{\epsilon}$ are the estimated values of
${\sigma}_{s}$,
${\sigma}_{sr}$ and
${\sigma}_{\epsilon}$ respectively.
As well as an estimate of the ICC, g04gaf returns an approximate $100\left(1-\alpha \right)\%$ confidence interval for the ICC and an $F$-statistic, $f$, with associated degrees of freedom (${\nu}_{1}$ and ${\nu}_{2}$) and $p$-value, $p$, for testing that the ICC is zero.
Details of the formulae used to calculate the confidence interval,
$f$,
${\nu}_{1}$,
${\nu}_{2}$,
${\hat{\sigma}}_{s}^{2}$,
${\hat{\sigma}}_{r}^{2}$,
${\hat{\sigma}}_{sr}^{2}$ and
${\hat{\sigma}}_{\epsilon}^{2}$ are given in
Gwet (2014). In the case where there are no missing data these should tie up with the formulae presented in
Shrout and Fleiss (1979).
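For a complete table with $m=1$, the two factorial measures of interrater reliability can be cross-checked by hand from two-way analysis of variance mean squares. The C sketch below uses the single-score formulas as they are commonly quoted from Shrout and Fleiss (1979): $\left(\text{BMS}-\text{EMS}\right)/\left(\text{BMS}+\left({n}_{r}-1\right)\text{EMS}+{n}_{r}\left(\text{JMS}-\text{EMS}\right)/{n}_{s}\right)$ for $\text{ICC}\left(2,1\right)$ and $\left(\text{BMS}-\text{EMS}\right)/\left(\text{BMS}+\left({n}_{r}-1\right)\text{EMS}\right)$ for $\text{ICC}\left(3,1\right)$. All names are hypothetical, and this is an independent check under those assumptions rather than the algorithm implemented by g04gaf.

```c
#include <stddef.h>

/* Shrout and Fleiss (1979) example: 6 subjects (rows) by 4 raters (columns). */
const double sf_scores[6 * 4] = {
     9, 2, 5, 8,
     6, 1, 3, 2,
     8, 4, 6, 8,
     7, 1, 2, 6,
    10, 5, 6, 9,
     6, 2, 4, 7
};

/* Hypothetical container for the two-way ANOVA mean squares. */
typedef struct {
    double bms;  /* between-subjects mean square */
    double jms;  /* between-raters mean square */
    double ems;  /* residual mean square */
} two_way_ms;

/* Mean squares for a complete ns-by-nr table stored row by row (m = 1). */
two_way_ms two_way_mean_squares(const double *y, size_t ns, size_t nr)
{
    double grand = 0.0, ssb = 0.0, ssj = 0.0, sst = 0.0;
    two_way_ms ms;
    size_t i, j;

    for (i = 0; i < ns * nr; i++)
        grand += y[i];
    grand /= (double)(ns * nr);

    for (i = 0; i < ns; i++) {            /* subject (row) means */
        double mean = 0.0;
        for (j = 0; j < nr; j++)
            mean += y[i * nr + j];
        mean /= (double)nr;
        ssb += (double)nr * (mean - grand) * (mean - grand);
    }
    for (j = 0; j < nr; j++) {            /* rater (column) means */
        double mean = 0.0;
        for (i = 0; i < ns; i++)
            mean += y[i * nr + j];
        mean /= (double)ns;
        ssj += (double)ns * (mean - grand) * (mean - grand);
    }
    for (i = 0; i < ns * nr; i++)         /* total sum of squares */
        sst += (y[i] - grand) * (y[i] - grand);

    ms.bms = ssb / (double)(ns - 1);
    ms.jms = ssj / (double)(nr - 1);
    ms.ems = (sst - ssb - ssj) / (double)((ns - 1) * (nr - 1));
    return ms;
}

/* Single-score ICC(2,1): raters treated as a random sample. */
double icc21_single(two_way_ms ms, size_t ns, size_t nr)
{
    return (ms.bms - ms.ems) /
           (ms.bms + (double)(nr - 1) * ms.ems
            + (double)nr * (ms.jms - ms.ems) / (double)ns);
}

/* Single-score ICC(3,1): raters treated as fixed. */
double icc31_single(two_way_ms ms, size_t nr)
{
    return (ms.bms - ms.ems) / (ms.bms + (double)(nr - 1) * ms.ems);
}
```

With the table above these functions give approximately 0.290 for $\text{ICC}\left(2,1\right)$ and 0.715 for $\text{ICC}\left(3,1\right)$.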
In some circumstances, the formulae presented in
Gwet (2014) for calculating
${\hat{\sigma}}_{s}^{2}$,
${\hat{\sigma}}_{r}^{2}$,
${\hat{\sigma}}_{sr}^{2}$ and
${\hat{\sigma}}_{\epsilon}^{2}$ can result in a negative value being calculated. In such instances,
g04gaf returns ${\mathbf{ifail}}={\mathbf{102}}$, sets the offending estimate to zero and continues the calculations as normal.
It should be noted that
Shrout and Fleiss (1979) also present methods for calculating the ICC based on average scores, denoted
$\text{ICC}\left(1,k\right)$,
$\text{ICC}\left(2,k\right)$ and
$\text{ICC}\left(3,k\right)$. These are not supplied here as multiple replications are allowed (
$m>1$); hence there is no need to average the scores prior to calculating the ICC when using
g04gaf.
4
References
Gwet K L (2014) Handbook of Interrater Reliability Fourth Edition Advanced Analytics LLC
Shrout P E and Fleiss J L (1979) Intraclass Correlations: Uses in Assessing Rater Reliability Psychological Bulletin, Vol 86(2) 420–428
5
Arguments
 1: $\mathbf{mtype}$ – Integer, Input

On entry: indicates which model is to be used.
 ${\mathbf{mtype}}=1$
 The reliability study is a one-factor design, $\text{ICC}\left(1,1\right)$.
 ${\mathbf{mtype}}=2$
 The reliability study is a random factorial design, $\text{ICC}\left(2,1\right)$.
 ${\mathbf{mtype}}=3$
 The reliability study is a mixed factorial design, $\text{ICC}\left(3,1\right)$.
Constraint:
${\mathbf{mtype}}=1$, $2$ or $3$.
 2: $\mathbf{rtype}$ – Integer, Input

On entry: indicates which type of reliability is required.
 ${\mathbf{rtype}}=1$
 Interrater reliability is required.
 ${\mathbf{rtype}}=2$
 Intrarater reliability is required.
Constraint:
${\mathbf{rtype}}=1$ or $2$.
 3: $\mathbf{nrep}$ – Integer, Input

On entry: $m$, the number of replicates.
Constraints:
 if ${\mathbf{mtype}}=2$ or $3$ and ${\mathbf{rtype}}=2$, ${\mathbf{nrep}}\ge 2$;
 otherwise ${\mathbf{nrep}}\ge 1$.
 4: $\mathbf{nsubj}$ – Integer, Input

On entry: ${n}_{s}$, the number of subjects.
Constraint:
${\mathbf{nsubj}}\ge 2$.
 5: $\mathbf{nrater}$ – Integer, Input

On entry: ${n}_{r}$, the number of raters.
Constraint:
${\mathbf{nrater}}\ge 2$.
 6: $\mathbf{score}\left({\mathbf{nrep}},{\mathbf{nsubj}},{\mathbf{nrater}}\right)$ – Real (Kind=nag_wp) array, Input

On entry: the matrix of scores, with
${\mathbf{score}}\left(k,i,j\right)$ being the score given to the
$i$th subject by the
$j$th rater in the
$k$th replicate.
If rater
$j$ did not rate subject
$i$ at replication
$k$, the corresponding element of
score,
${\mathbf{score}}\left(k,i,j\right)$, should be set to
smiss.
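When g04gaf is called through the C header interface, score is passed as a flat `const double score[]`, so the three-dimensional Fortran array has to be laid out in column-major order: the replicate index varies fastest, then the subject index, then the rater index. The helpers below are a hypothetical sketch of that mapping (they are not NAG functions), including a packer for the common case of a single subjects-by-raters table held row by row in C.

```c
#include <stddef.h>

/* Zero-based offset of score(k,i,j) in the flat array, where k, i and j are
   the 1-based Fortran indices for replicate, subject and rater. */
size_t score_offset(size_t k, size_t i, size_t j, size_t nrep, size_t nsubj)
{
    return (k - 1) + nrep * ((i - 1) + nsubj * (j - 1));
}

/* Copy a row-major nsubj-by-nrater table into the layout expected for
   score(1,nsubj,nrater), i.e. nrep = 1. */
void pack_scores(const double *table, size_t nsubj, size_t nrater, double *score)
{
    size_t i, j;

    for (i = 1; i <= nsubj; i++)
        for (j = 1; j <= nrater; j++)
            score[score_offset(1, i, j, 1, nsubj)] =
                table[(i - 1) * nrater + (j - 1)];
}
```

For example, with ${\mathbf{nrep}}=2$ and ${\mathbf{nsubj}}=6$, `score_offset(1, 1, 2, 2, 6)` is 12: the first score given by the second rater follows all twelve scores given by the first rater.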
 7: $\mathbf{mscore}$ – Integer, Input

On entry: indicates how missing scores are handled.
 ${\mathbf{mscore}}=1$
 There are no missing scores.
 ${\mathbf{mscore}}=2$
 Missing scores in score have been set to smiss.
Constraint:
${\mathbf{mscore}}=1$ or $2$.
 8: $\mathbf{smiss}$ – Real (Kind=nag_wp), Input

On entry: the value used to indicate a missing score.
 If ${\mathbf{mscore}}=1$, smiss is not referenced and need not be set.
 If ${\mathbf{mscore}}=2$, care should be taken in the selection of smiss, the value used to indicate a missing score. g04gaf will treat any score in the inclusive range $\left(1\pm {0.1}^{\left({\mathbf{x02bef}}-2\right)}\right)\times {\mathbf{smiss}}$ as missing. Alternatively, a NaN (Not A Number) can be used to indicate missing values, in which case the value of smiss and any missing values of score can be set through a call to x07bbf.
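To see which scores would fall inside the inclusive range $\left(1\pm {0.1}^{\left({\mathbf{x02bef}}-2\right)}\right)\times {\mathbf{smiss}}$, the check can be sketched in C as follows. The function name is hypothetical, and `digits` stands in for the value returned by x02bef; passing `DBL_DIG` from `<float.h>` is an assumption made for illustration, not a statement about what x02bef returns in a given implementation.

```c
#include <float.h>

/* Hypothetical helper (not part of the NAG Library): returns 1 if score lies
   in the inclusive range (1 +/- 0.1^(digits-2)) * smiss, and 0 otherwise.
   A NaN score never compares as in range, so the NaN option described in
   the text is not covered by this check. */
int score_is_missing(double score, double smiss, int digits)
{
    double tol = 1.0, a, b, lo, hi;
    int n;

    for (n = 0; n < digits - 2; n++)   /* tol = 0.1^(digits-2) */
        tol *= 0.1;

    a = (1.0 - tol) * smiss;
    b = (1.0 + tol) * smiss;
    lo = a < b ? a : b;                /* endpoints swap when smiss < 0 */
    hi = a < b ? b : a;

    return score >= lo && score <= hi;
}
```

With `digits = DBL_DIG` the range is extremely narrow, so a sentinel such as $-99$ is only matched by values essentially equal to it; the practical point is to choose smiss well away from any legitimate score.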
 9: $\mathbf{alpha}$ – Real (Kind=nag_wp), Input

On entry:
$\alpha $, the significance level used in the construction of the confidence intervals for
icc.
Constraint:
$0<{\mathbf{alpha}}<1$.
 10: $\mathbf{icc}$ – Real (Kind=nag_wp), Output

On exit: an estimate of the intraclass correlation to measure either the interrater reliability,
$\rho $, or intrarater reliability,
$\gamma $, as specified by
mtype and
rtype.
 11: $\mathbf{lci}$ – Real (Kind=nag_wp), Output

On exit: an approximate lower limit for the $100\left(1-\alpha \right)\%$ confidence interval for the ICC.
 12: $\mathbf{uci}$ – Real (Kind=nag_wp), Output

On exit: an approximate upper limit for the $100\left(1-\alpha \right)\%$ confidence interval for the ICC.
In some circumstances it is possible for the estimate of the intraclass correlation to fall outside the region of the approximate confidence intervals. In these cases g04gaf returns all calculated values, but raises the warning ${\mathbf{ifail}}={\mathbf{101}}$.
 13: $\mathbf{fstat}$ – Real (Kind=nag_wp), Output

On exit:
$f$, the
$F$-statistic associated with
icc.
 14: $\mathbf{df1}$ – Real (Kind=nag_wp), Output
 15: $\mathbf{df2}$ – Real (Kind=nag_wp), Output

On exit: ${\nu}_{1}$ and ${\nu}_{2}$, the degrees of freedom associated with $f$.
 16: $\mathbf{pvalue}$ – Real (Kind=nag_wp), Output

On exit: $P\left(F\ge f:{\nu}_{1},{\nu}_{2}\right)$, the upper tail probability from an $F$ distribution.
 17: $\mathbf{ifail}$ – Integer, Input/Output

On entry:
ifail must be set to
$0$,
$-1\text{ or }1$. If you are unfamiliar with this argument you should refer to
Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1\text{ or }1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
$0$.
When the value $-\mathbf{1}\text{ or }\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$-1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=11$

On entry, ${\mathbf{mtype}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{mtype}}=1$, $2$ or $3$.
 ${\mathbf{ifail}}=21$

On entry, ${\mathbf{rtype}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{rtype}}=1$ or $2$.
 ${\mathbf{ifail}}=31$

On entry, ${\mathbf{nrep}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nrep}}\ge 1$.
 ${\mathbf{ifail}}=32$

On entry, ${\mathbf{nrep}}=\langle \mathit{\text{value}}\rangle $.
Constraint: when ${\mathbf{mtype}}=2$ or $3$ and ${\mathbf{rtype}}=2$, ${\mathbf{nrep}}\ge 2$.
 ${\mathbf{ifail}}=33$

On entry, after adjusting for missing data, ${\mathbf{nrep}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nrep}}\ge 1$.
 ${\mathbf{ifail}}=34$

On entry, after adjusting for missing data, ${\mathbf{nrep}}=\langle \mathit{\text{value}}\rangle $.
Constraint: when ${\mathbf{mtype}}=2$ or $3$ and ${\mathbf{rtype}}=2$, ${\mathbf{nrep}}\ge 2$.
 ${\mathbf{ifail}}=41$

On entry, ${\mathbf{nsubj}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nsubj}}\ge 2$.
 ${\mathbf{ifail}}=42$

On entry, after adjusting for missing data, ${\mathbf{nsubj}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nsubj}}\ge 2$.
 ${\mathbf{ifail}}=51$

On entry, ${\mathbf{nrater}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nrater}}\ge 2$.
 ${\mathbf{ifail}}=52$

On entry, after adjusting for missing data, ${\mathbf{nrater}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nrater}}\ge 2$.
 ${\mathbf{ifail}}=61$

Unable to calculate the ICC due to a division by zero.
This is often due to degenerate data, for example all scores being the same.
 ${\mathbf{ifail}}=62$

On entry, a replicate, subject or rater contained all missing data.
All output quantities have been calculated using the reduced problem size.
 ${\mathbf{ifail}}=71$

On entry, ${\mathbf{mscore}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{mscore}}=1$ or $2$.
 ${\mathbf{ifail}}=91$

On entry, ${\mathbf{alpha}}=\langle \mathit{\text{value}}\rangle $.
Constraint: $0<{\mathbf{alpha}}<1$.
 ${\mathbf{ifail}}=92$

On entry,
${\mathbf{alpha}}=\langle \mathit{\text{value}}\rangle $.
alpha is too close to either zero or one.
This error is unlikely to occur.
 ${\mathbf{ifail}}=101$

icc does not fall into the interval
$\left[{\mathbf{lci}},{\mathbf{uci}}\right]$.
All output quantities have been calculated.
 ${\mathbf{ifail}}=102$

The estimate of at least one variance component was negative.
Negative estimates were set to zero and all output quantities calculated as documented.
 ${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See
Section 3.9 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See
Section 3.8 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See
Section 3.7 in How to Use the NAG Library and its Documentation for further information.
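When ifail is tested on exit, it can help to separate exits that still return usable results from those that do not. The C sketch below is one reading of the list above: it treats ${\mathbf{ifail}}=62$, $101$ and $102$ as warnings, since the text states that the output quantities have been calculated in those cases, and every other nonzero value as an error. The type and function names are hypothetical and not part of the NAG Library.

```c
/* Hypothetical status categories; these names are not part of the NAG Library. */
typedef enum {
    ICC_EXIT_OK,       /* ifail = 0 */
    ICC_EXIT_WARNING,  /* outputs were calculated, but with a caveat */
    ICC_EXIT_ERROR     /* outputs should not be used */
} icc_exit_kind;

/* Classify the value of ifail returned by g04gaf. */
icc_exit_kind classify_g04gaf_exit(long ifail)
{
    switch (ifail) {
    case 0:
        return ICC_EXIT_OK;
    case 62:   /* an all-missing replicate, subject or rater was dropped */
    case 101:  /* icc lies outside the interval [lci, uci] */
    case 102:  /* a negative variance estimate was set to zero */
        return ICC_EXIT_WARNING;
    default:   /* invalid input, division by zero, licence or memory problems */
        return ICC_EXIT_ERROR;
    }
}
```

A caller that sets ifail to a value which does not halt execution could then report `ICC_EXIT_WARNING` results alongside a note, and discard the outputs only for `ICC_EXIT_ERROR`.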
7
Accuracy
Not applicable.
8
Parallelism and Performance
g04gaf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g04gaf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
None.
10
Example
This example calculates and displays the measure of interrater reliability, $\rho $, for a one-factor design, $\text{ICC}\left(1,1\right)$. In addition the $95\%$ confidence interval, $F$-statistic, degrees of freedom and $p$-value are presented.
The data is taken from Table 2 of
Shrout and Fleiss (1979), which has four raters scoring six subjects.
10.1
Program Text
Program Text (g04gafe.f90)
10.2
Program Data
Program Data (g04gafe.d)
10.3
Program Results
Program Results (g04gafe.r)