naginterfaces.library.anova.icc

naginterfaces.library.anova.icc(mtype, score, rtype=1, smiss=None, alpha=0.05)

icc calculates the intraclass correlation (ICC).

For full information please refer to the NAG Library document for g04ga:

https://support.nag.com/numeric/nl/nagdoc_30/flhtml/g04/g04gaf.html

Parameters
mtype : int

Indicates which model is to be used.

mtype = 1: The reliability study is a one-factor design, ICC(1,1).

mtype = 2: The reliability study is a random factorial design, ICC(2,1).

mtype = 3: The reliability study is a mixed factorial design, ICC(3,1).

score : float, array-like, three-dimensional

The array of scores, with each element being the score given to the $i$th subject by the $j$th rater in the $k$th replicate.

If rater $j$ did not rate subject $i$ at replication $k$, the corresponding element of score should be set to smiss.

rtype : int, optional

Indicates which type of reliability is required.

rtype = 1: Interrater reliability is required.

rtype = 2: Intrarater reliability is required.

smiss : None or float, optional

The value used to indicate a missing score.

Care should be taken when selecting this value: icc treats any score that falls within an inclusive range around smiss as missing, so smiss should lie well away from every genuine score.

Alternatively, a NaN (Not A Number) can be used to indicate missing values, in which case the value of smiss and any missing values of score can be set through a call to ieee.create_nan.

alpha : float, optional

$\alpha$, the significance level used in the construction of the confidence intervals for icc.
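As a rough illustration of the advice on smiss, the pure-Python sketch below encodes the unobserved scores in one replicate's table with a sentinel and refuses a sentinel that sits close to a genuine score. The helper name `encode_missing`, the example ratings and the relative tolerance of 1e-6 are all assumptions made for illustration; the range that icc actually uses to detect missing values is defined in the NAG Library document.

```python
def encode_missing(table, smiss):
    """Replace None entries with smiss, rejecting sentinels that clash with real scores."""
    observed = [y for row in table for y in row if y is not None]
    # Assumed safety margin (not the library's rule): a genuine score must not
    # lie within a small relative distance of the sentinel.
    tolerance = 1e-6 * max(1.0, abs(smiss))
    if any(abs(y - smiss) <= tolerance for y in observed):
        raise ValueError("smiss is too close to an observed score")
    return [[smiss if y is None else float(y) for y in row] for row in table]


# Hypothetical ratings: two subjects, four raters, None marks a missing score.
ratings = [[9, 2, None, 8], [6, None, 3, 2]]
encoded = encode_missing(ratings, smiss=-99.0)
print(encoded)
```

A sentinel far outside the scoring scale, as here, avoids accidentally discarding real data.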

Returns
icc : float

An estimate of the intraclass correlation to measure either the interrater reliability, $\rho$, or intrarater reliability, $\gamma$, as specified by mtype and rtype.

lci : float

An approximate lower limit for the $100(1-\alpha)\%$ confidence interval for the ICC.

uci : float

An approximate upper limit for the $100(1-\alpha)\%$ confidence interval for the ICC.

fstat : float

$F$, the $F$-statistic associated with icc.

df1 : float

The first of the two degrees of freedom, df1 and df2, associated with fstat.

df2 : float

The second of the two degrees of freedom, df1 and df2, associated with fstat.

pvalue : float

The upper tail probability from an $F$ distribution, evaluated at fstat.

Raises
NagValueError
(errno )

On entry, mtype = ⟨value⟩.

Constraint: mtype = 1, 2 or 3.

(errno )

On entry, rtype = ⟨value⟩.

Constraint: rtype = 1 or 2.

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: when or and , .

(errno )

On entry, after adjusting for missing data, .

Constraint: .

(errno )

On entry, after adjusting for missing data, .

Constraint: when or and , .

(errno )

On entry, .

Constraint: .

(errno )

On entry, after adjusting for missing data, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, after adjusting for missing data, .

Constraint: .

(errno )

Unable to calculate the ICC due to a division by zero.

This is often due to degenerate data, for example all scores being the same.

(errno )

On entry, .

Constraint: or .

(errno )

On entry, .

Constraint: .

(errno )

On entry, alpha = ⟨value⟩.

alpha is too close to either zero or one.

This error is unlikely to occur.

Warns
NagAlgorithmicWarning
(errno )

On entry, a replicate, subject or rater contained all missing data.

All output quantities have been calculated using the reduced problem size.

(errno )

does not fall into the interval .

All output quantities have been calculated.

(errno 102)

The estimate of at least one variance component was negative.

Negative estimates were set to zero and all output quantities calculated as documented.

Notes

Many scientific investigations involve assigning a value (score) to a number of objects of interest (subjects). In most instances the method used to score the subject will be affected by measurement error which can affect the analysis and interpretation of the data. When the score is based on the subjective opinion of one or more individuals (raters) the measurement error can be high and, therefore, it is important to be able to assess its magnitude. One way of doing this is to run a reliability study and calculate the intraclass correlation (ICC).

In a typical reliability study each of a random sample of $n_s$ subjects is scored, independently, by $n_r$ raters. Each rater scores the same subject $m$ times (i.e., there are $m$ replicate scores). The scores, $y_{ijk}$, for $i = 1, 2, \ldots, n_s$, $j = 1, 2, \ldots, n_r$ and $k = 1, 2, \ldots, m$, can be arranged into $m$ data tables, with the rows of each table, labelled $1, 2, \ldots, n_s$, corresponding to the subjects and the columns, labelled $1, 2, \ldots, n_r$, to the raters. For example, the following data, taken from Shrout and Fleiss (1979), show a typical situation where four raters ($n_r = 4$) have scored six subjects ($n_s = 6$) once, i.e., there has been no replication ($m = 1$).

[table omitted]

The term intraclass correlation is a general one and can mean either a measure of interrater reliability, i.e., a measure of how similar the raters are, or intrarater reliability, i.e., a measure of how consistent each rater is.

There are numerous versions of the ICC, six of which can be calculated using icc. The different versions can lead to different conclusions when applied to the same data. It is, therefore, essential to choose the most appropriate version based on the design of the reliability study and whether inter- or intrarater reliability is of interest. The six measures of the ICC are split across three types of study, denoted ICC(1,1), ICC(2,1) and ICC(3,1). This notation ties up with that used by Shrout and Fleiss (1979). Each class of study results in two forms of the ICC, depending on whether inter- or intrarater reliability is of interest.

ICC(1,1): One-Factor Design

The one-factor designs differ, depending on whether inter- or intrarater reliability is of interest:

Interrater reliability

In a one-factor design to measure interrater reliability, each subject is scored by a different set of raters randomly selected from a larger population of raters. Therefore, even though they use the same set of labels, each row of the data table is associated with a different set of raters.

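A model of the following form is assumed:

$$y_{ijk} = \mu + s_i + e_{ijk}$$

where $y_{ijk}$ is the score given to subject $i$ by rater $j$ in replicate $k$, $\mu$ is the overall mean, $s_i$ is the subject effect and $e_{ijk}$ is the error term, with $s_i \sim N(0, \sigma_s^2)$ and $e_{ijk} \sim N(0, \sigma_e^2)$.

The measure of the interrater reliability, $\rho$, is then given by:

$$\rho = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_s^2 + \hat{\sigma}_e^2}$$

where $\hat{\sigma}_s^2$ and $\hat{\sigma}_e^2$ are the estimated values of $\sigma_s^2$ and $\sigma_e^2$ respectively.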
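The one-factor interrater measure can be sketched in pure Python using the classical one-way ANOVA estimators, which apply to complete data with a single replicate. The scores are entered as the widely reproduced Shrout and Fleiss (1979) example (six subjects, four raters); treat the exact values as an assumption to be checked against the paper. The function name and the use of the ANOVA estimators are illustrative choices, not a description of how icc is implemented.

```python
# Assumed Shrout and Fleiss (1979) example: rows are subjects, columns are raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def one_factor_interrater(table):
    """Return (subject variance, error variance, ICC) for a complete table."""
    n_s = len(table)      # number of subjects
    n_r = len(table[0])   # number of scores per subject
    grand = sum(map(sum, table)) / (n_s * n_r)
    row_means = [sum(row) / n_r for row in table]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    bms = n_r * sum((m - grand) ** 2 for m in row_means) / (n_s - 1)
    within_ss = sum((y - m) ** 2 for row, m in zip(table, row_means) for y in row)
    wms = within_ss / (n_s * (n_r - 1))
    sigma_e2 = wms
    sigma_s2 = (bms - wms) / n_r
    return sigma_s2, sigma_e2, sigma_s2 / (sigma_s2 + sigma_e2)


sigma_s2, sigma_e2, rho = one_factor_interrater(scores)
print(round(rho, 2))  # prints 0.17
```

The low value reflects the large disagreement between raters in this data set relative to the differences between subjects.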

Intrarater reliability

In a one-factor design to measure intrarater reliability, each rater scores a different set of subjects. Therefore, even though they use the same set of labels, each column of the data table is associated with a different set of subjects.

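A model of the following form is assumed:

$$y_{ijk} = \mu + r_j + e_{ijk}$$

where $y_{ijk}$ is the score given to subject $i$ by rater $j$ in replicate $k$, $\mu$ is the overall mean, $r_j$ is the rater effect and $e_{ijk}$ is the error term, with $r_j \sim N(0, \sigma_r^2)$ and $e_{ijk} \sim N(0, \sigma_e^2)$.

The measure of the intrarater reliability, $\gamma$, is then given by:

$$\gamma = \frac{\hat{\sigma}_r^2}{\hat{\sigma}_r^2 + \hat{\sigma}_e^2}$$

where $\hat{\sigma}_r^2$ and $\hat{\sigma}_e^2$ are the estimated values of $\sigma_r^2$ and $\sigma_e^2$ respectively.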

ICC(2,1): Random Factorial Design

In a random factorial design, each subject is scored by the same set of raters. The set of raters has been randomly selected from a larger population of raters.

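A model of the following form is assumed:

$$y_{ijk} = \mu + s_i + r_j + (sr)_{ij} + e_{ijk}$$

where $y_{ijk}$ is the score given to subject $i$ by rater $j$ in replicate $k$, $\mu$ is the overall mean, $s_i$ is the subject effect, $r_j$ is the rater effect, $(sr)_{ij}$ is the subject-rater interaction effect and $e_{ijk}$ is the error term, with $s_i \sim N(0, \sigma_s^2)$, $r_j \sim N(0, \sigma_r^2)$, $(sr)_{ij} \sim N(0, \sigma_{sr}^2)$ and $e_{ijk} \sim N(0, \sigma_e^2)$.

Interrater reliability

The measure of the interrater reliability, $\rho$, is given by:

$$\rho = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_s^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_e^2}$$

where $\hat{\sigma}_s^2$, $\hat{\sigma}_r^2$, $\hat{\sigma}_{sr}^2$ and $\hat{\sigma}_e^2$ are the estimated values of $\sigma_s^2$, $\sigma_r^2$, $\sigma_{sr}^2$ and $\sigma_e^2$ respectively.

Intrarater reliability

The measure of the intrarater reliability, $\gamma$, is given by:

$$\gamma = \frac{\hat{\sigma}_s^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{sr}^2}{\hat{\sigma}_s^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_e^2}$$

where $\hat{\sigma}_s^2$, $\hat{\sigma}_r^2$, $\hat{\sigma}_{sr}^2$ and $\hat{\sigma}_e^2$ are the estimated values of $\sigma_s^2$, $\sigma_r^2$, $\sigma_{sr}^2$ and $\sigma_e^2$ respectively.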
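For the random factorial design, the interrater measure can be illustrated with a two-way ANOVA decomposition. The sketch below assumes complete data with a single replicate, in which case the interaction and error variances cannot be separated and are treated as one residual component. The scores are the commonly quoted Shrout and Fleiss (1979) example, entered here as an assumption, and the helper name is hypothetical.

```python
# Assumed Shrout and Fleiss (1979) example: rows are subjects, columns are raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def two_way_mean_squares(table):
    """Mean squares for subjects, raters and residual in a complete two-way table."""
    n_s, n_r = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_s * n_r)
    row_means = [sum(row) / n_r for row in table]
    col_means = [sum(col) / n_s for col in zip(*table)]
    ss_subj = n_r * sum((m - grand) ** 2 for m in row_means)
    ss_rater = n_s * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_resid = ss_total - ss_subj - ss_rater
    return (ss_subj / (n_s - 1),
            ss_rater / (n_r - 1),
            ss_resid / ((n_s - 1) * (n_r - 1)))


bms, jms, ems = two_way_mean_squares(scores)
n_s, n_r = len(scores), len(scores[0])
sigma_s2 = (bms - ems) / n_r   # subject variance
sigma_r2 = (jms - ems) / n_s   # rater variance
sigma_res2 = ems               # interaction plus error (one replicate only)
rho = sigma_s2 / (sigma_s2 + sigma_r2 + sigma_res2)
print(round(rho, 2))  # prints 0.29
```

With replicated scores the interaction and error terms would be estimated separately, which is what makes the intrarater measure available for this design.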

ICC(3,1): Mixed Factorial Design

In a mixed factorial design, each subject is scored by the same set of raters and these are the only raters of interest.

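A model of the following form is assumed:

$$y_{ijk} = \mu + s_i + r_j + (sr)_{ij} + e_{ijk}$$

where $y_{ijk}$ is the score given to subject $i$ by rater $j$ in replicate $k$, $\mu$ is the overall mean, $s_i$ is the subject effect, $r_j$ is the fixed rater effect, $(sr)_{ij}$ is the subject-rater interaction effect and $e_{ijk}$ is the error term, with $s_i \sim N(0, \sigma_s^2)$, $\sum_j r_j = 0$, $(sr)_{ij} \sim N(0, \sigma_{sr}^2)$, $\sum_j (sr)_{ij} = 0$ and $e_{ijk} \sim N(0, \sigma_e^2)$.

Interrater reliability

The measure of the interrater reliability, $\rho$, is then given by:

$$\rho = \frac{\hat{\sigma}_s^2 - \hat{\sigma}_{sr}^2 / (n_r - 1)}{\hat{\sigma}_s^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_e^2}$$

where $\hat{\sigma}_s^2$, $\hat{\sigma}_{sr}^2$ and $\hat{\sigma}_e^2$ are the estimated values of $\sigma_s^2$, $\sigma_{sr}^2$ and $\sigma_e^2$ respectively, and $n_r$ is the number of raters.

Intrarater reliability

The measure of the intrarater reliability, $\gamma$, is then given by:

$$\gamma = \frac{\hat{\sigma}_s^2 + \hat{\sigma}_{sr}^2}{\hat{\sigma}_s^2 + \hat{\sigma}_{sr}^2 + \hat{\sigma}_e^2}$$

where $\hat{\sigma}_s^2$, $\hat{\sigma}_{sr}^2$ and $\hat{\sigma}_e^2$ are the estimated values of $\sigma_s^2$, $\sigma_{sr}^2$ and $\sigma_e^2$ respectively.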
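For the mixed factorial design, the interrater measure for complete, unreplicated data is often written directly in terms of mean squares, following Shrout and Fleiss (1979). The sketch below uses that mean-square form rather than separate variance components, since a single replicate does not allow the interaction and error variances to be estimated individually. The data values and function name are assumptions for illustration only.

```python
# Assumed Shrout and Fleiss (1979) example: rows are subjects, columns are raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def mixed_interrater(table):
    """Mean-square form of the mixed-model interrater ICC for a complete table."""
    n_s, n_r = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_s * n_r)
    row_means = [sum(row) / n_r for row in table]
    col_means = [sum(col) / n_s for col in zip(*table)]
    ss_subj = n_r * sum((m - grand) ** 2 for m in row_means)
    ss_rater = n_s * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    bms = ss_subj / (n_s - 1)
    # Residual mean square after removing subject and (fixed) rater effects.
    ems = (ss_total - ss_subj - ss_rater) / ((n_s - 1) * (n_r - 1))
    return (bms - ems) / (bms + (n_r - 1) * ems)


rho_mixed = mixed_interrater(scores)
print(round(rho_mixed, 2))  # prints 0.71
```

Because the raters are treated as fixed, systematic differences between them do not count against reliability, which is why this value is much higher than the random factorial measure would be for the same scores.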

As well as an estimate of the ICC, icc returns an approximate confidence interval for the ICC (lci and uci) and an $F$-statistic (fstat), with associated degrees of freedom (df1 and df2) and p-value (pvalue), for testing that the ICC is zero.
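To give a feel for the test quantities, the sketch below computes the $F$-statistic and degrees of freedom for the one-factor design in the form given by Shrout and Fleiss (1979): the ratio of between-subject to within-subject mean squares. It assumes complete data and one replicate, and whether icc uses exactly this form in every case is an assumption; the p-value is omitted because it needs an $F$ distribution function that the Python standard library does not provide. The data are the assumed Shrout and Fleiss example values.

```python
# Assumed Shrout and Fleiss (1979) example: rows are subjects, columns are raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def one_factor_f_test(table):
    """Return (F, df1, df2) for testing a zero ICC in a complete one-factor table."""
    n_s, n_r = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_s * n_r)
    row_means = [sum(row) / n_r for row in table]
    df1 = n_s - 1
    df2 = n_s * (n_r - 1)
    bms = n_r * sum((m - grand) ** 2 for m in row_means) / df1
    wms = sum((y - m) ** 2 for row, m in zip(table, row_means) for y in row) / df2
    return bms / wms, df1, df2


f_stat, df1, df2 = one_factor_f_test(scores)
print(round(f_stat, 2), df1, df2)  # prints 1.79 5 18
```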

Details of the formulae used to calculate the confidence interval, the variance component estimates, the $F$-statistic and its degrees of freedom are given in Gwet (2014). In the case where there are no missing data these should tie up with the formulae presented in Shrout and Fleiss (1979).

In some circumstances, the formulae presented in Gwet (2014) for calculating the variance component estimates can produce a negative value. In such instances a warning is raised (errno = 102), the offending estimate is set to zero and the calculations continue as normal.
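The truncation of negative estimates can be sketched as follows. The function name, the dictionary layout and the raw estimates are hypothetical; the point is only that a negative component is replaced by zero before the ratio is formed, and that the caller is told a replacement took place.

```python
def truncate_components(raw):
    """Set negative variance-component estimates to zero.

    Returns the adjusted estimates and a flag saying whether any were changed.
    """
    adjusted = {name: max(value, 0.0) for name, value in raw.items()}
    changed = any(value < 0.0 for value in raw.values())
    return adjusted, changed


# Hypothetical raw estimates from a one-factor study in which scores vary
# more within subjects than between them.
raw = {"sigma_s2": -2.0, "sigma_e2": 6.0}
adjusted, changed = truncate_components(raw)
rho = adjusted["sigma_s2"] / (adjusted["sigma_s2"] + adjusted["sigma_e2"])
print(adjusted, changed, rho)
```

In this example the subject variance is truncated to zero, so the resulting ICC is zero rather than negative.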

It should be noted that Shrout and Fleiss (1979) also present methods for calculating the ICC based on average scores, denoted ICC(1,k), ICC(2,k) and ICC(3,k). These are not supplied here because multiple replications are allowed, hence there is no need to average the scores prior to calculating the ICC when using icc.

References

Gwet, K L, 2014, Handbook of Inter-rater Reliability, Fourth Edition, Advanced Analytics LLC

Shrout, P E and Fleiss, J L, 1979, Intraclass Correlations: Uses in Assessing Rater Reliability, Psychological Bulletin, Vol 86 (2), 420–428