Toolkit for Analyzing Reliability of a Diagnostic Test or a Measurement

1. What is reliability?
Reliability refers to how consistent the results of a test or a measurement are when they are obtained repeatedly.
On this page, “reliability” is used as an umbrella term covering concepts such as reproducibility, repeatability, and agreement, except where we distinguish “reliability parameters” from “agreement parameters”, as explained below.

2. Repeatability vs. Reproducibility?
According to the QIBA Technical Performance Working Group (2015):
Repeatability is usually assessed with “coffee-break”, test-retest, or scan-rescan experiments, which repeatedly measure the same object under identical or near-identical conditions.
Reproducibility refers to a technical assessment of how consistent the measurements remain when the measurement conditions differ. It may be required for multi-site studies or clinical trials, where the reliability of the quantitative imaging biomarker (QIB) measuring system has to be established across sites.
Reference: QIBA Technical Performance Working Group. Quantitative Imaging Biomarkers: A Review of Statistical Methods for Technical Performance Assessment. Statistical Methods in Medical Research. 2015;24(1):27-67. doi:10.1177/0962280214537344.

3. What statistical tests or parameters should be used?
Dichotomous or nominal data: kappa; proportion of agreement
Ordinal data: weighted kappa; intraclass correlation coefficient (ICC)
Continuous data, reliability parameters: intraclass correlation coefficient (ICC); concordance correlation coefficient (CCC)
Continuous data, agreement parameters: within-subject standard deviation (wSD); repeatability coefficient (RC) and reproducibility coefficient (RDC); coefficient of variation (CV); Bland-Altman limits of agreement (LOA)
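
For dichotomous or nominal data, the two parameters listed first can be computed by hand. The following is a minimal sketch, assuming two readers rate the same cases and that the unweighted Cohen's kappa is the kappa intended; the function names and the example ratings are hypothetical and not taken from this page.

```python
from collections import Counter


def proportion_of_agreement(rater_a, rater_b):
    """Fraction of cases on which the two ratings are identical."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    p_observed = proportion_of_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over all categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical readings of the same 10 cases by two readers.
reader_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
reader_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]

print("Proportion of agreement:", proportion_of_agreement(reader_1, reader_2))
print("Cohen's kappa:", round(cohen_kappa(reader_1, reader_2), 3))
```

In this example the readers agree on 8 of 10 cases, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw proportion of agreement.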

4. Reliability parameters vs. Agreement parameters?
Reliability parameters
These are the intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC).
They describe how clearly the subjects in the study can be distinguished from one another.
For example: ICC = between-subject variability / (between-subject variability + within-subject variability)
Agreement parameters
Exactly how close are the repeated measurements?
These show the measurement variability/error in absolute terms.
These are crucial parameters for a test or a measurement to be used to monitor changes of a particular disease/health state over time.
For more information, please refer to “J Clin Epidemiol 2006;59:1033-1039.”
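
The difference between the two kinds of parameters can be illustrated numerically. The sketch below assumes a balanced design (every subject measured the same number of times) and a one-way random-effects ANOVA for the variance components; the function names and both data sets are hypothetical. The two data sets have the same measurement error, so the within-subject SD (an agreement parameter) is identical, while the ICC (a reliability parameter) changes with how different the subjects are from one another.

```python
import math
from statistics import mean


def variance_components(data):
    """Return (between-subject variance, within-subject variance).

    data: one list per subject, each with the same number k >= 2 of
    repeated measurements. Estimates come from a one-way ANOVA.
    """
    if not data:
        raise ValueError("data must contain at least 2 subjects")
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 repeats")
    subject_means = [mean(row) for row in data]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    # Negative between-subject estimates are truncated at zero.
    between = max((ms_between - ms_within) / k, 0.0)
    return between, ms_within


def icc_oneway(data):
    """ICC = between-subject variance / (between + within-subject variance)."""
    between, within = variance_components(data)
    if between + within == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return between / (between + within)


def within_subject_sd(data):
    """wSD: square root of the within-subject variance."""
    return math.sqrt(variance_components(data)[1])


# Same measurement error (repeats differ by 2), different subject spread.
similar_subjects = [[10, 12], [11, 13], [12, 14]]
different_subjects = [[10, 12], [20, 22], [30, 32]]

for name, data in [("similar", similar_subjects), ("different", different_subjects)]:
    print(name, "ICC =", round(icc_oneway(data), 3),
          "wSD =", round(within_subject_sd(data), 3))
```

This is why a high ICC alone does not show that a measurement is precise enough to track change within one patient: it also depends on how heterogeneous the study sample is.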

5. Repeatability coefficient (RC) calculator
REMEMBER: RC is an agreement parameter. It is essential for a quantitative biomarker that is used to monitor changes of a particular disease/health state during longitudinal follow-up, because it is the smallest change that is detectable.
Use this easy and fast RC calculator for your analysis! RC can be calculated not only for two sets of repeated measurements but also for more than two sets of repeated measurements.
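
For readers who want to check the calculation by hand, here is a minimal sketch of one common way to compute RC. It assumes the definition RC = 1.96 × √2 × wSD (about 2.77 × wSD), with wSD pooled over subjects; whether the calculator on this page uses exactly this formula is an assumption, and the function names and data are hypothetical. The pooling step allows two or more repeats per subject.

```python
import math


def within_subject_sd(data):
    """Pooled within-subject SD (wSD).

    data: one list of repeated measurements per subject. Subjects may have
    different numbers of repeats, but each needs at least 2.
    """
    ss_within = 0.0  # sum of squared deviations from each subject's own mean
    df_within = 0    # pooled degrees of freedom
    for row in data:
        if len(row) < 2:
            raise ValueError("each subject needs at least 2 repeated measurements")
        row_mean = sum(row) / len(row)
        ss_within += sum((x - row_mean) ** 2 for x in row)
        df_within += len(row) - 1
    if df_within == 0:
        raise ValueError("data must contain at least one subject")
    return math.sqrt(ss_within / df_within)


def repeatability_coefficient(data, z=1.96):
    """RC = z * sqrt(2) * wSD; z = 1.96 corresponds to a 95% level."""
    return z * math.sqrt(2.0) * within_subject_sd(data)


# Hypothetical data: two repeats per subject, then three repeats per subject.
two_repeats = [[10, 12], [20, 22], [30, 32]]
three_repeats = [[10, 12, 11], [20, 22, 21], [30, 32, 31]]

print("RC, two repeats:  ", round(repeatability_coefficient(two_repeats), 2))
print("RC, three repeats:", round(repeatability_coefficient(three_repeats), 2))
```

Under this definition, the difference between two repeated measurements of an unchanged subject is expected to fall within ±RC in about 95% of cases, so a change larger than RC is unlikely to be measurement error alone.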
Table note: ICC has three different models (one-way random, two-way random, and two-way mixed) and can use either a consistency or an absolute agreement assumption.
Because the ICC value for the same set of data may change according to the model and the assumption used, it is desirable to report both the model and the assumption.
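
The effect of the assumption can be seen on a small example. The sketch below assumes single-measurement two-way ICCs computed from ANOVA mean squares (the commonly cited consistency and absolute-agreement forms); the function names and data are hypothetical. In the data, the second reader scores every subject exactly 2 units higher than the first.

```python
def two_way_mean_squares(data):
    """Mean squares for subjects (rows), raters (columns), and error.

    data: one list per subject; column j holds the measurement by rater j.
    """
    if not data:
        raise ValueError("data must contain at least 2 subjects")
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, each rated by the same k >= 2 raters")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_rows, ms_cols, ms_error, n, k


def icc_consistency(data):
    """Single-measurement ICC that ignores systematic rater offsets."""
    ms_r, _, ms_e, _, k = two_way_mean_squares(data)
    denominator = ms_r + (k - 1) * ms_e
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_r - ms_e) / denominator


def icc_absolute_agreement(data):
    """Single-measurement ICC that penalizes systematic rater offsets."""
    ms_r, ms_c, ms_e, n, k = two_way_mean_squares(data)
    denominator = ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_r - ms_e) / denominator


# Hypothetical data: rater 2 reads every subject 2 units higher than rater 1.
ratings = [[10, 12], [20, 22], [30, 32]]

print("Consistency ICC:       ", round(icc_consistency(ratings), 3))
print("Absolute agreement ICC:", round(icc_absolute_agreement(ratings), 3))
```

The consistency form treats the constant offset as irrelevant, while the absolute-agreement form counts it as disagreement, so the same data give two different ICC values.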