Toolkit for Analyzing Reliability of a Diagnostic Test or a Measurement

  • 1. What is reliability?

    It refers to how consistent the results of a test or a measurement are when obtained repeatedly.

    We use the term “reliability” on this page as an umbrella term covering concepts such as reproducibility, repeatability, and agreement, except where we distinguish “reliability parameters” from “agreement parameters”, as explained below.

  • 2. Repeatability vs. Reproducibility?

    According to the QIBA Technical Performance Working Group (2015):

    Repeatability is often assessed with “coffee-break”, test-retest, or scan-rescan experiments, which repeatedly measure the same object under identical or near-identical conditions.

    Reproducibility refers to the consistency of measurements made on the same object under different conditions, such as different sites, devices, or operators. It may need to be assessed in multi-site studies or clinical trials to establish the reliability of the Quantitative Imaging Biomarker (QIB) measuring system.

    Reference: QIBA Technical Performance Working Group. Quantitative Imaging Biomarkers: A Review of Statistical Methods for Technical Performance Assessment. Statistical Methods in Medical Research. 2015;24(1):27-67. doi:10.1177/0962280214537344.

  • 3. What statistical tests or parameters should be used?

    Dichotomous or nominal data:
      Proportion of agreement

    Ordinal data:
      Weighted kappa
      Intraclass correlation coefficient (ICC)

    Continuous data:
      Reliability parameters:
        Intraclass correlation coefficient (ICC)
        Concordance correlation coefficient (CCC)
      Agreement parameters:
        Within-subject standard deviation (wSD)
        Repeatability coefficient (RC) and reproducibility coefficient (RDC)
        Coefficient of variation (CV)
        Bland-Altman limits of agreement (LOA)

    Table note: ICC has three models (one-way random, two-way random, and two-way mixed) and can be defined for either consistency or absolute agreement.
    Because the ICC value for the same set of data may change with the model and the definition used, report both whenever you report an ICC.
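
    To illustrate why the definition should be reported, here is a minimal sketch (not the method used by this toolkit) that computes single-measurement ICCs for consistency and for absolute agreement from two-way ANOVA mean squares. The function name `icc_two_way` and the data are hypothetical; the data are constructed so that the second rater always reads exactly 2 units higher than the first.

```python
import numpy as np

def icc_two_way(x):
    """Single-measurement ICCs from a subjects-by-raters array.

    Returns (consistency, absolute_agreement), both computed from
    two-way ANOVA mean squares.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape                      # n subjects, k raters/repeats
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement

# Hypothetical data: rater 2 = rater 1 + 2 for every subject.
ratings = np.array([[10, 12], [14, 16], [18, 20], [22, 24], [26, 28]])
consistency, agreement = icc_two_way(ratings)
print(f"ICC (consistency):        {consistency:.3f}")
print(f"ICC (absolute agreement): {agreement:.3f}")
```

    With a constant offset between raters, the consistency ICC is perfect while the absolute-agreement ICC is lower, because only the latter penalizes the systematic difference.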

  • 4. Reliability parameters vs. Agreement parameters?

    Reliability parameters

    These describe how well the subjects in a study can be distinguished from one another despite measurement error. They include the intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC). For example:

    ICC = between-subject variance / (between-subject variance + within-subject variance)

    Agreement parameters

    Exactly how close are the repeated measurements?

    These show the measurement variability/error in absolute terms, so they are preferred for assessing absolute measurement uncertainty, whereas reliability parameters are relative measures.

    These are crucial parameters for a test or a measurement to be used to monitor changes of a particular disease/health state over time.

    For more information, please refer to “J Clin Epidemiol 2006;59:1033-1039.”
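
    As a rough sketch of what "absolute terms" means in practice, the following computes three agreement parameters for two repeated measurements per subject: Bland-Altman limits of agreement, the within-subject standard deviation (wSD), and a coefficient of variation. The data are hypothetical, and the CV here is taken simply as wSD divided by the overall mean, which is one of several conventions and an assumption of this example.

```python
import numpy as np

# Hypothetical repeated measurements (same subjects, two sessions).
m1 = np.array([10.0, 12.0, 14.0, 16.0])
m2 = np.array([11.0, 11.0, 15.0, 17.0])

diff = m2 - m1

# Bland-Altman: mean difference (bias) and 95% limits of agreement.
bias = diff.mean()
sd_diff = diff.std(ddof=1)
loa_lower = bias - 1.96 * sd_diff
loa_upper = bias + 1.96 * sd_diff

# Within-subject SD: square root of the mean per-subject variance.
# With two measurements, each subject's variance equals diff**2 / 2.
wsd = np.sqrt(np.mean(diff ** 2 / 2))

# Coefficient of variation (assumed convention: wSD / overall mean).
cv = wsd / np.mean([m1, m2])

print(f"bias = {bias:.2f}, LOA = ({loa_lower:.2f}, {loa_upper:.2f})")
print(f"wSD  = {wsd:.3f}, CV = {100 * cv:.1f}%")
```

    All of these results except the CV are in the same units as the measurement itself, unlike the ICC, which is a unitless ratio that depends on how heterogeneous the study subjects are.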

  • 5. Repeatability coefficient (RC) calculator

    REMEMBER: RC is an agreement parameter that is essential when a quantitative biomarker is used to monitor changes of a particular disease/health state in longitudinal follow-up, because it is the smallest detectable change.
    Use this easy and fast RC calculator for your analysis! RC can be calculated for two sets of repeated measurements or for more than two.

    Upload your data as an Excel file. [Download Sample File]

    Select the number of decimal places for the output.
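
    For readers who want to check a result by hand, here is a minimal sketch of an RC calculation. It is not the calculator's actual implementation. It assumes the common definition RC = 1.96 × √2 × wSD (about 2.77 × wSD), a data layout with one row per subject and one column per repeated measurement, and a hypothetical function name `repeatability_coefficient`; the `decimals` argument mirrors the decimal-places option above.

```python
import numpy as np

def repeatability_coefficient(data, decimals=2):
    """Return (wSD, RC) for a subjects-by-repeats array.

    Works for two or more repeated measurements per subject.
    """
    data = np.asarray(data, dtype=float)
    # Per-subject sample variance across the repeated measurements.
    subject_var = data.var(axis=1, ddof=1)
    # Within-subject SD: square root of the mean within-subject variance.
    wsd = np.sqrt(subject_var.mean())
    # Assumed definition: RC = 1.96 * sqrt(2) * wSD.
    rc = 1.96 * np.sqrt(2) * wsd
    return round(float(wsd), decimals), round(float(rc), decimals)

# Hypothetical data: 3 subjects, 3 repeated measurements each.
measurements = np.array([
    [10.0, 11.0, 12.0],
    [20.0, 20.0, 23.0],
    [15.0, 14.0, 16.0],
])
wsd, rc = repeatability_coefficient(measurements, decimals=2)
print(f"wSD = {wsd}, RC = {rc}")
```

    Under these assumptions, a change between two measurements of the same subject that is smaller than the RC cannot be distinguished from measurement error at the 95% level.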