The online version of this article (doi:10.1186/1471-2288-14-100) contains supplementary material, which is available to authorized users.
The authors declare that they have no competing interests.
VS was involved in conceptualization, literature search, writing, data analysis and creating charts for the study. SIB was involved in conceptualization, writing and data interpretation of the study. Both authors read and approved the final manuscript.
Various measures of observer agreement have been proposed for 2x2 tables. We examine and compare the behavior of these alternative measures.
The alternative measures of observer agreement and the corresponding agreement chart were calculated under various scenarios of marginal distributions (symmetrical or not, balanced or not) and of degree of diagonal agreement, and their behaviors were compared. In particular, two paradoxes previously identified for kappa were examined: (1) low kappa values despite high observed agreement under highly symmetrically imbalanced marginals, and (2) higher kappa values for asymmetrically imbalanced marginal distributions.
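The two paradoxes can be reproduced numerically. The sketch below (not from the paper itself) uses two illustrative 2x2 tables of the kind popularized by Feinstein and Cicchetti: both have the same observed agreement of 0.60, but the symmetrically imbalanced table yields a much lower kappa than the asymmetrically imbalanced one.

```python
# Illustration of the two kappa paradoxes on 2x2 tables [[a, b], [c, d]].

def kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    po = (a + d) / n                                      # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
    return (po - pe) / (1 - pe)

# Paradox (1): symmetrically imbalanced marginals depress kappa
# even though observed agreement is 0.60.
k_sym = kappa_2x2(45, 15, 25, 15)

# Paradox (2): the same observed agreement (0.60) with asymmetrically
# imbalanced marginals yields a noticeably higher kappa.
k_asym = kappa_2x2(25, 35, 5, 35)

print(round(k_sym, 2), round(k_asym, 2))   # 0.13 vs 0.26
```

These particular cell counts are illustrative assumptions; the paper's simulation scenarios vary the marginals and diagonal agreement systematically.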
Kappa and alpha behave similarly and are more affected by the marginal distributions than the B-statistic, AC1-index and delta measures. Delta and kappa provide similar values when the marginal totals are asymmetrically imbalanced, or symmetrical but not excessively imbalanced. The AC1-index and B-statistic provide closer results when the marginal distributions are symmetrically imbalanced and the observed agreement is greater than 50%. The B-statistic and the AC1-index also provide values closer to the observed agreement when the subjects are classified mostly in one of the diagonal cells. Finally, the B-statistic is consistent and more stable than kappa under both types of paradoxes studied.
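A minimal sketch of this comparison, assuming the standard 2x2 formulas for Cohen's kappa, Bangdiwala's B-statistic (squared diagonal counts over the marginal rectangles) and Gwet's AC1-index; the example table, with symmetrically imbalanced marginals and most subjects in one diagonal cell, is an assumption chosen to show how B and AC1 track the observed agreement while kappa collapses.

```python
# Compare kappa, the B-statistic and the AC1-index on one 2x2 table.

def measures_2x2(a, b, c, d):
    """Return (Po, kappa, B, AC1) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2 = a + b, c + d            # row totals (rater 1)
    c1, c2 = a + c, b + d            # column totals (rater 2)
    po = (a + d) / n                 # observed agreement
    pe = (r1 * c1 + r2 * c2) / n**2  # chance agreement for kappa
    kappa = (po - pe) / (1 - pe)
    # Bangdiwala's B: squared diagonal counts over marginal rectangles
    b_stat = (a**2 + d**2) / (r1 * c1 + r2 * c2)
    # Gwet's AC1 for two categories
    pi1 = (r1 / n + c1 / n) / 2
    pe_ac1 = 2 * pi1 * (1 - pi1)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)
    return po, kappa, b_stat, ac1

po, kappa, b_stat, ac1 = measures_2x2(80, 10, 5, 5)
# Po = 0.85, yet kappa is only about 0.32, while B (about 0.82) and
# AC1 (about 0.81) stay close to the observed agreement.
print(round(po, 2), round(kappa, 3), round(b_stat, 3), round(ac1, 3))
```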
Because the B-statistic behaved better than the other measures under all scenarios studied, as well as under varying prevalences, sensitivities and specificities, we recommend using the B-statistic along with its corresponding agreement chart as an alternative to kappa when assessing agreement in 2x2 tables.
Observer agreement paradoxes in 2x2 tables: comparison of agreement measures
Shrikant I Bangdiwala
BioMed Central