Abstract
We propose a coefficient of agreement for assessing the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined under a population-based model, extends the classical Cohen's kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the coefficient are also given, and their sampling variance is determined by the jackknife method. The method is illustrated on the medical education data that motivated the research.
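The abstract rests on two ingredients: the classical two-rater Cohen's kappa and jackknife variance estimation. As a minimal illustrative sketch (not the authors' group-level coefficient), the Python below computes kappa for two raters and a leave-one-item-out jackknife standard error; the function names and the per-item deletion scheme are assumptions for illustration only.

```python
import numpy as np

def cohen_kappa(r1, r2, categories):
    """Cohen's (1960) kappa for two raters judging the same items on a nominal scale."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    # Observed agreement: proportion of items on which the raters concur.
    p_o = np.mean(r1 == r2)
    # Chance-expected agreement from the raters' marginal category frequencies.
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_o - p_e) / (1.0 - p_e)

def jackknife_se(r1, r2, categories):
    """Leave-one-item-out jackknife standard error of kappa (illustrative scheme)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    idx = np.arange(n)
    # Recompute kappa n times, each time deleting one item.
    leave_one_out = np.array([
        cohen_kappa(r1[idx != i], r2[idx != i], categories) for i in range(n)
    ])
    # Jackknife variance: (n-1)/n times the sum of squared deviations.
    return np.sqrt((n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2))

# Perfect agreement gives kappa = 1; agreement at chance level gives kappa = 0.
print(cohen_kappa([0, 0, 1, 1], [0, 0, 1, 1], [0, 1]))  # 1.0
print(cohen_kappa([0, 1, 0, 1], [0, 0, 1, 1], [0, 1]))  # 0.0
```

The paper's coefficient generalizes this idea from two individual raters to two independent groups of raters, with weighted and intraclass variants; this sketch only shows the two-rater building block.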
References
Barnhart, H.X., & Williamson, J.M. (2002). Weighted least squares approach for comparing correlated kappa. Biometrics, 58, 1012–1019.
Bland, A.C., Kreiter, C.D., & Gordon, J.A. (2005). The psychometric properties of five scoring methods applied to the Script Concordance Test. Academic Medicine, 80, 395–399.
Charlin, B., Gagnon, R., Sibert, L., & Van der Vleuten, C. (2002). Le test de concordance de script: un instrument d'évaluation du raisonnement clinique [The script concordance test: an instrument for assessing clinical reasoning]. Pédagogie Médicale, 3, 135–144.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman and Hall.
Feigin, P.D., & Alvo, M. (1986). Intergroup diversity and concordance for ranking data: an approach via metrics for permutations. The Annals of Statistics, 14, 691–707.
Fleiss, J.L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: Wiley.
Fleiss, J.L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.
Hollander, M., & Sethuraman, J. (1978). Testing for agreement between two groups of judges. Biometrika, 65, 403–411.
Kraemer, H.C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.
Kraemer, H.C. (1981). Intergroup concordance: definition and estimation. Biometrika, 68, 641–646.
Kraemer, H.C., Vyjeyanthi, S.P., & Noda, A. (2004). Agreement statistics. In D'Agostino, R.B. (Ed.), Tutorials in Biostatistics (vol. 1, pp. 85–105). New York: Wiley.
Lipsitz, S.R., Williamson, J., Klar, N., Ibrahim, J., & Parzen, M. (2001). A simple method for estimating a regression model for κ between a pair of raters. Journal of the Royal Statistical Society Series A, 164, 449–465.
Raine, R., Sanderson, C., Hutchings, A., Carter, S., Larkin, K., & Black, N. (2004). An experimental study of determinants of group judgments in clinical guideline development. Lancet, 364, 429–437.
Schouten, H.J.A. (1982). Measuring pairwise interobserver agreement when all subjects are judged by the same observers. Statistica Neerlandica, 36, 45–61.
Schucany, W.R., & Frawley, W.H. (1973). A rank test for two group concordance. Psychometrika, 38, 249–258.
van Hoeij, M.J., Haarhuis, J.C., Wierstra, R.F., & van Beukelen, P. (2004). Developing a classification tool based on Bloom’s taxonomy to assess the cognitive level of short essay questions. Journal of Veterinary Medical Education, 31, 261–267.
Vanbelle, S., Massart, V., Giet, G., & Albert, A. (2007). Test de concordance de script: un nouveau mode d'établissement des scores limitant l'effet du hasard [Script concordance test: a new scoring method limiting the effect of chance]. Pédagogie Médicale, 8, 71–81.
Vanbelle, S., & Albert, A. (2009). Agreement between an isolated rater and a group of raters. Statistica Neerlandica, 63, 82–100.
Williamson, J.M., Lipsitz, S.R., & Manatunga, A.K. (2000). Modeling kappa for measuring dependent categorical agreement data. Biostatistics, 1, 191–202.
Cite this article
Vanbelle, S., Albert, A. Agreement between Two Independent Groups of Raters. Psychometrika 74, 477–491 (2009). https://doi.org/10.1007/s11336-009-9116-1