Agreement between Two Independent Groups of Raters

  • Theory and Methods
  • Published in Psychometrika

Abstract

We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen’s kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the coefficient are also given, and their sampling variance is determined by the jackknife method. The method is illustrated on medical education data which motivated the research.
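
As a concrete reference point for the quantities named above, the sketch below (an illustration, not the authors’ code) computes the classical two-rater Cohen’s kappa, a weighted variant through an agreement-weight matrix, and a leave-one-item-out jackknife standard error on toy data. The function names, the toy ratings, and the identity-weight default are assumptions made for the example; the paper’s coefficient for two independent groups of raters extends these building blocks and is not reproduced here.

import numpy as np

def cohen_kappa(rater1, rater2, categories, weights=None):
    """Cohen's kappa between two raters on a nominal scale.
    If `weights` is an agreement-weight matrix (1 on the diagonal),
    a weighted kappa is returned instead."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # joint classification proportions p_ij
    p = np.zeros((k, k))
    for a, b in zip(rater1, rater2):
        p[idx[a], idx[b]] += 1
    p /= len(rater1)
    if weights is None:
        weights = np.eye(k)  # identity weights give the unweighted kappa
    po = np.sum(weights * p)                                       # observed agreement
    pe = np.sum(weights * np.outer(p.sum(axis=1), p.sum(axis=0)))  # chance agreement
    return (po - pe) / (1.0 - pe)

def jackknife_se(stat, rater1, rater2, categories, weights=None):
    """Leave-one-item-out jackknife standard error of an agreement statistic."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    n = len(r1)
    reps = np.array([stat(np.delete(r1, i), np.delete(r2, i), categories, weights)
                     for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

# Toy data: two raters classifying 12 items into three nominal categories.
r1 = list("AABBCCABCABC")
r2 = list("AABBCCABABCC")
cats = ["A", "B", "C"]
print(f"kappa = {cohen_kappa(r1, r2, cats):.3f}, "
      f"jackknife SE = {jackknife_se(cohen_kappa, r1, r2, cats):.3f}")

The jackknife is used here in the same spirit as in the abstract: each item is deleted in turn, the statistic is recomputed, and the spread of the replicates gives the standard error.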

References

  • Barnhart, H.X., & Williamson, J.M. (2002). Weighted least squares approach for comparing correlated kappa. Biometrics, 58, 1012–1019.
  • Bland, A.C., Kreiter, C.D., & Gordon, J.A. (2005). The psychometric properties of five scoring methods applied to the Script Concordance Test. Academic Medicine, 80, 395–399.
  • Charlin, B., Gagnon, R., Sibert, L., & Van der Vleuten, C. (2002). Le test de concordance de script: un instrument d’évaluation du raisonnement clinique [The script concordance test: an instrument for assessing clinical reasoning]. Pédagogie Médicale, 3, 135–144.
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
  • Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
  • Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman and Hall.
  • Feigin, P.D., & Alvo, M. (1986). Intergroup diversity and concordance for ranking data: an approach via metrics for permutations. The Annals of Statistics, 14, 691–707.
  • Fleiss, J.L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: Wiley.
  • Fleiss, J.L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.
  • Hollander, M., & Sethuraman, J. (1978). Testing for agreement between two groups of judges. Biometrika, 65, 403–411.
  • Kraemer, H.C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.
  • Kraemer, H.C. (1981). Intergroup concordance: definition and estimation. Biometrika, 68, 641–646.
  • Kraemer, H.C., Vyjeyanthi, S.P., & Noda, A. (2004). Agreement statistics. In R.B. D’Agostino (Ed.), Tutorials in biostatistics (Vol. 1, pp. 85–105). New York: Wiley.
  • Lipsitz, S.R., Williamson, J., Klar, N., Ibrahim, J., & Parzen, M. (2001). A simple method for estimating a regression model for κ between a pair of raters. Journal of the Royal Statistical Society, Series A, 164, 449–465.
  • Raine, R., Sanderson, C., Hutchings, A., Carter, S., Larkin, K., & Black, N. (2004). An experimental study of determinants of group judgments in clinical guideline development. Lancet, 364, 429–437.
  • Schouten, H.J.A. (1982). Measuring pairwise interobserver agreement when all subjects are judged by the same observers. Statistica Neerlandica, 36, 45–61.
  • Schucany, W.R., & Frawley, W.H. (1973). A rank test for two group concordance. Psychometrika, 38, 249–258.
  • van Hoeij, M.J., Haarhuis, J.C., Wierstra, R.F., & van Beukelen, P. (2004). Developing a classification tool based on Bloom’s taxonomy to assess the cognitive level of short essay questions. Journal of Veterinary Medical Education, 31, 261–267.
  • Vanbelle, S., Massart, V., Giet, G., & Albert, A. (2007). Test de concordance de script: un nouveau mode d’établissement des scores limitant l’effet du hasard [Script concordance test: a new scoring method limiting the effect of chance]. Pédagogie Médicale, 8, 71–81.
  • Vanbelle, S., & Albert, A. (2009). Agreement between an isolated rater and a group of raters. Statistica Neerlandica, 63, 82–100.
  • Williamson, J.M., Lipsitz, S.R., & Manatunga, A.K. (2000). Modeling kappa for measuring dependent categorical agreement data. Biostatistics, 1, 191–202.

Author information

Correspondence to Sophie Vanbelle.

Cite this article

Vanbelle, S., Albert, A. Agreement between Two Independent Groups of Raters. Psychometrika 74, 477–491 (2009). https://doi.org/10.1007/s11336-009-9116-1
