
A New Interpretation of the Weighted Kappa Coefficients


Abstract

Reliability and agreement studies are of paramount importance: they contribute to the quality of research by quantifying the amount of error inherent in any diagnosis, score, or measurement. Guidelines for reporting reliability and agreement studies were recently provided. While these guidelines advise the use of the kappa family of coefficients for categorical and ordinal scales, they offer no further guidance on the choice of a weighting scheme. The present paper gives a new, simple, and practical interpretation of the linear- and quadratic-weighted kappa coefficients, which will help researchers motivate their choice of a weighting scheme.
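To make the two weighting schemes concrete, here is a minimal Python sketch (not taken from the article itself) of Cohen's weighted kappa computed from a k × k two-rater agreement table, using the standard linear and quadratic disagreement weights; the function name and the example table are hypothetical.

```python
import numpy as np

def weighted_kappa(table, scheme="linear"):
    """Cohen's weighted kappa for a k x k two-rater agreement table.

    Disagreement weights: |i - j| / (k - 1) for the linear scheme,
    and ((i - j) / (k - 1))**2 for the quadratic scheme (Cohen, 1968).
    """
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    p = table / table.sum()                # joint proportions p_ij
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)            # normalized category distance
    w = d if scheme == "linear" else d**2  # disagreement weights
    observed = (w * p).sum()               # observed weighted disagreement
    expected = (w * np.outer(p.sum(axis=1), p.sum(axis=0))).sum()  # chance-expected
    return 1.0 - observed / expected

# Hypothetical 3 x 3 table for an ordinal scale rated by two observers
t = [[20, 5, 1],
     [4, 30, 6],
     [1, 7, 26]]
print(weighted_kappa(t, "linear"))     # linear-weighted kappa
print(weighted_kappa(t, "quadratic"))  # quadratic-weighted kappa
```

With quadratic weights, disagreements between distant categories are penalized much more heavily than adjacent-category disagreements, which is why the choice of scheme matters in practice.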



Acknowledgments

This research is part of project 451-13-002, funded by the Netherlands Organisation for Scientific Research. The author thanks three anonymous reviewers and the associate editor for their helpful comments and valuable suggestions on an earlier version of this article.

Author information

Corresponding author

Correspondence to Sophie Vanbelle.


About this article


Cite this article

Vanbelle, S. A New Interpretation of the Weighted Kappa Coefficients. Psychometrika 81, 399–410 (2016). https://doi.org/10.1007/s11336-014-9439-4

