The online version of this article (doi:10.1186/1471-2288-14-121) contains supplementary material, which is available to authorized users.
The authors declare that they have no competing interests.
ACI wrote the computer simulation programs, carried out the simulations, and participated in the data analysis. MCP wrote the MLS program. KKD wrote the GCI program, derived the Bayesian priors, participated in the data analysis, and wrote the first draft. All authors participated in conceptual development and writing. All authors read and approved the final manuscript.
The intraclass correlation coefficient (ICC) is widely used in biomedical research to assess the reproducibility of measurements between raters, labs, technicians, or devices. For example, in an inter-rater reliability study, a high ICC value means that noise variability (between-raters and within-raters) is small relative to variability from patient to patient. A confidence interval or Bayesian credible interval for the ICC is a commonly reported summary. Such intervals can be constructed employing either frequentist or Bayesian methodologies.
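To make the ratio concrete, the following is a minimal Python sketch of a moment-based (ANOVA) point estimate of the ICC for a complete subjects-by-raters table with one measurement per cell. The model form (subject, rater, and residual variance components with no interaction), the function name `icc_two_way`, and the toy ratings are assumptions made for illustration; they are not taken from the article's software.

```python
from statistics import mean


def icc_two_way(y):
    """ANOVA estimate of var_subject / (var_subject + var_rater + var_error).

    y is a list of rows: one row per subject, one column per rater.
    Assumes a complete table with at least 2 subjects and 2 raters.
    """
    n, k = len(y), len(y[0])
    grand = mean(v for row in y for v in row)
    row_means = [mean(row) for row in y]
    col_means = [mean(row[j] for row in y) for j in range(k)]

    # Sums of squares for subjects, raters, and the residual.
    ss_subj = k * sum((m - grand) ** 2 for m in row_means)
    ss_rater = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_total - ss_subj - ss_rater

    ms_subj = ss_subj / (n - 1)
    ms_rater = ss_rater / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Method-of-moments variance components, truncated at zero.
    var_subj = max(0.0, (ms_subj - ms_err) / k)
    var_rater = max(0.0, (ms_rater - ms_err) / n)
    var_err = ms_err
    return var_subj / (var_subj + var_rater + var_err)


# Hypothetical toy data: 4 subjects (rows) scored by 3 raters (columns).
ratings = [
    [9.8, 10.1, 10.0],
    [12.1, 12.4, 12.2],
    [7.9, 8.3, 8.0],
    [11.0, 11.2, 11.3],
]
print(icc_two_way(ratings))
```

In the toy table the subjects differ far more than the raters do, so the estimate lands close to 1; shrinking the spread between subjects or adding rater noise pulls it toward 0.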
This study examines the performance of three different methods for constructing an interval in a two-way, crossed, random effects model without interaction: the Generalized Confidence Interval method (GCI), the Modified Large Sample method (MLS), and a Bayesian method based on a noninformative prior distribution (NIB). Guidance is provided on interval construction method selection based on study design, sample size, and normality of the data. We compare the coverage probabilities and widths of the different interval methods.
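As a rough illustration of how a generalized confidence interval can be built for this model, the sketch below draws generalized pivotal quantities for the three expected mean squares and takes percentiles of the implied ICC. It is a textbook-style construction written under assumptions (truncation of negative variance components at zero, equal-tailed percentiles, the function name `gci_icc`, and made-up sums of squares); it should not be read as the authors' GCI program.

```python
import random


def gci_icc(ss_subj, ss_rater, ss_err, n, k, level=0.95, draws=20000, seed=1):
    """Monte Carlo generalized confidence interval for the ICC (illustrative).

    ss_* are ANOVA sums of squares from an n-subject by k-rater table with
    one observation per cell; ss_err must be strictly positive.
    """
    rng = random.Random(seed)
    df_s, df_r, df_e = n - 1, k - 1, (n - 1) * (k - 1)

    def chi2(df):
        # Chi-square(df) is a Gamma with shape df/2 and scale 2.
        return rng.gammavariate(df / 2.0, 2.0)

    values = []
    for _ in range(draws):
        # Generalized pivotal quantities for the expected mean squares.
        g_err = ss_err / chi2(df_e)      # targets var_error
        g_subj = ss_subj / chi2(df_s)    # targets var_error + k * var_subject
        g_rater = ss_rater / chi2(df_r)  # targets var_error + n * var_rater
        v_subj = max(0.0, (g_subj - g_err) / k)
        v_rater = max(0.0, (g_rater - g_err) / n)
        values.append(v_subj / (v_subj + v_rater + g_err))

    # Equal-tailed percentile interval from the simulated ICC values.
    values.sort()
    alpha = 1.0 - level
    lower = values[int(draws * alpha / 2)]
    upper = values[int(draws * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical sums of squares for 4 subjects and 3 raters.
print(gci_icc(ss_subj=28.72, ss_rater=0.18, ss_err=0.045, n=4, k=3))
```

With only three raters the rater mean square has two degrees of freedom, so the simulated interval is typically wide; this is the kind of limited-levels design in which the choice of interval method matters.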
We show that, for the two-way, crossed, random effects model without interaction, care is needed in interval method selection because the interval estimates do not always have properties that the user expects. While different methods generally perform well when there are a large number of levels of each factor, large differences between the methods emerge when the number of levels of one or more factors is limited. In addition, all methods are shown to lack robustness to certain hard-to-detect violations of normality when the sample size is limited.
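The two performance criteria, coverage probability and interval width, can be summarized from simulated intervals in a few lines. The sketch below is illustrative only: the helper name `coverage_and_width` is hypothetical, and the four intervals are made-up toy values, not results from the study.

```python
def coverage_and_width(intervals, true_value):
    """Return (empirical coverage, mean width) for (lower, upper) intervals."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(intervals), mean_width


# Made-up intervals from four imagined simulation replicates; true ICC = 0.80.
intervals = [(0.60, 0.95), (0.72, 0.97), (0.81, 0.99), (0.40, 0.78)]
coverage, width = coverage_and_width(intervals, true_value=0.80)
print(coverage, width)
```

In a real simulation the list would hold thousands of intervals, and a nominal 95% method would be expected to show empirical coverage near 0.95.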
Decision rules and software programs for interval construction are provided for practical implementation in the two-way, crossed, random effects model without interaction. All interval methods perform similarly when the data are normal and there are sufficient numbers of levels of each factor. The MLS and GCI methods outperform the NIB when one of the factors has a limited number of levels and the data are normally distributed or nearly normally distributed. None of the methods works well if the number of levels of a factor is limited and the data are markedly non-normal. The software programs are implemented in the popular R language.
Additional file 1: Supplement includes additional discussion, simulations, data analysis details, figures and tables. (PDF 855 KB)
Additional file 2: Supplement presents the mean and standard deviation of the point estimates of the ICCb for different models and designs presented in the main paper. (XLS 22 KB)
Additional file 3: Supplement presents complete tables of the core simulations, of which the tables in the paper are a subset. (XLS 62 KB)
Additional file 4: Supplement is a table of results for designs with 8, 10, 12, 14 and 16 factor levels for laboratories. (XLS 31 KB)
Additional file 5: Supplement is a table of results for designs with 4, 5, 6 and 7 factor levels for laboratories. (XLS 26 KB)
- Comparison of confidence interval methods for an intra-class correlation coefficient (ICC)
Alexei C Ionan
Mei-Yin C Polley
Lisa M McShane
Kevin K Dobbin
- BioMed Central