Published in: Prevention Science 3/2018

04.04.2016

Principled Missing Data Treatments

Authors: Kyle M. Lang, Todd D. Little

Abstract

We review a number of issues regarding missing data treatments for intervention and prevention researchers. Unfortunately, many common missing data practices in prevention research remain ill-advised (e.g., listwise and pairwise deletion, insufficient use of auxiliary variables). Our goal is to promote better practice in the handling of missing data. We review the current state of missing data methodology and recent missing data reporting in prevention research. We describe antiquated, ad hoc missing data treatments and discuss their limitations. We then discuss two modern, principled missing data treatments, multiple imputation and full information maximum likelihood, and offer practical tips on how best to employ these methods in prevention research. We frame these principled treatments in terms of how they improve causal and statistical inference in the prevention sciences. Our recommendations are firmly grounded in missing data theory and in well-validated statistical principles for handling the missing data issues that are ubiquitous in biosocial and prevention research. We augment our broad survey of missing data analysis with references to more exhaustive resources.
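The abstract's central contrast, ad hoc deletion versus multiple imputation (MI) and full information maximum likelihood (FIML), can be illustrated with a small simulation. What follows is a minimal Python/NumPy sketch under stated assumptions, not the authors' code: x is fully observed, y is missing at random (MAR) with a probability that rises with x, and the target is the mean of y (0 by construction). Listwise deletion discards the information in x and is biased. FIML, which for this simple monotone bivariate-normal pattern reduces to a closed-form factored-likelihood estimate, and MI, here Bayesian linear-regression imputation pooled with Rubin's rules, both use x and recover the mean. The function names, the logistic missingness model, and all simulation settings are hypothetical choices made for illustration.

```python
# Hypothetical illustration: names and settings are not taken from the article.
import numpy as np


def simulate_mar(n=5000, rho=0.6, seed=1):
    """Standard bivariate normal (x, y) with correlation rho (so E[y] = 0).
    y goes missing more often when x is high: MAR given the observed x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    p_miss = 1 / (1 + np.exp(-2 * x))  # missingness depends only on x
    y_obs = np.where(rng.random(n) < p_miss, np.nan, y)
    return x, y_obs


def listwise_mean(y_obs):
    """Ad hoc treatment: drop incomplete cases and average what is left."""
    return np.nanmean(y_obs)


def fiml_mean(x, y_obs):
    """ML estimate of E[y] when x is complete and y is partly missing:
    the factored likelihood gives b0 + b1 * mean(x over ALL cases)."""
    obs = ~np.isnan(y_obs)
    b1, b0 = np.polyfit(x[obs], y_obs[obs], 1)  # slope, intercept from complete cases
    return b0 + b1 * x.mean()


def mi_mean(x, y_obs, m=20, seed=2):
    """Impute y from x by Bayesian linear regression m times; pool with Rubin's rules."""
    rng = np.random.default_rng(seed)
    mis = np.isnan(y_obs)
    obs = ~mis
    X = np.column_stack([np.ones(obs.sum()), x[obs]])
    y = y_obs[obs]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    rss = np.sum((y - X @ beta_hat) ** 2)
    df = len(y) - X.shape[1]
    estimates, within = [], []
    for _ in range(m):
        # Draw sigma^2 and beta from their posterior so each imputation
        # reflects uncertainty about the regression parameters ("proper" MI).
        sigma2 = rss / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        y_imp = y_obs.copy()
        y_imp[mis] = beta[0] + beta[1] * x[mis] + rng.normal(0.0, np.sqrt(sigma2), mis.sum())
        estimates.append(y_imp.mean())                 # complete-data estimate
        within.append(y_imp.var(ddof=1) / len(y_imp))  # its sampling variance
    estimates = np.asarray(estimates)
    q_bar = estimates.mean()                           # pooled point estimate
    b = estimates.var(ddof=1)                          # between-imputation variance
    t = np.mean(within) + (1 + 1 / m) * b              # Rubin's total variance
    return q_bar, np.sqrt(t)


if __name__ == "__main__":
    x, y_obs = simulate_mar()
    print(f"share missing: {np.isnan(y_obs).mean():.2f}")
    print(f"listwise: {listwise_mean(y_obs):.3f}")
    print(f"FIML:     {fiml_mean(x, y_obs):.3f}")
    est, se = mi_mean(x, y_obs)
    print(f"MI:       {est:.3f} (SE {se:.3f})")
```

In this sketch x also shows why the abstract stresses auxiliary variables: because missingness depends on x, leaving x out of the imputation or likelihood model would reintroduce the same bias that listwise deletion exhibits.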
References
Allison, P. D. (2002). Missing data. Thousand Oaks, CA: Sage Publications.
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Chichester, West Sussex: Wiley.
Diggle, P., & Kenward, M. G. (1994). Informative dropout in longitudinal data analysis (with discussion). Applied Statistics, 43, 49–94.
Enders, C. K. (2001). The performance of the full information maximum likelihood estimator in multiple regression models with missing data. Educational and Psychological Measurement, 61, 713–740. doi:10.1177/00131640121971482.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford.
Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457. doi:10.1207/S15328007SEM0803_5.
Goldstein, H., Carpenter, J., & Browne, W. J. (2014). Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms. Journal of the Royal Statistical Society: Series A (Statistics in Society), 177, 553–564. doi:10.1111/rssa.12022.
Graham, J. (2012). Missing data: analysis and design. New York: Springer.
Heckman, J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. The Annals of Economic and Social Measurement, 5, 475–492.
Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: a program for missing data. Journal of Statistical Software, 45, 1–47.
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125–134. doi:10.2307/2290705.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. Hoboken, NJ: John Wiley & Sons.
Little, T. D., Lang, K. M., Wu, W., & Rhemtulla, M. (2016). Missing data. In D. Cicchetti (Ed.), Developmental Psychopathology: Vol. 1. Theory and method (3rd ed., pp. 760–796). New York: Wiley.
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: a review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525–556. doi:10.3102/00346543074004525.
Raghunathan, T. E., Lepkowski, J. M., Van Hoewyk, J., & Solenberger, P. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27, 85–96.
Rubin, D. B. (1978). Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse (Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 30–34).
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.
Schafer, J. L., & Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 437–457. doi:10.1198/106186002760180608.
van Buuren, S. (2011). Multiple imputation of multilevel data. In J. Hox & J. Roberts (Eds.), Handbook of advanced multilevel analysis (pp. 173–196). Milton Park, UK: Routledge.
van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: CRC Press.
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76, 1049–1064. doi:10.1080/10629360600810434.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–67.
Yucel, R. M. (2008). Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philosophical Transactions of the Royal Society A, 366, 2389–2403. doi:10.1098/rsta.2008.0038.
Zhao, E., & Yucel, R. M. (2009). Performance of sequential imputation method in multilevel applications. In Proceedings of the American Statistical Association Survey Research Methods Section (pp. 2800–2810).
Zhao, J. H., & Schafer, J. L. (2013). pan: multiple imputation for multivariate panel or clustered data (Version 0.9) [R Package].
Metadata
Title
Principled Missing Data Treatments
Authors
Kyle M. Lang
Todd D. Little
Publication date
04.04.2016
Publisher
Springer US
Published in
Prevention Science / Issue 3/2018
Print ISSN: 1389-4986
Electronic ISSN: 1573-6695
DOI
https://doi.org/10.1007/s11121-016-0644-5
