The online version of this article (doi:10.1186/1471-2288-14-75) contains supplementary material, which is available to authorized users.
The authors declare that they have no competing interests.
This research was conceived by IRW and PR. All authors contributed to the design and interpretation of simulation studies. TPM performed the simulations and the illustrative analysis, and drafted the manuscript. All authors have approved the submitted version.
Multiple imputation is a commonly used method for handling incomplete covariates as it can provide valid inference when data are missing at random. This depends on being able to correctly specify the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor’s residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified.
We review the development of PMM and LRD, outline the various forms available, and aim to clarify how and when they should be used. We compare their performance with fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified.
When using PMM or LRD, we strongly caution against using a single donor, which is the default in some implementations, and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Several current MI software packages implement these methods poorly.
PMM and LRD may have a role in imputing covariates (i) that are not strongly associated with the outcome, and (ii) when the imputation model is thought to be slightly but not grossly misspecified. Researchers should invest effort in specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work.
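The difference between the two donor-based methods summarised above can be illustrated with a minimal sketch. It is not taken from the article or from any of the software packages it reviews. The function name `impute_by_donors` and its arguments `k` and `method` are hypothetical. The sketch assumes a linear imputation model fitted by least squares, and it matches on the absolute difference between predicted means, which is only one possible matching metric. For brevity it uses the point estimates of the regression coefficients; a proper multiple imputation procedure would also draw the parameters (for example from their posterior or via a bootstrap) before each imputation.

```python
import numpy as np


def impute_by_donors(x, y, k=10, method="pmm", rng=None):
    """Fill NaN entries of y using a donor pool of size k (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)

    # Fit a linear model to the complete cases (assumption: point estimates only).
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ beta

    donor_idx = np.flatnonzero(observed)
    residuals = y[donor_idx] - predicted[donor_idx]

    completed = y.copy()
    for i in np.flatnonzero(~observed):
        # Pool of the k donors whose predicted means are closest to this case.
        distance = np.abs(predicted[donor_idx] - predicted[i])
        pool = np.argsort(distance)[:k]
        pick = rng.choice(pool)  # sample one donor at random from the pool
        if method == "pmm":
            # PMM: borrow the donor's observed value.
            completed[i] = y[donor_idx[pick]]
        elif method == "lrd":
            # LRD: borrow the donor's residual and add it to the predicted mean.
            completed[i] = predicted[i] + residuals[pick]
        else:
            raise ValueError("method must be 'pmm' or 'lrd'")
    return completed


# Usage on simulated data (all values here are made up for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)
y_missing = y.copy()
y_missing[rng.random(200) < 0.3] = np.nan

y_pmm = impute_by_donors(x, y_missing, k=10, method="pmm", rng=rng)
y_lrd = impute_by_donors(x, y_missing, k=10, method="lrd", rng=rng)
```

With `method="pmm"` every imputed value is one that was actually observed, whereas `method="lrd"` can produce values outside the observed set because only the residual is borrowed. Setting `k=1` would reproduce the single-donor behaviour that the abstract cautions against.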
- Tuning multiple imputation by predictive mean matching and local residual draws
Tim P Morris
Ian R White
- BioMed Central