Propensity score analysis is a popular method to control for confounding in observational studies. A challenge in propensity score methods is missing values in confounders. Several strategies for handling missing values exist, but guidance in choosing the best method is needed. In this simulation study, we compared four strategies for handling missing covariate values in propensity score matching and propensity score weighting: complete case analysis, the missing indicator method, multiple imputation, and multiple imputation combined with the missing indicator method. We also aimed to provide guidance in choosing the optimal strategy. Simulated scenarios varied in the missingness mechanism and in the presence of effect modification or unmeasured confounding. Additionally, we demonstrated how missingness graphs help clarify the missing data structure. When no effect modification existed, complete case analysis yielded valid causal treatment effects even when data were missing not at random. In some situations, complete case analysis was also able to partially correct for unmeasured confounding. Multiple imputation worked well if the data were missing (completely) at random and the imputation model was correctly specified. In the presence of effect modification, imputation models more complex than the default options of commonly used statistical software were required. Multiple imputation may fail when data are missing not at random. In that setting, combining multiple imputation and the missing indicator method reduced the bias, as the missing indicator variable can be a proxy for unobserved confounding. The optimal way to handle missing values in covariates of propensity score models depends on the missing data structure and the presence of effect modification. When effect modification is present, default settings of imputation methods may yield biased results even if data are missing at random.
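To make two of the four strategies concrete, the following is a minimal sketch of the data preparation step for complete case analysis and for the missing indicator method, before any propensity score model is fitted. It is an illustration only and not the code used in the study: the toy records, the covariate name `age`, the helper names `complete_cases` and `add_missing_indicator`, and the fill value of 0.0 are all assumptions made for this example. Multiple imputation is not shown.

```python
# Hypothetical toy data: one treatment flag and one confounder ("age")
# that is missing (None) for some subjects. Not data from the study.
records = [
    {"id": 1, "treated": 1, "age": 54.0},
    {"id": 2, "treated": 0, "age": None},
    {"id": 3, "treated": 1, "age": 61.0},
    {"id": 4, "treated": 0, "age": None},
]


def complete_cases(rows, covariate):
    """Complete case analysis: keep only rows where the covariate is observed."""
    return [row for row in rows if row[covariate] is not None]


def add_missing_indicator(rows, covariate, fill_value=0.0):
    """Missing indicator method: replace missing values with a constant and
    add a 0/1 flag, so that both the filled covariate and the flag can be
    entered into the propensity score model. The input rows are not modified."""
    prepared = []
    for row in rows:
        is_missing = row[covariate] is None
        new_row = dict(row)  # copy so the original record stays untouched
        new_row[covariate] = fill_value if is_missing else row[covariate]
        new_row[covariate + "_missing"] = 1 if is_missing else 0
        prepared.append(new_row)
    return prepared


cc_rows = complete_cases(records, "age")
mi_rows = add_missing_indicator(records, "age")
```

In this sketch, `cc_rows` retains only the subjects with an observed confounder, whereas `mi_rows` retains all subjects and carries the extra `age_missing` column. Under the combined strategy described above, an indicator of this kind would be included in the propensity score model alongside the imputed covariate values.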
- A comparison of different methods to handle missing data in the context of propensity score analysis
Olaf M. Dekkers
Saskia le Cessie
- Springer Netherlands
European Journal of Epidemiology
The official journal of the European Epidemiology Federation
Print ISSN: 0393-2990
Electronic ISSN: 1573-7284