Published in: BMC Medical Research Methodology 1/2010

Open Access 1 December 2010 | Research article

Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study

Authors: Andrea Marshall, Douglas G Altman, Roger L Holder

Abstract

Background

The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model.

Methods

Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: a) complete case analysis (CC), b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM) and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates of the regression coefficients and model performance measures were obtained.
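
The resampling and MAR steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins for the 7507-patient dataset, only one covariate receives missing values, and the logistic missingness model, its `slope`, and the function names `resample` and `impose_mar` are hypothetical. The abstract does not specify how the MAR mechanism was constructed.

```python
import numpy as np

rng = np.random.default_rng(2010)

# Synthetic stand-in for the complete dataset (assumption, not the real data):
# column 0 is a fully observed covariate, column 1 will receive missing values.
n_full, n_sample = 7507, 1000
full = np.column_stack([rng.normal(size=n_full), rng.normal(size=n_full)])


def resample(data, size, rng):
    """Draw `size` rows with replacement to form one replication."""
    idx = rng.integers(0, data.shape[0], size=size)
    return data[idx].copy()


def impose_mar(data, target_rate, rng, slope=1.0):
    """Set column 1 to NaN with a probability that depends only on column 0.

    Because the probability depends on an observed covariate and not on the
    value being deleted, the mechanism is missing at random (MAR).
    """
    x_obs = data[:, 0]
    # Bisection on the intercept so the average missingness probability
    # matches the requested rate (hypothetical calibration choice).
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if (1.0 / (1.0 + np.exp(-(mid + slope * x_obs)))).mean() < target_rate:
            lo = mid
        else:
            hi = mid
    prob = 1.0 / (1.0 + np.exp(-((lo + hi) / 2.0 + slope * x_obs)))
    out = data.copy()
    out[rng.random(data.shape[0]) < prob, 1] = np.nan
    return out


replication = resample(full, n_sample, rng)       # one of the 500 replications
with_missing = impose_mar(replication, 0.25, rng)  # 25% missingness level
observed_rate = np.isnan(with_missing[:, 1]).mean()
```

In a full study, this pair of steps would be repeated for each replication and each missingness level before applying the missing data methods.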

Results

CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness.
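
To see why a single imputation tends to understate the SE, it can help to look at how multiple imputation results are commonly pooled. The sketch below assumes Rubin's rules, the standard combination method; the abstract does not state which pooling rule was used, and the function name `pool_rubin` and the numbers are hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m imputed-data estimates with Rubin's rules.

    estimates: coefficient estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled estimate
    within = sum(variances) / m                              # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between               # total variance
    return q_bar, math.sqrt(total)


# Hypothetical log hazard ratio estimates from m = 5 imputed datasets
estimates = [0.50, 0.60, 0.55, 0.45, 0.65]
variances = [0.01] * 5

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
single_imputation_se = math.sqrt(variances[0])  # ignores between-imputation variance
```

The pooled SE includes the between-imputation variance, which reflects uncertainty about the missing values. An analysis of one imputed dataset has only the within-imputation term, so its SE is smaller and confidence interval coverage can suffer.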

Conclusions

Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.
Metadata
Title
Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
Authors
Andrea Marshall
Douglas G Altman
Roger L Holder
Publication date
1 December 2010
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2010
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-10-112
