Published in: International Journal of Diabetes in Developing Countries 4/2016

30.04.2016 | Original Article

Impact of selected pre-processing techniques on prediction of risk of early readmission for diabetic patients in India

Written by: Reena Duggal, Suren Shukla, Sarika Chandra, Balvinder Shukla, Sunil Kumar Khatri

Abstract

Diabetes is associated with an increased risk of hospital readmission. Predicting the risk of readmission for diabetic patients can help in implementing appropriate plans to prevent these readmissions. However, real-world medical data are noisy, inconsistent, and incomplete, so before a prediction model is built the data must be pre-processed efficiently and made suitable for predictive modelling. The objective of this study is to assess the impact of selected pre-processing techniques on the prediction of risk of 30-day readmission among patients with diabetes in India. De-identified electronic medical records from a reputed hospital in the National Capital Region of India were used, covering diabetes patients ≥18 years old discharged from hospital between 2012 and 2015 (n = 9381). This paper focused on data pre-processing steps to improve readmission prediction outcomes. The impact of different pre-processing choices, including feature selection, missing value imputation, and data balancing, on the performance of logistic regression, Naïve Bayes, and decision tree classifiers was assessed on performance metrics such as area under curve (AUC), precision, recall, and accuracy. This comprehensive experimental study, the first conducted from an Indian healthcare perspective, offered empirical evidence that most of the proposed models with pre-processing significantly outperform the baseline methods (without any pre-processing) on the selected evaluation criteria. AUC increased markedly with oversampling, because the data are skewed on the class label Readmission. Recall gained the most, with its range rising from 0.02–0.23 to 0.78–0.85, and the AUC range rose from 0.56–0.68 to 0.83–0.86 when pre-processing was applied. Data pre-processing has a significant effect on the accuracy of hospital readmission prediction for patients with diabetes, with certain schemes proving inferior to competing approaches. In addition, the impact of pre-processing schemes varies by technique, which suggests that different best practices should be formulated to get the best results from each specific technique.
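To make the data balancing step and the recall metric concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which oversampling method was used, so plain random oversampling of the minority class is assumed here. The function names `random_oversample` and `precision_recall` and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the biggest one (assumed balancing method)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up data: 9 patients not readmitted (0) and 1 readmitted (1).
rows = [[i] for i in range(10)]
labels = [0] * 9 + [1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

On skewed data a classifier can reach high accuracy by rarely predicting the minority class, which shows up as low recall for readmissions. Balancing the training data is one common way to counter this. Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.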
Metadata
Title
Impact of selected pre-processing techniques on prediction of risk of early readmission for diabetic patients in India
Written by
Reena Duggal
Suren Shukla
Sarika Chandra
Balvinder Shukla
Sunil Kumar Khatri
Publication date
30.04.2016
Publisher
Springer India
Published in
International Journal of Diabetes in Developing Countries / Issue 4/2016
Print ISSN: 0973-3930
Electronic ISSN: 1998-3832
DOI
https://doi.org/10.1007/s13410-016-0495-4
