Published in: Journal of Medical Systems 1/2019

01.01.2019 | Systems-Level Quality Improvement

A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery

Written by: H. Benhar, A. Idri, J. L. Fernández-Alemán

Abstract

The increasing amount of data produced by biomedical and healthcare systems has created a need for knowledge data discovery (KDD) methodologies. Data mining (DM) offers a set of powerful techniques for identifying and extracting relevant information from medical datasets, from which doctors and patients can benefit greatly, particularly for diseases with high mortality and morbidity rates such as heart disease (HD). Nonetheless, raw medical data pose several challenges, such as missing data, noise, redundancy and high dimensionality, which make the extraction of useful and relevant information difficult. Intensive research has therefore recently begun on preparing raw healthcare data before knowledge extraction. In any KDD process, data preparation is the step prior to DM that deals with data imperfections in order to improve data quality, so as to satisfy the requirements and improve the performance of DM techniques. The objective of this paper is to perform a systematic mapping study (SMS) on data preparation for KDD in cardiology so as to provide an overview of the quantity and type of research carried out in this respect. The SMS covered a set of 58 selected papers published between January 2000 and December 2017. The selected studies were analyzed according to six criteria: year and channel of publication, preparation task, medical task, DM objective, research type and empirical type. The results show that a large amount of data preparation research was carried out in order to improve the performance of DM-based decision support systems in cardiology. Researchers were mainly interested in the data reduction preparation task, and particularly in feature selection. Moreover, the majority of the selected studies focused on classification for the diagnosis of HD. Two main research types were identified in the selected studies, solution proposal and evaluation research, and the most frequently used empirical type was historical-based evaluation.
Metadata
Title
A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery
Written by
H. Benhar
A. Idri
J. L. Fernández-Alemán
Publication date
01.01.2019
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 1/2019
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-018-1134-z