Published in: Journal of Medical Systems 5/2014

1 May 2014 | Transactional Processing Systems

A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases

Written by: Nihat Yilmaz, Onur Inan, Mustafa Serter Uzer


Abstract

The most important factors that prevent pattern recognition from functioning rapidly and effectively are noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for the diagnosis of heart disease and diabetes. In this method, a new modified K-means algorithm is used in a clustering-based data preparation system to eliminate noisy and inconsistent data, and Support Vector Machines are used for classification. The newly developed approach was tested on the diagnosis of heart disease and diabetes, which are prevalent in society and figure among the leading causes of death. The data sets used for diagnosis are the Statlog (Heart), SPECT images, and Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved classification success rates of 97.87 %, 98.18 %, and 96.71 %, respectively, on these data sets. These accuracies were obtained using 10-fold cross-validation. According to the results, the proposed method performs very well compared with other reported results and seems very promising for pattern recognition applications.
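The abstract does not spell out the modified K-means algorithm, so the following is only a minimal sketch of the general idea of clustering-based noise elimination, not the authors' method. It assumes plain K-means with a deterministic farthest-point initialization, and it assumes that a sample counts as noisy when its label differs from the majority label of its cluster. The function names (`init_centroids`, `kmeans`, `filter_noisy_samples`) and the toy data are hypothetical, and the classification step (SVM with 10-fold cross-validation) is left out.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=50):
    """Plain Lloyd-style K-means; returns a cluster index for each point."""
    centroids = init_centroids(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def filter_noisy_samples(points, labels, k=2):
    """Return indices of samples whose label matches their cluster's majority.

    Assumed noise criterion: disagreement with the cluster majority label.
    """
    assignment = kmeans(points, k)
    majority = {}
    for c in set(assignment):
        cluster_labels = [y for y, a in zip(labels, assignment) if a == c]
        majority[c] = Counter(cluster_labels).most_common(1)[0][0]
    return [
        i for i, (y, a) in enumerate(zip(labels, assignment))
        if y == majority[a]
    ]


# Toy data: two well-separated groups; sample 3 carries an inconsistent label.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.1, 5.1)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
kept = filter_noisy_samples(points, labels, k=2)
print(kept)  # [0, 1, 2, 4, 5, 6, 7]
```

In a full pipeline, the retained samples would then be passed to a classifier and evaluated with cross-validation. If accuracy is meant to reflect performance on unseen data, the filter would be fitted only on the training portion of each fold.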
Metadata
Title
A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases
Written by
Nihat Yilmaz
Onur Inan
Mustafa Serter Uzer
Publication date
1 May 2014
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 5/2014
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-014-0048-7
