Breast cancer diagnosis using GA feature selection and Rotation Forest

Aličković, Emina; Subasi, Abdulhamit

doi:10.1007/s00521-015-2103-9

Original Article
Published: 18 November 2015

Volume 28, pages 753–763, (2017)

Neural Computing and Applications

Emina Aličković¹ &
Abdulhamit Subasi²

Abstract

Breast cancer is one of the leading causes of death among women worldwide, and accurate diagnosis is one of the most important steps in its treatment. Data mining techniques can support doctors in the diagnostic decision-making process. In this paper, we present different data mining techniques for the diagnosis of breast cancer. Two different Wisconsin Breast Cancer datasets were used to evaluate the proposed system, which has two stages. In the first stage, genetic algorithms (GA) are used to eliminate insignificant features and retain the informative and significant ones. This reduces computational complexity and speeds up the data mining process. In the second stage, several data mining techniques are employed to classify subjects into two categories: with or without breast cancer. Different individual and multiple classifier systems were used in this stage in order to construct an accurate system for breast cancer classification. The performance of the methods is evaluated using classification accuracy, area under the receiver operating characteristic curve, and F-measure. The Rotation Forest model with 14 GA-selected features achieves the highest classification accuracy (99.48%), and compared with previous work, the proposed approach shows improved performance. The results obtained in this study have the potential to open new opportunities in the diagnosis of breast cancer.
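
To make the two-stage idea in the abstract concrete, the following is a minimal sketch of GA-based feature selection wrapped around a classifier. It is not the authors' implementation: the toy data generator, the nearest-centroid classifier (a simple stand-in for Rotation Forest), the hold-out split, and all names and parameter values (`make_toy_data`, `ga_select`, `pop_size`, `p_mut`, and so on) are assumptions chosen only to keep the example self-contained.

```python
import random

def make_toy_data(n=100, n_features=8, n_informative=2, seed=0):
    """Synthetic two-class data (assumption: NOT the Wisconsin datasets).
    Only the first n_informative features depend on the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = (i // 2) % 2  # labels 0,0,1,1,... so both splits see both classes
        row = [rng.gauss(3.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def nearest_centroid_accuracy(X, y, mask):
    """Fitness: hold-out accuracy of a nearest-centroid classifier that
    uses only the features where mask is True (stand-in classifier)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    data = list(zip(X, y))
    train, test = data[0::2], data[1::2]  # even rows train, odd rows test
    centroids = {}
    for label in set(y):
        rows = [x for x, t in train if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in test:
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += int(pred == t)
    return correct / len(test)

def ga_select(X, y, n_features, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)

    def fitness(mask):
        return nearest_centroid_accuracy(X, y, mask)

    # Each chromosome is one boolean gene per feature.
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(ranked, 2), key=fitness)
            b = max(rng.sample(ranked, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    X, y = make_toy_data()
    mask, acc = ga_select(X, y, n_features=8)
    print("selected feature indices:", [j for j, keep in enumerate(mask) if keep])
    print("hold-out accuracy:", acc)
```

In a setting closer to the one the abstract describes, the fitness function would presumably score each candidate feature subset with the second-stage classifier under cross-validation rather than a single hold-out split; the GA loop itself would stay the same.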

Author information

Authors and Affiliations

Faculty of Engineering and Information Technologies, International Burch University, Francuske Revolucije bb. Ilidza, 71000, Sarajevo, Bosnia and Herzegovina
Emina Aličković
Computer Science Department, College of Engineering, Effat University, Jeddah, 21478, Saudi Arabia
Abdulhamit Subasi

Corresponding author

Correspondence to Abdulhamit Subasi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Aličković, E., Subasi, A. Breast cancer diagnosis using GA feature selection and Rotation Forest. Neural Comput & Applic 28, 753–763 (2017). https://doi.org/10.1007/s00521-015-2103-9

Received: 24 February 2015
Accepted: 04 November 2015
Published: 18 November 2015
Issue Date: April 2017
DOI: https://doi.org/10.1007/s00521-015-2103-9
