Breast cancer diagnosis using GA feature selection and Rotation Forest

  • Original Article
  • Neural Computing and Applications

Abstract

Breast cancer is one of the primary causes of death among women worldwide, and accurate diagnosis is one of the most significant steps in breast cancer treatment. Data mining techniques can support doctors in the diagnostic decision-making process. In this paper, we present different data mining techniques for the diagnosis of breast cancer. Two different Wisconsin Breast Cancer datasets have been used to evaluate the system proposed in this study. The proposed system has two stages. In the first stage, genetic algorithms (GA) are used to select informative and significant features and to eliminate insignificant ones. This process reduces the computational complexity and speeds up the data mining process. In the second stage, several data mining techniques are employed to classify subjects into two categories: with or without breast cancer. Different individual and multiple classifier systems were used in the second stage in order to construct an accurate system for breast cancer classification. The performance of the methods is evaluated using classification accuracy, area under the receiver operating characteristic curve and F-measure. The Rotation Forest model with 14 GA-selected features shows the highest classification accuracy (99.48 %), and compared with previous work, the proposed approach improves performance. The results obtained in this study have the potential to open new opportunities in the diagnosis of breast cancer.
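The two-stage idea in the abstract (GA feature selection followed by a classifier, scored by accuracy and F-measure) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic rather than the Wisconsin datasets, a nearest-centroid classifier stands in for Rotation Forest, and the names `make_data`, `centroid_accuracy`, `ga_select` and `f_measure`, as well as all GA settings (population size, mutation rate, size penalty), are hypothetical.

```python
import random

def make_data(n=200, n_features=10, n_informative=3, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Accuracy of a nearest-centroid classifier using only features where mask is 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(idx, cents[c])) for c in cents}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_te)

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15,
              p_mut=0.05, penalty=0.01, seed=1):
    """Evolve a binary feature mask; fitness = validation accuracy minus a size penalty."""
    rng = random.Random(seed)
    n = len(X_tr[0])

    def fitness(mask):
        return centroid_accuracy(X_tr, y_tr, X_va, y_va, mask) - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: the best mask always survives
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best

def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1): harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Stage 1: select features on a train/validation split. Stage 2: score the classifier.
X, y = make_data()
X_tr, y_tr, X_va, y_va = X[:120], y[:120], X[120:], y[120:]
mask = ga_select(X_tr, y_tr, X_va, y_va)
acc = centroid_accuracy(X_tr, y_tr, X_va, y_va, mask)
print("selected features:", [j for j, b in enumerate(mask) if b], "accuracy:", acc)
```

The size penalty in the fitness function is one simple way to push the search toward smaller feature subsets. In a real study the selected subset should be scored on data that played no part in the selection, for example with cross-validation, because accuracy measured on the same validation split that guided the GA is optimistic.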

Author information

Corresponding author

Correspondence to Abdulhamit Subasi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Aličković, E., Subasi, A. Breast cancer diagnosis using GA feature selection and Rotation Forest. Neural Comput & Applic 28, 753–763 (2017). https://doi.org/10.1007/s00521-015-2103-9

  • DOI: https://doi.org/10.1007/s00521-015-2103-9
