Acute leukemia classification by ensemble particle swarm model selection
Introduction
According to the Leukemia and Lymphoma Society, “leukemia is a malignant disease (cancer) of the bone marrow and blood characterized by an uncontrolled accumulation of blood cells [1]”. Leukemia is divided into myelogenous and lymphocytic types, and both types can be acute or chronic (chronic leukemia progresses slowly compared with acute leukemia). Highlights from the Leukemia and Lymphoma Society Facts 2010–2011 include the following [1]:
- It is estimated that 259,889 people in the USA are living with, or are in remission from, leukemia.
- An estimated 43,050 new cases of leukemia will be diagnosed in the USA during 2011.
- In 2010, leukemia was expected to affect more than 10 times as many adults (39,733) as children (3317, aged 0–14 years).
- The most common type of childhood leukemia (0–19 years old) is acute lymphocytic leukemia (ALL).
- In 2007, the most recent year for which data are available, 74% of new ALL cases occurred among children (approximately 2859 cases, aged 0–19 years).
- In 2010, approximately 21,840 deaths (12,660 males and 9180 females) were expected to be attributable to leukemia in the USA, including 8950 attributable to acute myelogenous leukemia (AML), 4390 to chronic lymphocytic leukemia (CLL), 1420 to ALL, and 440 to chronic myeloid leukemia (CML).
Although these figures relate to the USA, comparable proportions can be expected in other countries. Thus, medical and technological advances in the understanding of leukemia can have a broad impact worldwide. In particular, acute leukemia is a deadly disease, and the morphological identification of leukocytes, which is the focus of this study, is a fundamental task in its detection. Acute leukemia may be either ALL or AML, with the following subtypes according to the French–American–British (FAB) classification [2]: L1, L2, and L3 in the lymphocytic family; and M0, M1, M2, M3, M4, M5, M6, and M7 in the myelogenous family.
The morphological identification of acute leukemia is mainly performed by chemists and hematologists. The process starts when a bone marrow sample is taken from the patient's spine and prepared as a smear with Wright's staining method, which makes the white blood cells more visible during analysis. Depending on the economic resources of the hospital (because the equipment is very expensive), a flow cytometry test may be conducted so that the specific leukemia type and subtype can be identified. After an accurate diagnosis, the appropriate treatment can be given to the patient. Flow cytometry is usually considered reliable enough to make morphological analysis unnecessary, but its high cost can mean that the morphological test is used instead. Indeed, most hospitals in developing countries do not have flow cytometers, so morphological analysis is still required. During the morphological analysis, chemists and hematologists study the type and maturity level of the leukocytes in the bone marrow sample and use their knowledge of leukocyte morphology to identify the type and subtype of acute leukemia. Identification by experts is reliable, but automated tools would be useful to support experts and reduce costs for health institutions.
This paper describes an automated approach for morphological acute leukemia classification from images based on machine vision and machine learning techniques. The proposed method consists of three main phases: cell segmentation, feature extraction, and classification. In previous studies, we focused on segmentation and feature extraction [3], [4], whereas in this paper we concentrate on the classification stage. More specifically, we focus on the problem of selecting the best classification model (formed by data preprocessing, feature selection, and classification methods) to maximize classification accuracy.
In previous studies, physicians have reported errors of up to 40% when classifying acute leukemia subtypes.1 In our own previous studies, we obtained accuracies close to 90% on average by manually combining preprocessing, feature selection, and classification methods and tuning their parameters to increase accuracy [4]. Although the manual selection of classification models gave satisfactory results, it was extremely difficult and time-consuming. Thus, automatic methods for effectively selecting classifiers are required. Increasing classification accuracy can translate into improved diagnoses by hematologists, which in turn may lead to better treatment and increased life expectancy for patients.
We aimed to develop more powerful tools to improve on the classification accuracy of our previous studies [3], [4] and to provide more reliable tools for medical diagnosis. Previously, we mainly focused on segmentation algorithms and on the extraction of descriptive features of acute leukemia cells. However, we did not find a combination of algorithms, parameters, and sets of descriptive features that consistently yielded outstanding results. To achieve our goal, we propose the use of ensemble particle swarm model selection (EPSMS) for the automatic selection of accurate classifiers for the morphological identification of acute leukemia.
EPSMS is a generic tool that explores the search space of candidate classifiers to automatically build ensemble classifiers [5]. Its main benefit is that it can obtain very effective classification models without user intervention. EPSMS selects ensembles instead of single models [6], [7], which makes it more robust to noisy data and yields more stable predictions. A distinctive feature of EPSMS is that the ensemble classifier is formed of heterogeneous full models, where a full model is composed of methods for preprocessing, feature selection, and classification. We use EPSMS for acute leukemia type/subtype classification in a one-vs-all scheme, selecting an ad hoc classifier for each binary type/subtype problem. This is advantageous because different classification problems may require different classification models. Our results compared favorably with those reported in previous studies [3], [4], where classification models were constructed manually by machine learning experts over long development periods. Furthermore, the models selected by EPSMS can provide insights into the distinctive features of the acute leukemia type/subtype classification task. The improvements in performance were significant, and they may motivate further research on the application of EPSMS to other medical tasks.
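To make the notions of a heterogeneous full model and a one-vs-all scheme with ad hoc binary models more concrete, here is a minimal Python sketch. It assumes that each binary full model returns a real-valued decision score and that the predicted subtype is the one whose model scores highest. The class `FullModel`, the helpers `identity`, `standardize`, `centroid_score`, and `one_vs_all_predict`, and all numbers are hypothetical illustrations, not the models EPSMS actually selected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class FullModel:
    """Hypothetical full model: preprocessing -> feature selection -> binary scorer."""
    preprocess: Callable[[Vector], Vector]
    selected: Sequence[int]            # indices of the retained features
    score: Callable[[Vector], float]   # larger value = more likely the positive class

    def decision(self, x: Vector) -> float:
        z = self.preprocess(x)
        return self.score([z[i] for i in self.selected])


def identity(x: Vector) -> Vector:
    return list(x)


def standardize(mean: Vector, std: Vector) -> Callable[[Vector], Vector]:
    # z-score each feature using (assumed) training-set statistics
    return lambda x: [(v - m) / s for v, m, s in zip(x, mean, std)]


def centroid_score(pos: Vector, neg: Vector) -> Callable[[Vector], float]:
    # Positive when x is closer to the positive-class centroid than to the negative one.
    def f(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos
    return f


def one_vs_all_predict(models: Dict[str, FullModel], x: Vector) -> str:
    # Each subtype has its own (possibly different) full model; the highest decision value wins.
    return max(models, key=lambda label: models[label].decision(x))


# Toy example with two features and three subtypes (all values invented).
models = {
    "L1": FullModel(identity, [0], centroid_score([1.0], [5.0])),
    "L2": FullModel(identity, [1], centroid_score([1.0], [5.0])),
    "M3": FullModel(standardize([0.0, 0.0], [2.0, 2.0]), [0, 1],
                    centroid_score([3.0, 3.0], [0.0, 0.0])),
}
print(one_vs_all_predict(models, [1.0, 6.0]))  # L1
```

Because each subtype owns its binary model, the three entries may differ in preprocessing, selected features, and classifier, which is the kind of per-problem flexibility the one-vs-all design is meant to exploit.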
As mentioned earlier, we adopted a morphological approach because it is inexpensive and available in many hospitals in developing countries. We are aware of more precise options for addressing the problem, such as flow cytometry or microarray gene expression analysis. Unfortunately, these techniques are not accessible to most people living in poor countries.2 As part of our future work, we will compare the performance of our technique with these approaches when data become available (as the cost of the tests decreases) and when these methods can be offered to more people. Meanwhile, we consider the proposed approach a practical alternative to expensive techniques, a view supported by experimental results showing that it achieves performance very similar to that of alternative, more expensive methods.
The rest of this paper is organized as follows. Section 2 reviews related work on acute leukemia classification and ensemble member selection. Section 3 describes particle swarm model selection (PSMS), the method on which EPSMS is based. Section 4 introduces EPSMS, and Section 5 describes how EPSMS was used for acute leukemia classification. Section 6 reports our experimental results for acute leukemia classification. Finally, Section 7 summarizes our main findings and outlines future work.
Related work
This section reviews related work on acute leukemia classification and classification using ensemble methods.
Particle swarm model selection
PSMS is the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) [6]. Given a pool of methods for data preprocessing, feature selection, and pattern classification, and a data set associated with a classification task, FMS is the task of selecting the best combination of methods such that an estimate of generalization performance is maximized for the classification task. In addition, the hyper-parameters must be optimized for each of the selected
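The following Python sketch illustrates PSMS-style search under simplifying assumptions. A particle's continuous position is decoded into a preprocessing method, a classifier, and one hyper-parameter, and a standard inertia-weight PSO update moves the swarm toward each particle's own best and the swarm's best positions. The method pools, `decode`, `toy_fitness`, `pso`, and the parameter values (`w`, `c1`, `c2`, `m`, `t_max`) are hypothetical. Feature selection is omitted for brevity, and in PSMS the fitness would be an estimate of generalization performance (for example, cross-validated accuracy) rather than the invented table used here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical method pools; a real FMS pool would also include feature selection methods.
PREPROCESSING = ["none", "standardize", "normalize"]
CLASSIFIERS = ["knn", "svm", "naive_bayes"]
LOWS = [0.0, 0.0, -3.0]   # bounds: preprocessing index, classifier index, log10(hyper-parameter)
HIGHS = [3.0, 3.0, 3.0]


def decode(pos: Sequence[float]) -> Tuple[str, str, float]:
    """Map a continuous position to (preprocessing, classifier, hyper-parameter)."""
    p = PREPROCESSING[min(int(pos[0]), len(PREPROCESSING) - 1)]
    c = CLASSIFIERS[min(int(pos[1]), len(CLASSIFIERS) - 1)]
    return p, c, 10 ** pos[2]


def toy_fitness(pos: Sequence[float]) -> float:
    # Stand-in for a cross-validated accuracy estimate; the numbers are invented.
    p, c, hp = decode(pos)
    base = {"knn": 0.80, "svm": 0.85, "naive_bayes": 0.75}[c]
    bonus = 0.05 if p == "standardize" else 0.0
    return base + bonus - 0.01 * abs(math.log10(hp) - 1.0)


def pso(fitness: Callable[[Sequence[float]], float], lows: Sequence[float],
        highs: Sequence[float], m: int = 10, t_max: int = 20, w: float = 0.7,
        c1: float = 1.5, c2: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Inertia-weight PSO that maximizes `fitness`; evaluates (t_max + 1) * m positions."""
    rng = random.Random(seed)
    dim = len(lows)
    x = [[rng.uniform(lows[d], highs[d]) for d in range(dim)] for _ in range(m)]
    v = [[0.0] * dim for _ in range(m)]
    pbest = [list(p) for p in x]
    pbest_f = [fitness(p) for p in x]           # evaluate the initial swarm
    g = max(range(m), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(t_max):
        for i in range(m):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])   # pull toward own best
                           + c2 * r2 * (gbest[d] - x[i][d]))     # pull toward swarm best
                x[i][d] = min(max(x[i][d] + v[i][d], lows[d]), highs[d])  # stay in bounds
            f = fitness(x[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = list(x[i]), f
                if f > gbest_f:
                    gbest, gbest_f = list(x[i]), f
    return gbest, gbest_f


best, best_f = pso(toy_fitness, LOWS, HIGHS)
print(decode(best), round(best_f, 3))
```

Encoding discrete choices as continuous coordinates and flooring them is one simple way to let an off-the-shelf PSO update operate on full models; each call to `fitness` corresponds to training and evaluating one candidate full model.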
Ensemble particle swarm model selection
EPSMS is an extension of PSMS whose goal is to build ensemble classifiers from PSMS's partial solutions [5]. The intuition behind EPSMS is that a combination of candidate solutions can result in ensemble classifiers that outperform individual models. EPSMS is motivated by the large number of solutions evaluated during PSMS's search process (a total of (t_max + 1) × m, for a swarm of m particles run for t_max iterations), most of which achieve better-than-random performance after a few iterations. EPSMS is also
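A minimal Python sketch of how an ensemble could be assembled from candidates that a PSMS run has already evaluated. The selection rule used here (after each iteration, add the best-so-far model to the ensemble, then combine members by majority vote) is an assumption for illustration and is not necessarily the strategy used in EPSMS; the names `evaluated_count`, `best_per_iteration`, `majority_vote`, and the toy models are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Model = Callable[[Sequence[float]], Hashable]   # a trained classifier: features -> label
Candidate = Tuple[Model, float]                 # (model, estimated generalization score)


def evaluated_count(t_max: int, m: int) -> int:
    # The initial swarm plus t_max iterations, each evaluating m particles.
    return (t_max + 1) * m


def best_per_iteration(history: List[List[Candidate]]) -> List[Model]:
    """Assumed rule: after each iteration, add the best-so-far model to the ensemble."""
    members: List[Model] = []
    best: Optional[Candidate] = None
    for swarm in history:                      # history[0] is the initial swarm
        top = max(swarm, key=lambda cand: cand[1])
        if best is None or top[1] > best[1]:
            best = top
        members.append(best[0])                # a model that stays best gets repeated votes
    return members


def majority_vote(members: List[Model], x: Sequence[float]) -> Hashable:
    votes = Counter(model(x) for model in members)
    return votes.most_common(1)[0][0]          # ties go to the label counted first


# Toy history with m = 2 particles over t_max = 2 iterations (models and scores invented).
def constant(label: str) -> Model:
    return lambda x: label


def threshold(x: Sequence[float]) -> str:
    return "AML" if x[0] > 0.5 else "ALL"


always_all = constant("ALL")
history = [
    [(always_all, 0.60), (constant("AML"), 0.55)],
    [(threshold, 0.80), (constant("ALL"), 0.62)],
    [(constant("AML"), 0.70), (constant("ALL"), 0.65)],
]
members = best_per_iteration(history)
print(majority_vote(members, [0.9]), majority_vote(members, [0.1]))  # AML ALL
```

Since every candidate in `history` has already been trained and scored during the search, building the ensemble adds little cost beyond storing the models, which is the practical appeal of reusing PSMS's partial solutions.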
EPSMS for acute leukemia classification
This section describes our proposed approach for acute leukemia classification using EPSMS. We first describe the scenario. Next, we describe the methods used for cell segmentation and feature extraction. Finally, we describe how EPSMS was used to select competitive classification models.
Experiments and results
This section describes the experimental results of applying EPSMS to leukemia subtype classification. The next subsection describes the experimental methodology we used, and Section 6.2 describes the results for leukemia subtype classification using EPSMS.
Conclusions
We proposed the application of EPSMS to the problem of acute leukemia classification. The classification of acute leukemia types/subtypes is an important task because it ensures patients receive appropriate treatments. Very effective methods and tests are available for this task, but they are complex and very expensive. Thus, these methods are not available in most developing countries. The morphological classification of acute leukemia, where bone marrow cell images are analyzed, is an
Acknowledgements
The first author was supported by PROMEP under grant 103.5/11/4330 and under the UANL-PAICYT program 2010. The authors are grateful to Dr. Rúben Lobato and Dr. José E. Alonso from the Department of Hematology, Mexican Social Security Institute, Puebla, Mexico, for their help in the collection and annotation of samples. A. Rosales and C. Reta thank CONACyT for scholarship nos. 335690 and 212409, respectively. The authors are grateful to the reviewers and editors for their comments, which have
References (49)
- et al. An approach to the automatic design of multiple classifier systems. Pattern Recognition Letters (2001).
- et al. Classifier selection for majority voting. Information Fusion (2005).
- et al. An energy-based model for image annotation and retrieval. Computer Vision and Image Understanding (2011).
- et al. Analysis of the IJCNN 2007 competition agnostic learning vs. prior knowledge. Neural Networks (2008).
- The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition (1997).
- The Leukemia and Lymphoma Society (LLS). Leukemia, lymphoma, myeloma facts 2010–2011; 2011. Online, Available at...
- et al. Proposals for the classification of the acute leukaemias. French–American–British (FAB) cooperative group. British Journal of Haematology (1976).
- et al. Leukemia identification from bone marrow cells images using a machine vision and data mining strategy. Intelligent Data Analysis (2011).
- et al. Segmentation of bone marrow cell images for morphological classification of acute leukemia.
- et al. Ensemble particle swarm model selection.
- Particle swarm model selection. Journal of Machine Learning Research.
- Evolutionary model type selection for global surrogate modeling. Journal of Machine Learning Research.
- A comparative study of feature selection methods for probabilistic neural networks in cancer classification.
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science.
- Class prediction and discovery using gene expression data.
- Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics.
- Artificial neural networks approaches for multidimensional classification of acute lymphoblastic leukemia gene expression samples. Transactions on Information Science and Applications, World Scientific and Engineering Academy and Society.
- Classification of leukemia blood samples using neural networks. Annals of Biomedical Engineering.
- Ensemble methods in machine learning.
- Some fundamental issues in ensemble methods.
- Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning.
- On diversity and accuracy of homogeneous and heterogeneous ensembles. International Journal of Hybrid Intelligent Systems.
- Experiments with a new boosting algorithm.
- Bagging prediction. Machine Learning.