Acute leukemia classification by ensemble particle swarm model selection

doi:10.1016/j.artmed.2012.03.005

Artificial Intelligence in Medicine

Volume 55, Issue 3, July 2012, Pages 163-175

https://doi.org/10.1016/j.artmed.2012.03.005

Abstract

Objective

Acute leukemia is a malignant disease that affects a large proportion of the world population. Different types and subtypes of acute leukemia require different treatments. In order to assign the correct treatment, a physician must identify the leukemia type or subtype. Advanced and precise methods are available for identifying leukemia types, but they are very expensive and not available in most hospitals in developing countries. Thus, alternative methods have been proposed. An option explored in this paper is based on the morphological properties of bone marrow images, where features are extracted from medical images and standard machine learning techniques are used to build leukemia type classifiers.

Methods and materials

This paper studies the use of ensemble particle swarm model selection (EPSMS), which is an automated tool for the selection of classification models, in the context of acute leukemia classification. EPSMS is the application of particle swarm optimization to the exploration of the search space of ensembles that can be formed by heterogeneous classification models in a machine learning toolbox. EPSMS does not require prior domain knowledge and it is able to select highly accurate classification models without user intervention. Furthermore, specific models can be used for different classification tasks.

Results

We report experimental results for acute leukemia classification with real data and show that EPSMS outperformed the best results obtained using manually designed classifiers with the same data. The highest performance using EPSMS was 97.68% for two-type classification problems and 94.21% for problems with more than two types. To the best of our knowledge, these are the best results reported for this data set. Compared with previous studies, these improvements were consistent across different type/subtype classification tasks, different features extracted from images, and different feature extraction regions. The performance improvements were statistically significant. We improved previous results by an average of 6%, and by more than 20% with some settings. In addition to the performance improvements, we demonstrated that no manual effort was required during acute leukemia type/subtype classification.

Conclusions

Morphological classification of acute leukemia using EPSMS provides an alternative to expensive diagnostic methods in developing countries. EPSMS is a highly effective method for the automated construction of ensemble classifiers for acute leukemia classification, which requires no significant user intervention. EPSMS could also be used to address other medical classification tasks.

Introduction

According to the Leukemia and Lymphoma Society, “leukemia is a malignant disease (cancer) of the bone marrow and blood characterized by an uncontrolled accumulation of blood cells” [1]. Leukemia is divided into myelogenous and lymphocytic types, each of which can be acute or chronic (chronic leukemia progresses slowly compared with acute leukemia). A few highlights taken from the Leukemia and Lymphoma Society facts 2010–2011 are as follows [1]:

• It is estimated that 259,889 people in the USA are living with, or are in remission from, leukemia.
• An estimated 43,050 new cases of leukemia will be diagnosed in the USA during 2011.
• In 2010, leukemia was expected to affect more than 10 times as many adults (39,733) as children (3317, aged 0–14 years).
• The most common type of childhood leukemia (0–19 years old) is acute lymphocytic leukemia (ALL).
• In 2007, the most recent year for which data are available, 74% of new ALL cases occurred among children (approximately 2859 cases, aged 0–19 years).
• In 2010, it was anticipated that approximately 21,840 deaths (12,660 males and 9180 females) would be attributable to leukemia in the USA, i.e., 8950 attributable to acute myelogenous leukemia (AML), 4390 to chronic lymphocytic leukemia (CLL), 1420 to ALL, and 440 to chronic myeloid leukemia (CML).

Although these figures relate to the USA, proportional statistics are expected for other countries. Thus, it is clear that medical and technological advances in the understanding of leukemia will have a broad impact on the entire world population. In particular, acute leukemia is a deadly disease and the morphological identification of leukocytes is a fundamental task in its detection (the focus of this study). Acute leukemia may be either ALL or AML, with the following acute leukemia subtypes according to the French–American–British classification [2]: L1, L2, and L3 in the lymphocytic family; and M0, M1, M2, M3, M4, M5, M6, and M7 in the myelogenous family.

The morphological identification of acute leukemia is mainly performed by chemists and hematologists. The process starts when a bone marrow sample is taken from the patient's spine and prepared as a smear with Wright's staining method, which makes the white blood cells more visible during analysis. If the hospital can afford the equipment, which is very expensive, a flow cytometry test is then conducted to identify the specific leukemia type and subtype. After an accurate diagnosis, the appropriate treatment can be given to the patient. The flow cytometry test is usually considered reliable enough to make morphological analysis unnecessary, but its high cost means that the morphological test is often used instead. Indeed, most hospitals in developing countries do not have flow cytometers, so morphological analysis is still required. During the morphological analysis, chemists and hematologists study the type and maturity level of leukocytes in the bone marrow sample and use their knowledge of leukocyte morphology to identify the type and subtype of acute leukemia. Identification by experts is reliable, but automated tools would be useful to support experts and reduce costs for health institutions.

This paper describes an automated approach for morphological acute leukemia classification from images based on machine vision and machine learning techniques. The proposed method consists of three main phases: cell segmentation, feature extraction, and classification. In a previous study, we focused on segmentation and feature extraction [3], [4], whereas we concentrate on the classification stage in this paper. More specifically, we focus on the problem of selecting the best classification model (formed by data preprocessing, feature selection, and classification methods) to provide maximum classification accuracy.

In previous studies, physicians have reported errors of up to 40% when classifying acute leukemia subtypes.¹ We have obtained accuracies close to 90% (on average) in our own previous studies by manually combining methods of preprocessing, feature selection, and classification, while trying to identify appropriate parameters to increase accuracy [4]. Although we obtained satisfactory results via the manual selection of classification models, this method was extremely difficult and time-consuming. Thus, automatic methods for effectively selecting classifiers are required. Clearly, increasing the classification accuracy translates into improved diagnoses by the hematologist, which in turn means that patients will receive better treatments and have an increased life expectancy.

We aimed to develop more powerful tools to improve the classification accuracy of our previous study [3], [4] and provide more reliable tools for medical diagnosis. Previously, we focused mainly on segmentation algorithms. We also worked on the extraction of descriptive features/characteristics for acute leukemia cells. However, we did not find a combination of algorithms, parameters, and sets of descriptive characteristics that guaranteed outstanding results. In order to achieve our goal, we propose the use of ensemble particle swarm model selection (EPSMS) for the automatic selection of accurate classifiers for the morphological identification of acute leukemia.

EPSMS is a generic tool that explores the search space of candidate classifiers to automatically build ensemble classifiers [5]. The main benefit of EPSMS is that it can obtain very effective classification models without user intervention. EPSMS selects ensembles instead of single models [6], [7], so it is more robust to noisy data and it provides more stable predictions. A distinctive feature of EPSMS is that the ensemble classifier is formed of heterogeneous full models, where a full model is composed of methods for preprocessing, feature selection, and classification. We used EPSMS for type/subtype acute leukemia classification in a one-vs-all classification scheme, where we selected ad hoc classifiers for each binary type/subtype problem. This was advantageous because different classification problems may require different classification models. Our results compared favorably with those reported in a previous study [3], [4], where classification models were constructed manually after machine learning experts spent long periods of time in development. Furthermore, the models selected using EPSMS can provide insights into distinctive features of the acute leukemia type/subtype classification task. The improvements in performance were significant and they may motivate further research on the application of EPSMS to other medical tasks.
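The one-vs-all scheme mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `centroid_selector` stand-in (which replaces the automated model selection step with a simple nearest-centroid scorer), and the two-dimensional toy features are all assumptions made for illustration. The point is only that each binary type/subtype problem receives its own separately selected model.

```python
def one_vs_all_fit(samples, labels, select_model):
    """For every class, build a binary task (class vs. rest) and select a model for it."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [1 if y == cls else 0 for y in labels]
        # A separate model is selected for each binary problem.
        models[cls] = select_model(samples, binary)
    return models


def one_vs_all_predict(models, x):
    """Return the class whose binary model gives the highest score for x."""
    return max(models, key=lambda cls: models[cls](x))


def centroid_selector(samples, binary):
    """Hypothetical stand-in for automated model selection.

    Returns a scorer: the negative squared distance to the positive-class centroid.
    """
    positives = [s for s, b in zip(samples, binary) if b == 1]
    dim = len(positives[0])
    centroid = [sum(p[d] for p in positives) / len(positives) for d in range(dim)]

    def score(x):
        return -sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))

    return score


# Fabricated toy features; the labels only borrow FAB subtype names for readability.
samples = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (2.0, 0.0), (2.1, 0.1)]
labels = ["L1", "L1", "L2", "L2", "M2", "M2"]
models = one_vs_all_fit(samples, labels, centroid_selector)
prediction = one_vs_all_predict(models, (1.9, 0.0))
```

Passing the selector as a callback keeps the decomposition independent of how each binary model is chosen, so a search-based selector could be substituted without changing the rest of the sketch.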

As mentioned earlier, we adopted a morphological approach because this is an inexpensive method that is available in many hospitals in developing countries. We are aware of other more precise options for addressing the problem, such as flow cytometry or microarray gene expression analysis techniques. Unfortunately, these techniques are not accessible to most people living in poor countries.² As part of our future work, we will compare the performance of our technique with other approaches when data become available (as the cost of tests decreases) and when the method can be offered to more people. Meanwhile, we think the proposed approach is a practical alternative to expensive techniques. This statement is supported by experimental results that show the proposed approach achieves very similar performance to that obtained with alternative and more expensive methods.

The rest of this paper is organized as follows. The next section reviews related work on acute leukemia classification and ensemble member selection. Section 3 describes particle swarm model selection, which is the method EPSMS is based on. Section 4 introduces EPSMS and Section 5 describes how EPSMS was used for acute leukemia classification. Section 6 reports our experimental results for acute leukemia classification. Finally, Section 7 summarizes our main findings and outlines areas for future work.

Section snippets

Related work

This section reviews related work on acute leukemia classification and classification using ensemble methods.

Particle swarm model selection

PSMS is the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) [6]. Given a pool of methods for data preprocessing, feature selection, and pattern classification, and a data set associated with a classification task, FMS is the task of selecting the best combination of methods such that an estimate of generalization performance is maximized for the classification task. In addition, the hyper-parameters must be optimized for each of the selected
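The idea of applying PSO to full model selection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PSMS implementation: the method pools, the `decode` mapping from a real-valued position to a full model, the placeholder `toy_score` (standing in for an estimate of generalization performance such as cross-validation), and the inertia-weight PSO parameters `w`, `c1`, `c2` are all hypothetical choices.

```python
import random

# Hypothetical method pools; the actual toolbox is not listed in this excerpt.
PREPROCESSORS = ["none", "standardize", "normalize"]
FEATURE_SELECTORS = ["none", "filter", "wrapper"]
CLASSIFIERS = ["knn", "tree", "naive_bayes"]


def decode(position):
    """Map a position in [0, 1]^4 to a full model (three methods plus one hyper-parameter)."""
    def pick(options, x):
        return options[min(int(x * len(options)), len(options) - 1)]
    return {
        "preprocessing": pick(PREPROCESSORS, position[0]),
        "feature_selection": pick(FEATURE_SELECTORS, position[1]),
        "classifier": pick(CLASSIFIERS, position[2]),
        "hyperparameter": position[3],
    }


def toy_score(model):
    """Placeholder for a generalization estimate; NOT the paper's fitness function."""
    score = 0.5
    score += 0.1 * (model["preprocessing"] == "standardize")
    score += 0.1 * (model["feature_selection"] == "filter")
    score += 0.1 * (model["classifier"] == "knn")
    score += 0.2 * (1.0 - abs(model["hyperparameter"] - 0.3))
    return score


def pso_model_selection(score_fn, m=10, t_max=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search the full-model space with a swarm of m particles for t_max iterations."""
    rng = random.Random(seed)
    dim = 4
    positions = [[rng.random() for _ in range(dim)] for _ in range(m)]
    velocities = [[0.0] * dim for _ in range(m)]
    personal_best = [p[:] for p in positions]
    personal_score = [score_fn(decode(p)) for p in positions]
    g = max(range(m), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(t_max):
        for i in range(m):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends inertia, attraction to the particle's own best,
                # and attraction to the best position found by the swarm.
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Clamp the new coordinate to the unit interval.
                positions[i][d] = min(1.0, max(0.0, positions[i][d] + velocities[i][d]))
            s = score_fn(decode(positions[i]))
            if s > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], s
                if s > global_score:
                    global_best, global_score = positions[i][:], s
    return decode(global_best), global_score


best_model, best_score = pso_model_selection(toy_score)
```

In a real setting, `toy_score` would be replaced by training the decoded model and estimating its performance on held-out data, which is the expensive step of the search.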

Ensemble particle swarm model selection

EPSMS is an extension of PSMS, which has the goal of building ensemble classifiers from PSMS's partial solutions [5]. The intuition behind EPSMS is that a combination of candidate solutions can result in ensemble classifiers that are capable of outperforming individual models. EPSMS is motivated by the large number of solutions (i.e., a total of (t_max + 1) × m) that are evaluated via PSMS's search process, most of which achieve performance better than random after a few iterations. EPSMS is also
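The intuition of reusing solutions already evaluated during the search can be sketched in Python as follows. The excerpt does not state how ensemble members are chosen or how their outputs are fused, so the top-k selection and the majority vote below are assumptions, as are the function names and the toy threshold classifiers.

```python
from collections import Counter


def count_evaluated(t_max, m):
    """Solutions evaluated by a swarm of m particles: the initial swarm plus t_max iterations."""
    return (t_max + 1) * m


def build_ensemble(evaluated, k):
    """Keep the k best-scoring candidates; `evaluated` holds (score, predict_fn) pairs."""
    ranked = sorted(evaluated, key=lambda pair: pair[0], reverse=True)
    return [predict for _, predict in ranked[:k]]


def ensemble_predict(members, x):
    """Majority vote over the member predictions (assumed fusion rule)."""
    votes = Counter(predict(x) for predict in members)
    return votes.most_common(1)[0][0]


# Fabricated candidates: threshold classifiers on a scalar input with made-up scores.
evaluated = [
    (0.9, lambda x: int(x > 0.5)),
    (0.8, lambda x: int(x > 0.4)),
    (0.7, lambda x: int(x > 0.9)),
    (0.4, lambda x: 0),
]
members = build_ensemble(evaluated, k=3)
label = ensemble_predict(members, 0.6)
```

An odd `k` avoids ties in the binary vote; with an even `k`, a tie-breaking rule would have to be chosen explicitly.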

EPSMS for acute leukemia classification

This section describes our proposed approach for acute leukemia classification using EPSMS. We first describe the scenario. Next, we describe the methods used for cell segmentation and feature extraction. Finally, we describe how EPSMS was used to select competitive classification models.

Experiments and results

This section describes the experimental results for the application of EPSMS for leukemia subtype classification. The next section describes the experimental methodology we used. Section 6.2 describes the experimental results for leukemia subtype classification using EPSMS.

Conclusions

We proposed the application of EPSMS to the problem of acute leukemia classification. The classification of acute leukemia types/subtypes is an important task because it ensures patients receive appropriate treatments. Very effective methods and tests are available for this task, but they are complex and very expensive. Thus, these methods are not available in most developing countries. The morphological classification of acute leukemia, where bone marrow cell images are analyzed, is an

Acknowledgements

The first author was supported by PROMEP under grant 103.5/11/4330 and under the UANL-PAICYT program 2010. The authors are grateful to Dr. Rúben Lobato and Dr. José E. Alonso from the Department of Hematology, Mexican Social Security Institute, Puebla, Mexico, for their help in the collection and annotation of samples. A. Rosales and C. Reta thank CONACyT for scholarship nos. 335690 and 212409, respectively. The authors are grateful to the reviewers and editors for their comments, which have

References (49)

G. Giacinto et al.
An approach to the automatic design of multiple classifier systems
Pattern Recognition Letters
(2001)
D. Ruta et al.
Classifier selection for majority voting
Information Fusion
(2005)
H.J. Escalante et al.
An energy-based model for image annotation and retrieval
Computer Vision and Image Understanding
(2011)
I. Guyon et al.
Analysis of the IJCNN 2007 competition agnostic learning vs. prior knowledge
Neural Networks
(2008)
A. Bradley
The use of the area under the ROC curve in the evaluation of machine learning algorithms
Pattern Recognition
(1997)
The Leukemia and Lymphoma Society (LLS). Leukemia, lymphoma, myeloma facts 2010–2011; 2011. Online, Available at...
J. Bennett et al.
Proposals for the classification of the acute leukaemias. French–American–British (FAB) cooperative group
British Journal of Haematology
(1976)
J.A. Gonzalez et al.
Leukemia identification from bone marrow cells images using a machine vision and data mining strategy
Intelligent Data Analysis
(2011)
C. Reta et al.
Segmentation of bone marrow cell images for morphological classification of acute leukemia
H.J. Escalante et al.
Ensemble particle swarm model selection
H.J. Escalante et al.
Particle swarm model selection
Journal of Machine Learning Research
(2009)
D. Gorissen et al.
Evolutionary model type selection for global surrogate modeling
Journal of Machine Learning Research
(2009)
C.J. Huang et al.
A comparative study of feature selection methods for probabilistic neural networks in cancer classification
T.R. Golub et al.
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
Science
(1999)
D.K. Slonim et al.
Class prediction and discovery using gene expression data
Y. Li et al.
Bayesian automatic relevance determination algorithms for classifying gene expression data
Bioinformatics
(2002)
N. Zong et al.
Artificial neural networks approaches for multidimensional classification of acute lymphoblastic leukemia gene expression samples
Transactions on Information Science and Applications, World Scientific and Engineering Academy and Society
(2005)
M. Adjouadi et al.
Classification of leukemia blood samples using neural networks
Annals of Biomedical Engineering
(2010)
T. Dietterich
Ensemble methods in machine learning
W. Wang
Some fundamental issues in ensemble methods
L. Kuncheva et al.
Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy
Machine Learning
(2003)
S. Bian et al.
On diversity and accuracy of homogeneous and heterogeneous ensembles
International Journal of Hybrid Intelligent Systems
(2007)
Y. Freund et al.
Experiments with a new boosting algorithm
L. Breiman
Bagging predictors
Machine Learning
(1996)

