Acute leukemia classification by ensemble particle swarm model selection

https://doi.org/10.1016/j.artmed.2012.03.005

Abstract

Objective

Acute leukemia is a malignant disease that affects a large proportion of the world population. Different types and subtypes of acute leukemia require different treatments. In order to assign the correct treatment, a physician must identify the leukemia type or subtype. Advanced and precise methods are available for identifying leukemia types, but they are very expensive and not available in most hospitals in developing countries. Thus, alternative methods have been proposed. An option explored in this paper is based on the morphological properties of bone marrow images, where features are extracted from medical images and standard machine learning techniques are used to build leukemia type classifiers.

Methods and materials

This paper studies the use of ensemble particle swarm model selection (EPSMS), which is an automated tool for the selection of classification models, in the context of acute leukemia classification. EPSMS is the application of particle swarm optimization to the exploration of the search space of ensembles that can be formed by heterogeneous classification models in a machine learning toolbox. EPSMS does not require prior domain knowledge and it is able to select highly accurate classification models without user intervention. Furthermore, specific models can be used for different classification tasks.

Results

We report experimental results for acute leukemia classification with real data and show that EPSMS outperformed the best results obtained using manually designed classifiers with the same data. The highest performance using EPSMS was 97.68% for two-type classification problems and 94.21% for problems with more than two types. To the best of our knowledge, these are the best results reported for this data set. Compared with previous studies, these improvements were consistent among different type/subtype classification tasks, different features extracted from images, and different feature extraction regions. The performance improvements were statistically significant. We improved previous results by an average of 6%, with improvements of more than 20% in some settings. In addition to the performance improvements, we demonstrated that no manual effort was required during acute leukemia type/subtype classification.

Conclusions

Morphological classification of acute leukemia using EPSMS provides an alternative to expensive diagnostic methods in developing countries. EPSMS is a highly effective method for the automated construction of ensemble classifiers for acute leukemia classification, which requires no significant user intervention. EPSMS could also be used to address other medical classification tasks.

Introduction

According to the Leukemia and Lymphoma Society, “leukemia is a malignant disease (cancer) of the bone marrow and blood characterized by an uncontrolled accumulation of blood cells” [1]. Leukemia is divided into myelogenous and lymphocytic types, each of which can be acute or chronic (chronic leukemia progresses slowly compared with acute leukemia). A few highlights taken from the Leukemia and Lymphoma Society facts 2010–2011 are as follows [1]:

  • It is estimated that 259,889 people in the USA are living with, or are in remission from, leukemia.

  • An estimated 43,050 new cases of leukemia will be diagnosed in the USA during 2011.

  • In 2010, leukemia was expected to affect more than 10 times as many adults (39,733) as children (3317, aged 0–14 years).

  • The most common type of childhood leukemia (0–19 years old) is acute lymphocytic leukemia (ALL).

  • In 2007, the most recent year for which data are available, 74% of new ALL cases occurred among children (approximately 2859 cases, aged 0–19 years).

  • In 2010, it was anticipated that approximately 21,840 deaths (12,660 males and 9180 females) would be attributable to leukemia in the USA, i.e., 8950 attributable to acute myelogenous leukemia (AML), 4390 to chronic lymphocytic leukemia (CLL), 1420 to ALL, and 440 to chronic myeloid leukemia (CML).

Although these figures relate to the USA, proportional statistics are expected for other countries. Thus, it is clear that medical and technological advances in the understanding of leukemia will have a broad impact on the entire world population. In particular, acute leukemia is a deadly disease and the morphological identification of leukocytes is a fundamental task in its detection (the focus of this study). Acute leukemia may be either ALL or AML, with the following acute leukemia subtypes according to the French–American–British classification [2]: L1, L2, and L3 in the lymphocytic family; and M0, M1, M2, M3, M4, M5, M6, and M7 in the myelogenous family.

The morphological identification of acute leukemia is mainly performed by chemists and hematologists. The process starts when a bone marrow sample is taken from the patient's spine and prepared as a smear with Wright's staining method, which makes the white blood cells more visible during analysis. If the hospital can afford the equipment, which is very expensive, a flow cytometry test is conducted to identify the specific leukemia type and subtype. After an accurate diagnosis, the appropriate treatment can be given to the patient. The flow cytometry test is usually considered reliable enough to make morphological analysis obsolete, but its high cost means that the morphological test is still used. Indeed, most hospitals in developing countries do not have flow cytometers, so morphological analysis is still required. During the morphological analysis, chemists and hematologists study the type and maturity level of leukocytes in the bone marrow sample. They use their knowledge of leukocyte morphology to identify the type and subtype of acute leukemia. Identification by experts is reliable, but automated tools would be useful to support experts and reduce costs for health institutions.

This paper describes an automated approach for morphological acute leukemia classification from images based on machine vision and machine learning techniques. The proposed method consists of three main phases: cell segmentation, feature extraction, and classification. In a previous study, we focused on segmentation and feature extraction [3], [4], whereas we concentrate on the classification stage in this paper. More specifically, we focus on the problem of selecting the best classification model (formed by data preprocessing, feature selection, and classification methods) to provide maximum classification accuracy.

In previous studies, physicians have reported errors of up to 40% when classifying acute leukemia subtypes. We have obtained accuracies close to 90% (on average) in our own previous studies by manually combining methods of preprocessing, feature selection, and classification, while trying to identify appropriate parameters to increase accuracy [4]. Although we obtained satisfactory results via the manual selection of classification models, this process was extremely difficult and time-consuming. Thus, automatic methods for effectively selecting classifiers are required. Clearly, increasing the classification accuracy translates into improved diagnoses by the hematologist, which in turn means that patients will receive better treatments and have an increased life expectancy.

We aimed to develop more powerful tools to improve the classification accuracy of our previous study [3], [4] and provide more reliable tools for medical diagnosis. Previously, we focused mainly on segmentation algorithms. We also worked on the extraction of descriptive features/characteristics for acute leukemia cells. However, we did not find a combination of algorithms, parameters, and sets of descriptive characteristics that guaranteed outstanding results. To achieve our goal, we propose the use of ensemble particle swarm model selection (EPSMS) for the automatic selection of accurate classifiers for the morphological identification of acute leukemia.

EPSMS is a generic tool that explores the search space of candidate classifiers to automatically build ensemble classifiers [5]. The main benefit of EPSMS is that it can obtain very effective classification models without user intervention. EPSMS selects ensembles instead of single models [6], [7], so it is more robust to noisy data and it provides more stable predictions. A distinctive feature of EPSMS is that the ensemble classifier is formed of heterogeneous full models, where a full model is composed of methods for preprocessing, feature selection, and classification. We use EPSMS for type/subtype acute leukemia classification in a one-vs-all classification scheme, where we select ad hoc classifiers for each binary type/subtype problem. This is advantageous because different classification problems may require different classification models. Our results compared favorably with those reported in a previous study [3], [4], where classification models were constructed manually after machine learning experts spent long periods of time in development. Furthermore, the models selected using EPSMS can provide insights into distinctive features of the acute leukemia type/subtype classification task. The improvements in performance were significant and they may motivate further research on the application of EPSMS to other medical tasks.
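The one-vs-all scheme with a separately selected model per binary problem can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names `A`, `B`, and `C`, the synthetic two-feature data, the nearest-centroid scorer, and the use of training accuracy as the selection criterion (a real system would use a cross-validation estimate and a much richer pool of models, selected by EPSMS rather than by exhaustive comparison).

```python
# Candidate "models" differ only in which features they use (hypothetical pool).
CANDIDATE_FEATURES = [(0,), (1,), (0, 1)]


def fit_centroid(X, y, features):
    """Fit a two-centroid scorer on a feature subset; a positive score favors class 1."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[f] for r in rows) / len(rows) for f in features]

    pos, neg = centroid(1), centroid(0)

    def score(x):
        d_pos = sum((x[f] - p) ** 2 for f, p in zip(features, pos))
        d_neg = sum((x[f] - n) ** 2 for f, n in zip(features, neg))
        return d_neg - d_pos

    return score


def select_model(X, y):
    """Pick the candidate with the best training accuracy (stand-in for cross-validation)."""
    def accuracy(features):
        score = fit_centroid(X, y, features)
        hits = sum((score(x) > 0) == (t == 1) for x, t in zip(X, y))
        return hits / len(X)

    best = max(CANDIDATE_FEATURES, key=accuracy)  # first candidate wins ties
    return best, fit_centroid(X, y, best)


def fit_one_vs_all(X, labels):
    """Select one binary model per class: that class (1) against all others (0)."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [1 if lab == cls else 0 for lab in labels]
        models[cls] = select_model(X, binary)
    return models


def predict(models, x):
    """Return the class whose selected binary model gives the highest score."""
    return max(models, key=lambda cls: models[cls][1](x))


# Synthetic data: A is high on feature 0, B is high on feature 1, C is low on both.
X = [(5, 0), (5, 1), (6, 0.5), (0, 5), (1, 5), (0.5, 6), (0, 0), (1, 0), (0, 1)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
models = fit_one_vs_all(X, labels)
```

On this toy data each binary problem ends up with a different feature subset, which is the point of selecting models per task. Note that scores produced by different models are not calibrated against each other, so taking the maximum across them is a simplification.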

As mentioned earlier, we adopted a morphological approach because this is an inexpensive method that is available in many hospitals in developing countries. We are aware of other more precise options for addressing the problem, such as flow cytometry or microarray gene expression analysis techniques. Unfortunately, these techniques are not accessible to most people living in poor countries. As future work, we will compare the performance of our technique with other approaches when data become available (as the cost of tests decreases) and when the method can be offered to more people. Meanwhile, we think the proposed approach is a practical alternative to expensive techniques. This statement is supported by experimental results showing that the proposed approach achieves performance very similar to that obtained with alternative and more expensive methods.

The rest of this paper is organized as follows. The next section reviews related work on acute leukemia classification and ensemble member selection. Section 3 describes particle swarm model selection, which is the method EPSMS is based on. Section 4 introduces EPSMS and Section 5 describes how EPSMS was used for acute leukemia classification. Section 6 reports our experimental results for acute leukemia classification. Finally, Section 7 summarizes our main findings and outlines areas for future work.

Related work

This section reviews related work on acute leukemia classification and classification using ensemble methods.

Particle swarm model selection

PSMS is the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) [6]. Given a pool of methods for data preprocessing, feature selection, and pattern classification, and a data set associated with a classification task, FMS is the task of selecting the best combination of methods such that an estimate of generalization performance is maximized for the classification task. In addition, the hyper-parameters must be optimized for each of the selected
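To make the idea of searching over full models with PSO concrete, the following is a minimal sketch rather than the PSMS implementation. The method pools, the position encoding in [0, 1]^4, the inertia-weight update with parameters `w`, `c1`, and `c2`, and above all `toy_fitness` (a synthetic stand-in for a cross-validation estimate) are assumptions introduced for illustration.

```python
import random

# Hypothetical method pools; the actual toolbox contents are not listed here.
PREPROCESSORS = ["none", "standardize", "normalize"]
SELECTORS = ["none", "filter_a", "filter_b"]
CLASSIFIERS = ["classifier_a", "classifier_b", "classifier_c"]


def decode(position):
    """Map a position in [0, 1]^4 to a full model: three methods plus one hyperparameter."""
    def pick(pool, x):
        return pool[min(int(x * len(pool)), len(pool) - 1)]

    return (pick(PREPROCESSORS, position[0]),
            pick(SELECTORS, position[1]),
            pick(CLASSIFIERS, position[2]),
            position[3])


def toy_fitness(position):
    """Synthetic score standing in for estimated generalization performance (made up)."""
    pre, sel, clf, hyper = decode(position)
    score = 0.5
    score += 0.1 if pre == "standardize" else 0.0
    score += 0.1 if sel == "filter_b" else 0.0
    score += 0.2 if clf == "classifier_c" else 0.0
    return score - 0.1 * (hyper - 0.3) ** 2


def pso(fitness, dim=4, m=10, t_max=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness with a basic inertia-weight PSO over the unit hypercube."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(dim)] for _ in range(m)]
    velocities = [[0.0] * dim for _ in range(m)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    lead = max(range(m), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[lead][:], personal_score[lead]

    for _ in range(t_max):
        for i in range(m):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                                    + c2 * r2 * (global_best[d] - positions[i][d]))
                # Clamp to the unit interval so decode() always sees valid input.
                positions[i][d] = min(1.0, max(0.0, positions[i][d] + velocities[i][d]))
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


best_position, best_score = pso(toy_fitness)
best_model = decode(best_position)
```

In a real setting the fitness call would train and validate the decoded model on the data set, which is by far the most expensive step of the search.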

Ensemble particle swarm model selection

EPSMS is an extension of PSMS, which has the goal of building ensemble classifiers from PSMS's partial solutions [5]. The intuition behind EPSMS is that a combination of candidate solutions can result in ensemble classifiers that are capable of outperforming individual models. EPSMS is motivated by the large number of solutions (i.e., a total of (tmax + 1) × m) that are evaluated via PSMS's search process, most of which achieve performance better than random after a few iterations. EPSMS is also
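The intuition that a combination of candidate solutions can outperform each individual model can be shown with a small sketch. It assumes that tmax is the number of search iterations and m the swarm size, so that the count below matches the (tmax + 1) × m figure in the text. The top-k selection rule, the majority vote, and the toy predictions are assumptions for illustration and are not taken from the EPSMS definition.

```python
from collections import Counter


def count_evaluated(t_max, m):
    """Solutions evaluated by the search: m initial ones plus m per iteration."""
    return (t_max + 1) * m


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def build_ensemble(candidates, k):
    """Keep the predictions of the k candidates with the highest estimated score."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    return [predictions for _, predictions in ranked[:k]]


def majority_vote(prediction_lists):
    """Combine label lists sample by sample, one vote per ensemble member."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


# Toy validation labels and the predictions of four hypothetical evaluated models.
truth = [1, 0, 1, 1, 0, 1]
model_predictions = [
    [1, 0, 1, 1, 0, 0],  # wrong on the last sample
    [1, 0, 0, 1, 0, 1],  # wrong on the third sample
    [0, 0, 1, 1, 0, 1],  # wrong on the first sample
    [0, 1, 0, 1, 0, 1],  # wrong on the first three samples
]
candidates = [(accuracy(p, truth), p) for p in model_predictions]

ensemble = build_ensemble(candidates, k=3)
ensemble_predictions = majority_vote(ensemble)
```

Here the three retained members each make one error on a different sample, so the vote corrects all of them. This only works when the members' errors are not strongly correlated, which is one reason heterogeneous members are attractive.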

EPSMS for acute leukemia classification

This section describes our proposed approach for acute leukemia classification using EPSMS. We first describe the scenario. Next, we describe the methods used for cell segmentation and feature extraction. Finally, we describe how EPSMS was used to select competitive classification models.

Experiments and results

This section describes the experimental results for the application of EPSMS for leukemia subtype classification. The next section describes the experimental methodology we used. Section 6.2 describes the experimental results for leukemia subtype classification using EPSMS.

Conclusions

We proposed the application of EPSMS to the problem of acute leukemia classification. The classification of acute leukemia types/subtypes is an important task because it ensures patients receive appropriate treatments. Very effective methods and tests are available for this task, but they are complex and very expensive. Thus, these methods are not available in most developing countries. The morphological classification of acute leukemia, where bone marrow cell images are analyzed, is an

Acknowledgements

The first author was supported by PROMEP under grant 103.5/11/4330 and under the UANL-PAICYT program 2010. The authors are grateful to Dr. Rúben Lobato and Dr. José E. Alonso from the Department of Hematology, Mexican Social Security Institute, Puebla, Mexico, for their help in the collection and annotation of samples. A. Rosales and C. Reta thank CONACyT for scholarship nos. 335690 and 212409, respectively. The authors are grateful to the reviewers and editors for their comments, which have

References (49)

  • H.J. Escalante et al. Particle swarm model selection. Journal of Machine Learning Research (2009)
  • D. Gorissen et al. Evolutionary model type selection for global surrogate modeling. Journal of Machine Learning Research (2009)
  • C.J. Huang et al. A comparative study of feature selection methods for probabilistic neural networks in cancer classification
  • T.R. Golub et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (1999)
  • D.K. Slonim et al. Class prediction and discovery using gene expression data
  • Y. Li et al. Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics (2002)
  • N. Zong et al. Artificial neural networks approaches for multidimensional classification of acute lymphoblastic leukemia gene expression samples. Transactions on Information Science and Applications, World Scientific and Engineering Academy and Society (2005)
  • M. Adjouadi et al. Classification of leukemia blood samples using neural networks. Annals of Biomedical Engineering (2010)
  • T. Dietterich. Ensemble methods in machine learning
  • W. Wang. Some fundamental issues in ensemble methods
  • L. Kuncheva et al. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning (2003)
  • S. Bian et al. On diversity and accuracy of homogeneous and heterogeneous ensembles. International Journal of Hybrid Intelligent Systems (2007)
  • Y. Freund et al. Experiments with a new boosting algorithm
  • L. Breiman. Bagging predictors. Machine Learning (1996)