AptaCDSS-E: A classifier ensemble-based clinical decision support system for cardiovascular disease level prediction

https://doi.org/10.1016/j.eswa.2007.04.015

Abstract

Conventional clinical decision support systems are generally based on a single classifier or a simple combination of such models, and show only moderate performance. In this paper, we propose a classifier ensemble-based method for supporting the diagnosis of cardiovascular disease (CVD) based on aptamer chips. The resulting system, AptaCDSS-E, overcomes conventional performance limitations by utilizing ensembles of different classifiers. Recent surveys show that CVD is one of the leading causes of death and that many lives could be saved if a precise diagnosis could be made. For CVD diagnosis, our system combines a set of four different classifiers with ensembles. Support vector machines and neural networks are adopted as base classifiers, and decision trees and Bayesian networks are adopted to augment the system. Four aptamer-based biochip data sets, including CVD data containing 66 samples, were used to train and test the system. Three other supplementary data sets were used to alleviate data insufficiency. We investigated the effectiveness of the ensemble-based system with several different aggregation approaches by comparing the results with those of single classifier-based models. The prediction performance of the AptaCDSS-E system was assessed with a cross-validation test. The experimental results show that our system achieves high diagnosis accuracy (>94%) and comparatively small prediction difference intervals (<6%), demonstrating its usefulness in the clinical decision process of disease diagnosis. Additionally, 10 possible biomarkers were found for further investigation.

Introduction

Recent surveys show that cardiovascular disease (CVD), which includes heart disease and stroke, is one of the leading causes of death regardless of sex in the United States and all over the world (CDC’s Report 1). According to the report, CVD accounts for nearly 40% of all deaths in the US annually. While these largely preventable diseases are more prevalent among people older than 65, the number of sudden deaths from heart disease among people aged 15–34 has also increased substantially (CDC’s Report 2). Therefore, many lives could be saved if CVD patients were diagnosed precisely. A correct diagnosis, however, is not easy to make and is often delayed because many factors complicate disease diagnosis. For example, the clinical symptoms and the functional and pathologic manifestations of heart disease are often associated with many human organs besides the heart itself, and heart disease may present with diverse syndromes. Furthermore, different types of heart disease can have similar symptoms, further complicating diagnosis (Yan, Jiang, Zheng, Peng, & Li, 2006).

To reduce the time needed for intensive diagnosis and to improve diagnosis accuracy, it is crucial to develop reliable and powerful clinical decision support systems (CDSSs) that support these increasingly complicated diagnosis decision processes (Yan et al., 2006). Many medical institutions are increasingly adopting tools that offer decision support to improve patient outcomes and to reduce clinical diagnosis errors and costs.

In the last two decades, the use of artificial intelligence tools has become widely accepted in medical applications to support patient diagnosis more effectively. In particular, various machine learning approaches such as decision trees (DTs), artificial neural networks (ANNs), Bayesian networks (BNs), and support vector machines (SVMs) have been actively applied to meet clinical support requirements. CDSSs and medical diagnosis systems built on these approaches have shown great potential across a wide variety of clinical and medical applications. Here we briefly review some of the previous work in this area before presenting our own machine-learning-based approach.

Decision trees are among the most widely applied methods for CDSS because of their simplicity and their capacity to produce human-understandable inductive rules. Many researchers have employed DTs to resolve various biological problems, including diagnostic error analysis (Murphy, 2001), potential biomarker finding (Qu et al., 2002; Won et al., 2003), and proteomic mass spectra classification (Geurts et al., 2005).

Bayesian networks are a probability-based inference model, increasingly used in the medical domain as a method of knowledge representation for reasoning under uncertainty for a wide range of applications, including disease diagnosis (Balla, Iansek, & Elstein, 1985), genetic counseling (Harris, 1990), expert system development (Stockwell, 1993), gene network modeling (Liu, Sung, & Mittal, 2006), and emergency medical decision support system (MDSS) design (Sadeghi, Barzi, Sadeghi, & King, 2006).

Neural networks have also been applied to the medical and diagnosis fields, most actively as the basis of a soft computing method to render the complex and fuzzy cognitive process of diagnosis. Many applications have shown the suitability of neural networks for CDSS design and other biomedical applications, including diagnosis of myocardial infarction (Baxt, 1990; Baxt, 1995), differentiation of assorted pathological data (Dybowski & Gant, 1995), MDSS for leukemia management (Chae, Park, Park, & Bae, 1998) and surgical decision support (Li, Liu, Chiu, & Jian, 2000), MDSS for cancer detection (West & West, 2000), assessment of chest-pain patients (Ellenius & Groth, 2000), decision making for birth mode (MacDowell et al., 2001), heart disease diagnosis (Türkoglu, Arslan, & Ilkay, 2002), CDSS for pharmaceutical applications (Mendyk & Jachowicz, 2005), CDSS development for gynecological diagnosis (Mangalampalli, Mangalampalli, Chakravarthy, & Jain, 2006), and biological signal classification (Güven & Kara, 2006). Recently, the multilayer perceptron (MLP), one of the most popular ANN models, has been applied to build an MDSS for the diagnosis of five different heart diseases (Yan et al., 2006). The three-layered MLP, with 40 categorical input variables and a modified learning method, achieved a diagnosis accuracy of over 90%.

Support vector machines are a new and promising classification and regression technique proposed by Vapnik and his co-workers (Cortes & Vapnik, 1995; Vapnik, 1995). SVMs, developed within statistical learning theory, have recently attracted increasing interest from biomedical researchers. They are not only theoretically well-founded but also effective in practical applications. In the medical, clinical decision support, and biological domains, SVMs have been successfully applied to a wide variety of problems, including MDSS for the diagnosis of tuberculosis infection (Veropoulos, Cristianini, & Campbell, 1999), tumor classification (Schubert, Müller, Fritz, Lichter, & Eils, 2003), myocardial infarction detection (Conforti & Guido, 2005), biomarker discovery (Prados et al., 2004), and cancer diagnosis (Majumder, Ghosh, & Gupta, 2005).

Hybrid models. Besides single model-based approaches, hybrid machine learning approaches have also been tried to boost the performance of conventional single model methods and to overcome the inherent weaknesses in any single method. Many hybrid model approaches have been proposed, including a hybrid expert system for epileptic crisis decision using an ANN and a fuzzy method (Brasil, de Azevedo, & Barreto, 2001), an ANN with a DT for the development of an intelligent decision support system (Tung, Huang, Chen, & Shih, 2005), and an SVM with an ANN for electromyogram classification (Güler & Koçer, 2005). Recently, a novel SVM method in combination with DT to generate human-understandable rules was proposed to alleviate the difficulty of understanding that arises from the black box characteristic of SVMs in transmembrane segments prediction (He, Hu, Harrison, Tai, & Pan, 2006). Their approach achieved prediction accuracy of 93% with understandable prediction rules and with confidence values over 90%.

Ensemble models. To overcome the limited generalization performance of single models and simple model combination approaches, more precise model combination methods, called “ensemble methods”, have been suggested. An ensemble combines the decisions of different classifiers that are trained to solve the same problem but make different errors. Ensembles can reduce the variance of estimation errors and improve the overall classification accuracy. Many ensemble-based approaches have been proposed in recent research, including an ANN ensemble for a decision support system (Ohlsson, 2004), an ensemble of ANNs for breast cancer and liver disorder prediction (Yang & Browne, 2004), an MDSS with an ensemble of several different classifiers for breast cancer diagnosis (West, Mangiameli, Rampal, & West, 2005), and multiple classifier combinations with an evolutionary approach (Kim, Min, & Han, 2006).
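The variance claim can be illustrated with a short, standard calculation. This is only a sketch under simplifying assumptions that are ours rather than the paper's: the ensemble averages N real-valued classifier outputs, and the individual errors \varepsilon_i have zero mean, a common variance \sigma^2, and a common pairwise correlation \rho.

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N}\varepsilon_i\right)
  = \frac{1}{N^2}\left[N\sigma^2 + N(N-1)\,\rho\,\sigma^2\right]
  = \rho\,\sigma^2 + \frac{(1-\rho)\,\sigma^2}{N}
```

Under these assumptions, uncorrelated errors (\rho = 0) give a variance of \sigma^2/N, while identical errors (\rho = 1) give no reduction at all, which is why the member classifiers need to make different errors.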

Most conventional CDSSs for disease diagnosis are based on the symptoms of the patient or on data from simple medical questionnaires. To our knowledge, a CDSS for CVD diagnosis that uses an ensemble of multiple classifiers for comprehensive diagnosis and possible biomarker mining does not currently exist. The aim of this project is to develop a CDSS that utilizes the expression information of physiological functional proteins with classifier ensembles for patient diagnosis. The patient’s serum microarray chip data are analyzed with several different classifiers in the ensemble. The developed system, AptaCDSS-E (Aptamer biochip-based CDSS – ensemble version), supports physicians by providing supplementary diagnosis information and supports clinicians by providing a set of possible biomarker candidates that could be used for practical CVD diagnosis after further experimental verification.

The rest of the paper is organized as follows. In Section 2, we outline the system architecture, describe several key components of the system, and review the four base classifiers used in our proposed system for disease level classification. In Section 3, the framework for constructing classifier ensembles is presented. Experimental results are reported in Section 4, including the data description, preprocessing and feature selection, quality analysis of the data, the possible marker proteins discovered by the system, and a discussion of the results. Section 5 draws conclusions from this study.

Section snippets

The system architecture of AptaCDSS-E

Reviews of CDSS in the literature show that very few studies involve field tests of a CDSS and almost none use a naturalistic design in routine clinical settings with real patients. Moreover, the studies mostly concern physicians rather than other clinicians (Kaplan, 2001). On this point, in the development of AptaCDSS-E we considered both clinicians and physicians equally by providing diagnosis support information to physicians and by providing the information about possible biomarker

Need for a classifier ensemble

The complexity and subtlety of microarray expression patterns between CVD patients and normal samples may increase the chance of misclassification when a single classifier is used because a single classifier tends to cover patterns originating from only part of the sample space. Therefore, it would be beneficial if multiple classifiers could be trained in such a way that each of the classifiers covers a different part of the sample space and their classification results were integrated to
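The idea of classifiers that cover different parts of the sample space can be sketched with a toy example. Everything below is hypothetical and is not the aggregation method used in AptaCDSS-E: the one-dimensional integer "expression level", the three rule-based classifiers (`clf_a`, `clf_b`, `clf_c`), and the plain majority vote are assumptions chosen only to show how disjoint error regions cancel out.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[int], int]


def true_label(x: int) -> int:
    """Hypothetical ground truth: class 1 when the level is at least 50."""
    return int(x >= 50)


def clf_a(x: int) -> int:
    """Correct everywhere except levels 0..19, where it wrongly predicts 1."""
    return 1 if x < 20 else true_label(x)


def clf_b(x: int) -> int:
    """Correct everywhere except levels 40..59, where it flips the label."""
    return 1 - true_label(x) if 40 <= x < 60 else true_label(x)


def clf_c(x: int) -> int:
    """Correct everywhere except levels 80..99, where it wrongly predicts 0."""
    return 0 if x >= 80 else true_label(x)


def majority_vote(classifiers: Sequence[Classifier], x: int) -> int:
    """Return the label predicted by most classifiers.

    With an odd number of binary classifiers a tie cannot occur.
    """
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def ensemble(x: int) -> int:
    """Toy ensemble: unweighted majority vote of the three classifiers."""
    return majority_vote([clf_a, clf_b, clf_c], x)


def accuracy(predict: Classifier, samples: Sequence[int]) -> float:
    """Fraction of samples on which `predict` matches the ground truth."""
    correct = sum(predict(x) == true_label(x) for x in samples)
    return correct / len(samples)


samples = list(range(100))  # levels 0..99, one sample per level

if __name__ == "__main__":
    for name, clf in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("vote", ensemble)]:
        print(name, accuracy(clf, samples))
```

In this toy setting each single classifier is correct on 80 of the 100 samples, while the vote is correct on all 100, because at every level at most one of the three classifiers is wrong. Real classifiers rarely have perfectly disjoint error regions, so the gain in practice is smaller.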

Experimental results and discussion

The experimental steps of aptamer chip-based disease level classification with multiple classifiers are summarized in Fig. 9. The steps in category A were performed by the data supplier, and in this research we performed the steps with a solid border in categories B and C with AptaCDSS-E. The final experimental verification of the discovered possible biomarkers will be conducted in future work.

Conclusions

We have presented a classifier ensemble-based clinical decision support system called AptaCDSS-E for disease level prediction with aptamer biochip data. The system employs four different machine learning classifiers, combines the prediction results of each classifier in an ensemble machine, and generates supplementary information for disease diagnosis. The system was trained with four different disease data sets consisting of 242 cases including cardiovascular disease and the data sets were

Acknowledgements

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the National Research Lab. Program funded by the Ministry of Science and Technology (No. M10400000349-06J0000-34910) and supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-511-D00355). The authors would like to thank Byoung-Hee Kim, Je-Keun Rhee, Min-Oh Heo, Young-Jin Park, and Min-Hyeok Kim for the construction of AptaCDSS platform, and developers and

References (56)

  • M.-J. Kim et al. An evolutionary approach to the combination of multiple classifiers to predict a stock price index. Expert Systems with Applications (2006)
  • T.-F. Liu et al. Model gene network by semi-fixed Bayesian network. Expert Systems with Applications (2006)
  • A. Mangalampalli et al. A neural network based clinical decision-support system for efficient diagnosis and fuzzy-based prescription of gynecological diseases using homoeopathic medicinal system. Expert Systems with Applications (2006)
  • A. Mendyk et al. Neural network as a decision support system in the development of pharmaceutical formulation – focus on solid dispersions. Expert Systems with Applications (2005)
  • M. Ohlsson. WeAidU – A decision support system for myocardial perfusion images using artificial neural networks. Artificial Intelligence in Medicine (2004)
  • S. Sadeghi et al. A Bayesian model for triage decision support. International Journal of Medical Informatics (2006)
  • H. Shin et al. A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples. Journal of Biomedical Informatics (2006)
  • D.R.B. Stockwell. LBS: Bayesian learning system for rapid expert system development. Expert Systems with Applications (1993)
  • K.-Y. Tung et al. Mining the generation xers’ job attitudes by artificial neural network and decision tree – Empirical evidence in Taiwan. Expert Systems with Applications (2005)
  • I. Türkoglu et al. An expert system for diagnosis of the heart valve diseases. Expert Systems with Applications (2002)
  • D. West et al. Ensemble strategies for a medical diagnostic decision support system: A breast cancer diagnosis application. European Journal of Operational Research (2005)
  • D. West et al. Model selection for a medical diagnostic decision support system: A breast cancer detection case. Artificial Intelligence in Medicine (2000)
  • H.-M. Yan et al. A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Systems with Applications (2006)
  • E. Bauer et al. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning (1999)
  • W.G. Baxt. Use of an artificial neural network for data analysis in clinical decision making: The diagnosis of acute coronary occlusion. Neural Computation (1990)
  • C.M. Bishop. Neural networks for pattern recognition (1995)
  • L. Breiman. Bagging predictors. Machine Learning (1996)
  • CDC’s Report 1. Accessed: 29.05.06....