AptaCDSS-E: A classifier ensemble-based clinical decision support system for cardiovascular disease level prediction
Introduction
Recent surveys show that cardiovascular disease (CVD), which includes heart disease and stroke, is one of the leading causes of death for both sexes in the United States and worldwide (CDC’s Report 1). According to that report, CVD accounts for nearly 40% of all deaths in the US annually. While these largely preventable diseases are more prevalent among people older than 65, the number of sudden deaths from heart disease among people aged 15–34 has also increased substantially (CDC’s Report 2). Many lives could therefore be saved if CVD patients were diagnosed accurately. Correct diagnosis, however, is not easy and is often delayed by the many factors that complicate it. For example, the clinical symptoms and the functional and pathologic manifestations of heart disease often involve organs other than the heart itself, and heart disease may present with diverse syndromes. Furthermore, different types of heart disease can have similar symptoms, further complicating diagnosis (Yan, Jiang, Zheng, Peng, & Li, 2006).
To reduce the time required for intensive diagnosis and to improve diagnostic accuracy, it is crucial to develop reliable and powerful clinical decision support systems (CDSSs) that support these increasingly complicated diagnostic decision processes (Yan et al., 2006). Many medical institutions are increasingly adopting tools that offer decision support to improve patient outcomes and to reduce clinical diagnosis errors and costs.
In the last two decades, the use of artificial intelligence tools has become widely accepted in medical applications to support patient diagnosis more effectively. In particular, machine learning approaches such as decision trees (DTs), artificial neural networks (ANNs), Bayesian networks (BNs), and support vector machines (SVMs) have been actively applied to meet clinical support requirements. CDSSs and medical diagnosis systems built on these approaches have shown great potential across a wide variety of clinical and medical applications. Here we briefly review part of the previous work in this area before presenting our own machine-learning-based approach.
Decision trees are among the most widely applied methods for CDSS because of their simplicity and their capacity to produce human-understandable inductive rules. Many researchers have employed DTs to address various biological problems, including diagnostic error analysis (Murphy, 2001), potential biomarker finding (Qu et al., 2002, Won et al., 2003), and proteomic mass spectra classification (Geurts et al., 2005).
Bayesian networks are probability-based inference models that are increasingly used in the medical domain as a method of knowledge representation for reasoning under uncertainty. Their applications include disease diagnosis (Balla, Iansek, & Elstein, 1985), genetic counseling (Harris, 1990), expert system development (Stockwell, 1993), gene network modeling (Liu, Sung, & Mittal, 2006), and emergency medical decision support system (MDSS) design (Sadeghi, Barzi, Sadeghi, & King, 2006).
Neural networks have also been applied to the medical and diagnosis fields, most actively as the basis of a soft computing method to render the complex and fuzzy cognitive process of diagnosis. Many applications have shown the suitability of neural networks for CDSS design and other biomedical applications, including diagnosis of myocardial infarction (Baxt, 1990, Baxt, 1995), differentiation of assorted pathological data (Dybowski & Gant, 1995), MDSS for leukemia management (Chae, Park, Park, & Bae, 1998) and surgical decision support (Li, Liu, Chiu, & Jian, 2000), MDSS for cancer detection (West & West, 2000), assessment of chest-pain patients (Ellenius & Groth, 2000), decision making for birth mode (MacDowell et al., 2001), heart disease diagnosis (Türkoglu, Arslan, & Ilkay, 2002), CDSS for pharmaceutical applications (Mendyk & Jachowicz, 2005), CDSS development for gynecological diagnosis (Mangalampalli, Mangalampalli, Chakravarthy, & Jain, 2006), and biological signal classification (Güven & Kara, 2006). Recently, the multilayer perceptron (MLP), one of the most popular ANN models, has been applied to build an MDSS for the diagnosis of five different heart diseases (Yan et al., 2006). The three-layered MLP, with 40 categorical input variables and a modified learning method, achieved a diagnosis accuracy of over 90%.
Support vector machines are a new and promising classification and regression technique proposed by Vapnik and his co-workers (Cortes and Vapnik, 1995, Vapnik, 1995). SVMs, developed within statistical learning theory, have recently attracted increasing interest from biomedical researchers. They are not only theoretically well-founded but also perform well in practical applications. In the medical, clinical decision support, and biological domains, SVMs have been successfully applied to a wide variety of problems, including MDSS for the diagnosis of tuberculosis infection (Veropoulos, Cristianini, & Campbell, 1999), tumor classification (Schubert, Müller, Fritz, Lichter, & Eils, 2003), myocardial infarction detection (Conforti & Guido, 2005), biomarker discovery (Prados et al., 2004), and cancer diagnosis (Majumder, Ghosh, & Gupta, 2005).
Hybrid models. Besides single model-based approaches, hybrid machine learning approaches have also been tried to boost the performance of conventional single model methods and to overcome the inherent weaknesses in any single method. Many hybrid model approaches have been proposed, including a hybrid expert system for epileptic crisis decision using an ANN and a fuzzy method (Brasil, de Azevedo, & Barreto, 2001), an ANN with a DT for the development of an intelligent decision support system (Tung, Huang, Chen, & Shih, 2005), and an SVM with an ANN for electromyogram classification (Güler & Koçer, 2005). Recently, a novel SVM method in combination with DT to generate human-understandable rules was proposed to alleviate the difficulty of understanding that arises from the black box characteristic of SVMs in transmembrane segments prediction (He, Hu, Harrison, Tai, & Pan, 2006). Their approach achieved prediction accuracy of 93% with understandable prediction rules and with confidence values over 90%.
Ensemble models. To overcome the limited generalization performance of single models and simple model combination approaches, more precise model combination methods, called “ensemble methods”, have been suggested. An ensemble combines the decisions of different classifiers that are trained to solve the same problem but make different errors. Ensembles can reduce the variance of estimation errors and improve the overall classification accuracy. Many ensemble-based approaches have been proposed in recent research, including an ANN ensemble for a decision support system (Ohlsson, 2004), an ensemble of ANNs for breast cancer and liver disorder prediction (Yang & Browne, 2004), an MDSS with an ensemble of several different classifiers for breast cancer diagnosis (West, Mangiameli, Rampal, & West, 2005), and multiple classifier combinations with an evolutionary approach (Kim, Min, & Han, 2006).
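The idea that classifiers making different errors can correct one another can be illustrated with a minimal sketch. The example below uses plain majority voting, which is only one generic way to combine decisions and is not necessarily the combination scheme used in AptaCDSS-E. The function names (majority_vote, accuracy), the labels, and the toy predictions are all hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label lists into one label per sample.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties go to the label proposed by the earliest-listed classifier.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")

    combined = []
    for j in range(n_samples):
        votes = [p[j] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # First label (in classifier order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy data (hypothetical, not from the paper): each classifier is wrong
# on a different sample, so the other two outvote it every time.
truth = ["cvd", "normal", "cvd", "normal", "cvd", "normal"]
clf_a = ["normal", "normal", "cvd", "normal", "cvd", "normal"]  # wrong on sample 0
clf_b = ["cvd", "cvd", "cvd", "normal", "cvd", "normal"]        # wrong on sample 1
clf_c = ["cvd", "normal", "normal", "normal", "cvd", "normal"]  # wrong on sample 2

ensemble = majority_vote([clf_a, clf_b, clf_c])

for name, pred in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("vote", ensemble)]:
    print(name, round(accuracy(pred, truth), 3))
```

In this toy case each individual classifier misses one sample, while the vote recovers every true label. The benefit depends on the errors being spread across different samples; if all classifiers fail on the same samples, voting cannot help.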
Most conventional CDSSs for disease diagnosis are based on the symptoms of the patient or on data from simple medical questionnaires. To our knowledge, a CDSS for CVD diagnosis that uses an ensemble of multiple classifiers for comprehensive diagnosis and possible biomarker mining does not currently exist. The aim of this project is to develop a CDSS that uses the expression information of physiological functional proteins, together with classifier ensembles, for patient diagnosis. The patient’s serum microarray chip data are analyzed with several different classifiers in the ensemble. The developed system, AptaCDSS-E (Aptamer biochip-based CDSS – ensemble version), supports physicians by providing supplementary diagnosis information and supports clinicians by providing a set of possible biomarker candidates that can be used for practical CVD diagnosis after further experimental verification.
The rest of the paper is organized as follows. In Section 2 we outline the system architecture, describe several key components of the system, and review the four base classifiers used in our proposed system for disease level classification. In Section 3, the framework for constructing classifier ensembles is presented. Experimental results are reported in Section 4, including data description, preprocessing and feature selection, quality analysis of data, the possible marker proteins discovered by the system, and discussion of the results. Section 5 draws conclusions from this study.
Section snippets
The system architecture of AptaCDSS-E
Reviews of the CDSS literature show that very few studies involve field tests of a CDSS, and almost none use a naturalistic design in routine clinical settings with real patients. Moreover, the studies mostly concern physicians rather than other clinicians (Kaplan, 2001). For this reason, in developing AptaCDSS-E we considered clinicians and physicians equally, by providing diagnosis support information to physicians and by providing the information about possible biomarker
Need for a classifier ensemble
The complexity and subtlety of microarray expression patterns between CVD patients and normal samples may increase the chance of misclassification when a single classifier is used because a single classifier tends to cover patterns originating from only part of the sample space. Therefore, it would be beneficial if multiple classifiers could be trained in such a way that each of the classifiers covers a different part of the sample space and their classification results were integrated to
Experimental results and discussion
The experimental steps of aptamer chip-based disease level classification with multiple classifiers are summarized in Fig. 9. The steps in category A were performed by the data supplier; in this research we performed the steps with solid borders in categories B and C using AptaCDSS-E. The final experimental verification of the possible biomarkers discovered will be conducted in future work.
Conclusions
We have presented a classifier ensemble-based clinical decision support system called AptaCDSS-E for disease level prediction with aptamer biochip data. The system employs four different machine learning classifiers, combines the prediction results of each classifier in an ensemble machine, and generates supplementary information for disease diagnosis. The system was trained with four different disease data sets consisting of 242 cases including cardiovascular disease and the data sets were
Acknowledgements
This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the National Research Lab. Program funded by the Ministry of Science and Technology (No. M10400000349-06J0000-34910) and supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-511-D00355). The authors would like to thank Byoung-Hee Kim, Je-Keun Rhee, Min-Oh Heo, Young-Jin Park, and Min-Hyeok Kim for the construction of AptaCDSS platform, and developers and
References (56)
et al. Bayesian diagnosis in presence of preexisting disease. Lancet (1985)
Application of artificial neural networks to clinical medicine. Lancet (1995)
et al. Hybrid expert system for decision supporting in the medical area: Complexity and cognitive computing. International Journal of Medical Informatics (2001)
et al. Artificial neural networks in pathology and medical laboratories. Lancet (1995)
et al. Transferability of neural network-based decision support algorithms for early assessment of chest-pain patients. International Journal of Medical Informatics (2000)
et al. Classification of electro-oculogram signals using artificial neural network. Expert Systems with Applications (2006)
Probabilistic belief networks for genetic counseling. Computer Methods and Programs in Biomedicine (1990)
et al. Transmembrane segments prediction and understanding using support vector machine and decision tree. Expert Systems with Applications (2006)
Evaluating informatics applications-clinical decision support systems literature review. International Journal of Medical Informatics (2001)
et al. Constructing support vector machine ensemble. Pattern Recognition (2003)
An evolutionary approach to the combination of multiple classifiers to predict a stock price index. Expert Systems with Applications
Model gene network by semi-fixed Bayesian network. Expert Systems with Applications
A neural network based clinical decision-support system for efficient diagnosis and fuzzy-based prescription of gynecological diseases using homoeopathic medicinal system. Expert Systems with Applications
Neural network as a decision support system in the development of pharmaceutical formulation – focus on solid dispersions. Expert Systems with Applications
WeAidU – A decision support system for myocardial perfusion images using artificial neural networks. Artificial Intelligence in Medicine
A Bayesian model for triage decision support. International Journal of Medical Informatics
A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples. Journal of Biomedical Informatics
LBS: Bayesian learning system for rapid expert system development. Expert Systems with Applications
Mining the generation xers’ job attitudes by artificial neural network and decision tree – Empirical evidence in Taiwan. Expert Systems with Applications
An expert system for diagnosis of the heart valve diseases. Expert Systems with Applications
Ensemble strategies for a medical diagnostic decision support system: A breast cancer diagnosis application. European Journal of Operational Research
Model selection for a medical diagnostic decision support system: A breast cancer detection case. Artificial Intelligence in Medicine
A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Systems with Applications
An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning
Use of an artificial neural network for data analysis in clinical decision making: The diagnosis of acute coronary occlusion. Neural Computing
Neural networks for pattern recognition
Bagging predictors. Machine Learning