Background
Sepsis is a common and costly disease that has become a major public health problem worldwide. It causes more than 5.3 million deaths annually, with an overall mortality of approximately 30%, particularly in the intensive care unit (ICU) [1–3]. Sepsis is defined as a syndrome of physiologic, pathologic, and biochemical abnormalities induced by infection, in which a dysregulated host response results in life-threatening organ dysfunction [3]. Unlike earlier diagnostic criteria, sepsis-3, issued as the Third International Consensus Definitions for Sepsis and Septic Shock in February 2016, highlights the strong association between infection and organ failure [2]. Early identification and diagnosis of sepsis are therefore essential, because they give clinicians meaningful information for assessing a patient's condition and can improve survival through prompt and appropriate treatment. Given the vague and complex definitions of the sepsis syndrome, the often unknown sources of infection and the high mortality, a reliable and effective prognostic model for sepsis is needed. Such models can provide strong evidence for clinical decision-making and for the rational allocation of public health care resources.
Establishing prognostic models for patients with sepsis has long been an active topic in critical care medicine. Some sensitive serum markers, such as Ang-2, PCT, interleukin-6 and pentraxin 3 [1, 4, 5], have been widely used to support sepsis prognosis. However, their prognostic value is limited: they are rarely available and often lack sensitivity or specificity. Traditional prediction models built on small samples, such as logistic regression analysis, and scoring systems, including the Acute Physiology and Chronic Health Evaluation II (APACHE II) and the Simplified Acute Physiology Score II (SAPS II) [6–8], remain clinically important for identifying patients at risk of unfavourable outcomes. However, these methods and scores either rely on the statistical assumption of independent, linear relationships between explanatory and outcome variables or preclude the analysis of a large number of potentially valuable variables. In addition, these predictive serum markers, models and scores suffer to some extent from insufficient prognostic strength, wide fluctuation, poor stability and operability, tedious procedures and other shortcomings.
Recently, novel machine learning techniques have demonstrated better predictive performance than traditional prediction methods. Moreover, advances in statistical theory and computer technology, together with the establishment of specialized critical care databases such as MIMIC-III, could help machine learning gain more attention and recognition from clinicians. eXtreme Gradient Boosting (XGBoost) is a machine learning technique notable for handling missing data efficiently and flexibly and for assembling weak prediction models into an accurate one [9]. As an open-source package, XGBoost has been widely recognized in machine learning and data mining challenges. For example, 17 of the 29 challenge-winning solutions published on Kaggle’s blog in 2015 used XGBoost, and the top-10 winning teams in KDD Cup 2015 used XGBoost [10].
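The idea of assembling weak prediction models while tolerating missing values can be illustrated with a minimal sketch. The code below is not the XGBoost implementation: it is plain gradient boosting on squared error with one-split trees (stumps), and it only mimics the idea of learning a default branch for missing values. The two feature columns (lactate, urine output), the toy data and all function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing measurement
# (feature index, threshold, missing_goes_left, left value, right value)
Stump = Tuple[int, float, bool, float, float]


def goes_left(value: Optional[float], threshold: float, missing_left: bool) -> bool:
    """Route a value through a split; missing values follow the learned default."""
    return missing_left if value is None else value <= threshold


def fit_stump(X: List[Row], residuals: List[float]) -> Optional[Stump]:
    """Search every feature, threshold and default side for the lowest squared error."""
    best_sse, best = None, None
    for j in range(len(X[0])):
        thresholds = sorted({row[j] for row in X if row[j] is not None})
        for thr in thresholds:
            for missing_left in (True, False):
                left = [r for row, r in zip(X, residuals)
                        if goes_left(row[j], thr, missing_left)]
                right = [r for row, r in zip(X, residuals)
                         if not goes_left(row[j], thr, missing_left)]
                if not left or not right:
                    continue  # a split must send rows to both sides
                lm, rm = sum(left) / len(left), sum(right) / len(right)
                sse = (sum((r - lm) ** 2 for r in left)
                       + sum((r - rm) ** 2 for r in right))
                if best_sse is None or sse < best_sse:
                    best_sse, best = sse, (j, thr, missing_left, lm, rm)
    return best


def predict_stump(stump: Stump, row: Row) -> float:
    j, thr, missing_left, lm, rm = stump
    return lm if goes_left(row[j], thr, missing_left) else rm


def fit_boosting(X: List[Row], y: List[float], n_rounds: int = 20,
                 learning_rate: float = 0.3):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row: Row) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Made-up rows: [lactate, urine output]; label 1 = died within 30 days.
X_demo: List[Row] = [[1.0, 2.0], [1.2, None], [0.9, 1.8],
                     [4.0, 0.5], [None, 0.4], [5.1, 0.3]]
y_demo = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model_demo = fit_boosting(X_demo, y_demo)
print(predict(model_demo, [None, 0.2]), predict(model_demo, [1.1, None]))
```

Each stump alone is a weak predictor; the sum of many damped stumps fits the toy data closely, and rows with a missing value are still scored because every split carries a default direction.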
Therefore, the goal of this study was twofold. First, we aimed to compare the performance of a machine learning (XGBoost) model with that of traditional prediction models (a conventional logistic regression model and a SAPS II score model) in predicting 30-day mortality in MIMIC-III patients with sepsis-3. Second, we planned to plot a nomogram and a clinical impact curve (CIC) to validate the XGBoost model.
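Comparing prediction models on a binary outcome such as 30-day mortality is commonly done with the area under the ROC curve (AUC). As a minimal sketch, the function below computes the AUC as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor (ties count one half). The function name, the labels and the two sets of predicted risks are made up for illustration and are not results from this study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died within 30 days) and predicted risks.
outcomes = [0, 0, 1, 1, 0, 1]
risk_model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
risk_model_b = [0.30, 0.50, 0.35, 0.60, 0.45, 0.40]

print("AUC model A:", roc_auc(outcomes, risk_model_a))
print("AUC model B:", roc_auc(outcomes, risk_model_b))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a sketch; library implementations use a sorted, rank-based computation instead.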
Discussion
Sepsis, which is associated with high mortality and a substantial economic burden, is no longer defined simply as serious infection. In a systematic review and meta-analysis, Reinhart et al. [20] estimated the mortality of ICU-treated and hospital-treated sepsis patients at 41.9% and 26.7%, respectively; in other words, about one in four sepsis patients did not survive the hospital stay. Torio et al. [21] estimated that sepsis accounted for 6.2% of the aggregate costs of all hospitalizations, or 23.7 billion USD, in 2011. Furthermore, Moss et al. [22] conducted a study spanning two decades (1979 to 2000) and reported that the number of sepsis cases increased by about 8.7% annually. Improving sepsis prevention, recognition and treatment has been a global health priority since the declaration by the World Health Organization (WHO) in 2017 [23]. Progressive exacerbation of sepsis can lead to organ failure and death, but early aggressive therapy can forestall further progression and rescue a decompensating patient. Unfortunately, in the ICU it is very difficult for clinicians to predict which patients will respond favorably and recover and which will deteriorate despite all interventions and resuscitative efforts. These findings indicate an urgent need for reliable prediction models that identify patients with sepsis who are at increased risk of developing organ dysfunction and that prognosticate their mortality.
In the present study, the AUCs and DCAs demonstrated the benefit of using an XGBoost model, as opposed to classic logistic regression analysis and the traditional SAPS II scoring system, for early prediction of the probability of septic mortality. Moreover, a CIC and a nomogram were plotted to evaluate the clinical usefulness and net benefit of the model with the best diagnostic value. Logistic regression, one of the classic regression analyses, is widely used to test the association between sepsis and mortality. For instance, using logistic regression analysis, Vivien et al. [24] observed an association between mortality at day 28 and the tidal volume indexed on ideal body weight (VTIBW) in pre-hospital mechanically ventilated patients with septic shock; Wu et al. [25] revealed that dynamic changes in serum S100B levels from day 3 to day 1 were more strongly associated with mortality than the levels on day 1 in patients with sepsis; Oud et al. [26] indicated that sepsis was associated with most of the short-term deaths among ICU patients with SLE despite its relatively low mortality; and Song et al. [5] showed that a combined-biomarker approach performed well in predicting 28-day all-cause mortality among patients diagnosed with sepsis or septic shock according to the sepsis-3 definition, although the differences might not reach statistical significance. Furthermore, some studies [27, 28] found that conventional logistic regression performed relatively poorly as measured by the AUC of the ROC curve, or showed higher prediction error and worse performance than some novel techniques.
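For readers less familiar with decision curve analysis, the quantity plotted is the net benefit at a chosen risk threshold: true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below computes that standard formula, the "treat all" reference line, and the two counts usually shown on a clinical impact curve (patients flagged as high risk and flagged patients who actually had the event, per 1000). All function names and the toy data are assumptions made for illustration, not values from this study.

```python
from typing import Sequence, Tuple


def net_benefit(labels: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at risk threshold pt."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy that treats every patient as high risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def clinical_impact(labels: Sequence[int], probs: Sequence[float],
                    threshold: float, population: int = 1000) -> Tuple[float, float]:
    """Per `population` patients: number flagged high risk, and flagged with the event."""
    n = len(labels)
    flagged = sum(1 for p in probs if p >= threshold)
    flagged_events = sum(1 for y, p in zip(labels, probs)
                         if p >= threshold and y == 1)
    return population * flagged / n, population * flagged_events / n


# Hypothetical outcomes (1 = died) and predicted mortality risks.
died = [1, 1, 0, 0]
predicted_risk = [0.9, 0.6, 0.7, 0.2]
for pt in (0.2, 0.5, 0.8):
    print(pt, net_benefit(died, predicted_risk, pt),
          net_benefit_treat_all(died, pt),
          clinical_impact(died, predicted_risk, pt))
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the treat-all line and zero (treat none).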
Several conventional prognostic scoring systems have been developed to assess the hospital mortality of ICU patients. Such scoring systems have the advantage of being easy to calculate and interpret. SAPS II, one of the commonly used models, has better discrimination, calibration and power to predict ICU deaths than the sequential organ failure assessment (SOFA) score, which sepsis-3 recommends for the identification and mortality prognostication of ICU patients [7]. Moreover, SAPS II discriminates between survivors and non-survivors as well as the APACHE II score and other scores, and can even play a role in end-of-life decision-making in ICUs [8]. However, the specificity and sensitivity of scoring systems such as SAPS II are low, and their predictive performance is worse than that of multivariate predictive models. Last but not least, these evaluation systems and the accuracy of their results depend heavily on the practitioner’s experience [6].
In recent years, various machine learning algorithms have been investigated for the early detection of sepsis-3. Machine learning, a subset of artificial intelligence, is a data analysis technique that develops algorithms to predict outcomes by “learning” from data. Such algorithms can automatically analyze complex data and produce significant results, and they have outperformed conventional statistical methods. Four notable examples follow. Buchman et al. [29] concluded that machine learning-based CDS tools can accurately predict the onset of sepsis in an ICU patient 4–12 h before clinical recognition. Seymour et al. [30] applied several machine learning methods and suggested that 4 clinical phenotypes may help in understanding the heterogeneity of treatment effects in patients with sepsis. Kashyap et al. [31] used JMP statistical software to perform supervised machine learning for the identification of sepsis and septic shock and found it to be a reliable and efficient alternative to manual chart review. Winslow et al. [32] applied machine learning to features calculated from patients with sepsis to estimate whether a patient enters a pre-shock state. However, none of these studies verified the superiority of machine learning models over other types of prediction model, performed relevant further analyses or offered interpretation. More importantly, the primary outcome of these studies was the onset or detection of sepsis rather than poor clinical outcomes of sepsis (i.e. mortality). XGBoost, a decision-tree-based algorithm, has been reported to be the best-performing algorithm in machine learning and prediction competitions hosted by Kaggle.com [10, 33]. Owing to its precision and performance, XGBoost-based machine learning is increasingly emphasized as a competitive alternative to regression analysis and is used to predict adverse clinical outcomes.
Regarding the prognosis of sepsis, Yuan et al. [9] published an artificial intelligence algorithm based on XGBoost in 2020. Nevertheless, each of the two XGBoost studies has its own merits. First, Yuan et al. themselves noted several limitations of their study. For instance, the features were selected according to clinical experience rather than by an algorithm; the representativeness of the features in sepsis may be unclear, and some important dynamic features were not included; left or right censoring may have resulted from incomplete electronic medical records (EMR) when patients were transferred or discharged; in addition, the XGBoost model was not validated, and no traditional regression analysis was used as a control. Second, our model has some advantages over that of Yuan et al.: the features were selected by backward stepwise analysis, which increased representativeness and accuracy; important features such as lactate and AG were not missing; the data came from MIMIC-III, an up-to-date database that provides detailed information; in addition to the traditional scoring system, classic logistic regression analysis was compared with XGBoost using AUCs and DCAs; and, crucially, a nomogram and a CIC were plotted to evaluate the clinical usefulness and net benefit of the model. Third, both studies share some limitations: measurement bias in the calculations is possible because the method is based on expert opinion; and sepsis can occur at any time during an ICU admission (possibly even hours before it is labelled), so that, even with the help of an algorithm, it remains difficult for intensivists to continuously integrate point-of-care vital signs, the latest laboratory reports and other data, and to determine from any database whether a patient has sepsis.
An interesting finding of our study is that the features included in the XGBoost-based model and in the logistic regression model were consistent, which suggests that the excellent performance of the XGBoost model is meaningful, although the two models may fit and perform differently on different datasets. However, the relationships between these features and sepsis-induced mortality cannot be entirely explained. Hence, further studies are needed to investigate the mechanisms underlying the role of these variables in patients with sepsis-3. The following is a brief summary of the notable or controversial features included in the XGBoost model. Among these features, urine output had the greatest weight, which indicates that it was the most important predictor of 30-day mortality in MIMIC-III patients. This result is consistent with several clinical studies. Vieira et al. [34] reported that higher urine output is associated with successful enteral nutrition therapy in septic shock patients. Laranja et al. [35] concluded that septic patients without acute kidney injury (AKI) had better preserved urine output than all groups with AKI or AKI/chronic kidney disease (CKD). Lin et al. [36] indicated that decreased urine output may reflect a compensatory mechanism to maintain intravascular volume and may also imply intrinsic renal injury in patients with sepsis. Teixeira et al. [37] confirmed that diuretic use was inversely associated with mortality and may itself exert a protective effect. Sodium_max is an interesting feature in our XGBoost-based model. Hypernatremia can be an independent predictor of poor outcome in septic patients in the ICU, which agrees with previous reports [38]. However, another study [39] showed that the risk of death increased by 71.6% when serum sodium was < 129 mmol/L in patients with sepsis. Lactate and AG are typical metabolic indicators. Life-threatening sepsis should not be excluded on the basis of a normal lactate level alone, and patients with high AG levels, regardless of lactate levels, have high mortality and should also be considered for early, aggressive therapy [40]. However, Liu et al. and Velissaris et al. [41, 42] clearly pointed out that plasma lactate was associated with poor outcomes and predicted mortality in patients with sepsis. INR is another crucial predictor in the machine learning model. Several studies [43, 44] found that septic patients with elevated INR and platelet count appeared to have a greater risk of death than those without coagulopathy. Age and metastatic cancer, as basic patient information, clearly belong in the model because they adversely affect mortality. Nevertheless, survival in critically ill cancer patients with sepsis has improved significantly over time, although the reasons or mechanisms have not been identified [45]. Regarding the source of infection, blood infection ranked highest (38.49%), followed by MRSA screen (35.49%) and urine (17.36%). This suggests that empirical antibiotic treatment can be given, but de-escalation, decisions on whether to stop antibiotics, and successful implementation of antimicrobial stewardship may help improve a patient's clinical prognosis while preventing adverse outcomes [46].
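The ranking of features by weight discussed above can be illustrated with a small sketch. It assumes that "weight" means split-count importance, that is, how often a feature is used to split across all trees of the ensemble (the XGBoost library offers an importance type named "weight" with this meaning); whether this study used exactly that measure is an assumption. The tree contents and feature names below are made up and do not reproduce the fitted model.

```python
from collections import Counter
from typing import List, Tuple


def split_count_importance(trees: List[List[str]]) -> List[Tuple[str, float]]:
    """Rank features by the share of split nodes that use them.

    Each tree is represented only by the list of feature names
    appearing at its split nodes (a simplification for illustration).
    """
    counts = Counter(feature for tree in trees for feature in tree)
    total = sum(counts.values())
    return [(feature, count / total) for feature, count in counts.most_common()]


# Hypothetical ensemble of three small trees.
toy_trees = [
    ["urine_output", "lactate", "urine_output"],
    ["sodium_max", "urine_output"],
    ["inr", "lactate"],
]

for name, share in split_count_importance(toy_trees):
    print(f"{name}: {share:.2f}")
```

Split counts say how often a variable is used, not how strongly or in which direction it affects the predicted risk, which is one reason the clinical interpretation of each feature still needs separate study.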
The main strength of this study is that it is the first to predict the 30-day mortality of MIMIC-III patients with sepsis-3 using an XGBoost model, to compare that model with traditional regression analysis and a clinical scoring system, and to verify it with a nomogram and a CIC. We must also acknowledge some further limitations of our study. First, because the data came from a single database and the majority of patients were white, potential bias may exist. Second, the database was not explored further, which may have led to the omission of some key variables. Third, the proposed model was not designed to be validated on a separate data set drawn from the database or on our own clinical data. Even so, we believe that the proposed model may further our understanding of the prognosis of ICU patients with sepsis.