Introduction
Heart failure (HF) is a disorder in which systolic or diastolic dysfunction of the heart is attributed to structural or functional cardiac abnormalities [1]. HF is the final stage of various heart diseases [2]. Approximately 40 million people worldwide suffer from HF [3]. In Europe, the incidence of HF is currently approximately 3/1000 person-years (all ages) or approximately 5/1000 person-years in adults [4, 5]. In the United States, more than 5 million people are living with HF, and about 550,000 new cases are diagnosed each year [6, 7]. China has about 8.9 million HF patients, and the prevalence among people over 35 years old is 1.3% [8]. The prevalence of HF increases with age, from about 1% in those < 55 years old to > 10% in those 70 years or older [9–11]. As a component of cardiovascular disease and a major public health problem worldwide, HF is an important cause of rising global mortality [12]. The annual direct and indirect costs of HF are estimated at $29 billion owing to its high prevalence, poor prognosis, and high readmission rates [13]. In clinical practice, simple yet effective tools play a key role in predicting future events, especially in decisions about primary prevention and treatment of HF patients. Effective mortality prediction can therefore help doctors formulate better-founded treatment plans to prevent deterioration, thereby improving quality of life and reducing medical expenses.
A nomogram is a graphical device that integrates predictors to determine the probability of a clinical event occurring in a given patient [14]. The nomogram used here is based on a logistic regression (LR) model: it combines multiple clinical predictors and displays each predictor’s contribution as a score, from which an individual patient’s risk of a clinical event can be predicted, helping clinicians to optimize individualized treatment choices and assess treatment outcomes [15–21].
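To make the link between an LR model and a nomogram concrete, the following minimal Python sketch converts LR coefficients into nomogram-style points and a predicted probability. The intercept, coefficients, and axis ranges are hypothetical placeholders chosen only for illustration; they are not the fitted values of this study, and the study itself used R rather than Python.

```python
import math

# Hypothetical values for illustration only; NOT the fitted model of this study.
INTERCEPT = -4.0
COEFS = {"age": 0.03, "lactate": 0.25}               # assumed log-odds per unit
RANGES = {"age": (18, 90), "lactate": (0.5, 10.0)}   # assumed axis ranges


def predicted_risk(patient):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    lp = INTERCEPT + sum(COEFS[k] * patient[k] for k in COEFS)
    return 1.0 / (1.0 + math.exp(-lp))


def nomogram_points(patient):
    """Rescale each predictor's contribution so that the predictor with the
    largest effect over its range spans 0 to 100 points."""
    spans = {k: abs(COEFS[k]) * (RANGES[k][1] - RANGES[k][0]) for k in COEFS}
    max_span = max(spans.values())
    points = {}
    for k in COEFS:
        low, high = RANGES[k]
        # Zero points are assigned at the end of the axis with the lowest risk.
        reference = low if COEFS[k] >= 0 else high
        points[k] = 100.0 * COEFS[k] * (patient[k] - reference) / max_span
    return points


example = {"age": 70, "lactate": 2.0}
print(nomogram_points(example), round(predicted_risk(example), 3))
```

The total of the points is a linear rescaling of the linear predictor, which is why a nomogram can map the summed score directly onto a probability axis.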
The aim of this study was to develop and validate robust risk assessment models to predict in-hospital all-cause mortality in HF patients, and to build a nomogram and an online risk assessment system based on these models.
Methods
Data source
HF patient data were obtained from the Medical Information Mart for Intensive Care (MIMIC) III and MIMIC-IV databases. MIMIC-III contains data associated with 53,423 distinct hospital admissions for adult patients (aged 16 years or above) admitted to critical care units between 2001 and 2012. In addition, it contains data for 7870 neonates admitted between 2001 and 2008. The data cover 38,597 distinct adult patients and 49,785 hospital admissions [22]. The MIMIC-IV database covers all patients at Beth Israel Deaconess Medical Center, with 523,740 admissions recorded between 2008 and 2019, of which 76,540 involved admission to the ICU [23]. The MIMIC databases record clinical data including demographic data, vital signs, laboratory test results, microbiological culture results, imaging data, treatment regimens, medication records, and survival information. Use of the MIMIC database has been approved by the review boards of the Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology (MIT). We were granted access after applying and completing the required course and examination (Record Nos. 44703031 and 44703032). Informed consent was not required because all patient information in the database is anonymized [24, 25].
Patient enrollment and data collection
Data were extracted using Structured Query Language (SQL) in Navicat Premium (version 15.0.12). International Classification of Diseases, Ninth and Tenth Revision (ICD-9/10) codes were used to identify all patients hospitalized for congestive HF. The exclusion criteria were: 1) age younger than 18 years or older than 90 years; 2) more than 20% missing data. Patients older than 90 years are assigned an age of 300 years in MIMIC-III and 91 years in MIMIC-IV, so their actual age is unknown. The MIMIC-IV data were used as the training cohort for model building, and the MIMIC-III data were used to validate the model.
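The exclusion criteria can be sketched as a simple filter. The Python function below is a minimal illustration, assuming each patient record is a dictionary of variables in which a missing value is stored as None; the function name and data layout are hypothetical, and the actual extraction in this study was done in SQL.

```python
def is_eligible(age_years, variables, max_missing_fraction=0.20):
    """Return True if a patient passes both exclusion criteria.

    age_years: recorded age; masked ages (300 in MIMIC-III, 91 in MIMIC-IV)
               fall above 90 and are therefore excluded automatically.
    variables: dict of variable name -> value, with None marking a missing value
               (an assumed layout, for illustration only).
    """
    if age_years < 18 or age_years > 90:
        return False
    if not variables:
        return False  # nothing recorded at all
    missing = sum(1 for value in variables.values() if value is None)
    # Patients with MORE than 20% missing data are excluded.
    return missing / len(variables) <= max_missing_fraction


print(is_eligible(72, {"hr": 88, "rr": 20, "sbp": 110, "ph": 7.38, "lactate": None}))
print(is_eligible(300, {"hr": 88}))
```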
After identifying eligible subjects, we collected clinical data including demographics, comorbidities, vital signs, and laboratory parameters. Comorbidities included atrial fibrillation (AF), previous myocardial infarction (p-MI), type 2 diabetes mellitus (T2DM), hypertension, ventricular arrhythmias (VA), and acute kidney injury (AKI). Vital signs were taken from the first recorded results at the time of hospitalization and included heart rate (HR), respiratory rate (RR), temperature (T), systolic blood pressure (SBP), diastolic blood pressure (DBP), and mean arterial pressure (MAP). Laboratory parameters were likewise taken from the first measurement after admission. The indicators studied were red blood cells (RBC), white blood cells (WBC), platelets, hemoglobin, hematocrit, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), albumin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TB), alkaline phosphatase (AP), blood urea nitrogen (BUN), creatinine, glucose, lactate, total carbon dioxide (T-CO2), arterial partial pressure of oxygen (PaO2), arterial partial pressure of carbon dioxide (PaCO2), arterial oxygen saturation (SaO2), pH, anion gap (AG), base excess (BE), bicarbonate, potassium, sodium, chloride, total calcium (T-calcium), phosphorus, magnesium, activated partial thromboplastin time (APTT), prothrombin time (PT), and international normalized ratio (INR).
The diagnosis of AKI was based on the latest international clinical practice guidelines for AKI [26]. AKI was diagnosed when any of the following three criteria was met: (a) an increase in creatinine of ≥ 0.3 mg/dl (≥ 26.5 μmol/L) within 48 h; (b) an increase in creatinine to ≥ 1.5 times baseline, known or presumed to have occurred within the prior 7 days; (c) urine volume < 0.5 ml/kg/h for 6 h. Patients with chronic kidney disease (CKD) stage 5 were not classified as having AKI even if they met the above criteria. In-hospital AKI diagnoses can also be obtained directly through the officially provided view code. Hospital admissions with ICD codes documenting paroxysmal ventricular tachycardia, ventricular flutter, or ventricular fibrillation were flagged as VA.
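The three AKI criteria and the CKD stage 5 exception can be written as a short rule. The Python sketch below is illustrative only: the function name and argument names are hypothetical, the inputs are assumed to have been precomputed from the creatinine and urine output records, and it is not the derived-view code distributed with MIMIC.

```python
def meets_aki_criteria(scr_rise_48h_mg_dl=None,
                       scr_ratio_to_baseline_7d=None,
                       urine_ml_per_kg_h_over_6h=None,
                       ckd_stage=None):
    """Return True if any of the three criteria is met.

    Arguments left as None are treated as not measured (an assumption made
    here for illustration). Patients with CKD stage 5 are never flagged.
    """
    if ckd_stage == 5:
        return False
    # (a) creatinine increase of at least 0.3 mg/dl within 48 h
    crit_a = scr_rise_48h_mg_dl is not None and scr_rise_48h_mg_dl >= 0.3
    # (b) creatinine at least 1.5 times baseline within the prior 7 days
    crit_b = scr_ratio_to_baseline_7d is not None and scr_ratio_to_baseline_7d >= 1.5
    # (c) urine output below 0.5 ml/kg/h sustained for 6 h
    crit_c = urine_ml_per_kg_h_over_6h is not None and urine_ml_per_kg_h_over_6h < 0.5
    return crit_a or crit_b or crit_c


print(meets_aki_criteria(scr_rise_48h_mg_dl=0.4))
print(meets_aki_criteria(scr_ratio_to_baseline_7d=2.0, ckd_stage=5))
```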
Model construction and evaluation
LR models were used for model construction, and a nomogram was used to visualize the regression model [27]. Calibration curves were used as one of the evaluation indicators to assess the goodness of fit of the model [28]. Decision curve analysis (DCA) demonstrates the net benefit of an intervention by estimating the clinical utility of a predictive model across threshold probabilities; the threshold probability is the probability at which a physician or patient would opt for a medical intervention, that is, the point at which the expected harm of a false-positive intervention balances the expected harm of a false-negative non-intervention [29, 30]. Once the model was established, data from the test cohort and the validation cohort were used to further evaluate its performance. The area under the receiver operating characteristic curve (AUC) and precision-recall curves were used to compare the performance of each model. We also calculated the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) to evaluate the improvement offered by the new models [31, 32].
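Two of the evaluation measures can be illustrated with a few lines of Python. The sketch below computes the AUC as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, and the DCA net benefit using the standard formula TP/n − (FP/n) × pt/(1 − pt). The function names and the four-patient toy data are assumptions for illustration, not results of this study.

```python
def auc(y_true, y_score):
    """AUC as the fraction of (event, non-event) pairs ranked correctly;
    ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.
    The threshold must lie strictly between 0 and 1."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = in-hospital death, 0 = survival.
outcomes = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]
print(auc(outcomes, risks))
print(round(net_benefit(outcomes, risks, 0.30), 4))
```

A decision curve is obtained by evaluating the net benefit over a range of thresholds and comparing the model with the strategies of intervening in all patients or in none.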
Study endpoint
The endpoint was in-hospital all-cause mortality; patients whose date of death coincided with the date of discharge or was less than 12 h from the date of discharge were defined as having experienced in-hospital death.
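The endpoint definition can be expressed as a small rule on two timestamps. The following Python sketch is a minimal illustration; the function name is hypothetical, and treating the 12 h window as an absolute time difference in either direction is an assumption of this example.

```python
from datetime import datetime, timedelta


def died_in_hospital(death_time, discharge_time, window_hours=12):
    """Return True if the death counts as in-hospital under the study endpoint.

    death_time is None for patients with no recorded death.
    """
    if death_time is None:
        return False
    # Death recorded on the same calendar date as discharge.
    if death_time.date() == discharge_time.date():
        return True
    # Death within the 12 h window around discharge (assumed symmetric).
    return abs(death_time - discharge_time) < timedelta(hours=window_hours)


print(died_in_hospital(datetime(2020, 1, 2, 1, 0), datetime(2020, 1, 1, 20, 0)))
print(died_in_hospital(datetime(2020, 1, 3, 9, 0), datetime(2020, 1, 1, 20, 0)))
```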
Statistical analysis
During the data collection phase, every laboratory test result from the patient’s hospitalization was collected and assembled into a single large raw table. All variables were then collated. To avoid excessive bias, variables with less than 20% missing values were filled in by multiple imputation using the R package “mice”. The proportions of missing values for all continuous variables before imputation are shown in Supplementary Table 1 and Supplementary Figs. 1 and 2. Finally, results were sorted chronologically and only the patient’s first laboratory examination was retained for the subsequent study.
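Retaining only the first laboratory result per patient can be sketched as follows. This Python example assumes the raw table is a list of (admission_id, variable, charttime, value) rows with ISO-formatted timestamps, which sort correctly as strings; the row layout, identifiers, and values are hypothetical, and the imputation step performed with “mice” in R is not reproduced here.

```python
def first_measurements(rows):
    """Keep the chronologically first value of each variable per admission.

    rows: iterable of (admission_id, variable, charttime, value) tuples,
          in any order (an assumed layout, for illustration only).
    Returns a dict mapping (admission_id, variable) -> first value.
    """
    first = {}
    for admission_id, variable, charttime, value in rows:
        key = (admission_id, variable)
        if key not in first or charttime < first[key][0]:
            first[key] = (charttime, value)
    return {key: value for key, (charttime, value) in first.items()}


raw_table = [
    (1001, "lactate", "2150-03-02T10:00", 3.1),
    (1001, "lactate", "2150-03-01T08:30", 4.2),  # earliest lactate
    (1001, "albumin", "2150-03-01T09:00", 3.0),
]
print(first_measurements(raw_table))
```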
Categorical variables were described by frequencies and percentages, and differences between groups were assessed with the chi-square test or Fisher’s exact test. Continuous variables were expressed as mean ± standard deviation or median and interquartile range (IQR), and groups were compared using Student’s t-test or the Mann–Whitney U test.
Univariate LR analyses were performed first, and variables with P < 0.05 were selected for multivariate LR analysis. Variables that retained an independent effect on outcome after multivariate adjustment were kept, and those with a P-value less than 0.001 were used as predictor variables to develop the model.
The first LR model (LR1) incorporated all continuous variables whose P-values remained less than 0.001 after multivariate adjustment. Predictors of the LR1 model included age, RR, PaO2, platelet count, albumin, TB, AP, lactate, pH, BE, and phosphorus.
A second LR model (LR2) was developed by adding the variables AKI and VA to the LR1 model. R software (version 4.2.1) was used for statistical analysis; GraphPad Prism (version 8.3.0) was used to draw graphs; and P < 0.05 was considered statistically significant.
Discussion
The main findings of the current study are as follows: 1) the in-hospital mortality rate of HF patients in the MIMIC database was 13.5%; 2) nomogram models were used to assess the risk of in-hospital all-cause mortality in HF patients and showed good predictive efficacy for early risk assessment, with an AUC of 0.782 in the training cohort and 0.766 in the test cohort for the LR2 model; 3) age, albumin, sodium, bicarbonate, lactate, magnesium, phosphate, platelets, AG, T-CO2, MCV, HR, PaO2, AP, BE, RBC, RR, TB, WBC, pH, AKI, and VA were independent factors influencing in-hospital mortality in HF patients.
The risk of in-hospital all-cause mortality in HF patients in previous studies ranged from 2.86% to 14.5% [34–38]. By comparison, the in-hospital all-cause mortality of HF patients in this study was relatively high. The possible reasons are as follows: (A) the median age was high (> 65 years) in both the MIMIC-III and MIMIC-IV cohorts, suggesting that our study population may have had more underlying disease and poorer organ function [39]; (B) HF, as an end-stage outcome of cardiac disease, has a very poor prognosis, and because our study population came from intensive care units, its mortality rate is naturally higher than that of HF patients in general wards [40, 41]. Early identification is therefore very important, as it helps clinicians take preventive measures in advance.
In this study, we found markers that are not specific to heart disease but are good predictors of patient prognosis. Bicarbonate was most often elevated in patients with more severe HF [42], which makes it a warning marker of severe HF. One study noted that serum magnesium levels less than or equal to 2 mEq/L were associated with increased cardiovascular mortality [43]. However, systematic reviews and meta-analyses have also shown that elevated blood magnesium is associated with an increased risk of cardiovascular (CV) mortality and all-cause mortality [44, 45]. Studies by Guo W and Nakano H suggested that abnormalities in BE increase the risk of all-cause mortality [46, 47]. Unexpectedly, elevated serum phosphorus is associated with increased morbidity and mortality even when renal function is normal [48, 49]. AP abnormalities in HF patients are associated with significant signs of systemic congestion and elevated right-sided filling pressures [50], which raises the question of whether AP could provide a new marker for the diagnosis of HF. Elevated bilirubin levels were significantly associated with the risk of death from pump failure [51, 52], suggesting that clinicians should pay more attention to bilirubin levels in HF patients and consider therapeutic measures as early as possible. Overall, our results are broadly in line with previously reported findings.
Using the LR model, the predicted risk in the derivation population was categorized as < 10%, 10–30%, and > 30%, defined as the low-, medium-, and high-risk categories, respectively. Risk stratification was also performed in the external validation dataset. These results document the feasibility of using the LR model to distinguish high-risk patients from the rest of the population, and the risk probability estimated for each patient can inform and support clinical decision making. However, there were some deaths in the low-risk stratum and some survivors in the high-risk stratum. We suspect that these exceptions may reflect different phenotypes of HF patients within the risk strata. HF involves multiple pathophysiologic mechanisms, which may lead to clinically heterogeneous phenotypes [53]. For example, unsupervised clustering analysis based on machine learning has been used to differentiate phenotypes of patients with heart failure with preserved ejection fraction (HFpEF) [54]. In future studies, we may therefore use other methods for further analysis and perform experimental validation.
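The three risk categories described above amount to a simple mapping from predicted probability to stratum. A minimal Python sketch follows; the function name is hypothetical, and placing the boundary values 10% and 30% in the medium-risk category follows the stated cut-offs (< 10%, 10–30%, > 30%).

```python
def risk_category(probability):
    """Map a predicted in-hospital mortality probability (0 to 1) to a stratum."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < 0.10:
        return "low"
    if probability <= 0.30:
        return "medium"
    return "high"


for p in (0.05, 0.10, 0.30, 0.45):
    print(p, risk_category(p))
```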
Peng S et al. used machine learning to develop a clinical prediction model for 28-day all-cause in-hospital mortality in critically ill patients with heart failure combined with hypertension; the Neural Network (NN) performed best, with an AUC of 0.764 [55]. Li J et al. developed several machine learning models and found that the XGBoost and LR models performed excellently [56]. The logistic regression model was effective in improving the accuracy of risk stratification for in-hospital mortality in patients with HF. However, the sample size of that study was relatively small and the model included many variables, which is not conducive to clinical generalization. With the development of concepts such as real-world research and precision therapy, researchers increasingly need to process medical big data. We therefore sought to develop a more robust predictive model for the risk of in-hospital death in heart failure using a larger sample.
We chose to develop a new model rather than validate an existing one because the variables included in previously developed models are not fully accessible in our data. Examples include the H2FPEF and HFA-PEFF scores for the diagnosis of heart failure with preserved ejection fraction reported by Ouwerkerk W et al. [57], and the more commonly used Meta-analysis Global Group in Chronic Heart Failure (MAGGIC) score [58]. Both performed well, but the former require cardiac ultrasound data, and the latter requires body mass index (BMI), New York Heart Association (NYHA) classification, and other metrics not available from the MIMIC database. We therefore abandoned validation of existing models in favor of developing a new one.
This study used MIMIC, a high-quality database with a large sample size. There are several advantages to using this database. First, it is one of the few critical care databases that is freely accessible. Second, the dataset spans more than a decade and contains a wealth of detailed information about patient care. Third, once data use agreements are accepted, there are no restrictions on analysis by researchers, enabling clinical research and education around the world. Finally, data can be downloaded from multiple sources [22].
There are several limitations in the current study. First, although internal validation of the model yielded good discrimination and excellent calibration, the data came from public databases. The generalizability of the nomogram therefore still needs to be externally validated using data from other medical centers, and further training in prospective studies could improve its predictive performance and stability. Second, although nomograms have been widely used in clinical practice to assist in medical decision making, we would like to further simplify the model and expand its usage scenarios. Finally, the model might be improved by incorporating imaging data, such as cardiac ultrasound, electrocardiogram, and other parameters, or circulating biomarkers that are more predictive.