Introduction
As the unstable and progressive stage of coronary heart disease (CHD), acute coronary syndrome (ACS) includes three serious and life-threatening clinical manifestations: ST-segment elevation myocardial infarction (STEMI), non-STEMI, and unstable angina pectoris [
1,
2]. The prognosis of ACS patients varies considerably because the underlying pathophysiological changes differ between individuals according to disease severity. Thus, a formal assessment to identify high-risk patients is essential in the management of ACS [
3].
Currently, the Global Registry of Acute Coronary Events (GRACE) is the most commonly used risk assessment tool and is recommended by guidelines for predicting short- and long-term mortality [
4]. However, the GRACE risk score was developed in North America, South America, and Europe and included few participants from Asia [
5]. The clinical performance of this risk score has not been assessed in the Chinese population. A risk tool derived by the Clinical Pathways for Acute Coronary Syndromes (CPACS) investigators for Chinese patients with ACS has been described previously [
6]. However, this CPACS risk score predicted only in-hospital mortality and did not use algorithms to avoid overfitting in the model estimation. Therefore, we aimed to develop a specific risk model for the prediction of long-term (3-year) mortality for Chinese ACS patients in a hospital-based dataset.
Discussion
In this study, we developed a risk model to predict the long-term mortality in Chinese ACS patients and performed internal validation of this model. Compared to the GRACE risk score, our risk model demonstrated better discriminative ability, improved calibration and a greater net benefit for clinical performance. Furthermore, we used machine learning methods such as random forest imputation and a penalty algorithm to maintain statistical power and avoid overfitting during model derivation. To the best of our knowledge, this is the first prediction model for long-term mortality in Chinese ACS patients.
The predictors selected by this risk model include “Age,” “Creatinine,” “Hemoglobin,” “Platelets,” “AST,” and “LVEF”. These risk factors are supported by existing theories and research. Older patients are usually frailer and have more comorbidities. Many studies have considered age to be an independent predictor of prognosis in ACS, and studies focusing on other risk factors usually need to adjust for age [
7]. Creatinine or eGFR levels are thought to be associated with mid- and long-term mortality in ACS patients, and ACS patients with renal insufficiency are more likely to experience bleeding and other complications when given invasive treatment [
8,
9]. In previous studies, baseline hemoglobin levels or anemia status were predictors of 30-day and 1-year mortality in patients with ACS or STEMI [
10], while hemoglobin levels of 14–16 g/dl were associated with the lowest risk of death [
11]. Studies have reported that AST is associated with microvascular obstruction in ACS patients, and its predictive value is even better than that of NT-proBNP [
12]. A meta-analysis of 8 studies indicated that high baseline platelet levels were associated with increased short-term and long-term mortality in ACS patients [
13], which may be related to the pathological basis of coronary heart disease involving the platelet-granulocytic system and acute pathogenesis of ACS involving intravascular inflammatory mechanisms [
14]. Finally, LVEF is considered a marker of cardiac function in heart disease, and the guidelines also recommend ultrasound or angiography for NSTE-ACS patients to evaluate left ventricular function [
3]. Low baseline LVEF is a predictor of mortality and MACE in ACS patients [
15].
The GRACE risk score was developed based on 123 hospitals in 14 countries but only involved a small number of Chinese patients [
4]. Most of the related studies on risk assessment of Chinese ACS patients investigated the domestic optimization or application of the GRACE risk score. Previous CPACS studies have reported only in-hospital mortality [
6]. Therefore, there is no long-term risk prediction tool for the Chinese ACS population. This study is the first attempt to fill this gap, and several new machine learning algorithms were used to improve the accuracy of the model.
This prediction model was established according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [
16], and we also followed the ABCD approach to validation proposed by Ewout W [
17]. Additionally, we did not use conventional multiple imputation to handle missing values but applied a novel random forest algorithm. The random forest algorithm has been demonstrated to be an efficient method for handling missing data. It can manage different types of missing data and can scale to high dimensions [
18]. Several different imputation algorithms based on random forests have been developed, and among them, MissForest was found to offer a noticeable improvement in performance compared to other methods such as the k-nearest neighbors and parametric MICE methods [
19]. In this dataset, random forest imputation had a higher statistical power and better accuracy for prediction than complete-case analyses (AUC of the ROC for tenfold cross-validation, 0.744).
In the model derivation, we used the LASSO-Cox method to estimate the relationship between the predictors and time to event. LASSO regularization is a method to manage overfitting and perform variable selection and has been widely used in many types of machine learning algorithms [
20]. It adds the L1 norm of coefficients as the penalty term to the loss function and hence adds constraints to the coefficients. In contrast to ridge regularization, LASSO regularization performs different degrees of shrinkage on variables and pushes some coefficients to zero. When adding the LASSO method to the Cox model, the estimation variance is reduced, and a subset of predictors is selected while providing an interpretable Cox model [
21]. To ensure the accuracy of the model, we did not use a nomogram to simplify the parameters in the model presentation but instead estimated each patient’s risk of death from the cumulative hazard of the Cox model. This model showed good discrimination (AUC of the ROC for tenfold cross-validation and C-statistic) for patients who died within 3 years and good agreement (calibration slope and plot) between the actual and predicted 3-year mortality risk. The clinical usefulness of this model mainly lies in its ability to quantify the long-term mortality of patients at an individual level by combining baseline data available before angiography. DCA can be used to evaluate whether our model is more advantageous for clinical applications than the GRACE model, which is currently widely used in clinical research [
22]. This method can help physicians assess the value of the information provided by a risk assessment tool or test by weighing the potential risks and benefits [
23]. For all risk thresholds > 0%, our model showed a higher net benefit than the GRACE model. Therefore, we believe that our model can better help patients understand the disease and help doctors make clinical decisions. Particularly for patients with a high risk of ACS, doctors can use this model to assess whether patients can benefit from treatment.
Our study has some limitations. First, the present study lacked external validation. In addition, owing to the limited sample size, there were relatively few death events in this dataset. However, machine learning and penalty algorithms were applied carefully to ensure the accuracy of the model and prevent overfitting.
Methods
Study population
The data source for this investigation was the West China Hospital CHD database. This single-center database prospectively includes all CHD or high-risk patients undergoing angiography in West China Hospital, affiliated with Sichuan University. For this analysis, we enrolled consecutive CHD patients from January 2009 to September 2012 who were included in the database. ACS patients were eligible for inclusion if they had (1) angiographic evidence of ≥ 50% stenosis in ≥ 1 coronary vessel; (2) ischemic chest discomfort that increased or occurred at rest; and/or (3) electrocardiography or cardiac biomarker criteria consistent with ACS. The exclusion criteria were malignancies, pregnancy, end-stage renal disease, and severe liver or hematological diseases. These inclusion and exclusion criteria were met by 2406 consecutive CHD patients enrolled from the database. After excluding patients who were lost to follow-up (n = 192) or had extensive missing data (n = 40), 2174 patients were included in the data analysis. The study protocol was approved by the local institutional review boards in accordance with the Declaration of Helsinki. All subjects provided written informed consent before enrolment.
Baseline characteristics
Demographic data, medical history, cardiovascular risk factors, vital signs at admission, medication at discharge, and the final diagnosis were obtained from the patients’ electronic medical records and reviewed by a trained study coordinator. Blood samples were collected at admission and before angiography, and plasma biomarkers including Fib, liver and kidney function, blood glucose, and serum lipids were analyzed in the Department of Laboratory Medicine, West China Hospital, which is accredited by the College of American Pathologists. Elevated myocardial enzymes were defined as cardiac troponin T or creatine kinase-MB above the upper limit of the laboratory reference values. Hypertension was defined as systolic blood pressure (SBP) ≥ 140 mm Hg, diastolic blood pressure (DBP) ≥ 90 mm Hg and/or receipt of antihypertensive medications. Diabetes mellitus was diagnosed in patients who had previously undergone dietary treatment for diabetes, had received additional oral antidiabetic or insulin medications, or had a current fasting blood glucose level of ≥ 7.0 mmol/L or a random blood glucose level of ≥ 11.1 mmol/L. The GRACE risk prediction tool used for analysis of mortality has been described previously [
4]. The calculation of the GRACE risk score was performed using an online program (
http://www.outcomes-umassmed.org/grace).
Follow-up and study outcome
The follow-up period ended in January 2013. Follow-up information was collected through contact with the patients’ physicians, the patients, or their families. All data were corroborated with the hospital records. The primary endpoint of this study was all-cause mortality, and the secondary endpoint was cardiovascular death, as documented in the database. Death was classified as cardiovascular when it was caused by acute myocardial infarction (MI), significant arrhythmias, or refractory heart failure. Sudden unexpected death occurring without another explanation was also considered cardiovascular death.
Statistical analyses
Baseline demographics and clinical characteristics were compared between non-surviving patients and survivors. Continuous variables are expressed as the mean ± standard deviation (SD), and categorical variables are reported as counts and percentages. T-tests and chi-squared tests were used to evaluate differences between groups for continuous and categorical variables, respectively. The Kaplan–Meier method was used to calculate the cumulative event rate during the follow-up period.
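The Kaplan–Meier calculation mentioned above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how the cumulative event rate is obtained from the product-limit estimate.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return the distinct event times and the product-limit survival estimate at each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])  # sorted times at which at least one death occurred
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)              # patients still under observation just before t
        deaths = np.sum((time == t) & event)     # deaths observed exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical follow-up times (years) and death indicators (1 = died, 0 = censored)
times = [0.5, 1.0, 1.0, 2.0, 3.0]
events = [1, 0, 1, 0, 1]
event_times, survival = kaplan_meier(times, events)
cumulative_event_rate = 1.0 - survival  # cumulative mortality at each event time
```

In practice, a library estimator such as the one in “lifelines” would normally be preferred, since it also provides confidence intervals.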
Missing data
To avoid loss of statistical power, all missing data among baseline characteristics were assumed to be missing at random and imputed using a random forest-based imputation method [
18,
24]. Specifically, the “MissForest” method was used. MissForest handles missing data by iteratively fitting random forests. It starts with the candidate column, which is the column with the fewest missing values. The missing values in the remaining columns are first filled by mean imputation so that these columns can serve as predictors in a random forest model with the candidate column as the output. The missing values of the candidate column are then imputed with the predictions of the fitted random forest. This process is repeated for each of the remaining columns, and the whole cycle is iterated until a stopping criterion is met.
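The iterative procedure described above can be approximated with a short sketch. It is an assumption that scikit-learn's `IterativeImputer` combined with a `RandomForestRegressor` is an acceptable stand-in here; it follows the same idea (mean initialization, columns visited from fewest to most missing values, repeated rounds) but is not the MissForest implementation used in the study, and it treats every column as continuous. The synthetic data are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Hypothetical continuous baseline data with about 10% of values set to missing
rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 4))
X_full[:, 3] = X_full[:, 0] + 0.1 * rng.normal(size=200)  # one column correlated with another
X_missing = X_full.copy()
mask = rng.random(X_missing.shape) < 0.1
X_missing[mask] = np.nan

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    initial_strategy="mean",       # start from mean imputation
    imputation_order="ascending",  # visit columns from fewest to most missing values
    max_iter=5,                    # number of imputation rounds (stopping criterion)
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
```

Categorical predictors would need separate handling (for example, a classifier-based imputation), which this sketch does not attempt.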
Model derivation and validation
The development and validation of this risk model followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [
16]. The independent predictors of 3-year mortality were identified among baseline characteristics using a Cox proportional-hazards regression model. The proportional hazard assumption was verified using the Schoenfeld residuals method.
When performing model estimation, the LASSO method was applied to avoid overfitting, and the penalty parameter was selected by cross-validation [
20,
21]. In accordance with the design of our study, only clinical characteristics recorded before the intervention were entered into the LASSO path for feature selection. The LASSO method is a shrinkage regression technique that uses L1 regularization and is designed for high-dimensional data. Furthermore, this algorithm shrinks the coefficients of noninfluential predictors to zero and thus excludes them from the final model. This technique has been widely used in both machine learning and clinical research. The estimated risk of mortality of a given patient was calculated from the cumulative hazard function of the Cox model as follows:
$$H(t \mid x_{\alpha}) = \exp(x_{\alpha}^{T}\beta)\,H_{0}(t).$$
In this equation, \(H_{0}(t)\) is the baseline cumulative hazard function at time t, and \(x_{\alpha}^{T}\beta\) is the linear combination of the predictors and their associated coefficients for a patient.
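As a minimal sketch of how the cumulative hazard translates into an individual risk estimate, the snippet below applies the standard relation between cumulative hazard and event probability, risk = 1 − exp(−H(t | x)). Using this conversion is an assumption on our part, and the function name, coefficients, predictor values, and baseline cumulative hazard are all hypothetical; they are not the fitted values of the model.

```python
import numpy as np

def predicted_mortality_risk(x, beta, baseline_cum_hazard_t):
    """Risk of death by time t for one patient under a Cox model."""
    linear_predictor = np.dot(x, beta)                                # x^T beta
    cumulative_hazard = np.exp(linear_predictor) * baseline_cum_hazard_t  # H(t | x)
    return 1.0 - np.exp(-cumulative_hazard)                           # 1 - S(t | x)

# Hypothetical coefficients, standardized predictor values, and 3-year baseline cumulative hazard
beta = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
h0_3y = 0.05
risk = predicted_mortality_risk(x, beta, h0_3y)
```

A patient whose predictors all equal the reference values (linear predictor of zero) has a risk determined by the baseline cumulative hazard alone.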
The model was validated with tenfold cross-validation [
25]. In the assessment of the discrimination ability of the prediction model, Harrell’s C-statistic was used to estimate the degree of discrimination, and ROC analysis was conducted for visual inspection. Furthermore, the calibration was investigated using a calibration plot by plotting the predicted and observed probabilities of events across increasing levels of predicted risk.
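For illustration, Harrell’s C-statistic can be computed directly from its definition: among comparable patient pairs, it is the proportion in which the patient who died earlier had the higher predicted risk. The sketch below is a simplified version that assumes no tied event times; the function name `harrell_c` and the toy data are hypothetical.

```python
import numpy as np

def harrell_c(time, event, risk_score):
    """Harrell's C: fraction of comparable pairs ordered correctly by the risk score."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk_score = np.asarray(risk_score, dtype=float)
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1.0
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable

# Hypothetical follow-up times, death indicators, and predicted risk scores
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_statistic = harrell_c(times, events, scores)
```

Here four of the five comparable pairs are ordered correctly, so the C-statistic is 0.8; a value of 0.5 corresponds to no discrimination.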
To assess the utility of our model in clinical practice, we compared this risk model to the GRACE score. First, we sought to examine the difference in the AUC of ROC between these two risk-assessment tools. Second, we performed decision-curve analysis to quantify the clinical usefulness of our prediction model (which was also compared with that of the GRACE score) [
26]. This analysis was used to assess the net true-positive classification rate using a model over a range of thresholds. The values (from 0 to 1) represent the benefit from clinical intervention, and higher values indicate more significant benefit.
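The net benefit underlying a decision curve can be sketched as follows. This assumes the standard definition for a binary outcome, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold; it ignores censoring, which a survival analysis would need to account for, and the function name and toy data are hypothetical.

```python
import numpy as np

def net_benefit(y_true, predicted_risk, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    treated = predicted_risk >= threshold
    n = len(y_true)
    true_positives = np.sum(treated & y_true)
    false_positives = np.sum(treated & ~y_true)
    # False positives are weighted by the odds of the threshold probability
    return true_positives / n - false_positives / n * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = died) and predicted risks
y = [1, 0, 1, 0, 0]
risk = [0.8, 0.3, 0.6, 0.1, 0.7]
nb_model = net_benefit(y, risk, threshold=0.5)
nb_treat_all = net_benefit(y, np.ones(len(y)), threshold=0.5)  # reference strategy: treat everyone
```

Repeating the calculation over a grid of thresholds and plotting the results for each model, together with the treat-all and treat-none strategies, gives the decision curve.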
Data analyses were performed using Python (version 3.6) with the scientific libraries “scikit-learn”, “scikit-survival”, and “lifelines”, and using Stata Statistical Software (Release 15; College Station, TX: StataCorp LLC).