Introduction
Early cardiopulmonary resuscitation (CPR) and early defibrillation are the key links in the chain of survival for cardiac arrest patients with shockable rhythms [1, 2]. However, the priority of intervention (CPR first or immediate defibrillation) and the duration of CPR prior to defibrillation are still debated, particularly in out-of-hospital cardiac arrests (OHCA) with long response times [3‐5]. Animal studies have demonstrated that a high rate of restoration of spontaneous circulation (ROSC) is achieved when the heart has been recently perfused, whereas prolonged untreated ventricular fibrillation (VF) with depleted energy phosphates leads to poor outcome [6]. Clinical studies have also indicated that not all VF patients benefit from being treated in the same manner with time-based CPR/defibrillation protocols [2, 7]. Optimizing the timing of defibrillation might decrease the severity of postresuscitation myocardial dysfunction by reducing the number of failed shocks and the consequent unnecessary interruptions in chest compression, and therefore has the potential to improve the final outcome of cardiac arrest [8].
Quantitative electrocardiogram (ECG) waveform analysis provides a noninvasive reflection of the metabolic status of the myocardium during resuscitation and is a potential tool to guide and optimize CPR interventions, i.e., chest compression or defibrillation [9]. During the last two decades, numerous features have been developed and used to predict the outcome of defibrillation, including time domain [10‐15], frequency domain [15‐19] and nonlinear measures [20, 22]. Because combining multiple predictive features may offer complementary information that improves predictive accuracy [16], several studies have attempted to combine different VF features using machine learning to enhance predictive performance, albeit in relatively small populations [25, 25]. Whether combining multiple predictive features improves the prediction of defibrillation outcome compared with single features is still uncertain.
The purpose of the present study was to investigate whether combining multiple VF features by different machine learning strategies, including logistic regression (LR), artificial neural network and support vector machine (SVM), could improve the prediction of defibrillation outcome in a large multicenter database of OHCA patients.
Discussion
In the present study, we used machine learning strategies to investigate whether combining multiple VF features could improve the prediction of defibrillation outcome in a large multicenter database of OHCA patients. The results indicated that the amplitude-related features outperformed other single waveform measures, while combining multiple VF features did not further improve the prediction of defibrillation outcome.
Accurate prediction of defibrillation outcome during resuscitation of VF cardiac arrest patients has the potential to significantly enhance resuscitative strategies and improve patient outcomes. A considerable number of defibrillation predictors have been proposed and shown to be promising in estimating VF duration, predicting defibrillation outcome and return to organized rhythm, and prognosticating long-term survival [10‐23]. The current best predictors achieve an AUC of 0.87 in predicting defibrillation outcome, with a balanced sensitivity and specificity of approximately 80 %. Although these approaches already have high predictive power, research into approaches that might further improve the accuracy of defibrillation outcome prediction for OHCA is still ongoing. One possible solution is to use patient-specific information in the ECG-based prediction model. In an earlier study, Monsieurs et al. showed that adding age to the prediction formula increased the correct classification of survivors and nonsurvivors in 100 OHCA victims [27]. However, no significant improvement was obtained by including age, sex, presenting rhythm, presence of bystander CPR and ambulance response time when six different single prediction features were investigated in 530 shocks from 86 patients [28].
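To make the two performance measures quoted above concrete, the following is a minimal sketch of how an AUC and a balanced sensitivity/specificity operating point can be computed for a single predictive feature. It is not the analysis code of this study: the function names and the eight feature values and outcomes are hypothetical, and the AUC is obtained from the standard pairwise (Mann-Whitney) identity rather than from any particular statistics package.

```python
# Hypothetical pre-shock feature values (arbitrary units) and shock outcomes
# (1 = successful defibrillation, 0 = unsuccessful). Illustrative data only.
scores = [4.0, 6.0, 7.5, 9.0, 10.5, 12.0, 13.5, 15.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]


def auc(scores, labels):
    """AUC as the probability that a successful shock scores higher than an
    unsuccessful one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when 'score >= threshold' predicts success."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(scores, labels):
    """Observed score at which sensitivity and specificity are closest."""
    def gap(t):
        sens, spec = sens_spec(scores, labels, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


threshold = balanced_threshold(scores, labels)
print(auc(scores, labels))                   # 0.9375
print(threshold)                             # 10.5
print(sens_spec(scores, labels, threshold))  # (0.75, 0.75)
```

In this toy example the feature ranks 15 of the 16 successful/unsuccessful pairs correctly, and a threshold of 10.5 gives equal sensitivity and specificity of 75 %.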
Another practical approach to improve the predictive performance of current ECG analysis is to combine multiple VF features using machine learning strategies. In a dataset of 883 defibrillations from 156 OHCA patients, Eftestøl et al. demonstrated that combining two decorrelated spectral features, obtained by principal component analysis of an original feature dataset with information on CF and PF, could reduce the number of unsuccessful defibrillations [29]. In another database including 203 defibrillations from 47 patients with OHCA, Podbregar et al. reported that combining VF features including maximal amplitude, total energy of power spectral density and the Hurst exponent by genetic programming could potentially reduce the incidence of unsuccessful defibrillations [23]. In contrast, Watson et al. showed no improvement in defibrillation outcome prediction when entropy was combined with four other features, in comparison with the five wavelet-based features alone [30]. In another clinical study, Neurauter et al. compared the performance of ten single predictive features and their combinations in 770 countershock attempts from 197 patients, and found that combining these predictive features using neural networks did not improve outcome prediction [25]. Recently, Shandilya et al. predicted defibrillation success using a parametrically optimized SVM model from a database of 90 precountershock ECG signals. The PA (82.2 % vs. 64.6 %) and AUC (0.850 vs. 0.609) were considerably improved by combining six to ten features compared with the single feature AMSA [22]. Howe et al. investigated an alternative SVM-optimized classification approach, which combined multiple metrics with acceptable predictive attributes in a total of 115 defibrillations from 41 patients [24]. Compared with the 86 % sensitivity and 60 % specificity of the single feature AMSA, the combined features achieved a sensitivity of 87.6 % and a specificity of 71.6 % for the prediction of return of organized rhythm.
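As an illustration of what combining features means in the simplest case, the sketch below fits a logistic regression, one of the strategies named in this study, to two standardized features by plain gradient descent. It is a toy example under stated assumptions and not the implementation used here or in the cited studies: the feature values, outcomes, learning rate, number of epochs, and function names are all hypothetical.

```python
import math

# Hypothetical standardized feature pairs (e.g. one amplitude-based and one
# frequency-based measure) and shock outcomes. Illustrative data only.
X = [[-1.2, -0.8], [-0.9, -1.1], [-0.6, -0.2], [-0.3, -0.5],
     [0.4, 0.1], [0.7, 0.9], [1.0, 0.6], [1.3, 1.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of a successful shock for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean logistic loss (assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = fit_logistic(X, y)
# The combined score is a weighted sum of the features passed through the
# logistic function; a shock is predicted successful when it exceeds 0.5.
print(predict_proba(w, b, [1.0, 1.0]) > 0.5)
print(predict_proba(w, b, [-1.0, -1.0]) < 0.5)
```

Because the combined score is only a weighted sum of its inputs, it can add little when the features carry largely overlapping information.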
Besides the differences in machine learning methods and feature selection [22‐25], the relatively small sample sizes and the lack of multicenter data might be responsible for the conflicting conclusions reached when multiple features were applied to predict defibrillation outcome. In previous clinical studies, data were usually split into training and validation sets to test the performance of predictors or designed parameters [17, 18, 22‐25]. Switching the roles of the two sets by a crossvalidation method was frequently adopted to increase the expected reliability in studies with relatively small sample sizes [22‐25]. Nevertheless, the test performances were considered in the design of the classifiers to optimize and generalize parameters [17]. Consequently, the crossvalidation strategy would influence the design process and bias the validation results.
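The bias described above can be shown with a small sketch that contrasts two ways of estimating accuracy for a single thresholded feature: tuning the threshold on the training folds only, versus tuning it on all data including the shocks later used for testing. The data, the two-fold split by index, and the function names are hypothetical and chosen only to keep the example easy to check by hand.

```python
# Hypothetical feature values and shock outcomes (1 = success). Toy data only.
scores = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]


def accuracy(scores, labels, threshold):
    """Fraction classified correctly when 'score >= threshold' predicts success."""
    correct = sum(1 for s, y in zip(scores, labels)
                  if (s >= threshold) == (y == 1))
    return correct / len(scores)


def tune_threshold(scores, labels):
    """Observed score that maximizes accuracy on the data it is given."""
    return max(sorted(set(scores)), key=lambda t: accuracy(scores, labels, t))


def cross_validated_accuracy(scores, labels, k):
    """k-fold estimate in which the held-out fold never influences tuning."""
    n = len(scores)
    fold_acc = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        t = tune_threshold([scores[i] for i in train],
                           [labels[i] for i in train])
        fold_acc.append(accuracy([scores[i] for i in test],
                                 [labels[i] for i in test], t))
    return sum(fold_acc) / k


# Threshold tuned on training folds only.
held_out = cross_validated_accuracy(scores, labels, 2)
# Threshold tuned on every shock, including those used to report performance.
optimistic = accuracy(scores, labels, tune_threshold(scores, labels))
print(held_out)    # 0.625
print(optimistic)  # 0.875
```

On these eight toy shocks the estimate drops from 0.875 to 0.625 once the test data are kept out of the tuning step, which is the direction of bias to be expected when test performance informs classifier design.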
Our results, obtained from the largest database of ECG traces on OHCA patients to date, showed that amplitude-related measures, such as MS, AMSA, MdS, PSA and PPA, outperformed frequency-based and nonlinear methods when ranked by AUC and exhibited similar shock success prediction performance, consistent with the studies of Wu et al. and Firoozabadi et al. [14, 26]. However, combining multiple VF features did not further improve the prediction of defibrillation outcome in comparison with single features. This result was consistent with that of Neurauter et al. [25], who used a neural network, but contradicted the study of Howe et al. [24], who applied SVM to combine multiple features. Notably, limited clinical data (115 defibrillations from 41 patients) were used in a crossvalidation SVM approach by Howe et al. [24], which might have biased the validation results. Moreover, SVM usually maintains good predictive performance with a small number of samples, but a large number of noisy samples may cause overfitting and overspecialization during SVM training and reduce accuracy when the validation data are passed through the model [31]. Although overfitting also occurred when using the neural network with multiple hidden layers, the neural network seemed more robust than SVM for a large number of training samples, owing to the different optimization functions and output variable forms employed in these two machine learning methods [31].
The lack of improvement from combining multiple VF features may be due to the limited information obtainable from ECG signals, and suggests that various single VF features, such as MS and AMSA, have already reached the maximum predictive power extractable from VF ECGs. Besides ECG waveform characteristics, the outcome of defibrillation is related to other patient factors, such as drug treatments, comorbidities, and Emergency Medical Systems (EMS) arrival time. Additional clinically relevant attributes independent of ECG waveform metrics, such as end-tidal carbon dioxide, blood pressure, blood oxygen saturation and compression depth, might be considered to further improve predictive power [22]. From another point of view, longitudinal ECG data often include repeated defibrillations for each patient. The treatment effects and relative changes of a given predictive feature may enhance prediction performance to some degree.
We recognize that several limitations of the study need to be considered. First, this was a retrospective study on prospectively collected data. Sixteen predictive features were calculated only during the predefibrillation hands-off time and not in real time during chest compression. Second, successful defibrillation was defined as sustained ROSC, and long-term survival was not considered. Peri-arrest factors such as age, sex, presenting rhythm, EMS arrival time, drug treatments and comorbidities were not analyzed in this study. Third, models including metrics independent of the ECG waveform should be tested in future prospective evaluations.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
GR and YL conceptualized and designed this study. TM, FF, MB, GC, RL, and AP were substantially involved in data acquisition and rhythm annotation. MH and YG performed the analyses and were substantially involved in data interpretation. MH, YL, and GR drafted the manuscript. All authors revised the manuscript critically for important intellectual content. All authors approved the final version.