Background
Hospitalized children, particularly those in high-acuity environments such as the pediatric intensive care unit (PICU), are inevitably susceptible to clinical deterioration. Several outcome prediction models such as the Pediatric Index of Mortality (PIM) and the Pediatric Risk of Mortality (PRISM) are widely used in PICUs [1, 2]. However, these acuity scores are based on “snapshot” values gathered during the early period following PICU admission. These static scores fail to adapt to the patient’s clinical progression and offer little assistance for the management of individual patients [3, 4].
Previous studies demonstrating that acute deterioration in patients is often preceded by subtle changes in physiological parameters [5, 6] led to the development of the Early Warning Score (EWS) [7]. Accurate and generalizable risk stratification tools may contribute to the timely identification of high-risk patients and facilitate earlier clinical intervention leading to improved patient outcomes [8]. Since its introduction, the EWS has undergone many alterations, and its modified forms are widely used in general hospitals today [9, 10]. However, their primary target population is usually confined to relatively healthy patients in general wards [9, 11] or emergency department settings [12], and the scores may not be applicable to intensive care settings [13].
Current literature frequently calls for the development of diverse intensive care warning scores [14–16]. The rapid development in machine learning, coupled with the richness of data from extensive patient monitoring in the intensive care unit (ICU), provides unprecedented opportunities for the development of new prediction scores in the field of critical care [17–19]. Challenges in the analytics of PICU data, including pathologic diversity and complexity [20] and the wide range of age and developmental stages, are anticipated to be addressed by the implementation of innovative predictive modeling [18, 21].
Curtis et al. developed a cardiac arrest prediction model by time series trend analysis using a support vector machine algorithm that achieved excellent performance [22]. In addition, Zhengping et al. adopted Gradient Boosting Trees to learn an interpretable model, which demonstrated strong performance for the prediction of mortality and ventilator-free days in the PICU [23]. Despite their successful application of data-driven analytics, the above studies were limited by the lack of external validation. To allow practical application in a real-world setting, the preliminary results would require further refinement regarding data elements, extraction, processing, and operation with acceptable false alarms.
In this paper, we describe the development and evaluation of a new tool, the Pediatric Risk of Mortality Prediction Tool (PROMPT), for real-time mortality prediction in PICUs. We also assessed PROMPT’s suitability for practical application in the clinical care of critically ill children.
Discussion
In this study, we developed and validated a targeted real-time early warning score, PROMPT, based on a CNN algorithm using a PICU dataset with routine vital signs. Utilizing a handful of variables, PROMPT achieved high performance with high sensitivity and specificity for predicting mortality in PICU patients. In predictive ability, it outperformed the conventional severity scoring system, PIM 3, as well as other models that use GBDT and LSTM.
Existing risk prediction tools in the ICU use static physiological parameters from early in the course of critical illness (often within the first 24 h following admission), along with other components, such as age and diagnosis, to assess severity and risk of death for the purpose of predicting outcomes [32]. For pediatric populations, PIM and PRISM are the most representative [1, 2]. However, it is generally agreed that they are poor surrogates for risk stratification and should not be used as the basis for individual treatment decisions [4, 33, 34]. Generic severity scores were originally developed and calibrated to maximize the capacity for mortality risk assessment for populations of interest, and not for clinical decision-making concerning individuals within those populations [4]. Moreover, utilizing the poorest values within a fixed time window, regardless of the outcome of interventions, fails to reflect the dynamic clinical course including differential treatment responses. Thus, these systems are unable to distinguish which patients are at higher risk of developing specific acute conditions. In our study, this was demonstrated by the notably low discriminative ability of PIM 3 in mortality prediction.
Predictive analytics on time series monitoring data were introduced [35, 36] based on evidence that physiologic signatures precede acute deterioration of patients before clinical suspicion arises [5, 6, 37]. Widespread adoption of EHRs that can be queried in real time enabled the development of EWS with the ability to identify clinically deteriorating patients in need of intervention [8, 38]. Accordingly, a wide variety of tools now exist and are operated alongside rapid response teams in different hospital contexts [9, 10, 39]. For instance, the Bedside Pediatric Early Warning Score (PEWS) is used across the UK National Health Service for the detection of patients in wards who are at risk of acute deterioration, facilitating their timely upgrade to higher level care [40, 41]. Similarly, many other EWS systems have been developed and validated primarily on general wards [11, 40], and their use has been extended to emergency departments [12, 42] and prehospital settings [43].
The ICU environment, where patients are clinically unstable and change rapidly between states of improvement and deterioration, calls for meticulous monitoring and clinical support. This has facilitated the development of ICU early warning systems [18, 44, 45]. The development of more sophisticated monitoring devices has resulted in an exponential growth in sensor data. This, coupled with recent advances in machine learning, artificial intelligence techniques, and data archiving hardware, has facilitated the discovery of data-driven characteristics and patterns of diseases [18, 36, 46–48]. However, the numerous developmental stages, baseline age-related differences in physiologic parameters, and the wide range of underlying pathologic diversity present unique challenges for the analysis of PICU patient data [20, 21]. Moreover, the patient’s physiological data are continuously influenced by clinical interventions such as oxygen supplementation, volume resuscitation, and vasopressor use, given that the core principle of intensive care is to maintain the steady state [20]. Because variations in physiological data occur within a complex biological system composed of multiple components that interact together, more sophisticated deep learning models such as neural networks, which automatically learn features, have demonstrated better performance than traditional machine learning [49].
Our study makes several significant contributions to the existing literature on mortality prediction in the PICU setting. PROMPT utilized changing vital signs of individuals; employed a CNN, a deep model primarily used in image analytics; and achieved high accuracy and discriminative ability in predicting mortality. Prediction performance decreased slightly as the time window ahead of the event lengthened from 6 to 60 h, and the performance of this earlier identification was relatively lower in the validation cohort. Nevertheless, PROMPT achieved an AUROC above 0.88 for predicting mortality 60 h in advance in both cohorts. Moreover, it consistently achieved higher sensitivity and specificity compared to other standard machine learning algorithms and PIM 3.
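For readers less familiar with the metric, the AUROC can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The following Python sketch computes it directly from that pairwise definition. It is an illustration only: the function name is hypothetical, and the scores and labels are synthetic values that bear no relation to the results reported in this study.

```python
def auroc(scores, labels):
    """AUROC from its pairwise definition.

    For every (positive, negative) pair, count 1 if the positive case has the
    higher score, 0.5 for a tie, and 0 otherwise, then average over all pairs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic predicted risks and outcomes (1 = died, 0 = survived), not study data.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

print(f"AUROC = {auroc(scores, labels):.3f}")
```

The pairwise loop is quadratic in the number of cases and is written for clarity; for large cohorts a rank-based implementation from an established statistics library would be the more practical choice.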
Accuracy and false alarm rate are important issues to consider in the practical implementation of EWS in ICU settings. Because sensitivity and specificity trade off against each other, the performance of EWS and alarm fatigue should be weighed and optimized [50]. Notably, PROMPT consistently provided higher specificity than PIM 3 and other algorithms against which it was tested. In addition, PROMPT maintained a higher level of accuracy than other models even with a small number of alarms (Additional file 1: Figure S3).
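The trade-off between sensitivity, specificity, and the number of alarms can be made concrete with a small Python sketch. It is not the PROMPT implementation: the function names, risk scores, outcome labels, and thresholds are synthetic and hypothetical, chosen only to show how raising the alarm threshold reduces alarms and false alarms at the cost of missed events.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when an alarm fires for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        alarm = score >= threshold
        if alarm and label == 1:
            tp += 1
        elif alarm and label == 0:
            fp += 1
        elif not alarm and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Synthetic predicted risks and outcomes (1 = died, 0 = survived), not study data.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

for threshold in (0.3, 0.5, 0.7):
    tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(
        f"threshold={threshold:.1f} alarms={tp + fp} false_alarms={fp} "
        f"sensitivity={sens:.2f} specificity={spec:.2f}"
    )
```

In this toy example the number of alarms falls from 7 to 3 as the threshold rises from 0.3 to 0.7, while sensitivity drops from 1.00 to 0.50. Choosing an operating point therefore means weighing missed deteriorations against alarm burden for the unit in question.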
In this study, PROMPT used seven vital signs along with the patient’s age and body weight on PICU admission. The model does not require any custom data entry and relies entirely on data elements that are usually available from the EHRs of most hospitals. Incorporating further parameters such as laboratory tests would be expected to enhance PROMPT’s performance. However, we note that models based on continuously updated physiologic monitoring data are better able to provide timely warning of pending deterioration. Thus, using only the most basic and commonly measured critical care data streamed from the bedside monitor has an advantage for the broader adoption of this model in other ICUs. Relatively minimal data requirements, few manual data entry requirements, and automated operation on data extracted from EHRs save additional labor and cost and may lighten the burden of application in the clinical setting.
This study has several limitations. First, we could not determine the generalizability of our results to other populations. In addition, the retrospective study design did not allow the determination of model performance in a prospective setting. Our model remains a population-based estimate, as we did not validate its efficacy for individual prognostication in a prospective way. Moreover, despite PROMPT’s high performance in detecting and predicting mortality, this knowledge alone is insufficient to affect patient outcomes. Clinician input is required to determine clinical interventions and shape patient-centered outcomes.
However, considering that clinicians in the PICU environment face limited clinical resources and that rationing of health care is a reality in some respects, PROMPT may have the potential to benefit clinical practice. If the risk of critical adverse outcomes is identified earlier, clinicians could allocate staffing and other medical resources with a higher level of certainty. Our model utilizes easily collected data and, therefore, may be particularly suitable for bedside prognostications in relatively low-resourced environments.
In addition, because the predictive window of PROMPT is up to 60 h before death, earlier warnings may give physicians more time to intervene and prevent or mitigate mortality. Alternatively, once physicians are alerted and prepared for the likelihood of death, there are opportunities for preference-concordant, high-value care in PICUs by initiating goals of care discussions earlier and revising treatment plans. Hence, our future work will focus on the practical impact of early recognition of at-risk patients on clinically relevant outcomes.
Lastly, we would like to stress the additional implications of our model. Although our current model does not tell the clinician precisely how to treat a deteriorating patient, the trajectory of predicted risk and the identified contributions of time points and features are expected to provide additional information indirectly. Changes in the trend of predicted mortality over time, considered together with an event or a specific intervention, may give clinicians intuitive insight into potential associations with a favorable or unfavorable clinical course in individual cases.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.