2.1 Data collection
This retrospective observational study collected data from adult patients who were admitted to the surgical ward for postoperative care after elective major or intermediate surgery at the Amsterdam University Medical Center (Amsterdam, the Netherlands) between December 2018 and March 2019. All patients received standard postoperative care, including intermittent vital sign measurements according to local Early Warning Score protocols. In addition, patients were monitored using the wireless Sensium Vitals® system (Sensium Healthcare, Oxford, UK). For this purpose, a chest-worn patch sensor with an axillary temperature probe was applied to measure the patient’s heart rate (HR), respiratory rate (RR) and axillary temperature (T) every 2 min.
To support continuous monitoring, the original Sensium Vitals® algorithm was used as the active alarm system. An alarm was generated when one of the vital sign measurements fell outside the predefined normal range (HR: 40–120 beats/min, RR: 8–24 breaths/min, T: ≤ 38 °C) for at least 7 successive measurements. As such, this alarm strategy includes an annunciation delay of 14 min if no measurements are missing or invalid. For recurrent abnormalities, a new alarm of the same type was only generated if at least 5 successive measurements (minimum 10 min) had been in the normal range since the preceding alarm. In case of an alarm, nurses were asked per protocol to assess the patient. When the nurse judged that an alarm was not caused by technical disturbances or movement, vital signs were measured manually and the Modified Early Warning Score (MEWS) [24] was calculated; further actions were taken according to established local protocols.
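The alarm rule described above can be sketched as follows. This is a minimal illustration, not the manufacturer's implementation: the function name and arguments are hypothetical, missing measurements are assumed to be skipped without resetting the counters, and upper and lower violations of one vital sign are treated as a single alarm type for simplicity.

```python
from typing import List, Optional, Sequence


def persistence_alarms(
    values: Sequence[Optional[float]],
    low: Optional[float],
    high: Optional[float],
    n_abnormal: int = 7,
    n_normal_reset: int = 5,
) -> List[int]:
    """Return the indices of the measurements at which an alarm is generated."""
    alarms: List[int] = []
    abnormal_run = 0   # successive measurements outside the normal range
    normal_run = 0     # successive measurements inside the normal range
    armed = True       # False while waiting for enough normal measurements
    for i, v in enumerate(values):
        if v is None:
            # Assumption: a missing or invalid measurement is skipped.
            continue
        abnormal = (low is not None and v < low) or (high is not None and v > high)
        if abnormal:
            abnormal_run += 1
            normal_run = 0
            if armed and abnormal_run >= n_abnormal:
                alarms.append(i)
                armed = False
        else:
            normal_run += 1
            abnormal_run = 0
            if normal_run >= n_normal_reset:
                armed = True
    return alarms
```

With 2-min sampling, calling `persistence_alarms(hr, 40, 120)` on a list of HR values corresponds to the default HR settings (7 abnormal measurements to alarm, 5 normal measurements before a new alarm).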
Patients were only included in the analysis when the total vital signs recording time was at least 24 h and each of the vital signs was available for at least a third of the total recording time. In addition to the collection of vital sign measurements and alarms, the presence of observed adverse events (AEs) was assessed retrospectively using the patient’s clinical record. AEs were defined as any postoperative complication, new illness, or deterioration of existing disease described in the patient record. The onset of an AE was defined as the time of diagnostic confirmation reported in nursing files, laboratory or radiology results, following the Institute for Healthcare Improvement’s Global Trigger Tool [25]. The end of an AE was defined as the moment after which AE treatment was no longer reported in the patient record. Only AEs that presented or were treated during the period of continuous monitoring were included in the analysis.
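The inclusion criterion on recording time stated above can be expressed as a short check. The sketch below assumes that availability is expressed in hours of valid data per vital sign; the function and parameter names are hypothetical.

```python
from typing import Mapping


def is_eligible(
    total_hours: float,
    available_hours: Mapping[str, float],
    min_total_hours: float = 24.0,
    min_fraction: float = 1.0 / 3.0,
) -> bool:
    """Check the minimum recording time and the per-vital-sign availability."""
    if total_hours < min_total_hours:
        return False
    # Every vital sign must cover at least a third of the total recording time.
    return all(h >= min_fraction * total_hours for h in available_hours.values())
```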
2.2 Simulation of alarm strategies
The collected wireless vital sign measurements of patients and clinically observed alarms were used to retrospectively evaluate the performance of the currently used Sensium Vitals® alarm algorithm for detection of AEs. Next, simulation was used to investigate the performance of alternative alarm strategies in the same dataset. For this purpose, the original Sensium Vitals® algorithm was first reproduced retrospectively in MATLAB (version 2019b, The MathWorks Inc., Natick, MA, US), adopting the alarm principles described by the manufacturer and the default settings. Subsequently, six alternative alarm strategies were explored by modifying the original alarm algorithm, as specified in Table 1. Two of these strategies were based on previously described methods for abnormality detection (I) or prevention of false alarms (III), as explained below. The other strategies were introduced based on physiological assumptions (II, IV, V, VI). For each alternative alarm strategy, three (sets of) parameter settings were tested to investigate and select optimal standard parameter settings (Table 1). The tested parameter settings were chosen arbitrarily within a range that was expected to be suitable, given physiology and the default settings of the currently used algorithm.
Table 1
Specification and tested parameter settings of alternative alarm strategies
Alarm strategy | Description | Tested parameter settings |
I. Threshold individualization | For each individual patient, alarm thresholds are defined using the cumulative density function (CDF), which was reproduced for each vital sign separately using the first 24 h of available data [25]. Accordingly, the standard lower and upper alarm thresholds are replaced by the vital sign level that corresponds to the lower (CDFlow) and upper (CDFhigh) percentiles of the individual CDF. Default alarm thresholds are used for the first 24 h | - CDFlow: 0.1%; CDFhigh: 99.9% - CDFlow: 0.5%; CDFhigh: 99.5% - CDFlow: 1%; CDFhigh: 99% |
II. Postoperative elevation of upper thresholds | The standard upper alarm threshold is increased by a fixed percentage (POincrease, i.e. postoperative increase factor) for the first four days after surgery | - POincrease: 5% for HR/RR; POincrease: 1% for T - POincrease: 10% for HR/RR; POincrease: 2.5% for T - POincrease: 25% for HR/RR; POincrease: 5% for T |
III. Increase annunciation delay interval | The length of the annunciation delay interval (Linterval), i.e. the minimum number of successive abnormal measurements needed for generation of an alarm (default: 7 measurements, i.e. 14 min interval), is increased | - Linterval: 12 measurements - Linterval: 17 measurements - Linterval: 22 measurements |
IV. Daytime elevation of upper HR/RR thresholds | The standard upper HR and RR thresholds are increased by a fixed percentage (DTincrease, i.e. daytime increase factor) during daytime (8 a.m. to 10 p.m.) | - DTincrease: 5% for HR; DTincrease: 15% for RR - DTincrease: 10% for HR; DTincrease: 25% for RR - DTincrease: 25% for HR; DTincrease: 35% for RR |
V. Nighttime reduction of lower HR/RR thresholds | The standard lower HR and RR thresholds are decreased by a fixed percentage (NTdecrease, i.e. nighttime decrease factor) during nighttime (10 p.m. to 8 a.m.) | - NTdecrease: 5% for HR; NTdecrease: 15% for RR - NTdecrease: 10% for HR; NTdecrease: 25% for RR - NTdecrease: 25% for HR; NTdecrease: 35% for RR |
VI. Slope-based alarms | An alarm is generated only in case the slope of the linear regression line calculated over a past time interval (Tslope) exceeds a preset threshold: HR slope: ± 15 bpm over Tslope; RR slope: ± 10 brpm over Tslope; T slope: ± 1 °C over Tslope | - Tslope: 4 h - Tslope: 8 h - Tslope: 12 h |
The first alternative alarm strategy (I) implemented individual thresholds to correct for differences in normal vital signs ranges between patients. For this aim, the first available 24 h of the recording was used to create individual distributions of the vital signs for each patient and identify corresponding upper and lower alarm thresholds for the remaining monitoring period, similar to the approach described by Poole et al. [26].
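As an illustration of strategy I, individual thresholds can be read from the empirical distribution of the first 24 h of data. The sketch below is an assumption about one possible implementation: it uses linear interpolation between order statistics, and the function names are hypothetical. The default thresholds would still apply during those first 24 h.

```python
from typing import Optional, Sequence, Tuple


def empirical_percentile(sorted_vals: Sequence[float], pct: float) -> float:
    """Percentile by linear interpolation between neighbouring order statistics."""
    pos = (pct / 100.0) * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def individual_thresholds(
    first_24h: Sequence[Optional[float]],
    cdf_low: float = 0.5,
    cdf_high: float = 99.5,
) -> Tuple[float, float]:
    """Lower and upper alarm thresholds from the patient's own first 24 h."""
    valid = sorted(v for v in first_24h if v is not None)
    if not valid:
        raise ValueError("no valid measurements in the first 24 h")
    return (
        empirical_percentile(valid, cdf_low),
        empirical_percentile(valid, cdf_high),
    )
```

The defaults `cdf_low=0.5` and `cdf_high=99.5` correspond to the second parameter set listed for strategy I in Table 1.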
The second alarm strategy (II) aimed to prevent false alarms by increasing the upper threshold levels in the first four postoperative days, when levels of HR, RR and T are typically higher due to the surgical stress response [27, 28].
The third strategy (III) focused on optimization of the annunciation delay, supported by the beneficial results reported in other studies [20, 29, 30]. Accordingly, an increase in the interval length of alarms was simulated, such that vital signs had to exceed a threshold for a longer successive period to cause an alarm. This adaptation aimed to reduce the number of false alarms related to short-lasting abnormalities caused by normal variations or movement artifacts.
The fourth (IV) alarm strategy was designed to compensate for increased physical activity level, which leads to increased HR and RR levels as compared to resting state. As patients are most active during daytime, the upper HR and RR threshold was increased for daytime (8 a.m. to 10 p.m.) to prevent false alarms.
Likewise, the fifth alarm strategy (V) corrected for low HR and RR levels that are often observed during sleep [31] by decreasing the corresponding lower threshold during nighttime (10 p.m. to 8 a.m.).
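The time-dependent threshold adjustments of strategies II, IV and V can be sketched in one helper. Several choices below are assumptions made for illustration only: the "first four days" are taken as the 96 h following a surgery timestamp, factors are applied multiplicatively when strategies are combined, and all names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Default (lower, upper) thresholds of the original algorithm.
DEFAULTS = {"HR": (40.0, 120.0), "RR": (8.0, 24.0), "T": (None, 38.0)}


def adjusted_thresholds(
    vital: str,
    when: datetime,
    surgery_time: datetime,
    po_increase: float = 0.0,   # strategy II, e.g. 0.10 for 10%
    dt_increase: float = 0.0,   # strategy IV, HR/RR only
    nt_decrease: float = 0.0,   # strategy V, HR/RR only
) -> Tuple[Optional[float], float]:
    low, high = DEFAULTS[vital]
    # II: raise the upper threshold during the first four postoperative days.
    if timedelta(0) <= when - surgery_time < timedelta(days=4):
        high *= 1.0 + po_increase
    if vital in ("HR", "RR"):
        daytime = 8 <= when.hour < 22
        if daytime:
            # IV: raise the upper threshold between 8 a.m. and 10 p.m.
            high *= 1.0 + dt_increase
        else:
            # V: lower the lower threshold between 10 p.m. and 8 a.m.
            low *= 1.0 - nt_decrease
    return low, high
```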
The sixth alarm strategy (VI) assessed vital signs solely based on time trends, as patterns of change are crucial in the detection of clinical deterioration [32]. Accordingly, this alarm strategy generated alarms in case the upward or downward slope calculated over a predefined time window exceeded a certain threshold, without taking the absolute vital sign value into account. Trends were assessed for time windows of multiple hours, as the wireless monitoring system is currently indicated for detection of clinical deterioration and not as a surveillance system for acute situations.
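A slope-based check of this kind can be sketched with an ordinary least-squares fit over the most recent window. The sketch assumes that the fitted slope is multiplied by the window length to obtain the change over Tslope, and that a window with fewer than two measurements raises no alarm; function names are hypothetical.

```python
from typing import Sequence


def ls_slope(times_h: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope (units per hour)."""
    n = len(values)
    mean_t = sum(times_h) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0.0:
        return 0.0
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_h, values))
    return sxy / sxx


def slope_alarm(
    times_h: Sequence[float],
    values: Sequence[float],
    t_slope_h: float,
    max_change: float,
) -> bool:
    """True if the fitted change over the last t_slope_h hours exceeds max_change."""
    t_end = times_h[-1]
    window = [(t, v) for t, v in zip(times_h, values) if t >= t_end - t_slope_h]
    if len(window) < 2:
        return False
    ts, vs = zip(*window)
    change = ls_slope(ts, vs) * t_slope_h
    return abs(change) > max_change
```

For HR with a 4 h window, `slope_alarm(times_h, hr, 4.0, 15.0)` would flag a fitted rise or fall of more than 15 bpm over that window.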
2.3 Evaluation of alarm strategies
Alarms generated in clinical practice or during simulation were classified as true positives (TP) or false positives (FP) to evaluate the performance in detecting AEs. Alarms that occurred in the 24 h before diagnostic confirmation or during the treatment period of the AE were classified as TP if the vital sign abnormality could be physiologically explained by the development or presence of the AE. To enable consistent alarm classification, a list of assumed relations between AEs and vital sign abnormalities was composed using clinical guidelines and literature. When subsequent AEs with overlapping windows of presentation were observed, alarms that could be related to both events were not double counted but allocated only to the event that developed latest in time. As continuous monitoring is intended to be used as an early warning tool, TP alarms that were generated in the 24 h before diagnostic confirmation of the AE were also investigated as a separate category (TPearly).
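The classification rules above can be summarized in a short sketch. The data structure, the alarm type labels and the `related` sets are hypothetical placeholders for the list of assumed relations between AEs and vital sign abnormalities; the label "TPearly" marks a TP alarm that falls before diagnostic confirmation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AdverseEvent:
    name: str
    onset: datetime            # time of diagnostic confirmation
    end: datetime              # end of reported treatment
    related: FrozenSet[str]    # alarm types assumed to be explained by the AE


def classify_alarm(
    alarm_time: datetime,
    alarm_type: str,
    events: Sequence[AdverseEvent],
) -> Tuple[str, Optional[str]]:
    """Return (label, AE name) with label in {"TP", "TPearly", "FP"}."""
    candidates = [
        e for e in events
        if alarm_type in e.related
        and e.onset - timedelta(hours=24) <= alarm_time <= e.end
    ]
    if not candidates:
        return ("FP", None)
    # Overlapping AEs: allocate the alarm to the event that developed latest.
    event = max(candidates, key=lambda e: e.onset)
    label = "TPearly" if alarm_time < event.onset else "TP"
    return (label, event.name)
```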
The performance of the original alarm strategy and each of the optimized alternative strategies was evaluated using two sensitivity rates (Stotal, Searly), the total alarm rate, and the false discovery rate. Stotal and Searly were defined as the number of AEs for which TP alarms or TPearly alarms were observed, respectively, and represent the sensitivity for detection or early detection of AEs. The total alarm rate was calculated as the sum of all alarms divided by the total recording time of all patients, resulting in an average number of alarms/day/patient. The false discovery rate was calculated as the percentage of alarms classified as FP. In addition to these four metrics, we introduced a performance score (P-score) to evaluate the relative improvement in overall performance for each of the alternative alarm strategies as compared to the original alarm strategy, based on the trade-off between early AE detection and total alarm rate. For this purpose, sub scores were assigned to the level of increase or decrease in Searly and total alarm rate, as specified in Table 2. The P-score was calculated as the sum of the two sub scores assigned to Searly and total alarm rate, respectively. Accordingly, a positive P-score indicates an improvement in overall performance compared with the original alarm strategy, while a negative P-score indicates impairment.
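The rate metrics described above reduce to simple ratios. The sketch below assumes that sensitivity is expressed as a percentage of all AEs and that recording time is summed over patients in hours; the function names are hypothetical.

```python
def sensitivity(n_detected_events: int, n_events: int) -> float:
    """Percentage of AEs with at least one TP (or TPearly) alarm."""
    return 100.0 * n_detected_events / n_events


def total_alarm_rate(n_alarms: int, total_recording_hours: float) -> float:
    """Average number of alarms per patient per day of recording."""
    return n_alarms / (total_recording_hours / 24.0)


def false_discovery_rate(n_false_positive: int, n_alarms: int) -> float:
    """Percentage of all alarms that were classified as FP."""
    return 100.0 * n_false_positive / n_alarms
```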
Table 2
Scores used to calculate the performance score (P-score)
Sub score | Searly | Total alarm rate (TAR) |
−3 | Searly ≤ Searly:ref − 10 | TAR > TARref + 0.5 |
−2 | Searly:ref − 10 < Searly ≤ Searly:ref − 5 | TARref + 0.25 < TAR ≤ TARref + 0.5 |
−1 | Searly:ref − 5 < Searly < Searly:ref | TARref < TAR ≤ TARref + 0.25 |
0 | Searly = Searly:ref | TAR = TARref |
1 | Searly:ref < Searly ≤ Searly:ref + 5 | TARref − 0.25 < TAR < TARref |
2 | Searly:ref + 5 < Searly ≤ Searly:ref + 10 | TARref − 0.5 < TAR ≤ TARref − 0.25 |
3 | Searly > Searly:ref + 10 | TAR ≤ TARref − 0.5 |
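The scoring in Table 2 can be written as two lookup functions and a sum. In this sketch, a value exactly equal to the reference is assumed to score 0, Searly is assumed to be expressed in percent and TAR in alarms/patient/day, and the function names are hypothetical.

```python
def sensitivity_sub_score(s_early: float, s_ref: float) -> int:
    """Sub score for the change in early sensitivity relative to the reference."""
    d = s_early - s_ref
    if d == 0:
        return 0
    if d > 10:
        return 3
    if d > 5:
        return 2
    if d > 0:
        return 1
    if d > -5:
        return -1
    if d > -10:
        return -2
    return -3


def alarm_rate_sub_score(tar: float, tar_ref: float) -> int:
    """Sub score for the change in total alarm rate (a lower rate scores higher)."""
    d = tar - tar_ref
    if d == 0:
        return 0
    if d > 0.5:
        return -3
    if d > 0.25:
        return -2
    if d > 0:
        return -1
    if d > -0.25:
        return 1
    if d > -0.5:
        return 2
    return 3


def p_score(s_early: float, s_ref: float, tar: float, tar_ref: float) -> int:
    """P-score: sum of the two sub scores."""
    return sensitivity_sub_score(s_early, s_ref) + alarm_rate_sub_score(tar, tar_ref)
```

As a purely hypothetical example, a strategy that raises Searly from 50 to 62 and lowers the total alarm rate from 1.0 to 0.75 would receive sub scores of 3 and 2, giving a P-score of 5.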
For each alternative alarm strategy, the parameter set with the highest P-score was selected as optimal and used as the standard setting applied to each patient record for further analysis and evaluation. In case of equal P-scores, the setting with the lowest false discovery rate was selected, followed by the setting with the smallest modification (lowest correction factor) compared with the original alarm algorithm. In addition to the evaluation of individual alarm strategies, we explored whether combining multiple strategies improved alarm performance. For this purpose, all possible combinations of strategies I to V were implemented cumulatively. The trend-based strategy (VI) was not included in these combinations due to its incompatibility with strategies that adapt thresholds for absolute vital sign values. Last, stepwise backward elimination was performed. Accordingly, the strategies that affected the P-score most were removed step by step from the combination, starting from the full combination of strategies (I–V). This process was repeated until all combination sizes were tested.
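The selection rule and the enumeration of combinations can be sketched as follows. The dictionary keys and function names are hypothetical, the tie-breakers are assumed to be applied in the order stated above, and a combination is assumed to contain at least two strategies. The backward elimination step is not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

STRATEGIES = ("I", "II", "III", "IV", "V")


def select_setting(candidates: Sequence[Dict]) -> Dict:
    """Highest P-score first, then lowest FDR, then smallest modification."""
    return min(
        candidates,
        key=lambda c: (-c["p_score"], c["fdr"], c["modification"]),
    )


def strategy_combinations(min_size: int = 2) -> List[Tuple[str, ...]]:
    """All combinations of strategies I to V with at least min_size members."""
    return [
        combo
        for size in range(min_size, len(STRATEGIES) + 1)
        for combo in combinations(STRATEGIES, size)
    ]
```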