Discussion
In our previously published study, we investigated the accuracy of a deep-learning model based on urine output to predict oliguric AKI in the ICU [2]. In that study we demonstrated that the analysis of 12-h urine output with a deep learning model had good diagnostic performance, with an area under the receiver operating characteristic curve (auROC) of 0,89 ± 0,01 (sensitivity 80% and specificity 84%). The model was estimated to predict AKI stage 2 and 3 at least 12 h before the development of the event, with a +LR of 4,87 and 5,06 and a −LR of 0,24 and 0,20, respectively. That study used data from two clinical databases collected in the US, MIMIC-III and eICU. The use of retrospective data was the main limitation of that investigation and, furthermore, the results needed to be externally validated.
For this reason, in the current study we conducted the same analysis on a different database with the aim of externally validating the algorithms that had previously been developed and trained on the US population. The “AmsterdamUMC database” contains data on 23'106 admissions to the ICU of the University of Amsterdam recorded between 2003 and 2016 [5]. Compared with the eICU results, we observed a higher auROC (0,907 ± 0,007), a better +LR of 7,27 and a −LR of 0,22 for the AmsterdamUMC database. This analysis allowed us to confirm the previous results with an external validation. Furthermore, we observed a slightly better performance of the deep learning model in the European dataset. The two databases differ in some respects, and this may have led to the higher auROC and +LR. In particular, the European database presented a higher incidence of AKI stage 2 and 3 and a higher frequency of serum creatinine and urine output measurements.
The AmsterdamUMC database reported an AKI incidence of 4,2% vs 3% in the eICU; the most highly represented age group was 70–79 years (35,5%) in the AmsterdamUMC compared to 60–69 years (26,8%) in the eICU.
Serum creatinine was measured at a median interval of 16,2 h instead of 23,2 h, and, more remarkably, the median interval between urine output recordings was 1,3 h in AmsterdamUMC compared to 2,4 h in the eICU. With regard to hourly urine output recording and serum creatinine monitoring, a considerable proportion of the patients in the databases were observed less frequently. This highlights the need to improve patient monitoring, at least in specific settings or situations. Nevertheless, among the patients included in the data analysis, the AmsterdamUMC database appears to offer more precise variable acquisition and a slightly higher number of events.
The possible connection between the higher frequency of observations and events and the observed differences in the positive likelihood ratio was not tested in our study. However, our observations raise some issues. The first is the reason for investigating urine output as a predictor of outcome. It is known that AKI can develop in two different forms, i.e., oliguric (according to the KDIGO classification) and non-oliguric disease. Oliguric AKI represents about 40% of cases, with a substantially higher mortality (50–60%), both in the ICU and in hospital settings, than the non-oliguric form (10–20%) [10, 11]. The second issue, which is strictly connected to the first, is the definition of oliguria. The KDIGO definition of oliguria considers a urine output criterion of < 0,5 ml/kg/h for a period of 6–12 h (AKI stage 1) or > 12 h (stage 2), and < 0,3 ml/kg/h for a period ≥ 24 h or anuria for a time ≥ 12 h, taking into account “consecutive” hours (stage 3). However, this classification is under debate because the incidence of AKI and the assigned stage can differ depending on whether urine output is averaged over a given period of time or assessed as a series of consecutive values [12]. Finally, the third aspect is the inverse relationship between the predictivity of urine output and serum creatinine levels. Vincent et al. observed that oliguria on admission is present in about 25% of ICU patients and that the mortality rate is twice as high as in non-oliguric patients [13]. They also reported differences in mortality depending on the ability to restore normal urine output within the first 48 h of ICU stay: when urine output is restored, the mortality rate is no different from that observed in non-oliguric patients. They defined these patients as having “transient oliguria”; they made up 30% of the oliguric patients at ICU admission. The authors observed that oliguria averaged over 6 h had a greater sensitivity to predict AKI stage 1. Furthermore, the predictivity of oliguria for serious events, such as AKI development or hospital mortality, could be linked not only to the time window (how many hours) but also to the volume of diuresis, as demonstrated by the study of Ralib et al. [14]. In their study the current AKI definition was considered to be based on a “too liberal” urine output threshold, because the authors observed that “a 6-h urine output threshold of 0,3 ml/kg/hour best associated with mortality and dialysis”.
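The difference between applying the urine output criteria to consecutive hours and to a period average can be illustrated with a minimal sketch. It assumes hourly urine output values already normalised to ml/kg/h; the function names are hypothetical, the thresholds are those quoted above, and treating exactly 12 h below 0,5 ml/kg/h as stage 2 is an assumption about boundary handling. It is not the staging code used in the study.

```python
from typing import Callable, List, Optional


def longest_run(rates: List[float], pred: Callable[[float], bool]) -> int:
    """Length of the longest run of consecutive hours satisfying pred."""
    best = run = 0
    for r in rates:
        run = run + 1 if pred(r) else 0
        best = max(best, run)
    return best


def min_window_mean(rates: List[float], hours: int) -> Optional[float]:
    """Lowest mean over any block of `hours` hours, or None if the series is too short."""
    if len(rates) < hours:
        return None
    return min(sum(rates[i:i + hours]) for i in range(len(rates) - hours + 1)) / hours


def stage_consecutive(rates: List[float]) -> int:
    """Urine output stage when every hour of the period must be below the threshold."""
    if longest_run(rates, lambda r: r < 0.3) >= 24 or longest_run(rates, lambda r: r <= 0.0) >= 12:
        return 3
    if longest_run(rates, lambda r: r < 0.5) >= 12:  # boundary handling is an assumption
        return 2
    if longest_run(rates, lambda r: r < 0.5) >= 6:
        return 1
    return 0


def stage_averaged(rates: List[float]) -> int:
    """Urine output stage when only the period average must be below the threshold."""
    def below(hours: int, threshold: float) -> bool:
        m = min_window_mean(rates, hours)
        return m is not None and m < threshold

    m12 = min_window_mean(rates, 12)
    if below(24, 0.3) or (m12 is not None and m12 <= 0.0):
        return 3
    if below(12, 0.5):
        return 2
    if below(6, 0.5):
        return 1
    return 0


# Hypothetical patient: urine output alternates between 0.2 and 0.6 ml/kg/h for 12 h.
alternating = [0.2, 0.6] * 6
print(stage_consecutive(alternating), stage_averaged(alternating))  # prints: 0 2
```

In this invented example the same 12-h record is not staged at all under the consecutive-hours reading but reaches stage 2 under the averaged reading, which is the kind of discrepancy the debate refers to.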
Regarding the accuracy of a predictive model with reference to the possible occurrence of a given event, the analysis of the ROC curve is commonly used. This analytical approach provides a graph that plots the sensitivity (i.e. the true positive rate) against 1 − specificity (i.e. the false-positive rate) for each possible cut-off of the prediction rule. In our analysis, we found areas under the ROC curves to be consistently high (> 85%) in both cohorts, with both the logistic regression and the deep learning models (Tables 2, 3). We also expressed the results in terms of positive and negative likelihood ratios, which are indexes that combine sensitivity and specificity. The +LR expresses how much more likely a positive test is in patients with the event than in patients without it; it is calculated as sensitivity/(1 − specificity) and, for an informative test, has a value > 1. The +LR values obtained by logistic regression were 4,21 (at a fixed sensitivity of 80%) and 4,05 (knee-point) in the Amsterdam cohort, whereas they were 3,20 (at a fixed sensitivity of 80%) and 3,52 (knee-point) in the eICU cohort (see Table 2). Of note, higher +LR values were observed in both cohorts when the deep learning model was applied (Table 3). In particular, in the Amsterdam cohort the +LR of 7,27 (at a fixed sensitivity of 80%, Table 3) indicated a sevenfold increase in the odds of having the event in a patient with a positive test result. In general, the higher the +LR, the more informative the test.
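As a worked illustration of the formula +LR = sensitivity/(1 − specificity), the short sketch below recomputes the +LR from the sensitivity and specificity reported for our previous study and back-calculates the specificity implied by a +LR of 7,27 at a fixed sensitivity of 80%. The function names are hypothetical, and the implied specificity is a derived figure rather than a value reported in the tables.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def specificity_from_positive_lr(sensitivity: float, plr: float) -> float:
    """Specificity implied by a given +LR at a fixed sensitivity (rearranged formula)."""
    if plr <= 0.0:
        raise ValueError("+LR must be positive")
    return 1.0 - sensitivity / plr


# Sensitivity 80% and specificity 84% give a +LR of about 5.
print(round(positive_lr(0.80, 0.84), 2))
# A +LR of 7.27 at a sensitivity of 80% implies a specificity of about 0.89.
print(round(specificity_from_positive_lr(0.80, 7.27), 2))
```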
The −LR expresses how much less likely a negative test is in patients with the event than in patients without it. It is calculated as (1 − sensitivity)/specificity and its value is usually < 1. We found a −LR of 0,22 in the Amsterdam cohort (at a fixed sensitivity of 80%, Table 3), which indicated a 4,5-fold decrease in the odds of having the event of interest in a patient with a negative test result. Consequently, the smaller the −LR value, the more informative the test.
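Because likelihood ratios act on odds rather than on probabilities, the following sketch shows how a pre-test probability is converted into a post-test probability. The function names are hypothetical, and using the 4,2% AKI incidence of the AmsterdamUMC database as the pre-test probability is an assumption made only for illustration; the prevalence among the windows actually analysed by the model may differ.

```python
def negative_lr(sensitivity: float, specificity: float) -> float:
    """Negative likelihood ratio: (1 - sensitivity) / specificity."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    return (1.0 - sensitivity) / specificity


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pre-test probability must be in [0, 1)")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pretest = 0.042  # assumed pre-test probability (AmsterdamUMC AKI incidence)
print(round(post_test_probability(pretest, 7.27), 3))  # after a positive test
print(round(post_test_probability(pretest, 0.22), 3))  # after a negative test
```

Under this assumption, a positive result raises the probability of the event from about 4% to roughly 24%, whereas a negative result lowers it to about 1%.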
To date, there is scarce evidence in the literature concerning the accuracy of predictive models that relate urine output to AKI events. Macedo et al. reported a +LR of 1,25 and a −LR of 0,92 for 12-h oliguria in AKI stage 2, whereas the values reported for 24-h oliguria in AKI stage 3 were 2,0 and 0,96, respectively [15]. Another study involving cardiac surgery ICU patients indicated a +LR of 2,9 and a −LR of 0,45 for 6-h oliguria in AKI stage 1 [12]. Based on this literature, it is important to point out that the accuracy of our prediction models is consistently higher than that of previous prediction rules and is in accordance with our previous study [2]. A different approach to AKI prediction is the administration of a furosemide bolus followed by monitoring of diuresis for a 2-h period, i.e., the furosemide stress test (FST) [16]. The test investigates the integrity of tubular function and predicts the progression from AKI stage 1 or 2 to AKI stage 3 or the need for dialysis. The predictive capacity is high, as represented by an AUC of 0,87, with a sensitivity of 87,1% and a specificity of 84,1%. An increase in predictivity was obtained by coupling FST with biomarkers [17]. In patients with urine TIMP-2xIGFBP-7 > 0,3, the AUC for progression to AKIN stage 3 increases up to 0,9, and for RRT it increases up to 0,91. Although FST is a reliable tool for the prediction of AKI worsening, as both the meta-analysis by Chen [18] and the review by Coca [19] highlight, debate still exists on the heterogeneity of the studies, the number of enrolled patients, the type of study design, the severity of baseline AKI, the role of albumin levels in furosemide sensitivity, the ability of continuous furosemide infusion to increase the sensitivity and specificity of the test, as Mariano suggests [20, 21], and the ability of AUC values to define the predictive capacity of a test.
Nevertheless, the application of a machine learning method differs from FST in several aspects.
The first is the target population: we applied the predictive model to all patients admitted to the ICU, with the exclusion of patients in need of continuous RRT (CRRT) during the stay and of those with community-acquired AKI. We obtained excellent results in terms of AUC and positive and negative likelihood ratios when oliguric AKI was considered. In contrast, the study by Chawla and Koyner applied FST to selected patients with AKIN stage 1 and 2, presence of granular or epithelial cell casts on urine sediment, or FeNa > 1%. In both studies patients were well resuscitated, sufficiently clinically stable and euvolemic.
The second difference is the timing of AKI prediction and the severity predicted: FST is applied when a patient has AKIN stage 1 or 2 to predict AKI stage 3, including the need for hemodialysis. Our approach continuously analyses the urine output in patients without AKI or with AKI stage 1 to predict AKI stage 2 and stage 3, with the exclusion of patients who require dialysis. Patients were included regardless of therapeutic intervention or volume cut-off, throughout the ICU stay, adopting 12-h periods of observation in sliding windows to predict AKI stage 2 or 3.
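The sliding-window scheme can be sketched as follows. This is a minimal illustration and not the study pipeline: it assumes an hourly urine output series, a window step of 1 h, and a 12-h prediction horizon after the end of each window, none of which are specified in this section beyond the 12-h window length. The function name and the example data are hypothetical.

```python
from typing import List, Optional, Tuple


def labelled_windows(
    hourly_uo: List[float],
    event_hour: Optional[int],
    width: int = 12,
    horizon: int = 12,
    step: int = 1,
) -> List[Tuple[int, List[float], int]]:
    """Cut an hourly series into sliding windows and label each one.

    A window is labelled 1 if the AKI stage 2/3 event (at index event_hour)
    falls within `horizon` hours after the window ends, otherwise 0.
    Windows that extend past the event are dropped (an assumption).
    """
    samples = []
    for start in range(0, len(hourly_uo) - width + 1, step):
        end = start + width  # exclusive end index of the window
        if event_hour is not None and end > event_hour:
            break
        label = int(event_hour is not None and end <= event_hour < end + horizon)
        samples.append((start, hourly_uo[start:end], label))
    return samples


# Hypothetical 30-h stay with an event at hour 26.
stay = [0.8] * 10 + [0.4] * 20
samples = labelled_windows(stay, event_hour=26)
print(len(samples), sum(label for _, _, label in samples))
```

Each tuple holds the window start, the 12 hourly values that would be passed to a model, and the label; a stay without an event yields only negative windows.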
The deep learning model, which analysed data including urine production to predict AKI events, seems to overcome the KDIGO classification of urine output in terms of quantity and time. Indeed, the KDIGO classification presented low accuracy in predicting the development of higher stages of AKI. In our study we tested a mathematical model, derived from an artificial intelligence process, which predicts future AKI stage 2/3 events with high accuracy. In our previous analysis, the highest observed +LR was 5,00 and the lowest −LR was 0,20. As mentioned above, in the present study we used a different database with a different approach towards urine output analysis, and an artificial intelligence application characterized by the deep learning process. By comparison with the results of the current study, a +LR equal to 5,00 indicated a moderate increase in the probability of disease given a positive test.
This study also has some limitations regarding its design, since it relies on a retrospective source of data. Indeed, we documented an important difference between the two databases in terms of frequency of data acquisition, which can significantly affect the validity of the analysis. The error in the manual determination of urine output can be estimated at around 20–26% [21]. Another limiting factor is the usefulness of the information derived from DL analysis in everyday clinical work, which remains to be established. For this reason, it appears necessary to develop an observational study in which a precise method of urine output recording is well established. Based on this consideration, we have already designed a prospective observational study which should start soon: all institutions interested in participating can contact our research group for further information and centre enrolment.
Finally, an interventional study designed to evaluate the use of an electronic alarm is needed. It should be based on a highly accurate prediction model, such as the one we have provided here, and should assess routine clinical outcomes such as ICU mortality, length of stay or AKI development.