Background
- We construct a dataset from a large collection of raw electronic health records (EHRs) of patients who had one or more blood culture tests during hospitalization.
- The hybrid model combines an attention-based bidirectional long short-term memory network (ABiLSTM) with a denoising autoencoder (DAE). The ABiLSTM extracts textual features, and the DAE takes the numerical indicators as input to capture the important numerical features.
- We conduct an extensive, large-scale empirical study to evaluate the effectiveness of our method.
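The hybrid model described in the contributions above relies on an attention-based BiLSTM for the text. To illustrate the attention part only, the sketch below shows attention pooling, a common way to collapse a sequence of BiLSTM hidden states into a single text vector: each state receives a score, the scores are softmax-normalised, and the states are averaged with those weights. The scoring function (a learned vector `w` applied to `tanh(h_t)`) and the function names are assumptions for illustration, not the authors' formulation. In a BiLSTM, each `h_t` is usually the concatenation of the forward and backward hidden states.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], w: Vector) -> Vector:
    """Weighted sum of hidden states h_t using softmax attention weights.

    Hypothetical scoring function: score_t = w . tanh(h_t).
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    alphas = softmax(scores)  # attention weights, summing to 1
    dim = len(hidden_states[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden_states))
            for d in range(dim)]


# Two time steps that receive equal scores are averaged with weight 0.5 each.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[1.0, 1.0])  # -> [0.5, 0.5]
```

In a trained model `w` would be learned jointly with the BiLSTM, so time steps whose states score higher contribute more to the pooled text representation.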
Related work
Methods
Task modeling
- We construct a dataset D∗ from real EHR data. Positive examples are patients who had a positive blood culture result at least once during hospitalization; they form the subset D+ ⊂ D∗. Negative examples are patients whose blood culture results were all negative; they form the subset D− ⊂ D∗.
- In the training phase, we train our model M on D∗, which contains both D+ and D−.
- In the test phase, we apply the trained model M to predict a patient's blood culture result, which distinguishes patients with bloodstream infections from seriously ill patients without infection.
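The task modeling above amounts to binary classification over patients. In the compact statement below, x_i (a patient's record), y_i (its label) and ŷ_i (the prediction of M) are notation introduced here for illustration; they do not appear in the original text.

```latex
D^{*} = D^{+} \cup D^{-}, \qquad D^{+} \cap D^{-} = \varnothing,
\qquad
y_i =
\begin{cases}
1, & x_i \in D^{+} \quad (\text{at least one positive blood culture}),\\
0, & x_i \in D^{-} \quad (\text{all blood cultures negative}),
\end{cases}
\qquad
M : x_i \mapsto \hat{y}_i \in \{0, 1\}.
```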
Hybrid neural network model
Textual representation
Numerical representation
Output layer
Experiments
Dataset construction and data preprocessing
- A patient who had at least one positive blood culture result during hospitalization is selected as a positive example.
- A patient whose blood culture tests were all negative during hospitalization is selected as a negative example.
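The two selection rules above can be sketched as a labeling function. The function name and the encoding of each blood culture result as a boolean (True for positive) are hypothetical; patients without any blood culture test are treated as outside the dataset, consistent with the dataset being built from records that contain one or more tests.

```python
from typing import Iterable, Optional


def label_patient(culture_results: Iterable[bool]) -> Optional[int]:
    """Return 1 for a positive example, 0 for a negative example.

    `culture_results` holds one boolean per blood culture test taken during
    the hospitalization (True = positive). Patients with no test are not part
    of the dataset, so None is returned for them.
    """
    results = list(culture_results)
    if not results:
        return None  # no blood culture test: excluded from the dataset
    # Positive if any test was positive; negative only if all were negative.
    return 1 if any(results) else 0


label_patient([False, True, False])  # -> 1 (positive example)
label_patient([False, False])        # -> 0 (negative example)
```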
Number | Clinical parameters | Abbreviation |
---|---|---|
F1 | Sex | SEX |
F2 | Age | AGE |
F3 | Temperature (°C) | TEMP |
F4 | C-reactive protein concentration | CRP |
F5 | Procalcitonin | PCT |
F6 | Prothrombin time | PT |
F7 | Prothrombin time activity | PT% |
F8 | Thrombin time | TT |
F9 | Activated partial thromboplastin time | APTT |
F10 | Fibrinogen degradation products | FDP |
F11 | Fibrinogen | FIB |
F12 | D-Dimer | D-Dimer |
F13 | White blood cell | WBC |
F14 | Neutrophil | NEUT |
F15 | Blood platelet | PLT |
F16 | Red blood cell | RBC |
F17 | Hemoglobin | HB |
F18 | Platelet | PLT |
F19 | Neutrophil count | NEUT# |
F20 | Neutrophil ratio | NEUT% |
F21 | Lymphocyte count | LYMPH# |
F22 | Lymphocyte ratio | LYMPH% |
F23 | Hematocrit | HCT |
F24 | Red cell distribution width | RDW |
F25 | Mean platelet volume | MPV |
F26 | Basophil ratio | BASO% |
F27 | Thrombocytocrit | Pct |
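According to the contributions list, the DAE takes the numerical indicators as input; these are presumably the clinical parameters in the table above. As a minimal sketch of the "denoising" idea (not the authors' implementation): during training, each input vector is corrupted, here with masking noise, and the autoencoder is trained to reconstruct the clean vector, with the loss comparing the reconstruction against the uncorrupted input. The masking noise type, the rate, the seed, the sample values and the function names are all assumptions; the encoder and decoder networks themselves are omitted.

```python
import random
from typing import List, Sequence


def mask_corrupt(x: Sequence[float], rate: float, rng: random.Random) -> List[float]:
    """Masking noise: set each feature to 0.0 independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


def reconstruction_mse(clean: Sequence[float], recon: Sequence[float]) -> float:
    """Mean squared error between the clean input and its reconstruction."""
    return sum((c - r) ** 2 for c, r in zip(clean, recon)) / len(clean)


# Hypothetical, already-scaled values for a few of the clinical parameters.
x_clean = [0.42, 0.80, 0.15, 0.66, 0.31]
rng = random.Random(0)
x_noisy = mask_corrupt(x_clean, rate=0.2, rng=rng)
# A DAE would be trained to map x_noisy back to x_clean, for example by
# minimising reconstruction_mse(x_clean, decoder(encoder(x_noisy))).
```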
Evaluation metric
Actual class | Predicted positive | Predicted negative |
---|---|---|
Positive examples | TP | FN |
Negative examples | FP | TN |
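The results tables report precision, recall and F-measure. Assuming the standard definitions over this confusion matrix (the section does not spell them out), with a positive blood culture as the positive class, they can be computed as below. The function names are hypothetical.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) from confusion-matrix counts.

    Precision = TP / (TP + FP), Recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall. Empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def f_measure(precision: float, recall: float) -> float:
    """F1 from precision and recall (works for fractions or percentages)."""
    return 2 * precision * recall / (precision + recall)


# LR row of the first results table: precision 78.56 %, recall 79.26 %.
lr_f = f_measure(78.56, 79.26)  # about 78.908, reported as 78.91
```

Because F1 is a harmonic mean, it always lies between precision and recall and is pulled toward the smaller of the two.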
Experimental settings
Type | Parameters |
---|---|
Training | λ = 0.001, epochs = 20, batch size = 16 |
Embedding | dim(emb(L)) = 100, epochs = 20 |
BiLSTM | L_bilstm = 128, R_dropout = 0.5 |
AutoEncoder | N_AE = 6, N_MLP = 2 |
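The settings table can be captured as a typed configuration object. The field names are hypothetical, and two interpretations are assumptions the table does not state: λ is taken to be the learning rate, and the BiLSTM output is assumed to concatenate the forward and backward states, so its width would be 2 × 128. The meanings of N_AE and N_MLP are not specified and are kept as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridModelConfig:
    """Hyperparameters from the experimental settings table (hypothetical names)."""
    lam: float = 0.001          # λ, assumed to be the learning rate
    epochs: int = 20
    batch_size: int = 16
    embedding_dim: int = 100    # dim(emb(L))
    embedding_epochs: int = 20  # "epochs = 20" listed under Embedding
    bilstm_units: int = 128     # L_bilstm, assumed to be per direction
    dropout: float = 0.5        # R_dropout
    n_ae: int = 6               # N_AE, meaning not specified in the table
    n_mlp: int = 2              # N_MLP, meaning not specified in the table

    @property
    def bilstm_output_dim(self) -> int:
        # Assumes forward and backward hidden states are concatenated.
        return 2 * self.bilstm_units

    def num_batches(self, n_examples: int) -> int:
        """Mini-batches per epoch, counting a final partial batch (ceiling division)."""
        return -(-n_examples // self.batch_size)


cfg = HybridModelConfig()
```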
Results
Model | Precision (%) | Recall (%) | F-measure (%) |
---|---|---|---|
LR | 78.56 | 79.26 | 78.91 |
NB | 80.24 | 83.73 | 81.95 |
SVM | 84.56 | 81.95 | 83.23 |
ADT | 84.64 | 86.51 | 85.56 |
AVG | 82.00 | 82.86 | 82.41 |
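The AVG row appears to be the unweighted mean of the model rows above it. A quick consistency check against the first results table, with the values copied from it:

```python
from statistics import mean

# (precision, recall, F-measure) in percent, copied from the first results table.
rows = {
    "LR": (78.56, 79.26, 78.91),
    "NB": (80.24, 83.73, 81.95),
    "SVM": (84.56, 81.95, 83.23),
    "ADT": (84.64, 86.51, 85.56),
}
reported_avg = (82.00, 82.86, 82.41)

# Column-wise, unweighted mean over the four models.
avg = tuple(mean(col) for col in zip(*rows.values()))

# Reported values are rounded to two decimals, so allow a rounding gap of 0.005.
matches = all(abs(c - r) <= 0.005 for c, r in zip(avg, reported_avg))
```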
Model | Precision (%) | Recall (%) | F-measure (%) |
---|---|---|---|
LR | 66.21 | 59.31 | 62.57 |
NB | 59.25 | 62.75 | 60.95 |
SVM | 63.56 | 65.25 | 64.39 |
ADT | 65.54 | 67.25 | 66.38 |
CNN | 69.21 | 71.18 | 70.18 |
BiLSTM | 76.35 | 69.19 | 72.59 |
ABiLSTM | 75.35 | 71.19 | 73.21 |
AVG | 66.92 | 55.01 | 67.18 |
Model | Precision (%) | Recall (%) | F-measure (%) |
---|---|---|---|
LR | 83.55 | 85.23 | 84.38 |
NB | 84.36 | 87.68 | 85.99 |
SVM | 86.12 | 87.41 | 86.76 |
ADT | 85.72 | 87.39 | 86.55 |
CNN | 87.36 | 88.69 | 88.01 |
BiLSTM | 87.16 | 90.01 | 88.56 |
CNN+DAE | 89.21 | 90.36 | 90.01 |
BiLSTM+DAE | 90.32 | 91.59 | 90.96 |
ABiLSTM+DAE | 90.15 | 92.35 | 91.23 |
AVG | 87.11 | 89.68 | 88.05 |
Discussion
BiLSTM+DAE recognition | ADT recognition | Count (%) |
---|---|---|
Correct | Correct | 1398 (49%) |
Correct | Wrong | 345 (12%) |
Wrong | Correct | 97 (3.3%) |
Wrong | Wrong | 107 (3.6%) |
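The table cross-tabulates the per-example outcomes of the two models (BiLSTM+DAE versus ADT). Such counts can be tallied from test-set predictions as sketched below; the function name and the 0/1 label encoding are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def agreement_table(y_true: Sequence[int], pred_a: Sequence[int],
                    pred_b: Sequence[int]) -> Dict[Tuple[str, str], int]:
    """Count examples by (is model A correct?, is model B correct?)."""
    counts: Counter = Counter()
    for y, a, b in zip(y_true, pred_a, pred_b):
        key = ("Correct" if a == y else "Wrong",
               "Correct" if b == y else "Wrong")
        counts[key] += 1
    return dict(counts)


# Toy labels and predictions for five patients (1 = positive blood culture).
table = agreement_table(y_true=[1, 0, 1, 1, 0],
                        pred_a=[1, 0, 0, 1, 1],
                        pred_b=[1, 1, 0, 1, 0])
# -> {("Correct", "Correct"): 2, ("Correct", "Wrong"): 1,
#     ("Wrong", "Wrong"): 1, ("Wrong", "Correct"): 1}
```

Dividing each count by the number of test examples gives the percentages shown in parentheses.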