Background
- An effective framework for normalizing lab indicators is proposed that combines a recall model and a binary classification model. The recall model reduces the scale of the alignment problem by selecting, for each non-standard indicator, a candidate set of standard indicators. Each non-standard indicator is then paired with its candidates, and a binary classification model based on the enhanced sequential inference model (ESIM) decides whether each pair matches, using the names and abbreviations of the indicators. Experiments show that the proposed framework achieves an F1-score of 92.08% in the final binary classification.
- Active learning is used to reduce annotation cost. A new selection strategy is proposed and compared with Shannon entropy, least confidence, and a random baseline. Experiments show that our strategy performs better than the random baseline and, using only 43% of the training data, can reach or surpass the result obtained by training on the full data.
- A detailed case study of heart failure clinical analysis is conducted on a sub-dataset from a regional healthcare platform, the Shanghai Hospital Development Center (SHDC). The results show that the proposed method is practical for data cleaning, data mining, text extraction, and entity alignment.
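The two-stage design described in the first contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the recall score is `difflib.SequenceMatcher` similarity (a stand-in; the Results compare edit distance, BoW, BM25, and TF-IDF for this step), `classify` is a placeholder for the trained ESIM pair classifier, and the names `similarity`, `recall_candidates`, `normalize`, and the default `k=3` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def similarity(a: str, b: str) -> float:
    # Stand-in recall score (assumption): difflib's ratio on lower-cased names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recall_candidates(query: str, standards: List[str], k: int) -> List[str]:
    # Stage 1 (recall): keep only the k standard indicators most similar to the query.
    ranked = sorted(standards, key=lambda s: similarity(query, s), reverse=True)
    return ranked[:k]


def normalize(
    non_standard: List[str],
    standards: List[str],
    classify: Callable[[str, str], bool],
    k: int = 3,
) -> Dict[str, List[str]]:
    # Stage 2 (binary classification): score only (non-standard, candidate) pairs.
    # `classify` is a hypothetical placeholder for the trained ESIM pair classifier.
    result: Dict[str, List[str]] = {}
    for name in non_standard:
        candidates = recall_candidates(name, standards, k)
        result[name] = [c for c in candidates if classify(name, c)]
    return result


def toy_classifier(a: str, b: str) -> bool:
    # Hypothetical threshold classifier, used only to make the example runnable.
    return similarity(a, b) > 0.8


if __name__ == "__main__":
    standards = ["alanine aminotransferase", "aspartate aminotransferase", "hemoglobin"]
    print(normalize(["Alanine aminotransferase (ALT)"], standards, toy_classifier, k=2))
```

Restricting the classifier to `k` candidates means it scores `k` pairs per non-standard indicator instead of one pair per standard indicator, which is the reduction in alignment scale that the recall step is meant to provide.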
Methods
Data pre-processing
Candidate selection
Binary classification
Input encoding
Local inference modeling
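In the standard ESIM formulation, local inference soft-aligns the two encoded sequences with attention weights $e_{ij} = \bar{a}_i^\top \bar{b}_j$, builds each aligned vector as a softmax-weighted sum over the other sequence, and enhances each position as $[\bar{a}; \tilde{a}; \bar{a} - \tilde{a}; \bar{a} \odot \tilde{a}]$. The pure-Python sketch below shows only that step; it assumes the inputs are already-encoded token vectors (ESIM produces them with a BiLSTM input encoder, omitted here), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def enhance(x: Vector, x_tilde: Vector) -> Vector:
    # ESIM enhancement: concatenation [x; x~; x - x~; x * x~].
    diff = [p - q for p, q in zip(x, x_tilde)]
    prod = [p * q for p, q in zip(x, x_tilde)]
    return x + x_tilde + diff + prod


def local_inference(a: List[Vector], b: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    # Attention weights e[i][j] = a_i . b_j for every pair of positions.
    e = [[dot(ai, bj) for bj in b] for ai in a]
    # Each a_i attends over b (softmax across j); each b_j attends over a (softmax across i).
    a_tilde = [weighted_sum(softmax(row), b) for row in e]
    b_tilde = [
        weighted_sum(softmax([e[i][j] for i in range(len(a))]), a)
        for j in range(len(b))
    ]
    m_a = [enhance(x, xt) for x, xt in zip(a, a_tilde)]
    m_b = [enhance(y, yt) for y, yt in zip(b, b_tilde)]
    return m_a, m_b
```

Each enhanced vector is four times the encoding width. In ESIM these sequences are then passed to the inference composition layer.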
Inference composition
Active learning
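The Background compares the proposed selection strategy against Shannon entropy, least confidence, and a random baseline. The sketch below shows only those three baselines, since the new strategy is not described in this excerpt. It assumes the current binary classifier returns class probabilities for each unlabeled candidate pair; `select_batch` and its parameters are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    # H(p) = -sum_c p_c log p_c; highest when the classes are equally likely.
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confidence(probs: Sequence[float]) -> float:
    # 1 - max_c p_c; highest when even the most likely class is uncertain.
    return 1.0 - max(probs)


def select_batch(
    pool_probs: List[Sequence[float]], batch_size: int, strategy: str, seed: int = 0
) -> List[int]:
    """Return indices of unlabeled pairs to send for annotation next."""
    indices = list(range(len(pool_probs)))
    if strategy == "random":
        return random.Random(seed).sample(indices, batch_size)
    scorers = {"entropy": shannon_entropy, "least_confidence": least_confidence}
    score = scorers[strategy]
    # Most uncertain pairs first.
    return sorted(indices, key=lambda i: score(pool_probs[i]), reverse=True)[:batch_size]
```

For a binary classifier both uncertainty measures rank pairs by closeness to a 0.5 probability, so they tend to select similar batches.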
Model | Recall (%) | MRR |
---|---|---|
Edit distance | 86.67 | 0.74 |
BoW (bag of words) | 84.32 | 0.49 |
BM25 | 90.10 | 0.35 |
TF-IDF | 92.50 | 0.82 |

Model | Recall (%) | MRR |
---|---|---|
Edit distance | 91.83 | 0.79 |
BoW (bag of words) | 89.78 | 0.53 |
BM25 | 95.76 | 0.38 |
TF-IDF | 97.38 | 0.87 |
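
The candidate-selection tables report recall and mean reciprocal rank (MRR). A minimal sketch of both metrics follows. The candidate-set size `k` behind each table is not stated in this excerpt, so it is left as a parameter. The sketch also assumes a query whose correct standard indicator is missing from the ranking contributes 0 to MRR. Function names are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(ranked_lists: List[Sequence[str]], gold: List[str], k: int) -> float:
    # Fraction of queries whose correct standard indicator is among the top-k candidates.
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked_lists: List[Sequence[str]], gold: List[str]) -> float:
    # Average of 1/rank of the correct answer; a miss contributes 0.
    total = 0.0
    for ranked, g in zip(ranked_lists, gold):
        if g in ranked:
            total += 1.0 / (list(ranked).index(g) + 1)
    return total / len(gold)
```

Recall only checks whether the correct indicator survives the cut, while MRR also rewards ranking it near the top. This is why a model can have high recall and low MRR, as BM25 does in both tables.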
Results
Dataset
Candidate selection
- Edit distance: measures the minimum number of edit operations (insertions, deletions, and substitutions) needed to transform one string into another.
- BoW (bag of words): a basic model that represents text as a vector of term counts for similarity calculation.
- BM25: a ranking function widely used as a baseline in information retrieval.
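The three recall baselines listed above can be sketched compactly. This is a minimal illustration, not the configuration used in the experiments. Whitespace tokenization is an assumption (Chinese indicator names would likely need character-level tokens). The BM25 parameters `k1=1.5` and `b=0.75` are common defaults rather than values from the paper, and the IDF term is one common non-negative variant.

```python
import math
from collections import Counter
from typing import List


def edit_distance(s: str, t: str) -> int:
    # Levenshtein distance via dynamic programming, keeping one previous row.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution (0 if equal)
        prev = curr
    return prev[-1]


def bow_cosine(a: str, b: str) -> float:
    # Cosine similarity between term-count vectors (whitespace tokens, an assumption).
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(va[w] * vb[w] for w in va)
    den = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return num / den if den else 0.0


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    # Score every document (here: a standard indicator name) against the query.
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(w for d in tokenized for w in set(d))  # document frequency per term
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores
```

Edit distance works on characters and is sensitive to word order. The two vector models ignore word order, and BM25 additionally down-weights terms that appear in many standard indicators.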
Binary classification
Active learning
Methods | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|
Zhang | 88.38 | 79.89 | 83.53 |
BiMPM | 83.07 | 90.13 | 86.19 |
ESIM | 92.39 | 91.78 | 92.08 |
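
The F1-score is the harmonic mean of precision and recall. The short check below reproduces the ESIM row from its reported precision and recall. Applying the same formula to the Zhang and BiMPM rows gives about 83.92 and 86.46 rather than the reported 83.53 and 86.19, which may mean those scores were averaged differently (for example per class). The excerpt does not say which averaging was used. `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall; both inputs share a unit (here: percent).
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(92.39, 91.78), 2))  # 92.08 (ESIM row)
```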
Non-standard indicator | Standard indicators | ||
---|---|---|---|
Discussion
Qualitative analysis
| Standard indicator | Non-standard indicator | Label | Predict |
|---|---|---|---|
| | | 1 | 1 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |
| | | 0 | 0 |