Background
Related work
Reference | Training Corpus | Testing Corpus |
---|---|---|
[8] | 275 Manual and mixed | 358 Mixed |
[10] | 148 Manual and mixed | 75 Mixed |
[13] | 50 Manual and mixed | 156 Mixed |
[12] | 800 Manual and mixed | 200 Mixed |
 | 1000 Manual and mixed | 200 Mixed |
[9] | 1575 to 2280 Automatic and only structured abstracts | 318 Mixed |
 | 2394 to 14,279 Automatic and only structured abstracts | 2394 to 14,279 Only structured |
Methods
Experiments data
- The training set consists of 800 abstracts, of which 486 are unstructured and 314 are structured.
- The test set consists of 200 abstracts, of which 120 are unstructured and 80 are structured.
Label | Number of sentences | % |
---|---|---|
Population | 662 | 6.8% |
Intervention | 565 | 5.8% |
Outcome | 3564 | 36.6% |
Other | 2712 | 27.9% |
Study Design | 193 | 2.0% |
Background | 2031 | 20.9% |
Total | 9727 | 100.0% |
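The percentages in the table above follow directly from the sentence counts. As a quick check, a minimal sketch (the dictionary and function names are illustrative, not part of the original system):

```python
# Sentence counts per label, copied from the table above.
LABEL_COUNTS = {
    "Population": 662,
    "Intervention": 565,
    "Outcome": 3564,
    "Other": 2712,
    "Study Design": 193,
    "Background": 2031,
}


def label_distribution(counts):
    """Return the share of each label as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


distribution = label_distribution(LABEL_COUNTS)
```

The strong imbalance (Outcome sentences are roughly six times more frequent than Intervention sentences) is what motivates the "balancing of PICO element distribution" aspect examined later.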
CRF (MLM) stage
System description
- removes characters that can be confused with the end of a sentence, such as vs., %, e.g.,
- corrects invalid decimal point numbers that cTAKES could consider as the end of a sentence,
- standardizes section headers.
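The first two clean-up steps can be illustrated with a small sketch. It is not the authors' implementation: the abbreviation table, the replacement of "%" by the word "percent", and the reading of "invalid decimal point numbers" as numbers with a missing leading zero (such as ".05") are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; the source names only "vs.", "%" and "e.g."
# as examples of characters that confuse sentence splitting.
ABBREVIATIONS = {"vs.": "vs", "e.g.": "eg", "E.g.": "Eg"}

# Assumption: an "invalid decimal point number" is one without a leading zero.
LEADING_DOT_NUMBER = re.compile(r"(?<![\w.])\.(\d)")


def clean_for_sentence_splitting(text):
    """Remove characters that a sentence splitter could take for a sentence end."""
    for abbreviation, replacement in ABBREVIATIONS.items():
        text = text.replace(abbreviation, replacement)
    # Assumed handling of "%": spell it out so no symbol follows the number.
    text = text.replace("%", " percent")
    # ".05" -> "0.05"
    return LEADING_DOT_NUMBER.sub(r"0.\1", text)


cleaned = clean_for_sentence_splitting("5% vs. 10%, p < .05")
```

Header standardization, the third step, is a separate lookup and is not covered by this sketch.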
Semantic features | |
---|---|
f1 | Number of words in the sentence that are in the age, race or gender keywords list |
f2 | Number of words belonging to the UMLS semantic group «Disorders» |
f3 | Number of words belonging to the UMLS semantic group «Procedures» or «Chemicals & Drugs» |
f4 | Number of words that are in the Outcome keywords list |
Structural features | |
f5 | Number of words of the sentence that are in the title |
f6 | Number of words of the sentence that are in the abstract’s « keywords » |
f7 | Sentence header |
f8 | Sentence length (number of words) |
f9 | Sentence relative position |
Lexical feature | |
f10 | Whether the current word and its POS belong to the bag-of-words |
- wSe is a weight that depends on the presence of the semantic features of the P element in the sentence: wSe = f1 + f2.
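To make the weight concrete, the sketch below computes f1, f2 and wSe for one sentence. The two word sets are hypothetical stand-ins: the real system relies on a keyword list and on UMLS concepts extracted by cTAKES, neither of which is reproduced here.

```python
import re

# Hypothetical stand-in for the age, race or gender keywords list (feature f1).
AGE_RACE_GENDER_KEYWORDS = {"age", "aged", "adult", "women", "men", "male", "female"}
# Hypothetical stand-in for a lookup in the UMLS semantic group "Disorders" (feature f2).
DISORDER_TERMS = {"asthma", "diabetes", "cancer"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def semantic_weight(sentence):
    """Return (f1, f2, wSe) with wSe = f1 + f2, as defined in the text."""
    tokens = tokenize(sentence)
    f1 = sum(token in AGE_RACE_GENDER_KEYWORDS for token in tokens)
    f2 = sum(token in DISORDER_TERMS for token in tokens)
    return f1, f2, f1 + f2


f1, f2, w_se = semantic_weight("Adult women with asthma were enrolled.")
```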
Analysis of MLM aspects
- Model setting: Gaussian prior and training-proportion parameters
- Training information layout: standard structure vs. information redundancy structure
- Mixing different features
- Type of feature values: binary vs. natural vs. categorical
- Standardisation or not of section headings
- Grouping structural features vs. non-grouping
- Mixed abstracts vs. only structured ones
- Balancing of PICO element distribution
- the RBM results to analyse how the RBM stage performed on the abstracts that are not labelled by the MLM stage,
- the combined MLM and RBM results to compare them with the results in the literature review,
- the 5-fold cross-validation to assess overfitting and robustness of the model.
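For readers unfamiliar with the last item, k-fold cross-validation splits the data into k parts and uses each part once as the test set. The source does not say how the folds were drawn, so the following is only a generic sketch with a simple interleaved assignment; the function name is illustrative.

```python
def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    # Interleaved assignment: item i goes to fold i % k.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        ]
        yield train, test


splits = list(k_fold_splits(10, k=5))
```

Large differences between the per-fold scores would point to overfitting or to a model that is sensitive to the choice of training data.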
Sentence | Conditional probability calculated by the CRF model | Sentence label |
---|---|---|
1 | P (POPULATION \| Phrase1) = p1 | p4 > p1, p2, p3 ➔ label = OTHER |
 | P (INTERVENTION \| Phrase1) = p2 | |
 | P (OUTCOME \| Phrase1) = p3 | |
 | P (OTHER \| Phrase1) = p4 | |
1 | P (POPULATION \| Phrase1) = p1 | p2 > p1, p4, p3 ➔ label = INTERVENTION |
 | P (INTERVENTION \| Phrase1) = p2 | |
 | P (OUTCOME \| Phrase1) = p3 | |
 | P (OTHER \| Phrase1) = p4 | |
1 | P (POPULATION \| Phrase1) = p1 | p1 > p2, p4, p3 ➔ label = POPULATION |
 | P (INTERVENTION \| Phrase1) = p2 | |
 | P (OUTCOME \| Phrase1) = p3 | |
 | P (OTHER \| Phrase1) = p4 | |
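The decision rule in the table above is an arg-max over the conditional probabilities: the sentence receives the label with the highest probability. A minimal sketch, with made-up probability values and an illustrative function name:

```python
def pick_label(probabilities):
    """Return the label whose conditional probability is the highest."""
    return max(probabilities, key=probabilities.get)


# Made-up values standing in for p1..p4 of one sentence.
example = {
    "POPULATION": 0.10,
    "INTERVENTION": 0.20,
    "OUTCOME": 0.30,
    "OTHER": 0.40,
}
label = pick_label(example)
```

Because exactly one label wins, this rule cannot assign two PICO elements to the same sentence, which is the limitation the information redundancy layout addresses.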
Training file with information redundancy layout | | | | | |
---|---|---|---|---|---|
Sentence | Features | | | Label | Prediction |
S1 | f1 | f2 | f3 | INTERVENTION | 0 |
S1 | f1 | f2 | f3 | POPULATION | 1 |
S1 | f1 | f2 | f3 | OUTCOME | 0 |
S1 | f1 | f2 | f3 | OTHER | 0 |
S2 | f1 | f2 | f3 | INTERVENTION | 1 |
S2 | f1 | f2 | f3 | POPULATION | 1 |
S2 | f1 | f2 | f3 | OUTCOME | 0 |
S2 | f1 | f2 | f3 | OTHER | 0 |
Training file standard layout | | | | | |
Sentence | Features | | | | Label |
S1 | f1 | f2 | f3 | … | POPULATION |
S2 | f1 | f2 | f3 | … | INTERVENTION |
S2 | f1 | f2 | f3 | … | POPULATION |
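The relation between the two layouts can be sketched as a conversion: every sentence is repeated once per candidate label, and the prediction column is 1 when the sentence carries that label in the standard layout and 0 otherwise. The row format and the function name below are assumptions made for the example, not the actual file format used by the authors.

```python
# Candidate labels, in the order used in the redundancy table above.
LABELS = ["INTERVENTION", "POPULATION", "OUTCOME", "OTHER"]


def to_redundancy_layout(standard_rows):
    """Convert (sentence_id, features, label) rows into
    (sentence_id, features, label, prediction) rows, one per candidate label."""
    labels_by_sentence = {}
    features_by_sentence = {}
    for sentence_id, features, label in standard_rows:
        labels_by_sentence.setdefault(sentence_id, set()).add(label)
        features_by_sentence[sentence_id] = features
    redundant_rows = []
    for sentence_id, gold_labels in labels_by_sentence.items():
        for label in LABELS:
            prediction = int(label in gold_labels)
            redundant_rows.append(
                (sentence_id, features_by_sentence[sentence_id], label, prediction)
            )
    return redundant_rows


features = ("f1", "f2", "f3")
standard = [
    ("S1", features, "POPULATION"),
    ("S2", features, "INTERVENTION"),
    ("S2", features, "POPULATION"),
]
redundant = to_redundancy_layout(standard)
```

With this layout a sentence such as S2 can be positive for both INTERVENTION and POPULATION, whereas the standard layout needs two separate rows for it.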
- The MPt category represents the characteristics of the Patient element, such as “patient”, “age”, “adult”, etc.
- The MP category represents the characteristics of the Problem element belonging to a UMLS semantic type such as Disease or Syndrome, Injury or Poisoning, Anatomical Abnormality, etc.
- The MI category represents the characteristics of the Intervention element belonging to a UMLS semantic type such as Procedures, Chemicals and Drugs, Devices, etc.
- The MT category contains the words of the title of the abstract.
Common header | Mapped header | Total |
---|---|---|
OBJECTIVE | AIM, OBJECTIVE, BACKGROUND AND OBJECTIVES, CONTEXT, … | 37 |
METHOD | DESIGN, DESIGN AND METHODS, PATIENT(S), INTERVENTION, … | 30 |
RESULTS | FINDINGS, MAIN RESULTS, OUTCOME MEASURES, … | 13 |
CONCLUSION | CONCLUSION, DISCUSSION, IMPLICATIONS, SUMMARY, … | 12 |
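Header standardization amounts to a lookup from the raw header to one of the four common headers. The sketch below contains only the mapped headers listed in the table above; the entries hidden behind "…" are not known and are deliberately left out. The function name and the normalization (trimming, dropping a trailing colon, upper-casing) are assumptions.

```python
# Partial mapping: only the headers shown in the table above.
COMMON_HEADERS = {
    "OBJECTIVE": ["AIM", "OBJECTIVE", "BACKGROUND AND OBJECTIVES", "CONTEXT"],
    "METHOD": ["DESIGN", "DESIGN AND METHODS", "PATIENT(S)", "INTERVENTION"],
    "RESULTS": ["FINDINGS", "MAIN RESULTS", "OUTCOME MEASURES"],
    "CONCLUSION": ["CONCLUSION", "DISCUSSION", "IMPLICATIONS", "SUMMARY"],
}

# Invert the table into raw header -> common header.
HEADER_MAP = {
    raw: common for common, raw_headers in COMMON_HEADERS.items() for raw in raw_headers
}


def standardize_header(header):
    """Return the common header for a raw section header, or None if unknown."""
    key = header.strip().rstrip(":").upper()
    return HEADER_MAP.get(key)
```

For example, `standardize_header("Main results:")` gives `"RESULTS"`, while a header missing from the partial table gives `None`.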
- If the sentence header is OBJECTIVE, all sentences in this section are assigned the number 3, an arbitrary number close to the average size of the OBJECTIVE section; its role is to standardize the structural feature.
- If the sentence header is METHOD, all sentences in this section are assigned the number 6, an arbitrary number close to the average size of the METHOD section plus the average size of the OBJECTIVE section.
- If the sentence header is RESULTS, all sentences in this section are assigned the number 12.
- If the sentence header is CONCLUSION, all sentences in this section are assigned the number 14.
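The four rules above can be written as a small lookup that replaces the position of a sentence with a fixed number per section. This is a sketch under the assumption that headers have already been standardized; the names are illustrative.

```python
# Fixed position numbers per standardized section, as given in the rules above.
# "RESULTS" is the common header used in the header mapping table.
SECTION_NUMBER = {
    "OBJECTIVE": 3,
    "METHOD": 6,
    "RESULTS": 12,
    "CONCLUSION": 14,
}


def section_number(standard_header):
    """Return the number shared by every sentence of the given section."""
    return SECTION_NUMBER[standard_header]


def number_sentences(sentences):
    """Map (standard_header, text) pairs to (number, text) pairs."""
    return [(section_number(header), text) for header, text in sentences]


numbered = number_sentences([
    ("OBJECTIVE", "To assess the effect of drug X."),
    ("METHOD", "We enrolled 120 adults."),
    ("METHOD", "Patients received drug X or placebo."),
])
```

All sentences of a section share one value, so the feature no longer depends on how long a given abstract happens to be.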
- OBJECTIVE section for the sentences labeled “Background”;
- METHOD section for the sentences labeled “Population”, “Intervention” or “StudyDesign”;
- RESULTS section for the sentences labeled “Outcome”;
- CONCLUSION section for the sentences labeled “Other”.
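The label-to-section correspondence listed above is again a plain lookup. A minimal sketch, with hypothetical dictionary and function names:

```python
# Section assigned to a sentence according to its label, as listed above.
LABEL_TO_SECTION = {
    "Background": "OBJECTIVE",
    "Population": "METHOD",
    "Intervention": "METHOD",
    "StudyDesign": "METHOD",
    "Outcome": "RESULTS",
    "Other": "CONCLUSION",
}


def section_for_label(label):
    """Return the section associated with a sentence label."""
    return LABEL_TO_SECTION[label]


sections = [section_for_label(label) for label in ("Background", "Population", "Outcome")]
```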
RBM stage
Results
Aspect | Best choice of aspect | Other assessed choices |
---|---|---|
Gaussian prior | 10 | 0.1, 1, 10, 100 |
Model training-proportion | (100, 0%) | (50, 50%), (80, 20%), (90, 10%) |
Training information layout | Standard | Information redundancy |
Testing information layout | Redundant information | Standard |
Mixing different features | All features | Part of them |
Type of feature values | Categorical | Binary, natural |
Grouping structural features | Yes | No |
Assessment of the CRF model
- using cTAKES instead of MetaMap [29] as a tool for extracting UMLS concepts in a text,
- using CRF as an MLM algorithm.
 | P | I | O |
---|---|---|---|
Number of sentences in training (%) | 662 (6.8) | 565 (5.8) | 3564 (36.6) |
Our MLM stage - blind test corpus | | | |
F-score | 73% | 43% | 90% |
The best F-scores in ALTA shared task [12] | | | |
System 1 | 58% | 34% | 89% |
System 2 | 51% | 35% | 86% |
The best F-scores in paper [11] | | | |
CRF | 81% | 81% | 98% |
SVM | 31% | 21% | 90% |
Naïve Bayes | 34% | 10% | 86% |
Multinomial Logistic Regression | 41% | 28% | 90% |
Other papers F-score results using the same training and test corpora | |||
Kim et al. [14] | 48% | 16% | 83% |
Verbeke et al. [30] | 29% | 21% | 85% |
Sarker et al. [31] | 52% | 34% | 86% |
Examples of potential P sentences that are not considered in the test file | |
“An estimated 20% of all breast cancer or ovarian and breast cancer cases have familial aggregation.” [32] | |
“Clinical trials such as the Sudden Cardiac Death Heart Failure Trial (SCD-HeFT) are currently underway to investigate the role of the implantable defibrillator in patients with heart failure.” [33] | |
Examples of potential I sentences that are not considered in the test file | |
“Tizanidine hydrochloride is a very useful medication in patients suffering from spasticity caused by MS, acquired brain injury or spinal cord injury.” [34] | |
“Here we describe the influence of local anesthesia and back-muscle-training therapy on subjective and objective pain parameters in 21 low-back-pain patients who had similar clinical status and neurophysiologic findings and whose recurrent low back pain.” [35] | |
“Laparoscopy is highly accurate and effective in the management of peritoneal dialysis catheter dysfunction and results in prolongation of catheter life.” [36] | |
“Here, vertebroplasty and kyphoplasty may provide immediate pain relief by minimally invasive fracture stabilisation.” [37] |
RBM stage results
 | P | I |
---|---|---|
Unstructured abstract extraction | 28 abstracts | 28 abstracts |
Structured abstract extraction | 10 abstracts | 14 abstracts |
Missed | 15 abstracts | 7 abstracts |
N/A (not applicable) | 9 abstracts | 55 abstracts |
Total | 62 | 104 |
- Pre-filter the abstracts. In these experiments, the abstracts of the training and testing corpora were randomly sampled from the GEM [22] and AHRQ [23] institutions, which explains the high number of N/A abstracts for the I element. In the context of a medical question-answering system (QAS), however, the document filtering step reduces the number of N/A abstracts: the abstracts are filtered based on the question keywords or the question type (therapy, etiology, prognosis, …).
- Tighten the constraints on the features f1, f2 and f3 in the RBM rules.
 | Element P | Element I |
---|---|---|
Precision | | |
MLM | 85% | 65% |
RBM | 61% | 40% |
Combined (MLM & RBM) | 77% | 51% |
Recall | | |
MLM | 74% | 57% |
RBM | 72% | 86% |
Combined (MLM & RBM) | 83% | 86% |
F-score | | |
MLM with CRF | 79% | 60% |
RBM | 66% | 55% |
Combined (MLM & RBM) | 80% | 64% |
ALTA [12] best F-scores | | |
MLM with CRF | 58% | 35% |
Paper [11] best F-scores | | |
MLM with CRF | 81.3% | 81.1% |
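The F-scores in this table are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below assumes that this balanced F1 definition is the one used and recomputes a few cells from the rounded precision and recall values.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Element P, values taken from the table above.
mlm_p = f_score(0.85, 0.74)
rbm_p = f_score(0.61, 0.72)
combined_p = f_score(0.77, 0.83)
```

Recomputing from rounded inputs can differ from a reported value by about one point: for element I, the MLM figures (65% and 57%) give roughly 60.7%, while the table reports 60%, presumably because the published F-score was computed from unrounded precision and recall.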