Background
| Label text | Drug* | Condition | LOINC Sections* |
|---|---|---|---|
| INDICATIONS AND USAGE: For debridement and promotion of normal healing of hyperkeratotic surface lesions, particularly where healing is retarded by local infection, necrotic tissue, fibrinous or prurient debris or eschar. Urea is useful for the treatment of hyperkeratotic conditions such as dry, rough skin, dermatitis, psoriasis, xerosis, ichthyosis, eczema, keratosis, keratoderma, corns and calluses. | Urea | hyperkeratotic lesion | INDICATION |
| ADVERSE REACTIONS: Transient stinging, burning, itching or irritation may occur and normally disappear on discontinuing the medication. | Urea | stinging | ADVERSE REACTION |
| | Urea | burning | ADVERSE REACTION |
| | …… | …… | …… |
Methods
Data
Data set
| Rx_Top200 (23) | | Rx_Other (13) | |
|---|---|---|---|
| Drug Name | NDC | Drug Name | NDC |
| Diovan | 0083-4001-01 | UREA | 42192-101-10 |
| ARICEPT | 62856-851-30 | GlucaGen HypoKit | 0169-7065-15 |
| DORYX | 50546-550-01 | Tramadol Hydrochloride | 54868-4638-6 |
| BENICAR HCT | 65597-107-11 | Lisinopril | 51138-139-30 |
| Copaxone | 0088-1153-30 | Glyburide | 23155-058-10 |
| OTC (16) | | | |
| Drug Name | NDC | | |
| Natural Fiber Powder Orange Flavor | 53329-102-56 | | |
| WhiskCare 373 | 65585-373-04 | | |
| Degree for Men Clean Antiperspirant and Deodorant | 64942-0866-2 | | |
| Ultrasol Sunscreen Sunscreen Lotion SPF 34 | 59886-319-11 | | |
| Topcare Allergy | 36800-479-68 | | |
- Boxed Warning sections with the LOINC code of 34066-1
- Precautions sections with the LOINC code of 42232-9
- Warning and Precautions sections with the LOINC code of 43685-7
- Warning sections with the LOINC code of 34071-1
- Contraindications sections with the LOINC code of 34070-3
- Overdosage sections with the LOINC code of 34088-5
- Indications & Usage sections with the LOINC code of 34067-9
- Adverse Reactions sections with the LOINC code of 34084-4
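As a minimal sketch of how the section list above could be used to filter a label, the snippet below keeps only sections whose LOINC code appears in that list. The function name `select_sections` and the `(code, text)` input shape are assumptions made for illustration; they are not taken from the original pipeline.

```python
# LOINC codes and section names as given in the list above
# (codes written with ASCII hyphens).
SECTION_LOINC_CODES = {
    "34066-1": "Boxed Warning",
    "42232-9": "Precautions",
    "43685-7": "Warning and Precautions",
    "34071-1": "Warning",
    "34070-3": "Contraindications",
    "34088-5": "Overdosage",
    "34067-9": "Indications & Usage",
    "34084-4": "Adverse Reactions",
}


def select_sections(sections):
    """Keep (section name, text) for sections whose LOINC code is in scope.

    `sections` is a hypothetical iterable of (loinc_code, text) pairs,
    for example produced by a separate label parser.
    """
    return [
        (SECTION_LOINC_CODES[code], text)
        for code, text in sections
        if code in SECTION_LOINC_CODES
    ]
```

Any section with a code outside the list is dropped before extraction.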
Gold standard annotation
Descriptive statistics of the corpus
| Level | Entity type | Count | OTC | Rx_Top200 | Rx_Other | ALL |
|---|---|---|---|---|---|---|
| Token | Disease/Disorder (DD) | All | 83 | 6,295 | 3,806 | 10,184 |
| | | Unique | 46 | 1,129 | 737 | 1,423 |
| | Sign/Symptom (SS) | All | 104 | 2,072 | 1,867 | 4,043 |
| | | Unique | 46 | 525 | 437 | 742 |
| | Medical Conditions (DD&SS) | All | 187 | 8,367 | 5,673 | 14,227 |
| | | Unique | 92 | 2,942 | 1,174 | 2,165 |
| Span | Disease/Disorder (DD) | All | 67 | 1,443 | 1,271 | 2,781 |
| | | Unique | 39 | 553 | 470 | 860 |
| | Sign/Symptom (SS) | All | 54 | 3,642 | 2,144 | 5,840 |
| | | Unique | 30 | 1,547 | 927 | 2,114 |
| | Medical Conditions (DD&SS) | All | 121 | 5,085 | 3,415 | 8,611 |
| | | Unique | 69 | 2,091 | 1,391 | 2,953 |
Evaluation method
A hybrid pipeline
Preprocessing
CRF-based medical condition extraction
| Feature group | Feature | Description |
|---|---|---|
| Token features | Current token features | The original form, the lowercase form, and the stemmed form of the current token. |
| | Tokens in a 5-token window | The previous two tokens and the next two tokens in their original form. |
| | Bigram of current token | The current token bigram and the previous token bigram. |
| Linguistic features | POS features | The part-of-speech (POS) tags of the tokens in a 5-token window: the current token, the previous two tokens, and the next two tokens. |
| | Initial capital features | Whether each token in the window (the current token, the previous two tokens, and the next two tokens) starts with an upper-case letter. |
| | Number or not features | Whether the current token is numeric, alphabetic, or mixed. |
| | Capital feature | Whether the current token is fully capitalized or contains a mix of capital and lower-case characters. |
| | Prefix and suffix | The prefix and suffix of the current token (its first and last two characters). |
| | Token length | The character length of the current token. |
| Semantic features | CUI | The CUI code assigned to the current token by cTAKES using a dictionary-based method. |
| | TUI | The TUI code assigned to the current token by cTAKES, which provides the semantic type information contained in the UMLS thesaurus. |
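The token and linguistic features listed above can be sketched as a per-token feature dictionary. This is an illustrative approximation, not the authors' implementation: the function names, the feature keys, the `<PAD>` marker, and the reading of "previous token bigram" as the bigram ending at the previous token are all assumptions. POS tags are passed in from an external tagger, and the default `stem` is a lowercase placeholder rather than a real stemmer. The CUI and TUI features come from cTAKES and are left out.

```python
PAD = "<PAD>"  # assumed marker for positions outside the sentence


def char_class(token):
    """Number-or-not feature: numeric, alphabetic, or mixed."""
    if token.isdigit():
        return "numeric"
    if token.isalpha():
        return "alphabetic"
    return "mixed"


def capital_class(token):
    """Capital feature: all capitals, some capitals, or none."""
    if token.isupper():
        return "all_caps"
    if any(c.isupper() for c in token):
        return "mixed_caps"
    return "no_caps"


def token_features(tokens, pos_tags, i, stem=lambda t: t.lower()):
    """Build token and linguistic features for tokens[i].

    `stem` is a placeholder; a real stemmer can be supplied by the caller.
    """
    tok = tokens[i]
    feats = {
        "tok": tok,
        "lower": tok.lower(),
        "stem": stem(tok),
        "char_class": char_class(tok),
        "capital": capital_class(tok),
        "prefix2": tok[:2],
        "suffix2": tok[-2:],
        "length": len(tok),
    }
    # 5-token window: two tokens to the left, the current token, two to the right.
    for off in (-2, -1, 0, 1, 2):
        j = i + off
        inside = 0 <= j < len(tokens)
        if off != 0:
            feats[f"tok[{off}]"] = tokens[j] if inside else PAD
        feats[f"pos[{off}]"] = pos_tags[j] if inside else PAD
        feats[f"init_cap[{off}]"] = inside and tokens[j][:1].isupper()
    prev1 = tokens[i - 1] if i >= 1 else PAD
    prev2 = tokens[i - 2] if i >= 2 else PAD
    feats["bigram"] = f"{prev1} {tok}"         # bigram ending at the current token
    feats["prev_bigram"] = f"{prev2} {prev1}"  # bigram ending at the previous token
    return feats
```

A CRF toolkit would typically receive one such dictionary (or its string-encoded form) per token, together with the gold label during training.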
Rule-based post-processing
Experimental settings
- Baseline I: a rule-based medical condition extraction method. This baseline assumes that the semantic types assigned by cTAKES can identify DD and SS entities. The cTAKES TUI lookup generated the UMLS TUI codes for the medical condition terms. Because TUIs and the DD and SS entities do not map one-to-one, we treated the following semantic types as medical conditions: Congenital Abnormality (T019), Acquired Abnormality (T020), Injury or Poisoning (T037), Pathologic Function (T046), Disease or Syndrome (T047), Mental or Behavioral Dysfunction (T048), Cell or Molecular Dysfunction (T049), Experimental Model of Disease (T050), Sign or Symptom (T184), Anatomical Abnormality (T190), and Neoplastic Process (T191). These semantic types were selected based on the SHARPn guidelines to stay consistent with the annotation in our gold standard [24]. The SHARPn guidelines exclude T033 (Finding) because it is a noisy semantic type that can correspond to several different classes, such as Sign or Symptom, Disease or Syndrome, or Lab Results.
- Baseline II: a second rule-based extraction system, in which every term in the test set was tagged as a medical condition if the same term also appeared in the training set. This pattern matching approach built a dictionary from the training set and used it to detect medical conditions in the test set by longest exact match.
- Experiment I: an implementation of our AutoMCExtractor system. The system uses MALLET CRF for supervised sequence learning with the basic feature set of token features and linguistic features, as described in Table 4. These features were used in our previous project on medication name detection in clinical notes, where they performed well.
- Experiment II: an implementation of our AutoMCExtractor system. The system uses MALLET CRF for supervised sequence learning with the same token and linguistic features as Experiment I, plus TUIs as features, as described in Table 4.
- Experiment III: an implementation of our AutoMCExtractor system. The system uses MALLET CRF with the token and linguistic features plus CUIs, as described in Table 4. TUIs are not included as a feature in this experiment.
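The dictionary lookup behind Baseline II can be illustrated with a short longest-exact-match tagger. This is a sketch under stated assumptions rather than the system used in the paper: the function names are hypothetical, matching is done on lowercased tokens (the source does not say whether matching was case-sensitive), and the output labels reuse the MC_B / MC_I token classes reported in the results.

```python
def build_dictionary(training_terms):
    """Collect annotated terms from the training set as lowercased token tuples."""
    return {tuple(tok.lower() for tok in term) for term in training_terms}


def longest_match_tag(tokens, dictionary):
    """Tag tokens with MC_B / MC_I / O using greedy longest exact match."""
    max_len = max((len(term) for term in dictionary), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "MC_B"                  # first token of the condition
            for j in range(i + 1, i + matched):
                tags[j] = "MC_I"              # remaining tokens of the condition
            i += matched
        else:
            i += 1
    return tags
```

Trying the longest window first means that "dry skin" is tagged as one condition even when "skin" alone is also in the dictionary.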
| System | Dictionary | Token features | Linguistic features | Semantic features: TUI | Semantic features: CUI |
|---|---|---|---|---|---|
| Baseline I | X | | | | |
| Baseline II | X | | | | |
| Experiment I | X | X | X | | |
| Experiment II | X | X | X | X | |
| Experiment III | X | X | X | | X |
Results
| System | Level | Measure | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| Baseline I | Token | MC_B | 0.661 | 0.575 | 0.615 |
| | | MC_I | 0.890 | 0.21 | 0.338 |
| | | Overall | 0.775 | 0.391 | 0.476 |
| | Span | Exact Match | 0.827 | 0.506 | 0.628 |
| Baseline II | Token | MC_B | 0.804 | 0.733 | 0.767 |
| | | MC_I | 0.811 | 0.473 | 0.597 |
| | | Overall | 0.808 | 0.603 | 0.681 |
| | Span | Exact Match | 0.888 | 0.698 | 0.781 |
| Experiment I | Token | MC_B | 0.910 | 0.782 | 0.841 |
| | | MC_I | 0.936 | 0.660 | 0.773 |
| | | Overall | 0.919 | 0.731 | 0.814 |
| | Span | Exact Match | 0.886 | 0.766 | 0.822 |
| | | Left Match | 0.915 | 0.862 | 0.888 |
| | | Right Match | 0.941 | 0.877 | 0.908 |
| | | Partial Match | 0.982 | 0.849 | 0.911 |
| Experiment II | Token | MC_B | 0.928 | 0.831 | 0.877 |
| | | MC_I | 0.942 | 0.686 | 0.793 |
| | | Overall | 0.933 | 0.771 | 0.844 |
| | Span | Exact Match | 0.900 | 0.812 | 0.854 |
| | | Left Match | 0.931 | 0.841 | 0.886 |
| | | Right Match | 0.944 | 0.852 | 0.900 |
| | | Partial Match | 0.985 | 0.889 | 0.935 |
| Experiment III | Token | MC_B | 0.912 | 0.787 | 0.845 |
| | | MC_I | 0.936 | 0.663 | 0.775 |
| | | Overall | 0.920 | 0.735 | 0.817 |
| | Span | Exact Match | 0.886 | 0.769 | 0.824 |
| | | Left Match | 0.917 | 0.844 | 0.879 |
| | | Right Match | 0.944 | 0.861 | 0.900 |
| | | Partial Match | 0.982 | 0.852 | 0.912 |
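To make the span-level rows easier to interpret, the sketch below shows one common way to compute precision, recall, and F-measure under exact, left, right, and partial matching. The match definitions are assumptions, since the text does not spell them out: here a left match shares the start offset, a right match shares the end offset, and a partial match is any overlap. Spans are hypothetical `(start, end)` pairs with an exclusive end.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed match criteria between a gold span g and a system span s.
MATCHERS = {
    "exact": lambda g, s: g == s,
    "left": lambda g, s: g[0] == s[0],
    "right": lambda g, s: g[1] == s[1],
    "partial": lambda g, s: g[0] < s[1] and s[0] < g[1],
}


def evaluate(gold, system, mode="exact"):
    """Return (precision, recall, F-measure) for one match criterion."""
    match = MATCHERS[mode]
    # System spans that match at least one gold span count toward precision.
    matched_system = sum(any(match(g, s) for g in gold) for s in system)
    # Gold spans that are matched by at least one system span count toward recall.
    matched_gold = sum(any(match(g, s) for s in system) for g in gold)
    precision = matched_system / len(system) if system else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)
```

The same F-measure formula reproduces the token-level figures in the table; for example, the Baseline II MC_B precision of 0.804 and recall of 0.733 give an F-measure that rounds to 0.767.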
| Comparison | Precision | Recall | F-measure |
|---|---|---|---|
| Baseline I vs. Baseline II | <0.0001 | <0.0001 | <0.0001 |
| Baseline II vs. Experiment I | 0.003 | <0.0001 | <0.0001 |
| Baseline II vs. Experiment II | 0.001 | <0.0001 | <0.0001 |
| Baseline II vs. Experiment III | 0.0018 | <0.0001 | <0.0001 |
| Experiment I vs. Experiment II | 0.039 | <0.0001 | <0.0001 |
| Experiment I vs. Experiment III | 0.147 | 0.0409 | 0.0264 |
| Experiment II vs. Experiment III | 0.215 | <0.0001 | <0.0001 |