Background
Related work
Word2vec
Applications of Word2vec on biomedical publications
Data and methods
Data
Methods
Three aspects of biomedical publication data
Performance measures
Results
The recency effect
The size effect
Percentage of data | Vocabulary size | # of pairs identified (similarity) | # of pairs identified (relatedness) |
---|---|---|---|
10% | 1,451,218 | 374 (66%) | 368 (63%) |
20% | 2,339,000 | 406 (72%) | 399 (68%) |
30% | 3,313,239 | 432 (76%) | 429 (73%) |
40% | 3,961,051 | 447 (79%) | 444 (76%) |
50% | 4,572,957 | 467 (83%) | 465 (79%) |
60% | 5,319,879 | 479 (85%) | 480 (82%) |
70% | 5,856,126 | 486 (86%) | 487 (83%) |
80% | 6,369,803 | 486 (86%) | 488 (83%) |
90% | 7,016,215 | 491 (87%) | 494 (84%) |
100% | 7,797,722 | 491 (87%) | 494 (84%) |
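The "# of pairs identified" columns count reference term pairs the trained model can score. A likely reading, which this section does not spell out, is that a pair counts as identified when both of its terms are in the model's vocabulary. The sketch below makes that assumption explicit. The function name `coverage` and the toy data are hypothetical and do not come from the study.

```python
from typing import Iterable, Set, Tuple


def coverage(pairs: Iterable[Tuple[str, str]], vocab: Set[str]) -> Tuple[int, float]:
    """Return how many reference pairs have both terms in `vocab`, and that share.

    Assumption: a pair is "identified" only if both of its terms were
    kept in the trained model's vocabulary.
    """
    pairs = list(pairs)
    found = sum(1 for a, b in pairs if a in vocab and b in vocab)
    share = found / len(pairs) if pairs else 0.0
    return found, share


# Hypothetical toy vocabulary and reference pairs (not the study's data).
vocab = {"xanax", "ativan", "zoloft", "prozac"}
pairs = [("xanax", "ativan"), ("zoloft", "prozac"), ("tylenol", "motrin"), ("xanax", "prozac")]

found, share = coverage(pairs, vocab)
print(found, f"{share:.0%}")  # 3 75%
```

Under this reading, coverage cannot shrink as the vocabulary grows over nested data subsets. This is consistent with the non-decreasing counts in the table, which level off between 90% and 100% of the data.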
The section effect
Section | Similarity: correlation | Similarity: # of pairs | Relatedness: correlation | Relatedness: # of pairs |
---|---|---|---|---|
Abstract | 0.65 | 344 (61%) | 0.66 | 339 (58%) |
Body | 0.62 | 498 (88%) | 0.59 | 503 (86%) |
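The correlation columns presumably compare model cosine similarities with human judgments over the identified pairs. The section does not name the coefficient. The sketch below assumes Spearman's rank correlation, a common choice for word-similarity benchmarks, computed as the Pearson correlation of average ranks so that ties are handled. The ratings and similarities are invented for illustration, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented human ratings and model cosine similarities for four covered pairs.
human = [1200.0, 800.0, 300.0, 500.0]
model = [0.76, 0.35, 0.12, 0.52]
print(round(spearman(human, model), 2))  # 0.8
```

Because the correlation is computed only over the pairs a model covers, two rows with different coverage, such as Abstract and Body here, are scored on different subsets of the reference pairs.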
Rank | Similarity: Term 1 | Similarity: Term 2 | Similarity: cosine sim. | Relatedness: Term 1 | Relatedness: Term 2 | Relatedness: cosine sim. |
---|---|---|---|---|---|---|
1 | xanax | ativan | 0.76 | xanax | ativan | 0.76 |
2 | spiriva | serevent | 0.75 | zoloft | prozac | 0.73 |
3 | zoloft | prozac | 0.73 | pepcid | zantac | 0.71 |
4 | tylenol | motrin | 0.72 | atacand | avapro | 0.71 |
5 | pepcid | zantac | 0.71 | actonel | fosamax | 0.69 |
6 | actonel | fosamax | 0.69 | medrol | prednisolone | 0.68 |
7 | medrol | prednisolone | 0.68 | cardura | hytrin | 0.67 |
8 | cardura | hytrin | 0.67 | albuterol | serevent | 0.63 |
9 | meningism | hyperesthesia | 0.66 | photophobia | meningism | 0.62 |
10 | cozaar | diovan | 0.65 | mycosis | blastomycoses | 0.61 |
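The table above ranks term pairs by the cosine similarity of their word2vec vectors. The sketch below shows that scoring and ranking step. The three-dimensional vectors are made up to stand in for trained embeddings, which typically have far more dimensions, and the helper names `cosine` and `rank_pairs` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(
    pairs: List[Tuple[str, str]],
    vectors: Dict[str, Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, str, float]]:
    """Score in-vocabulary pairs by cosine similarity, highest first."""
    scored = [
        (a, b, cosine(vectors[a], vectors[b]))
        for a, b in pairs
        if a in vectors and b in vectors  # skip pairs the model cannot score
    ]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors standing in for trained word2vec embeddings.
vectors = {
    "xanax": [1.0, 0.0, 0.0],
    "ativan": [0.8, 0.6, 0.0],
    "zoloft": [0.0, 1.0, 0.0],
    "prozac": [0.0, 0.6, 0.8],
}
pairs = [("xanax", "zoloft"), ("zoloft", "prozac"), ("xanax", "ativan"), ("tylenol", "motrin")]

for rank, (a, b, sim) in enumerate(rank_pairs(pairs, vectors), start=1):
    print(rank, a, b, round(sim, 2))
```

With a trained gensim model, `KeyedVectors.similarity(w1, w2)` returns the same cosine quantity for two in-vocabulary words. Pairs with an out-of-vocabulary term are dropped rather than given a score, as with ("tylenol", "motrin") in the toy data.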