Published in: Journal of Digital Imaging 1/2019

3 August 2018

Automatic Normalization of Anatomical Phrases in Radiology Reports Using Unsupervised Learning

Written by: Amir M. Tahmasebi, Henghui Zhu, Gabriel Mankovich, Peter Prinsen, Prescott Klassen, Sam Pilato, Rob van Ommering, Pritesh Patel, Martin L. Gunn, Paul Chang

Published in: Journal of Imaging Informatics in Medicine | Issue 1/2019


Abstract

In today's radiology workflow, free-text reporting is the most common medium for capturing, storing, and communicating clinical information. Radiologists routinely consult a patient's prior radiology reports to recall information critical to a new diagnosis, a task that is tedious, time-consuming, and prone to human error. Automatic structuring of report content would make such information easier to retrieve. In this work, we propose an unsupervised machine learning approach that automatically structures radiology reports by detecting anatomical phrases and normalizing them to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) ontology. The approach combines word embedding-based semantic learning with ontology-based concept mapping to derive the desired concept normalization. The word embedding model was trained on a large corpus of unlabeled radiology reports. Fifty-six anatomical labels were extracted from SNOMED CT as class labels covering the whole human anatomy. The proposed framework was compared against a number of state-of-the-art supervised and unsupervised approaches. Radiology reports from three different clinical sites were manually labeled for testing. The proposed approach outperformed the other techniques, yielding an average precision of 82.6%. By applying word embedding techniques to semantic learning, the framework improves on the coverage and performance of conventional approaches to concept normalization while avoiding the need for the large amount of annotated data that is typically required to train classifiers.
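The general idea of embedding-based normalization can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how phrases are embedded or matched, so the sketch assumes that a phrase is represented by the average of its word vectors and assigned to the label with the highest cosine similarity. The three-dimensional vectors, the three label names, and the functions `embed`, `cosine`, and `normalize` are all hypothetical and chosen only for illustration.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would learn vectors from a large corpus of unlabeled reports.
WORD_VECTORS = {
    "lung": [0.9, 0.1, 0.0],
    "pulmonary": [0.8, 0.2, 0.1],
    "lobe": [0.6, 0.3, 0.1],
    "liver": [0.1, 0.9, 0.0],
    "hepatic": [0.2, 0.8, 0.1],
    "kidney": [0.0, 0.1, 0.9],
    "renal": [0.1, 0.2, 0.8],
    "structure": [0.3, 0.3, 0.3],
}

# Hypothetical class labels; the paper uses 56 labels drawn from SNOMED CT.
LABELS = ["lung structure", "liver structure", "kidney structure"]


def embed(phrase):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(phrase):
    """Return (label, similarity) for the nearest label, or (None, 0.0)."""
    vec = embed(phrase)
    if vec is None:
        return None, 0.0
    score, label = max((cosine(vec, embed(label)), label) for label in LABELS)
    return label, score


if __name__ == "__main__":
    for phrase in ["pulmonary lobe", "hepatic", "renal"]:
        print(phrase, "->", normalize(phrase)[0])
```

With these toy vectors, `normalize("pulmonary lobe")` selects `"lung structure"`, even though neither word appears in the label itself. That is the property that lets an embedding-based matcher cover phrases a purely lexical ontology lookup would miss.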
Metadata
Title
Automatic Normalization of Anatomical Phrases in Radiology Reports Using Unsupervised Learning
Written by
Amir M. Tahmasebi
Henghui Zhu
Gabriel Mankovich
Peter Prinsen
Prescott Klassen
Sam Pilato
Rob van Ommering
Pritesh Patel
Martin L. Gunn
Paul Chang
Publication date
3 August 2018
Publisher
Springer International Publishing
Published in
Journal of Imaging Informatics in Medicine / Issue 1/2019
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-018-0116-5
