
Published: 14 March 2024

ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax

Authors: Zachary Huemann, Xin Tie, Junjie Hu, Tyler J. Bradshaw

Published in: Journal of Imaging Informatics in Medicine


Abstract

Radiology narrative reports often describe characteristics of a patient’s disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net extracts language features from physician-generated free-form radiology reports using a pre-trained language model. We then introduced cross-attention between the language features and the intermediate embeddings of an encoder-decoder convolutional neural network to enable language guidance for image analysis. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716±0.016, which was similar to the degree of inter-reader variability (0.712±0.044) computed on a subset of the data. It outperformed vision-only models (Swin UNETR: 0.670±0.015, ResNet50 U-Net: 0.677±0.015, GLoRIA: 0.686±0.014, and nnUNet: 0.694±0.016) and a competing vision-language model (LAVT: 0.706±0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net’s segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design.
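The cross-attention described in the abstract — image-derived queries attending over report-token keys and values — can be sketched as follows. This is a minimal NumPy illustration, not the paper's ConTEXTual Net implementation: the projection matrices are random stand-ins for learned weights, and the dimensions, residual connection, and single-head design are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, txt_feats, d_k=32, seed=0):
    """Attend from image positions (queries) to report tokens (keys/values).

    img_feats: (N, C) -- N flattened spatial positions of a decoder feature map
    txt_feats: (T, D) -- T token embeddings from a language model
    The projection matrices below are random placeholders for learned weights.
    """
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((img_feats.shape[1], d_k))
    Wk = rng.standard_normal((txt_feats.shape[1], d_k))
    Wv = rng.standard_normal((txt_feats.shape[1], img_feats.shape[1]))
    Q = img_feats @ Wq                      # (N, d_k)
    K = txt_feats @ Wk                      # (T, d_k)
    V = txt_feats @ Wv                      # (T, C)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (N, T) image-to-text weights
    return img_feats + attn @ V             # residual language guidance

# Example: an 8x8 feature map flattened to 64 positions with 64 channels,
# and a hypothetical 12-token report with 768-dim embeddings.
rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
txt = rng.standard_normal((12, 768))
out = cross_attention(img, txt)
print(out.shape)  # (64, 64): same shape as the input feature map
```

Because the output keeps the feature map's shape, such a module can be dropped between encoder-decoder stages, letting report text modulate intermediate image embeddings without changing the rest of the network.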
References
1. Paul Zarogoulidis, Ioannis Kioumis, Georgia Pitsiou, Konstantinos Porpodis, Sofia Lampaki, Antonis Papaiwannou, Nikolaos Katsikogiannis, Bojan Zaric, Perin Branislav, Nevena Secen, et al. Pneumothorax: from definition to diagnosis and treatment. Journal of Thoracic Disease, 6(Suppl 4):S372, 2014.
2. Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y. Ng. CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning, 2017.
3. Saban Öztürk and Tolga Çukur. Focal modulation based end-to-end multi-label classification for chest x-ray image classification. In 31st Signal Processing and Communications Applications Conference, SIU 2023, Istanbul, Turkey, July 5-8, 2023, pages 1–4. IEEE, 2023.
4. Şaban Öztürk, Emin Çelik, and Tolga Çukur. Content-based medical image retrieval with opponent class adaptive margin loss. Information Sciences, 637:118938, 2023.
5. Şaban Öztürk, Adi Alhudhaif, and Kemal Polat. Attention-based end-to-end CNN framework for content-based x-ray image retrieval. Turkish Journal of Electrical Engineering and Computer Sciences, 2021:2680–2693, 10 2021.
6. Ekin Tiu, Ellie Talius, Pujan Patel, Curtis P. Langlotz, Andrew Y. Ng, and Pranav Rajpurkar. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nature Biomedical Engineering, 6(12):1399–1406, September 2022.
7. Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
8. Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
9. Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D Manning, and Curtis P Langlotz. Contrastive learning of medical visual representations from paired images and text. arXiv preprint arXiv:2010.00747, 2020.
10. Shih-Cheng Huang, Liyue Shen, Matthew P Lungren, and Serena Yeung. GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3942–3951, 2021.
11. Riddhish Bhalodia, Ali Hatamizadeh, Leo Tam, Ziyue Xu, Xiaosong Wang, Evrim Turkbey, and Daguang Xu. Improving pneumonia localization via cross-attention on medical images and reports. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 571–581. Springer, 2021.
12. Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, and Philip H. S. Torr. LAVT: Language-Aware Vision Transformer for Referring Image Segmentation, April 2022. arXiv:2112.02244 [cs].
13. Zihan Li, Yunxiang Li, Qingde Li, You Zhang, Puyang Wang, Dazhou Guo, Le Lu, Dakai Jin, and Qingqi Hong. LViT: language meets vision transformer in medical image segmentation. arXiv preprint arXiv:2206.14718, 2022.
15. Ayat Abedalla, Malak Abdullah, Mahmoud Al-Ayyoub, and Elhadj Benkhelifa. Chest x-ray pneumothorax segmentation using U-Net with EfficientNet and ResNet architectures. PeerJ Computer Science, 7:e607, 2021.
16. Alexander Buslaev, Vladimir I Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin, and Alexandr A Kalinin. Albumentations: fast and flexible image augmentations. Information, 11(2):125, 2020.
17. Curtis P Langlotz. RadLex: a new method for indexing online educational materials, 2006.
18. Sijing Feng, Damian Azzollini, Ji Soo Kim, Cheng-Kai Jin, Simon P Gordon, Jason Yeoh, Eve Kim, Mina Han, Andrew Lee, Aakash Patel, et al. Curation of the CANDID-PTX dataset with free-text reports. Radiology: Artificial Intelligence, 3(6), 2021.
19. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
20. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition, December 2015. arXiv:1512.03385 [cs].
21. Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger Roth, and Daguang Xu. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images, January 2022. arXiv:2201.01266 [cs, eess].
22. Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2):203–211, February 2021.
23. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020.
24. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
25. An Yan, Julian McAuley, Xing Lu, Jiang Du, Eric Y. Chang, Amilcare Gentili, and Chun-Nan Hsu. RadBERT: Adapting Transformer-based Language Models to Radiology. Radiology: Artificial Intelligence, 4(4):e210258, July 2022.
26. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018.
27. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, 2020.
28. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2021.
29. Zhigang Li, Haidong Huang, Qiang Li, Konstantinos Zarogoulidis, Ioanna Kougioumtzi, Georgios Dryllis, Ioannis Kioumis, Georgia Pitsiou, Nikolaos Machairiotis, Nikolaos Katsikogiannis, et al. Pneumothorax: observation. Journal of Thoracic Disease, 6(Suppl 4):S421, 2014.
30. Alistair EW Johnson, Tom J Pollard, Seth J Berkowitz, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Roger G Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1):1–8, 2019.
31. Alexey Tolkachev, Ilyas Sirazitdinov, Maksym Kholiavchenko, Tamerlan Mustafaev, and Bulat Ibragimov. Deep learning for diagnosis and segmentation of pneumothorax: the results on the kaggle competition and validation against radiologists. IEEE Journal of Biomedical and Health Informatics, 25(5):1660–1672, 2020.
Metadata
Title
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax
Authors
Zachary Huemann
Xin Tie
Junjie Hu
Tyler J. Bradshaw
Publication date
14 March 2024
Publisher
Springer International Publishing
Published in
Journal of Imaging Informatics in Medicine
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-024-01051-8
