
Published: 14 March 2024

ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax

Authors: Zachary Huemann, Xin Tie, Junjie Hu, Tyler J. Bradshaw

Published in: Journal of Imaging Informatics in Medicine


Abstract

Radiology narrative reports often describe characteristics of a patient’s disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net extracts language features from physician-generated free-form radiology reports using a pre-trained language model. We then introduced cross-attention between the language features and the intermediate embeddings of an encoder-decoder convolutional neural network to enable language guidance for image analysis. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716±0.016, which was similar to the degree of inter-reader variability (0.712±0.044) computed on a subset of the data. It outperformed vision-only models (Swin UNETR: 0.670±0.015, ResNet50 U-Net: 0.677±0.015, GLoRIA: 0.686±0.014, and nnUNet: 0.694±0.016) and a competing vision-language model (LAVT: 0.706±0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net’s segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design.
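The cross-attention described in the abstract — image-derived queries attending over report-token keys and values — can be sketched as follows. This is a minimal NumPy illustration, not the paper's ConTEXTual Net implementation: the projection matrices are random stand-ins for learned weights, and the dimensions, residual connection, and single-head design are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, txt_feats, d_k=32, seed=0):
    """Attend from image positions (queries) to report tokens (keys/values).

    img_feats: (N, C) -- N flattened spatial positions of a decoder feature map
    txt_feats: (T, D) -- T token embeddings from a language model
    The projection matrices below are random placeholders for learned weights.
    """
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((img_feats.shape[1], d_k))
    Wk = rng.standard_normal((txt_feats.shape[1], d_k))
    Wv = rng.standard_normal((txt_feats.shape[1], img_feats.shape[1]))
    Q = img_feats @ Wq                      # (N, d_k)
    K = txt_feats @ Wk                      # (T, d_k)
    V = txt_feats @ Wv                      # (T, C)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (N, T) image-to-text weights
    return img_feats + attn @ V             # residual language guidance

# Example: an 8x8 feature map flattened to 64 positions with 64 channels,
# and a hypothetical 12-token report with 768-dim embeddings.
rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
txt = rng.standard_normal((12, 768))
out = cross_attention(img, txt)
print(out.shape)  # (64, 64): same shape as the input feature map
```

Because the output keeps the feature map's shape, such a module can be dropped between encoder-decoder stages, letting report text modulate intermediate image embeddings without changing the rest of the network.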
References
1. Paul Zarogoulidis, Ioannis Kioumis, Georgia Pitsiou, Konstantinos Porpodis, Sofia Lampaki, Antonis Papaiwannou, Nikolaos Katsikogiannis, Bojan Zaric, Perin Branislav, Nevena Secen, et al. Pneumothorax: from definition to diagnosis and treatment. Journal of Thoracic Disease, 6(Suppl 4):S372, 2014.
2. Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y. Ng. CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning, 2017.
3. Saban Öztürk and Tolga Çukur. Focal modulation based end-to-end multi-label classification for chest x-ray image classification. In 31st Signal Processing and Communications Applications Conference, SIU 2023, Istanbul, Turkey, July 5-8, 2023, pages 1–4. IEEE, 2023.
4. Şaban Öztürk, Emin Çelik, and Tolga Çukur. Content-based medical image retrieval with opponent class adaptive margin loss. Information Sciences, 637:118938, 2023.
5. Şaban Öztürk, Adi Alhudhaif, and Kemal Polat. Attention-based end-to-end CNN framework for content-based x-ray image retrieval. Turkish Journal of Electrical Engineering and Computer Sciences, 2021:2680–2693, 10 2021.
6. Ekin Tiu, Ellie Talius, Pujan Patel, Curtis P. Langlotz, Andrew Y. Ng, and Pranav Rajpurkar. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nature Biomedical Engineering, 6(12):1399–1406, September 2022.
7. Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
8. Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
9. Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D Manning, and Curtis P Langlotz. Contrastive learning of medical visual representations from paired images and text. arXiv preprint arXiv:2010.00747, 2020.
10. Shih-Cheng Huang, Liyue Shen, Matthew P Lungren, and Serena Yeung. GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3942–3951, 2021.
11. Riddhish Bhalodia, Ali Hatamizadeh, Leo Tam, Ziyue Xu, Xiaosong Wang, Evrim Turkbey, and Daguang Xu. Improving pneumonia localization via cross-attention on medical images and reports. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 571–581. Springer, 2021.
12. Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, and Philip H. S. Torr. LAVT: Language-Aware Vision Transformer for Referring Image Segmentation, April 2022. arXiv:2112.02244 [cs].
13. Zihan Li, Yunxiang Li, Qingde Li, You Zhang, Puyang Wang, Dazhou Guo, Le Lu, Dakai Jin, and Qingqi Hong. LViT: language meets vision transformer in medical image segmentation. arXiv preprint arXiv:2206.14718, 2022.
15. Ayat Abedalla, Malak Abdullah, Mahmoud Al-Ayyoub, and Elhadj Benkhelifa. Chest x-ray pneumothorax segmentation using U-Net with EfficientNet and ResNet architectures. PeerJ Computer Science, 7:e607, 2021.
16. Alexander Buslaev, Vladimir I Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin, and Alexandr A Kalinin. Albumentations: fast and flexible image augmentations. Information, 11(2):125, 2020.
17. Curtis P Langlotz. RadLex: a new method for indexing online educational materials, 2006.
18. Sijing Feng, Damian Azzollini, Ji Soo Kim, Cheng-Kai Jin, Simon P Gordon, Jason Yeoh, Eve Kim, Mina Han, Andrew Lee, Aakash Patel, et al. Curation of the CANDID-PTX dataset with free-text reports. Radiology: Artificial Intelligence, 3(6), 2021.
19. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
20. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition, December 2015. arXiv:1512.03385 [cs].
21. Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger Roth, and Daguang Xu. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images, January 2022. arXiv:2201.01266 [cs, eess].
22. Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2):203–211, February 2021.
23. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020.
24. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
25. An Yan, Julian McAuley, Xing Lu, Jiang Du, Eric Y. Chang, Amilcare Gentili, and Chun-Nan Hsu. RadBERT: Adapting Transformer-based Language Models to Radiology. Radiology: Artificial Intelligence, 4(4):e210258, July 2022.
26. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018.
27. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, 2020.
28. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2021.
29. Zhigang Li, Haidong Huang, Qiang Li, Konstantinos Zarogoulidis, Ioanna Kougioumtzi, Georgios Dryllis, Ioannis Kioumis, Georgia Pitsiou, Nikolaos Machairiotis, Nikolaos Katsikogiannis, et al. Pneumothorax: observation. Journal of Thoracic Disease, 6(Suppl 4):S421, 2014.
30. Alistair EW Johnson, Tom J Pollard, Seth J Berkowitz, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Roger G Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1):1–8, 2019.
31. Alexey Tolkachev, Ilyas Sirazitdinov, Maksym Kholiavchenko, Tamerlan Mustafaev, and Bulat Ibragimov. Deep learning for diagnosis and segmentation of pneumothorax: the results on the kaggle competition and validation against radiologists. IEEE Journal of Biomedical and Health Informatics, 25(5):1660–1672, 2020.
Metadata
Title
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax
Authors
Zachary Huemann
Xin Tie
Junjie Hu
Tyler J. Bradshaw
Publication date
14 March 2024
Publisher
Springer International Publishing
Published in
Journal of Imaging Informatics in Medicine
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-024-01051-8
