International Journal of Computer Assisted Radiology and Surgery

21 February 2024 | Original Article

Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions

Authors: Avnish Sood, Nina Mansoor, Caroline Memmi, Magnus Lynch, Jeremy Lynch

Published in: International Journal of Computer Assisted Radiology and Surgery | Issue 4/2024

Abstract

Purpose

AI image interpretation using convolutional neural networks shows increasing capability within radiology. These models have achieved impressive performance on specific tasks in controlled settings, but they have inherent limitations, such as the inability to consider clinical context. We assess the ability of large language models (LLMs) to answer radiology specialty exam questions in order to determine whether they can evaluate relevant clinical information.

Methods

A database of questions was created from official sample questions, author-written questions, and textbook questions based on the Royal College of Radiologists (United Kingdom) FRCR 2A and the American Board of Radiology (ABR) Certifying examinations. Each question was entered into Generative Pretrained Transformer (GPT) versions 3 and 4 with a prompt to answer it.
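The Methods describe entering each question with a prompt and then marking the reply. The Python sketch below illustrates only the marking step, under stated assumptions: it assumes the model was asked to finish with a line of the form `Answer: <choice>`, and the prompt wording, the `Question` record and the helpers `build_prompt`, `extract_answer` and `score` are all hypothetical. The abstract does not describe the authors' prompt or how replies were parsed, and no call to a GPT API is shown.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical prompt wording; the abstract does not give the authors' prompt.
PROMPT_TEMPLATE = (
    "Answer the following radiology exam question. "
    "Reply on the last line in the form 'Answer: <choice>'.\n\n{question}"
)


@dataclass
class Question:
    """One exam item; field names are illustrative, not taken from the paper."""
    qid: str
    exam: str  # e.g. "FRCR 2A" or "ABR"
    kind: str  # "SBA" (options A-E) or "TF" (true/false)
    key: str   # correct option letter, or "True"/"False"


def build_prompt(stem: str) -> str:
    """Wrap a question stem in the (hypothetical) answer-format instruction."""
    return PROMPT_TEMPLATE.format(question=stem)


def extract_answer(reply: str, kind: str) -> Optional[str]:
    """Read the model's choice from an 'Answer: ...' line; None if it is absent."""
    pattern = r"Answer:\s*([A-E])\b" if kind == "SBA" else r"Answer:\s*(true|false)\b"
    match = re.search(pattern, reply, re.IGNORECASE)
    if match is None:
        return None
    choice = match.group(1)
    # Normalise case so "c" matches key "C" and "false" matches key "False".
    return choice.upper() if kind == "SBA" else choice.capitalize()


def score(questions: List[Question], replies: Dict[str, str]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Tally (correct, total) per (exam, question kind); unparsable replies count as wrong."""
    tally: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for q in questions:
        got = extract_answer(replies.get(q.qid, ""), q.kind)
        correct, total = tally.get((q.exam, q.kind), (0, 0))
        tally[(q.exam, q.kind)] = (correct + int(got == q.key), total + 1)
    return tally


if __name__ == "__main__":
    questions = [Question("q1", "FRCR 2A", "SBA", "C"), Question("q2", "ABR", "TF", "False")]
    replies = {"q1": "Answer: C", "q2": "Answer: True"}
    print(score(questions, replies))  # {('FRCR 2A', 'SBA'): (1, 1), ('ABR', 'TF'): (0, 1)}
```

Counting a reply with no recognisable answer line as incorrect is a conservative design choice made for this sketch; a study could equally re-prompt or mark such replies by hand.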

Results

A total of 1072 questions were evaluated by GPT-3 and GPT-4: 495 (46.2%) for the FRCR 2A and 577 (53.8%) for the ABR exam. There were 890 single best answer (SBA) questions and 182 true/false questions. GPT-4 answered 629/890 (70.7%) SBA and 151/182 (83.0%) true/false questions correctly. Its performance did not degrade on author-written questions. GPT-4 performed significantly better than GPT-3, which selected the correct answer in 282/890 (31.7%) SBA and 111/182 (61.0%) true/false questions. GPT-4 performed similarly on both examinations across all question categories.
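The percentages above can be re-derived from the reported counts. The Python sketch below does that, pools SBA and true/false counts into an overall accuracy per model, and, purely as an illustration, applies an unpaired two-proportion z-test to the GPT-4 versus GPT-3 difference. The abstract does not name the test behind "significantly better". Because both models answered the same questions, a paired test such as McNemar's would be more appropriate, but it needs per-question agreement counts that the abstract does not report. The helper names `pct`, `overall` and `two_proportion_z` are hypothetical.

```python
from math import erfc, sqrt

# (correct, total) counts as reported in the abstract.
RESULTS = {
    "GPT-4": {"SBA": (629, 890), "TF": (151, 182)},
    "GPT-3": {"SBA": (282, 890), "TF": (111, 182)},
}


def pct(correct: int, total: int) -> float:
    """Percentage correct, rounded to one decimal place as in the abstract."""
    return round(100 * correct / total, 1)


def overall(by_kind):
    """Pool SBA and true/false counts into one (correct, total) pair."""
    return (sum(c for c, _ in by_kind.values()), sum(n for _, n in by_kind.values()))


def two_proportion_z(c1, n1, c2, n2):
    """Unpaired two-proportion z-test with pooled variance; returns (z, two-sided p).

    Illustrative only: it ignores the pairing of answers to the same questions.
    """
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided standard-normal tail probability
    return z, p_value


if __name__ == "__main__":
    for model, by_kind in RESULTS.items():
        for kind, (c, n) in by_kind.items():
            print(f"{model} {kind}: {c}/{n} = {pct(c, n)}%")
        c, n = overall(by_kind)
        print(f"{model} overall: {c}/{n} = {pct(c, n)}%")
    for kind in ("SBA", "TF"):
        z, p = two_proportion_z(*RESULTS["GPT-4"][kind], *RESULTS["GPT-3"][kind])
        print(f"{kind}: z = {z:.2f}, two-sided p = {p:.2g}")
```

Pooling the counts gives 780/1072 (72.8%) correct overall for GPT-4 and 393/1072 (36.7%) for GPT-3; these overall figures are derived here and are not stated in the abstract.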

Conclusion

The newest generation of LLMs, represented by GPT-4, demonstrates a high capability for answering radiology exam questions. Its marked improvement over GPT-3 suggests that further gains in accuracy are possible. Further research is needed to explore the clinical applicability of these AI models in real-world settings.

Publication date: 21 February 2024
Publisher: Springer International Publishing
Published in: International Journal of Computer Assisted Radiology and Surgery / Issue 4/2024
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI: https://doi.org/10.1007/s11548-024-03071-9
