Published in: Archives of Gynecology and Obstetrics 6/2023

05.09.2023 | General Gynecology

Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations

Authors: Adiel Cohen, Roie Alter, Naama Lessans, Raanan Meyer, Yoav Brezinov, Gabriel Levin


Abstract

Purpose

Previous studies of ChatGPT performance on medical examinations have reached contradictory results. Moreover, ChatGPT's performance in languages other than English remains largely unexplored. We aim to study the performance of ChatGPT on the Hebrew OBGYN 'Shlav-Alef' (Phase 1) examination.

Methods

A performance study was conducted using a consecutive sample of text-based multiple-choice questions originating from authentic Hebrew OBGYN 'Shlav-Alef' examinations administered in 2021–2022. We constructed 150 multiple-choice questions from consecutive text-based-only original questions. We compared the performance of ChatGPT to the actual performance of OBGYN residents who completed the tests in 2021–2022. We also compared ChatGPT's Hebrew performance with previously published results on English-language medical tests.

Results

In 2021–2022, 27.8% of OBGYN residents failed the 'Shlav-Alef' examination, and the residents' mean score was 68.4. Overall, 150 authentic questions were evaluated (one examination). ChatGPT correctly answered 58 questions (38.7%), a failing score. ChatGPT's Hebrew performance was lower than the residents' actual performance: 38.7% vs. 68.4%, p < .001. Compared with ChatGPT's performance on 9,091 English-language medical questions, its Hebrew performance was lower (38.7% in Hebrew vs. 60.7% in English, p < .001).
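The Hebrew-vs-English comparison above is a comparison of two proportions (58/150 correct vs. roughly 60.7% of 9,091 questions). The abstract does not specify which test was used; as a minimal sketch, a standard pooled two-proportion z-test (an assumption, not necessarily the authors' method) reproduces a p-value well below .001:

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hebrew: 58 of 150 correct; English benchmark: ~60.7% of 9,091 questions
z, p = two_proportion_z(58, 150, round(0.607 * 9091), 9091)
print(f"z = {z:.2f}, p = {p:.1e}")  # p is orders of magnitude below .001
```

The count of English correct answers (`round(0.607 * 9091)`) is reconstructed from the reported percentage, so the z statistic is approximate.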

Conclusions

ChatGPT correctly answered fewer than 40% of Hebrew OBGYN residency examination questions. Residents cannot rely on ChatGPT to prepare for this examination. Efforts should be made to improve ChatGPT's performance in languages other than English.
Metadata
Title
Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations
Authors
Adiel Cohen
Roie Alter
Naama Lessans
Raanan Meyer
Yoav Brezinov
Gabriel Levin
Publication date
05.09.2023
Publisher
Springer Berlin Heidelberg
Published in
Archives of Gynecology and Obstetrics / Issue 6/2023
Print ISSN: 0932-0067
Electronic ISSN: 1432-0711
DOI
https://doi.org/10.1007/s00404-023-07185-4
