Published in: Archives of Gynecology and Obstetrics 6/2023

05.09.2023 | General Gynecology

Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations

Authors: Adiel Cohen, Roie Alter, Naama Lessans, Raanan Meyer, Yoav Brezinov, Gabriel Levin


Abstract

Purpose

Previous studies of ChatGPT performance on medical examinations have reached contradictory results. Moreover, ChatGPT's performance in languages other than English remains largely unexplored. We aim to study the performance of ChatGPT on the Hebrew OBGYN 'Shlav-Alef' (Phase 1) examination.

Methods

A performance study was conducted using a consecutive sample of text-based multiple-choice questions originating from authentic Hebrew OBGYN 'Shlav-Alef' examinations administered in 2021–2022. We constructed 150 multiple-choice questions from consecutive text-based-only original questions. We compared the performance of ChatGPT to the actual performance of OBGYN residents who completed the tests in 2021–2022. We also compared ChatGPT's Hebrew performance with previously published results on English-language medical tests.

Results

In 2021–2022, 27.8% of OBGYN residents failed the 'Shlav-Alef' examination, and the residents' mean score was 68.4. Overall, 150 authentic questions were evaluated (one examination). ChatGPT correctly answered 58 questions (38.7%), a failing score. ChatGPT's Hebrew performance was lower than the residents' actual performance: 38.7% vs. 68.4%, p < .001. Compared with ChatGPT's performance on 9,091 English-language medical questions, its Hebrew performance was lower (38.7% in Hebrew vs. 60.7% in English, p < .001).
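The Hebrew-vs-English comparison above is a comparison of two proportions (58/150 correct vs. roughly 60.7% of 9,091 questions). The abstract does not specify which test was used; as a minimal sketch, a standard pooled two-proportion z-test (an assumption, not necessarily the authors' method) reproduces a p-value well below .001:

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hebrew: 58 of 150 correct; English benchmark: ~60.7% of 9,091 questions
z, p = two_proportion_z(58, 150, round(0.607 * 9091), 9091)
print(f"z = {z:.2f}, p = {p:.1e}")  # p is orders of magnitude below .001
```

The count of English correct answers (`round(0.607 * 9091)`) is reconstructed from the reported percentage, so the z statistic is approximate.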

Conclusions

ChatGPT correctly answered fewer than 40% of Hebrew OBGYN residency examination questions. Residents cannot rely on ChatGPT to prepare for this examination. Efforts should be made to improve ChatGPT's performance in languages other than English.
Metadata
Title
Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations
Authors
Adiel Cohen
Roie Alter
Naama Lessans
Raanan Meyer
Yoav Brezinov
Gabriel Levin
Publication date
05.09.2023
Publisher
Springer Berlin Heidelberg
Published in
Archives of Gynecology and Obstetrics / Issue 6/2023
Print ISSN: 0932-0067
Electronic ISSN: 1432-0711
DOI
https://doi.org/10.1007/s00404-023-07185-4
