Published in: European Archives of Oto-Rhino-Laryngology 4/2024

20 December 2023 | Miscellaneous

Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard

Authors: Ryan Chin Taw Cheong, Kenny Peter Pang, Samit Unadkat, Venkata Mcneillis, Andrew Williamson, Jonathan Joseph, Premjit Randhawa, Peter Andrews, Vinidh Paleri

Abstract

Purpose

To compare the performance of GPT-3.5, GPT-4 and Google Bard on self-assessment questions at the level of the American Sleep Medicine Certification Board Exam.

Methods

A total of 301 text-based, single-best-answer multiple-choice questions with four answer options each, across 10 categories, were included in the study and transcribed as inputs for GPT-3.5, GPT-4 and Google Bard. The first response generated for each question was taken and checked for accuracy against the gold-standard answer provided by the American Academy of Sleep Medicine. Human sleep medicine specialists must score 80% or above to pass each exam category.
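
The scoring procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary layout and the toy questions are all hypothetical, and counting an unanswered question as incorrect is an assumption. Only the 80% pass mark comes from the text.

```python
from collections import defaultdict

PASS_MARK = 0.80  # pass threshold per exam category, as stated in the Methods


def score_by_category(responses, gold):
    """Return {category: (correct, total, accuracy, passed)}.

    responses: {question_id: chosen option}, the first output per question
    gold:      {question_id: (category, gold-standard option)}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, (category, answer) in gold.items():
        total[category] += 1
        # Assumption: a missing response is scored as incorrect.
        if responses.get(qid) == answer:
            correct[category] += 1
    result = {}
    for category in total:
        accuracy = correct[category] / total[category]
        result[category] = (correct[category], total[category],
                            accuracy, accuracy >= PASS_MARK)
    return result


def overall_accuracy(responses, gold):
    """Share of all questions answered correctly, across categories."""
    hits = sum(responses.get(qid) == answer
               for qid, (_, answer) in gold.items())
    return hits / len(gold)


# Toy data for illustration only; these are not questions from the study.
gold = {
    "q1": ("Insomnia", "A"), "q2": ("Insomnia", "B"),
    "q3": ("Insomnia", "C"), "q4": ("Insomnia", "D"),
    "q5": ("Insomnia", "A"),
    "q6": ("Parasomnias", "B"), "q7": ("Parasomnias", "C"),
}
responses = {"q1": "A", "q2": "B", "q3": "C", "q4": "D",
             "q5": "B", "q6": "B", "q7": "A"}

for category, row in score_by_category(responses, gold).items():
    print(category, row)
print("overall:", overall_accuracy(responses, gold))
```

Keeping the per-category and overall scores separate mirrors the way the results are reported: the pass mark applies to each category, while the model comparison uses the overall score.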

Results

GPT-4 achieved the pass mark of 80% or above in five of the 10 exam categories: the Normal Sleep and Variants Self-Assessment Exam (2021), the Circadian Rhythm Sleep–Wake Disorders Self-Assessment Exam (2021), the Insomnia Self-Assessment Exam (2022), the Parasomnias Self-Assessment Exam (2022) and the Sleep-Related Movements Self-Assessment Exam (2023). GPT-4 outperformed both other models in all exam categories and achieved a higher overall score (68.1%) than GPT-3.5 (46.8%) and Google Bard (45.5%); this difference was statistically significant (p < 0.001). There was no significant difference in overall score between GPT-3.5 and Google Bard.
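
As a rough plausibility check on the reported comparison, the sketch below runs a Pearson chi-square test on 2x2 tables of correct versus incorrect answers. Several things here are assumptions rather than facts from the abstract: the abstract does not name the statistical test the authors used, the correct-answer counts are back-calculated by rounding the reported percentages over 301 questions, and the function names are hypothetical.

```python
import math

N_QUESTIONS = 301  # questions put to each model, from the Methods

# Overall scores as reported in the Results.
overall = {"GPT-4": 0.681, "GPT-3.5": 0.468, "Google Bard": 0.455}

# Assumption: counts are back-calculated by rounding; the abstract
# reports percentages only.
counts = {model: round(share * N_QUESTIONS) for model, share in overall.items()}


def chi_square_2x2(correct_a, wrong_a, correct_b, wrong_b):
    """Pearson chi-square without continuity correction, 1 degree of freedom.

    Returns (chi2, p_value).
    """
    n = correct_a + wrong_a + correct_b + wrong_b
    diff = correct_a * wrong_b - wrong_a * correct_b
    denom = ((correct_a + wrong_a) * (correct_b + wrong_b)
             * (correct_a + correct_b) * (wrong_a + wrong_b))
    chi2 = n * diff ** 2 / denom
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def compare(model_a, model_b):
    """Chi-square comparison of two models' back-calculated counts."""
    a, b = counts[model_a], counts[model_b]
    return chi_square_2x2(a, N_QUESTIONS - a, b, N_QUESTIONS - b)


for pair in [("GPT-4", "GPT-3.5"), ("GPT-4", "Google Bard"),
             ("GPT-3.5", "Google Bard")]:
    chi2, p = compare(*pair)
    print(f"{pair[0]} vs {pair[1]}: chi2 = {chi2:.2f}, p = {p:.3g}")
```

Under these assumptions the check agrees with the abstract in direction: both GPT-4 comparisons give p below 0.001, and the GPT-3.5 versus Google Bard comparison is not significant at the 0.05 level. The exact values may differ from the published analysis if the authors used a different test.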

Conclusions

Otolaryngologists and sleep medicine physicians have a crucial role, through agile and robust research, in ensuring that next-generation AI chatbots are built safely and responsibly.
Metadata
Title
Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard
Authors
Ryan Chin Taw Cheong
Kenny Peter Pang
Samit Unadkat
Venkata Mcneillis
Andrew Williamson
Jonathan Joseph
Premjit Randhawa
Peter Andrews
Vinidh Paleri
Publication date
20 December 2023
Publisher
Springer Berlin Heidelberg
Published in
European Archives of Oto-Rhino-Laryngology / Issue 4/2024
Print ISSN: 0937-4477
Electronic ISSN: 1434-4726
DOI
https://doi.org/10.1007/s00405-023-08381-3
