European Archives of Oto-Rhino-Laryngology

20 December 2023 | Miscellaneous

Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard

Authors: Ryan Chin Taw Cheong, Kenny Peter Pang, Samit Unadkat, Venkata Mcneillis, Andrew Williamson, Jonathan Joseph, Premjit Randhawa, Peter Andrews, Vinidh Paleri

Published in: European Archives of Oto-Rhino-Laryngology | Issue 4/2024

Abstract

Purpose

To conduct a comparative performance evaluation of GPT-3.5, GPT-4 and Google Bard in self-assessment questions at the level of the American Sleep Medicine Certification Board Exam.

Methods

A total of 301 text-based, single-best-answer multiple-choice questions, each with four answer options and spanning 10 categories, were included in the study and transcribed as inputs for GPT-3.5, GPT-4 and Google Bard. The first response each chatbot generated was recorded and checked for accuracy against the gold-standard answer provided by the American Academy of Sleep Medicine for that question. Human sleep medicine specialists need an overall score of 80% or above to pass each exam category.
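The scoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `score_by_category`, the `(category, number)` question-id layout and the toy answers are all hypothetical, and only the 80% pass mark comes from the abstract.

```python
from collections import defaultdict

PASS_MARK = 0.80  # pass threshold per exam category, as stated in the Methods


def score_by_category(responses, gold):
    """Return {category: (accuracy, passed)} for one chatbot.

    Both arguments map a question id to an answer letter. Question ids are
    assumed (hypothetically) to be (category, number) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, gold_answer in gold.items():
        category = qid[0]
        total[category] += 1
        # Only the first recorded response per question is compared.
        if responses.get(qid) == gold_answer:
            correct[category] += 1
    return {
        c: (correct[c] / total[c], correct[c] / total[c] >= PASS_MARK)
        for c in total
    }


# Toy data, invented purely for illustration (not taken from the study).
gold = {
    ("Insomnia", 1): "A", ("Insomnia", 2): "C", ("Insomnia", 3): "B",
    ("Insomnia", 4): "D", ("Insomnia", 5): "A",
    ("Parasomnias", 1): "B", ("Parasomnias", 2): "C",
}
responses = {
    ("Insomnia", 1): "A", ("Insomnia", 2): "C", ("Insomnia", 3): "B",
    ("Insomnia", 4): "D", ("Insomnia", 5): "B",
    ("Parasomnias", 1): "B", ("Parasomnias", 2): "A",
}
scores = score_by_category(responses, gold)
```

With the toy data, the hypothetical "Insomnia" category scores 4 of 5 and meets the pass mark, while "Parasomnias" scores 1 of 2 and does not.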

Results

GPT-4 achieved the pass mark of 80% or above in five of the 10 exam categories: the Normal Sleep and Variants Self-Assessment Exam (2021), Circadian Rhythm Sleep–Wake Disorders Self-Assessment Exam (2021), Insomnia Self-Assessment Exam (2022), Parasomnias Self-Assessment Exam (2022) and the Sleep-Related Movements Self-Assessment Exam (2023). GPT-4 outperformed both other chatbots in every exam category and achieved a higher overall score (68.1%) than GPT-3.5 (46.8%) and Google Bard (45.5%); the difference was statistically significant (p < 0.001). There was no significant difference in overall score between GPT-3.5 and Google Bard.
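As a rough plausibility check on the reported comparison, the overall scores can be converted back into approximate counts and compared with a pooled two-proportion z-test. The abstract does not say which statistical test the authors used, so the choice of test here is an assumption. The counts are back-calculated from the rounded percentages and the 301 questions, which is also an assumption. Because all three chatbots answered the same questions, a paired test such as McNemar's would arguably fit better, but it needs per-question results that the abstract does not give. The function names are hypothetical.

```python
import math

N_QUESTIONS = 301  # total number of questions, from the Methods


def correct_count(percent, n=N_QUESTIONS):
    """Back-calculate the number of correct answers from a rounded percentage."""
    return round(percent / 100 * n)


def two_proportion_p_value(x1, x2, n=N_QUESTIONS):
    """Two-sided p value of a pooled two-proportion z-test with equal sample sizes.

    Treats the two samples as independent, which is a simplification here.
    """
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (x1 / n - x2 / n) / se
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


gpt4 = correct_count(68.1)
gpt35 = correct_count(46.8)
bard = correct_count(45.5)

p_gpt4_vs_gpt35 = two_proportion_p_value(gpt4, gpt35)
p_gpt4_vs_bard = two_proportion_p_value(gpt4, bard)
p_gpt35_vs_bard = two_proportion_p_value(gpt35, bard)
```

Under these assumptions, both GPT-4 comparisons fall well below 0.001 and the GPT-3.5 versus Google Bard comparison is far from significant, which is consistent with the reported results.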

Conclusions

Otolaryngologists and sleep medicine physicians have a crucial role to play, through agile and robust research, in ensuring that the next generation of AI chatbots is built safely and responsibly.

Title: Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard
Authors: Ryan Chin Taw Cheong
Kenny Peter Pang
Samit Unadkat
Venkata Mcneillis
Andrew Williamson
Jonathan Joseph
Premjit Randhawa
Peter Andrews
Vinidh Paleri
Publication date: 20 December 2023
Publisher: Springer Berlin Heidelberg
Published in: European Archives of Oto-Rhino-Laryngology / Issue 4/2024
Print ISSN: 0937-4477
Electronic ISSN: 1434-4726
DOI: https://doi.org/10.1007/s00405-023-08381-3
