European Archives of Oto-Rhino-Laryngology

12.09.2023 | Laryngology

Validity and reliability of an instrument evaluating the performance of intelligent chatbot: the Artificial Intelligence Performance Instrument (AIPI)

Authors: Jerome R. Lechien, Antonino Maniaci, Isabelle Gengler, Stephane Hans, Carlos M. Chiesa-Estomba, Luigi A. Vaira

Published in: European Archives of Oto-Rhino-Laryngology | Issue 4/2024

Abstract

Objectives

To evaluate the reliability and validity of the Artificial Intelligence Performance Instrument (AIPI).

Methods

Medical records of patients consulting in otolaryngology were evaluated by physicians and by ChatGPT for differential diagnosis, management, and treatment. ChatGPT’s performance was rated twice with the AIPI within a 7-day period to assess test–retest reliability. Internal consistency was evaluated using Cronbach’s α. Internal validity was evaluated by comparing the AIPI scores of the clinical cases rated by ChatGPT and 2 blinded practitioners. Convergent validity was measured by comparing the AIPI score with a modified version of the Ottawa Clinical Assessment Tool (OCAT). Interrater reliability was assessed using Kendall’s tau.
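As an illustration of the internal-consistency measure named above, the following is a minimal sketch of Cronbach’s α using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the example ratings are hypothetical and are not taken from the study; the authors’ actual software and data are not described in this abstract.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of ratings.

    scores: list of rows, one row per rated case, one column per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Transpose rows into per-item columns.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in items)
    # Variance of each case's total score across all items.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 5 cases scored on 3 items (NOT study data).
example = [
    [2, 3, 3],
    [1, 1, 2],
    [3, 3, 3],
    [0, 1, 1],
    [2, 2, 3],
]
print(round(cronbach_alpha(example), 3))
```

Values closer to 1 indicate that the items vary together across cases; the abstract reports α = 0.754 for the AIPI.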

Results

Forty-five patients (28 females) completed the evaluations. Cronbach’s α for the AIPI indicated adequate internal consistency (α = 0.754). Test–retest reliability was moderate-to-strong for the individual items and the AIPI total score (r_s = 0.486, p = 0.001). The mean AIPI score of the senior otolaryngologist was significantly higher than that of ChatGPT, supporting adequate internal validity (p = 0.001). Convergent validity analysis showed a moderate and significant correlation between the AIPI and the modified OCAT (r_s = 0.319; p = 0.044). Interrater reliability analysis showed significant positive concordance between the two otolaryngologists for the patient feature, diagnostic, additional examination, and treatment subscores, as well as for the AIPI total score.
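The coefficient r_s reported above conventionally denotes Spearman’s rank correlation; assuming that is the statistic used, a minimal sketch of how such a test–retest coefficient can be computed is shown here. The helper names (`average_ranks`, `pearson`, `spearman_rho`) and the two rating lists are hypothetical, and no p-value is computed.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical total scores from a first and a second rating (NOT study data).
first_rating = [14, 17, 12, 18, 15, 16]
second_rating = [15, 16, 12, 18, 13, 17]
print(round(spearman_rho(first_rating, second_rating), 3))
```

Computing the coefficient on ranks rather than raw scores makes it suitable for ordinal rating scales, which is a common reason for choosing it in instrument validation.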

Conclusions

The AIPI is a valid and reliable instrument for assessing the performance of ChatGPT in ear, nose and throat conditions. Future studies are needed to investigate the usefulness of the AIPI in medicine and surgery, and to evaluate its psychometric properties in these fields.

Title: Validity and reliability of an instrument evaluating the performance of intelligent chatbot: the Artificial Intelligence Performance Instrument (AIPI)
Authors: Jerome R. Lechien, Antonino Maniaci, Isabelle Gengler, Stephane Hans, Carlos M. Chiesa-Estomba, Luigi A. Vaira
Publication date: 12.09.2023
Publisher: Springer Berlin Heidelberg
Published in: European Archives of Oto-Rhino-Laryngology / Issue 4/2024
Print ISSN: 0937-4477
Electronic ISSN: 1434-4726
DOI: https://doi.org/10.1007/s00405-023-08219-y
