1 December 2023 | Original Paper

Evaluating the Performance of Different Large Language Models on Health Consultation and Patient Education in Urolithiasis

Authors: Haifeng Song, Yi Xia, Zhichao Luo, Hui Liu, Yan Song, Xue Zeng, Tianjie Li, Guangxin Zhong, Jianxing Li, Ming Chen, Guangyuan Zhang, Bo Xiao

Published in: Journal of Medical Systems | Issue 1/2023

Abstract

Objectives

To evaluate the effectiveness of four large language models (LLMs) with large user bases and significant public attention (Claude, Bard, ChatGPT4, and New Bing) in medical consultation and patient education for urolithiasis.

Materials and methods

In this study, we developed a questionnaire consisting of 21 questions and 2 clinical scenarios related to urolithiasis. Subsequently, clinical consultations were simulated for each of the four models to assess their responses to the questions. Urolithiasis experts then evaluated the model responses in terms of accuracy, comprehensiveness, ease of understanding, human care, and clinical case analysis ability based on a predesigned 5-point Likert scale. Visualization and statistical analyses were then employed to compare the four models and evaluate their performance.
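
The scoring step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the statistical tests or the aggregation the authors used, so the mean-per-dimension summary, the function names `summarize` and `rank_models`, and the placeholder ratings below are all hypothetical and are not taken from the study.

```python
from statistics import mean


def summarize(ratings):
    """Aggregate Likert ratings into mean scores.

    ratings: iterable of (model, dimension, score) tuples, score in 1..5.
    Returns {model: {dimension: mean score}}.
    """
    buckets = {}
    for model, dimension, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"Likert score out of range: {score}")
        buckets.setdefault(model, {}).setdefault(dimension, []).append(score)
    return {m: {d: mean(s) for d, s in dims.items()} for m, dims in buckets.items()}


def rank_models(summary, dimension):
    """Return models ordered from highest to lowest mean score on one dimension."""
    scored = [(dims[dimension], m) for m, dims in summary.items() if dimension in dims]
    # Sort by descending mean; ties are broken alphabetically by model name.
    return [m for _, m in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Placeholder ratings invented for illustration only; NOT data from the study.
example = [
    ("model_a", "accuracy", 5), ("model_a", "accuracy", 4),
    ("model_b", "accuracy", 3), ("model_b", "accuracy", 4),
    ("model_a", "human_care", 4), ("model_b", "human_care", 2),
]
summary = summarize(example)
print(rank_models(summary, "accuracy"))
```

Because Likert scores are ordinal, a real analysis might prefer medians or a rank-based test over means; the mean is used here only to keep the sketch short.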

Results

All models performed satisfactorily, except for Bard, which failed to provide a valid response to Question 13. Claude consistently scored the highest in all dimensions compared with the other three models. ChatGPT4 ranked second in accuracy, with relatively stable output across multiple tests, but showed shortcomings in empathy and human care. Bard exhibited the lowest accuracy and overall performance. Claude and ChatGPT4 both showed a high capacity to analyze clinical cases of urolithiasis. Overall, Claude emerged as the best performer in urolithiasis consultation and education.

Conclusion

Claude demonstrated superior performance compared with the other three models in urolithiasis consultation and education. This study highlights the considerable potential of LLMs in health consultation and patient education, although professional review, further evaluation, and modifications are still required.

Publisher: Springer US
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI: https://doi.org/10.1007/s10916-023-02021-3
