Misdiagnosing Bayesian statistics
- 20.01.2025
- LETTER TO THE EDITOR
- Written by
- Vincenzo Guastafierro
- Devin Nicole Corbitt
- Alessandra Bressan
- Bethania Fernandes
- Ömer Mintemur
- Francesca Magnoli
- Susanna Ronchi
- Stefano La Rosa
- Silvia Uccella
- Salvatore Lorenzo Renne
- Published in
- Virchows Archiv | Issue 4/2025
Excerpt
In response to the critique raised by Daungsupawong and Wiwanitkit [1] regarding the reliance on expert pathologists and the potential for skewed findings, the study Unveiling the Risks of ChatGPT in Diagnostic Surgical Pathology [2] employed a rigorous methodology designed to address these concerns through careful study design and robust statistical analysis. Below is a detailed explanation of the measures taken to mitigate potential biases and enhance the reliability of the findings.

(A) Methodological rigor in scenario creation and evaluation:
1. Expertly designed scenarios: Clinico-pathological scenarios were carefully crafted by the authors and thoroughly reviewed by expert pathologists in each subfield to ensure alignment with current diagnostic guidelines. This ensured the scenarios were both realistic and clinically relevant.
2. Diverse evaluator panel: While the scenarios were reviewed by specialists, ChatGPT's responses were assessed by six pathologists. Although the paper does not explicitly describe the evaluators' experience levels, their inclusion brought diverse perspectives to the evaluation process, contributing to a balanced assessment.
3. Clear assessment criteria: Pathologists evaluated ChatGPT's outputs on two distinct criteria: (i) usefulness in supporting the clinico-pathological diagnosis, and (ii) the absolute number of errors. This nuanced approach went beyond simple measures of accuracy, capturing the broader utility and reliability of the AI's responses.
4. Randomized evaluation: The order of ChatGPT's answers was randomized during the assessment to mitigate any potential bias related to the sequence of presentation.

(B) Statistical approaches to account for variability:
1. Bayesian network modeling: A Bayesian network was used to model ChatGPT's latent knowledge across pathology subfields, explicitly marginalizing out the influence of individual pathologists. By treating the pathologist (Pa) as a variable and including interactions between pathologists and subspecialties, the model controlled for evaluator-specific biases.
2. Posterior predictive simulations: These simulations were used to exclude the effects of individual pathologists, enabling the researchers to measure usefulness density and error rates more reliably.
3. Subspecialty-specific analysis: ChatGPT's performance was analyzed across ten pathology subspecialties, allowing the study to determine whether shortcomings were consistent or varied by field. Findings indicated minimal variation in ChatGPT's latent knowledge across subspecialties, reflecting consistent performance patterns. …
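The marginalization described in (B1) and (B2) can be sketched very loosely in Python. This is not the authors' model: a simple normal random-effects toy stands in for their Bayesian network, and bootstrap resampling of pathologists crudely mimics posterior predictive simulation. All variable names, distributions, and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subspecialties, n_pathologists = 10, 6  # as in the study design

# Hypothetical simulated ratings: a latent per-subspecialty "knowledge" level
# plus a per-pathologist evaluator bias (the Pa effect) plus noise.
latent_field = rng.normal(0.7, 0.05, n_subspecialties)   # assumed values
evaluator_bias = rng.normal(0.0, 0.10, n_pathologists)   # assumed values
ratings = (latent_field[:, None]
           + evaluator_bias[None, :]
           + rng.normal(0.0, 0.05, (n_subspecialties, n_pathologists)))

# Posterior-predictive-style simulation, here approximated by bootstrap
# resampling of pathologists; averaging each draw over the resampled
# evaluators marginalizes out individual-pathologist effects.
n_sims = 2000
draws = np.empty((n_sims, n_subspecialties))
for s in range(n_sims):
    resampled = rng.integers(0, n_pathologists, n_pathologists)
    draws[s] = ratings[:, resampled].mean(axis=1)  # average out pathologists

marginal_mean = draws.mean(axis=0)  # per-subspecialty estimate
marginal_sd = draws.std(axis=0)     # uncertainty after marginalization
```

The point of the sketch is the last two lines: once pathologists are treated as a variable and averaged over, what remains is an estimate of the subspecialty-level quantity with evaluator-specific bias removed, which is the role the Bayesian network and posterior predictive simulations play in the study.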
- Publisher
- Springer Berlin Heidelberg
Print ISSN: 0945-6317
Electronic ISSN: 1432-2307
- DOI
- https://doi.org/10.1007/s00428-025-04027-3