Published in: European Radiology 12/2020

14 July 2020 | Imaging Informatics and Artificial Intelligence

Automated identification of chest radiographs with referable abnormality with deep learning: need for recalibration

Authors: Eui Jin Hwang, Hyungjin Kim, Jong Hyuk Lee, Jin Mo Goo, Chang Min Park

Abstract

Objectives

To evaluate the calibration of a deep learning (DL) model in a diagnostic cohort and to improve the model’s calibration through recalibration procedures.

Methods

Chest radiographs (CRs) from 1135 consecutive patients (M:F = 582:553; mean age, 52.6 years) who visited our emergency department were included. A commercialized DL model was used to identify abnormal CRs, producing a continuous probability score for each CR. After the model calibration was evaluated, eight different methods were used to recalibrate the original model based on the probability score. The original model outputs were recalibrated using 681 randomly sampled CRs and validated using the remaining 454 CRs. The Brier score for overall performance, average and maximum calibration error and absolute Spiegelhalter’s Z for calibration, and area under the receiver operating characteristic curve (AUROC) for discrimination were evaluated across 1000 repeated random splits of the dataset.
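The calibration measures named above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the equal-width binning, the bin count, and the size-weighted average are all assumptions, since the abstract does not specify how the calibration errors were binned. Spiegelhalter's Z is written in its usual form, Z = Σ(y − p)(1 − 2p) / √Σ(1 − 2p)² p(1 − p).

```python
import math


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def calibration_errors(probs, labels, n_bins=10):
    """Return (average, maximum) calibration error over equal-width bins.

    Each bin's error is |mean predicted probability - observed event rate|.
    Empty bins are skipped; the average is weighted by bin size.
    Binning scheme and bin count are assumptions, not taken from the study.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    gaps, sizes = [], []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        gaps.append(abs(mean_p - rate))
        sizes.append(len(members))

    average = sum(g * n for g, n in zip(gaps, sizes)) / sum(sizes)
    return average, max(gaps)


def spiegelhalter_z(probs, labels):
    """Spiegelhalter's Z statistic; take abs() for the absolute value.

    Undefined (division by zero) if every probability is exactly 0, 0.5, or 1.
    """
    numerator = sum((y - p) * (1 - 2 * p) for p, y in zip(probs, labels))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probs)
    return numerator / math.sqrt(variance)


# Tiny made-up example, purely illustrative (not study data).
example_probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
example_labels = [1, 1, 0, 0, 0, 0]
example_brier = brier_score(example_probs, example_labels)
example_avg_ce, example_max_ce = calibration_errors(example_probs, example_labels, n_bins=2)
example_abs_z = abs(spiegelhalter_z(example_probs, example_labels))
```

In a repeated-split evaluation such as the one described, these functions would be applied to the held-out portion of each split and the results summarized across splits.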

Results

The original model tended to overestimate the likelihood of the presence of abnormalities, exhibiting average and maximum calibration errors of 0.069 and 0.179, respectively; an absolute Spiegelhalter’s Z value of 2.349; and an AUROC of 0.949. After recalibration, significant improvements in the average (range, 0.015–0.036) and maximum (range, 0.057–0.172) calibration errors were observed in eight and five methods, respectively. Significant improvement in absolute Spiegelhalter’s Z (range, 0.809–4.439) was observed in only one method (the recalibration constant). Discrimination was preserved in six methods (AUROC, 0.909–0.949).
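To show why a simple recalibration can correct overestimation without changing discrimination, the sketch below applies an intercept-only update on the logit scale: a single constant is added to every logit so that the mean recalibrated probability equals the observed prevalence. Whether this corresponds to the study's "recalibration constant" method is an assumption, and all function names and data are hypothetical. Because the transformation is strictly increasing, the ranking of cases, and hence the AUROC, is unchanged.

```python
import math


def _logit(p, eps=1e-6):
    """Log-odds of p, with p clipped away from 0 and 1 to keep it finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_intercept_shift(probs, labels, iters=100):
    """Find the logit shift that makes the mean prediction match the prevalence.

    Uses bisection, which works because the mean recalibrated probability
    increases monotonically with the shift. Requires both classes to be present.
    """
    target = sum(labels) / len(labels)
    logits = [_logit(p) for p in probs]
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean_pred = sum(_sigmoid(x + mid) for x in logits) / len(logits)
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def apply_intercept_shift(probs, shift):
    """Apply a fitted shift to new probability scores."""
    return [_sigmoid(_logit(p) + shift) for p in probs]


def auroc(probs, labels):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores from a hypothetical overconfident model (not study data).
raw_probs = [0.95, 0.90, 0.85, 0.60, 0.70, 0.40, 0.30, 0.80]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]

shift = fit_intercept_shift(raw_probs, outcomes)
recalibrated = apply_intercept_shift(raw_probs, shift)
auroc_before = auroc(raw_probs, outcomes)
auroc_after = auroc(recalibrated, outcomes)
```

In practice the shift would be fitted on the recalibration subset and applied to the held-out validation subset, mirroring the split described in the Methods. Methods that bin or flatten the scores are not strictly increasing and can therefore change the AUROC.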

Conclusion

The calibration of a DL algorithm can be improved through simple recalibration procedures. Improved calibration may enhance the interpretability and credibility of the model for users.

Key Points

A deep learning model tended to overestimate the likelihood of the presence of abnormalities in chest radiographs.
Simple recalibration of the deep learning model using output scores could improve the calibration of the model while maintaining discrimination.
Improved calibration of a deep learning model may enhance the interpretability and the credibility of the model for users.
Metadata
Title
Automated identification of chest radiographs with referable abnormality with deep learning: need for recalibration
Authors
Eui Jin Hwang
Hyungjin Kim
Jong Hyuk Lee
Jin Mo Goo
Chang Min Park
Publication date
14 July 2020
Publisher
Springer Berlin Heidelberg
Published in
European Radiology / Issue 12/2020
Print ISSN: 0938-7994
Electronic ISSN: 1432-1084
DOI
https://doi.org/10.1007/s00330-020-07062-7
