10.01.2022 | Original Article

Deep learning prediction of sex on chest radiographs: a potential contributor to biased algorithms

Authors: David Li, Cheng Ting Lin, Jeremias Sulam, Paul H. Yi

Published in: Emergency Radiology | Issue 2/2022


Abstract

Background

Deep convolutional neural networks (DCNNs) for diagnosing disease on chest radiographs (CXRs) have been shown to be biased against males or females when the datasets used to train them have unbalanced sex representation. Prior work has suggested that DCNNs can predict sex on CXRs, which could aid forensic evaluations but could also be a source of bias.

Objective

To (1) evaluate the performance of DCNNs for predicting sex across different datasets and architectures and (2) evaluate visual biomarkers used by DCNNs to predict sex on CXRs.

Materials and methods

Chest radiographs were obtained from the Stanford CheXpert and NIH ChestX-ray14 datasets, which comprise 224,316 and 112,120 CXRs, respectively. To control for dataset size and class imbalance, random undersampling was used to reduce each dataset to 97,560 images balanced for sex. Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets. Four DCNN architectures pre-trained on ImageNet were used for transfer learning. DCNNs were externally validated using the test set from the other dataset. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Class activation mapping (CAM) was used to generate heatmaps visualizing the regions contributing to each DCNN's prediction.
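The class-balanced undersampling and 70%/10%/20% split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `balance_and_split` function, the record format, and the random seed are assumptions.

```python
import random

def balance_and_split(records, seed=42):
    """Undersample the majority sex so both classes are equal in size,
    then split 70% / 10% / 20% into train / validation / test sets."""
    rng = random.Random(seed)
    males = [r for r in records if r["sex"] == "M"]
    females = [r for r in records if r["sex"] == "F"]
    n = min(len(males), len(females))          # per-class target count
    balanced = rng.sample(males, n) + rng.sample(females, n)
    rng.shuffle(balanced)
    n_train = int(0.7 * len(balanced))
    n_val = int(0.1 * len(balanced))
    return (balanced[:n_train],
            balanced[n_train:n_train + n_val],
            balanced[n_train + n_val:])

# Toy cohort with imbalanced sex representation (120 M vs. 80 F)
records = ([{"id": i, "sex": "M"} for i in range(120)]
           + [{"id": i, "sex": "F"} for i in range(80)])
train, val, test = balance_and_split(records)
```

Undersampling to the minority-class count (here 80 per sex) mirrors the paper's reduction of both datasets to the same sex-balanced size before splitting.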

Results

On the internal test sets, DCNNs achieved AUROCs ranging from 0.98 to 0.99. On external validation, the models reached peak cross-dataset AUROCs of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model. Heatmaps highlighted similar regions of attention across model architectures and datasets, localizing to the mediastinal and upper rib regions, as well as to the lower chest/diaphragmatic regions.
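In the classic CAM formulation, the heatmap for a predicted class is a weighted sum of the final convolutional feature maps, with the weights taken from the classifier layer, followed by a ReLU and normalization for overlay. A minimal pure-Python sketch (the map sizes, weights, and function name are illustrative assumptions, not the authors' implementation):

```python
def class_activation_map(feature_maps, class_weights):
    """Classic CAM: sum C feature maps (each H x W, as nested lists),
    each scaled by the classifier weight for the predicted class."""
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * W for _ in range(H)]
    for w, fmap in zip(class_weights, feature_maps):
        for i in range(H):
            for j in range(W):
                cam[i][j] += w * fmap[i][j]
    # ReLU (keep positive evidence) and max-normalize to [0, 1]
    peak = max(max(row) for row in cam)
    return [[max(v, 0.0) / peak if peak > 0 else 0.0 for v in row]
            for row in cam]

# Toy example: two 2x2 feature maps, weights for one class
fmaps = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.0, 1.0], [2.0, 3.0]]]
cam = class_activation_map(fmaps, class_weights=[1.0, 0.5])
```

Upsampling the resulting low-resolution map to the input image size yields the overlay heatmaps of the kind described in the Results.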

Conclusion

DCNNs trained on two large CXR datasets accurately predicted sex on internal and external test data, with similar heatmap localizations across architectures and datasets. These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex, potentially confounding the accurate prediction of disease on CXRs and contributing to biased models. Conversely, such DCNNs could aid emergency radiologists in forensic evaluations and in identifying the sex of patients whose identities are unknown, such as in acute trauma.
Metadata
Title
Deep learning prediction of sex on chest radiographs: a potential contributor to biased algorithms
Authors
David Li
Cheng Ting Lin
Jeremias Sulam
Paul H. Yi
Publication date
10.01.2022
Publisher
Springer International Publishing
Published in
Emergency Radiology / Issue 2/2022
Print ISSN: 1070-3004
Elektronische ISSN: 1438-1435
DOI
https://doi.org/10.1007/s10140-022-02019-3
