Summary of results
Twelve studies comparing DBT plus s2D with DM alone in screening were included in our meta-analyses. Screening with DBT plus s2D compared to DM alone was associated with a higher CDR ([RR, 95% CI] 1.35, 1.20–1.52), fewer recalls (0.79, 0.64–0.98), and higher cancer detection among recalled women (PPV-1: 1.69, 1.45–1.96). Cancer detection among women with recommended biopsies (PPV-2: 1.57, 1.08–2.28) and performed biopsies (PPV-3: 1.36, 1.17–1.58) was also higher with DBT plus s2D than with DM alone. We did not identify any statistically significant differences in biopsy rates or ICR.
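Because PPV-1 is the number of cancers divided by the number of recalls, it equals the CDR divided by the recall rate. Within a single study, the ratio of PPV-1 between modalities is therefore exactly the CDR ratio divided by the recall ratio. The minimal sketch below applies this identity to the pooled estimates above as a plausibility check. Each pooled RR was estimated separately, so agreement can only be approximate. The function name is illustrative.

```python
# PPV-1 = cancers / recalls = (cancers / screened) / (recalls / screened) = CDR / recall rate.
# Within one study, therefore: RR(PPV-1) = RR(CDR) / RR(recall) exactly.
# For separately pooled meta-analytic estimates, the identity holds only approximately.


def implied_ppv1_ratio(rr_cdr: float, rr_recall: float) -> float:
    """PPV-1 ratio (DBT plus s2D vs DM alone) implied by the CDR and recall ratios."""
    return rr_cdr / rr_recall


# Pooled risk ratios reported in the summary above
rr_cdr = 1.35
rr_recall = 0.79
pooled_rr_ppv1 = 1.69
pooled_ci_ppv1 = (1.45, 1.96)

implied = implied_ppv1_ratio(rr_cdr, rr_recall)
within_ci = pooled_ci_ppv1[0] <= implied <= pooled_ci_ppv1[1]
print(f"implied RR(PPV-1) = {implied:.2f}; pooled RR(PPV-1) = {pooled_rr_ppv1:.2f}; within pooled CI: {within_ci}")
```

The implied ratio of about 1.71 lies within the pooled 95% CI for PPV-1. This is consistent with the higher PPV-1 arising from more cancers detected together with fewer recalls.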
Results interpretation and comparison with literature
Our results regarding CDR, recall rates, and PPV-1 were in line with those of Alabousi et al. [13]. Unlike Giampietro et al. [14], we found a statistically significant difference in recalls, with fewer recalls for DBT plus s2D than for DM alone. However, the inclusion criteria of that study differ from ours. The more favourable recall results in our study and in that of Alabousi et al. may indicate lower recall rates for DBT plus s2D than for DBT plus DM. They may also reflect a learning curve between earlier and more recent studies. In our sensitivity analyses of recalls, statistical significance was lost when, for example, the US studies by Aujero et al. [20] or Freer et al. [23] were excluded. Because screening characteristics such as reading procedures and screening intervals differ in the USA, these may contribute to heterogeneity. In contrast to the other studies, Houssami et al. [26] reported a statistically significantly higher risk of recall in one pilot screening trial when using DBT plus s2D compared to DM alone. In this population, women screened with DBT plus s2D were younger, more often reported symptoms, and more often participated in the prevalent screening round than women screened with DM alone [26]. Although recall rates diverge between studies, the number of cancers detected per 100 recalled women is consistently higher in women screened with DBT plus s2D. Overall, DBT plus s2D is associated with a higher CDR and, at the same time, with fewer recalls. Furthermore, the higher cancer detection per 100 women with a recommended or performed biopsy indicates that DBT plus s2D identifies cancers more precisely than DM alone. Abdullah et al. [34] also reported a 9-percentage-point higher sensitivity for DBT plus s2D (83%, 95% CI: 78–87%) than for DM alone (74%, 95% CI: 65–81%).
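A leave-one-out sensitivity analysis of the kind described above can be sketched as follows. Each study is omitted in turn, the remaining log risk ratios are re-pooled, and the resulting 95% CI is checked against RR = 1. This is a minimal sketch that assumes the DerSimonian-Laird random-effects estimator; it does not claim to be the exact estimator used in our analyses. The study labels and counts are hypothetical and do not reproduce the included studies.

```python
import math

# Hypothetical counts per study: (recalls DBT+s2D, screened DBT+s2D, recalls DM, screened DM).
# Illustrative values only; they are NOT the data of the studies included in this review.
STUDIES = {
    "study_A": (300, 10000, 420, 10000),
    "study_B": (250, 8000, 260, 8000),
    "study_C": (500, 12000, 480, 11000),
    "study_D": (180, 6000, 300, 7000),
}


def log_rr(a, n1, c, n2):
    """Log risk ratio and its large-sample variance (assumes no zero cells)."""
    estimate = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return estimate, variance


def dersimonian_laird(effects):
    """Pool (log RR, variance) pairs with a DerSimonian-Laird random-effects model.

    Returns the pooled RR and its 95% CI on the ratio scale."""
    y = [e for e, _ in effects]
    v = [s for _, s in effects]
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(y_re), math.exp(y_re - 1.96 * se), math.exp(y_re + 1.96 * se)


def leave_one_out(studies):
    """Re-pool the remaining studies after omitting each study in turn."""
    return {
        omitted: dersimonian_laird(
            [log_rr(*counts) for name, counts in studies.items() if name != omitted]
        )
        for omitted in studies
    }


rr, lo, hi = dersimonian_laird([log_rr(*counts) for counts in STUDIES.values()])
print(f"all studies: RR {rr:.2f} ({lo:.2f}-{hi:.2f})")
for name, (rr_i, lo_i, hi_i) in leave_one_out(STUDIES).items():
    significant = hi_i < 1 or lo_i > 1  # CI excludes RR = 1
    print(f"without {name}: RR {rr_i:.2f} ({lo_i:.2f}-{hi_i:.2f}), significant: {significant}")
```

When a single omission moves the upper CI bound above 1, the pooled recall result depends heavily on that study. This is the pattern described above for the US studies.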
Since a high CDR may be related to overdiagnosis, ICR is the more clinically relevant outcome parameter, as it reflects potentially important delays in diagnosis and treatment. We identified two European studies [21, 30] reporting ICR for DBT plus s2D compared to DM alone. Bernardi et al. [21] defined interval cancers as ‘cancers identified over two-year follow-up’ [21], whereas Hovda et al. [30] defined them as ‘cancers diagnosed 0–24 months after negative screening findings or 6–24 months after false positive baseline screening findings’ [30]. Given inconsistent ICR trends between screening modalities and small sample sizes, we found no statistically significant difference. A recently published meta-analysis by Houssami et al. [35], which assessed ICR in women screened with DBT compared to DM, reached the same conclusion: its sensitivity analyses showed no statistically significant difference in ICR between DBT plus s2D and DM alone [35].
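Because the two studies use different interval-cancer definitions, the same case may be counted in one series but not in the other. The sketch below encodes the definition quoted for Hovda et al. [30] alongside a plain two-year window. The two-year window is an assumed, simplified reading of the wording quoted for Bernardi et al. [21]. The `Case` record, its field names, and the inclusive month boundaries are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical record of a cancer diagnosed after a screening examination."""
    screening_result: str  # "negative" or "false_positive" (illustrative coding)
    months_to_diagnosis: float  # time from screening to diagnosis


def is_interval_cancer_hovda(case: Case) -> bool:
    """Definition quoted for Hovda et al.: diagnosed 0-24 months after negative
    screening findings or 6-24 months after false-positive screening findings.
    Inclusive bounds are an assumption."""
    if case.screening_result == "negative":
        return 0 <= case.months_to_diagnosis <= 24
    if case.screening_result == "false_positive":
        return 6 <= case.months_to_diagnosis <= 24
    return False


def is_interval_cancer_two_year(case: Case) -> bool:
    """Assumed simplified reading of 'cancers identified over two-year follow-up'."""
    return 0 <= case.months_to_diagnosis <= 24


# A cancer found 3 months after a false-positive screen is classified differently
early_after_false_positive = Case("false_positive", 3)
print(is_interval_cancer_hovda(early_after_false_positive))
print(is_interval_cancer_two_year(early_after_false_positive))
```

Definitional differences like this one shift the numerator of the ICR and add to between-study inconsistency.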
Published data on interval cancers following DBT plus s2D are limited and inconsistent. In principle, a high CDR combined with an unchanged ICR may be associated with a smaller-than-expected mortality reduction and with a risk of increased overdiagnosis. However, the effect of over-detection on mortality reduction and the risk of overdiagnosis can only be estimated once results from follow-up rounds, cancer stages, and biology become available from appropriately designed studies. To interpret the increased CDR, cancer biology and results stratified by breast density may be informative. While Winter et al. [36] reported a lower rate of node-positive interval cancers after DBT screening, Bahl et al. [37] reported comparable biology of interval cancers after DBT and FFDM. Neither study design, however, can exclude bias. One very recent study, the only one using wide-angle DBT (without s2D), reported a reduced ICR after screening with DBT compared to DM [38]. While this difference might be associated with the different technology, possible bias in the control group must also be considered. Given the differing results of the limited data on interval cancers, a possible correlation between the extent of recall reduction, the additional detection, and the effect on ICR may also be worth investigating. Moreover, most of the included studies were not originally designed or powered to show differences in ICR, and meta-analyses pooling data from underpowered studies with small sample sizes are likely to be underpowered as well [39]. Finally, additional detection and its effect on ICR may vary across breast density categories; such data are not yet available from large studies. Given that the European Commission recommends mammography screening for women aged 45–49 years [40], DBT could be a more effective alternative in this age group, since younger women tend to have denser breasts, in which the accuracy of mammography may be poorer. Furthermore, overdiagnosis may be less frequent in younger women, who have a longer remaining life expectancy, because small cancers have more time, and hence a higher risk, to progress.
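The point that ICR comparisons require very large samples can be made concrete with a standard two-proportion sample-size calculation (normal approximation, equal allocation). Purely for illustration, the sketch below assumes ICRs of 2.0 versus 1.6 per 1,000 screened women. These are hypothetical values, not taken from any included study. It also assumes a two-sided α of 0.05 and 80% power. Under these assumptions, roughly 176,000 women per arm would be needed, and smaller absolute differences would require even more.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Women needed per arm to detect p1 vs p2 with a two-sided test of two
    proportions (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical ICRs (assumptions, not study results): 2.0 vs 1.6 per 1,000 screened women
n = n_per_group(0.0020, 0.0016)
print(f"about {n:,} women per arm")
```

Comparing such a figure with the size of a study's arms gives a rough sense of whether a non-significant ICR difference is informative.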
Biases and limitations
This study has several limitations. First, the search was conducted only in PubMed, and studies without an available abstract or English full text were excluded. Second, in nine of the ten underlying trials, women were not randomly assigned to screening modalities (approximately 385,532 of 414,281 women in total). In addition, in four of the ten underlying trials, the study groups differ in time period (approximately 226,419 of 414,281 women). Unpaired and non-randomised designs, in which participant characteristics (e.g. breast density, family history, or availability of screening modalities) may differ between groups, are prone to confounding bias. Since a systematic assessment of potential confounders was beyond the scope of our work, comparing the screening performance of the modalities in women with dense breast tissue only may be worthwhile in further subgroup analyses. Third, heterogeneity among studies was observed. We used a random-effects model (REM) and sought to interpret results in light of potential factors contributing to heterogeneity. However, our study did not address other influencing or limiting factors. Further valuable data from randomised designs will become available in the near future, for example from a large RCT that recently completed recruitment of a prospective study population of 80,000 women [41].
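Between-study heterogeneity of the kind noted above is commonly quantified with Cochran's Q and the I² statistic. I² = max(0, (Q - df)/Q) × 100% estimates the share of total variability attributable to between-study differences rather than chance. The minimal sketch below computes both from log risk ratios and their variances. The four input effects are hypothetical and do not correspond to the included studies.

```python
import math


def heterogeneity(effects):
    """Cochran's Q, degrees of freedom, and I^2 (in %) for (log RR, variance) pairs."""
    y = [e for e, _ in effects]
    w = [1 / v for _, v in effects]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical recall risk ratios with the variances of their logs (illustrative only)
effects = [
    (math.log(0.70), 0.004),
    (math.log(0.95), 0.003),
    (math.log(1.20), 0.005),
    (math.log(0.80), 0.006),
]
q, df, i2 = heterogeneity(effects)
print(f"Q = {q:.1f} on {df} df, I^2 = {i2:.0f}%")
```

High I² values favour a random-effects model and call for cautious interpretation of pooled estimates. Random-effects pooling widens the confidence intervals but does not explain the sources of heterogeneity.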