Introduction
High mammographic density has consistently been shown to be associated with an increased risk of breast cancer [
1]. Hence, there has been growing interest in evaluating mammographic density for individualized screening programs [
2] and for incorporation in risk prediction models [
3]. However, optimal use of mammographic density requires a reliable measuring method. Today, both qualitative and quantitative mammographic density measurement methods are available [
4]. The most often used clinical classification of mammographic density is the qualitative Breast Imaging-Reporting and Data System (BI-RADS) [
5]. Although afflicted with substantial interobserver variability (kappa 0.43–0.79) [
6‐
12], mammographic density as classified by BI-RADS has consistently been associated with an increased risk of breast cancer [
1,
13]. However, the latest BI-RADS 5th Edition aims to capture the risk of masking of tumors by dense breast tissue, more than the risk of developing breast cancer [
5]. In order to improve objectivity and reproducibility, quantitative breast density measurements have been developed [
4]. Area-based, semi-quantitative measurements are performed with various computer-assisted techniques, such as Cumulus [
4]. However, these techniques are also user-dependent and time-consuming. Both the breast itself and the dense breast tissue are three-dimensional, and a previous study reported that volumetric breast density measurements estimate breast cancer risk more accurately than area-based density estimates [
14]. Previous studies on fully automated volumetric methods of measuring breast density have shown high reproducibility [
15] and association with breast cancer risk [
16,
17]. Furthermore, the volumetric methods have been shown to be positively associated with BI-RADS categories [
18‐
21] as well as with magnetic resonance imaging (MRI) measurements of breast fibroglandular tissue [
22,
23]. A previous large study (n = 8867) showed good correlation between two different automated techniques of measuring volumetric breast density, but the agreement with visually estimated mammographic density was poor, albeit better than the agreement with the area-based method [
24]. Beyond a single value or category of mammographic density, temporal changes in mammographic density have also attracted attention. A decrease in mammographic density has been shown to be associated with a decreased risk of contralateral breast cancer [
25] and to be a positive marker for response to tamoxifen treatment [
26], further motivating a more sensitive measurement than the rather coarse BI-RADS categories.
The aim of this study was to assess the agreement between mammographic density measured by a fully automated volumetric method and the radiologists’ classification according to BI-RADS 4th Edition. Part of the Malmö Breast Tomosynthesis Screening Trial (MBTST) population, comprising nearly 8500 screening mammography examinations with measured volumetric mammographic density and qualitative classification according to BI-RADS, was used for this purpose.
Discussion
In this large study, we analyzed mammographic density assessment in a screening population, comparing a fully automated volumetric assessment using Volpara software with the radiologists’ classification according to BI-RADS, 4th Edition. We found that the agreement between the radiologists’ BI-RADS scores was substantial, indicating that the radiologists evaluated the mammographic density in a similar manner. Agreement between VDG and BI-RADS scores was moderate.
Our results are in line with a previous large study showing that different mammographic density measurements did not produce identical results [
24]. Morrish et al. showed a low correlation between Volpara and observers’ visual estimations of mammographic density using the VAS method (Visual Analog Scale), albeit better with volumetric density than with area density [
24]. Other studies have shown positive associations [
21] and good correlations between VBD and BI-RADS [
18,
19]. However, the use of correlation instead of agreement in previous studies makes direct comparison with this present study difficult. Furthermore, correlation may not be the method of choice since correlation only measures the strength of a relation between two variables, not the agreement between them [
32,
33]. In addition, the BI-RADS density distribution differed between previous studies [
18,
19] and this present study, which may be caused by differences in both age and ethnicity. Asian ethnicity and younger age are known to be associated with higher mammographic density [
13,
34] as could be observed in the aforementioned studies. Gweon et al. reported 62 % of the examinations to be categorized as BI-RADS 3 and 18.8 % to be categorized as BI-RADS 4 in an Asian population with a mean age of 51.7 years [
18]; the corresponding figures in this study were 35.2 % for BI-RADS 3 and only 7.5 % for BI-RADS 4, with a mean age of 58 years. The observations of this study, that Volpara classified more examinations in the highest VDG category than the radiologists (BI-RADS) and that there was moderate agreement between VDG and BI-RADS, have also been previously described [
18,
19,
22]. On the other hand, a previous Dutch study reported the BI-RADS distribution to be quite comparable with the VDG distribution, with a weighted kappa value of 0.80 [
21].
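The distinction between correlation and agreement drawn above can be illustrated with a small numerical sketch. The ratings below are invented for illustration and are not data from this study, and the linear weighting scheme is an assumption, since the weighting used in the cited studies is not stated here. The example shows two raters whose four-category scores differ by a constant one-category shift: the Pearson correlation is perfect, while the weighted kappa is low.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def weighted_kappa(rater1, rater2, categories=(1, 2, 3, 4)):
    """Cohen's kappa with linear disagreement weights (assumed weighting)."""
    n = len(rater1)
    k = len(categories)
    position = {c: i for i, c in enumerate(categories)}
    observed = Counter(zip(rater1, rater2))   # joint counts of rating pairs
    margin1, margin2 = Counter(rater1), Counter(rater2)
    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for a in categories:
        for b in categories:
            # Weight grows linearly with the distance between categories.
            w = abs(position[a] - position[b]) / (k - 1)
            disagreement_obs += w * observed[(a, b)] / n
            disagreement_exp += w * (margin1[a] / n) * (margin2[b] / n)
    return 1.0 - disagreement_obs / disagreement_exp


# Invented ratings: the second rater always scores one category higher.
radiologist = [1, 1, 2, 2, 3, 3]
software = [2, 2, 3, 3, 4, 4]

r = pearson(radiologist, software)
kappa = weighted_kappa(radiologist, software)
print(f"Pearson r = {r:.2f}, weighted kappa = {kappa:.2f}")
```

With these invented ratings the correlation is 1.0 but the weighted kappa is about 0.18, which is why studies reporting correlation cannot be compared directly with studies reporting agreement.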
There could be several explanations for the lower degree of agreement between Volpara and BI-RADS assessments. First, BI-RADS scores are based on processed images, while Volpara analyses are performed on raw DM data. Second, VBD is measured on a continuous scale, whereas BI-RADS scores are a coarse classification into four categories. Therefore, density values near the boundaries between VDG categories could be classified into either the upper or the lower adjacent BI-RADS category, since the radiologists would not detect small differences in mammographic density. Finally, both Volpara and the radiologists estimate the amount or percentage of dense breast tissue. However, despite the BI-RADS 4th Edition definitions, the radiologists might also take into account the distribution of the mammographic density and the difficulty of detecting a breast tumour, which may not always reflect an actual increased amount of dense tissue, although a previous study reported high volumetric density to be correlated with decreased mammography sensitivity [
35]. Taken together, this may indicate that radiologists evaluate mammographic density differently than automated software.
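The boundary effect described in the second explanation can be sketched in a few lines. The cut-points below are hypothetical values chosen only for illustration; they are not the thresholds used by Volpara or in this study, and the function names and the margin are likewise assumptions. The sketch maps a continuous volumetric density percentage to one of four grades and flags values lying close to a cut-point, where a small difference in density changes the category.

```python
from bisect import bisect_right

# Hypothetical cut-points (percent volumetric density) separating four grades.
# These are illustrative values, not the thresholds of any real software.
CUTPOINTS = (5.0, 10.0, 15.0)


def density_grade(vbd_percent, cutpoints=CUTPOINTS):
    """Map a continuous density percentage to a grade from 1 to 4."""
    # bisect_right counts how many cut-points are <= the value.
    return bisect_right(cutpoints, vbd_percent) + 1


def near_boundary(vbd_percent, margin=0.5, cutpoints=CUTPOINTS):
    """Return True if the value lies within `margin` of any cut-point."""
    return any(abs(vbd_percent - c) <= margin for c in cutpoints)


# Two examinations with almost identical density fall in different grades.
for value in (9.8, 10.2, 12.5):
    print(value, density_grade(value), near_boundary(value))
```

Under these assumed cut-points, densities of 9.8 % and 10.2 % receive different grades although they differ by less than half a percentage point, whereas a visual reader would be unlikely to separate them.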
The automated method may still be a robust and valuable tool. High mammographic density, whether measured by Volpara or qualitatively with BI-RADS, has been shown to be associated with an increased breast cancer risk [
1,
16,
17]. Previous reviews on mammographic density [
2] and breast cancer risk prediction [
3] have emphasized the need for improved and individualized breast cancer screening programs and risk prediction models. One way of improving these programs and models could be by incorporating a fully automated volumetric assessment of continuously measured mammographic density that may reduce the interobserver variability [
15] and thereby produce a more reliable density estimate. A more reliable density estimate may then be used to stratify women into different screening and risk groups.
Some limitations of this study require consideration. First, the BI-RADS 4th Edition was the standard during the main part of the MBTST; the impact of the BI-RADS 5th Edition on the results would have been interesting to analyze. This was, however, beyond the scope of this study. Second, two previous studies investigating BI-RADS agreement had more radiologists reading the images in the density analyses, which, of course, would have been preferable (11 [
11] and 21 radiologists [
12]). However, five radiologists is still a realistic number of readers in a single-centre study. Third, breast tumours may affect the surrounding breast tissue and thereby also the mammographic density; we therefore excluded examinations from women with breast cancer. Finally, consistently registered information on previous breast surgery, use of hormone replacement therapy, and reproductive factors was not available, all of which may affect mammographic density. However, we do not believe this affected our results because these factors are not expected to affect the two modes of assessment differently.
The population in this study was a screening population representative of the female population in the screening ages 40–74 years in the city of Malmö, Sweden [
27]. Furthermore, the BI-RADS scores were assigned prospectively by several radiologists, representing the common mass screening setting. The interobserver variability was low, reflecting a solid evaluation of qualitatively estimated mammographic density. Altogether, this study may well represent everyday screening practice.
In conclusion, there was moderate agreement between Volpara and BI-RADS scores from European radiologists, indicating that radiologists evaluate mammographic density differently than automated software. However, the automated method may still be a robust and valuable tool. The differences in interpretation between radiologists and software require further investigation. Future studies evaluating fully automated density assessments in different populations are warranted to ensure accurate reflection of mammographic density, with an additional focus on breast cancer risk and screening outcomes.
Acknowledgements
Ralph Highnam at Volpara is acknowledged for providing access to the Volpara software. The radiologists who performed the BI-RADS scoring (Ingvar Andersson, Annicka Lindahl, Marianne Löfgren, Cecilia Wattsgård and Barbara Ziemiecka) and the nurses responsible for performing the screening examinations (Ulrica Pettersson and Maria Seserin) are also acknowledged.
The scientific guarantor of this publication is Pontus Timberg, PhD. The authors of this manuscript declare relationships with the following companies: The sponsors (Volpara) of the study had no role in the design and performance of the study, data analysis, or data interpretation. Siemens AG (Erlangen, Germany) sponsored the study by providing the mammography equipment. KL, SZ, and PT have received speakers’ fees and travel grants from Siemens. This work has received funding from government funding for clinical research within the National Health Services, Research Foundation, and The Swedish Cancer Society. One of the authors (AR) has significant statistical expertise. Institutional Review Board approval was obtained. Written informed consent was obtained from all subjects (patients) in this study. Some study subjects or cohorts have been previously reported in:
Lang et al.: “Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmö Breast Tomosynthesis Screening Trial, a population-based study”. European Radiology 2016, 26(1):184–190. Lang et al.: “False positives in breast cancer screening with one-view breast tomosynthesis: an analysis of findings leading to recall, work-up and biopsy rates in the Malmö Breast Tomosynthesis Screening Trial” (accepted in European Radiology 2015). Rosso et al.: “Factors affecting recall rate and false positive fraction in breast cancer screening with breast tomosynthesis - A statistical approach”. The Breast 2015, 24(5):680–686.
Methodology: prospective, diagnostic or prognostic study, performed at one institution.