Background
Magnetic resonance imaging (MRI) has been proposed to have a role in guiding breast cancer surgery by measuring the size of residual tumor after neoadjuvant chemotherapy (NAC), and has been shown to have high sensitivity for detecting residual disease [1]. Given that guidelines recommend assessment of the largest tumor diameter [2], estimation of the largest diameter by MRI may guide decisions about whether mastectomy or breast conserving surgery (BCS) should be attempted, and may assist in planning resection to achieve clear margins in BCS. Underestimation of tumor size may therefore lead to involved surgical margins and repeat surgery; overestimation may lead to overly radical surgery (including mastectomy when BCS may have been possible) and poorer cosmetic and psychosocial outcomes [3].
Tumor size measurement is prone to error, and both tumor characteristics and imaging limitations may differentially affect the measurement accuracy of tests used for this purpose. MRI may over- or underestimate tumor size due to artefacts such as partial volume effects [4] or disruptions to signal intensity from marker placement [5]. Tumors may not be well visualised by mammography in patients with dense breasts [6] or multifocal cancer [7]. Ultrasound (US) measurements may be compromised by unclear margins [8], acoustic shadowing [9] or limitations in the field of view [10]. Imaging modalities also differ in their ability to visualise ductal carcinoma in situ (DCIS) [11]. The inherent pliability of breast tissue also means that tumor dimensions may vary with patient positioning [12]; therefore, measurements undertaken in upright (mammography), supine (US) and prone (MRI) positions may differ. Furthermore, the effects of NAC may introduce greater bias in residual tumor measurement than in the preoperative setting: reactive inflammation, fibrosis or necrosis may be difficult to distinguish from residual tumor [13], and measurement errors may be additive when tumors regress as multiple, scattered deposits [2].
While many studies have sought to assess the relative ability of MRI and other tests to estimate tumor size after NAC, conclusions have been hampered by small sample sizes and inadequate statistical methods. A previous study-level meta-analysis demonstrated that misleading conclusions about the accuracy of MRI may result from inappropriate analytic methods that do not measure agreement between clinical measures (e.g. Pearson or Spearman correlation coefficients) [14]. However, that meta-analysis was limited in its ability to estimate the agreement between MRI and pathologic measurements, and to compare MRI with alternative tests, due to numerous shortcomings in the available data. For example, inconsistencies in measurement between studies, such as the inclusion or exclusion of residual DCIS in pathologic tumor measurements, may differentially affect the measurement accuracy of MRI and other tests, and also limit the clinical applicability of pooled estimates. Comparison of MRI and other tests was also hampered by the tests being reported for different (or, at best, overlapping) patient groups, for which test performance may vary. Furthermore, a fundamental limitation was that inadequate reporting often made it impossible to assess the validity of the assumptions underlying the recommended statistical methods, namely mean differences (MDs) and limits of agreement (LOA) [15].
To address those limitations, we investigated agreement between MRI-measured and pathologic tumor size after NAC in an individual patient data (IPD) meta-analysis of a large number of breast cancer patients, using appropriate methods for evaluating the agreement between measurements [15]. Key differences between this and the previous study-level meta-analysis are summarised in Additional file 1: Appendix 1. The IPD methodology allowed us to standardise tumor measurements to include invasive cancer only, explore agreement only when residual tumor is truly present, and describe MRI measurement errors in detail. In addition, our study extended previous work by exploring agreement according to characteristics that have been suggested to contribute to inaccurate measurement (NAC agents and HER2 status) [16, 17], and by examining MRI’s agreement compared with, and in addition to, alternative tests (US, mammography, clinical examination) when the tests were conducted in the same patients [18].
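The distinction drawn in this section between correlation coefficients and measures of agreement can be illustrated with a minimal Python sketch. It assumes the conventional Bland-Altman definitions: differences are taken as test minus pathology, so a positive MD indicates overestimation, and the 95% LOA are MD ± 1.96 × SD of the differences. The function names (`bland_altman`, `pearson_r`) and the tumor sizes are hypothetical and are not the analysis code or data of this study. A test that always reports twice the pathologic size correlates perfectly with pathology, yet its MD and LOA expose gross overestimation.

```python
import math
import statistics


def bland_altman(test, reference):
    """Return (mean difference, lower LOA, upper LOA) for paired measurements.

    Differences are test minus reference, so a positive mean difference
    means the test overestimates on average. 1.96 approximates the 97.5th
    percentile of the standard normal distribution.
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for t, r in zip(test, reference)]
    md = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return md, md - 1.96 * sd, md + 1.96 * sd


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical residual tumor sizes in cm (illustrative only, not study data).
pathology = [1.0, 2.0, 3.0, 4.0, 5.0]
# A hypothetical test that always reads double the pathologic size.
doubling_test = [2 * p for p in pathology]

r = pearson_r(doubling_test, pathology)
md, lower, upper = bland_altman(doubling_test, pathology)
print(f"r = {r:.2f}")
print(f"MD = {md:.2f} cm, 95% LOA = {lower:.2f} to {upper:.2f} cm")
```

Running the sketch prints r = 1.00 alongside an MD of 3.00 cm with LOA of about -0.10 to 6.10 cm, which is why a high correlation coefficient alone cannot show that a test agrees with pathology.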
Discussion
In the neoadjuvant setting, accurate measurement of residual malignancy may assist in guiding surgical management of breast cancer. While past research focused on the accuracy of MRI in detecting the absence of residual tumor (pathologic complete response, pCR) as a predictor of overall and disease-free survival [1], MRI measurements of tumor size have the potential to inform decisions about surgical extent (e.g. BCS versus mastectomy). Our IPD meta-analysis assessed the agreement between MRI and pathologic tumor measurements after NAC. Pooled MDs between MRI and pathology indicated no systematic bias in MRI’s estimation of tumor size when residual tumor was present. Variability in agreement was lower than estimated by our previous study-level analysis [14]; however, both over- and underestimation by MRI were observed, and the LOA (±3.8 cm) show that substantial disagreement with pathology is possible. MRI measurement errors within that range may be clinically important because of their implications for the choice of treatment.
The IPD methodology used in this analysis allowed measurement errors to be explored in greater detail than study-level analyses permit [14]. Tumors “missed” by MRI generally measured ≤2.0 cm at pathology; however, MRI measurements >5.0 cm occurred in a small proportion of cases in which pCR was achieved. Although descriptive reporting of such overestimation was not standard across included studies, one of the three cases of MRI measurements >5 cm in the presence of pCR observed in this data set was attributed to extensive DCIS. Other possible causes include reactive inflammation, fibrosis or necrosis induced by NAC [13]. Descriptions of cases of large overestimation in future studies would be valuable in guiding research and practice. Assuming that surgeons consider the MRI-determined measurement when planning resection, such overestimation would lead to unnecessarily large excision. Although those patients are likely to benefit from the improved disease-free and overall survival conferred by pCR [47], they are less likely to benefit from a reduction in surgical extent after NAC.
Comparisons of MRI and US in the same patients showed similar LOA, suggesting comparable performance by MRI and US when residual tumor is present (although substantial heterogeneity for US reflects its operator dependence [2]). However, contrary to our previous study-level analysis [14], a small bias towards underestimation of tumor size was found for US; clinical preference for either slight overestimation (MRI) or underestimation (US) of pathologic size should be considered in the choice of test. Furthermore, our analysis extends previous work by suggesting that using the mean of the MRI and US measurements may further improve tumor measurement. Given that studies may not have interpreted MRI blinded to US, this result is likely to underestimate the value of combining the tests. Clinicians adopting this testing strategy should be aware that the direction of MRI’s systematic bias was reversed (slight underestimation) when the tests were combined.
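The combined testing strategy described in this paragraph can be made concrete with a short Python sketch that compares the agreement with pathology of MRI, US, and the per-patient mean of the two. The helper `agreement` and all measurements are hypothetical; the values are invented so that MRI slightly overestimates and US slightly underestimates, the situation in which averaging can partly cancel opposing errors. This is an illustration under those assumptions, not a reproduction of the pooled IPD analysis, which would also need to account for differences between studies.

```python
import statistics


def agreement(test, reference):
    """Mean difference (test minus reference) and 95% limits of agreement."""
    diffs = [t - r for t, r in zip(test, reference)]
    md = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return md, md - half_width, md + half_width


# Hypothetical paired sizes (cm) for patients measured by both tests.
pathology = [1.5, 2.0, 3.2, 4.0, 2.8, 5.1]
mri = [2.1, 1.8, 4.0, 4.1, 3.3, 4.7]
us = [1.0, 2.1, 2.6, 3.6, 2.5, 5.3]

# Combined strategy: the mean of the MRI and US measurements for each patient.
combined = [(m + u) / 2 for m, u in zip(mri, us)]

for label, values in [("MRI", mri), ("US", us), ("Mean of MRI and US", combined)]:
    md, lo, hi = agreement(values, pathology)
    print(f"{label}: MD = {md:+.2f} cm, 95% LOA {lo:.2f} to {hi:.2f} cm")
```

With these invented values, averaging narrows the LOA because the MRI and US errors tend to run in opposite directions for the same patient. If the two tests erred in the same direction (for example, if MRI were read with knowledge of the US result), the errors would be positively correlated and the benefit of averaging would shrink, which is the concern raised about unblinded interpretation.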
It is noteworthy that MRI estimated tumor size less accurately in patients for whom US measurement was not possible, with relatively large underestimation on average and wide LOA. Tumor characteristics are likely to have made measurement challenging for both tests. Patients without US had larger tumors (and, consistent with this, were diagnosed with more advanced disease and were more likely to have undergone mastectomy), reflecting limitations in the US field of view [10]. The higher rate of non-taxane-based NAC in that group may also have contributed to the larger residual tumor size [48]. When planning resection, clinicians should note that although tumor measurement by MRI may be possible for such patients, the potential for size underestimation may lead to incomplete excision. This analysis is the first to consider those patients separately and to compare MRI and US directly when measurement by both tests can be undertaken. Our findings highlight the importance of study authors reporting MRI’s agreement with pathology separately for patients with and without alternative tests [14, 18].
In patients with measurements by both MRI and mammography, a systematic bias in estimating tumor size was found only for MRI (slight overestimation); the larger overestimation by mammography found in a previous analysis (which included fewer studies comparing mammography and MRI) [14] was not observed. However, the difference between test measurements was small, and mammography’s moderate heterogeneity, wider LOA, and tendency to “miss” smaller tumors (≤2.0 cm) indicate greater variability in agreement with pathology. Consequently, combining MRI and mammography did not improve tumor measurement compared with MRI alone. In addition, a tendency towards large mammographic measurements in the presence of pCR suggests that mammography may lead to overly radical surgery when pCR is achieved. Mammographic tumor measurements were frequently not possible due to breast density, as reflected in the younger age of those women [49]. These findings therefore suggest that MRI would be the preferred test in this setting.
Direct comparison of MRI and clinical examination showed no systematic bias in MRI’s measurement of residual tumor; relatively large underestimation, moderate heterogeneity and wider LOA were observed for clinical examination, suggesting greater variability in agreement with pathology. In addition, apart from one case, tumors with pathologic measurements >2.0 cm were “missed” only by clinical examination, highlighting the potential for inadequate resection if surgical planning were based on clinical examination alone. While better overall agreement between MRI and pathology suggests that MRI is the more appropriate assessment method, a combination of US and clinical examination may be superior to either test individually [50], but that testing strategy could not be explored in this analysis. The relative performance of test combinations should be considered in future studies.
Data from single studies have suggested that underestimation by MRI is common in HER2-negative patients [16] or those treated with taxane-containing regimens [17], but previous study-level meta-analyses were unable to explore the effect of these variables further. Similar effects were not observed in our IPD analysis. For patients with data available on HER2 status, MRI performed comparably regardless of tumor biology. Although that analysis was based on relatively few studies, the combined sample size is substantially larger than that of the previous study exploring the effect of this variable, and the studies that did not contribute data predate routine HER2 testing. Furthermore, contrary to previous reports, a slight bias towards underestimation (and poorer overall agreement with pathology) was found in patients treated with non-taxane-based NAC. However, although more detailed analyses were attempted, the statistical models were unstable, and the results presented are therefore primarily descriptive. Further exploration of the effect of these characteristics on measurement accuracy is warranted in large primary studies, controlling for the effect of other potentially important covariates.
Given that not all eligible studies contributed IPD to this meta-analysis, selection bias may have been introduced. Although studies in this analysis were similar in most respects to the broader population of eligible studies [14], a higher proportion of T3 tumors and stage III disease was apparent. Other differences suggest that included studies are more applicable to current practice (i.e. NAC with taxanes was more common) and less susceptible to changes in tumor dimensions between MRI and pathologic measurement (i.e. a shorter interval between tests). Our IPD analysis also included a larger number of studies than the only previous (study-level) meta-analysis using appropriate statistical techniques to address this clinical question [14] (see Additional file 1: Appendix 1).
Although MDs and LOA are the most methodologically appropriate measures of agreement between MRI and pathology [15], there was no clear indication in our analysis of whether absolute or relative differences between the tests should be preferred. Plots of the data suggest that the absolute MDs reported here are likely to be most applicable to mid-sized tumors, but may differ for small or large residual cancers. However, analyses of absolute and relative differences gave comparable results, and therefore inferences about MRI and its performance compared with alternative tests are likely to be robust.
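One common way to examine relative differences is to apply the Bland-Altman approach to log-transformed measurements and back-transform the MD and LOA into ratios, so that agreement is expressed as a proportion of tumor size rather than in centimetres. The definition of relative difference used in this analysis is not specified in this section, so the following Python sketch assumes the log-ratio approach; the function `ratio_agreement` and the data are hypothetical. Because the logarithm of zero is undefined, pairs in which either measurement is zero (such as pCR, or a tumor “missed” by a test) are excluded here and would need separate handling.

```python
import math
import statistics


def ratio_agreement(test, reference):
    """Bland-Altman agreement on the log scale, back-transformed to ratios.

    Returns (geometric mean ratio, lower ratio LOA, upper ratio LOA), where a
    ratio of 1.0 means perfect agreement. Pairs with a zero (or negative)
    measurement are dropped, because their logarithm is undefined.
    """
    pairs = [(t, r) for t, r in zip(test, reference) if t > 0 and r > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two pairs with positive sizes")
    log_diffs = [math.log(t) - math.log(r) for t, r in pairs]
    md = statistics.mean(log_diffs)
    half_width = 1.96 * statistics.stdev(log_diffs)
    return math.exp(md), math.exp(md - half_width), math.exp(md + half_width)


# Hypothetical sizes (cm) in which absolute errors grow with tumor size,
# the pattern that makes relative differences worth examining.
pathology = [0.8, 1.5, 2.5, 4.0, 6.0, 8.0, 0.0]
mri = [0.9, 1.3, 2.9, 3.5, 7.0, 6.8, 1.2]  # last pair (pCR) is excluded

ratio, lower, upper = ratio_agreement(mri, pathology)
print(f"MRI/pathology ratio {ratio:.2f} (95% LOA {lower:.2f} to {upper:.2f})")
```

A ratio-based LOA widens in centimetres as tumors get larger, which can suit data where absolute errors scale with size; absolute MDs and LOA remain easier to interpret directly in surgical planning.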
Because pCR was achieved in a minority of patients (between 7.1% and 27.5% in the included studies), analyses of measurement errors in the presence of pCR are based on relatively small sample sizes and should be interpreted cautiously. Furthermore, to standardise the definition of pCR across studies, this analysis considered the presence of invasive cancer only. This represents an advance in methods over previous analyses by reducing the potential for heterogeneity and improving the clinical applicability of pooled estimates. However, tests may differ in their ability to visualise DCIS or calcifications [11], and hence the accuracy of MRI and alternative tests in measuring those outcomes may differ from our estimates. Our findings that alternative tests could not evaluate residual tumor in a proportion of patients should also be interpreted with the awareness that corresponding data about tumors non-evaluable by MRI were unavailable.
Conclusion
Our meta-analysis is the largest and most statistically appropriate evaluation of the agreement between MRI and pathologic residual tumor size post-NAC, and the only meta-analysis on this topic using IPD methodology. Our work suggests that there is no systematic bias in MRI’s measurement of residual invasive tumor, but that both over- and underestimation by MRI are possible, with LOA large enough to be of clinical importance. MRI’s performance was generally superior to that of US, mammography, and clinical examination, and in light of those findings, MRI may be considered the most appropriate test in this setting. However, large MRI measurements are possible in a small proportion of pCR cases, and patient characteristics that render tumors non-evaluable by US may contribute to inaccurate size measurements by MRI; those potential disadvantages should be considered in the choice of test. Furthermore, a combination of US and clinical examination may be superior to either test individually, and such a testing strategy has potential advantages over MRI in terms of lower cost and greater accessibility. Combinations of alternative tests, and their performance relative to MRI, should be explored in future studies.
Competing interests
SCP receives research funding from Philips Healthcare. The other authors declare no competing interests.
Authors’ contributions
MLM conceived and co-ordinated the study, conducted the literature searches and review of studies, performed the statistical analysis, and drafted the manuscript. PM conceived the statistical methods used, advised on data analysis and interpretation, and contributed to drafting the manuscript. LI advised on methodological aspects, data interpretation and contributed to drafting the manuscript. FS advised on MRI technical issues and clinical aspects, and contributed to drafting the manuscript. EPM advised on clinical aspects and contributed to drafting the manuscript. GvM advised on clinical aspects and contributed to drafting the manuscript. VG collected and assembled data and contributed to drafting the manuscript. SCP collected and assembled data and contributed to drafting the manuscript. FCW collected and assembled data and contributed to drafting the manuscript. JHC collected and assembled data and contributed to drafting the manuscript. MB collected and assembled data and contributed to drafting the manuscript. LM collected and assembled data and contributed to drafting the manuscript. EY collected and assembled data and contributed to drafting the manuscript. VL collected and assembled data and contributed to drafting the manuscript. NH conceived the study, advised on literature searches and study eligibility, advised on clinical aspects and data interpretation, and contributed to drafting the manuscript. All authors read and approved the final manuscript.