Introduction

Neoadjuvant chemotherapy (NAC) allows complete surgical resection by downstaging tumors and is now a standard treatment strategy in patients with locally advanced breast cancer (LABC). A pathologic complete response (pCR) after NAC indicates a better prognosis for patients with LABC1. There is much evidence that uptake on 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET)/computed tomography (CT) scans correlates with histopathologic markers, response to treatment, and prognosis in breast cancer2,3,4. The usefulness of decrease in PET parameters between pretreatment and interim 18F-FDG PET/CT in early prediction of a pCR has been reported in LABC treated by NAC5, 6. However, pCR can only be predicted after the start of NAC and additional radiation exposure is involved.

Texture analysis, originally developed for image pattern recognition, has in recent years been identified as a tool for “radiomics” on medical images7. Radiomics draws on all mineable data from medical images for clinical decision support. There have been reports that certain texture features (TFs) indicating metabolic intratumoral heterogeneity (ITH) are good prognostic markers of the likely response of the tumor to treatment and of patient survival8,9,10. Recently, success has been achieved in decoding tumor phenotypes by combining hundreds of features on CT images11, 12.

We hypothesized that tumors with distinctive metabolic radiomics patterns may have certain clinical characteristics. To test this hypothesis, unsupervised clustering was applied to pretreatment 18F-FDG PET/CT scans of LABC tumors as part of an integrated approach to metabolic radiomics. We then investigated the relationship between tumor clusters (TCs), histopathologic characteristics, tumor response to NAC, and risk of recurrence.

Results

Patient Demographics

Seventy-three patients with LABC who satisfied the inclusion and exclusion criteria were included in this retrospective study (Fig. 1). The patient demographic characteristics are summarized in Table 1. The median patient age was 48 (24–76) years. Most cases (94.5%) were invasive ductal carcinoma. The clinical stage was II in 38 cases and III in 35 cases. The clinical subgroups were hormone receptor (HR)-positive/human epidermal growth factor receptor 2 (HER2)-negative in 25 cases, HER2-positive in 18 cases, and triple-negative in 25 cases. Five cases with moderate HER2 staining were unclassified because fluorescence in situ hybridization results were missing. Seventeen cases (23.2%) achieved a pCR. Recurrences were observed in 4 of the 66 cases included in survival analysis (6.1%), and the median disease-free survival (DFS) was 25 (16–57) months.

Figure 1

Inclusion and exclusion criteria for the study population. Seventy-three patients were involved in radiomics analysis. Sixty-six patients were involved in survival analysis. Abbreviations: NAC, neoadjuvant chemotherapy; TCs, tumor clusters.

Table 1 Patient Demographic Characteristics.

Correlations between Texture Features

Correlations between TFs were visualized as correlograms with (Fig. 2) and without (Supplementary Fig. S1) multiple comparison correction. The maximum standardized uptake value (SUVmax) and metabolic tumor volume (MTV) were correlated with 68 TFs (62.3%) and 45 TFs (41.3%), respectively. Only 17 TFs (15.6%) were correlated with neither SUVmax nor MTV. Without multiple comparison correction (P < 0.05), only five TFs (4.6%) were correlated with neither SUVmax nor MTV.
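
The pairwise correlation step behind a correlogram can be illustrated with a minimal Python sketch. It computes Pearson's r between every pair of features. The feature values and the name `toy_TF` are hypothetical and are not taken from the study data; significance testing and multiple comparison correction are omitted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(features):
    """Return {(name_a, name_b): r} for every ordered pair of features."""
    names = list(features)
    return {(a, b): pearson_r(features[a], features[b])
            for a in names for b in names}

# Hypothetical per-case feature values (one entry per tumor), for illustration only.
features = {
    "SUVmax": [4.0, 6.0, 9.0, 13.0],
    "MTV": [2.0, 5.0, 20.0, 60.0],
    "toy_TF": [0.9, 0.7, 0.4, 0.2],
}

matrix = correlation_matrix(features)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```

In a correlogram, each entry of such a matrix is mapped to a color, and features are reordered so that strongly correlated features sit next to each other.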

Figure 2

Correlogram after multiple comparison correction (P < 0.0005). The correlogram shows that many TFs are closely associated with each other. Correlation coefficients are expressed by a color scale from red to blue. Representative TFs are marked on the correlogram. Notes: c2, NL_HomogeneityGLCM; c14, ZPGLSZM; c16, EntropyTFCCM; c24, MTV; c45, HILZEGLSZM; c50, Skewness; c64, SUVmax; c66, NL_DissimilarityGLCM; c91, TLG; c104, CV; c105, NL_EntropyGLCM. Abbreviations: TFs, texture features; NL, normalized; GLCM, gray level co-occurrence matrix; ZP, zone percentage; GLSZM, gray level size zone matrix; TFCCM, texture feature coding co-occurrence matrix; HILZE, high-intensity large-zone emphasis.

Unsupervised Tumor Clustering using Radiomics Pattern

We assessed TCs by metabolic radiomics patterns via unsupervised clustering. Unsupervised clustering resulted in 3 TCs (Fig. 3) and a separate case not included in any TC (Supplementary Fig. S2). There were 10, 25, and 37 cases in TC I, II, and III, respectively. We assessed the metabolic characteristics of the unsupervised TCs using several representative TFs: SUVmax, MTV, total lesion glycolysis (TLG), coefficient of variation (CV), normalized entropy from the gray level co-occurrence matrix (NL_EntropyGLCM), normalized homogeneity from the gray level co-occurrence matrix (NL_HomogeneityGLCM), zone percentage from the gray level size zone matrix (ZPGLSZM), and skewness (Table 2). The TFs were significantly different between the 3 TCs (all P ≤ 0.001). TC I had a large tumor size (MTV; 62.9 [25.2–158.8]), high SUVmax (13.2 [5.0–22.1]), and high ITH, reflected by a high CV (0.42 [0.19–0.66]) and high NL_EntropyGLCM (5.4 [3.4–5.9]). Although TC II had a medium tumor size (MTV; 18.4 [4.2–43.3]), it had, like TC I, a high SUVmax (13.3 [9.2–24.6]) and high ITH (CV, 0.45 [0.35–0.60]; NL_EntropyGLCM, 5.8 [5.1–7.0]). TC III had a small tumor size (MTV; 2.9 [0.8–11.9]), low SUVmax (5.8 [3.7–9.7]), and low ITH (CV, 0.21 [0.10–0.38]; NL_EntropyGLCM, 3.7 [2.5–5.0]). Representative cases from TC I–III are shown in Supplementary Fig. S3.

Figure 3

Unsupervised radiomics heat map with 109 texture features. Three individual tumor clusters (TCs) with distinctive metabolic radiomics patterns were identified after unsupervised clustering. Notes: Row, cases; column, texture features or clinical information; green circle, positive or high expression; red circle, negative or low expression; h2, ZPGLSZM; h10, NL_HomogeneityGLCM; h26, NL_DissimilarityGLCM; h29, SUVmax; h61, TLG; h70, CV; h71, NL_EntropyGLCM; h91, HILZEGLSZM; h98, MTV. Abbreviations: TNBC, triple negative breast cancer; HR, hormone receptor; ER, estrogen receptor; PgR, progesterone receptor; HER2, human epidermal growth factor receptor 2; ZP, zone percentage; GLSZM, gray level size zone matrix; NL, normalized; GLCM, gray level co-occurrence matrix; SUVmax, maximum standardized uptake value; HILZE, high-intensity large-zone emphasis; MTV, metabolic tumor volume.

Table 2 Characteristics of the Tumor Clusters.

Histopathologic Characteristics of the Tumor Clusters

To characterize the unsupervised TCs, we investigated the expression of histopathologic factors, i.e., estrogen receptor (ER), progesterone receptor (PgR), HER2, and Ki67 (Table 3). The only difference that remained significant after multiple comparison correction was in Ki67 expression (P = 0.006). ER and PgR were expressed at relatively higher levels in TC III than in TC I and TC II, but the difference was statistically significant only before multiple comparison correction (both P = 0.018). There was no statistically significant difference in HER2 expression between the unsupervised TCs (P = 0.688).

Table 3 Relationship between the Tumor Clusters and Histopathological Characteristics.

Predictors of a Pathologic Complete Response

The pCR rates were significantly different between the TC I, II, and III groups (20.0%, 48.0%, and 5.4%, respectively; P < 0.001). Univariate analysis revealed that the TC II (odds ratio [OR] 9.923, P < 0.001), ER-negative (OR 10.764, P < 0.001), PgR-negative (OR 11.148, P = 0.003), and high Ki67 (OR 11.587, P < 0.001) groups were significantly associated with achievement of a pCR. The TC II (OR 12.984, P = 0.003) and ER-negative (OR 12.607, P = 0.046) groups remained significant in multivariate analysis. High Ki67 expression (OR 11.051, P = 0.051) was of borderline significance. Table 4 summarizes the results of univariate and multivariate analysis.
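
As a worked check of the univariate odds ratio for TC II, the short Python sketch below builds a 2 × 2 table from the pCR rates reported above. The per-cluster pCR counts (2 of 10, 12 of 25, and 2 of 37) are reconstructed from the reported percentages and cluster sizes and are an assumption, as the counts themselves are not stated in the text; the function name is illustrative.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Unadjusted odds ratio of group A versus group B from a 2x2 table."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b

# (pCR count, cluster size); counts are reconstructed from the reported
# rates of 20.0%, 48.0%, and 5.4% (an assumption, not stated in the text).
pcr = {"TC I": (2, 10), "TC II": (12, 25), "TC III": (2, 37)}

tc2_events, tc2_total = pcr["TC II"]
other_events = pcr["TC I"][0] + pcr["TC III"][0]
other_total = pcr["TC I"][1] + pcr["TC III"][1]

or_tc2 = odds_ratio(tc2_events, tc2_total, other_events, other_total)
print(round(or_tc2, 3))  # prints 9.923
```

Under these assumed counts, the unadjusted odds ratio for TC II versus the other two clusters combined is (12/13)/(4/43) = 9.923, which agrees with the univariate value reported above.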

Table 4 Logistic Regression Analysis for Pathological Complete Remission.

Recurrence Risk for the Tumor Clusters

Three of the 4 recurrences (30.0%) were in the TC I group (n = 10); the other one (2.9%) was in the TC III group (n = 35). No recurrences occurred in the TC II group (n = 21) during follow-up. All recurrences occurred in patients who did not achieve a pCR. Mean DFS was 35.3 months (95% confidence interval [CI] 29.5–41.1) in the TC I group, 55.0 months (95% CI 55.0–55.0) in the TC II group, and 39.2 months (95% CI 37.7–40.7) in the TC III group. There was a statistically significant difference in DFS between the unsupervised TC I, II, and III groups (P = 0.001, Supplementary Fig. S4a). When DFS was compared between the TC I group and the other two groups combined (mean DFS 54.6 months [95% CI 50.0–59.1]), the TC I group had a worse prognosis (P < 0.001, Supplementary Fig. S4b).

TC I (hazard ratio 19.755, P = 0.010) was identified as a prognostic factor for recurrence in univariate Cox regression analysis (Fig. 4a). Despite discriminating trends, staging (Fig. 4b) and a pCR (Fig. 4c) were not found to be significant prognostic factors in univariate Cox regression analysis because of the lack of recurrences in the stage II and pCR groups during the relatively short follow-up period (all P > 0.05). Further, the histopathologic parameters of ER, PgR, HER2, and Ki67 were not found to be prognostic factors in univariate Cox regression analysis (all P > 0.05, Supplementary Fig. S5). Multivariate Cox regression analysis with TC I and the established parameters of stage III and non-pCR showed that TC I (hazard ratio 10.246, P = 0.045) was an independent prognostic factor regardless of stage or achievement of a pCR.

Figure 4

Cox regression analysis with DFS. The TC I has a hazard ratio of 19.755 (P = 0.010) for recurrence (a). Stage III had a trend of poor prognosis compared with stage II; however, this was not statistically significant (P > 0.05) (b). A pCR had a trend of a favorable prognosis compared with non-pCRs but this was not statistically significant (P > 0.05) (c). Abbreviations: DFS, disease free survival; TCs, tumor clusters; AJCC, American Joint Committee on Cancer; NAC, neoadjuvant chemotherapy; pCR, pathologic complete response.

Discussion

We evaluated metabolic radiomics patterns in tumors and their clinical usefulness in patients with LABC. In this study, breast tumors were clustered into 3 TCs in an unsupervised manner according to their metabolic radiomics patterns. TC II, which had a moderate MTV, high SUVmax, and high ITH, was revealed as an independent predictor of achievement of a pCR. In the survival analysis, TC I, which had a high MTV, high SUVmax, and high ITH, was identified as an independent risk factor for recurrence when compared to the established parameters of high stage (III) and non-pCR.

A cancerous tumor is composed of a heterogeneous cell population rather than a homogeneous one, with distinct molecular and phenotypic characteristics13. Biological ITH is suspected to be the main reason for resistance to treatment14. Image-based assessment of metabolic ITH is based on the hypothesis that it may be a projection of underlying tumor biology, including glucose metabolism, necrosis, oxygenation, vascularization and angiogenesis15. With the heightened interest in measurement of metabolic ITH by texture analysis, a number of clinical studies have reported that TFs from PET images have more prognostic ability than conventional SUV parameters in various cancers8,9,10, 16, 17. However, investigators cannot interpret TFs in an intuitive way, because TFs merely offer a mathematical explanation of images that can be interpreted as not only heterogeneous, but also smooth, coarse, rough, or grainy18. Further, it has not been easy to reach a consensus regarding the parameter that best represents ITH. Therefore, an integrated radiomics approach that departs from the traditional approach is needed.

Previous PET studies in breast cancer cohorts have yielded conflicting results regarding the relationship between TFs and the histopathologic parameters of ER, PgR, and HER2 expression19, 20. A recent study19 reported that one TF, high gray-level run emphasis (HGRE), was significantly higher in the ER-negative and PgR-negative groups regardless of SUVmax. However, given the relatively small sample size (n = 54) and the multiple comparison problem, the relationship between HGRE and hormone receptor expression remains uncertain. Another recent study20 with a larger cohort (n = 171) reported that TFs had only a limited relationship with hormone receptor expression and no additive value over SUVmax in discriminating breast cancer subtypes. Whereas those two studies resampled each tumor's SUV range into 64 equal bins (variable SUV bin width) and analyzed the relationship of individual TFs with hormone receptor expression, our study used a different resampling method that maintains a constant intensity resolution (fixed SUV bin width of 0.4) and an integrated radiomics approach for analysis. Nonetheless, our results are concordant with the latter study in that there was insufficient evidence that TFs are associated with hormone receptor expression. Our data are also concordant with previous studies in showing that HER2 status is not associated with TFs19, 20. On the other hand, Ki67, a proliferative marker, was significantly associated with the unsupervised TCs in our results, which is plausible given that a high dependency of TFs on MTV has been observed21, 22. In this study, a number of TFs were likewise significantly correlated with MTV and/or SUVmax, meaning that each TF should be interpreted with consideration of MTV and/or SUVmax.

Our data suggest that integrated metabolic radiomics has considerable potential for personalized management in LABC. For example, unsupervised TCs from metabolic radiomics can help to identify patients at higher risk for recurrence in addition to the established prognostic factors of stage and achievement of a pCR. Patients with tumors clustered as TC I might also be at high risk of recurrence. Physicians may actively consider NAC in TC II cases because of the good chance of a pCR, whereas TC III cases are less likely to achieve a pCR and are therefore less likely to benefit from NAC before surgery. In summary, use of metabolic radiomics may help in the appropriate management of individual patients and avoid the side effects of unnecessary systemic chemotherapy.

The rapid development of applications for omics data means that personalized medicine is now one step closer to becoming a reality23. Genomic profiling of tumors from tissue samples is being used increasingly to tailor the management strategy at the level of the individual patient. Radiomics is expected to have a role complementary to that of genomic profiling, because it has an advantage of being able to provide a non-invasive comprehensive tumor assessment that overcomes sampling error and the invasiveness of repeated biopsies7. Radiomics could be used as a cross-validation tool and provide information over and above that obtained from genomic profiling24.

SUV resampling remains one of the unresolved issues in texture analysis and is an important methodological factor affecting its results. Two methods are generally used to resample images. The most widely used method divides the tumor SUV range into a fixed number of bins, which results in an intensity resolution that varies from case to case6, 8, 25. The other method uses a fixed bin width, which provides a constant intensity resolution across all analyzed cases26. In this study, we adopted a fixed bin width of 0.4 SUV units over an SUV range of 0–25. A recent study has reported that a constant intensity resolution is more meaningful for inter- and intra-patient comparison of TFs27. Further validation is needed to compare the two methods. Regardless, our integrated radiomics analysis method is expected to be able to combine TFs obtained by both resampling methods when comparing tumor textures on images.
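
The difference between the two resampling methods can be made concrete with a minimal Python sketch. The function names and the two lists of voxel SUVs are hypothetical and chosen only for illustration; the bin width of 0.4 and the SUV range of 0–25 follow the text, while the handling of values outside that range (clipping to the first or last level) is an assumption.

```python
import math

def resample_fixed_bin_number(suvs, n_bins=64):
    """Divide each tumor's own SUV range into n_bins equal bins."""
    lo, hi = min(suvs), max(suvs)
    width = (hi - lo) / n_bins  # intensity resolution differs per tumor
    if width == 0:  # uniform tumor: every voxel falls into one level
        return [1] * len(suvs), width
    levels = [min(int((s - lo) / width) + 1, n_bins) for s in suvs]
    return levels, width

def resample_fixed_bin_width(suvs, width=0.4, lo=0.0, hi=25.0):
    """Use the same bin width for every tumor (constant intensity resolution)."""
    n_bins = math.ceil((hi - lo) / width)
    levels = []
    for s in suvs:
        level = int((s - lo) / width) + 1
        levels.append(max(1, min(level, n_bins)))  # clip out-of-range values
    return levels, width

# Hypothetical voxel SUVs for a low-uptake and a high-uptake tumor.
tumor_a = [2.5, 3.1, 4.2, 5.0, 5.8]
tumor_b = [2.5, 5.0, 7.7, 10.9, 13.3]

levels_a_fn, width_a_fn = resample_fixed_bin_number(tumor_a)
levels_b_fn, width_b_fn = resample_fixed_bin_number(tumor_b)
levels_a_fw, width_a_fw = resample_fixed_bin_width(tumor_a)
levels_b_fw, width_b_fw = resample_fixed_bin_width(tumor_b)

# A voxel with SUV 5.0 is present in both tumors (index 3 in A, index 1 in B).
print("fixed bin number:", levels_a_fn[3], levels_b_fn[1], width_a_fn, width_b_fn)
print("fixed bin width: ", levels_a_fw[3], levels_b_fw[1], width_a_fw, width_b_fw)
```

With a fixed number of bins, the same SUV of 5.0 is assigned to different gray levels in the two tumors because each tumor has its own bin width; with a fixed bin width it is assigned to the same gray level in both.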

There are several limitations to this study. First, the sample size for texture analysis was moderate at less than 80 cases28. Given that clustering is not an inferential technique, an adequate sample size for clustering is important. To avoid finding patterns in noise, we included biological validation by prediction of the likelihood of a pCR and evaluated the risk of recurrence. Not only were the results of clustering reasonably explained by several meaningful parameters (SUVmax, MTV, and ITH-related TFs), but the biological validation suggests a clinical rationale for clustering. A multicenter trial with much larger study cohorts is now needed to validate our results. Second, we used unsupervised clustering for this radiomics study. Supervised learning could optimize prediction of specific outcomes such as histopathologic markers, response to NAC, and prognosis. Third, our results should be interpreted carefully because of the exclusion of the least metabolically active tumors. However, it should be borne in mind that delineation of tumors with little metabolic activity is usually difficult because of surrounding physiologic uptake in the breast parenchyma. In addition, we used a fixed SUV cut-off of 2.5, which can cause inaccurate tumor segmentation, especially in cases with high ITH. There was one inaccurately segmented case (Supplementary Fig. S3) with exceptionally high ITH; notably, this case was automatically excluded during unsupervised clustering because its image texture differed markedly from that of the other cases. In this regard, unsupervised clustering is helpful for identifying outlying cases caused by tumor delineation error. Meanwhile, the tumor segmentation results of the other cases, even those with high ITH, were visually acceptable (Supplementary Fig. S6).

Conclusion

LABC tumors clustered by metabolic radiomics patterns have distinctive characteristics with regard to Ki67 expression, response to NAC, and risk of recurrence. The results of this study suggest that an integrated radiomics approach on 18F-FDG PET/CT has potential for personalized management of LABC.

Methods

Subjects

This retrospective study was approved by the Institutional Review Board at our institution. The need for written informed consent was waived. Inclusion criteria were female sex, Korean ethnicity, pretreatment 18F-FDG PET/CT scanning performed at the same institution before NAC for LABC from July 2009 to December 2013, and completion of NAC comprising 4 cycles of cyclophosphamide and doxorubicin or 6 cycles of adriamycin and docetaxel. One hundred cases fulfilled these criteria. Exclusion criteria were: multifocal or multicentric breast cancer (n = 7); inflammatory breast cancer (n = 4); and occult breast cancer or a tumor with so little metabolic activity that it could not be delineated with an SUV cut-off of 2.5 (n = 15). One further patient was excluded because her primary tumor abutted the metastatic axillary nodes too closely to be delineated. Finally, 73 patients with stage IIA–IIIC LABC were enrolled for metabolic radiomics analysis. Six patients who were censored or had a recurrence before 6 months of disease-free survival (DFS) were excluded from survival analysis. One patient who did not fit into any TC was also excluded from survival analysis (Fig. 1). Immunohistochemical (IHC) parameters, including Ki67, ER, PgR, and HER2, were assessed. Ki67 >30% staining on IHC was regarded as high expression. ER or PgR positivity was defined as >10% staining on IHC. HER2 positivity was defined as either strong (3+) HER2 staining on IHC or HER2 amplification identified by fluorescence in situ hybridization with moderate (2+) HER2 staining on IHC. The breast cancer subgroups were classified as: ER-positive or PgR-positive/HER2-negative; HER2-positive; or triple-negative. Staging was defined according to the American Joint Committee on Cancer (AJCC) system. A pCR was defined as no residual invasive cancer (AJCC ypT0/Tis ypN0). The patients were followed up for recurrence until December 2015.

Image Acquisition and Reconstruction

Patients underwent 18F-FDG PET/CT on a Discovery VCT scanner (GE Medical Systems, Milwaukee, WI, USA). The blood sugar level was <120 mg/dL after at least 6 hours of fasting, and 5.18 MBq/kg of 18F-FDG was administered intravenously to each patient 1 hour before PET/CT scanning. CT scanning was performed at 120 kVp for attenuation correction and to obtain anatomic information. PET scans were obtained from the skull base to the upper thigh level with a 128 × 128 matrix size. The voxel size was 3.91 × 3.91 × 3.27 mm3. Images were reconstructed with an ordered subset expectation maximization iterative algorithm (2 iterations and 8 subsets).

Image Texture Analysis

CGITA ver.1.4 (Chang-Gung Memorial Hospital, Taiwan) based on MATLAB v.2014a (MathWorks Inc., Natick, MA, USA) was used for analysis of the three-dimensional textures on PET images29. A fixed SUV cut-off of 2.5 was used to delineate the primary tumors30. Next, for calculation of TFs, the gray levels were resampled by a fixed bin width method with a width of 0.4 SUV units, which was calculated from 64 gray levels over an SUV range of 0 to 25, to minimize the error due to variation of contrast and to improve reproducibility6, 8, 25, 27. Of all the methods available to compute TFs, we chose a statistics-based methodology based on the spatial distribution of gray levels31. Multiple matrices were used as follows: a gray level co-occurrence matrix32, gray level run-length matrix33, gray level neighborhood intensity-difference matrix34, gray level size zone matrix35, SUV statistics, texture spectrum36, texture feature coding37, texture feature coding co-occurrence matrix37, and neighboring gray level dependence38. Finally, 109 TFs were calculated from the matrices. The source matrix was appended to each parameter name to avoid confusion between similarly named parameters28. Details of these parameters are provided in Supplementary Table S1 and in a previous report29. We chose NL_EntropyGLCM and CV, which are generally used for measurement of ITH, to classify the extent of metabolic ITH17, 21. MTV (cm3) was defined as the volume of the tumor delineated with an SUV cut-off of 2.5. TLG (g·cm3/mL) was defined as the mean SUV (SUVmean) multiplied by the MTV of the delineated tumor. The CV was defined as the standard deviation of SUVs divided by the SUVmean in a delineated tumor.
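
The definitions of MTV, TLG, and CV given above can be written out as a minimal Python sketch. This is not the CGITA implementation: the function name is hypothetical, the input is a flat list of voxel SUVs from a volume of interest, voxels at or above the cut-off are treated as tumor (the inclusive threshold is an assumption), and the population standard deviation is used because the text does not state which form was applied. The voxel volume is taken from the reported voxel size of 3.91 × 3.91 × 3.27 mm3.

```python
import statistics

# Reported voxel size in mm, converted from mm^3 to cm^3.
VOXEL_VOLUME_CM3 = 3.91 * 3.91 * 3.27 / 1000.0

def conventional_parameters(suvs, cutoff=2.5, voxel_volume_cm3=VOXEL_VOLUME_CM3):
    """Compute SUVmax, SUVmean, MTV, TLG, and CV from voxel SUVs."""
    # Assumption: voxels with SUV >= cutoff belong to the tumor.
    tumor = [s for s in suvs if s >= cutoff]
    if not tumor:
        raise ValueError("no voxel reaches the SUV cut-off")
    suv_mean = sum(tumor) / len(tumor)
    suv_sd = statistics.pstdev(tumor)  # population SD (assumption)
    mtv = len(tumor) * voxel_volume_cm3  # cm^3
    return {
        "SUVmax": max(tumor),
        "SUVmean": suv_mean,
        "MTV": mtv,
        "TLG": suv_mean * mtv,  # SUVmean x MTV
        "CV": suv_sd / suv_mean,  # SD of SUVs / SUVmean
    }

# Hypothetical voxel SUVs; the first two fall below the cut-off.
params = conventional_parameters([1.0, 2.0, 3.0, 5.0, 4.0])
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

A real segmentation would also require the thresholded voxels to form a connected region within a user-defined volume of interest, which this sketch does not model.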

Statistical Analysis

We used MedCalc version 14.8.1 (MedCalc Software bvba, Mariakerke, Belgium) for the statistical analysis. R version 3.2.3 (The R Foundation for Statistical Computing, Vienna, Austria) was used to construct the correlograms and heat maps. The correlations among all 109 TFs were evaluated by Pearson correlation analysis and displayed as a correlogram with hierarchical clustering. Radiomics heat maps with red-to-green coloring were constructed from the Z-score-normalized TF values of the cases using Euclidean distance and hierarchical clustering. The Kruskal-Wallis test was used to compare representative TF values among the unsupervised TCs. The proportions of ER-positive, PgR-positive, and HER2-positive tumors, tumors with high Ki67 expression, and pCRs were compared between the unsupervised TCs using the chi-square test. The parameters included in univariate logistic regression analysis for prediction of a pCR were age, clinical AJCC stage, T stage, nodal metastasis, ER, PgR, HER2, and Ki67 status, and TC. The established parameters of nodal metastasis, ER status, and HER2 status39, along with the parameters that were statistically significant in the univariate analysis, were included in the multivariate analysis. PgR was excluded from the multivariate analysis because of its strong association with ER status. Kaplan-Meier survival analysis and the log rank test were used to compare DFS between the unsupervised TCs. Univariate Cox regression survival analysis was applied to identify predictors of recurrence among the binary TC parameter (TC I compared with the other TC groups), age, clinical stage, pCR, and ER, PgR, HER2, and Ki67 status. Multivariate Cox regression analysis was conducted for binary TCs, disease stage, and pCRs. A P-value less than 0.05 (two tailed) was considered to be statistically significant. Bonferroni’s correction was applied for multiple comparison correction. Continuous values are expressed as the median and range.
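
The clustering step behind the heat maps (Z-score normalization, Euclidean distance, hierarchical clustering) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code: the study used R, the linkage method is not stated in the text (complete linkage is assumed here), normalization is assumed to be per feature across cases, and the six toy cases with two features are hypothetical.

```python
import math

def zscore_columns(rows):
    """Normalize each feature (column) to zero mean and unit SD across cases."""
    n = len(rows)
    normalized_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        normalized_cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*normalized_cols)]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_clusters(rows, n_clusters):
    """Agglomerative clustering, stopped when n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage (assumption): distance of the farthest pair.
                d = max(euclidean(rows[p], rows[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical cases: [SUVmax-like feature, MTV-like feature].
cases = [[13, 60], [12, 55], [13, 18], [14, 20], [6, 3], [5, 2]]

clusters = hierarchical_clusters(zscore_columns(cases), n_clusters=3)
print(clusters)  # prints [[0, 1], [2, 3], [4, 5]]
```

Normalizing each feature first keeps features with large numeric ranges, such as MTV, from dominating the Euclidean distance.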