Discussion
CMTC was developed to provide a comprehensive test to guide personalized breast cancer treatments. DNA microarray was used as it was the first molecular technology capable of surveying the entire genome reproducibly. Although newer technologies such as next-generation sequencing (NGS) exist, DNA microarray technology is more mature, lower in cost and simpler in terms of data storage and analysis than NGS. Furthermore, far more DNA microarray data are available in public databases for independent validation. With a genomic approach, CMTC can provide much more information than other commercially available gene signatures, such as MammaPrint™ [
10], which examine only subsets of genes. Our recent study showed that CMTC correlated with many clinical and biological variables known to have prognostic significance in breast cancers. In that study, however, the prognostic significance of CMTC was demonstrated only in the first external validation cohort (n = 2,239); differences in relapse-free survival among the 149 training cases were not statistically significant because of a low event rate at a median follow-up of 31 months [
4]. In this study, we prospectively followed the training cohort for an additional two years. Figure
1A and B show that, with a longer follow-up of the training cohort (median follow-up of 55 months), the difference in relapse-free survival among the CMTC groups reached statistical significance. As in the external cohort in the first study [
4], the patients in CMTC-1 had a better relapse-free survival than the patients in CMTC-2 or CMTC-3. To further validate the prognostic significance of CMTC, we used another set of breast cancer patients as an independent internal cohort and a new external cohort in this study. We observed a comparable prognostic significance in the new internal cohort with 284 breast cancers (Figure
1C) and the new external cohort with 2,181 breast cancers (Figure
1E). The prognostic significance of CMTC can also be reproduced in 431 overall internal breast cancers (Figure
1D), and in 4,420 overall external breast cancers (Figure
1F), as well as in all 4,851 available breast cancers obtained by combining all internal and external cohorts (Figure S2D in Additional file
2). Thus, the prognostic significance of CMTC can be reproduced in different independent microarray gene expression datasets.
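The relapse-free survival curves discussed above are Kaplan-Meier estimates. As a minimal sketch of how such a curve is computed for one CMTC group, the following uses only the Python standard library; the function name `kaplan_meier` and the follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct relapse time.

    times  -- follow-up in months for each patient
    events -- 1 if relapse was observed at that time, 0 if censored
    """
    # Count relapses and all exits (relapse or censoring) at each time point.
    relapses = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = relapses.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Patients who relapsed or were censored at t leave the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data (months, relapse flag), for illustration only.
months = [6, 12, 12, 20, 31, 40, 55, 55]
relapse = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(months, relapse)
for t, s in curve:
    print(f"month {t}: relapse-free fraction {s:.3f}")
```

Censored patients contribute to the risk set until their last follow-up, which is why extending follow-up in a cohort with a low event rate can change whether a difference between groups becomes detectable.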
The gene expression pattern of the CMTC profile can also be used to correlate with independently developed prognostic gene signatures and oncogenic pathway activities as in our previous study [
4]. All the gene signatures predicted a poor prognosis in either CMTC-3 alone, or in both CMTC-2 and CMTC-3, but rarely in CMTC-1 (Figure
2B). In both the internal training cohort and the validation cohort (Figure S1A and S1B in Additional file
2), pairwise correlation analyses showed that CMTC-3 centroid values were most closely correlated with the scores of the prognostic gene signatures 70GS [
10], P53GS [
11] and SDPP [
12]. Cox proportional hazards analysis of the new internal (n = 284) and external (n = 2,181) breast cancer cohorts (n = 431) (Table S2 in Additional file
2) yielded the highest HR between CMTC-1 and CMTC-3 when we analyzed CMTC and all other prognostic gene signatures. This is the first time that we have demonstrated, using prospective data, that CMTC was a better predictor of relapse-free survival than all the other prognostic gene signatures. To ensure that the prognostic significance of CMTC was not confounded by the expression levels of genes that were part of other prognostic gene signatures, we removed all 501 genes that overlapped with 14 known prognostic gene signatures (including two non-microarray-based gene signatures) and two gene signatures for molecular subtypes [
4]; hence, none of the 803 genes in the CMTC signature can be found in these gene signatures. In this study, we have again confirmed, using a number of independent breast cancer datasets, that CMTC was reproducible and was a prognostic factor independent of those known gene signatures. We also showed that the CMTC-3 group had the highest activities in the oncogenic pathways Myc, E2F1, Ras and β-catenin, with higher activities in the HER2, TNF and IFN pathways (Figure
2C). The scores of these pathway activities were most closely correlated with CMTC-3 centroid values in both internal cohorts (Figure S1C and S1D in Additional file
2). Conversely, the scores of oncogenic pathway activity of ER, PR and wild-type p53 were the lowest in the CMTC-3 group. These results suggest that the gene expression pattern of CMTC-3, the group with the worst clinical outcome, can be linked to specific networks of oncogenic pathways, which may help us to understand the molecular derangements in these cancers.
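The pairwise correlation analyses referred to above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it assumes each tumor has one value per CMTC centroid and one score per signature or pathway, and it uses the Pearson coefficient. All names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-tumor centroid values (one entry per tumor, five tumors).
centroid_values = {
    "CMTC-1": [0.9, 0.7, 0.2, 0.1, -0.3],
    "CMTC-3": [-0.8, -0.5, 0.1, 0.4, 0.9],
}
# Hypothetical per-tumor score of one prognostic signature or pathway.
signature_score = [0.1, 0.2, 0.5, 0.6, 0.9]

# Correlate the score with each centroid column and pick the closest one.
r = {name: pearson(values, signature_score)
     for name, values in centroid_values.items()}
best = max(r, key=r.get)
print(best, {name: round(value, 3) for name, value in r.items()})
```

In this toy data the score rises with the CMTC-3 values and falls with the CMTC-1 values, so CMTC-3 is selected as the most closely correlated centroid.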
Using an unsupervised hierarchical clustering analysis of genome-wide expression microarray data, the classical molecular classification divided breast cancers into five intrinsic subtypes: normal-like, luminal A, luminal B, HER2+ and basal-like subtypes [
23]. The PAM50 classification was later developed using a training set of breast cancers with known molecular intrinsic subtypes (supervised) to select 10 genes from each of the 5 intrinsic subtypes so that quantitative reverse transcriptase polymerase chain reactions of the 50 genes could be used to reproduce the molecular classification [
24]. Recently, PAM50 has been commercialized using the NanoString platform under the trade name Prosigna (NanoString Technologies Inc, Seattle, WA, USA). For comparison purposes, PAM50 classifications in this study were done using the available genome-wide microarray data rather than the NanoString platform. Although these classifications are likely representative, we recognize this as one of the limitations of our study, because comparison across different molecular technology platforms (NanoString and Illumina) can be problematic. Interestingly, we observed that the PAM50 subtype appeared to be more comparable to CMTC both in the internal training cohort [
4] and in the internal validation cohort than the classical molecular classification (Figure
1A): most normal-like and luminal A subtypes were found in the CMTC-1 group; most luminal B were found in the CMTC-2 group; and HER2 and basal-like were found in the CMTC-3 group. Because the normal-like subtype has been regarded as normal ‘contamination’ in the tumor specimen, it has been removed from PAM50-based classification [
24,
25]. In CMTC, HER2+ and TN (similar to basal-like subtype) were grouped together in CMTC-3, and most normal-like and luminal A subtypes were grouped together in CMTC-1. Here, we report that the CMTC can predict clinical outcome better than the classical molecular subtype classification in the 284 internal and 2,181 external breast cancers (Table S2 in Additional file
2) and in the 4,851 overall breast cancers (Figure S2C and S2D in Additional file
2).
CMTC was shown to have a close association with clinical receptor status and a number of clinicopathological variables in the previous study [
4]. In this study, the association has been reproduced in the internal validation cohort and the new external validation cohort (Table
1), as well as in the 4,851 overall breast cancers (Figure
3). CMTC-1 breast cancers were generally ER+, of smaller tumor size, low grade and node negative; CMTC-2 breast cancers were ER+, but of larger tumor size and higher grade, with more nodal disease; and CMTC-3 breast cancers were more commonly found in younger patients and were more often HER2+/TN and of larger tumor size, with more nodal disease.
Compared with the receptor status of ER, PR, HER2, TN and HER2+/TN, CMTC proved to be the best prognostic indicator in the 284 internal and 2,181 external patients (Table S2 in Additional file
2). The superiority of CMTC prognostic prediction can be further demonstrated in the analysis using the 4,851 overall patients in the multivariate models (Figure S2 in Additional file
2). The prognostic significance of CMTC remained very strong even if we divided the breast cancers into ER- or ER+ tumors, and non-HER2+/TN or HER2+/TN tumors (Figure S3 in Additional file
2).
Using 3,936 patients with complete clinicopathological data, factors such as younger patient age, larger tumor size, positive node disease and higher tumor grade were associated with an increased risk of poor clinical outcome in Kaplan-Meier analyses (Figure S4 in Additional file
2) and in multivariate analyses using Cox proportional hazards (Table
2). However, the CMTC-2 and CMTC-3 groups were found to have the worst outcome when compared to all these clinical and pathological factors. CMTC remained the strongest prognostic predictor even when we controlled for age, tumor size, tumor grade and nodal status (Figure S5 in Additional file
2). These results clearly show that the prognostic significance of CMTC is independent of these clinical and pathological factors.
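The multivariate analysis described above can be written in the standard Cox proportional hazards form. The notation below is a sketch under assumptions: the indicator coding of the CMTC groups (with CMTC-1 as the reference) and the covariate symbols are illustrative and are not taken from the study.

```latex
\[
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(
  \beta_2\, x_{\text{CMTC-2}} + \beta_3\, x_{\text{CMTC-3}}
  + \gamma_1\, x_{\text{age}} + \gamma_2\, x_{\text{size}}
  + \gamma_3\, x_{\text{grade}} + \gamma_4\, x_{\text{node}}
\right)
\]
\[
\mathrm{HR}_{\text{CMTC-3 vs CMTC-1}} = e^{\beta_3}
\]
```

Under this coding, a CMTC hazard ratio that remains significantly above 1 after the clinicopathological terms are included corresponds to a prognostic effect that is independent of those factors.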
A number of gene signatures for breast cancer are commercially available to predict prognosis in specific patient populations using central laboratory facilities [
2,
3]. In this study, we were able to reproduce the CMTC using three different major commercial microarray platforms and independent external datasets (Figure S6 in Additional file
2). We chose to use a genome-wide microarray platform because we believe that this approach can provide breast cancer patients with more comprehensive information on prognosis, treatment prediction and pathway patterns for personalized medicine than other commercial gene signatures. Once the genome-wide gene expression data are collected, they can be used for personalized medicine and can also be anonymized and deposited into public databases to help us understand the disease better.
In the previous study, we proposed that CMTC could also be used as a platform to personalize treatments: CMTC-1 breast cancers in general can be treated with surgery and endocrine therapy alone; CMTC-2 breast cancers may require additional treatments such as chemotherapy; and neo-adjuvant chemotherapy should be considered for CMTC-3 tumors [
4]. Our future study will aim to verify the prediction of treatment outcomes using CMTC classification.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DYW and WLL designed the project and analyzed the data. DRM and WLL contributed to the collection of clinical material. SJD was the associated pathologist. The manuscript was prepared by DYW and WLL, and proofread by SJD and DRM. All authors read and approved the final manuscript.