Introduction
Though only 7 % of all invasive breast cancers are diagnosed in women <40 years old [
1], breast cancer represents the most frequent non-skin cancer (30−40 %) among younger women [
2]. Breast cancers in younger women tend to be associated with poorer survival [
2‐
4], and are more often diagnosed at a later stage of the disease [
5,
6]. Several retrospective cohort studies have examined differences in clinical biomarkers in premenopausal (preM) and in postmenopausal (postM) tumors. Age has been shown to be an independent risk factor even after correction for stage, treatment, and tumor characteristics [
2], and younger women are more likely to develop tumors with less favorable prognostic characteristics [
6]. Also, young women with breast cancer are reported to have less favorable histopathological and survival outcomes as compared to elderly women [
7].
Genomic and molecular alterations play a significant role in breast cancer biology. The well-known 50-gene subtype predictor, PAM-50, was developed using microarray data to provide prognostic and predictive information [
8]. However, studies that address the unique molecular changes in preM and postM by multiple
omic approaches are limited. The most notable study compared DNA copy number and messenger RNA (mRNA) gene expression data in preM and postM breast cancer and concluded that transcriptomic changes, more than genotypic variation, account for age-associated differences in sporadic breast cancer incidence and prognosis [
9]. Anders et al. analyzed microarray data from 784 early-stage breast cancers to discover gene sets able to distinguish breast tumors arising in younger women from tumors of older women [
10]. A number of genes were identified with different expression between breast cancers in younger and older women. A subsequent update reported that after adjusting for clinical variables, there were no gene expression differences between the previously defined age groups [
11].
To our knowledge, the effect of aging on molecular changes in breast cancer has never been comprehensively examined in multiple omic datasets (gene expression, methylation, somatic mutation, copy number variation (CNV) data, etc.). Recently, the establishment of large databases, which comprehensively characterize large numbers of breast cancers, including The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), has provided the opportunity to analyze preM breast cancer and shed light on the possibility of personalized treatment. We show here that estrogen receptor-positive (ER+) preM breast cancer is molecularly distinct from ER+ postM breast cancer, including changes in gene expression, methylation, copy number, and somatic mutation patterns. We observed activation of druggable pathways in preM tumors, which might represent unique targets in the treatment of preM breast cancer.
Discussion
In this study we used TCGA and METABRIC to identify unique genetic and transcriptomic changes in preM breast cancer compared to postM breast cancer. Differences in gene expression between preM and postM breast cancer were found exclusively in ER+ breast cancer. Integration of multi-omic data analysis identified enrichment of integrin and laminin signaling pathways in preM breast cancer, and TGFβ was identified as the top upstream regulator in both TCGA and METABRIC. In addition, EGFR signaling was activated in preM breast tumors. Semi-supervised clustering using gene expression data from preM ER+ tumors identified three distinct groups of patients with significantly different outcomes.
Using TCGA, we identified significant gene expression differences only between ER+ preM and postM breast cancer; none were found in ER− disease. METABRIC also showed only a very minor difference between ER− preM and postM tumors despite containing a much greater number of samples. These findings suggest that the majority of differences between preM and postM breast tumors are driven by altered hormone levels, and thus are only observed in ER+ disease. Intriguingly, comparing ER+ preM and postM breast cancer, the most altered gene was ESR1 itself, an observation that has previously been reported [
10]. PreM breast cancer involved hyper-methylated
ESR1, lower levels of
ESR1 gene expression, and lower levels of ER protein expression. Conversely, postM ER+ breast cancer involved hypo-methylated ESR1, increased gene expression and increased protein levels. Prior studies have shown an association between ER expression and age or menopausal status [
21‐25]; however, we were unable to find a report on differential
ESR1 methylation comparing preM and postM breast cancer. Intriguingly, one report has shown increased
ESR1 promoter methylation in colon cancer, as a function of age [
26]. A preliminary analysis did not reveal significant differences in gene expression of classical ER-target genes between ER+ tumors with hypo-methylated vs hyper-methylated
ESR1 (data not shown), but we plan to perform additional studies, including detailed analyses of differential expression, methylation, and roles of ER in breast cancer. Our future studies will address alteration of ER expression as a function not only of menopausal status but also of age. This is critical because age is more strongly associated with ER expression than menopausal status in both TCGA and METABRIC (data not shown). Future studies should also address whether combining epigenetic therapies with endocrine treatment is effective in preM breast cancer patients, possibly using ESR1 methylation as a predictive biomarker.
An intriguing finding is the increased activity of integrin and laminin signaling in preM breast cancer. There are compounds in development targeting this pathway, including volociximab, a chimeric monoclonal antibody that targets integrin α5β1. To date, studies in renal cell carcinoma, pancreatic cancer, malignant melanoma and lung cancer have had promising results [
27]. Intetumumab, a monoclonal antibody that targets all members of the αv integrin family, demonstrated increased overall survival when combined with cytotoxic therapy in phase II studies in melanoma [
28], but did not improve outcomes in prostate cancer [
29].
In addition, the activation of EGFR signaling in preM breast cancer is clearly of potential clinical interest. Other studies have previously identified overexpression of EGFR [
10], and its ligand, amphiregulin (AREG), [
9] in young breast cancer patients. While therapies targeting EGFR have been studied in breast cancer, these have focused predominately on triple-negative breast cancer, and there have not been previous studies specifically in ER+ patients [
30]. The efficacy results have been mixed, and neither overexpression of EGFR by IHC, nor assessment of EGFR pathway analysis microarrays has been an adequate surrogate to predict responsiveness [
31‐
35]. EGFR signaling has been implicated in tamoxifen resistance in preclinical models [
36,
37], which could have significant implications for treatment and lends further credence to EGFR pathway overexpression contributing to worse clinical outcomes in ER+ preM breast cancers. Furthermore, there is evidence for crosstalk between integrin and EGFR signaling in both breast [
38] and lung cancer [
39], suggesting that successful targeting of these pathways in preM ER+ breast cancer may require a multi-pronged approach.
Our somatic mutation analysis using MutSig identified five genes (
CDH1, GATA3, MLL3, GPS2, and PIK3CA) for which mutation rates were significantly different between preM and postM tumors. After correction for multiple comparisons, only one gene (
CDH1) remained differentially mutated in the preM and postM groups. This is consistent with the fact that mutations in
CDH1 are found almost exclusively in lobular cancers that are enriched in older patients. Interestingly, GATA3, an ER-interacting transcription and chromatin remodeling factor with a role in luminal cell fate and breast tumorigenesis [
40‐
42], was recently shown to be overexpressed in preM breast cancer, and high expression of GATA3 was significantly associated with improved survival in preM women, but not in postM women [
43]. Collectively, these data suggest a menopausal status-dependent role for GATA3 in breast cancer.
As expected, we found lower overall mutation rates in preM compared to postM cancer. The increase in mutation rates with age is likely a general effect of oxidative damage during aging rather than an endocrine response resulting from menopause. Interestingly, further analysis of the mutation spectra showed that postM cancers were enriched for C>T mutations in the context of a 5′ T and 3′ G (TCG>TTG), and for mutations within TCW motifs that are associated with APOBEC-induced changes. The latter changes were limited to the context of TCT>TAT, an APOBEC motif not typically seen in breast cancer [
44]. Together, the increase in these two mutation types matches signature 10 from a recent characterization of trinucleotide mutation context [
45]. This signature is thought to be related to defects in
POLE and DNA mismatch repair genes [
44,
46] and thus suggests that defects in
POLE and DNA mismatch repair genes may play a larger role in postM breast cancers than in preM breast cancers.
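The two sequence contexts discussed above can be tallied from a list of substitutions with a few lines of code. The following is a minimal sketch rather than the pipeline used in this study: the function names (`to_pyrimidine`, `label_mutation`, `spectrum`) and the input format (a reference trinucleotide plus the alternate base) are assumptions made for illustration. Substitutions reported at a purine are reverse-complemented so that the mutated base is always C or T, as is conventional for trinucleotide mutation contexts.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_pyrimidine(trinuc, alt):
    """Express a substitution so that the mutated (middle) base is C or T."""
    trinuc, alt = trinuc.upper(), alt.upper()
    if trinuc[1] in "AG":
        # Reverse-complement the context and complement the alternate base.
        trinuc = "".join(COMPLEMENT[b] for b in reversed(trinuc))
        alt = COMPLEMENT[alt]
    return trinuc, alt


def label_mutation(trinuc, alt):
    """Assign a substitution to one of the contexts named in the text."""
    trinuc, alt = to_pyrimidine(trinuc, alt)
    if trinuc == "TCG" and alt == "T":
        return "TCG>TTG"
    if trinuc == "TCT" and alt == "A":
        return "TCT>TAT"
    if trinuc[:2] == "TC" and trinuc[2] in "AT":
        return "other TCW"  # W = A or T
    return "other"


def spectrum(mutations):
    """Count labels over an iterable of (reference trinucleotide, alt base)."""
    return Counter(label_mutation(t, a) for t, a in mutations)
```

Comparing the resulting counts between preM and postM tumors, normalized by the total number of substitutions per group, would give a rough view of the enrichment described here; a formal signature analysis would use all 96 trinucleotide classes.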
In the analysis of gene expression data from TCGA, we combined the differentially expressed genes detected in Agilent array and RNA-Seq data. While platform differences between microarray and sequencing data have been a controversial topic [
47,
48], we found relatively good concordance (\( \overline{r} = 0.70 \)) when examining both platforms performed on the same tumor. Indeed, when we conducted differential analysis for the two datasets using the same set of samples, we identified similar pathways to be activated, with small differences in order/significance level.
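One way to obtain a mean concordance of this kind is to correlate the two platforms' expression profiles for each tumor and average the coefficients. The sketch below assumes this per-tumor reading of the mean correlation; the text does not state exactly how it was computed. The function names and the dictionary input format (tumor identifier mapped to expression values in a shared gene order) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_concordance(array_expr, rnaseq_expr):
    """Average per-tumor correlation between two platforms.

    Both arguments map a tumor identifier to a list of expression values
    given in the same gene order; only tumors profiled on both platforms
    contribute to the average.
    """
    shared = sorted(set(array_expr) & set(rnaseq_expr))
    coefficients = [pearson(array_expr[t], rnaseq_expr[t]) for t in shared]
    return sum(coefficients) / len(coefficients)
```

Because microarray intensities and sequencing counts are on different scales, a rank-based coefficient (Spearman) is a common alternative to Pearson in this setting.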
Semi-supervised machine learning of preM ER+ patients revealed three groups with strikingly different outcomes. In part, this is expected because we used the survival information when training the classifier. Unfortunately, we are unaware of another large dataset of preM breast cancer with gene expression and outcome data with which to validate this finding. To avoid overfitting and provide a fairer comparison, we used a cross-validation approach, and the results suggest that the semi-supervised machine learning and BCI are equally good predictors, both of which seem to outperform Oncotype Dx. However, additional studies are necessary, and further research specifically on preM breast cancer will hopefully identify prognostic tests specific for this type of breast cancer and will ultimately lead to personalized therapies.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors have made substantial contributions to the design of the study. SL, RJH, SL, TM, and UC performed analyses of TCGA and METABRIC data, SB carried out integrated analysis using PARADIGM, and FM participated in ESR1 methylation analysis. SL drafted the manuscript. KMcG, RB, ND, and SP were critical in the overall design of the study and interpretation of the data. AL, GT, and SO developed the original concept of the study, oversaw the entire project, and helped to draft the manuscript. All authors have participated in critically revising the manuscript, have read the final draft, and given final approval of the version to be published, and agree to be accountable for all aspects of the work.
Adrian V. Lee, George C. Tseng and Steffi Oesterreich share senior authorship.