Introduction
Age is the strongest demographic risk factor for most human malignancies, including breast cancer [1]. About 80% of all breast cancers occur in women older than age 50; the 10-year probability of developing invasive breast cancer increases from <1.5% at age 40, to about 3% at age 50 and to >4% by age 70, resulting in a cumulative lifetime risk of 13.2% (one in eight) and a near ninefold higher incidence rate in women older than age 50 as compared with their younger counterparts [2,3]. Despite awareness that breast cancer and other cancers are primarily age-related diseases, molecular and cellular hypotheses explaining the cancer–aging relationship have only recently emerged and remain clinically unproven [4].
At the subcellular level, normal human aging has been linked to increased genomic instability [5,6], to global and promoter-specific epigenetic changes [7,8], and to altered expression of genes involved in cell division and extracellular matrix remodeling [5,6]. These associations have led to the hypothesis that the cancer-prone phenotype of an older individual results from the combined effects of cumulative mutational load, increased epigenetic gene silencing, telomere dysfunction and altered stromal milieu [9]. Given the worrisome social, economic and medical consequences of an aging worldwide population, proposed biological mechanisms linking cancer with aging must be established in order to develop effective interventions.
As with normal organs and tissues, tumor biology can also change with aging [10,11]. For sporadic breast cancer in particular, correlations between patient age at diagnosis, tumor biology and clinical prognosis have long been appreciated if not fully understood [12–16]. Younger age at diagnosis (≤ 45 years old) is associated with more aggressive breast cancer biomarkers, including overexpression of ERBB2/HER2 and ERBB1/HER1 growth factor receptors [13], abnormal p53 expression [13,15], estrogen receptor (ER) negativity [12–16], higher nuclear grade and higher Ki-67 proliferation index [12–14,16]. These breast cancer biomarkers are also interdependent, however; in particular, ER expression is inversely correlated with abnormal p53 [15], overexpression of ERBB2 [15], high Ki-67 and nuclear grade, and poor patient prognosis [17]. It therefore remains unclear whether the age-specific biomarker features of breast cancer reflect the pleiotropic background effects of aging on the normal mammary gland or age-specific differences in breast tumorigenesis. Moreover, since most age-specific biomarkers strongly associate with the ER status, the effects of aging must be studied in histologically similar breast cancer phenotypes controlled for ER status.
The molecular and cellular effects of aging on both normal and malignant breast tissue are superimposed on a continuum of developmental changes that normally occur between puberty and menopause, heavily influenced by menstrual history and parity. In general, the normal mammary gland ER content (fmol receptor/g tissue) as well as the proportion of ER-expressing (ER-positive) ductal epithelial cells increase with each decade of age, and reach a plateau with menopause at about age 50 [18,19]. In contrast, breast cancer ER expression continues to rise beyond menopause, reaching a near 25-fold differential between normal and malignant mammary gland ER expression in patients by age 70 [18].
Curiously, expression of some ER-inducible gene markers, such as progesterone receptor (PR), pS2, Bcl2 and cathepsin D, does not show any significant relationship with the age at diagnosis [13,18], while other markers show increased expression in breast cancers arising earlier in life [20], suggesting that the effects of aging may in part be attributed to age-related differences in estrogen-inducible ER pathways. Important in this regard is the age-related change in PR coexpression within ER-positive breast cancers, since PR has long been used as a clinical indicator for a functioning ER pathway in tumors likely to respond to endocrine therapy [21]. Among all ethnic patient groups, ER-positive/PR-negative breast cancers show the greatest age-related increase in incidence after age 40 [22]. Potentially relevant to this ER-positive/PR-negative phenotype is the fact that growth-factor-activated pathways downregulate PR expression [22–25], and that the inverse correlation between overexpression of the ERBB2 growth factor receptor and PR positivity is only seen in breast cancers arising after age 40 [26]. Surprisingly, the natural perimenopausal decline in ovarian-produced estrogen serum levels does not fully account for age-related changes in ER-regulated mammary epithelial pathways, since the marked age-related increase in stromal and epithelial aromatase expression produces postmenopausal mammary gland estrogen levels comparable with those measured in premenopausal women [27].
To better understand the molecular and cellular influences of aging on breast cancer biology and clinical behavior, we performed a detailed study of phenotypically similar breast cancers arising in two disparate patient age groups. The DNA and the RNA were prospectively extracted from cryobanked samples of stage-matched and histology-matched ER-positive breast cancers diagnosed in either younger (age ≤ 45 years) or older (age ≥ 70 years) Caucasian women. These samples were analyzed by array comparative genomic hybridization (CGH) and by high-throughput expression microarrays to look for genetic and epigenetic differences between the age cohorts. Unsupervised hierarchical clustering of the combined data from both cohorts was used to search for age biases in clustered subsets, and this was followed by supervised comparisons between the two cohorts to delineate potential age-related genomic and transcriptome differences. Finally, a predictive analysis of microarrays (PAM) performed on the two age cohorts produced an age-specific expression signature that proved to have >80% predictive accuracy when validated against two other independent breast cancer datasets.
Discussion
Although there have been numerous studies of clinical factors addressing the relationship between age at diagnosis and breast cancer prognosis [12,14,16,53–55], few studies have comprehensively investigated the age dependency of the many well-established prognostic breast cancer biomarkers, and no studies have used a prospective study design [13,18]. Given the established inverse relationship between the ER status and poor-risk biomarker surrogates of breast cancer proliferation and genomic instability [13,18], the present study aimed to identify genomic and transcriptome changes associated with aging using DNA and RNA prospectively collected from stage-matched and histology-matched ER-positive breast cancers from younger women (age ≤ 45 years) and older women (age ≥ 70 years), analyzed by array CGH and high-throughput expression microarrays.
Similar bioinformatics-based approaches have been used to characterize aging effects in human fibroblasts [5,6], lymphocytes [5] and myoblasts [56]; however, comparable efforts to investigate aging influences on human cancer biology have not been reported. Moreover, while ER-positive breast cancers have been well studied as a subgroup within unselected breast cancer phenotypes using array CGH [28,57] or expression profiling [38,39,49–51], the present study represents the largest study reported to date using these powerful techniques to subset ER-positive breast cancers, while employing a statistical design powered to detect age-specific differences.
Array CGH analysis of 71 DNA samples confirmed that our ER-positive breast cancers were composed of two basic genotypes [28]: a simple subtype characterized by few genomic copy number changes other than gain of 1q and loss of 16q, and a mixed amplifier subtype characterized by recurrent amplifications but otherwise low levels of genomic gains and losses. A third genomic subtype of breast cancer, referred to as complex, known to be almost exclusively composed of ER-negative breast cancers [28], was not observed in either of the two age cohorts studied. Neither the simple nor the mixed amplifier genomic subtypes of ER-positive breast cancer showed any particular age bias. Direct comparison of the two age cohorts for multiple array CGH parameters also revealed no significant differences in the fraction of genome altered, in whole chromosome changes or in total or site-specific amplicon frequencies. Although nonsignificant trends suggested slightly fewer oncogene amplifications within the older cohort, overall amplification frequencies for the most common oncogenes were as expected for ER-positive breast cancers [51,58]: MYC (27%), CCND1 (23%), ZNF217 (17%), AIB1 (16%), MDM2 (8%), ESR1 (7%), ERBB2 (7%), and TOPO2A (7%). At the level of genomic resolution (~1 Mb) achievable by BAC-based array CGH, there appeared to be few if any genetic differences between ER-positive breast cancers arising in women whose ages differ by more than 25 years. Future studies employing higher density genomic arrays are warranted to confirm this conclusion.
Microarray profiling of 101 RNA samples showed an average 65-fold range in ESR1 transcript levels across the entire collection of ER-positive breast cancers, with the older cohort showing significantly higher ESR1 levels as compared with the younger cohort, consistent with earlier biomarker studies [13]. There was the expected close correlation between the ESR1 transcript levels and commonly observed ESR1 coexpressed genes (for example, GATA3) as well as other genes (for example, KRT8, KRT18) that characteristically define luminal-type breast cancer, although this tumor collection also contained several ERBB2-positive cases (10/101) that are not characteristically found in microarray-defined clusters of luminal-type breast cancer [38,39,49–51]. Hierarchical clustering of the ~5.1 K variably expressed genes also identified six transcriptome subtypes of ER-positive breast cancer with significant age biases (P < 0.05) but not associated with differing PR status. Based on relapse-free survival analyses of the 54 cases with known clinical outcome (30 younger women, 24 older women), there was a trend supporting a less favorable prognosis for the younger age cases (P = 0.09) and PR-negative cases (P = 0.08). The six age-biased transcriptome clusters, however, showed significantly different relapse-free survival outcomes (P = 0.025, log-rank analysis), suggesting that these transcriptome subtypes represent clinically relevant phenotypes of ER-positive breast cancer. Previous expression array studies analyzing fewer ER-positive cases have identified no more than two or three subsets of luminal-type breast cancer [38,39,49–51].
Reported gene signatures representing luminal, proliferation and MAPK markers were tested for their enrichment in one or the other of the age-stratified cohorts, and only the proliferation gene signature showed any significant age bias when multiple testing was accounted for, being more highly expressed in the younger cohort. This finding is consistent with earlier studies showing higher tumor grade and proliferation markers (for example, mitotic index and Ki-67 positivity) in younger age breast cancer patients [13]. While none of the >1,000 curated gene sets in the Molecular Signature Database that were similarly evaluated demonstrated any significant age biases when multiple testing was accounted for, a trend was observed for enrichment of cell cycle genes in the younger cohort cases. Nine genes common to both the GO biological process cell cycle set and the proliferation signature set (BUB1, CCNB1, CCNE2, CDC25A, CDC7, MAD2L1, MCM4, ORC6L, PTTG1) were also present in our significant probe set. Among these, four genes (BUB1, CCNE2, MAD2L1, ORC6L) have been previously associated with poor-prognosis ER-positive breast cancers in a well-established 70-gene prognostic signature [58]; these genes are therefore probably important contributors to the more aggressive tumor characteristics of ER-positive breast cancers arising in younger patients.
Using only the proliferation gene signature to perform unsupervised hierarchical clustering of the 101 cases generated two comparably sized ER-positive subsets, one with higher expression and another with lower expression of the proliferation genes; the higher expressing subset contained most of the younger age cases (34/52) and all but one of the ERBB2-positive cases. When this proliferation signature was also used to dichotomize the 54 cases with known clinical outcome, the higher expressing cases showed significantly worse disease-free survival as compared with the lower expressing cases, consistent with reports on the association of a similar proliferation signature with poor outcome in patients with ER-positive breast cancer [59]. Interestingly, despite a presumed mechanistic link between activation of growth factor receptors, MAPK signaling and cell proliferation, there was minimal overlap between genes in the reported MAPK and proliferation signatures, and no significant association was observed between the MAPK signature, age and ERBB2 positivity.
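The dichotomization step described above can be illustrated with a minimal Python sketch. It assumes a simple rule that the text does not specify: each tumor is scored by its mean expression across the signature genes and then split at the cohort median. The function names, sample identifiers and expression values are hypothetical and are not taken from the study.

```python
from statistics import mean, median

def signature_scores(expression, signature_genes):
    """Mean expression of the signature genes for each sample.

    expression: dict mapping sample id -> dict of gene -> expression value.
    Genes missing from a sample's profile are skipped.
    """
    scores = {}
    for sample, profile in expression.items():
        values = [profile[g] for g in signature_genes if g in profile]
        scores[sample] = mean(values)
    return scores

def dichotomize(scores):
    """Label each sample 'high' or 'low' relative to the median score."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}

# Hypothetical toy data: four tumors, three proliferation genes.
toy_expression = {
    "T1": {"BUB1": 2.0, "CCNB1": 1.5, "MCM4": 1.8},
    "T2": {"BUB1": -1.0, "CCNB1": -0.5, "MCM4": -0.8},
    "T3": {"BUB1": 0.9, "CCNB1": 1.1, "MCM4": 0.7},
    "T4": {"BUB1": -0.2, "CCNB1": -0.4, "MCM4": 0.0},
}
groups = dichotomize(
    signature_scores(toy_expression, ["BUB1", "CCNB1", "MCM4"])
)
print(groups)
```

A median split keeps the two groups comparably sized, which mirrors the two comparably sized subsets reported above, but a clustering-based assignment could place individual cases differently.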
Despite the observed positive association between the ESR1 expression level and older age, no age association was seen for the luminal gene signature that included ESR1, ESR1-associated genes and estrogen-inducible genes. This finding is consistent with our previous report showing increased breast cancer ER protein with aging without comparably increased levels of such estrogen-inducible markers as PR, pS2, Bcl2 and cathepsin D [13], and suggests reduced estrogen signaling in breast tumors of older patients. In keeping with these protein biomarker observations, differential gene expression analysis in the present study did not identify any known estrogen-inducible genes such as TFF1, PGR, IRS1, IGFBP4, PCNA, MYC, CCNA2 or DLEU2 as being more highly expressed in the older cohort despite higher expression of ESR1 in this cohort. In contrast, two estrogen-inducible growth-regulating genes, GREB1 and AREG, showed significantly higher expression levels in the younger cohort, in keeping with a recent study demonstrating a negative correlation between these estrogen-inducible genes and age [20]. As GREB1 and AREG are known to induce cell proliferation upon estrogen activation [60,61], their increased expression in the younger cohort offers some mechanistic basis for increased proliferative activity and gene expression in the younger cohort.
Of the 75 unique genes differentially expressed between younger and older cohorts, 24 genes showed increased expression in younger cases relative to older cases (including GREB1 and AREG) while 51 genes showed increased expression in older cases relative to younger cases (including ESR1). Comparison with a well-studied estrogen-inducible gene signature set [20] revealed that ~25% (19/75) of these differentially expressed genes overlapped with known early or late estrogen-responsive genes, and thus potentially reflected hormonal changes associated with menopause rather than aging effects. While two-thirds (13/19) of these potential estrogen-responsive genes showed appropriate directional changes according to cohort menopausal status, supporting this possibility, at least 75% of the differentially expressed genes would appear to be independent of menopausal differences in circulating estrogen levels and, therefore, potentially informative of age-related differences in ER-positive breast cancer biology. A comprehensive database search confirmed that at least 40% of these differentially expressed genes have reported direct links with malignancy; and while none have reported links with premature aging, one of the differentially expressed genes (KIF2C) has been previously implicated in aging studies of lymphocytes and fibroblasts [5], while six other genes (COBLL1, HPGD, HOXB2, PDE4A, SLC25A12, TP73L) were recently reported as differentially expressed with age in human skeletal muscle [62].
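The proportions quoted in this paragraph follow directly from the gene counts. The short Python sketch below simply redoes that arithmetic; the variable names are illustrative, and the counts are the ones stated in the text.

```python
# Counts quoted in the paragraph above.
total_de = 75          # differentially expressed genes
up_younger = 24        # higher in the younger cohort
up_older = 51          # higher in the older cohort
estrogen_overlap = 19  # overlap with the estrogen-responsive signature
concordant = 13        # overlap genes changing in the expected direction

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

overlap_pct = percent(estrogen_overlap, total_de)
independent_pct = percent(total_de - estrogen_overlap, total_de)
concordant_pct = percent(concordant, estrogen_overlap)
print(overlap_pct, independent_pct, concordant_pct)
```

The rounded values correspond to the "~25%", roughly three-quarters and "two-thirds" figures given in the text.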
A search for annotated enrichment of the differentially expressed genes for specific biological processes (GO Biological Processes, Expression Analysis Systematic Explorer score < 0.05) indicated that 'development' and 'cell cycle/M-phase' were the most overrepresented functional gene categories. In keeping with the GSEA observation indicating a trend for enrichment of cell-cycle-associated genes in the younger cohort cases, differentially expressed cell cycle/M-phase genes (including positive regulators such as STK6, FGFR1 and DLG7) represented 20% (5/25) of all genes overexpressed in the younger cohort but only 8% (4/51) of those overexpressed in the older cohort. In contrast, the older cohort cases showed differentially increased expression of negative cell cycle regulators (such as SASH1 and RHOB) and four developmentally essential homeobox genes (HOXB2, HOXB5, HOXB6, HOXB7), the latter finding also in keeping with the GSEA-observed trend showing enrichment in the older cohort of HOX-regulated (NUP98-HOXA9 repressed) genes. Two of the overexpressed HOXB genes (HOXB6, HOXB7) have been specifically linked to mammary gland development and are known to be expressed in ER-positive breast cancer cells [63]. HOXB7, in particular, known to be dependent on stromal (extracellular matrix) signaling, is transcriptionally upregulated in breast cancers metastatic to bone (relative to primary tumors), and is thought to play a role in promoting angiogenesis, growth factor-independent proliferation and DNA double-strand break repair, conferring breast cancer resistance to the genome destabilizing effects of DNA damage [64].
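To give a sense of how strongly the 5/25 versus 4/51 split favors the younger cohort, the proportions can be compared with a one-sided Fisher exact test. The Python sketch below is an illustration only: it is not the Expression Analysis Systematic Explorer score used in the study, the function name is hypothetical, and the counts are taken as quoted in the sentence above.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing `a` or more hits
    in the first row, given fixed row and column totals.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    denominator = comb(total, row1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator

# Counts from the text: 5 of 25 genes overexpressed in the younger cohort
# and 4 of 51 genes overexpressed in the older cohort are cell cycle/M-phase.
p_value = fisher_one_sided(5, 25 - 5, 4, 51 - 4)
print(f"one-sided P = {p_value:.3f}")
```

With gene counts this small, a test of this kind has limited power, which is consistent with the text describing the cell cycle enrichment as a trend.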
PAM was used to derive an age signature that consisted of 128 unique genes, including 44 of the 75 differentially expressed genes determined by our conditional permutation approach. The age signature was independently validated against two other age-matched ER-positive breast cancer microarray datasets and proved to have >80% accuracy in distinguishing younger from older ER-positive breast cancer cases. ESR1 and AREG were among the genes in common between the age signature and the differentially expressed gene sets; it is therefore not surprising that the age-signature-defined subsets from the two independent databases showed similar differences in the mean expression levels of these two genes as found in our age-defined cohorts. Only 28% of the age signature genes overlap with known early or late estrogen-responsive genes, suggesting that this age signature largely reflects age-related differences in the phenotype of ER-positive breast cancer rather than differences in circulating estrogen levels associated with menopausal status.
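PAM is built on the nearest shrunken centroids idea: class centroids are shrunk toward the overall centroid so that uninformative genes drop out, and a new sample is assigned to the closest shrunken centroid. The Python sketch below is a simplified illustration of that idea and not the software or settings used in the study. It assumes equal class priors, a fixed shrinkage threshold `delta` rather than one chosen by cross-validation, and one common convention for the standard-error factor; all data and names are hypothetical.

```python
from math import sqrt
from statistics import median

def soft_threshold(value, delta):
    """Shrink value toward zero by delta; values within delta become 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0

def fit_shrunken_centroids(samples, labels, delta):
    """Return (overall centroid, per-gene scale, shrunken class offsets)."""
    classes = sorted(set(labels))
    n, n_genes = len(samples), len(samples[0])
    overall = [sum(s[g] for s in samples) / n for g in range(n_genes)]
    by_class = {c: [s for s, lab in zip(samples, labels) if lab == c]
                for c in classes}
    centroids = {c: [sum(s[g] for s in rows) / len(rows)
                     for g in range(n_genes)]
                 for c, rows in by_class.items()}
    # Pooled within-class standard deviation for each gene.
    pooled = []
    for g in range(n_genes):
        ss = sum((s[g] - centroids[c][g]) ** 2
                 for c, rows in by_class.items() for s in rows)
        pooled.append(sqrt(ss / (n - len(classes))))
    s0 = median(pooled)  # offset guarding against very small variances
    scale = [p + s0 for p in pooled]
    offsets = {}
    for c, rows in by_class.items():
        # Assumed convention for the standard-error factor of a centroid.
        m = sqrt(1.0 / len(rows) - 1.0 / n)
        offsets[c] = [
            soft_threshold((centroids[c][g] - overall[g]) / (m * scale[g]),
                           delta) * m * scale[g]
            for g in range(n_genes)
        ]
    return overall, scale, offsets

def predict(sample, model):
    """Assign the class whose shrunken centroid is nearest (equal priors)."""
    overall, scale, offsets = model

    def distance(c):
        return sum(((sample[g] - overall[g] - offsets[c][g]) / scale[g]) ** 2
                   for g in range(len(sample)))

    return min(offsets, key=distance)

# Hypothetical toy data: gene 0 separates the cohorts, gene 1 barely does.
toy_samples = [[2.0, 0.3], [2.2, -0.1], [1.8, 0.1],
               [-2.0, -0.1], [-1.8, 0.1], [-2.2, -0.3]]
toy_labels = ["younger"] * 3 + ["older"] * 3
model = fit_shrunken_centroids(toy_samples, toy_labels, delta=2.0)
print(model[2])
print(predict([1.5, -0.4], model))
```

In this toy example the weakly informative second gene is shrunk to a zero offset in both classes, so only the first gene contributes to the classification; this is the mechanism by which the approach yields a compact gene signature.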
The fact that a PAM-derived PR signature did not perform well upon validation implies substantial heterogeneity between ER-positive breast cancers with the same PR status, and possibly indicates that confounding age-related gene expression changes are of greater biological importance than PR-related gene expression differences. Misclassification errors using the age signature were more prevalent among the older cohort cases, also suggesting greater variation in expression of the age signature genes with aging. Of further interest, the 128 age signature gene set was unable to accurately subset ER-negative cases identified from the two independent breast cancer datasets [47,48], consistent with expression-array-based conclusions that ER-positive and ER-negative breast cancers are fundamentally distinct in their biology, and supporting the likelihood that the PAM-derived age signature incorporates biological profiles specific to ER-positive breast cancers but not ER-negative breast cancers.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
CY carried out all the RNA expression array studies, collated all data and performed the biostatistical and informatic analyses, interpreted all the results and generated all the figures and tables pertaining to the expression array studies, and produced a preliminary draft of the manuscript. VF obtained and processed frozen primary tumors, generated all DNA extracts and many of the RNA extracts, carried out all the array CGH studies and participated in the statistical and bioinformatic analyses of these results, and contributed to drafting the manuscript. RR, JF, AH and DHM designed, supervised and/or conducted all of the biostatistical, clinical and informatic analyses supporting this study. JWG, KC, SHD, FS, ST, and AP provided all of the breast cancer study samples and/or contributed to the RNA and DNA processing of these samples. DGA helped conceive the entire study, provided laboratory support, developed all methods for and supervised the performance of all array CGH analyses, and helped interpret all results. CCB conceived the study design, identified and secured all breast cancer study samples, coordinated all DNA and RNA studies and their analyses, formulated all conclusions and drafted the final manuscript.