Introduction
Different molecular subtypes of breast cancer have been described [1]. The most profound effects on gene expression profiles in breast cancer are related to estrogen receptor (ER) and proliferation status and, to a lesser extent, to human epidermal growth factor receptor 2 (HER2) status. Not surprisingly, molecular classification and current prognostic signatures mainly reflect these molecular features [2]. However, substantial clinical and molecular heterogeneity remains within current molecular subsets, particularly among ER-, progesterone receptor (PgR)- and HER2-negative breast cancers (that is, triple negative breast cancers, TNBC) [3]. Furthermore, the relationship between clinically defined TNBC and the gene expression profile-based basal-like breast cancer subtype (BLBC) [4] is not fully defined [5]. Some authors use these two terms synonymously, given the substantial overlap between the two definitions [6,7]. However, immunohistochemical and molecular profiling studies have shown that only a subset of TNBC express the combination of basal cell markers (for example, CK5 and CK14) that is required for the molecular definition of this disease [5]. The prognostic significance and therapeutic implications of molecular heterogeneity within TNBC remain to be established. From a clinical point of view, further understanding of TNBC is important because better prognostic markers and new treatments are needed [8].
The goal of this analysis was to assemble all currently available TNBC gene expression datasets generated on Affymetrix gene chips and to search for molecular structures in the data that define gene expression-based subsets within TNBC. We defined metagenes as the average expression of groups of highly co-expressed genes in the data, without considering any clinical outcome variable. These metagenes identified several molecular subsets within TNBC, some with a good prognosis even in the absence of systemic therapy. Our results also suggest possible new therapeutic strategies for TNBC. This study represents the largest attempt to define clinically important molecular subsets within TNBC [9].
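The metagene concept can be illustrated with a minimal Python sketch. It assumes a genes-by-samples matrix of log-scale expression values and uses a hypothetical greedy grouping rule (a Pearson correlation threshold `min_corr` and a minimum group size `min_size`), which is not necessarily the clustering procedure used in this study. Each metagene score is simply the mean expression of its member genes in each sample, and no outcome variable enters the computation. The function name `find_metagenes` and the synthetic data are illustrative only.

```python
import numpy as np


def find_metagenes(expr, gene_names, min_corr=0.6, min_size=3):
    """Group co-expressed genes and average them into metagene scores.

    expr: genes x samples array (e.g. log2 expression).
    Hypothetical greedy rule: seed each group with the unassigned gene that
    has the most partners at Pearson r >= min_corr, take those partners as
    the group, and stop when no group of at least min_size genes remains.
    No clinical outcome is used.
    """
    corr = np.corrcoef(expr)                  # gene-by-gene correlation matrix
    unassigned = set(range(expr.shape[0]))
    metagenes = []
    while unassigned:
        idx = sorted(unassigned)
        linked = corr[np.ix_(idx, idx)] >= min_corr
        n_links = linked.sum(axis=1)          # includes the gene itself
        seed = int(np.argmax(n_links))
        if n_links[seed] < min_size:
            break                             # remaining genes are not co-expressed
        members = [idx[j] for j in np.flatnonzero(linked[seed])]
        unassigned -= set(members)
        score = expr[members].mean(axis=0)    # metagene = mean of member genes
        metagenes.append(([gene_names[m] for m in members], score))
    return metagenes


# Synthetic example: two latent programs (4 genes each) plus 4 noise genes
rng = np.random.default_rng(0)
n_samples = 50
program_a = rng.normal(size=n_samples)
program_b = rng.normal(size=n_samples)
rows = [program_a + 0.2 * rng.normal(size=n_samples) for _ in range(4)]
rows += [program_b + 0.2 * rng.normal(size=n_samples) for _ in range(4)]
rows += [rng.normal(size=n_samples) for _ in range(4)]
expr = np.vstack(rows)
names = [f"a{i}" for i in range(4)] + [f"b{i}" for i in range(4)] + [f"n{i}" for i in range(4)]

for genes, score in find_metagenes(expr, names):
    print(sorted(genes), score.shape)
```

Grouping by correlation alone, as here, means the metagenes reflect structure in the expression data rather than any clinical endpoint, which mirrors the unsupervised intent described above.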
Discussion
It has been suggested that TNBC represent a group of several molecularly [3] and clinically [41,42] distinct disease subtypes. We used gene expression data from a cohort of 394 TNBC to identify molecular subsets within this tumor type. The definition of TNBC was based on gene expression data, which is not the standard definition used in the clinic. This may be a limitation, but it has the advantage that samples erroneously classified as receptor-negative by immunohistochemistry do not introduce noise into our analysis. We identified 16 metagenes associated with several distinct biological processes that showed variable expression across TNBC (Table 2). Some of the metagenes seem to point to the distinct origins of these cancers [43,44]. These include the basal-like [4], the apocrine [18,19] and the claudin-low [28,29] subtypes of TNBC. Other metagenes were related to non-neoplastic cellular constituents of the tumor microenvironment, including stroma [26,27], blood cells [30] and adipocytes [4], as well as to signatures for angiogenesis [23,34] and inflammation [31-33]. Five metagenes appear to reflect the variable presence of immune cells, which may contribute to the clinical behavior of the cancer [4,20-25,27,45] (Table 2).
Kreike et al. [9] detected similar metagenes among 97 TNBC analysed with a different microarray platform. That study suggested that the TNBC clinical phenotype can be equated with the BLBC molecular class determined by the centroid method [46], since 95% of the TNBCs were assigned to the basal-like molecular class [47]. However, the centroid method is highly sensitive to the composition of the dataset that is used to define the reference centroids [48], and variants of the method can lead to different results [49]. Bertucci et al. [50] identified only 71% of their 172 TNBC cases as basal-like when using a slightly different version of the centroid method for molecular classification. When we applied different versions of the centroid method to 1,364 breast cancers, 65% to 90% of the TNBC samples (n = 172) were assigned to the basal-like class, depending on the method used (Additional file 2, Supplementary Table S6). In this paper, we took a different approach: we first identified metagenes and then used these metagenes to define molecular subsets among TNBC. One of our metagenes corresponded closely to the gene signatures that are used to define BLBC in centroid-based methods. Our results indicate that BLBC, defined by basal-like metagene expression, represent around 73% of TNBC (Table 3 and Additional file 2, Supplementary Table S2).
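One way in which centroid-based calls can shift with dataset composition is illustrated by the toy nearest-centroid classifier below. This is a simplified sketch, not any published centroid method: it assumes four hypothetical genes, two made-up centroids labeled "basal-like" and "luminal", per-gene median centering within the dataset being classified, and assignment to the centroid with the highest Pearson correlation. The function name `assign_subtypes` and all values are illustrative. In the usage example, the same tumor profile receives different calls in two cohorts, because median centering depends on which other samples are present.

```python
import numpy as np


def assign_subtypes(expr, centroids, center=True):
    """Toy nearest-centroid classifier.

    expr: genes x samples array, rows in the same gene order as the centroids.
    centroids: dict mapping a subtype label to a 1-D array (one value per gene).
    With center=True each gene is median-centered across the samples in this
    dataset, so a sample's call can depend on which other samples are present.
    """
    expr = np.asarray(expr, dtype=float)
    data = expr - np.median(expr, axis=1, keepdims=True) if center else expr
    labels = list(centroids)
    calls = []
    for j in range(data.shape[1]):
        # Pearson correlation of sample j with every centroid
        r = [np.corrcoef(data[:, j], centroids[k])[0, 1] for k in labels]
        calls.append(labels[int(np.argmax(r))])
    return calls


# Hypothetical 4-gene centroids (illustrative, not a published signature)
toy_centroids = {
    "basal-like": np.array([1.0, 1.0, -1.0, -1.0]),
    "luminal": np.array([-1.0, -1.0, 1.0, 1.0]),
}
x = [2, 2, 1, 1]  # the same tumor profile in both cohorts
cohort_a = np.column_stack([x, [0, 1, 0, 1], [1, 0, 1, 0]])
cohort_b = np.column_stack([x, [3, 3, 0, 1], [3, 3, 1, 0]])
print(assign_subtypes(cohort_a, toy_centroids)[0])  # basal-like
print(assign_subtypes(cohort_b, toy_centroids)[0])  # luminal
```

After centering, the shared profile becomes [1, 1, 0, 0] in the first cohort and [-1, -1, 0, 0] in the second, which flips its nearest centroid.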
The proportion of BLBC among TNBC in our study is similar to results from an immunohistochemical study by Rakha et al. [7] that defined BLBC by the expression of CK5/6, CK14, CK17 or EGFR. These authors observed worse survival for the 165 patients with BLBC than for the remaining 67 TNBC cases, which expressed none of these markers. However, we did not detect a difference in prognosis between BLBC and non-BLBC triple negative cancers (Additional file 1, Supplementary Figure S7). In the study by Rakha et al., the prognostic effect was mainly confined to 103 untreated patients. Yet even when we analyzed untreated patients (n = 186) separately, we detected no prognostic value of the BLBC phenotype (not shown). Our results also contrast with the immunohistochemical study of Cheang et al. [51], which used CK5/6 and EGFR antibodies for TNBC stratification and likewise observed a worse prognosis for 336 BLBC TNBC compared with 303 non-BLBC TNBC. However, our study is not directly comparable to these prior reports because our definition of BLBC is fundamentally different from the IHC-based methods. Our results are in line with several other genomic profiling studies that reported limited prognostic value for the BLBC molecular class among clinically triple negative cancers [18,19,50].
We observed strong prognostic value for several of the other metagenes (Additional file 2, Supplementary Table S4). An improved prognosis was observed for patients with tumors displaying high expression of immune system-related metagenes, which supports recent reports [20,23-25,27,39,40,52,53]. High expression of the inflammation (IL-8), angiogenesis/hypoxia (VEGF) [34] and histone-related metagenes was associated with decreased survival (Additional file 2, Supplementary Table S4 and Figure 1). A simple combination of high B-cell and low IL-8 metagene expression identifies a subset of TNBC patients (32% of all) with a favorable prognosis and a five-year event-free survival of 84%. In multivariate analysis, only this metagene ratio and lymph node status were significant prognostic factors in our TNBC cohort (Table 4 and Figure 4D, E). Other known prognostic factors in breast cancer, such as age, tumor size and histological grade, were not significant in our cohorts, even in univariate analysis. Most TNBC are high grade; therefore, grade is less important for prognosis in this subtype than in ER-positive disease. TNBCs are also often associated with younger age, but the impact of age and tumor size on prognosis within this subtype is not yet fully clear. Nevertheless, we cannot exclude that a bias in our cohort accounts for the lack of significance of these factors. Our analyses of neoadjuvantly treated TNBC samples suggest a modest predictive value of the B-cell/IL-8 metagene ratio for currently used chemotherapies [22,54] (Additional file 1, Supplementary Figure S10). We also observed a purely prognostic value in the untreated patients of the cohort, in line with other reports on the B-cell metagene [24,27]. Treatment information on the samples from the validation cohort was not available.
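A high B-cell/low IL-8 grouping and a five-year event-free survival estimate could be computed along the lines of the sketch below. It is hedged in two ways: the median cut-offs in the hypothetical `low_risk_group` function are an assumption (the cut-offs used in this study are not stated here), and `km_survival_at` is a plain Kaplan-Meier estimator applied to toy data, so its output bears no relation to the 84% reported in this study.

```python
import numpy as np


def low_risk_group(bcell, il8, bcell_cut=None, il8_cut=None):
    """Flag tumors with high B-cell and low IL-8 metagene scores.

    Cut-offs default to the cohort medians; this is an assumed rule,
    not the cut-offs used in the study.
    """
    bcell = np.asarray(bcell, dtype=float)
    il8 = np.asarray(il8, dtype=float)
    if bcell_cut is None:
        bcell_cut = np.median(bcell)
    if il8_cut is None:
        il8_cut = np.median(il8)
    return (bcell > bcell_cut) & (il8 <= il8_cut)


def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of event-free survival S(t).

    times: follow-up times (e.g. years); events: 1 = event, 0 = censored.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv = 1.0
    for u in np.unique(times[events == 1]):      # distinct event times, ascending
        if u > t:
            break
        at_risk = np.sum(times >= u)              # still under observation at u
        n_events = np.sum((times == u) & (events == 1))
        surv *= 1.0 - n_events / at_risk
    return float(surv)


# Toy data only
bcell = [0.1, 0.9, 0.5, 0.7]
il8 = [0.8, 0.2, 0.6, 0.3]
print(low_risk_group(bcell, il8).tolist())  # [False, True, False, True]

times = [1, 2, 3, 4, 5, 6, 7, 8]            # years of follow-up
events = [1, 0, 1, 0, 0, 1, 0, 0]           # 1 = recurrence, 0 = censored
print(round(km_survival_at(times, events, 5), 3))  # 0.729
```

In the toy survival data, events at years 1 and 3 give S(5) = (7/8)(5/6) ≈ 0.729; censored patients leave the risk set without reducing the estimate.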
Our observation is important because every currently available genomic prognostic signature (for example, the 70-gene profile [55], Recurrence Score [36] and Genomic Grading Index [37]) assigns poor prognostic risk status to all TNBC samples despite their variable outcome [56-58]. One of these signatures, the Rotterdam 76-gene prognostic signature [59], was designed to allow prognostic stratification of ER-negative cancers. However, consistent with other reports [9], we were not able to demonstrate a prognostic value for this signature (Additional file 1, Supplementary Figure S12).
We used an unsupervised class discovery approach to first identify the main molecular subtypes within the data and then assess prognostic differences between the molecular subsets. Interestingly, when we performed an independent supervised analysis that compared TNBC cases with and without recurrence, we also identified IL-8 as the top-ranked gene associated with poor prognosis (Additional file 1, Supplementary Figure S13 and Additional file 2, Supplementary Table S8). However, gene signatures obtained through supervised analysis were not superior to the prognostic predictions based on molecular structure in validation (Additional file 1, Supplementary Figure S14). In addition, an empirically derived prognostic signature is more difficult to interpret biologically than metagenes.
In summary, we performed the largest unsupervised analysis of pooled gene expression data from TNBC. We describe a new prognostic signature for these cancers that identifies about one-third of TNBC as being at relatively low risk of recurrence. These cancers are characterized by high B-cell and low IL-8 metagene expression and have about 84% recurrence-free survival at five years. Although this may not be sufficiently high to forgo adjuvant chemotherapy, these observations pave the way for developing a clinically useful multivariate prognostic model for TNBC. A combined prognostic score, including clinical variables, such as nodal status and perhaps tumor size, and molecular variables, such as optimized B-cell and IL-8 metagenes (measured by an RT-PCR or array-based method), may identify patients with a very low risk of recurrence despite ER-, PgR- and HER2-negative breast cancer. Equally important, the prognostic importance of B-cells and the negative impact of IL-8 suggest potential novel therapeutic strategies for TNBC that can be tested in the clinic [31,32]. This information could also help select the patients most likely to benefit from novel immune-stimulating drugs, such as anti-CTLA-4 antibodies, which have shown promise in melanoma [60,61]. IL-8 may also directly increase the survival of breast cancer stem cells after chemotherapy [62], an effect that can be blocked with IL-8-directed drugs [63]. Such an effect might explain the triple negative paradox of high relapse rates despite a good initial response to chemotherapy.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AR, TK and UH conceived the study, carried out the analyses and wrote the manuscript. CL and LP added experimental data, participated in the interpretation of the data and in writing the manuscript. ER, LH, RG, CS, AA, MS and VM provided patients and samples, obtained follow-up data and helped to draft the manuscript. DM and TK performed the statistical analysis. MK initiated the study and participated in the design and writing of the manuscript. All authors read and approved the final manuscript.