Introduction
The tumor microenvironment (TME) is composed of fibroblasts, endothelial cells, immune cells and extracellular matrix (ECM). The cells within the TME have been demonstrated to play significant roles in the development and progression of cancer [1‐8], but most studies view each of these cell groups as a relatively uniform population that is similar across different patients.
Profiling the non-neoplastic cells within the TME directly is difficult because of the variety and relative paucity of these cells in the tissue and practical problems with isolating them. Our approach rests on the hypothesis that, similar to lymphomas, where each tumor is a clonal outgrowth of a particular lymphoid cell type, each soft tissue tumor type can be regarded as a clonal outgrowth of a particular connective tissue cell type and can thereby represent a subclass of the stromal proliferation that occurs in epithelial tumors.
In our approach, soft tissue tumors (STTs), each of which is a homogeneous collection of a single mesenchymal cell type with a phenotype distinct from the others, can be easily profiled to yield a relatively uniform signature and act as “discovery tools” for various types of TME expression patterns. Using gene array-based expression profiling of fresh frozen specimens of fibroblastic tumors (desmoid type fibromatosis-DTF and solitary fibrous tumor-SFT) and macrophage-rich tumors (tenosynovial giant cell tumor-TGCT/CSF1), we previously discovered novel types of stromal reaction patterns that highlight the variation in the fibroblast and macrophage compartments of breast cancer between different patients [1, 2, 8, 9]. The biological significance of these stromal reaction patterns was borne out by the fact that several of them have prognostic significance independent of traditional prognosticators such as tumor size, tumor grade and even lymph node status [1, 2, 8].
In our previous studies, the DTF signature robustly defined a stromal pattern for 25 to 35% of invasive breast cancers [2], while the TGCT/CSF1 signature was found in 17 to 28% of breast cancers [1]. However, a significant number of breast cancers were not classified by these signatures. To find additional stromal patterns, we performed gene expression profiling on a spectrum of fibroblastic lesions. As most of these lesions are quite small, they are routinely submitted in their entirety as formalin-fixed, paraffin-embedded (FFPE) tissue. Here, we have applied 3SEQ (3′ end RNA sequencing), an RNA-Seq method that we previously developed for the expression profiling of archival FFPE tissue. This method can be used to perform global gene expression profiling of FFPE material [10‐12], as well as to discover and characterize expression levels for lncRNAs [13]. In this study, we used 3SEQ to determine specific gene expression signatures for 10 types of fibroblastic tumors and found that 2 of them can identify breast cancers with distinct clinical outcome. Taken together with the two previously identified stromal signatures (the DTF and TGCT/CSF1 signatures), the combined four stromal signatures now classify 74% to 90% of breast carcinomas.
Discussion
Stromal components within the TME are known to be involved in cancer initiation, progression and prognosis [1‐4, 6, 7, 9, 22‐27]. In many studies, the different cellular components of the TME are treated as relatively invariable factors that are assumed to play a similar role in tumor samples from different patients. However, through systematic analysis of breast cancer H&E images with a novel machine learning based method, C-Path, we have recently shown that the morphological features of the tumor stroma vary markedly between tumor samples and that they are not only significantly associated with survival in breast cancer, but that their impact on outcome is even stronger than the features of the epithelial component itself [9].
By clustering breast carcinoma expression profile datasets using only the genes that are specific for distinct fibroblastic tumors, we can observe subsets of cancer that contain different fibroblastic subtypes in the tumor stroma. In contrast, hierarchical clustering that uses all genes in the dataset obtained from an entire tumor specimen usually groups samples together based predominantly on the gene expression pattern in carcinoma cells as these cells often represent the majority of the cells within a tumor and often show the most variation in expression patterns. Thus, their transcript levels represent the strongest signal in the sample. As a result, differences between tumors based on their stromal expression patterns are often not observed in datasets where the entire gene expression profile of the sample is used.
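The effect of restricting the gene set can be illustrated with a minimal sketch. It is not the hierarchical clustering used in the study: it only compares a correlation distance (1 - Pearson r) between samples, once over all genes and once over a signature subset. The gene names (EPI1 to EPI4, STR1 to STR3), the samples A, B and C, and all expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_distance(expr, s1, s2, genes):
    """Correlation distance (1 - r) between two samples over the chosen genes."""
    x = [expr[g][s1] for g in genes]
    y = [expr[g][s2] for g in genes]
    return 1.0 - pearson(x, y)


# Hypothetical toy data: gene -> {sample: expression level}.
# EPI* genes mimic the dominant carcinoma-cell signal, STR* a stromal signature.
expr = {
    "EPI1": {"A": 9.0, "B": 9.1, "C": 2.0},
    "EPI2": {"A": 8.0, "B": 8.2, "C": 1.0},
    "EPI3": {"A": 1.0, "B": 1.2, "C": 7.0},
    "EPI4": {"A": 2.0, "B": 1.8, "C": 8.5},
    "STR1": {"A": 5.0, "B": 2.0, "C": 5.1},
    "STR2": {"A": 1.0, "B": 4.0, "C": 1.2},
    "STR3": {"A": 4.0, "B": 1.0, "C": 4.2},
}
all_genes = list(expr)
signature_genes = ["STR1", "STR2", "STR3"]

# Over all genes, A pairs with B (shared epithelial pattern).
d_all_ab = sample_distance(expr, "A", "B", all_genes)
d_all_ac = sample_distance(expr, "A", "C", all_genes)

# Over the signature genes only, A pairs with C (shared stromal pattern).
d_sig_ab = sample_distance(expr, "A", "B", signature_genes)
d_sig_ac = sample_distance(expr, "A", "C", signature_genes)
```

In this toy example the nearest neighbor of sample A changes from B to C once only the signature genes are considered, which is the kind of stromal grouping that a whole-profile clustering would tend to hide.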
It is difficult to obtain gene expression profiles from normal fibroblast subtypes as in normal and tumor tissue they are typically closely associated with other cell types, such as epithelial cells, and techniques, such as micro-dissection, are laborious. Our approach rests on the hypothesis that, similar to lymphomas where each tumor is a clonal outgrowth of a particular lymphoid cell type, each fibroblast tumor type can be regarded as a clonal outgrowth of a particular connective tissue cell type [28]. Fibroblastic tumors thus represent neoplasms of different normal fibroblast subtypes and differentially express genes typical for various fibroblast functions. Moreover, each fibroblastic tumor type represents a largely homogeneous population of cells and can be robustly profiled. By using this approach, our group has previously demonstrated that a specific stromal gene expression pattern, the DTF fibroblast signature, could robustly and reproducibly define a subgroup of breast cancer patients with good prognosis [2, 8] and that a second stromal pattern, the TGCT/CSF1 macrophage signature, is associated with breast cancers of a higher tumor grade, with decreased expression of ER/PR, and increased mutations of TP53 [1]. Subsequent studies have shown that these different TME variants can even be identified in cases of pre-invasive ductal carcinoma [6]. These findings indicated that the type of TME can vary between patients and that expression profiles obtained from STTs form a useful tool to distinguish these TME variants.
Our prior studies allowed us to identify distinct TME subtypes in up to 50% of breast cancers. To extend these findings, we determined the gene expression profiles of an additional eight fibroblastic tumor types. Previous studies required fresh frozen tumor samples, but for many of the fibroblastic lesions we intended to analyze only FFPE material was available. We therefore used a novel gene expression profiling approach (3SEQ) that uses next generation sequencing of RNA fragments purified by oligo-dT selection from FFPE material (Additional file 12). Applying the eight novel fibrous signatures to four publicly available breast cancer expression profiling datasets, we found that three of these signatures were not expressed in the breast cancers in a coordinated manner. Of the five remaining signatures, three did not show differences in outcome analysis. In contrast, the EF and FOTS signatures each stratified the breast cancer samples into two groups through highly coordinated gene expression, with a consistent association with outcome in all four breast cancer datasets. The EF signature positive breast cancers demonstrated good outcome, while the FOTS signature positive breast cancers showed bad outcome. In this study, the SFT signature is significantly associated with worse outcome only in the NKI dataset, with no clear pattern in the other three datasets, consistent with our previous findings [2]. The current DTF-(3SEQ) signature, which is similar to the DTF signature previously defined against SFT [2, 8], is associated with good outcome in three of the four breast cancer datasets, though the association in this analysis is not statistically significant. The difference in significance between the old and new signatures with respect to outcome can be explained by the fact that the genes differentially expressed for a particular lesion are to a great extent determined by the other samples in the dataset to which it is compared. The original DTF signature was determined through a comparison with SFT only, while the currently defined DTF-(3SEQ) signature was determined through a comparison with a much larger number of distinct fibrous tumor types. As a result, the current DTF-(3SEQ) signature contains 42 genes from the comparison between DTF and the other nine types of fibrous tumors including SFT, while the original DTF signature contains 237 genes from the comparison between DTF and only one other type of fibrous tumor, SFT.
The EF and the previously identified DTF fibroblast signatures both identify good outcome in breast cancer, while the FOTS and the previously identified TGCT/CSF1 macrophage signatures both identify bad outcome in breast cancer. To explore the relationships within the good-outcome and bad-outcome signature pairs, we compared their breast cancer sample assignments. The comparison showed that 11 to 16% of breast cancers were positive for both the EF and DTF core gene sets, 44 to 47% were negative for both, 23 to 33% were EF-/DTF+, and 9 to 20% were EF+/DTF-. In addition, 6 to 12% of breast cancers were positive for both the FOTS and TGCT/CSF1 core gene sets, 41 to 65% were negative for both, 5 to 24% were FOTS+/TGCT/CSF1-, and 23 to 24% were FOTS-/TGCT/CSF1+. EF+/DTF+ breast cancers were more likely than EF-/DTF- breast cancers to be ER+/PR+ and low grade, with less lymph node involvement. FOTS+/TGCT/CSF1+ breast cancers were more likely than FOTS-/TGCT/CSF1- breast cancers to be ER-/PR-, high grade and basal-like.
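The kind of comparison described above can be sketched as a simple cross-tabulation of per-sample signature calls. This is an illustrative outline only: the function name `cross_tabulate`, the sample identifiers and the positive/negative calls are hypothetical, and how a sample is called positive for a core gene set is assumed to have been decided upstream.

```python
from collections import Counter


def cross_tabulate(calls_a, calls_b, name_a="EF", name_b="DTF"):
    """Percentage of samples in each +/- combination of two signature calls.

    calls_a, calls_b: dicts mapping sample id -> True (positive) / False (negative).
    Only samples present in both dicts are counted.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        raise ValueError("the two call sets have no samples in common")
    counts = Counter(
        f"{name_a}{'+' if calls_a[s] else '-'}/{name_b}{'+' if calls_b[s] else '-'}"
        for s in shared
    )
    return {label: 100.0 * n / len(shared) for label, n in counts.items()}


# Hypothetical calls for five samples.
ef_calls = {"s1": True, "s2": True, "s3": False, "s4": False, "s5": False}
dtf_calls = {"s1": True, "s2": False, "s3": True, "s4": False, "s5": False}

table = cross_tabulate(ef_calls, dtf_calls)
for label in sorted(table):
    print(f"{label}: {table[label]:.0f}%")
```

Running the same function once per dataset would give the per-dataset percentages from which ranges such as those quoted above are reported.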
To test the prognostic power of the combined core gene sets of EF, FOTS, DTF and TGCT/CSF1, we pooled the outcome data from the four breast cancer datasets for overall survival (OS), disease specific survival (DSS) and disease free survival (DFS). Kaplan-Meier analysis of the combined core gene sets in the pooled dataset showed that EF-/DTF- breast cancers were associated with the worst outcome in OS, DSS and DFS, while EF+ breast cancers, whether DTF- or DTF+, were associated with better outcome in OS, DSS and DFS. FOTS-/TGCT/CSF1- breast cancers were associated with better outcome in OS, DSS and DFS, while FOTS+/TGCT/CSF1- breast cancers were associated with worse outcome in OS, DSS and DFS.
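For readers unfamiliar with the method, the pooling and Kaplan-Meier steps can be sketched as follows. This is a minimal product-limit estimator written for illustration, not the software used in the study; the helper names `pool_by_group` and `kaplan_meier`, the group labels and all follow-up values are hypothetical.

```python
def pool_by_group(datasets):
    """Merge (group, time, event) records from several datasets by group label."""
    pooled = {}
    for records in datasets:
        for group, time, event in records:
            times, events = pooled.setdefault(group, ([], []))
            times.append(time)
            events.append(event)
    return pooled


def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each event time.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events occurring exactly at time t
        leaving = 0   # patients leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical records from two datasets: (group label, time in years, event flag).
dataset_1 = [("EF+", 2, 1), ("EF+", 3, 0), ("EF-/DTF-", 1, 1)]
dataset_2 = [("EF+", 3, 1), ("EF+", 5, 1), ("EF+", 8, 0), ("EF-/DTF-", 2, 1)]

pooled = pool_by_group([dataset_1, dataset_2])
curves = {group: kaplan_meier(t, e) for group, (t, e) in pooled.items()}
```

Patients censored at the same time as an event are still counted as at risk for that event, which is the usual convention. Comparing the resulting curves between groups would additionally require a test such as the log-rank test, which is not shown here.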
To better understand the potential functions of the two new stromal core gene set signatures (EF and FOTS) in breast cancer, we performed Gene Ontology (GO) and KEGG PATHWAY analyses, which showed that the 41 EF core genes are significantly enriched in biological processes including ‘response to wounding’ and BMP signaling. Of the 41 EF core genes, almost one fourth (9/41), namely AOC3, AOX1, C6, CFD, CFH, CLU, GSN, LYVE1 and MECOM, were related to ‘response to wounding’. An association with wound healing has previously been identified in a study of prognostic gene signatures in breast cancer [29]. However, when we compared the EF gene list with the previously identified “minimum number core serum response” (CSR) genes [29] necessary for tumor classification, we found that only one gene, GSN, was shared between the EF core signature and the CSR genes (from quiescent samples, in contrast to activated samples). The BMP pathway is well known to modulate cross-talk between stromal cells and epithelial cells [30‐32], and three genes involved in BMP signaling (CHRDL1, GREM2 and MSX1) were included in the EF core signature. For the FOTS core gene set, biological processes including glycolysis were enriched among the 16 core genes. This suggests that an increase in glycolysis, which plays a critical role in cancer cell growth and invasion [33], may also influence stromal fibroblasts.
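Enrichment of an annotation term in a gene list is commonly assessed with a one-sided hypergeometric test, sketched below. Whether the GO and KEGG tools used in the study apply exactly this test, or add multiple-testing correction on top of it, is not stated here and is an assumption. The 9 of 41 overlap is taken from the text; the background size of 20,000 genes and the 700 genes annotated with the term are hypothetical placeholders.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """One-sided hypergeometric P(X >= overlap).

    background : number of genes in the reference set
    annotated  : number of background genes carrying the annotation term
    selected   : size of the gene list being tested
    overlap    : number of selected genes carrying the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# 9 of the 41 EF core genes carry the term (from the text);
# the background and annotation counts are hypothetical.
p_value = hypergeom_enrichment_p(background=20000, annotated=700, selected=41, overlap=9)
print(f"enrichment p-value under the assumed background: {p_value:.3g}")
```

The resulting p-value depends strongly on the assumed background and annotation counts, so it should not be read as a reproduction of the reported enrichment.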
FOXM1 is an example of a FOTS core gene that is involved in both epithelial cancer cells and cancer-associated fibroblasts. By comparing gene expression between breast cancer-associated fibroblasts (CAFs) and normal mammary fibroblasts (NFs) isolated from the same patient, Mercier et al. found that FOXM1 was up-regulated in CAFs relative to NFs [34]. The TPI1 gene in the FOTS core signature was up-regulated in the CL4 fibroblast, which could supply epithelial cancer cells with pyruvate/lactate as “fuel” to help them escape anti-angiogenic treatment [35]; therefore, TPI1 targeted therapy (or FOTS gene targeted therapy) directed at cancer-associated fibroblasts may be an actionable way to control the progression of cancer in conjunction with anti-angiogenic therapy.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
XG, MVD and RW designed the research and wrote the paper. XG and AB analyzed the data. SXZ performed the experiments. MVD and RW supervised the research. All authors read and approved the final manuscript.