Introduction
Since the establishment of the HeLa cell line in 1951, cell lines have been an integral part of cancer research, and their use has tremendously advanced understanding of molecular cancer biology [1]. However, the suitability of these models has come into question, as many in vitro phenomena are challenging to replicate in vivo. Interpreting the potential clinical significance of discoveries made using cell lines requires an understanding of the extent to which these cell lines represent in vivo tumours.
Since the first breast cancer cell line, BT-20, was established in 1958 [2], various other immortalized primary tumour cell lines have been established at exceptionally poor efficiencies [3, 4]. This low efficiency has often been attributed to slow growth rates of tumour cells in culture as compared with associated stromal cells, such as fibroblasts [5]. To overcome this issue, most established breast cancer lines have been derived from pleural effusions, which provide an abundance of dissociated, aggressive tumour cells with very few contaminating cell types. The pattern of growth of these tumour cells is characterized by a slow initial proliferation, followed by exponential expansion of a few cells, suggestive of clonal selection for cells that are particularly proliferative and amenable to culture [6–8].
Another caveat of cell culture is the loss of the in vivo microenvironment (changes summarized in [9]). During the derivation process, tumour cells are removed from a very complex, partially hypoxic three-dimensional microenvironment; maintained in nutrient media containing a surplus of glucose and growth factors; and passaged indefinitely at relatively high atmospheric oxygen levels. In such a drastically altered microenvironment, it would not be surprising if cell lines differed substantially from the tumours they were established to represent.
Genomic and transcriptional differences between cancer cell lines and tumour samples have been investigated in several studies [10–13]. For example, in gliomas, it was shown that expression profiles of tumour cell primary cultures were much closer to profiles obtained from clinically resected tumours than to profiles of immortalized cancer cell lines [14]. In breast cancer, clustering based on expression profiles has elucidated the many clinically relevant subtypes in cell lines and tumours (summarized in [15]) [16–20]. However, modern RNA-sequencing (RNA-seq) data have not yet been used to directly compare the expression profiles of breast cancer cell lines with breast tumours. In addition, in vitro signatures reflect the combined effect of adaptation to cell culture and selection for specific cellular subtypes. Separating the influence of these two phenomena has remained a substantial obstacle in any cell line–tumour transcriptional comparison.
Recent transcriptional profiling of a collection of breast cancer cell lines [21] and hundreds of tumours from The Cancer Genome Atlas (TCGA) [19] has enabled a direct mRNA comparison of cell lines and tumours. In this study, we focus on RNA-seq transcriptional profiles mined from TCGA and the Gene Expression Omnibus (GEO) series [GEO:GSE48213] and investigate the strengths and weaknesses of cell lines as in vitro breast cancer models. In addition, we seek to identify the breast cancer cell lines that are most transcriptionally representative of their respective tumour subtype. Importantly, we are able to correlate most of the highly differentially expressed genes to tumour stromal or immune signatures, highlighting the importance of considering the entire niche in cancer modelling. Finally, we summarize relevant breast cancer cell line genomic alterations. In our study, we used RNA-seq data to broaden the dynamic range of transcript detection and extend earlier efforts by including more cell lines and by considering and quantifying stromal and immune cell contributions to help elucidate the origin of detected differences.
Discussion
This study is the first transcriptional comparison of cancer cell lines and tumours to methodologically account for the contributions of tumour stromal and immune cellular components. We demonstrate, for the first time to our knowledge, using RNA-seq data, that breast cancer cell lines generally represent breast tumours, with notable exceptions. First, many extracellular proteins thought to be lost in breast cancers may actually be supplied in situ by the stroma. Second, many genes associated with proliferation and metabolism are highly expressed in culture. Hence, whereas certain aspects of breast cancer biology can be studied using breast cancer cell lines alone, others (in particular those involving factors in the extracellular space) should include additional relevant cell types.
This study revealed that, in general, basal/ER− cell lines were more representative of their respective tumours than luminal/ER+ cell lines. In addition, 60 % of cell lines in this study were ER−, as compared with only 23 % of the primary tumours (p < 0.0001 by two-tailed Fisher’s exact test), an overrepresentation of the ER− status in cell lines, which has been observed previously [1]. The reason for this discrepancy remains unknown. However, it may be due to the fact that most cell lines were obtained from metastatic tumours and pleural effusions and thus represent the most aggressive variants that could be adapted to culture (a trend previously reported in renal cancer [36]). We would expect this phenomenon to be especially pronounced for the ER+/luminal subtype, which is characteristically a less aggressive subtype of breast cancer. Additionally, as cells are grown in culture, the epithelial phenotype is lost in favour of more mesenchymal traits, a type of in vitro epithelial–mesenchymal transition which would result in greater transcriptional distance between the more epithelial ER+/luminal cell lines and the respective tumours [1].
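The ER− overrepresentation reported above was assessed with a two-tailed Fisher’s exact test on a 2 × 2 table of ER status by sample type. The following is a minimal sketch of such a test in plain Python. The function name is ours, and the counts are hypothetical placeholders chosen only to reproduce the reported proportions (60 % and 23 %); the actual sample counts are not stated in this section.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against rounding ties).
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# HYPOTHETICAL counts, for illustration only (not the study's data):
# 30 of 50 cell lines ER- (60 %), 115 of 500 tumours ER- (23 %).
p_value = fisher_exact_two_tailed(30, 20, 115, 385)
print(f"two-tailed p = {p_value:.2e}")
```

With real data, the four arguments would be the observed numbers of ER− and ER+ cell lines followed by the observed numbers of ER− and ER+ tumours.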
Despite the transcriptional differences between cell lines and tumours, we were nonetheless interested in determining the most transcriptionally representative breast cancer cell lines. In our analysis, we found that the correlation coefficients of individual breast cancer cell lines versus tumours varied from 0.41 to 0.58. This was remarkably similar to the range of 0.43–0.60 that was observed in an analysis of ovarian cell lines and tumours [11]. Interestingly, the top correlation of any individual cell line could be exceeded by a fictional cell line composed of the averages of all cell line gene expression values (luminal, 0.52 for BT483 vs. 0.62; basal, 0.58 for HCC70 vs. 0.60) (data not shown). This points to the importance of including multiple cell lines in any analysis to ensure that any observed phenomenon is not a product of a single outlier.
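The effect of averaging can be illustrated with a small sketch. It assumes a Pearson correlation (the type of coefficient is not specified in this section) and uses synthetic toy profiles with made-up line names; none of the values below come from the study. When line-specific deviations from the tumour profile point in different directions, they partly cancel in the averaged profile, which can therefore correlate better with the tumour profile than any single line does. This is a tendency, not a guarantee.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_profile(profiles):
    """Gene-wise average across several expression profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


# Synthetic toy data: one value per gene (six genes).
tumour_mean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cell_lines = {
    "line_A": [2.0, 1.0, 4.0, 3.5, 6.5, 5.0],
    "line_B": [0.5, 3.0, 2.0, 5.0, 4.0, 7.0],
    "line_C": [1.5, 2.5, 2.0, 4.5, 4.0, 6.5],
}

# Correlation of each individual line with the tumour profile.
per_line_r = {name: pearson(profile, tumour_mean)
              for name, profile in cell_lines.items()}
# Correlation of the averaged ("fictional") line with the tumour profile.
average_r = pearson(mean_profile(list(cell_lines.values())), tumour_mean)

for name, r in per_line_r.items():
    print(f"{name}: r = {r:.3f}")
print(f"averaged profile: r = {average_r:.3f}")
```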
A fundamental limitation of cell culture models is that the environment created by culture conditions is markedly different from the breast cancer microenvironment [9]. The loss of stromal and immune cells in culture is one major drawback of monoculture models. Emerging evidence supports the notion that tumour stromal cells play exceptionally important roles in tumour initiation, progression and metastasis [37–39]. In fact, studies have shown that depletion of fibroblast activation protein–expressing stromal cells leads to suppression of primary tumour growth and metastasis [40]. Our research indicates that loss of the stromal and immune components is the principal transcriptional difference between cell lines and tumours. It also suggests that the stroma has a unique and significant role that often is not accounted for in in vitro studies. For example, several studies have looked at the expression levels and functional roles of various Wnt antagonists (e.g., secreted frizzled-related proteins [SFRPs]) in cell culture, and researchers have drawn conclusions about their absence and mechanisms of action in this context [41–44]. However, given that we found the expression levels of various SFRPs to be high in tumours and strongly correlated with stromal scores, studying these proteins in tumour cell monoculture may not be appropriate. In fact, given their roles as matricellular proteins, it would not be surprising if their effects in vivo are quite different from those observed in vitro.
In broader investigations using gene set enrichment analysis, we observed an enrichment in cell line proliferative and metabolic gene sets, similar to those reported in other studies [45–47]. The upregulation of these gene sets could be due to two phenomena: (1) malignant cellular adaptation/selection or (2) enrichment of malignant cells in culture, such that genes more highly expressed in malignant cells appear upregulated in cell lines. If the latter is true, we would expect a negative correlation with stromal/tumour purity score. For one of the gene sets, DNA replication, we observed such a negative correlation with stromal score (r = −0.27). Thus, the expansion of malignant cells in cell culture likely plays a role in the upregulation of this gene set. However, none of the other upregulated proliferative/metabolic gene sets display this correlation. This suggests, on the one hand, that either the derivation process or the continuous culturing of cell lines selects for a highly proliferative subset of cells. On the other hand, many of the underrepresented gene sets were matrix- or immune-related and tightly correlated with stromal or immune scores, once again indicating that loss of the stromal and immune compartments has pronounced consequences in transcriptional programs observed in cell culture.
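The reasoning behind the expected negative correlation can be sketched with a simple two-compartment mixing model. This is an illustrative assumption of ours, not the method used in this study: bulk expression is modelled as a purity-weighted average of a malignant and a stromal compartment, and all expression levels and purities are invented. Under this idealized model, a gene expressed mainly in malignant cells falls as the stromal fraction rises, and a stroma-derived gene rises with it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bulk_expression(purity, malignant_level, stromal_level):
    """Bulk signal as a purity-weighted mix of two compartments (assumed model)."""
    return purity * malignant_level + (1.0 - purity) * stromal_level


# Invented tumour purities; the stromal fraction stands in for a stromal score.
purities = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
stromal_fraction = [1.0 - p for p in purities]

# Hypothetical gene expressed mainly in malignant cells (e.g. a replication gene).
replication_gene = [bulk_expression(p, 10.0, 2.0) for p in purities]
# Hypothetical gene expressed mainly in stromal cells (e.g. a matrix gene).
matrix_gene = [bulk_expression(p, 1.0, 8.0) for p in purities]

r_replication = pearson(replication_gene, stromal_fraction)
r_matrix = pearson(matrix_gene, stromal_fraction)
print(f"malignant-enriched gene vs stromal fraction: r = {r_replication:.2f}")
print(f"stroma-enriched gene vs stromal fraction:    r = {r_matrix:.2f}")
```

In this noise-free toy model the correlations are perfect. In real tumours, other sources of variation would weaken them, which is consistent with a modest observed coefficient such as the one reported for the DNA replication gene set.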
Competing interests
The authors declare they have no competing interests.
Authors’ contributions
KMV, SDF and LMP conceived and designed the analysis. KMV prepared the data and performed all data analyses. All authors participated in interpreting the results. KMV wrote the manuscript. SDF and LMP participated in revising the manuscript. All authors read and approved the final manuscript.