Introduction
Global messenger RNA expression analyses of human breast cancers have established five “intrinsic” molecular subtypes: luminal A, luminal B, basal-like, HER2-enriched, and the recently characterized claudin-low group [1]. These molecular entities have shown significant differences in incidence, survival, and responsiveness to therapies [1–4], and their information complements and expands the information provided by the classical clinical–pathological markers [5–8]. Importantly, studies focused on intrinsic molecular subtyping are improving our understanding of the biologic heterogeneity of breast cancer and its developmental cell(s) of origin [1, 9–11].
Although the ideal preclinical study should be performed with human tumor samples that represent the complete spectrum of the disease, this type of research is hampered, in part, by the lack of appropriate in vivo assays. Complementary to this approach are in vitro studies focused on tumor- or normal tissue-derived cell lines, which are used extensively by the breast cancer research community [12]. Many of these cell lines have served as model systems to dissect the biology of breast cancer and/or develop novel treatment strategies that are further tested in patients. In some cases, these studies have led to improvements for cancer patients. For example, the estrogen receptor (ER)-positive MCF-7 cell line has been useful for the study of the estrogen pathway and the development of efficacious anti-hormonal therapies such as tamoxifen [13, 14], while HER2-amplified SKBR3 and BT474 cell lines have helped to elucidate various mechanisms of resistance to anti-HER2 therapies [15, 16]. However, these preclinical studies have had limited impact on the management of breast cancer patients [17, 18], partly due to the incomplete understanding of the similarities and differences between these in vitro model systems and their relevant in vivo tumor counterparts.
Previous work has shown that the main genetic and transcriptional features of breast tumors are present in cell lines [19–22]. In 2006, Neve et al. [19] identified two major groups (basal and luminal) in a panel of ~50 breast cancer cell lines by independently comparing the global expression profiles of cell lines and primary breast tumors. Interestingly, the basal cluster was further subdivided into two subgroups: basal-A, which resembled the basal-like signature in primary breast tumors [2, 3] and basal-B, which exhibited a mesenchymal and a cancer stem cell (CSC)-like profile that was less similar to primary basal-like tumors. The identification of the basal-B group has been confirmed by three other groups [21–23], with one group calling them normal-like [23]. More recently, we and others have shown that a subgroup of 9 (MDA-MB231, SUM159PT, MDA-MB157, BT549, SUM1315MO2, MDA-MB436, MDA-MB435, HBL100 and Hs578T) of the 12 basal-B breast cancer cell lines best resemble the recently characterized claudin-low tumor subtype [1, 24]. However, it is still unknown if all the intrinsic subtypes are represented in cell lines.
In the human mammary gland, four subpopulations of cells have been identified and functionally characterized [9]. By utilizing a combination of fluorescence-activated cell sorting (FACS) with EpCAM and CD49f cell surface markers and a series of in vitro and in vivo experiments, Lim et al. [9] observed that the normal breast tissues have at least four subpopulations enriched with mammary stem cells/bipotent progenitors (MaSC/BiPs), luminal progenitors (pLs), mature luminal cells (mLs), and stromal cells (after excluding lineage positive cells, i.e., lymphocytes, red blood and endothelial cells). Using Lim et al.’s [9] gene expression data, we subsequently reported a differentiation model that tracks the epithelial differentiation hierarchy (MaSC/BiP → pL → mL) and is prognostically relevant. More importantly, we showed that the tumor intrinsic subtypes recapitulate the normal breast epithelial differentiation hierarchy, where claudin-low tumors and cell lines are the most similar to the MaSC/BiPs [1, 10]. These and other findings have led to new hypotheses regarding the potential cell of origin and/or transformation of the different breast cancer subtypes [10, 25, 26]. However, it is unknown where other cell lines, including normal human mammary epithelial cells (HMECs), fall into this hierarchy. Still less is known about the relationship of adult human mesenchymal stem cells (hMSCs) and embryonic stem cells (hESCs) to different breast tumor subtypes and cell lines.
In this report, we evaluated a large in vitro panel of breast cell lines and compared their features with (1) tumors, (2) four cell subpopulations of the normal breast, and (3) hMSCs and hESCs. Specifically, we show that all of the tumor subtypes except luminal A and normal breast-like are well represented in cell lines. In addition, we observed that the cell lines recapitulate many of the features of each normal breast cell subpopulation identified using FACS.
Discussion
In this report, we have characterized the phenotypic and molecular features of a large panel of cell lines derived from breast cancers and normal mammary tissues, and we have linked these features with the intrinsic subtypes of breast tumors, FACS-enriched cell subpopulations of the normal mammary gland, and two types of true stem cells. Specifically, we made the following observations: (1) breast cancer cell lines (BCCLs) in general resemble all the intrinsic subtypes of breast cancer except for luminal A, (2) BCCLs recapitulate all the differentiation statuses observed in the normal breast, with HMECs best resembling the MaSC/BiP-enriched subpopulation, (3) subpopulations of cells with claudin-low and basal-like features are typically found within the subset of triple-negative cancer cell lines with overall basal-like features, and (4) within these mixed basal-like cell lines (or the primary tumor xenograft WashU-WHIM2), the EpCAM+/CD49f+ cells are more proliferative and more tumorigenic than the claudin-low-like EpCAM−/CD49f+ fraction, which is more motile.
Established in vitro breast cancer cell lines are used extensively by the research community to address various aspects of cancer biology [12, 38–40]. Our data indicate that cell lines do recapitulate most of the differentiation states observed in breast cancer; however, we did not identify cell lines that resemble the good prognosis luminal A tumor type, which is the most frequent subtype identified in breast cancer [1–4]. One potential explanation for this finding is that the vast majority of luminal cell lines have been derived from metastatic tumor samples, such as pleural effusions (e.g., MCF7, T47D) or ascites (e.g., ZR75-1), thereby introducing a selection bias toward more aggressive subtypes, such as those observed in the poor prognostic luminal B subtype. In addition, the 2D in vitro assay itself and/or the media conditions used for cell culture might be a harsh environment for luminal A-like cells. This is also reflected by the fact that, although 10 % of lineage-negative cells in the normal breast FACS experiments are mL or pL, none of these cells could be readily identified in our 2D cultures of primary HMECs. In fact, the success rate of obtaining a cell line from ER+ primary tumors has been reported to be <10 % [41, 42]. This suggests that cells with low adherence and high proliferation and migration capabilities are more likely to be selected for further passage, thus precluding the establishment of low proliferative and highly adherent luminal A/mL cells. This hypothesis could explain why, among the 65 BCCLs evaluated, 66 % (43/65) are ER-negative, which is clearly not representative of the subtype incidence in patients.
The overall gene expression profiles of the cell lines that technically overlapped (n = 52) across four independent cell line data sets were highly similar. However, seven (13 %) discrepancies were noted. Most of these discrepancies occurred in cell lines whose gene expression profiles were borderline between two subtypes, except for the HCC1500 cell line (Supplemental material). For example, the ER-negative/HER2-negative MDA-MB468 cell line is basal-like in two data sets (Hollestelle et al. [21] and UNC105) and shows borderline significance for HER2-enriched in the other two data sets, while the ER-positive/HER2-amplified BT474 cell line is called HER2-enriched in three data sets and luminal B in Kao et al. [22]. This finding could be explained by the specific genotypic/phenotypic features of these cell lines that are also observed in the two subtypes. For example, BT474 is a known ER+/HER2-amplified cell line [43, 44], while MDA-MB468 is an ER-negative/HER2-negative cell line with EGFR amplification [45], which might activate, in part, the HER2 pathway as in a HER2-amplified tumor.
The cell line data presented here also support our previously reported relationship between the basal-like and the claudin-low phenotypes [1]. Namely, we observed that the three ER-negative/HER2-negative cell lines classified as basal-like (HCC1143, SUM149PT) or claudin-low (HCC38) have basal-like and claudin-low subpopulations of cells within them, albeit in different proportions. In addition, similar to EpCAM−/low/CD49f+ cells in SUM149PT [1], claudin-low EpCAM−/low/CD49f+ cells from the HCC1143 cell line can differentiate and give rise to basal-like EpCAM+/CD49f+ cells. In vivo, tumors obtained from the EpCAM−/low and EpCAM+/high fractions show a FACS profile similar to the starting cell line (or tumor for WashU-WHIM2). Thus, even when only EpCAM−/low claudin-low-like cells are used, the natural state and balance are re-established both in vitro and in vivo.
Furthermore, we have shown that despite expressing different levels of surface markers CD44 and CD24, the gene expression differences between EpCAM−/low/CD49f+ versus EpCAM+/CD49f+ cells within each cell line are highly similar across all three cell lines, suggesting that similar biological events (e.g., migration capability) are occurring between these two fractions. However, it is important to note that we did not evaluate other stem cell or TIC markers such as ALDH1 [46], and that the Matrigel used during the xenotransplantation assay can influence the properties of stem cells and TICs [47, 48]. In any case, recent RNAi knockdown experiments in the SUM149PT cell line have identified Smarcd3/Baf60c, and thus the SWI/SNF chromatin-remodeling complex, as a key mediator of this EMT by activating WNT signaling pathways [49].
Human epithelial cell lines derived from normal breast tissue are used extensively by the research community either as primary cells or after immortalization by exogenous hTERT transduction [30, 50, 51]. Although their basal origin and MaSC/BiP capacity have been previously suggested by others [51], no study to the best of our knowledge has specifically addressed which epithelial cell type these cell lines best resemble. Using genomic, FACS, and IF staining analyses with luminal, basal, and mesenchymal markers, we observed that both immortalized and primary HMECs in the pre-stasis stage [52] show a phenotype similar to the MaSC/BiP-enriched subpopulation as defined by Lim et al. [9]. Indeed, we observed that the vast majority of cells within HMECs express high levels of basal keratin 5 and are vimentin-positive. This is concordant with our data and Lim et al.’s [9] data showing that the highest percentage of keratin 5 and vimentin positivity is observed in the MaSC/BiP subpopulation. On the other hand, when compared to tumors, HMECs showed a differentiation state between the claudin-low and the basal-like tumor subtypes, concordant overall with a simultaneous mesenchymal and basal state within these cells.
We and others have previously shown that the claudin-low tumors and cell lines are enriched for CSC biological processes [1, 53–56]. In this report, we have observed that although this subtype is more similar to the MaSC/BiP-enriched subpopulation than the other breast cancer subtypes, claudin-low cell lines show a loss of epithelial markers with acquisition of a stromal state that also resembles the stromal-enriched subpopulation (i.e., fibroblasts) as defined by Lim et al. [9]. This is concordant with the seminal article by Mani et al. [30] showing that the acquisition of a full epithelial-to-mesenchymal transition (EMT) after transfecting the EMT-inducing transcription factors TWIST1 or SNAI1 into an immortalized HMEC increases the self-renewal capacity (a feature of stemness [51]) of the cells and, when the cells are transformed with the KRAS oncogene, allows them to form tumors more efficiently in nude mice. In this report, using the same cell line variants developed by Mani et al. [30], together with a combination of genomics and EpCAM and CD49f surface markers, we have shown that this mesenchymal transformation actually resembles a MaSC/BiP → stromal direction. Nonetheless, Battula et al. [34] have further characterized these EMT-derived HMECs and have shown that these cells are similar to bone marrow-derived mesenchymal stem cells with the capacity to differentiate into multiple tissue lineages such as osteoblasts, chondrocytes, and adipocytes. Intriguingly, transformation into tissue types other than the ones found in the mammary gland, such as bone or cartilage, is also observed in metaplastic tumors [57, 58], a rare histological type of breast cancer associated with poor prognosis and enriched for CSC/claudin-low profiles [56, 59]. Overall, these data suggest that the acquisition of a full mesenchymal state induces a multi-potent state more similar to mesenchymal stem cells than the more restricted MaSC/BiP, which seem to be in a partial mesenchymal and basal state. Thus, claudin-low tumors and cell lines might have an origin in a yet unidentified cell type that is less differentiated than the MaSC/BiP-enriched subpopulation as defined in Lim et al. [9]. Conversely, the cell of origin of claudin-low and basal-like tumors could still be a MaSC/BiP phenotype, featuring various degrees of EMT induction, with claudin-low cells going to the full EMT state. Alternatively, the cell of origin of claudin-low tumors could be a highly undifferentiated normal cell that already expresses these stromal features, thus without the need for an EMT. Further studies that combine molecular profiling and lineage tracing experiments are needed to determine the cell of origin of each subtype.
To conclude, the integration of global gene expression data of cell lines with tumors and normal cell subpopulations is a novel strategy that could be used in other tumor types, because it allows an objective determination of which tumor or cell type each cell line best resembles. The results presented here should also help to improve our understanding of the widely used encyclopedia of breast cell line models, and provide more precise tools for the study of breast cancers.
Materials and methods
UNC human breast tumor and cell line microarray data sets
For human tumor and normal tissue samples, we used all the microarrays and clinical data from Prat et al. (UNC337, GSE18229) [1]. For cell lines and sorted tissue, RNA was purified using RNeasy Mini kit and profiled as described previously using oligo microarrays (Agilent Technologies, USA) [60]. All microarray cell line data have been deposited in the Gene Expression Omnibus under the accession number GSE50470 (referred to here as UNC105). The probes or genes of the combined UNC337 and UNC105 data set for all analyses were filtered by requiring the lowess normalized intensity values in both sample and control to be >10. The normalized log2 ratios (Cy5 sample/Cy3 control) of probes mapping to the same gene (Entrez ID as defined by the manufacturer) were averaged to generate independent expression estimates.
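The filtering and averaging step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the tuple layout, and the toy probe values are hypothetical, and the intensities are assumed to be already lowess normalized for a single array.

```python
import math
from collections import defaultdict


def gene_level_log2_ratios(probes, min_intensity=10.0):
    """Collapse probe-level two-channel data to one log2 ratio per gene.

    probes: iterable of (gene_id, cy5_sample, cy3_control) tuples holding
            normalized intensities for one array (hypothetical layout).
    A probe is kept only if BOTH channels exceed min_intensity; the log2
    ratios of the surviving probes are averaged per gene.
    """
    per_gene = defaultdict(list)
    for gene_id, cy5_sample, cy3_control in probes:
        if cy5_sample > min_intensity and cy3_control > min_intensity:
            per_gene[gene_id].append(math.log2(cy5_sample / cy3_control))
    return {gene: sum(vals) / len(vals) for gene, vals in per_gene.items()}


# Toy example (made-up numbers): the first KRT5 probe fails the filter.
example = [
    ("ESR1", 400.0, 100.0),
    ("ESR1", 1600.0, 100.0),
    ("KRT5", 5.0, 200.0),
    ("KRT5", 50.0, 200.0),
]
ratios = gene_level_log2_ratios(example)
```

Genes whose probes all fail the intensity filter simply do not appear in the result, which mirrors treating them as missing for that array.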
Integration of three independent cell line data sets to the UNC337-UNC105 set
We used our cohort of cell lines (UNC105) and three publicly available microarray cell line data sets: Neve et al. (http://icbp.lbl.gov/ccc/index.php) [19], Hollestelle et al. (GSE16795) [21], and Kao et al. (http://smd.stanford.edu/) [22]. For all public data sets, raw data were normalized using the robust multi-array analysis normalization approach. To integrate all the data sets, we assumed that the five matched cell lines that are common to all four cohorts were the same and thus used them as controls. In supplemental material, a diagram summarizes the different microarray data sets analyzed in the different figures and the combination strategy for molecular subtyping each cell line.
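The text does not spell out how the shared cell lines were used as controls. One plausible reading, shown here purely as an assumption, is a per-gene shift that makes the mean of the shared lines in each external data set match their mean in the reference set. The function name, data layout, and values below are hypothetical.

```python
def align_to_reference(reference, other, shared_lines):
    """Shift `other` gene-wise so shared cell lines match `reference` on average.

    reference, other: dict mapping cell line name -> {gene: expression value}.
    shared_lines: names of cell lines present in both data sets (the controls).
    Assumption: a simple additive per-gene offset; the original study may have
    used a different adjustment.
    """
    # Only genes measured for every shared line in both data sets can be aligned.
    genes = set.intersection(
        *(set(reference[s]) & set(other[s]) for s in shared_lines)
    )
    n = len(shared_lines)
    offsets = {}
    for gene in genes:
        ref_mean = sum(reference[s][gene] for s in shared_lines) / n
        other_mean = sum(other[s][gene] for s in shared_lines) / n
        offsets[gene] = ref_mean - other_mean
    return {
        sample: {g: values[g] + offsets[g] for g in genes if g in values}
        for sample, values in other.items()
    }


# Toy example with made-up values for a single gene.
unc = {"MCF7": {"ESR1": 2.0}, "BT474": {"ESR1": 1.0}}
external = {"MCF7": {"ESR1": 3.0}, "BT474": {"ESR1": 2.0}, "T47D": {"ESR1": 4.0}}
aligned = align_to_reference(unc, external, ["MCF7", "BT474"])
```

After alignment, cell lines that are unique to the external data set (T47D in the toy example) are expressed on the same scale as the reference cohort.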
Intrinsic subtype classification of cell lines
For the basal-like, HER2-enriched, luminal A, luminal B, and normal breast-like intrinsic subtype classification, we calculated the distance of each cell line to each of the tumor subtype centroids and assigned the subtype with the lowest distance. Next, claudin-low cell lines were identified using the previously reported 9-cell line claudin-low predictor [1]. Samples identified as claudin-low were called claudin-low regardless of the previous subtype call. Euclidean distances and subtype calls for all cell lines are provided in Supplemental data.
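The two-step call described above (nearest centroid by Euclidean distance, then a claudin-low override) can be sketched as follows. The centroid values and gene names are invented for illustration, and the claudin-low predictor itself is not implemented: its result is passed in as a boolean flag, which is an assumption of this sketch.

```python
import math


def euclidean(profile, centroid, genes):
    """Euclidean distance between two expression profiles over `genes`."""
    return math.sqrt(sum((profile[g] - centroid[g]) ** 2 for g in genes))


def call_subtype(profile, centroids, is_claudin_low=False):
    """Assign the subtype whose centroid is closest to `profile`.

    profile: {gene: value} for one cell line.
    centroids: {subtype name: {gene: value}} derived from tumors.
    is_claudin_low: output of a separate claudin-low predictor (hypothetical
                    input here); when True it overrides the centroid call.
    Returns (subtype call, {subtype: distance}).
    """
    genes = set(profile).intersection(*(set(c) for c in centroids.values()))
    distances = {
        name: euclidean(profile, centroid, genes)
        for name, centroid in centroids.items()
    }
    call = min(distances, key=distances.get)
    if is_claudin_low:
        call = "claudin-low"
    return call, distances


# Toy centroids with made-up values for two genes.
toy_centroids = {
    "basal-like": {"KRT5": 2.0, "ESR1": -2.0},
    "luminal B": {"KRT5": -1.0, "ESR1": 2.0},
}
toy_profile = {"KRT5": 1.5, "ESR1": -1.0}
call, dists = call_subtype(toy_profile, toy_centroids)
override_call, _ = call_subtype(toy_profile, toy_centroids, is_claudin_low=True)
```

Returning the full distance table alongside the call makes borderline cases visible, for example a cell line that sits nearly equidistant from two centroids.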
Breast cancer cell lines, and immortalized HMEC/HMFs
SUM159PT (Asterand) and SUM1315MO2 cells (Asterand) were maintained in Ham’s F12 with 5 % fetal bovine serum (FBS), insulin (5 μg/ml), hydrocortisone (1 μg/ml, SUM159PT-only), and EGF (10 ng/ml, SUM1315MO2-only). MCF-7, BT474, SKBR3, HCC1428, HCC1187, HCC1143, BT549, HCC1395, HCC38, UACC893, ZR75-1, HCC1500, T47D, and HCC1937 were cultured in RPMI with 10 % FBS [61]. SUM149PT was maintained in HuMEC media with supplements (Gibco) with 5 % FBS [62]. MDA-MB231, Hs578T, and MDA-MB436 were cultured in DMEM (high glucose) with 10 % FBS. HME-CC (BABE) [61], SUM102PT, HMLE, HMLE-SNAI1, HMLE-TWIST1, and HME31-hTERT no. 16C (ME16C) [61] were cultured in HuMEC media with supplements (Gibco). MDA-MB468 was cultured in Leibovitz’s L-15 medium with 10 % FBS. HMLE, HMLE-SNAI1, and HMLE-TWIST1 cell lines were a kind gift of Sendurai A. Mani (University of Texas M.D. Anderson Cancer Center). An immortalized human mammary fibroblast cell line (called here HMF4) was a kind gift of Charlotte Kuperwasser (Tufts University School of Medicine). All cell lines were grown at 37 °C and 5 % carbon dioxide, and were obtained from the American Type Culture Collection unless otherwise specified. We also obtained total RNA from the following collaborators: Jeffrey M. Rosen and Rachel Schiff (Baylor College of Medicine; MCF10A, MDAMB415, MDAMB435, MDAMB134; BT483, CAMA1, UACC812, ZR75B); Ned Sharpless (UNC; UACC893); Sendurai A. Mani and Wendy Woodward (University of Texas M.D. Anderson Cancer Center; MCF12A, MCF12F, MDAIBC3, SUM190PT).
Mammary tissue and xenograft tumor tissue preparations
Fresh human normal breast tissues from five reduction mammoplasties were obtained using Institutional Review Board approved protocols. Unless otherwise stated, all reagents were from Stem Cell Technologies. Samples were minced and digested at 37 °C for 16 h in DMEM/F12 (GIBCO #11330) containing 0.5 μg/ml hydrocortisone, 5 μg/ml insulin, and 1× collagenase/hyaluronidase (#07912). Xenograft tumor tissues were dissociated for 2 h. The pellet from digested tissue was resuspended by pipetting for 5 min in warm 0.05 % trypsin–EDTA (GIBCO #25300054) followed by addition of a 1:10 mixture of DNase I (#07900) and Dispase (#07923). Red blood cells were removed by lysis in a 1:4 mixture of cold Hanks’ balanced salt solution (#37150) containing 2 % FBS (HF) and 0.8 % ammonium chloride solution (#07850). Cells were resuspended in HF and filtered through a 40 μm cell strainer (BD Falcon #352340) to obtain single cell suspensions.
Isolation of primary HMECs
Tissue obtained from four reduction mammoplasties was processed to obtain organoids. For this purpose, dissociated tissue, prepared as described above, was passed through 40 μm cell strainers. Organoids were collected from the top of the strainers using HMEC culture media, plated in 2D cultures, and maintained in HuMEC media with supplements (Gibco). RNA was purified from all primary HMECs before passage 3 (pre-stasis stage) [52]. We also obtained total RNA of four primary HMECs isolated by Pilar Blancafort (UNC; HMECPB1, HMECPB2, HMECPB3, HMECPB4) [63].
Isolation of primary HMFs
Single-cell suspensions obtained from dissociation of three independent reduction mammoplasties, as described above, were cultured in DMEM/F-12 medium with 10 % FBS.
hESC and hMSCs
Two independent NIH hESC cell lines (H9 and H7) were obtained from the University of North Carolina Embryonic Stem Cell Core directed by B. Matthew Fagan. Commercially available hMSCs were purchased from Millipore, PromoCell, and Lonza.
Flow cytometry
Cells obtained from dissociated normal or tumor tissue, or trypsinized cell lines, were counted, washed with HF, and stained for 30 min at 4 °C with antibodies specific for human cell surface markers from BD Pharmingen, unless otherwise noted: EpCAM-FITC (Stem Cell Technologies, #10109), CD49f-PE-Cy5 (#551129), CD24-PE (#555428), CD44-APC (#559942), CD31-FITC (#555445), and CD45-FITC (#555482). Cells were washed to remove unbound antibodies and immediately analyzed using a Beckman-Coulter (Dako) CyAn ADP or sorted using an iCyt Reflection instrument. Cell viability was determined by using either blue-fluorescent reactive dye (Molecular Probes #L23105) or 7AAD (Molecular Probes #A1310). Dead cells and cells positive for lineage markers CD31 and CD45 were removed during sorting experiments. RNA was purified from sorted cells using RNeasy Mini kit (Qiagen).
Cell proliferation assay
One thousand cells from each sorted fraction were plated in 36 wells of a 96-well plate. At each time point, 20 μl of MTS-PES reagent, as provided in the CellTiter 96® AQueous One Solution Cell Proliferation Assay (Promega, USA), was added to each well, and absorbance at 490 nm was recorded after 1 h of incubation. Three replicates for each time point and cell line were measured.
Immunofluorescence
Cell lines and normal breasts were processed using standard immunofluorescence staining methods as previously described [4]. The primary antibodies and their dilution were anti-vimentin (mouse anti-human IgG1-Kappa, dilution 1:100; Invitrogen/Zymed), anti-cytokeratin 5 (rabbit anti-human/mouse, dilution 1:50; Abcam, #ab24647), anti-cytokeratin 8 (CAM 5.2, mouse anti-human, dilution 1:2; Becton–Dickinson, #349205 and Zymed 18-0213, monoclonal, dilution 1:50).
TIC experiments
The luciferase-stable SUM149PT cell line and tumors obtained from the WashU-WHIM2 xenograft model were FACS-sorted into subpopulations based on EpCAM and CD49f expression as described earlier. FACS-sorted cell fractions were placed in HuMEC media with supplements, 5 % FBS, and 5 % Matrigel™. For the SUM149PT cell line, three different aliquots containing 100, 1,000, and 10,000 cells were injected into five nude mice each. Tumor volume was measured every 5–7 days by caliper in two dimensions. Experiments were done in triplicate. For the WashU-WHIM2 model, 250,000 cells of each fraction were injected into four NOD scid gamma mice.
Statistical analyses
Biologic analysis of microarray data was performed with the DAVID annotation tool (http://david.abcc.ncifcrf.gov/) [64]. SAM was performed in Excel as previously described [1]. ANOVA, Student’s t tests, and exact hypergeometric probability for gene expression data and Pearson correlation for protein–gene expression were performed using R (http://cran.r-project.org). Reported p values are two-sided.
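For readers who want to see what two of these statistics compute, the following Python sketch reimplements the exact hypergeometric probability and the Pearson correlation from their textbook definitions. It is illustrative only: the analyses above were run in R, the function names are hypothetical, and the text does not state which tail of the hypergeometric distribution was used, so both the point probability and an upper tail are shown.

```python
import math


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k): exactly k successes in `draws` draws without replacement."""
    return (
        math.comb(successes, k)
        * math.comb(population - successes, draws - k)
        / math.comb(population, draws)
    )


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k), the usual over-representation probability (assumed use)."""
    upper = min(successes, draws)
    return sum(
        hypergeom_pmf(i, population, successes, draws)
        for i in range(k, upper + 1)
    )


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy inputs: 2 of 4 genes are "hits"; probability that a 2-gene list
# contains at least one hit, and a perfectly linear protein-gene pair.
p_tail = hypergeom_upper_tail(1, population=4, successes=2, draws=2)
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In R, the corresponding built-in functions are `dhyper`, `phyper`, and `cor`.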