Introduction
Breast cancer is a heterogeneous disease with significant mortality associated with metastatic progression. Classification subdivides human breast cancer into six categories: Luminal A, Luminal B, HER2+, Basal, Claudin-low and normal-like [1]. Recent work suggests that additional subclasses exist within each intrinsic subtype, including three basal subtypes with striking differences in overall survival [2]. Further, The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE) projects show remarkable variability in genetic alterations beyond gene expression, both across and within subtypes of human breast cancer. Together, these genomic analyses demonstrate the complex nature of human breast cancer.
To more readily study mechanisms leading to breast cancer, research has turned to the mouse as a model. Mouse models of breast cancer have employed various methods of initiation, including mouse mammary tumor virus (MMTV) infection, chemical mutagenesis and genetically engineered mice (GEM). This pioneering work identified and tested the role of many oncogenes in breast cancer. With the insertion of MMTV into the genome, numerous key oncogenes were uncovered [3, 4]. The later development of MMTV-driven transgenics allowed for the development of spontaneous models. Following the identification of human epidermal growth factor receptor 2 (HER2) amplification in human breast cancer [5, 6], the observation that MMTV-driven expression of the activated rat form of HER2 (NeuNT) resulted in breast cancer reinforced the importance of HER2 as a driving oncogene [7]. More recently, models have been refined to include tissue-specific activation resulting in gene amplification, analogous to human HER2+ breast cancer [8], as well as temporal control where transgene expression can be activated or inactivated [9].
Individual mouse models have been used to model aspects of human breast cancer, and the selection of the appropriate model to compare to human breast cancer has been directed by phenotype or known genetic events. For instance, the MMTV-PyMT model is widely used to examine metastasis [10], while p53 knockout mammary epithelium transplanted into wild-type hosts results in tumors with various genetic mutations [11]. Another consideration is the histological subtype associated with tumors in GEM models; in addition, metastatic ability can be altered by genetic background [12]. Indeed, similarities between mouse models such as Neu and Wnt and their human counterparts have been previously noted [13, 14]. Importantly, in both human breast cancer and many GEM models, there is significant histological heterogeneity [15-17]. These attributes illustrate the importance and utility of mouse models for examining breast cancer.
With the number and variety of GEM models, it is important to consider how accurately these various systems model human breast cancer. Initial studies using intrinsic clustering revealed similarities between mouse models and human breast cancer, albeit in a limited number of samples [18]. Yet a more detailed characterization of a larger number of p53 null tumors revealed a variety of subtypes with strong similarities to human breast cancer [11], demonstrating the importance of examining a large number of samples to capture tumor heterogeneity and variability. Further, expanding the number of Myc-induced tumors revealed that a subpopulation of these tumors had similarities to claudin-low human breast cancer [19]. Taken together, recent comparative studies [11, 17, 19-22] highlighted a clear need for a comprehensive examination of the genomic features of mouse models of breast cancer and their relation to human breast cancer. To this end, we assembled an expansive dataset of mouse models of breast cancer. This dataset reveals the genomic heterogeneity of mouse models and offers a predictive resource for essential cell signaling pathways. Importantly, all comparisons between all models are made available with our report. These data demonstrate the similarities and differences of the various subtypes of mouse models to the key subtypes of human breast cancer and underscore the necessity for an informed choice of the appropriate mouse model for studying specific types of human breast cancer.
Discussion
Here we have described the genomic analysis of a dataset composed of publicly available gene expression data for mouse models of breast cancer. These data have been analyzed through a variety of methods to ask how mouse models are distinct, what properties they share and how they reflect human breast cancer. These data indicate that great care should be taken to choose the appropriate mouse model and that a genomic and histological characterization of tumors should be completed following experimentation.
In the examination of mouse models in the database, unsupervised hierarchical clustering revealed significant heterogeneity both between models and within models, which was pronounced in tumor models with a large number of samples. Between-model differences were fully expected given the unique initiating events causing tumor formation. However, prior studies with relatively few samples for each model did not demonstrate extensive within-model heterogeneity [18]. In comparison, we have demonstrated extensive heterogeneity within many models. In part this is due to differences between intrinsic clustering methods [80] and unsupervised hierarchical clustering. However, given that we have noted corresponding differences in fold change, GSEA predictions and pathway signature probabilities, it is likely that this is a reflection of the number of samples used in the analysis. As such, this provides an important caution to characterize a sufficiently large population of tumors to capture heterogeneity in the analysis.
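As an illustration of how unsupervised hierarchical clustering can expose within-model heterogeneity, the following is a minimal sketch rather than the pipeline used in this study. It assumes average linkage and a distance of 1 minus the Pearson correlation; the sample names and expression values are invented, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    names = list(profiles)
    # Precompute all pairwise sample distances once.
    dist = {frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average of all sample-to-sample distances between two clusters.
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Invented toy data: three tumors from a hypothetical "modelA" and one from
# "modelB". The third modelA tumor has a profile resembling modelB.
toy_profiles = {
    "modelA_1": [5.0, 4.8, 1.0, 1.2],
    "modelA_2": [5.2, 5.1, 0.9, 1.0],
    "modelA_3": [1.1, 0.8, 4.9, 5.3],
    "modelB_1": [1.0, 1.2, 5.0, 5.1],
}
toy_clusters = average_linkage_clusters(toy_profiles, 2)
print(sorted(toy_clusters))
```

In this toy example the third modelA tumor groups with the modelB tumor rather than with the other modelA tumors. With only the first two modelA samples, that within-model split would not be visible, which is the sample-size caution raised above.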
Given that there is typically a predominant histological pattern associated with a given GEM tumor type [81], it is not surprising that there is a predominant genomic pattern. Indeed, we noted for many models that histology is predictive of the genomic subtype. Interestingly, this relationship between histology and genomic subtype is capable of spanning tumor-initiating events from different mouse models. For example, EMT and spindle-type tumors from diverse models clustered together and were distinct from the non-EMT samples originating in the same model system. Thus, it is also critical for investigators to analyze all tumors from a given model for both histological and genomic patterns.
Mouse models were also investigated individually in comparison to the entire dataset using a variety of methods. This revealed characteristic gene expression patterns at the fold change level, specific GSEA enrichment effects and key pathway signature differences. In many cases, these results correlated with prior studies. For instance, annotation of fold change results predicted that Neu-induced tumors upregulated Krox 20, which is consistent with previous chromatin immunoprecipitation (ChIP) results [82]. When pathway signatures were examined, a large number of predictions could be made for pathways used in specific GEM tumor models. Importantly, these pathway signatures have previously been validated [2], and the model-by-model pathway predictions shown in Table 2 are highly consistent with previously published tests. For instance, the pathway signatures predicted a high probability of Src activation in PyMT tumors in the FVB background, and recent work has demonstrated the necessity for c-Src in PyMT-induced tumors [76]. Collectively, for the pathways listed in Table 2, we note agreement between the pathway signature predictions and the reported genetic crosses. Moreover, the pathway signature predictions are also reflective of additional mutations that accumulate in the samples. This was noted in the Myc- and TAG-induced tumors, where the Ras signature was predicted to be elevated, consistent with the large number of Ras activating mutations in these strains [15, 77]. Given that numerous published genetic tests are in agreement with the pathway predictions, the remaining cell signaling pathway predictions offer a large number of testable hypotheses. In the future, pathway predictions in the various models should prove to be an important resource for initiating studies into the importance of various signaling pathways in tumor biology.
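The comparison of one model against the rest of the dataset at the fold change level can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis code used here: it assumes expression values are already log2-transformed, so that the log2 fold change is a difference of group means. The gene names and values are invented, and the function name is hypothetical.

```python
from statistics import mean


def log2_fold_change(expression, labels, model):
    """Per-gene log2 fold change of `model` versus all other samples.

    expression: dict mapping gene -> list of log2 values, one per sample
    labels:     list of model names, one per sample (same order)
    """
    in_model = [i for i, lab in enumerate(labels) if lab == model]
    others = [i for i, lab in enumerate(labels) if lab != model]
    if not in_model or not others:
        raise ValueError("need samples both inside and outside the model")
    # On a log2 scale, the fold change is the difference of the group means.
    return {
        gene: mean(values[i] for i in in_model) - mean(values[i] for i in others)
        for gene, values in expression.items()
    }


# Invented toy data: four samples, two hypothetical genes.
toy_labels = ["Neu", "Neu", "Myc", "PyMT"]
toy_expression = {
    "geneA": [8.0, 9.0, 5.0, 6.0],
    "geneB": [4.0, 4.0, 4.0, 4.0],
}
neu_vs_rest = log2_fold_change(toy_expression, toy_labels, "Neu")
print(neu_vs_rest)
```

Here geneA has a log2 fold change of 3.0 in the Neu samples relative to the others, while geneB is unchanged at 0.0. A real analysis would also need a statistical test and multiple-testing correction, which this sketch omits.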
One of the key aspects of this study was the comparison between mouse models and human breast cancer. These data demonstrated similarities and differences between the two groups and should serve as an important consideration when attempting to extend the comparison of mouse models to human cancer. Taking into account the clustering data, we readily noted that the heterogeneity between human breast cancer samples was present within individual mouse models. Despite capturing the genomic diversity of the samples, we noted several samples with no genomic similarity to human breast cancer, including tumors from strains with other samples that had clear similarity to human breast cancer. This suggests that if conclusions are to be drawn from mouse models of breast cancer, the mouse samples should be compared and clustered with a variety of human tumors.
In addition to clustering of genomic data, we compared mouse models to human breast cancer through signaling pathway activation predictions. These results showed that for any given group of human breast cancer samples, there was a mouse model with similar pathway activation profiles. Using these results, it is possible to select the mouse model that most closely represents a group of human breast cancers for the signaling pathways of interest. However, it is critical to consider both clustering and pathway activation and to combine these methods to choose the most appropriate model to mimic human breast cancer. For example, to model HER2+ breast cancer and to study the role of HER2 in tumor development, research initially used the MMTV-Neu mice [7]. The gene expression data reveal that this strain does not associate with the HER2+ human samples through genomic clustering. However, mixture modeling indicated that a proportion of HER2+ human cancers did group with the MMTV-Neu samples at the level of pathway activation. This indicates that in some aspects the mouse model is appropriately related to human HER2+ breast cancer. Further, recent reports demonstrate that a strain of mice with conditional activation of Neu under the control of the endogenous promoter, which undergoes amplification [8], far more closely recapitulates human HER2+ breast cancer [21]. Taken together, these data illustrate the importance of fully characterizing and using all genomic information to select the appropriate model for examination.
Recent reports have described the development of serially transplantable human breast cancer samples that are grown in a murine host with clear genomic similarity to the primary human breast cancer samples [83], and this is an optimal model for specific studies. However, there is clear utility for GEM models, especially with regard to the ability to ask defined genetic questions about key signaling pathways in tumor biology. As such, the prior characterization of mouse and human breast cancer similarities was a critical development [18]. The expanded number of samples and methods of analysis in this report have illustrated additional components of mouse breast cancer biology that require careful consideration. Indeed, the extent of genomic heterogeneity was only appreciated previously for select models [11, 15-17], but our work indicates that this is a general characteristic across the majority of breast cancer model systems. As such, this work underscores the requirement to fully characterize mouse tumor biology at histological and genomic levels before a valid comparison to human breast cancer may be drawn. Thus, we have provided the complete files for all of the comparisons made in this manuscript, from fold change between models to GSEA and pathway predictions, with the intent that they be used as a resource to choose and compare mouse models in breast cancer research.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DH and EA conceived of the study, and participated in its design and coordination and helped to draft the manuscript. DH performed the experiments in this study. DH and EA interpreted the data. Both authors read and approved the final manuscript.