Introduction
Breast cancer is a clinically heterogeneous disease and consists of many different cell types, including normal and reactive stromal components in addition to the malignant neoplastic compartment. Moreover, it comprises a series of distinct malignant tumours that present diverse cellular features with varying differentiation status, distinct genetic changes, responses to therapy and outcome [1]. Likewise, the normal breast is also composed of different parenchymal and stromal cell types, with the terminal ductal-lobular unit being the most important feature with regard to neoplasia. The latter is composed of two morphologically recognisable cell types, epithelial cells on the luminal surface and basally located myoepithelial cells. While typical breast cancers have been traditionally regarded as exhibiting characteristics akin to luminal epithelial cells, recent data have shown that some also exhibit, in part or whole, myoepithelial/basal features [2-4]. Based on the restricted expression of genes representing the phenotypes of luminal epithelial and basal cells [4], major subtypes of breast cancer have been defined and linked to both long term survival [5] and their response to therapy [6]. Therefore, detailed characterisation of the normal luminal and myoepithelial/basal phenotypes is a prerequisite for understanding the genetic alterations that occur in breast cancers and how they may impact on disease progression and outcome.
The use of solid tissues, as in most previous breast cancer gene expression analyses, results in greatly enhanced complexity of data because of the widely varying degrees of stromal responses (desmoplasia) and inflammatory infiltrates in individual tumours. Laser capture microdissection partially alleviates this problem with respect to tumour samples, but is unsuited to the large-scale separation of the normal epithelial cell types in breast because of the close contact between these cells. Immunomagnetic separation of individual cell types from normal human breast tissue [7,8] and primary breast cancers [9] has enabled direct comparisons of normal epithelial and malignant epithelial cells to be made. Previous proteomic [9,10] and gene expression analyses of such samples [10-13] have established a partial molecular characterisation of the epithelial compartment in the normal breast and breast cancer [2], but, due to the limitations of technology available at the time of these studies, did not provide a comprehensive comparison of all proteins or transcripts.
Multiple large-scale analytical techniques now make it possible to capture entire transcriptomes of defined cell populations. Breast cancers have been extensively analysed with both expression arrays [14] and with direct sequencing techniques such as serial analysis of gene expression (SAGE) [15]. Although several studies have correlated expression data based on microarray and SAGE [16,17], a comprehensive genome-wide expression profile using a combination of complementary technologies has not yet been achieved for purified malignant epithelial breast cells in comparison with purified normal breast epithelial cells. In this study, massively parallel signature sequencing (MPSS) [18,19] and multiple genome-wide microarrays have been used to analyse immunomagnetically separated normal luminal epithelial cells and primary breast cancers substantially enriched for the neoplastic epithelial component. The aim of this study was to establish virtually complete coverage of transcripts deregulated in the neoplastic cells of human breast cancer. In addition, expression profiles from normal luminal and myoepithelial cells have been used to identify cell-type specific transcripts and ontologically related gene sets in the differentially expressed tumour epithelial transcriptome. The use of highly enriched cell preparations in combination with a multiplatform approach to their expression analysis has revealed novel markers and potential targets, the clinical significance of some of which has also been examined using tissue microarrays.
Discussion
Using highly enriched populations of malignant breast epithelial cells and normal epithelial cells, obtained from immunomagnetic cell sorting, we have established genome-wide molecular signatures specific to the epithelial compartments of both the normal and the malignant human breast. Combining gene profiles obtained from different expression platforms, including direct high-throughput sequencing (MPSS) and multiple microarray platforms, yielded a validated transcriptome comprising 8,051 differential transcripts. These data provide a basis for the molecular changes that occur in the transition from normal luminal to malignant epithelial cells, and also allow further analysis of solid breast tumour (neoplastic plus stroma) gene expression studies, enabling those genes of specific epithelial origin to be identified with respect to progression, prediction of outcome and metastasis. The expression data obtained from the normal luminal and myoepithelial cells have extended our previous analysis of these normal cell types [11], and provide gene sets that can be used to comprehensively specify the epithelial phenotype expressed in breast tumours, as well as defining new markers of each cell type.
The data presented here report for the first time the application and validation of the MPSS sequencing technology to malignant human breast epithelial cells and their normal counterparts. MPSS expression studies of different human cell lines and normal tissues have already shown that this technology represents the most comprehensive sequencing methodology available at present, in terms of gene coverage and quantitative assessment of gene expression [
22,
39]. With over 10^6 sequencing reactions per sample [
18,
19], it is comparable in scope with the now commonly used genome-wide microarray profiling methods, as also used in the present study. Comparative studies of genome wide data sets are entirely dependent on the choice of common denominator for annotation [
40]. By using our sequence based mapping, 97% of MPSS tags could be aligned with individual features on genome-wide microarrays, indicating that the vast majority of the expressed sequence tags in the normal and malignant breast epithelium MPSS libraries represent known transcripts, in agreement with the recent data suggesting that MPSS identifies very few truly novel genes [
39]. Given the significant methodological differences between microarray and MPSS analysis, the concordance of more than 65% of our MPSS differential data set with expression profiles obtained on several different microarray platforms represents a good overlap compared with other examples of sequence versus array data [
41]. However, a substantial number of differentially expressed genes (4,149) measured on at least two microarray platforms were not identified as such by MPSS, and a significant number of MPSS differential transcripts (2,440) were not confirmed on any array (Figure 1), implying relatively high false positive and false negative rates for the MPSS methodology. This probably reflects the known limitations of the MPSS technology [
39], particularly with regard to transcripts that were not detected (zero counts) in one sample, as well as genes lacking appropriate restriction enzyme sites required for this technology. However, individual microarray platforms themselves differ substantially [
42] and a multiplatform approach, as used here, clearly defines a robust DTET seen by every technology.
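The cross-platform comparison described above can be illustrated with a minimal sketch. It assumes that each platform's differential calls have already been mapped to a common gene identifier and reduced to a direction ("up" or "down"); the function name `compare_platforms`, the parameter `min_array_support` and the gene labels G1 to G5 are hypothetical and are not taken from the analysis pipeline used in this study. The toy data are invented purely for illustration and do not reproduce any result reported here.

```python
from collections import Counter


def compare_platforms(mpss, arrays, min_array_support=2):
    """Compare MPSS differential calls with calls from several array platforms.

    mpss   : dict mapping gene -> "up" or "down" (MPSS differential calls)
    arrays : list of dicts, one per microarray platform, same format
    Returns (concordant, mpss_only, array_only, fraction_concordant).
    """
    # Count how many array platforms make each (gene, direction) call.
    support = Counter()
    for calls in arrays:
        for gene, direction in calls.items():
            support[(gene, direction)] += 1

    # MPSS calls confirmed in the same direction on at least one array.
    concordant = {g for g, d in mpss.items() if support[(g, d)] >= 1}
    # MPSS calls not confirmed on any array.
    mpss_only = set(mpss) - concordant
    # Calls seen on enough arrays but not made (in that direction) by MPSS.
    array_only = {
        g for (g, d), n in support.items()
        if n >= min_array_support and mpss.get(g) != d
    }
    fraction = len(concordant) / len(mpss) if mpss else 0.0
    return concordant, mpss_only, array_only, fraction


# Hypothetical toy data: four MPSS calls and three array platforms.
toy_mpss = {"G1": "up", "G2": "down", "G3": "up", "G4": "up"}
toy_arrays = [
    {"G1": "up", "G2": "down", "G5": "up"},
    {"G1": "up", "G5": "up"},
    {"G2": "down", "G5": "up"},
]

concordant, mpss_only, array_only, fraction = compare_platforms(toy_mpss, toy_arrays)
print(sorted(concordant), sorted(mpss_only), sorted(array_only), fraction)
```

In this sketch, the `mpss_only` set corresponds to the class of MPSS differential transcripts not confirmed on any array, and `array_only` to genes called on at least two array platforms but not by MPSS. The thresholds are assumptions chosen to mirror the wording of the text rather than documented parameters.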
Another important feature of our DTET is the use of purified epithelial cells, derived by both positive and negative immunomagnetic sorting in which the contamination of malignant samples with stromal cells is reduced to a minimum, and normal luminal and myoepithelial cells are separated from short-term primary cultures. Although the profiling techniques used represent the global transcriptomes of purified normal and neoplastic breast epithelial cells in highly enriched preparations, it is conceivable that even a small contamination of the malignant samples by normal or reactive stromal cells, as well as the induction of inflammatory genes due to in vitro manipulation, could result in false positives. However, verification of the probable epithelial origin of differentially expressed genes can be obtained by comparing expression data from breast epithelial cell lines [
22], breast tumour cell lines or, as in the present study, by immunohistochemistry, all of which show that, for example, IL8 is a bona fide epithelial tumour-associated product [
43,
44]. One of the features of normal luminal epithelial cultures is the loss of estrogen receptor expression [
45]. The microarray gene expression profiling currently used to classify breast cancers supports the paradigm that ER status is the most important phenotype in breast cancer and has led to the classification of breast cancers into luminal A (ER-positive good prognosis) and luminal B (ER-positive poor prognosis), and ER-negative myoepithelial/basal and HER2 subtypes, each with distinct differences in prognosis and response to therapy [
4,
5,
46]. Genes identified in this study representing the normal luminal epithelial phenotype are distinct from the subset of genes that are associated with ER expression and are used to classify 'luminal' breast tumours. Thus, we are able to define the luminal phenotype independently of ER status. In contrast, our myoepithelial signature contains several members of the previously reported gene clusters identifying basal-like breast cancers. Some of these have been previously identified as myoepithelial genes in the normal breast epithelium, for example,
TIMP3, SPARC, JAG1, PRSS11 and CAV-1 [11], and some of them, such as S100A7, SPARC and CNN1, have previously been shown individually to be correlated with poor outcome [
5,
11,
47]. Since our cell type specific gene signatures were derived from phenotypically well characterised cell types compared to empirical stratification based on expression data, we were also able to identify a range of myoepithelial type genes in ER-positive tumours as well as those in basal-like breast cancers. Thus, although the majority of the primary breast tumours within our malignant pool were ER-positive 'luminal' tumours, a significant number of up-regulated gene sets also showed myoepithelial expression. The observation of myoepithelial genes such as
SFRP2, DCN, POSTN, LUM, COL1A2 and COL11A1, which showed higher expression in ER-positive compared to ER-negative breast tumours in two other breast cancer tumour profiling studies [
48,
49], proved the value of such an approach and demonstrated the heterogeneity of breast tumours with respect to the levels of luminal epithelial and myoepithelial gene expression. The potential clinical significance of the expression of myoepithelial/basal genes in ER-positive tumours has been highlighted by recent data showing that the promoter DNA methylation of the classic myoepithelial marker
S100A2 is correlated with a poor prognosis in ER-positive tumours [
50]. In contrast, increased expression of phosphoserine aminotransferase (encoded by
PSAT1), another gene identified in our myoepithelial transcriptome, was the strongest predictive marker for a poor response to tamoxifen therapy in ER-positive tumours [
50]. Our observation that the malignant epithelial expression of POSTN, also a myoepithelial/basal gene, is associated with poorer survival (P = 0.0083) in ER-positive tumours demonstrates that the normal epithelial annotation of tumour transcripts can identify many other types of myoepithelial/basal genes, including those associated with a poor outcome.
An important question is whether the expression of myoepithelial/basal genes in breast cancers is responsible for the prognosis and poor response to therapy or is merely a surrogate marker thereof. There are several lines of evidence to suggest that POSTN may play a role in the biology of breast cancer [
51,
52]. POSTN is a ligand of αvβ3 integrins and promotes adhesion and migration of epithelial cells [
51]. Clinical studies of periostin expression in human cancers have demonstrated that increased expression of POSTN is correlated with tumour angiogenesis and metastasis [
52‐
54]. In primary breast tumours, POSTN causes up-regulation of vascular endothelial growth factor receptor (VEGFR)-2 in endothelial cells [
52]. Elevated expression of VEGFs, the ligands for the VEGF receptors, as observed in some breast carcinomas as well as in our study, provides synergistic paracrine signalling through VEGFR-2 on endothelial cells, potentially promoting angiogenesis and dissemination. Although the expression of POSTN shows a weak correlation with Ki67 immunoreactivity, there is no evidence to suggest that POSTN itself influences proliferation or is a surrogate marker of proliferation rate. Rather, it seems more likely that its prognostic significance may be due to the altered therapeutic responses of POSTN-positive tumours to drugs like tamoxifen. The fact that tumour-specific expression of VEGFR-2 has been associated with an impaired response to tamoxifen therapy in ER-positive premenopausal breast cancer [
55] is in line with the poor prognosis of this cohort of breast cancers. Therefore, further studies are required to investigate if POSTN positivity is correlated with VEGFR-2 expression, thereby providing a molecular mechanism that links POSTN to endocrine resistance for ER-positive breast tumours.
Metastasis to bone occurs frequently in advanced breast cancer and is accompanied by debilitating skeletal complications [
56]. Among the up-regulated gene sets in the malignant sample with enrichment in myoepithelial/basal type genes in this study was a small family of genes involved in bone remodelling and skeletal development. Their expression in the human breast epithelial cells, including the normal myoepithelial cells, indicates that they play a significant role in epithelial cell biology, in addition to mesenchymal development. Many of these mesenchymal-specific genes, associated with osteoblasts, have previously been found overexpressed in other primary breast tumours [
57]. By acquiring the expression of such mesenchymal genes, the malignant epithelial breast cells may have an advantage in growth in the bone environment correlating with progression into a more aggressive cancer phenotype. Targeting such genes and proteins might, therefore, be a means of suppressing this phenomenon.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
This study was conceived by AG, AM and MJOH. The expression profiling and statistical analysis were carried out by AG, AM, KF and MI. The pathological analysis and immunohistochemistry were performed by JSRF and DS. ML performed the RT-PCRs. CI, BS, HV and CVJ participated in the sequence alignment and annotation of MPSS data. The manuscript was written by AG, AM, AMN and MJOH with help from JRSF, PSJ, AA, AJGS and RLS. All authors read and approved the final manuscript. AG and AM contributed equally to this work.