Introduction
Breast cancer is a heterogeneous disease that encompasses a range of phenotypically distinct tumour types. Underlying this heterogeneity is a spectrum of molecular alterations and initiating events that manifest clinically through a diversity of disease presentations and outcomes. Novel therapeutic strategies are increasingly being investigated and implemented, but unpredictable response and the development of resistance to adjuvant therapy remain major challenges in the clinical management of breast cancer patients.
The key to optimizing and targeting therapy lies in a more complete understanding of the complex molecular interactions that underlie breast cancer and contribute to its heterogeneous nature. Breast-cancer-related genes have been extensively investigated, largely through the development of high-throughput array-based gene expression profiling platforms. The substantial datasets that have ensued have enabled us to decipher in depth some of the molecular intricacies associated with breast cancer, and have expanded our knowledge of the genetic pathways associated with breast carcinogenesis, resulting in classification systems predictive of outcome [1,2].
Breast tumours can now be classified into major subtypes on the basis of gene expression – luminal, v-erb-b2 erythroblastic leukaemia viral oncogene homolog 2 receptor (HER2/neu)-overexpressing and basal-like – and further analysis has identified additional subtypes within the original subgroups [3]. The expression of specific genes, such as the oestrogen receptor (ER) and HER2/neu, is indicative of outcome in breast cancer patients, and the clinically relevant subgroupings are based broadly on ER/progesterone receptor (PR)/HER2/neu status. The ability to classify breast cancers in this manner has obvious beneficial implications for the development of targeted therapies; multigene prognostic and predictive tests have been developed and commercialized, and have become established as tools in breast cancer diagnostics [4]. As yet, however, there is little knowledge regarding the precise regulation of these genes and receptors.
MicroRNAs (miRNAs) are short (~22 nt), single-stranded, noncoding RNAs that have recently been recognized as a highly abundant class of regulatory molecules. They are thought to regulate up to one-third of the human genome via sequence-specific regulation of post-transcriptional gene expression by targeting mRNAs for cleavage or translational repression [5]. miRNAs have recently been identified as key players in cellular processes including self-renewal, differentiation, growth and death [6], all of which are dysregulated in carcinogenesis. There is increasing evidence to suggest that miRNAs may be responsible for a large proportion of breast cancer heterogeneity. A number of miRNAs have been shown to be dysregulated in breast cancer [7-10], and specific miRNAs functioning as regulators of tumorigenicity, invasion and metastasis have been identified [11-14]. Furthermore, miRNA regulation of ER and HER2/neu, known to be of prognostic significance in breast cancer, has been demonstrated [15,16]. As each miRNA can target up to 200 mRNA sequences, and mRNAs can have multiple miRNA target sites [5], it is probable that further miRNA regulators of these genes remain to be determined.
Expression profiling of miRNA to classify breast tumours according to clinicopathological variables currently used to predict disease progression is of particular interest. First, profiling highlights the potential to identify novel prognostic indicators, which may contribute to improved selection of patients for adjuvant therapy. This approach has already shown promise with genomic signatures [2], and miRNA profiles appear to be more accurate than mRNA profiles [17]. Second, the identification of miRNAs with regulatory roles in clinically distinct breast tumour samples could identify novel targets for therapeutic manipulation.
Despite its apparent clinical application, microarray technology remains deficient with regard to its translation into routine clinical practice. There has been little overlap between the breast cancer gene sets reported, leading to questions regarding their biological significance and reproducibility [18]. Array technology is highly dependent on bioinformatics, mathematics and statistics to produce biologically relevant results. The generation of high-complexity microarray data has necessitated the development of novel data analysis methodologies that can cope with data of this nonlinear and highly dimensional nature. Conventional methods such as hierarchical clustering have shown limitations for the modelling and analysis of high-dimensionality data [19].
Artificial neural networks (ANNs) are a form of artificial intelligence that can learn to predict, through modelling, answers to particular questions in complex data. The models produced by ANNs have been shown to predict well for unseen data and to cope with complexity and nonlinearity within the dataset [20,21]; these features mean that ANNs have the potential to identify and model patterns in this type of data to address a particular question. ANNs are therefore able to determine patterns or features (for example, in genes or proteins) within a dataset that can discriminate between subgroups of a clinical population (for example, disease and control), or disease grades [22]. Indeed, this discrimination has been previously demonstrated in different tumour types [22,23]. These patterns can combine into a fingerprint that can accurately predict the subgroups.
Our aims in the present study were to identify miRNA signatures using ANNs that accurately predict the ER, PR and HER2/neu status of breast cancer patients, thus identifying potential biologically relevant miRNAs and providing further insight into breast cancer aetiology and regulation.
Discussion
In the wake of molecular profiling and the identification of intrinsic subtypes, breast cancer is now considered a heterogeneous group of disease entities with distinct clinical, pathological and molecular features. This biological heterogeneity has implications for treatment; response to therapy can be predicted by subtyping tumours based on their expression profiles [2]. The molecular subclasses of breast cancer that are predictive of prognosis are based on their expression of specific genes including ER and HER2/neu: luminal-A subtype, ER+/HER2/neu-; luminal-B subtype, ER+/HER2/neu+; basal-like subtype, ER-/PR-/HER2/neu-; and HER2/neu-overexpressing subtype, ER-/HER2/neu+ [1]. The expression of these receptors alone has also been shown to have an effect on chemotherapy sensitivity [28]. Furthermore, the only targeted therapies currently used in the management of breast cancer are directed at these receptors: ER-positive tumours are treated with endocrine therapy in the form of selective ER modulators, pure anti-oestrogens such as fulvestrant, which completely inhibits ER signalling, or aromatase inhibitors, which suppress extragonadal oestrogen synthesis. The monoclonal antibody trastuzumab has been developed to target HER2/neu, while lapatinib inhibits HER2/neu-associated tyrosine kinase activity.
The specific combination of receptor status has a significant impact on the outcome of these targeted therapies. HER2/neu-positive breast cancer is less responsive to any type of endocrine treatment [29]; approximately one-half of HER2/neu-positive breast cancers are also ER-positive, and this subgroup (luminal B) is thus more refractory to endocrine therapy despite its ER-positive status. In addition, many patients with HER2/neu-positive breast cancers either do not respond to trastuzumab or eventually evade it, through de novo or acquired mechanisms of therapeutic resistance. Patients whose tumours are HER2/neu-negative and ER-negative (basal-like/triple-negative) pose a particular therapeutic challenge, as these tumours typically exhibit aggressive clinical behaviour and a poorer prognosis. Focused research has revealed promising strategies for treating this subtype of breast cancer, including platinum agents, epidermal growth factor receptor (EGFR)-targeted agents and poly(ADP-ribose) polymerase (PARP) inhibitors; however, there is as yet no specific target for effective tailored therapy in this subgroup.
Clearly the hormone (ER and PR) and HER2/neu receptors are vitally important to the current classification and management of breast cancer; however, there is little knowledge regarding the precise regulation of these receptors. For this reason we sought to identify miRNAs associated with these receptors.
Microarray profiling is a useful strategy for examining global gene and miRNA expression [17]. Messenger RNA profiling has been central to breast cancer subtyping. Adaptation of microarray-devised gene sets into routine clinical practice, however, has been hindered by the apparent lack of consensus between gene sets. One reason is that classical computational analysis of such highly dimensional microarray data has not proved sufficiently robust. The inherent noise (for example, experimental error, sample and chip variability) can significantly interfere with the development of accurate predictive models, and their performance is compromised by their modelling of extraneous portions of the dataspace. Michiels and colleagues questioned the robustness of the analysis of several microarray studies, and found that the molecular signatures were largely dependent on the selection of patients in training sets and that several of the largest studies addressing cancer prognosis failed to classify patients better than randomly [30].
ANNs were chosen as the bioinformatics tool for microarray data analysis for the present study due to their ability to cope with complex data and their potential for modelling data of high nonlinearity. For this reason, they have been widely applied to a range of domains including character/face recognition [31], stock market predictions [32] and survival prognosis for trauma victims [33]. ANN model development is achieved by a training process involving the adjustment of the weighted interconnections between nodes within the neural network over a defined number of epochs. This adjustment occurs by the iterative propagation of the predictive error back through the entire network with a learning algorithm (for example, the back-propagation algorithm used in the present study). ANNs have already been successfully applied in a number of contexts where markers of biological relevance have been identified, including polycystic ovarian syndrome [34], melanoma [22], prostate cancer [35] and breast cancer [36].
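The training process described above can be illustrated with a minimal sketch. It assumes a single hidden layer, sigmoid activations, a squared-error loss and weight updates after every sample; the function name train_mlp, the network size, the learning rate and the toy expression values are hypothetical choices made for illustration, and are not the architecture, parameters or data used in the present study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(samples, labels, n_hidden=2, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer network by back-propagation.

    Returns (predict, history); history holds the mean squared error per epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Each weight row stores its bias as the last element.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hidden]
        output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
        return hidden, output

    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, y in zip(samples, labels):
            hidden, output = forward(x)
            error = output - y
            squared_error += error ** 2
            # Error signal at the output node (gradient of half the squared error).
            delta_out = error * output * (1.0 - output)
            # Propagate the error back to the hidden nodes before any weight changes.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
            w_out[-1] -= lr * delta_out
            for j in range(n_hidden):
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
        history.append(squared_error / len(samples))

    def predict(x):
        return forward(x)[1]

    return predict, history


# Hypothetical toy data: two normalised expression values per tumour; label 1 = receptor-positive.
samples = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
labels = [1, 1, 1, 0, 0, 0]
predict, history = train_mlp(samples, labels, epochs=2000)
print(f"mean squared error: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
```

Each pass over the samples corresponds to one epoch, and the per-epoch error recorded in history gives a simple way to check that the weight adjustments are reducing the predictive error.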
miRNA expression profiles have shown superior accuracy to mRNA signatures in classifying tumours [17]. The novel application of ANNs to the analysis of miRNA array data should serve to enable breast tumours to be classified according to their miRNA expression profile, and should also focus attention upon a relatively small number of molecules that might warrant further biochemical/molecular characterization to assess their suitability as potential therapeutic targets.
In the present study, miRNA transcript signatures predictive of ER, PR and HER2/neu status were generated from microarray data using an ANN model (Tables 3 and 4). The breast tumours selected for the array experiment were relatively homogeneous in terms of other clinicopathological parameters, all being early stage (stages 1 and 2a) and free of nodal disease. In the first step of the analysis, miRNAs capable of classifying tumour samples according to receptor status with an accuracy of 67 to 87% were identified. Sequential selection and addition of miRNAs to the ANN successfully identified an optimum miRNA set based on predictive performance.
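The two-step procedure, ranking single miRNAs and then adding miRNAs sequentially while predictive performance improves, can be sketched as a greedy forward selection. For brevity, the sketch below substitutes a nearest-centroid classifier scored by leave-one-out accuracy for the ANN, so it illustrates only the selection logic and not the model used in the present study. The feature names mirA, mirB and mirC and all expression values are invented.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features.

    Assumes every class has at least two samples, so no centroid is ever empty.
    """
    correct = 0
    for held_out in range(len(rows)):
        centroids = {}
        for label in sorted(set(labels)):
            members = [rows[i] for i in range(len(rows))
                       if i != held_out and labels[i] == label]
            centroids[label] = [sum(m[f] for m in members) / len(members) for f in features]
        point = [rows[held_out][f] for f in features]
        predicted = min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )
        if predicted == labels[held_out]:
            correct += 1
    return correct / len(rows)


def forward_select(rows, labels, names):
    """Add the feature that most improves accuracy; stop when no feature improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(names)))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return [names[f] for f in selected], best_score


# Invented expression values; columns are mirA, mirB, mirC.
names = ["mirA", "mirB", "mirC"]
rows = [
    [4, 2, 1], [2, 4, 0], [4, 4, 1], [3, 3, 0],  # receptor-positive tumours
    [0, 2, 1], [2, 0, 0], [0, 0, 1], [1, 1, 0],  # receptor-negative tumours
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

single_scores = {name: loo_accuracy(rows, labels, [i]) for i, name in enumerate(names)}
signature, accuracy = forward_select(rows, labels, names)
print(single_scores)        # {'mirA': 0.75, 'mirB': 0.75, 'mirC': 0.0}
print(signature, accuracy)  # ['mirA', 'mirB'] 1.0
```

In this toy example no single feature classifies every sample correctly, whereas the pair mirA and mirB does, which mirrors the observation that combinations of miRNAs can outperform individual ones.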
Although the model shows high confidence for the dataset analysed (100% predictive accuracies), it requires further validation on larger datasets, and the miRNA sets identified should be validated using alternative methods such as PCR.
The microarray expression data were confirmed by RQ-PCR in this dataset: the expression patterns of a subset of eight miRNAs were validated in the same sample set by stem-loop RQ-PCR, and there was a significant positive correlation in sample-to-sample expression patterns between the two techniques (Figure 6, P < 0.05). Furthermore, the expression patterns and phenotypic associations of the top-ranking miRNAs miR-342 and miR-520g were validated in an independent sample set of 95 tumours (Figure 7).
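A sample-to-sample comparison of this kind can be sketched as a Pearson correlation with a permutation-based P value. The text does not state which correlation statistic or significance test was used, so both are assumptions here, and the expression values below are invented for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """One-sided P value for a positive correlation, estimated by shuffling ys."""
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson_r(xs, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Invented values for one miRNA measured in the same eight samples by both techniques.
array_signal = [1.2, 2.1, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0]
rq_pcr_signal = [0.9, 2.4, 3.1, 3.8, 5.5, 6.1, 6.8, 8.3]

r = pearson_r(array_signal, rq_pcr_signal)
p = permutation_p_value(array_signal, rq_pcr_signal)
print(f"r = {r:.3f}, permutation P = {p:.4f}")
```

A permutation test is used in the sketch only because it needs no distributional assumptions and no external libraries; a parametric test on the correlation coefficient would be an equally reasonable choice.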
The miRNA signatures generated for ER status (miR-342, miR-299, miR-217, miR-190, miR-135b, miR-218), for PR status (miR-520g, miR-377, miR-527-518a, miR-520f-520c) and for HER2/neu status (miR-520d, miR-181c, miR-302c, miR-376b, miR-30e) include miRNAs that have previously been identified as dysregulated in breast cancer and other cancers [7,9,37-43] and involved in the regulation of cell functions such as growth, apoptosis, migration and invasion [38,42,43]. This finding suggests that the miRNAs thus identified are biologically relevant and their selection is not arbitrary or a result of the highly dimensional nature of the data.
Notably, two chromosomal locations account for a number of the dysregulated miRNAs in these predictive sets: Ch19q13 (miR-520g, miR-520d, miR-527-518a, miR-520f-520c, miR-181c) and Ch14q32 (miR-342, miR-299, miR-377, miR-376b). Allelic deletions on chromosome 14q32 are frequently observed in various tumours, including renal cell carcinoma [44], neuroblastoma [45], colorectal carcinoma [46], bladder cancer [47], ovarian carcinoma [48], meningioma [49] and breast carcinoma [50].
Approximately one-third of human miRNAs are organized in clusters, which may represent a single transcriptional unit and coordinated regulation – possibly leading to synergistic biological effects, as suggested by the inclusion of miRNAs from adjacent chromosomal locations in our signatures. This may contribute to our finding that while single miRNAs are capable of distinguishing between different breast tumours (step 1; Table 4), multiple miRNAs in combination significantly enhance the predictive power of these models (step 2; Table 3). Our finding of co-expression of other neighbouring miRNAs not included in the predictive signatures (Figure 5) is in concordance with previous studies [7,51] and is probably due to shared regulatory elements.
A primate-specific conserved miRNA family is located at Ch19q13.42 [52]. Two miRNAs from this location, miR-373 and miR-520c, have previously been shown to stimulate cancer cell migration and invasion in both in vitro and in vivo models and to be expressed at increased levels in metastatic breast cancer [43]. The miRNAs from this family were associated with ER, PR and HER2/neu status in our analysis. Similar seed pairing in miRNA families indicates that they may function through the same pathways and share mRNA targets – such as CD44, identified as a target of miR-373 and known to correlate with survival in breast cancer patients [53]. It is likely that this particular miRNA family has a significant regulatory role in breast cancer.
miR-520g was ranked as the top miRNA in the PR signature and was also identified in step 1 of the analysis as an ER-predictive miRNA. Both of these findings were validated using RQ-PCR in a larger, more heterogeneous cohort of 95 breast tumours (Figure 7d). To our knowledge this is the first report of miR-520g dysregulation in association with ER and PR status in breast cancer. Importantly, miR-520g is computationally predicted to target a number of breast-cancer-related genes including ABCG2 (BCRP) [54]. ABCG2/BCRP is an ATP-binding cassette transporter that is often associated with multidrug resistance due to its ability to remove substrates from a cell against a concentration gradient [55]. ABCG2 expression in cancer cells has been shown to confer a drug-resistant phenotype and correlates with response to anthracyclines in breast cancer [56]. The regulation of ABCG2/BCRP is controlled via oestrogen and progesterone response elements [57,58], and the steroid hormones have been shown to impact on ABCG2 expression [57,59,60].
Recent studies have shown that ABCG2 expression is also regulated by miRNAs including miR-328 [61], leading to increased mitoxantrone sensitivity, and by miRNAs from the Ch19q13.42 cluster. Specifically, ABCG2 is downregulated by miR-519c in drug-sensitive cells via a binding site in the 3' UTR that is not present in their drug-resistant counterparts [62], and miR-520h targets ABCG2 in haematopoietic stem cells during their differentiation into progenitor cells [63]. miR-520g shares sequence homology with miR-520h, and these miRNAs were coordinately expressed in our dataset (Figure 5); it is therefore plausible that miR-520g is also a regulator of ABCG2. This hypothesis warrants further investigation; identification of miRNA binding sites in the 3' UTR of genes such as ABCG2 that promote multidrug resistance could enable the delivery of specific miRNAs from this cluster to tumours in an attempt to repress ABCG2 and to increase sensitivity to existing therapeutic agents.
The ER-status predictor miR-342, identified as having the strongest response curve, was also chosen for further characterization. Expression of miR-342 in the larger cohort of breast tumours (n = 95) using RQ-PCR confirmed the microarray findings of an association between miR-342 and ER positivity. Furthermore, we report the first findings of an association between miR-342 and HER2/neu positivity. Increasing evidence suggests that miR-342 plays an important role in the carcinogenic process, particularly in hormonally regulated breast cancer. miR-342 is dysregulated in multiple myeloma [64] and has been shown to be epigenetically silenced by methylation in colorectal carcinoma [42]. In vitro studies have demonstrated that introduction of an hsa-miR-342 mimic to colorectal cancer cells induces apoptosis, suggesting a potential tumour suppressor role for this miRNA [42].
Previous miRNA profiling studies in breast cancer have identified associations between miR-342 and ER, intrinsic breast cancer subtype and tumour grade [7,9]. A recent study has shown downregulation of miR-342 in tamoxifen-resistant breast cancer cells compared with tamoxifen-sensitive breast cancer cells, suggesting a potential role as a biomarker of drug sensitivity [65]. To our knowledge this is the largest number of primary breast tumours in which miR-342 has been quantitated using RQ-PCR. Our finding of increased miR-342 expression in both ER-positive and HER2/neu-positive tumours is of particular interest as the luminal B (ER+/HER2/neu+) and triple-negative tumours present particular therapeutic challenges. In the present study, miR-342 has emerged as a potential candidate for regulation of ER/HER2/neu expression that warrants further functional investigation to elucidate its mRNA targets and its precise role in breast carcinogenesis.
Authors' contributions
AJL performed the experiments, was responsible for data analyses and drafted the manuscript. NM conceived of, designed and supervised the experimental work and manuscript editing. AD and PAD contributed to RQ-PCR data. REM contributed to sample preparation and array experiments, and participated in preliminary data analysis. VB, SS and JB were responsible for conducting microarray hybridizations and preliminary data analysis at EMBL Heidelberg. GB and CL designed bioinformatics models for interrogation of the array dataset. MJK contributed throughout the experiment, critically reviewed the manuscript and participated clinically in sample provision. All authors read and approved the final manuscript.