Introduction
MicroRNAs (miRNAs) are small, non-coding, single-stranded RNAs ranging in size between 18 and 22 nucleotides; they are typically excised from longer, 60- to 110-nucleotide stem-loop precursors [1,2]. miRNAs are involved in fundamental biological processes, including development, differentiation, apoptosis, and proliferation, and are believed to act predominantly as post-transcriptional regulators that can either degrade their mRNA targets or repress their translation [3]. A single miRNA may have multiple mRNA targets, and up to 30% of human genes may be regulated by miRNAs [4,5].
Aberrant expression of miRNAs in cancer was initially identified in B-cell chronic lymphocytic leukemia [6], and miRNA dysregulation has subsequently been reported for many tumor types in which, depending on the specific target mRNA(s), miRNAs may act either as tumor suppressor genes or as oncogenes [7,8]. In breast cancer, post-diagnosis miRNA levels have been shown to correlate with a number of tumor characteristics, including stage, vascular invasion, proliferative index, and estrogen receptor/progesterone receptor (ER/PR) status [9,10], and may have prognostic value.
miRNAs have recently been found in human serum and plasma, where they appear to be resistant to RNase degradation and thus relatively stable, even in stored samples [11]. This stability has made miRNAs appealing candidates for epidemiologic studies of stored samples, particularly since miRNA profiling requires only small amounts of serum or plasma [12]. The use of circulating miRNA profiles as potential early-detection cancer markers has generated considerable interest [13-16], although data addressing such applications remain sparse. Initial studies have suggested that serum levels of miRNAs may differ between diagnosed cancer cases and controls [17], and several recent case-control studies of breast cancer have reported evidence of differential miRNA expression levels in serum [18-21]. These studies have shown little agreement, perhaps because some have measured only a few miRNAs whereas others have used more comprehensive miRNA screens but with a small number of subjects. None has used samples obtained prior to diagnosis. Use of such prospective samples avoids a number of important potential biases (for example, differential selection and processing of cases and controls or the possibility that the differences observed in case samples are the result of biopsy, cancer treatments, behavioral changes, stress, or other factors experienced by cases but not controls).
Here, we report on a study that prospectively collected serum samples from 205 women who subsequently developed breast cancer and 205 women who remained cancer-free and that used microarrays to comprehensively assess known miRNAs.
Materials and methods
Study population
The Sister Study [
22] is a prospective cohort study of 50,884 women and was designed to examine the environmental and genetic determinants of breast cancer. The cohort has been previously described [
23]; briefly, women from the US or Puerto Rico were eligible to enroll if they themselves had never had breast cancer but had a full or half-sister who had breast cancer. At baseline interview, all participants provided extensive information, including family history, reproductive history, and information about potential risk factors. Informed consent and blood samples were obtained during a home visit. For women who subsequently developed breast cancer, detailed information on diagnosis was collected from medical records and self-report. Pathology reports were abstracted for tumor grade, stage, and other information, including status for ER, PR, and HER-2 (human epidermal growth factor receptor 2) expression. The study was approved by the Institutional Review Board of the National Institute of Environmental Health Sciences, National Institutes of Health, and the Copernicus Group Institutional Review Board.
Selection of cases and controls
We designed a matched-pair nested case-control study. We selected patients who had confirmed invasive breast cancer, who completed enrollment by August 2008, and whose diagnosis occurred within 18 months following blood draw (n = 242). We excluded 29 cases who lacked a serum sample or whose sample had integrity issues during collection and shipping and eight cases whose sample had limited volume, leaving 205 cases that are the focus of our study. For each case, a matched control was selected from the 50,884 participants on the basis of the following criteria: no history of cancer (other than non-melanoma skin cancer), having completed enrollment by August 2008, an available blood sample, same race (non-Hispanic white, black, Hispanic, or other), similar age at enrollment (within 5 years), and similar date of blood draw (within 2 months). Three replicate serum samples from each of three women (nine samples in total) who were not participants in the study but whose blood samples were collected and processed in the same manner as those of Sister Study participants were used as technical replicates.
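The control-matching criteria listed above can be expressed as a simple eligibility check. The following Python sketch is illustrative only and is not the study's selection procedure: the record field names are hypothetical, "within 2 months" is approximated as 61 days, and how a single control was chosen among several eligible candidates is not described in the text and is not modeled here.

```python
from datetime import date

MAX_AGE_DIFF_YEARS = 5    # "within 5 years" of age at enrollment
MAX_DRAW_DIFF_DAYS = 61   # assumption: "within 2 months" taken as 61 days


def is_eligible_control(case, candidate):
    """Return True if the candidate satisfies every matching criterion."""
    return (
        not candidate["cancer_history"]        # excludes non-melanoma skin cancer
        and candidate["enrolled_by_cutoff"]    # enrollment completed by August 2008
        and candidate["has_blood_sample"]
        and candidate["race"] == case["race"]
        and abs(candidate["age"] - case["age"]) <= MAX_AGE_DIFF_YEARS
        and abs((candidate["blood_draw"] - case["blood_draw"]).days)
        <= MAX_DRAW_DIFF_DAYS
    )


# Hypothetical records, for illustration only.
case = {"race": "non-Hispanic white", "age": 55, "blood_draw": date(2007, 3, 1)}
base = {"cancer_history": False, "enrolled_by_cutoff": True,
        "has_blood_sample": True, "race": "non-Hispanic white"}
candidates = [
    dict(base, id="C1", age=57, blood_draw=date(2007, 4, 10)),  # meets all criteria
    dict(base, id="C2", age=62, blood_draw=date(2007, 3, 5)),   # age differs by 7 years
    dict(base, id="C3", age=54, blood_draw=date(2007, 6, 15)),  # draw 106 days apart
]
eligible = [c["id"] for c in candidates if is_eligible_control(case, c)]
```

In this toy example only the first candidate remains in `eligible`.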
To minimize possible processing and chip lot effects, samples were assigned to processing batches of seven to nine pairs, and batches had similar distributions of age, race, and date of enrollment. For array hybridization, each batch was assigned to one of two different chip lots ('A' and 'B') in a manner designed to ensure a balance of these same characteristics. The nine replicates (described above) were assigned to the same batch and chip lot. Laboratory personnel were blinded to case-control status and other phenotype information.
RNA extraction, labeling, and hybridization
Total RNA was extracted in batches by using a Total RNA purification kit (cat. no. 17200; Norgen Biotek Corp., Thorold, ON, Canada). In accordance with the manufacturer's recommendation not to exceed 200 µL per column, 400 µL of total serum from each individual was split into two equal 200-µL aliquots and then processed separately following the manufacturer's recommended protocol for total RNA purification from serum. An on-column DNase digestion was added before sample elution by using an RNase-Free DNase I Kit (cat. no. 25710; Norgen Biotek Corp.), and the two aliquots were subsequently pooled. Fixed volumes rather than fixed amounts of RNA were used in accordance with other studies [
24].
Total RNA (8 µL) was directly labeled by using Flash Tag Biotin HSR Labeling kits (cat. no. HSR30FTA; Genisphere, LLC, Hatfield, PA, USA) in accordance with the instructions of the manufacturer. RNA was heated to 80°C for 10 minutes before labeling to inactivate any residual DNase activity. RNA was hybridized for 42 hours to the GeneChip miRNA 2.0 array (cat. no. 901755; Affymetrix Inc., Santa Clara, CA, USA [25]). The GeneChip miRNA 2.0 arrays contain 100% miRBase version 15 coverage of 131 organisms and contain probes for 3,439 human non-coding RNAs (ncRNAs), including 1,105 miRNAs and 2,334 other ncRNAs (including scaRNAs and snoRNAs). The arrays were washed and stained by using standard Affymetrix protocols and scanned by using an Affymetrix GCS 3000 7G Scanner. Feature intensities were extracted by using miRNA 2.0 array library files. Array hybridization and scanning were completed by Precision Biomarker Resources, Inc. (Evanston, IL, USA). The average Spearman correlation coefficient values for three sets of three technical replicates were all above 0.8 (Additional file 1). Array data were deposited into the NCBI Gene Expression Omnibus (GSE44281).
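As an illustration of the replicate check reported above, the following Python sketch computes Spearman correlations between replicate arrays and averages them within one replicate set. It assumes that the "average" refers to the mean of the pairwise correlations among the three replicates of each woman, which the text does not state explicitly; the function names and intensity values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(replicates):
    """Mean Spearman correlation over all pairs of replicate arrays."""
    return mean([spearman(a, b) for a, b in combinations(replicates, 2)])


# Hypothetical log2 intensities for three replicate arrays from one woman.
replicate_set = [
    [5.1, 7.3, 2.2, 9.8, 4.0],
    [5.4, 7.0, 2.5, 9.1, 4.4],
    [4.9, 7.8, 2.0, 9.9, 5.0],
]
set_average = mean_pairwise_spearman(replicate_set)
```

A set would meet the reported criterion when `set_average` exceeds 0.8.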
Replication samples and qRT-PCR
An independent set of 10 women was used to validate selected miRNAs via quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Five women who provided consent and blood samples but who developed breast cancer prior to completing enrollment were selected as cases, along with five controls who also provided consent and blood samples and who were cancer-free but did not complete enrollment. Total RNA was extracted from serum samples of these women as described above with the addition of Synthetic C. elegans miScript miRNA Mimic (cat. no. MSY0000010; Qiagen, Valencia, CA, USA). Synthetic cel-39 was spiked in at a final concentration of 0.25 fmol/µL prior to extraction and used as a PCR normalization control. The RNA concentration, reverse transcription, and pre-PCR steps were carried out in accordance with a previously published protocol [26]. ExoSAP-IT (cat. no. 78250; Affymetrix Inc.) treatment followed by column purification (cat. no. 28004; Qiagen) in accordance with the protocol of the manufacturer was used to purify the pre-PCR product. Individual PCR was run in triplicate by using 1 µL of purified pre-PCR product. The reaction contained the following components: 2x Taqman universal master mix (cat. no. 4324018; ABI, Carlsbad, CA, USA), 1 µM forward primer, 1 µM universal reverse primer, and 0.2 µM probe. The reaction was run on a Bio-Rad CFX 384 Real-Time System (Bio-Rad Laboratories, Inc., Hercules, CA, USA) by using the following parameters: 55°C for 2 minutes, 95°C for 10 minutes, followed by 40 cycles of 95°C for 15 seconds and 55°C for 1 minute. PCR cycle threshold (Ct) values were recorded for each target gene and for normalization controls and were averaged across three independent runs. Primers for miR-222, miR-181a, miR-1825, and miR-18a were custom-ordered from IDT (San Diego, CA, USA) by using previously published sequences [26]. Primers for cel-39 were designed in the same fashion as above and custom-ordered from IDT.
To determine the best candidate miRNA for PCR normalization in our data set, we ran the array expression data from the 47 miRNAs expressed in almost all individuals through the NormFinder software [27]. NormFinder uses a model-based variance estimation approach [28]. Using these results, we selected miR-1825 as a qRT-PCR normalization control; it showed one of the highest stability values across the 410 cases and controls and had blood levels that were similar to those of the three target miRNAs. We used the average of miR-1825 and an external spike-in cel-39 control, a strategy shown to be effective for controlling both technical and biologic variability in qRT-PCR assays from serum [17,24]. The efficiency was similar for all four PCR assays (miR-181a, miR-18a, miR-222, and miR-1825) (Additional file 2). Normalized relative expression was based on Ct values and calculated as $1/(\mathrm{Ct}_{\mathrm{gene}} - \mathrm{Ct}_{\mathrm{norm}})$.
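The normalization formula above can be illustrated with a short Python sketch. It assumes that the normalizer Ct is the arithmetic mean of the miR-1825 and cel-39 Ct values (the text says only that their average was used) and that each input has already been averaged across the three runs; the function name and the example Ct values are hypothetical.

```python
from statistics import mean


def normalized_relative_expression(ct_gene, ct_mir1825, ct_cel39):
    """Return 1 / (Ct_gene - Ct_norm).

    Ct_norm is taken here as the arithmetic mean of the two normalizer Ct
    values (an assumption about how the average was formed).
    """
    ct_norm = mean([ct_mir1825, ct_cel39])
    delta = ct_gene - ct_norm
    if delta == 0:
        # The formula is undefined when target and normalizer Ct coincide.
        raise ValueError("Ct_gene equals Ct_norm; expression is undefined")
    return 1.0 / delta


# Hypothetical Ct values for one sample.
example = normalized_relative_expression(ct_gene=30.0, ct_mir1825=24.0, ct_cel39=26.0)
```

With these example values the normalizer Ct is 25, the difference is 5, and the normalized relative expression is 0.2. Under this formula, a target Ct closer to the normalizer Ct (that is, a more abundant target) gives a larger value.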
Data processing and statistical analysis
miRNA expression intensity values were background-corrected and normalized across arrays by using the robust multichip average method [29]. The intensity data used in all analyses were log2-transformed.
For each array, the miRNA probe set signals were compared with the distribution of signals for anti-genomic probes that had matching GC content (miRNA QC Tool, version 1.0.33.0), and in accordance with the recommendation of the manufacturer, a Wilcoxon rank-sum test P value of less than 0.06 was used to identify miRNAs above background. Subsequent analysis was restricted to 414 miRNAs that exceeded background levels in at least 50 women. Conditional logistic regression was used to identify differentially expressed miRNA probes between cases and controls for those 414 probes. Because analysis of circulating miRNAs in prospectively collected samples is still exploratory, we (like some other investigators of circulating miRNAs [30,31]) regard these results as descriptive and not as tests of hypotheses and so provide P values that are unadjusted for multiple comparisons.
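The two steps described above, the detection filter and the matched-pair test, can be sketched in Python as follows. This is a minimal illustration and not the code used in the study (the analyses were run in R): it assumes one miRNA covariate per model with no additional covariates, fits the model by Newton-Raphson, and reports a Wald P value; all function names and the example data are hypothetical. For 1:1 matched pairs, the conditional likelihood depends only on the within-pair difference in log2 intensity, which the sketch exploits.

```python
import math


def passes_detection_filter(detection_p_values, alpha=0.06, min_women=50):
    """True if the probe is above background (P < alpha) in at least min_women arrays."""
    detected = sum(1 for p in detection_p_values if p < alpha)
    return detected >= min_women


def _score_and_information(beta, diffs):
    """Gradient and observed information of the matched-pair log-likelihood."""
    probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
    gradient = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
    information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
    return gradient, information


def fit_matched_pairs(case_values, control_values, max_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs, one covariate.

    Each pair contributes 1 / (1 + exp(-beta * d)) to the likelihood, where d
    is the case value minus the control value.
    Returns (beta, standard_error, two_sided_wald_p).
    """
    diffs = [c - k for c, k in zip(case_values, control_values)]
    beta = 0.0
    for _ in range(max_iter):
        gradient, information = _score_and_information(beta, diffs)
        if information == 0.0:
            raise ValueError("no usable within-pair variation; beta not estimable")
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    else:
        # Typically happens when every pair differs in the same direction.
        raise RuntimeError("Newton-Raphson did not converge")
    _, information = _score_and_information(beta, diffs)
    standard_error = 1.0 / math.sqrt(information)
    z = beta / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return beta, standard_error, p_value


# Hypothetical log2 intensities for four matched pairs (case, control).
cases = [6.0, 6.5, 7.0, 5.0]
controls = [5.0, 5.5, 6.0, 6.0]
beta, se, p = fit_matched_pairs(cases, controls)
```

Here `math.exp(beta)` is the estimated odds ratio per unit increase in log2 intensity, that is, per doubling of signal.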
The association between miRNAs and tumor characteristics (hormone receptor status for ER, PR, and HER-2, and lymph node status) was tested in a case-only logistic analysis adjusted for race. Chip lot and batch were specified as random-effect variables. All statistical analyses were performed by using R 2.15.
Pathway analysis with Ingenuity Pathway Analysis
miRNAs found to be significantly associated with case-control status were further analyzed with Ingenuity Pathway Analysis (IPA) [32]. Using IPA's microRNA target filter, we generated a list of predicted mRNA targets for each of the 21 significant miRNAs. The list was then restricted to the mRNAs listed in the IPA database as experimentally verified targets of any of the 21 miRNAs. This mRNA target list was then used to run a canonical pathway analysis.
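The restriction step described above amounts to a set intersection. The following Python sketch shows only that logic; it does not call IPA, which is a separate commercial application, and the function name, miRNA labels, and gene labels are hypothetical placeholders rather than real target annotations.

```python
def restrict_to_verified(predicted_targets, verified_targets):
    """Keep predicted mRNA targets that are experimentally verified for any miRNA.

    Both arguments map a miRNA name to a set of mRNA symbols.
    """
    all_predicted = set().union(*predicted_targets.values())
    all_verified = set().union(*verified_targets.values())
    return all_predicted & all_verified


# Placeholder annotations, for illustration only.
predicted = {
    "miR-X": {"GENE_A", "GENE_B"},
    "miR-Y": {"GENE_B", "GENE_C", "GENE_D"},
}
verified = {
    "miR-X": {"GENE_C"},    # verified for miR-X, predicted only for miR-Y
    "miR-Y": {"GENE_A"},
}
pathway_input = sorted(restrict_to_verified(predicted, verified))
```

Because the text restricts to verified targets of any of the 21 miRNAs, a gene predicted for one miRNA but verified for another is retained, as with `GENE_A` and `GENE_C` in this example.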
Discussion
miRNA profiles are gaining interest as potential diagnostic or prognostic markers for breast cancer [
33]. However, existing studies have been limited by sample size or the number of miRNAs analyzed, and none has used prospectively collected samples [
18,
31,
34]. Our study minimized potential biases by profiling global serum miRNA expression patterns in samples obtained from women prior to clinical diagnosis (mean time to diagnosis was 10 months). We found a set of 21 miRNAs differentially expressed in serum samples from 205 women who subsequently developed breast cancer compared with 205 women who remained cancer-free during the time of follow-up. The differences in miRNA levels were small and included both overexpression and underexpression of miRNAs in the cases, and overexpression was significantly more frequent than would be expected had the association been random. Published reports of primary breast tumors or cell lines have examined seven of the 21 differentially expressed miRNAs we found, and all seven showed agreement with the direction of change in our case serum samples (Table 3). IPA of the mRNA targets of these differentially expressed miRNAs suggested gene enrichment for cancer-related signaling pathways. Although the absolute differences in miRNA levels between serum samples of cases and controls are quite small, the differences pre-date clinical diagnosis and may reflect important pathways for breast cancer development.
miR-18a, miR-181a, and miR-222 showed the largest percentage differences between cases and controls in our study; qRT-PCR of these miRNAs in a small independent replication set of cases and controls replicated the direction of change for all three, although the differences were not statistically significant. These three miRNAs have been suggested to act as oncogenes through regulation of their potential target mRNAs. miR-18a is part of the oncogenic miR-17-92 cluster, which is often overexpressed in solid tumors, including breast tumors [
35]. Overexpression of this cluster is believed to cooperate with c-Myc in stimulating proliferation by negatively regulating E2F1 [
36,
37]. Increased expression of miR-181a in the bone marrow of patients with breast cancer has been reported to be associated with shorter disease-free survival, higher grade, and breast cancer recurrence [
38]. miR-181a is believed to target the tumor suppressor gene programmed cell death protein 4 (
PDCD4) [
38], which inhibits neoplastic transformation [
39]. In breast cancer cell lines, miR-222 overexpression has been reported to be associated with tamoxifen resistance through targeting the cell cycle inhibitor p27 (Kip1) [
40]. miR-222 has also been reported to increase proliferation of ERα-negative cells while reducing the expression of various tumor suppressor proteins [
41], and expression of miR-222 has been reported to increase cell migration in the epithelial-to-mesenchymal transition acting downstream of the RAS-RAF-MEK oncogenic pathway [
42].
Interestingly, two recent case control studies have provided evidence that both miR-222 and miR-181a are overexpressed in the serum of patients with breast cancer. One used sequencing by oligonucleotide ligation and detection (SOLiD) of serum samples obtained prior to surgery from 13 breast cancer cases compared with samples from 10 healthy controls and found 26 miRNAs that were overexpressed in cases, including miR-222 and miR-181a; overexpression of miR-222 was validated in an independent group of 50 cases and 50 controls by using qRT-PCR [
20]. A second study used Solexa sequencing combined with Taqman low-density array chips on serum samples obtained prior to surgery from 48 breast cancer cases and 48 controls; 10 miRNAs were found to be overexpressed in the cases, and four were validated by using qRT-PCR in an independent group of 76 cases and 76 controls [
21]. That study also found overexpression of miR-222 [
21]. These studies, combined with our prospective study, provide a growing body of evidence that miR-222 measured in blood is associated with breast cancer.
Among cases, we compared the serum miRNA profiles of women with different tumor characteristics, including hormone receptor status (ER, PR, and HER-2) and nodal status. Although miRNA profiles did not differ by ER or PR status, they did differ by HER-2 and lymph node status. Of the seven miRNAs differentially expressed in the serum of women who developed HER-2-overexpressing breast tumors, miR-93, miR-183, and miR-29a have been reported to be associated with breast cancer in previous studies [
20,
43,
44]. In our study, miR-93 was underexpressed in the serum of women who developed HER-2-overexpressing breast tumors; interestingly, miR-93 expression was recently shown to induce a more differentiated cell phenotype in breast cancer cell lines, and expression of miR-93 in mouse mammary fat pads blocked tumor development and metastases [
44]. Of the 10 miRNAs differentially expressed in the serum of women with tumors that spread to the lymph nodes (pN1 or higher), four (miR-145, miR-124, miR-125b, and miR-320) have been reported to be associated with breast cancer in previous studies [
45‐
48]. Of these, miR-320 is of particular interest as we found three miR-320 family members (miR-320b, miR-320d, and miR-320e) to be underexpressed in the serum of women who developed lymph node-positive breast tumors. miR-320 has been reported to be decreased in breast tumor tissue, and downregulation of miR-320 (through loss of phosphatase and tensin homolog [PTEN]) has been shown to promote tumor proliferation and invasiveness in mouse models; expression of miR-320 distinguished human normal breast stroma from tumor stroma and was correlated with recurrence [
49]. A study comparing miRNA expression in inflammatory breast cancer (IBC) with non-IBC also found miR-320 to be downregulated in the more aggressive IBC group of tumors [
50]. Thus, loss of miR-320 expression may be associated with a higher likelihood of lymph node involvement and a more aggressive metastatic phenotype.
Although miRNAs that are differentially expressed between tumor and normal tissue are more frequently downregulated in tumor tissue [
7], our study (like others [
20,
21]) has found that circulating miRNAs that differ in levels between breast cancer patients and controls are more frequently at higher levels in case blood samples. The mechanism underlying circulating miRNA stability is still being investigated. One model involves the active release of miRNAs from cells in membrane-bound microvesicles, including exosomes and shedding vesicles [
51‐
53]. There is evidence that microvesicles can deliver miRNAs to recipient cells and trigger changes in target mRNA levels [
54]. A recent report has shown that vesicle-encapsulated miRNAs represent only a minor portion of circulating miRNAs but that a significant portion of circulating miRNAs are associated with Argonaute2 (Ago2) [
55], the effector component of the miRNA-induced silencing complex [
56]. Both models support the possibility that miRNAs may be actively released into circulation and could act as signaling molecules able to regulate their target mRNA expression in recipient cells. Cancer-associated miRNAs in the circulation could also originate from immunocytes in the tumor microenvironment or from the body's systemic response to disease [
57].
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
ACG carried out sample extraction and sample preparation, did some statistical analysis, and helped to design the experiments, interpret the results, and write the manuscript. ZX performed the majority of the statistical analysis. CRW and PAW participated in study design. LDR participated in study design and provided patient data and study variables. DPS participated in study design and collected samples and data. RCG provided study reagents and processing as well as critical advice on study design, QA, and data interpretation. JAT helped to design the experiments, interpret the results, and write the manuscript. All authors read and approved the final manuscript.