Background
Breast cancer is the second leading cause of death by cancer in women. It is estimated by the American Cancer Society that in 2014, approximately 232,670 new cases of invasive breast cancer will be diagnosed in women and up to 40,000 women will die from breast cancer in the United States alone [1].
There is mounting evidence that breast cancer is not a single disease but a heterogeneous group of tumors with different molecular subtypes, risk factors, clinical behaviors, and responses to treatment [2, 3]. Cancer biomarkers are increasingly being used for diagnostic, prognostic, and predictive purposes [4, 5]. Distinct molecular subtypes of breast cancer have been identified based on the presence or absence of biomarkers, including estrogen receptors (ER+/ER-), progesterone receptors (PR+/PR-), and human epidermal growth factor receptor 2 (HER2+/HER2-) [6-8]. The expression profiles of these three biomarkers are used to divide breast cancer into four subtypes: Luminal A, Luminal B, Triple negative (basal-like), and HER2 type [9, 10]. Among these subtypes, Luminal A is the most prevalent, accounting for 40 % of all breast cancers. Examples of biomarker-targeted therapy include tamoxifen [11] for patients with ER+ breast cancer and trastuzumab for patients with HER2+ breast cancer, both of which significantly improve prognosis [12]. However, these three classic molecular biomarkers are still insufficient, particularly considering that a significant portion of breast cancers falls under the triple negative breast cancer (TNBC) subtype [13, 14]. Therefore, the claudin-low subtype and new biomarkers such as androgen receptors have been identified in breast cancer [15, 16].
A gene signature, a group of genes whose combined expression pattern is uniquely characteristic of a biological phenotype or medical condition, can complement classic prognostic factors to provide more accurate prognostic information [17]. During the past several decades, a number of gene signatures have been identified for breast cancer. For example, a 70-gene signature (MammaPrint; Agendia, Amsterdam, The Netherlands) and a 21-gene signature (OncoType; Genomic Health, Redwood City, CA) are being used in selected patients with early ER+ disease [18]. However, the 10-year results of ongoing clinical trials testing the clinical benefit of gene signatures will not be fully available until 2020 [19]. Therefore, the identification of novel biomarkers and gene signatures in breast cancer remains essential, particularly considering that gene signatures for TNBC have not yet been fully developed.
Emerging technologies, such as gene expression profiling, are increasingly valued as powerful tools for identifying new biomarkers [20-24]. Gene expression profiling data are generated by hybridization microarray analysis, a powerful method for high-throughput screening (HTS) of thousands of genes at a time. Previously, our group studied the gene expression pattern of lung cancer using the Affymetrix human exon array [25]. In this work, gene expression profiles of mRNAs from clinical breast cancer tissues were examined using the newly developed Agilent SurePrint G3 Human Gene Expression Microarray technology. This state-of-the-art high-throughput platform takes advantage of the higher density available on the SurePrint G3 chip. Compared with other chips, the one employed in this study exhibits a remarkably wide dynamic range (approximately 5 orders of magnitude), providing reliable detection of both low- and high-expressing genes. In addition, this technology requires low DNA input, and the whole workflow is simple and straightforward. Genes of biological significance in breast cancer were identified via statistical analysis using GeneSpring 12.6 software. The expression levels of several selected genes were further confirmed using real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR). Taken together, our gene expression profiling using the Agilent SurePrint G3 chip will contribute to the clinical diagnosis and treatment of breast cancer through the identification of novel breast cancer biomarkers.
Methods
Tissue samples
Tissue samples from clinical patients were acquired from the Clinical and Translational Science Institute (CTSI) Biorepository at the University of Florida with all necessary ethical approval for collection and usage. All patients provided written informed consent for their tissue samples to be archived and used for research purposes. The use of breast cancer samples obtained through the UF CTSI Biorepository was approved by the University of Florida Institutional Review Board (IRB201200353). A total of 150 tissue samples were included in this study, covering 3 subtypes of breast cancer (Luminal A, Luminal B, Triple negative) as well as normal tissue samples. All human tissue samples were stored at −80 °C before RNA extraction.
RNA preparation
Total RNA was isolated and purified from frozen tissue samples using the Qiagen RNeasy Mini Kit, QIAshredder kit and RNase-Free DNase Set kit (Qiagen, Valencia, CA) following the manufacturer's recommendations. The protocol includes: 1) homogenizing tissue by grinding in a mortar with liquid nitrogen; 2) binding the homogenized tissue to the RNeasy Mini spin column; and 3) eliminating any trace amount of DNA using the DNase kit. The quality of total RNA was strictly controlled using several parameters. The RNA extracts were first analyzed by Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA) and gel electrophoresis. RNA quality was determined by the ratios of A260/A280 (close to 2) and A260/A230 (close to 2), and by the presence of 2 distinct ribosomal bands on gel electrophoresis. Qualified RNAs were further tested using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA), and samples with a 28S/18S RNA ratio > 1 were selected for gene expression profiling [26]. Thirty-two samples were finally tested, of which 2 samples, C501 (Luminal A) and N513 (normal), were from the same patient; the others were unmatched samples.
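The numeric acceptance criteria above can be expressed as a simple filter. The following is a minimal sketch, not the procedure actually used in this study: the tolerance that defines "close to 2" is an assumed value, the `RnaQc` record and `passes_qc` function are hypothetical names, and the visual check for 2 distinct ribosomal bands is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    """Hypothetical record of the QC readings for one RNA extract."""
    sample_id: str
    a260_a280: float      # Nanodrop purity ratio
    a260_a230: float      # Nanodrop purity ratio
    ratio_28s_18s: float  # Bioanalyzer rRNA ratio


def passes_qc(qc, target=2.0, tolerance=0.2, min_rrna_ratio=1.0):
    """Return True if both purity ratios are close to the target and 28S/18S > 1.

    The tolerance is an assumption; the text only says "close to 2".
    """
    purity_ok = (abs(qc.a260_a280 - target) <= tolerance
                 and abs(qc.a260_a230 - target) <= tolerance)
    return purity_ok and qc.ratio_28s_18s > min_rrna_ratio


# Illustrative readings only, not measured values from this study.
good = RnaQc("example_pass", a260_a280=2.05, a260_a230=1.95, ratio_28s_18s=1.4)
degraded = RnaQc("example_fail", a260_a280=2.00, a260_a230=2.00, ratio_28s_18s=0.8)
```

With these illustrative readings, `passes_qc(good)` is true and `passes_qc(degraded)` is false because the 28S/18S ratio does not exceed 1.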
Gene expression microarrays
Cyanine-3 (Cy3) labeled cRNA was prepared from 100 ng RNA using the One-Color Low Input Quick Amp labeling kit (Agilent, Valencia, CA) according to the manufacturer's instructions, and then purified with the RNeasy Mini Kit (Qiagen, Valencia, CA). Dye incorporation and cRNA yield were checked with the Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA). For hybridization, 0.6 μg of Cy3-labelled cRNA (specific activity > 8 pmol Cy3/μg cRNA) was fragmented at 60 °C for 30 min in a reaction volume of 25 μL containing 1X Agilent fragmentation buffer and 2X Agilent blocking agent following the manufacturer's instructions. On completion of the fragmentation reaction, 25 μL of 2X Agilent hybridization buffer was added to the fragmentation mixture, which was then hybridized to Agilent Whole Human Genome Oligo Microarrays (GPL17077) for 17 h at 65 °C in a rotating Agilent hybridization oven. After hybridization, microarrays were washed for 1 min at room temperature with GE Wash Buffer 1 (Agilent) and for 1 min with 37 °C GE Wash Buffer 2 (Agilent), then dried using Agilent stabilization and drying solution. Immediately after washing, slides were scanned on the Agilent DNA Microarray Scanner (G2505C) using the one-color scan setting for 1x60k array slides (Scan Area 61 × 21 mm, Scan resolution 3 μm, Dye channel set to Green and Green PMT set to 100 %).
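The specific-activity threshold and the 0.6 μg hybridization input can be checked from the spectrophotometer readings with a few lines of arithmetic. The sketch below assumes the readings are reported as pmol Cy3/μL and ng cRNA/μL; the function names and the example readings are hypothetical and are not taken from this study.

```python
def specific_activity(cy3_pmol_per_ul, crna_ng_per_ul):
    """Specific activity in pmol Cy3 per microgram of cRNA."""
    return cy3_pmol_per_ul / crna_ng_per_ul * 1000.0


def volume_for_mass(crna_ng_per_ul, mass_ug=0.6):
    """Volume (in microliters) that contains mass_ug micrograms of cRNA."""
    return mass_ug * 1000.0 / crna_ng_per_ul


# Illustrative readings only: 2 pmol Cy3/uL and 200 ng cRNA/uL.
activity = specific_activity(2.0, 200.0)   # pmol Cy3 per ug cRNA
usable = activity > 8.0                    # threshold stated in the text
volume = volume_for_mass(200.0)            # uL needed for 0.6 ug
```

For these example readings the specific activity is about 10 pmol Cy3/μg cRNA, which exceeds the stated threshold, and about 3 μL of the labeled cRNA supplies the 0.6 μg used for hybridization.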
Data normalization and quality control
These gene expression microarray data have been deposited in the GEO repository and are available under accession number GSE57297 at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE57297. The data were analyzed with GeneSpring 12.6 software (Agilent); the initial processing method was reported earlier [27]. The raw signals were log transformed and normalized using the percentile shift normalization method, with the value set at the 75th percentile. For each probe, the median of the log-summarized values across all samples was calculated and subtracted from the value in each sample to obtain the baseline-transformed value. The parameter values for experimental grouping were set as Luminal A, Luminal B, Triple negative, and Normal. Probes with intensity values below the 20th percentile were filtered out using the “Filter Probesets by Expression” option.
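The normalization steps described above can be illustrated outside GeneSpring. The following is a minimal sketch under stated assumptions, not a reproduction of the GeneSpring implementation: it assumes a base-2 logarithm, a linear-interpolation percentile, and a filter that keeps a probe if it reaches the 20th percentile in at least one sample. All function names are hypothetical.

```python
import math
from statistics import median


def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def normalize(raw, shift_pct=75.0):
    """raw maps sample name -> list of raw intensities in a shared probe order."""
    # Step 1: log2-transform, then shift each sample by its own 75th percentile.
    shifted = {}
    for sample, signals in raw.items():
        logged = [math.log2(s) for s in signals]
        p = percentile(logged, shift_pct)
        shifted[sample] = [v - p for v in logged]
    # Step 2: per probe, subtract the median across all samples (baseline).
    n_probes = len(next(iter(shifted.values())))
    baseline = [median(vals[i] for vals in shifted.values())
                for i in range(n_probes)]
    return {s: [v - b for v, b in zip(vals, baseline)]
            for s, vals in shifted.items()}


def expressed_probes(raw, cutoff_pct=20.0):
    """Indices of probes at or above the cutoff percentile in any sample.

    The "in any sample" rule is an assumption about the filter setting.
    """
    keep = set()
    for signals in raw.values():
        cutoff = percentile(signals, cutoff_pct)
        keep.update(i for i, s in enumerate(signals) if s >= cutoff)
    return sorted(keep)
```

After `normalize`, the median of each probe across samples is zero by construction, so the remaining values express each sample's deviation from the typical level of that probe.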
Differential expression analysis
A moderated t-test with Benjamini-Hochberg multiple testing correction was used to calculate the p-values for the volcano plots. One-way ANOVA with asymptotic computation and Benjamini-Hochberg multiple testing correction was used to calculate the p-values for the heat map. A p-value cutoff of 0.05 and a fold change of 2 or more were used to select genes for analysis.
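The multiple testing correction and the selection thresholds can be sketched as follows. This is an illustration only: it takes per-gene p-values as given (the moderated t-test and ANOVA themselves are not implemented), it assumes that the 0.05 cutoff is applied to the corrected p-values, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(log2_fold_changes, pvalues, p_cutoff=0.05, fold_cutoff=2.0):
    """Indices of genes passing both the corrected p-value and fold-change cutoffs."""
    adjusted = benjamini_hochberg(pvalues)
    min_log2 = math.log2(fold_cutoff)
    return [i for i, (lfc, q) in enumerate(zip(log2_fold_changes, adjusted))
            if q <= p_cutoff and abs(lfc) >= min_log2]
```

Expressing the fold-change cutoff on the log2 scale treats up- and down-regulation symmetrically, which matches the two arms of a volcano plot.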
Real-time RT-qPCR validation
cDNA was generated using SuperScript® VILO™ MasterMix (Invitrogen). All primers were designed using Primer Premier 6 software and purchased from Integrated DNA Technologies (IDT). The real-time RT-qPCR reactions were prepared using SYBR® Select Master Mix (Life Technologies) and performed on a Bio-Rad CFX96 Real-Time PCR Detection System. The following conditions were used: 95 °C for 2 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 1 min. Fold change of gene expression was calculated with the 2^(-ΔΔCT) method, using β-actin as the housekeeping gene [28].
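As an illustration of the 2^(-ΔΔCT) calculation, the sketch below normalizes the target gene to the reference gene (β-actin here) in both the tumor and the control sample and then compares the two. The function name and the CT values in the example are hypothetical and are not data from this study.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCT) method."""
    # dCT: target CT minus reference-gene CT, within each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # ddCT: difference between the sample of interest and the control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Illustrative CT values only: target 22 vs. reference 18 in the tumor,
# target 25 vs. reference 18 in the normal control.
fold = fold_change_ddct(22.0, 18.0, 25.0, 18.0)
```

In this example ΔΔCT is −3, so the target gene is estimated to be 8-fold more abundant in the tumor sample than in the control. The method assumes amplification efficiencies close to 100 % for both the target and the reference gene.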
Conclusion
Breast cancer is one of the most common cancers and a leading health crisis for women today. Identification of mechanisms and biomarkers in breast cancer remains an urgent challenge [2, 41-45]. By applying state-of-the-art Agilent SurePrint G3 Human Gene Expression Microarray technology to clinical human tissue samples, we were able to obtain a comprehensive snapshot of the gene expression profile of breast cancer, providing informative data for identifying novel biomarkers. Expression of MMP11 and HPSE2, 2 genes closely involved in ECM-mediated cancer cell migration and angiogenesis, was found to be significantly different in breast cancer samples compared with normal controls. This finding was further confirmed by real-time RT-qPCR. To the best of our knowledge, this is the first time that these 2 genes have been demonstrated to act as a gene set in breast cancer. Our findings identify a negative correlation between MMP11 and HPSE2 in breast cancer progression, which provides novel insight into the optimization of breast cancer treatment. Based on our results, effective and targeted therapy for patients with different breast cancer subtypes could be designed and optimized for clinical application to more precisely identify and attack cancer cells by selectively inhibiting the expression of MMP11 or inducing the expression of HPSE2. Efforts to target these 2 genes in anti-breast cancer research with chemically synthesized molecules have already been initiated in our group [46].
Availability of supporting data
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
JF carried out real-time qPCR and data analysis, and participated in the manuscript preparation. RK participated in the statistical analyses. YZ assisted with the gene expression microarray. AX participated in the initial RNA extraction and critical reading of the manuscript. XQ designed the experiment, compiled RNA samples and wrote the manuscript. All authors read and approved the final manuscript.