Background
Breast cancers are clinically heterogeneous [1]. Particular molecules have been identified that are associated with clinical prognosis. For instance, 20% to 25% of breast cancers are associated with overexpression of HER2, and its presence is associated with poor prognosis [2,3]. In addition, increased ER/PR expression has been identified in 70% of breast cancer patients. These biomarkers have motivated a shift away from a "one size fits all" approach to treating breast cancer toward developing therapies that target specific molecules. In particular, tamoxifen, a selective estrogen receptor modulator (SERM), improves survival for patients with ER/PR-positive tumors [4,5]. Trastuzumab, a humanized monoclonal antibody, was developed to bind the HER2 receptor and block its activity [6]. However, de novo or acquired resistance to tamoxifen [7-9] and trastuzumab [10] has emerged as a problem.
Molecularly targeted drugs, like trastuzumab and tamoxifen, are designed to block aberrant signaling within oncogenic pathways. Cell signaling pathways direct the flow of information (i.e., flux) from an extracellular stimulus to the corresponding cellular response (e.g., cellular proliferation, contact inhibition, or cell death). By analogy with metabolic control analysis, control of the flow of information is distributed among all the steps in a network [11-13]. This implies that an increase in one protein does not necessarily correspond to an increased pathway flux. Conversely, a decrease in the expression of one protein via therapeutic modification does not necessarily lead to a decrease in pathway flux. Conceptually, this leads to the hypothesis that combining gene expression measurements over groups of genes that fall within common pathways will be a more effective means of marker identification. In fact, it was recently shown that breast cancer genes that do not exhibit a change in their expression profile still play a central role in interconnecting deregulated genes in a protein network [14]. The observation that the onset and progression of many diseases arise from the interactions of a number of interconnected genes has shifted the drug discovery perspective from a molecule-centric to a network/pathway-centric approach [15]. Proteomics provides an attractive platform for interrogating pathway flux, as measuring actual protein levels instead of proxy mRNA levels may be more informative despite the added experimental complexity [16].
One of the most commonly used techniques for proteomic profiling is 2-DE-based protein separation combined with mass spectrometry-based identification. Using this approach, tumor tissues, in addition to proteins in the blood, are being examined to yield insights into the molecular pathways that are altered in cancer progression. While high-throughput 2-DE proteomic data reveal proteins that are differentially regulated, different sources of biological and analytical variation affect the statistical significance of these results [17]. These can be addressed using an experimental design that incorporates several technical and biological replicates to account for variation at two levels, within gels and within samples, respectively [18-20]. This places demands on sample amount and composition, because proteomic analysis of breast cancer biopsies is complicated by the heterogeneity of cellular phenotypes contained in the sample [21]. While laser capture microdissection (LCM) can provide a relatively homogeneous sample by concentrating on the cell type of interest, generating enough sample for a conventional proteomic study is laborious, requiring a minimum of 100,000 cells and tens of hours of dissection time for one 2D-PAGE [22]. Given the desire to aid clinical decision-making, obtaining sufficient clinical sample presents a significant challenge. When a sample is insufficient to carry out a proteomic study with multiple replicates, as in the case of an early-stage breast tumor, does the information obtained from a single gel replicate still provide insight into the predominant cell signaling pathways at work in a cell?
Thus, the objective of this study was to identify predominant pathways and protein interaction networks in two breast cancer phenotypes using prior information. Given the conservation of genetic information and marker expression between tumors and their corresponding cell lines [23-25], we used well-characterized model systems in our study. The central hubs of the protein interaction networks obtained using prior information were validated to establish confidence in the protein expression patterns.
Methods
Cell culture and reagents
The human breast cancer cell lines BT-474 and SK-BR-3 were kindly provided by Dr. Jia Luo (Health Sciences Center, West Virginia University, WV). Cells were grown in 75-cm² plastic tissue culture flasks (Costar Corning; Corning, NY) in a humidified incubator at 37°C and 5% (v/v) CO2. BT-474 cells were routinely maintained in Roswell Park Memorial Institute (RPMI) 1640 medium (Mediatech, Inc., Herndon, VA) supplemented with 10% (v/v) heat-inactivated fetal bovine serum (FBS) (Hyclone, Inc., Logan, UT), 0.3% (w/v) L-glutamine, 1% (v/v) penicillin/streptomycin (BioWhittaker, Walkersville, MD) and 10 ng/mL insulin (Sigma, St Louis, MO). SK-BR-3 cells were maintained in Improved Minimum Essential Medium (IMEM) Zn2+ option (Invitrogen) containing 4 mM L-glutamine, 2 ml/L L-proline and 50 μg/mL gentamicin sulfate, supplemented with 10% FBS (Hyclone) and 1% penicillin/streptomycin (BioWhittaker). Cells were passaged at a 1:5 dilution with fresh medium every 5 days.
Preparation of cell lines for 2-DE
Cells were grown to approximately 80% confluence. Growth medium was removed and cells were washed twice with 10 mL phosphate-buffered saline (PBS) to remove dead cells and as many extracellular proteins as possible. Cells were detached by incubating the flasks at 37°C for 10 min in the presence of trypsin (BioWhittaker). Trypsin was neutralized by the addition of FBS. Cells were then washed twice with PBS and harvested by centrifugation at 1,200 rpm at 4°C for 10 min. Residual PBS was carefully removed to eliminate salts that could interfere with 2-DE. Cells were incubated in lysis buffer (7 M urea, 2 M thiourea, 2% (w/v) CHAPS) for 30 min on ice and sonicated five times in an ultrasonic water bath, with each 10 s sonication followed by a 10 s cooling interval on ice. Cell debris was pelleted by centrifugation at 14,000 rpm for 40 min at 4°C. The supernatant was aliquoted into fresh tubes and stored at -80°C. Protein concentration was determined using a BCA protein assay kit (Pierce).
2-D Electrophoresis
For each cell line, 500 μg of cell lysate was mixed with rehydration buffer (7 M urea, 2 M thiourea, 2% CHAPS, 1% DTT, 2% IPG buffer, 0.002% bromophenol blue) and incubated for 1 h at room temperature prior to rehydration on Immobilized pH Gradient (IPG) strips (pH 3-10 NL, 24 cm; GE Healthcare, Uppsala, Sweden) for 12 h at 25°C. Isoelectric focusing was performed on an Ettan IPGphor apparatus (Amersham Biosciences) for a total of 90 kVh at 50 μA per strip at 20°C. IPG strips were then equilibrated for 30 min in 75 mM Tris-HCl pH 8.8, 6 M urea, 30% (v/v) glycerol, 2% (w/v) SDS, 0.002% (w/v) bromophenol blue and 1% (w/v) DTT. A second 30 min equilibration step was performed with the DTT replaced by 2.5% iodoacetamide. Equilibrated strips were transferred onto 24 cm 12% uniform precast SDS-polyacrylamide gels (Jule, Inc., Milford, CT) cast between non-fluorescent glass plates. IPG strips were sealed with 0.5% (w/v) low-melting-point agarose in SDS running buffer containing bromophenol blue. Gels were run in an Ettan DALTsix Large Vertical System (Amersham Biosciences) at 30 mA per gel at room temperature until the dye front had run off the bottom of the gels.
Gels were fixed overnight in 10% (v/v) methanol, 7% (v/v) acetic acid, washed in 18 MΩ water, and stained overnight with SYPRO Ruby dye (Bio-Rad). Excess dye was removed by washing twice with 18 MΩ water in a dark room. Gels were imaged using a Typhoon 9400 scanner (Amersham Biosciences) at 200 μm resolution with a 488 nm laser and a 610 nm band-pass filter at normal sensitivity in fluorescence acquisition mode. Data were saved in .gel format using ImageQuant software (Amersham Biosciences). The 2-DE results are representative of three biological replicates.
Image analysis
The images were analyzed using SameSpots software (Nonlinear Dynamics). Saturated and damaged areas of the gels were excluded from the analysis by selecting a region of interest (ROI). The images were warped, using automatic and manual vectors, to a reference image that was automatically selected as the gel containing the most spots. Normalized spot volumes were calculated by dividing the volume (optical density) of each individual spot by the total spot volume of its gel. A total of 304 differentially expressed protein spots were chosen for further analysis.
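The normalization step can be sketched in a few lines. This is a minimal illustration, not SameSpots code: it assumes each spot's raw volume is its integrated optical density, uses hypothetical spot names and volumes, and applies the 1.5-fold cut-off that the text later reports for the protein data.

```python
def normalize_spot_volumes(raw_volumes: dict[str, float]) -> dict[str, float]:
    """Express each spot volume as a fraction of the total spot volume on its gel."""
    total = sum(raw_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: volume / total for spot, volume in raw_volumes.items()}


def fold_changes(norm_a: dict[str, float], norm_b: dict[str, float]) -> dict[str, float]:
    """Ratio of normalized volumes (gel A / gel B) for spots matched on both gels."""
    shared = norm_a.keys() & norm_b.keys()
    return {spot: norm_a[spot] / norm_b[spot] for spot in shared if norm_b[spot] > 0}


def differential_spots(ratios: dict[str, float], fold: float = 1.5) -> dict[str, str]:
    """Label spots whose ratio is at least `fold` up in either gel."""
    calls = {}
    for spot, ratio in ratios.items():
        if ratio >= fold:
            calls[spot] = "up in A"
        elif ratio <= 1.0 / fold:
            calls[spot] = "up in B"
    return calls


# Hypothetical raw volumes for three matched spots on one gel per cell line.
gel_bt474 = {"spot_1": 300.0, "spot_2": 500.0, "spot_3": 200.0}
gel_skbr3 = {"spot_1": 100.0, "spot_2": 550.0, "spot_3": 350.0}

ratios = fold_changes(normalize_spot_volumes(gel_bt474), normalize_spot_volumes(gel_skbr3))
calls = differential_spots(ratios)
```

Dividing by the total gel volume removes differences in overall loading or staining between gels, so the ratios compare each spot's share of the proteome rather than its absolute signal.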
In-gel digestion
The gel spots of interest were excised using an Ettan Spot Picker (Amersham Biosciences) fitted with a 1.5-mm spot picker head. Briefly, excised spots were reduced with DTT (10 mM, 60°C, 10 min) and alkylated with iodoacetamide (100 mM, room temperature, 45 min) in a dark room. The gel pieces were dehydrated in acetonitrile for 10 min, vacuum dried, rehydrated with 10 μL of digestion buffer (10 ng/μL trypsin (Promega; Madison, WI) in 25 mM NH4HCO3), and covered with 10 μL of NH4HCO3. The samples were incubated for 16 h at 37°C to allow complete digestion. Peptides were extracted from the gel plugs by sonication in 2.5 μL of 5% formic acid.
MALDI-TOF MS analysis
A Micromass MALDI-R MALDI-TOF MS system (Waters) was used to acquire peptide mass spectra as recommended by the manufacturer. Protein digest solutions were mixed at a 1:1 ratio with the MALDI matrix α-cyano-4-hydroxycinnamic acid (CHCA) (Sigma-Aldrich Fluka; St. Louis, MO). One microliter of the tryptic peptide sample was applied to the MALDI plate and allowed to dry. The MALDI-TOF MS was operated in positive ion, delayed extraction, reflector mode for highest resolution and mass accuracy. Peptides were ionized/desorbed with a 337 nm laser, and spectra were acquired at a 15 kV accelerating potential with optimized parameters. Close external calibration with a mixture of standard peptides (Applied Biosystems) provided a mass accuracy of 25-50 ppm. Internal calibration was performed with the monoisotopic peak of the adrenocorticotropic hormone (ACTH) (18-39) peptide (m/z 2465.1989). The mass spectrum for each sample was the average of 300 laser shots. Peptide masses were measured over the range m/z 800 to 3,000. Peak lists containing m/z ratios and the corresponding intensity values were exported to Microsoft Excel for further processing.
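To make the ppm figures concrete, the sketch below computes a signed ppm error, applies a single-point internal correction using the ACTH (18-39) peak, and keeps peaks in the m/z 800 to 3,000 window. The scale-only correction is an assumption for illustration; the instrument software may fit a different calibration function, and the peak list is hypothetical.

```python
ACTH_18_39_MZ = 2465.1989  # internal calibrant m/z given in the text


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def recalibrate(peaks, observed_ref, reference=ACTH_18_39_MZ):
    """Scale every m/z so the calibrant lands exactly on its reference value.

    Assumption: a single-point, multiplicative (scale-only) correction.
    """
    factor = reference / observed_ref
    return [(mz * factor, intensity) for mz, intensity in peaks]


def in_window(peaks, low=800.0, high=3000.0):
    """Keep peaks inside the acquired m/z range stated in the text."""
    return [(mz, intensity) for mz, intensity in peaks if low <= mz <= high]


# Hypothetical peak list of (m/z, intensity); the calibrant was observed at 2465.3000.
raw = [(750.40, 120.0), (1045.60, 900.0), (2465.3000, 3000.0), (3120.00, 50.0)]
error_before = ppm_error(2465.3000, ACTH_18_39_MZ)  # roughly +41 ppm
corrected = in_window(recalibrate(raw, observed_ref=2465.3000))
tolerance_da = 50e-6 * ACTH_18_39_MZ  # 50 ppm at m/z 2465.2 is about 0.123 Da
```

The last line shows why ppm is the natural unit here: a fixed relative tolerance corresponds to a wider absolute window for heavier peptides.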
Protein identification using peptide mass fingerprinting (PMF)
Peptide mass fingerprints for each of the 304 protein spots were entered side by side in an Excel spreadsheet. To improve the efficiency of database searching, the list of peptide mass peaks from each sample's spectrum was processed, and background peaks observed in more than 10% of the PMFs were eliminated [26]. MASCOT (http://www.matrixscience.com), Aldente (ExPASy) and MS-Fit (Protein Prospector; University of California, San Francisco) were each used to query the UniProtKB/Swiss-Prot human database with the corresponding monoisotopic peptide mass fingerprints, using the following settings: peptide mass tolerance of 50 ppm, one missed cleavage site, one fixed modification (carboxymethyl cysteine), one variable modification (methionine oxidation), and no restrictions on protein molecular mass or isoelectric point. The reported protein identities were ranked highly by at least two of the three search algorithms.
Ingenuity Pathway Analysis
Differentially regulated proteins identified by 2-DE and PMF were further analyzed using Ingenuity Pathway Analysis (IPA; Ingenuity Systems, Mountain View, CA; http://www.ingenuity.com). IPA was used to interpret the differentially expressed proteins in terms of an interaction network and predominant canonical pathways. The Ingenuity Pathways Knowledge Base (IKB) is a regularly updated, curated database of interactions between proteins culled from the scientific literature. IPA uses this database to construct protein interaction clusters that involve direct and indirect interactions, physical binding interactions, enzyme-substrate relationships, and cis-trans relationships in transcriptional control. The networks are displayed graphically as nodes (proteins) and edges (the biological relationships between the proteins).
A protein interaction network was generated as follows. A dataset containing the upregulated proteins for a particular cell line, called the focus proteins, was uploaded into IPA. These focus proteins were overlaid onto a global molecular network developed from the information in the IKB. Networks of focus proteins were then generated algorithmically, based on connectivity, to include as many focus proteins as possible together with the non-focus proteins from the IKB needed to connect them.
Canonical pathways are identified from the IPA library based on their significance to the dataset. The significance of the association between the dataset and a canonical pathway is measured in two ways: a) the ratio of the number of proteins in the dataset that map to the pathway to the total number of proteins in the canonical pathway, and b) a p-value obtained by comparing the number of genes/proteins of interest (i.e., focus genes) that map to the pathway with the total number of genes/proteins in all functional/pathway annotations stored in the Ingenuity Pathways Knowledge Base, using a right-tailed Fisher's exact test on a 2 × 2 contingency table with the Benjamini-Hochberg correction for multiple hypothesis testing. The 2 × 2 contingency table is shown in Table 1, where K is the number of genes/proteins of interest (i.e., focus genes), N is the total number of genes/proteins in all pathway annotations, k is the number of focus genes associated with the pathway, and n is the total number of genes associated with the pathway. This test is the standard choice in IPA for estimating statistical significance. The null hypothesis tested was that the association between the upregulated proteins and a given pathway arises by random chance alone; a low p-value indicates that the association is unlikely to be explained by chance alone.
Table 1
2 × 2 contingency table used for testing the significance of gene/protein enrichment in all IKB pathway annotations.
Category | Focus genes | Non-focus genes | Row total |
Genes associated with pathway | k | n - k | n |
Genes not associated with pathway | K - k | (N - n) - (K - k) | N - n |
Column total | K | N - K | N |
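The right-tailed test on Table 1 and the Benjamini-Hochberg adjustment can be written with the standard library alone. This sketch uses the k, n, K, N notation of Table 1 but is not IPA's implementation, and the pathway names and counts are hypothetical. The right-tailed p-value is the probability of drawing at least k focus genes when the n genes of a pathway are sampled from N annotated genes, K of which are focus genes (a hypergeometric tail).

```python
from math import comb


def right_tailed_fisher(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for the 2 x 2 table in Table 1.

    X is hypergeometric: n pathway genes drawn from N annotated genes, K of which are focus genes.
    """
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_ratio(k: int, n: int) -> float:
    """Focus genes mapping to the pathway divided by all genes in the pathway."""
    return k / n


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (k, n) per pathway, with K focus genes out of N annotated genes.
K, N = 30, 5000
pathways = {"pathway_A": (6, 40), "pathway_B": (2, 60), "pathway_C": (1, 200)}
raw_p = {name: right_tailed_fisher(k, n, K, N) for name, (k, n) in pathways.items()}
adj_p = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
ratios = {name: pathway_ratio(k, n) for name, (k, n) in pathways.items()}
```

Exact integer binomial coefficients avoid the underflow that a naive floating-point product would suffer for large N, and the adjustment multiplies each p-value by m/rank, so testing many pathways raises the bar each one must clear.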
Western blotting
For western blot analysis, 10-30 μg of total cell lysate was separated by SDS-PAGE using a 12% Tris polyacrylamide gel with a 4% stacking gel at 75 V for 4 h. Proteins were transferred onto Bio Trace PVDF membrane (PALL Life Sciences; Pensacola, FL) at 42 V for 1.5 h. Blots were washed in Tris Buffered Saline (TBS) for 5 min at room temperature, blocked for 1 h at room temperature in TBS + 0.1% Tween 20 (TBS/T) plus 5% dry milk, and then washed three times in TBS/T. Blots were incubated overnight at 4°C with primary antibodies specific for IGF-1R (sc-9038), α-Enolase (sc-100812), GAPDH (sc-25778) (all from Santa Cruz Biotechnology, Santa Cruz, CA), Ras (BD Biosciences, 610001) and Profilin (Millipore, AB3891) in TBS/T plus 5% dry milk. The next day, blots were washed three times in TBS/T and incubated for 1 h at room temperature with anti-biotin (Cell Signaling Technology, Inc., Danvers, MA, 7727) and either a goat anti-mouse IgG-horseradish peroxidase (HRP) (BD Biosciences, 554002) or a goat anti-rabbit IgG-HRP (Sigma-Aldrich, A0545). Finally, the blots were washed three times in TBS/T and developed using LumiGLO reagent (Cell Signaling Technology, Inc., Danvers, MA, 7003), and bands were visualized on KODAK Biomax light film (Fisher Scientific). Densitometric analysis was performed using ImageJ software (National Institutes of Health), and protein levels were normalized to GAPDH levels for each sample. Given the uncertainty in estimating the level of expression in both cell lines, an empirical Bayesian approach was used to establish the level of confidence associated with differential expression between BT474 and SKBR3 given the available data [27]. Expression levels were log-transformed to minimize potential bias in estimating the expression ratio, R, defined by $\log_{10}(X_{\mathrm{BT474}}) = R + \log_{10}(X_{\mathrm{SKBR3}})$. A Markov chain Monte Carlo algorithm was used to estimate the posterior distribution of the differential expression coefficient. An unbiased (zero-mean) Gaussian distribution was used to propose new steps in the Markov chain, and its width was scaled to achieve an acceptance fraction of 0.4. The Gelman-Rubin potential scale reduction factor was used to assess convergence of three independent Markov chains to the posterior distribution [28]. Posterior estimates of the expression ratio were obtained from the tails of the three independent chains, i.e., the portions sampled after convergence.
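The estimation procedure above can be sketched with a random-walk Metropolis sampler and the Gelman-Rubin factor. Several details are assumptions not stated in the text and should not be read as the authors' exact model: a Gaussian likelihood on the log10 GAPDH-normalized values, flat priors on the SKBR3 mean and on R, a noise level fixed at the pooled within-cell-line standard deviation (one empirical-Bayes-style choice), step-width tuning every 100 iterations during burn-in, and hypothetical band densities.

```python
import math
import random
import statistics


def log10_normalized(bands):
    """GAPDH-normalize each (target, GAPDH) densitometry pair, then take log10."""
    return [math.log10(target / gapdh) for target, gapdh in bands]


def pooled_sd(a, b):
    """Pooled within-cell-line standard deviation, used as a plug-in noise level (assumption)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return math.sqrt(ss / (len(a) + len(b) - 2))


def log_posterior(theta, y_bt, y_sk, sigma):
    """Assumed model: y_sk ~ N(mu, sigma), y_bt ~ N(mu + R, sigma); flat priors on mu and R."""
    mu, r = theta
    resid = [y - mu for y in y_sk] + [y - mu - r for y in y_bt]
    return -sum(e * e for e in resid) / (2.0 * sigma * sigma)


def metropolis(y_bt, y_sk, sigma, start, n_iter=20000, target=0.4, seed=0):
    """Random-walk Metropolis with a zero-mean Gaussian proposal; returns post-burn-in draws of R."""
    rng = random.Random(seed)
    theta, step, accepted, kept = list(start), 0.1, 0, []
    lp = log_posterior(theta, y_bt, y_sk, sigma)
    burn_in = n_iter // 2
    for it in range(1, n_iter + 1):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        lp_prop = log_posterior(proposal, y_bt, y_sk, sigma)
        if math.log(1.0 - rng.random()) < lp_prop - lp:  # 1 - U lies in (0, 1]
            theta, lp = proposal, lp_prop
            accepted += 1
        if it <= burn_in:
            if it % 100 == 0:  # rescale the step toward the target acceptance fraction
                step *= math.exp(accepted / 100 - target)
                accepted = 0
        else:
            kept.append(theta[1])  # keep R from the post-burn-in tail of the chain
    return kept


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    between = n * sum((mc - grand) ** 2 for mc in means) / (m - 1)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


# Hypothetical (target, GAPDH) band densities for three blots per cell line.
bt474 = [(1200.0, 1000.0), (1300.0, 1050.0), (1150.0, 980.0)]
skbr3 = [(400.0, 1000.0), (450.0, 1020.0), (420.0, 990.0)]
y_bt, y_sk = log10_normalized(bt474), log10_normalized(skbr3)
sigma = pooled_sd(y_bt, y_sk)
starts = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]  # dispersed starting points
chains = [metropolis(y_bt, y_sk, sigma, s, seed=i) for i, s in enumerate(starts)]
rhat = gelman_rubin(chains)
draws = [r for chain in chains for r in chain]
median_r = statistics.median(draws)
cut = statistics.quantiles(draws, n=40)
ci95 = (cut[0], cut[38])
```

Starting the three chains far apart is what gives the Gelman-Rubin factor its diagnostic value: a value close to 1 means the between-chain spread has shrunk to the within-chain spread, so the retained tails can be pooled into a posterior median and credible interval for R.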
Discussion and Conclusion
It is the specific biological question that shapes the design of experiments [20]. For example, a common approach is to detect changes in expression in each spot, considered individually, that consistently exceed a threshold determined by the system's experimental noise. Implicitly, this work reflects a desire to strike an optimal balance between the amount of data required and the ability to infer, with predictive potential, differentially activated pathways in two cancer models. Duncan and Hunsucker have used an engineering term, fitness-for-purpose, to characterize experimental design in proteomics, where different constraints (such as limits on biological samples, effective use of resources, and how the information will be used) shape the experimental design [37]. To strike a meaningful balance between these constraints, we hypothesized that a single gel replicate was sufficient to infer, with predictive potential, the cell signaling pathways and protein networks that are differentially regulated between two breast cancer models. We demonstrated this predictive potential by validating the inferred differentially regulated cell signaling pathways against previously reported gene expression data [38] and by validating the inferred protein networks by western blot. By using previously reported gene expression data, we attempted to minimize potential bias introduced by our own protocols (i.e., subtle differences in tissue culture or proteomics workflow). In cases with less prior information, such as a proteomic analysis of primary tissue, independent analysis of additional gel replicates could help establish confidence in the inferred differentially regulated cell signaling pathways. Weitzel et al. [39] recently described such an approach, in which an additional gel replicate was used to confirm the upregulated pathways.
The cellular origins and predominant signaling pathways of these two cell lines are quite different. BT474 was derived from a solid invasive ductal carcinoma of the breast [40], whereas the SKBR3 cell line was derived from a pleural effusion adenocarcinoma [41]. The aggressiveness of the two cell lines also differs: BT474 is ER+/PR+ with a high in vitro invasion capability, whereas SKBR3 is ER-/PR- with a low in vitro invasion capability. The aggressiveness of BT474 is supported by the finding that the regulation of actin-based motility by Rho and the actin cytoskeleton signaling pathways were enhanced. In contrast, metabolic pathways such as amino acid biosynthesis and glycolysis/gluconeogenesis were more pronounced in the SKBR3 cell line.
To compare our results against gene expression data for the BT474 and SKBR3 cell lines, we used mRNA expression data from a previously reported study of a collection of breast cancer cell lines. The gene expression data were reported as a matrix of probe sets by cell lines, in which each value is the calculated log abundance of the gene targeted by each probe set in each cell line. Gene expression values were centered by subtracting the mean value of each probe set across the cell lines from each measured value. To calculate the fold-difference between the two cell lines, the log abundance values of one cell line were subtracted from those of the other. All up-regulated genes for BT474 (8663 genes) and SKBR3 (8237 genes) were uploaded into IPA. As shown in Additional File 4, Table S2, the top 5 associated network functions for BT474 had an identical significance score of 27 and were as diverse as cancer, skeletal disorder and dermatological diseases. A similar analysis of SKBR3 gene expression showed that the top 5 functions had an identical significance score of 25 and were as varied as embryonic development, hematological system development, and lipid metabolism. Canonical pathway analysis applied to all of the differentially expressed genes revealed that none of the canonical pathways for either cell line was significant (Additional File 5, Figure S1). This is because the multiple-testing correction raises the threshold for significance such that none of the gene expression patterns embedded within the dataset provides a signal strong enough to surpass it. The threshold for the gene expression data was therefore raised by applying the same 1.5-fold cut-off used for the protein expression data. For BT474, the resulting dataset of 506 genes showed 4 of the top 5 associated network functions to be related to cancer, with scores better than 13 (Additional File 6, Table S3). The most significant pathway in this analysis was the IGF-1 signaling pathway, with p-value $< 1.2 \times 10^{-4}$ (Figure 3B). For the 304 genes up-regulated by at least 1.5-fold in SKBR3, 4 of the top 5 associated network functions were related to cell death and cancer, with scores better than 11. The most significant pathway was urea cycle and metabolism of amino groups, with p-value $< 6.45 \times 10^{-4}$ (Figure 3B).
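The centering, subtraction and 1.5-fold filtering applied to the gene expression matrix can be sketched as follows. The source says only "log abundance", so the base-2 logarithm used for the cut-off is an assumption, and the probe-set names and values are hypothetical.

```python
import math
import statistics


def center_probe_sets(matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Subtract each probe set's mean across all cell lines from its values."""
    return {probe: [v - statistics.fmean(vals) for v in vals] for probe, vals in matrix.items()}


def upregulated(matrix, cell_lines, a, b, fold=1.5, log_base=2.0):
    """Split probe sets into those up in `a` and those up in `b` by at least `fold`.

    Assumption: values are log abundances in base `log_base` (base 2 assumed here).
    """
    ia, ib = cell_lines.index(a), cell_lines.index(b)
    cutoff = math.log(fold, log_base)
    centered = center_probe_sets(matrix)
    diff = {probe: vals[ia] - vals[ib] for probe, vals in centered.items()}
    up_a = sorted(p for p, d in diff.items() if d >= cutoff)
    up_b = sorted(p for p, d in diff.items() if d <= -cutoff)
    return up_a, up_b


# Hypothetical matrix: rows are probe sets, columns follow `cell_lines`.
cell_lines = ["BT474", "SKBR3", "MCF7"]
matrix = {
    "probe_1": [8.0, 6.0, 7.0],
    "probe_2": [5.0, 5.2, 5.1],
    "probe_3": [4.0, 6.5, 5.0],
}
up_bt474, up_skbr3 = upregulated(matrix, cell_lines, "BT474", "SKBR3")
```

Because centering subtracts the same probe-set mean from both cell lines, it leaves their difference unchanged; the centered values matter only when a cell line is compared with the panel mean.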
In the SKBR3 cell line, the group of pathways associated with our protein expression data agreed with the group of pathways associated with the gene expression data. In the BT474 cell line, although the top pathways for the protein expression data were associated with cell motility and the top pathway for the gene expression data was associated with IGF-1 signaling, both are associated in a broader sense with proliferation [42] and resistance to apoptosis [43,44]. The top two networks for BT474 and SKBR3 included 25 and 29 focus molecules from the protein datasets, corresponding to 89.2% and 85.2% of the respective datasets; the corresponding fractions for the gene datasets were 8.3% and 10.8% for BT474 and SKBR3, respectively. This suggests that proteomics provides more information per observation than gene expression.
In summary, experimental designs that consider each protein individually place a high burden on clinical samples. When a sample is limited, a single replicate cannot establish whether an individual protein can be used as a biomarker. Our results do suggest, however, that in a non-ideal scenario the overall pattern of differential protein expression can still be used, in conjunction with prior information, to infer the pathways that underpin differences in cell phenotype. This information may prove helpful in tailoring therapies to the patient.
Acknowledgements
This work was supported by the PhRMA Foundation, National Cancer Institute (NCI) R15CA123123, and the National Institute of Allergy and Infectious Diseases (NIAID) R56AI076221. The content is solely the responsibility of the authors and does not represent the official views of the NCI, the NIAID, or the National Institutes of Health. We thank Dr. Jia Luo for kindly providing the breast cancer cell lines.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
YK and VS carried out the 2D-PAGE experiments. YK and VS performed the image analysis. YK was responsible for PMF, IPA analysis and immunoblotting. DK conceived of the study, participated in its design, and coordinated its execution. All authors drafted, read and approved the final manuscript.