Background
Breast cancer is the most common cancer affecting women worldwide. Human breast carcinomas represent a heterogeneous group of tumors diverse in behavior, outcome, and response to therapy. Despite tremendous advances in screening, diagnosis, and treatment, causes of this disease remain elusive and complex.
It has been hypothesized that the clinical and genetic heterogeneity of breast cancer results from the activation of different oncogenes or the loss of different tumor suppressor genes in specific stem/progenitor cells [1]. Genetic and immunohistochemical analyses have led to the further classification of human breast carcinomas as basal or luminal according to their cell type of origin. To date, five types of breast carcinoma have been recognized on the basis of molecular genetic profiling [2, 3].
The nature of the molecular changes varies between breast tumors and determines the characteristics of the disease. A current research priority is to develop methods that identify the most informative molecular changes, also known as disease markers. The treatment strategy could then be optimized and individualized according to the molecular-biological properties of the patient's tumor cells.
At present, several prognostic and predictive factors are commonly used in the treatment of breast carcinoma. They include clinical factors such as tumor size, stage and histological type, histological grade, number and extent of regional lymph node involvement, hormone-receptor levels (ER, PR), HER-2/neu expression level and nuclear DNA ploidy. The significance of these factors has been clearly established, and together with the clinical state of the patient they are the main determinants in the selection of treatment modality [4]. Despite advances in research and treatment, patient outcome is still often poor. Clearly, there is a critical need to find new molecular parameters not only for the detection, but also for the classification and treatment of breast cancer.
Proteomics is a rapidly developing field that can explore the heterogeneity of breast cancer and supplement the wealth of information gained from genomics. Breast cancer is one of the most studied cancers in proteomics. Studies investigating the differential expression of proteins between normal and breast cancer cells revealed changes in the composition of cytoskeletal elements, such as cytokeratin distribution and tropomyosin expression. The differential distribution of molecular chaperones (heat shock protein family members, protein folding enzymes, 14-3-3 σ) has also been described, together with elevated levels of glycolytic enzymes (aldolase, glyceraldehyde dehydrogenase) [5, 6]. The roles of lysosomal proteases (cathepsin D, cathepsin B) and matrix metalloproteases (MMPs) in breast cancer development and progression have been explored [7].
However, proteomic analysis of larger numbers of clinical samples remains a challenge [8]. Two-dimensional gel electrophoresis (2-DE) facilitates the separation of proteins from highly complex protein mixtures and has become a central method in proteomics in recent years. Unfortunately, the 2-DE methodology remains labor intensive, and the subsequent gel analysis is difficult. Although 2-DE processing software is continuously being developed, its full automation remains an immense challenge [9, 10]. The methodology also places demands on sample amount and composition. Selecting the most convenient samples, containing a sufficient amount of proteins suitable for 2-DE proteomic analyses, is of crucial importance. Whereas differential proteomic analysis of breast tissue biopsies is complicated by the heterogeneity of cellular phenotypes contained in the sample [11], cells in culture represent a homogeneous system, which can to a certain extent be defined and specifically altered.
An optimized feeder layer technique was adapted for the cultivation of mammary gland epithelial cells [12]. Successful in vitro expansion of luminal cells together with myoepithelial cells in heterogeneous populations of human breast epithelial cells was achieved. It is assumed that among the bulk of cells forming the body of the tumor only a few drive tumor outgrowth. They are supposed to be derived from the so-called stem or progenitor cells [13, 14]. Recently, we have characterized a new cell line, EM-G3, possessing some characteristics of putative breast progenitor cells. The cell line was established from the primary culture of a breast cancer biopsy sample using the optimized feeder layer technique [15, 16]. We believe that our method of temporary in vitro propagation of cells from breast cancer tissues could partially lead to the selection of cells relatively close to putative tumor stem cells [12, 17]. We performed 2-DE protein analysis of malignant breast cancer cells cultivated from tissues of different patients in various stages of breast cancer. We sought associations between possible variations in protein expression and the clinical outcome of breast cancer patients. The R computing environment was used to perform statistical analyses [18]; specifically, the analyses were based on the R/maanova package [19]. We further employed the data-mining technique GUHA (General Unary Hypothesis Automaton) to reveal possible relations among protein spots and their impact on the clinical picture [20]. GUHA is a method of exploratory data analysis with logical and statistical foundations. It automatically formulates and tests a large number of hypotheses on relations in the data and reveals the "interesting" ones. Some potential candidates for protein markers emerged from these analyses.
Methods
Patients
The samples were obtained in the years 1999 – 2002 from women who underwent partial breast resection or radical mastectomy at the General Faculty Hospital in Prague. Patients were enrolled without selection at the time of operation. Written informed consent, approved by the Ethical Committee of the General Faculty Hospital in Prague, was obtained from each patient prior to surgery. The morphology of the tumors was determined and immunocytochemical staining for hormonal receptors (ER, PR), HER-2/neu and antigen Ki67 was performed. The patients were treated according to the stage-adjusted therapeutic standards. We assessed the clinical outcome of the patients. Patients with a follow-up of at least three years were chosen for further analysis. They were divided into two groups: patients free of distant metastases after three years and patients with proven distant metastases.
Immunohistochemistry
Paraffin sections (5 μm) from formalin-fixed tissues were used. The tissue sections were incubated with the following primary antibodies: ER, Dako (Glostrup, Denmark), clone 1D5, dilution 1 : 100; PR, Novocastra (Newcastle, UK), clone 16, dilution 1 : 100; Ki67, Novocastra (Newcastle, UK), clone MIB-1, dilution 1 : 50. Immunodetection was performed with the universal immuno-peroxidase polymer Histofine, Nichirei Biosciences Inc. (Tokyo, Japan). Detection of HER-2/neu was performed using the HercepTest™ assay detection system, Dako (Glostrup, Denmark). Five percent 3,3'-diaminobenzidine tetrahydrochloride chromogen solution was used for visualization. Positive and negative controls were included in each run of slides.
Cell cultures
Primary cell cultures were isolated from biopsies of human breast carcinomas. The cells were cultured by the 3T3 feeder-layer technique [12, 17].
The cells in the second or third passage were grown to confluence, harvested and stored in liquid nitrogen in culture medium containing 10% dimethyl sulfoxide and 20% bovine serum. The cells designated for 2-DE analyses were thawed, seeded, cultivated to confluent layers and harvested as described in Selicharova et al. [15]. Out of 120 cultivated samples, primary cultures from the tumor tissue of 23 patients were suitable for further 2-DE based analysis because they yielded a sufficient number of cultivated cells (about five million).
Two-dimensional gel electrophoresis
The cell lysate (70 μg of proteins) in rehydration buffer composed of 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 50 mM DTT and 0.8% (v/v) ampholytes (pH 3–10) was applied to 18 cm linear IPG strips pH 4–7, GE Healthcare (Uppsala, Sweden). 2-DE was performed exactly as described [15]. Briefly, the IEF of rehydrated strips was performed with stepwise increasing voltage as follows: 250 V for 1 h, 500 V for 1 h, 1000 V for 2 h and 10,000 V for the time period necessary to reach 70,000 Vh in total. The focused strips were equilibrated for 30 min in a solution containing 6 M urea, 20% (v/v) glycerol, 2% (w/v) SDS, 0.05 M Tris/HCl pH 8.8 and 2% (v/v) DTT with traces of Bromphenol Blue. Free thiol groups were then alkylated for 30 min in the same solution containing 2.5% (w/v) iodoacetamide instead of DTT. SDS-PAGE on gradient gels (8–16%, 4% stacking gel, 19 × 22 cm) was performed in 0.025 M Tris/0.192 M glycine running buffer with 0.1% (w/v) SDS for 1 h at 16 mA and for about 9 h at 24 mA per gel, until the Bromphenol Blue line reached the bottom of the gel. Three silver-stained analytical gels were prepared from each sample. All the common chemicals were from Sigma (St. Louis, USA) and Fluka (Buchs, Switzerland).
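The duration of the final 10,000 V step follows from the volt-hour budget given above. The short calculation below is only an illustration of that arithmetic: it assumes that the voltage changes instantaneously between steps (no ramping), which the text does not specify, and the function name is hypothetical.

```python
# Stepwise IEF programme from the text: (voltage in V, duration in h).
STEPS = [(250, 1), (500, 1), (1000, 2)]
TARGET_VH = 70_000       # total volt-hours stated in the text
FINAL_VOLTAGE = 10_000   # voltage of the last step, in V


def final_step_hours(steps, target_vh, final_voltage):
    """Return (volt-hours accumulated in the fixed steps, hours needed at the final voltage).

    Assumption: voltage is constant within each step and switches instantly.
    """
    accumulated = sum(voltage * hours for voltage, hours in steps)
    remaining = target_vh - accumulated
    if remaining < 0:
        raise ValueError("fixed steps already exceed the volt-hour target")
    return accumulated, remaining / final_voltage


accumulated_vh, hours_at_final = final_step_hours(STEPS, TARGET_VH, FINAL_VOLTAGE)
print(f"Fixed steps contribute {accumulated_vh} Vh")
print(f"Final step at {FINAL_VOLTAGE} V lasts about {hours_at_final:.2f} h")
```

Under this assumption the three fixed steps contribute only a small fraction of the total, so the final step dominates the focusing time.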
Image analysis
Gels were scanned with a GS-800 Calibrated Densitometer, Bio-Rad (Hercules, CA) at 700 dpi resolution. The images were further processed with PDQuest Advanced 8.0.1 2D Gel Analysis Software, Bio-Rad (Hercules, CA). For computational purposes the file size was reduced to 50% and the images were cropped to frame the same clusters of spots. One or two representative gels per cell population were used to create a match set. Spots were detected and matched automatically to a master gel selected by the software. The spot detection and matching were then edited manually. The spot boundary tool was applied to detect large spots. The patterns in sections of the gels were checked at appropriate magnification, and spots were added manually to the master gel to allow matching of unique spots present in the individual gels. A spot quantity table containing all matched spots was generated. The quantity of missing spots was estimated by the software. The mean of log ratios method was used for normalization. This method calculates the normalization factor of a gel as the mean of all log ratios (log spot quantity of gel/log spot quantity of master gel) of all matched spots (master gel – gel). The quantity table was exported to a spreadsheet .xls file and submitted to statistical analyses (Additional data file 1).
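The mean of log ratios normalization can be sketched as follows. This is a minimal illustration, not the PDQuest implementation: it assumes that the log ratio of a matched spot is log(quantity in gel / quantity in master gel), that the factor is the exponential of the mean log ratio, and that quantities are divided by that factor. The spot IDs and quantities are invented for the example.

```python
import math


def normalization_factor(gel, master):
    """Geometric-mean ratio of a gel to the master gel over matched spots.

    gel, master: dicts mapping spot ID -> spot quantity.
    Only spots present in both gels with positive quantities are used.
    """
    log_ratios = [
        math.log(gel[spot] / master[spot])
        for spot in gel
        if spot in master and gel[spot] > 0 and master[spot] > 0
    ]
    if not log_ratios:
        raise ValueError("no matched spots with positive quantities")
    return math.exp(sum(log_ratios) / len(log_ratios))


def normalize(gel, master):
    """Divide every spot quantity of the gel by its normalization factor."""
    factor = normalization_factor(gel, master)
    return {spot: quantity / factor for spot, quantity in gel.items()}


# Hypothetical quantities: the gel is uniformly twice as intense as the master.
master_gel = {101: 100.0, 102: 200.0, 103: 400.0}
sample_gel = {101: 200.0, 102: 400.0, 103: 800.0}

factor = normalization_factor(sample_gel, master_gel)
normalized = normalize(sample_gel, master_gel)
print(factor, normalized)
```

Working on the log scale makes the factor insensitive to a few very intense spots, which would dominate a plain arithmetic mean of ratios.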
Statistical analysis and data mining – GUHA (General Unary Hypothesis Automaton)
Independent statistical tests were performed using the R computing environment, version 2.6.0 [18], with an adaptation of the R/maanova package, version 1.8.0 [19], which was designed for processing microarray data and implements sample shuffling. R/maanova provides a permutation method to calculate the nominal permutation p-values for each gene (i.e. spot intensity) using Fs test statistics. Because of multiple testing, the p-values were adjusted to control the false discovery rate [21].
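The logic of permutation p-values followed by a false discovery rate adjustment can be illustrated with a small stand-alone sketch. It is not the R/maanova procedure: an ordinary one-way F statistic is used in place of the Fs statistic, and the Benjamini-Hochberg step-up adjustment is used as a stand-in for the adjustment of reference [21]. All function names and numbers below are hypothetical.

```python
import random


def f_statistic(values, labels):
    """Ordinary one-way ANOVA F statistic of values grouped by labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Nominal p-value obtained by shuffling the sample labels."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical spot intensities: six gels without (0) and six with (1) metastases.
group = [0] * 6 + [1] * 6
spot = [1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 2.0, 2.1, 1.9, 2.2, 2.0, 2.05]
p_spot = permutation_p_value(spot, group)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_spot, adjusted)
```

In a real analysis the permutation p-value would be computed for each of the matched spots and the whole vector of p-values adjusted at once.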
The relations among spot intensities and the clinical picture were analyzed with the data-mining technique GUHA [20]. The analyzed data were stored in a source database in the form of a table of n rows (objects = 2-DE gels) and m columns (variables = spot intensities). The variables were dichotomized, i.e. each variable was categorized. Categories were subsets of the ranges of the variables given by cut points. A category was evaluated as 1 if the value of the variable fell within the subset given by the respective cut point; otherwise it was evaluated as 0. The settings of the cut points were based on the specification and behavior of an impurity function [22]. In our application we employed the entropy function as the impurity function for each intensity variable classified with respect to the clinical variable metastases. The impurity function takes its minimum if all objects are classified as 0 or all as 1. The maximum is reached if a roughly 50/50 mixture of classes is present in the group. The idea is to split the original group into two sub-groups in such a way that the impurity decreases maximally, i.e., that the sum of the impurities of the sub-groups is minimized. We identified optimal splits and the corresponding optimal cut points with a script in MATLAB [23]. The cut points then enabled us to categorize the spot intensities. A GUHA hypothesis is determined by an ordered pair of cedents (antecedent (A) and succedent (S)) and by a quantifier. Cedents are Boolean conjunctions formed from individual categories. The length of a cedent is given by the number of categories forming the conjunction. A cedent of length = 1 corresponds to a single category (simple cedent). Cedents of length > 1 are called compound cedents. For a given object a cedent can be evaluated as 1 (true) or 0 (false). The evaluation follows from the evaluation of the single categories forming the cedent and the rules for Boolean conjunction. For a given pair of cedents, we can construct the corresponding contingency table by evaluating the cedents for all objects in the database and then perform statistical tests on this table. The employed quantifier determines the type of test. We used the Fisher quantifier, corresponding to Fisher's exact test. A hypothesis is formally written A ~ S, and if it is valid (statistically significant), it is reported in the GUHA output. Q-values were calculated with the q-val package [21]. Statistical tests were two-sided at the 5% level of significance.
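The two steps described above, choosing an entropy-based cut point and testing a pair of simple cedents with Fisher's exact test, can be sketched in a few lines. This is an illustrative reconstruction, not the MATLAB script or the GUHA implementation used in the study. It minimizes the plain sum of sub-group entropies as stated in the text (a size-weighted sum is a common alternative), and all names and data are hypothetical.

```python
import math


def entropy(labels):
    """Binary entropy of a list of 0/1 class labels (0 for a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_cut_point(intensities, labels):
    """Return (cut point, summed entropy) of the split with minimal impurity."""
    pairs = sorted(zip(intensities, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal intensities
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        score = entropy([l for _, l in pairs[:i]]) + entropy([l for _, l in pairs[i:]])
        if best is None or score < best[1]:
            best = (cut, score)
    return best


def contingency(antecedent, succedent):
    """2x2 table (a, b, c, d) from two 0/1 cedent evaluations per object."""
    a = sum(1 for x, y in zip(antecedent, succedent) if x and y)
    b = sum(1 for x, y in zip(antecedent, succedent) if x and not y)
    c = sum(1 for x, y in zip(antecedent, succedent) if not x and y)
    d = sum(1 for x, y in zip(antecedent, succedent) if not x and not y)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables at most as probable as the observed one (small tolerance for ties).
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-9)))


# Hypothetical data: one spot intensity per gel and the metastases status (1 = present).
intensity = [0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4]
metastases = [1, 1, 1, 1, 0, 0, 0, 0]

cut, impurity = best_cut_point(intensity, metastases)
low_intensity = [1 if value < cut else 0 for value in intensity]  # antecedent A
p_value = fisher_exact_two_sided(*contingency(low_intensity, metastases))
print(cut, impurity, p_value)
```

A compound cedent would be evaluated in the same way, by taking the logical AND of several such 0/1 category vectors before building the contingency table.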
Characterization of proteins
The spots that the statistical analyses indicated as significantly changed were looked up by their spot ID in the match set created by the PDQuest software. The relative molecular masses (Mr) and isoelectric points (pI) were estimated for each protein from their positions in the gels. The statistically significant spots were considered for identification. Preparative 2-DE gels were prepared from cells with a relatively high content of the protein of interest using 400 μg of the cell lysate. They were stained with colloidal Coomassie stain [24].
Mass spectrometry and protein identification
Selected spots on the preparative gels were excised and destained using 50% acetonitrile in 25 mM ammonium bicarbonate, dehydrated with 200 μl of acetonitrile for 5 min at 30°C using a Thermomixer comfort, Eppendorf AG (Hamburg, Germany), and then vacuum-dried in a SpeedVac, Thermo Scientific (Waltham, MA). Gel pieces were rehydrated and proteins were digested for 8 hours at 37°C in the thermomixer with 30 ng/μl trypsin (Trypsin Gold Mass Spectrometry Grade, Promega, Madison, WI) in 25 mM ammonium bicarbonate. After digestion, peptides were extracted from the gel pieces by stepwise extraction with an acetonitrile gradient (15%–60% acetonitrile with 1% trifluoroacetic acid). The extraction was performed in a sonicator, Elma (Singen, Germany), with ice cubes.
Extracted peptides were concentrated in a SpeedVac, Thermo Scientific (Waltham, MA). Tandem electrospray ionization mass spectrometry (ESI-MS/MS) was used to characterize the digests. The ESI-MS/MS was performed in a quadrupole-time of flight (Q-TOF) tandem Micro mass spectrometer (Waters-Micromass) equipped with a nanoelectrospray source and coupled to a 2-D capillary chromatography system, CapLC (Waters). Chromatographic separation was achieved using the Symmetry 300 Å OPTI-PAC (1 cm × 5 μm) trap column (Waters) and the Atlantis dC18 (75 μm × 10 cm × 3 μm) capillary column (Waters). Data were processed with the proteomic software ProteinLynx Global Server 2.1 (Waters) (LC-MS/MS).
Discussion
Breast cancer is one of the most intensively studied cancers. However, breast cancer research has proved to be extremely complicated because of the complex biology of the mammary gland [11, 14]. We believe that our method of temporary in vitro propagation of cells from breast cancer tissues [12, 17] could partially lead to the selection of cells relatively close to putative tumor stem/progenitor cells. We suppose that analysis of these cells might indicate proteins responsible for the overall tumor behavior. We performed 2-DE analysis of 23 primary cultures of epithelial cells derived from breast cancer tissues, seven of which were metastasis-positive.
All the 2-DE gels from the different samples were similar to each other and were consistent with the normal mammary epithelial (NME) cell sample described in Selicharova et al. [15]. The similarity of the individual primary cultures of breast cancer cells was a good prerequisite for performing the comparative proteomic experiment. On the other hand, the experimental design based on primary cultures substantially decreased the number of samples available for powerful statistics. From 120 human breast tumors we obtained only 23 usable cell cultures with a sufficient number of cultured cells (about five million).
Only a few spots apparently varied, qualitatively or quantitatively, among the sets of gels from individual samples. Some of the proteins have been identified (data not shown). We detected variations in the quantity of cathepsin D, cathepsin B, squamous cell carcinoma antigen, γ synuclein, cytokeratin 19 and other proteins that have been reported to play a role in cancer etiology [25–29]. We also found isoelectric variants of several proteins arising from common polymorphisms that might have an impact on carcinogenesis (glutathione transferase ω, glyoxalase I) [30, 31]. On the other hand, we did not observe variation in the quantity of heat shock proteins (HSP 90, HSP 60, HSP 27) or the molecular chaperone 14-3-3 σ among our breast cancer cell cultures. These proteins have been reported to be altered in breast cancer [5, 32]. Although these findings were exciting, we could not demonstrate, without further validation, the significance of the above-mentioned observations or their connection with the tumor characteristics.
We intended to perform a computational quantitative analysis of our 2-DE data. Any computer software designed to align and compare 2-DE gels must somehow deal with the distorted spot patterns that are inherent to the methodology. So far, spot detection and matching must be supervised by a researcher [10], which was another bottleneck of our experimental design. We analyzed our gels with the PDQuest Advanced 8.0.1 2D Gel Analysis Software since it is available in our laboratory. The software is not designed to compare the multiple groups of samples that arose from our experimental setting (23 samples in triplicate). We compromised between the quantity of data and our ability to process them. We finally constructed a match set composed of 44 cropped gels yielding well-distinguished spot patterns. The spot patterns were carefully studied and the matching was adjusted in each gel. Finally, 245 spots were matched and their normalized quantities in each gel were subjected to statistical analyses and data mining. The quantification of proteins in 2-DE gels is relative and is a matter of dynamic range versus sensitivity [8, 24]. The gels were silver-stained because we intended to achieve the utmost sensitivity in order to detect possible changes in the expression of less abundant proteins in our samples. We are aware that this type of staining might be a source of inaccuracy. The correlation coefficients between technical replicates of gels ranged from 0.76 to 0.88, which is normal for 2-DE analyses [9]. However, the overall variability within the data reduced the attainable statistical significance of our results. In spite of all the disputable issues, we identified spots correlated with metastases in the set of patients. We used two different statistical approaches to search for significant correlations between clinical data and spot intensities. GUHA [20], enriched with cut-off points and q-values, computes with categorical variables sorted according to the cut points. R/maanova [19] computes with integers that correspond to the spot densities. The outputs of the two methods differ slightly, but in general the same spots were found by both methods to be significantly correlated with the clinical data.
Spots 7305, 5104 and 1606 fulfilled the statistical criteria in either analysis. Spot 7305 was identified as 2,3-trans-enoyl-CoA isomerase. The enzyme is involved in mitochondrial β-oxidation of unsaturated fatty acids [33]. The defects in distribution of polyunsaturated fatty acids in healthy and cancerous breast tissues have been documented [34]. Decreased levels of this enzyme might have impact on the aberrant behavior of cancer cells.
Spot 5104 was identified as glutathione peroxidase 1, a selenium-dependent enzyme that detoxifies hydrogen peroxide and lipid peroxides. The protective function of selenium against cancer mortality has been documented. It remains unclear how selenium decreases cancer risk and whether glutathione peroxidase is involved in this action [35]. The lowered levels of the enzyme in our group of patients with metastases further support a possible involvement of glutathione peroxidase 1 in the anticancer defense.
Spot 1606 was increased in the group of patients with metastases and it appeared to be abundant. The spot had a diffuse pattern in the 2-DE gel and the MS spectra were complicated. It was identified as nucleophosmin, a highly phosphorylated protein associated with nucleolar ribonucleoprotein structures [36]. The protein is known to be extensively post-translationally modified. This might explain its 2-DE pattern, but we cannot exclude the possibility that other proteins are contained in the spot. Nucleophosmin is overexpressed in many types of human solid tumors. It is a multifunctional protein and its physiological function in tumorigenesis is controversial [37].
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
JV designed and coordinated the study, evaluated and interpreted the clinical data, contributed to statistical computations and drafted the manuscript. IS designed and performed the analysis of 2-DE gels, prepared the samples for identification of proteins, analyzed, evaluated and interpreted the proteomic data and drafted the manuscript. KS optimized the methodology for preparation of 2-DE gels and prepared the gels. MS performed the LC-MS/MS experiments and protein identifications. EM contributed to the study concept and design, was responsible for establishing and handling the cell cultures and contributed to writing of the manuscript. EB and MP established and handled the cell cultures. ZV performed the immunohistochemistry analysis. DC was responsible for the GUHA and statistical analyses and contributed to writing of the manuscript. JJ contributed to the study concept and design and critically revised the manuscript for important intellectual content. All authors read and approved the final manuscript.