Introduction
The diagnosis of breast cancer relies on an integrated approach that combines clinical and physical examination, imaging (mammography and ultrasound), and histopathology. Although serum biomarkers have not yet played a major role in breast cancer diagnostic or prognostic practice [1,2], an effective biomarker panel in an easily accessible biological fluid would be a valuable and minimally invasive adjunct to other clinical and pathological approaches [3]. As whole blood provides a dynamic representation of physiological and pathological status, serum or plasma represents the most extensively studied biological matrix for cancer biomarkers [4]. Therefore, analysis of the serum or plasma proteome may be an important step toward accurate diagnosis or prognosis.
For breast cancer biomarker discovery, proteins and peptides have been identified in breast cancer cell lines [5-7], nipple aspirate fluid [8,9], and normal, benign, premalignant, and malignant breast tissue [10-13], in addition to serum and plasma [1,4,14]. Numerous proteomics-based studies of serum and plasma have reported discriminatory peptide/protein ion peaks, either as identified proteins or on the basis of their mass/charge (m/z) values, for breast cancer diagnosis or prognosis. However, not all have reported protein identities for the discriminatory ion peaks.
This study used surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) protein chip technology to discover a unique combination of serum biomarkers for breast cancer and confirm them in an independent sample set. The markers were identified by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF)/TOF MS and verified immunologically. We also investigated the association between this serum protein panel and patient outcome to determine its potential prognostic utility.
Methods
Serum samples
The study involved a total of 320 human serum specimens and was approved by the Human Research Ethics Committee of the Northern Sydney Local Health District, Sydney, Australia. The training set samples from patients diagnosed with breast cancer (BC, n = 99) and control samples from healthy volunteers (HV, n = 51) were obtained from the Kolling Institute Breast Tumour Bank, at the Royal North Shore Hospital, Sydney, Australia. The validation set consisted of 100 independent BC serum samples from the Australian Breast Cancer Tissue Bank, Sydney, Australia and 70 HV samples. Sample sizes were estimated to allow the detection of a difference of at least 25% in a measured parameter between sample groups at the 5% significance level (α = 0.05) with a statistical power of at least 0.8, assuming group coefficients of variation of 50% (σ = 0.5). All patients whose tumor samples (or healthy tissue samples) are deposited in either of the two tissue banks used in this project had given prior written informed consent to the banking of their tissue and its use in any future research projects. Therefore, additional patient consent was not required for this specific project. The median ages of patients included in the training and validation sets were 59 (range 28 to 92) and 58 (range 31 to 86), respectively. For HV control groups, serum samples were age-matched to BC samples within five-year age brackets. All sera were stored at -80 °C until analyzed by SELDI-TOF MS.
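The sample size statement above can be illustrated with a short calculation. The following is a minimal sketch only: the source does not state which formula was used, so a two-sided, two-sample normal approximation is assumed here, with the 25% difference and the 50% coefficient of variation both expressed relative to the group mean (delta = 0.25, sigma = 0.5). The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Equal group size for a two-sided, two-sample z-test (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def achieved_power(n1, n2, delta, sigma, alpha=0.05):
    """Approximate power of the same test when the two group sizes differ."""
    se = sigma * math.sqrt(1 / n1 + 1 / n2)
    return Z.cdf(delta / se - Z.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    print(n_per_group(0.25, 0.5))            # equal-allocation group size
    print(achieved_power(99, 51, 0.25, 0.5))   # training set sizes from the text
    print(achieved_power(100, 70, 0.25, 0.5))  # validation set sizes from the text
```

Under these assumptions, the unequal group sizes reported for the training and validation sets both give an approximate power above 0.8, although a different test or variance model would change the numbers.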
Preparation of serum sample and protein chips for SELDI-TOF MS
All serum samples were initially denatured in buffer containing 8 M urea, 1% CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate) and analyzed by TOF MS on SELDI protein chip arrays (Bio-Rad, Hercules, CA, USA) as previously described [15]. Four chip types with different adsorptive surfaces were used: Q10 (strong anion-exchange), Cu2+-IMAC30 (immobilized metal affinity capture), CM10 (weak cation-exchange), and H50 (hydrophobic). The four chip types were pre-equilibrated twice for 5 min with 5 μl of binding buffer (50 mM Tris-HCl pH 8.0 for Q10; phosphate-buffered saline (PBS) pH 7.2 for IMAC30; 50 mM sodium acetate pH 6.0 for CM10; 10% acetonitrile (ACN) containing 0.1% trifluoroacetic acid (TFA) for H50). Denatured serum protein samples were diluted 1:5 with the respective binding buffers and 5 μl of each diluted sample was pipetted onto the chips. All samples were analyzed in duplicate. Chips were then incubated with shaking for 90 min at room temperature (settings: form 20, amplitude 4) on a MicroMix 5 (EURO/DPC Instrument Systems, Flanders, NJ, USA). After washing twice with the binding buffer, each spot was treated with 2 × 1 μl of 50% sinapic acid (Sigma-Aldrich, St Louis, MO, USA) in 50% ACN, 0.5% TFA and air dried.
MS serum protein profiling and data analysis
All mass spectra were obtained in the m/z range of 2,500 to 70,000 with the ProteinChip SELDI System Enterprise Edition (Bio-Rad). Spectra were averaged from 583 laser shots evenly distributed across each spot. Mean values from duplicate spectra for each sample were used in all subsequent analyses. The m/z value for each peak was determined using external calibration with protein standards: bovine insulin (5,734.51 Da), equine cytochrome c (12,361.96 Da), equine apomyoglobin (16,952.27 Da) and bovine carbonic anhydrase (29,023.70 Da) from Sigma-Aldrich. After calibration, spectra were baseline-subtracted and normalized using the total ion current between 2,500 and 30,000 m/z. Of the original 320 samples, 19 were excluded because their mass spectra did not meet the normalization criteria. A total of 602 spectra were subjected to full analysis (301 samples: BC = 187 and HV = 114, in duplicate) on each of four chip types (total = 2,408 spectra).
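As an illustration of the normalization step described above, the following minimal sketch scales each baseline-subtracted spectrum so that its total ion current (TIC) in the 2,500 to 30,000 m/z window matches the mean TIC of the set, then averages duplicate spectra. This is not the ProteinChip Data Manager implementation; the function names and the toy data are hypothetical, and all spectra are assumed to share one m/z axis and to have a positive TIC.

```python
from statistics import mean


def total_ion_current(mz, intensities, lo=2500.0, hi=30000.0):
    """Sum of intensities whose m/z falls inside the normalization window."""
    return sum(i for m, i in zip(mz, intensities) if lo <= m <= hi)


def tic_normalize(mz, spectra, lo=2500.0, hi=30000.0):
    """Scale every spectrum to the mean TIC; return scaled spectra and factors."""
    tics = [total_ion_current(mz, s, lo, hi) for s in spectra]
    target = mean(tics)
    factors = [target / t for t in tics]  # assumes every TIC is positive
    scaled = [[i * f for i in s] for s, f in zip(spectra, factors)]
    return scaled, factors


def average_duplicates(spec_a, spec_b):
    """Point-wise mean of two duplicate spectra from the same sample."""
    return [(a + b) / 2 for a, b in zip(spec_a, spec_b)]


if __name__ == "__main__":
    mz = [2000.0, 3000.0, 10000.0, 40000.0]           # toy m/z axis
    duplicates = [[1.0, 2.0, 2.0, 5.0], [1.0, 4.0, 4.0, 5.0]]
    scaled, factors = tic_normalize(mz, duplicates)
    print(factors)
    print(average_duplicates(*scaled))
```

A normalization factor far from 1 indicates a spectrum with unusually low or high overall signal; an exclusion rule could be built on such factors, but the source does not state the criteria actually applied.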
Clustering analysis of protein peaks (ProteinChip Data Manager version 4.1, Bio-Rad) was performed to identify protein patterns related to BC and HV groups. Data analysis across all four protein chip types was achieved using univariate analysis of individual peaks by Mann-Whitney U test (IBM SPSS version 20.0, IBM Corp., Armonk, NY, USA). For initial discovery, biomarker panels were developed on the training data set of 99 BC and 51 HV samples. All protein peaks that significantly discriminated BC from HV at P < 0.005 were then subjected to multivariate analysis using forward and reverse binary logistic regression (SPSS) to develop the training model. The discriminatory power of each putative serum biomarker was further described using receiver operating characteristic (ROC) area-under-the-curve (AUC) analysis [10,16]. External validation was also carried out using an independent set of 100 BC and 70 HV serum samples age-matched within five-year age brackets.
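The univariate screening and ROC steps are closely related, because the AUC for a single peak equals the Mann-Whitney U statistic divided by the product of the two group sizes. The minimal sketch below shows this relationship in plain Python rather than SPSS. The function names and toy intensities are hypothetical, and the P value uses a simple normal approximation without tie or continuity correction, which is an assumption and not necessarily what SPSS reports.

```python
import math
from statistics import NormalDist


def mann_whitney_u(cases, controls):
    """U statistic for cases: pairs where the case value is higher (ties count 0.5)."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in cases
        for b in controls
    )


def roc_auc(cases, controls):
    """AUC of a single marker, computed as U / (n_cases * n_controls)."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))


def approx_p_value(cases, controls):
    """Two-sided P value from the normal approximation (no tie correction)."""
    n1, n2 = len(cases), len(controls)
    u = mann_whitney_u(cases, controls)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = abs(u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(z))


if __name__ == "__main__":
    bc_peak = [5.0, 6.0, 7.0, 8.0]   # toy peak intensities, BC group
    hv_peak = [1.0, 2.0, 3.0, 6.0]   # toy peak intensities, HV group
    print(mann_whitney_u(bc_peak, hv_peak), roc_auc(bc_peak, hv_peak))
    print(approx_p_value(bc_peak, hv_peak))
```

An AUC near 0.5 means a peak does not separate the groups, while values near 1 (or near 0 for peaks that are lower in BC) indicate strong discrimination.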
Protein peak identification
Immunological validation of protein biomarkers and protein peak identification was achieved by immunodepletion using Protein G Dynabeads (Life Technologies Corp., Carlsbad, CA, USA). For complement C3a des-arginine anaphylatoxin (C3a-desArg), 1.5 mg of Protein G beads was mixed with 5 μg of anti-C3a/C3a desArg mouse monoclonal antibody (Abcam, Cambridge, UK) and incubated for 30 min at room temperature with rotation. After washing with 200 μl of PBS containing 0.02% Tween 20 to remove free antibody, the immobilized antibody was added to 50 μl of diluted serum samples and incubated for 2 h at 4 °C with rotation. The captured protein-antibody complex was washed twice with 200 μl of PBS and the bound protein eluted at room temperature in 20 μl of 0.1 M glycine, pH 3.0. The starting material, immunodepleted samples and the eluted proteins were monitored by SELDI-TOF MS on normal-phase NP20 chips (Bio-Rad). For apolipoprotein CI (ApoCI) and transthyretin (TTR), a similar procedure was followed using rabbit anti-ApoCI polyclonal antibody (Abcam) and anti-prealbumin monoclonal antibody (Abcam), respectively.
Immunological confirmation of serum protein markers by western blotting
Three putative protein markers were also examined by western blotting. Human sera from BC (n = 4) and HV (n = 4) were separated by 4 to 12% SDS-PAGE (Invitrogen, Carlsbad, CA, USA) and transferred to polyvinylidene difluoride membrane (Bio-Rad). Membranes were blocked for 1 h at room temperature with 5% skim milk. Western blotting was conducted using primary antibodies against C3a/C3a desArg (mouse monoclonal, Abcam) or ApoCI (rabbit polyclonal, Abcam) at 1:1000 dilution and TTR (mouse monoclonal antibody to human prealbumin, Abcam) at 1:2000 in 5% skim milk. Secondary antibodies, peroxidase-linked anti-mouse immunoglobulin G (IgG) (1:2000) or anti-rabbit IgG (1:2000), respectively, were added for 1 h at room temperature and protein bands were visualized by enhanced chemiluminescence using Amersham ECL Prime Western Blotting Detection Reagent (GE Healthcare Life Sciences, Little Chalfont, UK). Western blot data were imaged using the LAS 3000 imaging system (Fujifilm, Stamford, CT, USA) and the images were analyzed with MultiGauge version 3.0 software (Fujifilm). Correlations between densitometric analysis by western blotting and SELDI-MS peak intensities were also examined.
Protein identification by MALDI-TOF/TOF MS
Human sera were fractionated on a weak cation-exchange HiTrap FF column (GE Healthcare) with a linear gradient from 0 to 600 mM NaCl in 25 mM Na acetate pH 6.0 using an ÄKTA Purifier system (GE Healthcare). Fractionated proteins were monitored by SELDI-TOF MS on NP20 chips. Fractions containing a 3.8 kDa putative biomarker were further purified using reverse-phase liquid chromatography (RP-LC) on a 250 × 4.6 mm Jupiter 5 μm 300-Å C18 column (Phenomenex, Lane Cove, Australia), eluted with a 30-min linear gradient from 15 to 60% ACN in 0.1% TFA at 1.5 ml/min. After freeze drying the fraction containing the protein of interest, it was reconstituted in 15% ACN, 0.1% TFA and analyzed using MALDI-TOF peptide mass fingerprinting (PMF) and MS/MS on a Bruker UltrafleXtreme MALDI-TOF/TOF MS (Bruker Daltonics, Bremen, Germany), using an MTP AnchorChip target (Bruker Daltonics) and α-cyano-4-hydroxycinnamic acid as matrix.
To identify the protein peak at 28.2 kDa, human sera were fractionated by anion-exchange chromatography using Q ceramic HyperD F sorbent (Bio-Rad) by means of stepwise pH elution from pH 9 to pH 4 as previously described [17]. Fractionated proteins were monitored by SELDI-TOF MS on NP20 chips. Final identification was achieved after liquid chromatography (Ultimate 3000 nanoLC, Thermo Fisher Scientific, Waltham, MA, USA) on an Acclaim PepMap RSLC C18 2 μm, 100 Å, nanoViper guard (75 μm × 20 mm) and analytical (75 μm × 150 mm) column (Thermo Fisher Scientific), using a 2 to 79% ACN/0.05% TFA gradient at 300 nl/min. Fractions were spotted onto an MTP AnchorChip target (Bruker), and analyzed by MS/MS using the UltrafleXtreme MALDI-TOF/TOF MS (Bruker).
Statistical analysis
Univariate analysis by the Mann-Whitney U test (SPSS Inc., Chicago, IL, USA) was used to distinguish sera from patients with breast cancer from those of healthy controls. Further multivariate analysis by binary logistic regression was also performed in SPSS. Correlations between the levels of the five serum markers, individually and in combination, and tumor pathologic variables (histological grade, tumor size, lymph node involvement, estrogen receptor (ER) and progesterone receptor (PR) status and human epidermal growth factor receptor 2 (HER2) overexpression) were investigated by multiple linear regression (SPSS). For the survival analysis, we defined the median of the combined peak intensity across all patients in the group as the cutoff value. Disease-free survival was estimated using the Kaplan-Meier method, and differences in survival time between groups were tested using the log-rank test.
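The survival analysis described above (median split, Kaplan-Meier estimate, log-rank test) can be sketched in plain Python as follows. This is a minimal illustration with toy data, not the SPSS procedure used in the study. The function names are hypothetical, and assigning values equal to the median to the lower group is an assumption, since the source does not say how such values were handled.

```python
import math
from statistics import median


def median_split(values):
    """True for patients whose combined marker value lies above the median."""
    cutoff = median(values)
    return [v > cutoff for v in values]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = recurrence, 0 = censored."""
    surv, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        recurred = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - recurred / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e in pooled if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    high_t, high_e = [1, 2, 3], [1, 1, 1]   # toy follow-up times, above median
    low_t, low_e = [4, 5, 6], [1, 1, 1]     # toy follow-up times, below median
    print(kaplan_meier(high_t, high_e))
    print(log_rank(high_t, high_e, low_t, low_e))
```

In practice, a dedicated survival analysis package would normally be preferred, since it also handles confidence intervals and plotting.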
Discussion
Patient blood samples are an ideal source of disease biomarkers owing to their ease of access, and many studies have identified possible candidates, but few have overcome validation and reproducibility issues to achieve clinical application [22]. In the present study, we used protein chip mass spectrometry to discover and identify a unique panel of five serum proteins that, in combination, discriminate between sera from breast cancer patients and healthy volunteers with high sensitivity and specificity. The five-protein panel was developed by multivariate analysis of a larger group of proteins found to be significantly regulated in breast cancer, and validated on an independent data set. Whereas the sensitivity of the five-protein parameter was somewhat lower in the validation set than in the training set, the specificity was slightly higher in the validation set. A simplified four-protein panel, from which data for a fragment of ApoH (m/z 3808) was omitted, showed considerably lower specificity than the five-protein panel on both data sets (that is, it produced more false positives), but remained highly sensitive in detecting samples from women with BC.
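For readers less familiar with these terms, the relationship between specificity and false positives can be made concrete with a minimal sketch. The function name and the toy labels below are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(is_cancer, predicted_cancer):
    """Return (sensitivity, specificity) from true labels and panel predictions."""
    pairs = list(zip(is_cancer, predicted_cancer))
    tp = sum(1 for truth, pred in pairs if truth and pred)          # BC called BC
    fn = sum(1 for truth, pred in pairs if truth and not pred)      # BC missed
    tn = sum(1 for truth, pred in pairs if not truth and not pred)  # HV called HV
    fp = sum(1 for truth, pred in pairs if not truth and pred)      # HV called BC
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]       # toy labels: 1 = BC, 0 = HV
    panel_call = [1, 1, 1, 0, 0, 0, 1, 1]  # toy classifications
    print(sensitivity_specificity(truth, panel_call))
```

Each additional healthy volunteer classified as BC lowers the specificity without affecting the sensitivity, which is the pattern described for the four-protein panel.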
When tested for its ability to predict disease-free patient survival, the median value of the five-protein parameter separated patients into significantly different groups, those with values above the median showing more rapid disease recurrence than those with values below the median. Using the four-protein parameter this significant discrimination was lost. Interestingly, the prognostic value of the five-protein parameter appeared to be restricted to women with ER-negative tumors, none of whom showed disease recurrence over the monitoring period if their five-protein parameter had a value below the median. Conversely, in this ER-negative group, almost half of the women had disease recurrence within the monitoring period if their five-protein parameter was above the median value. These distinctions were not seen among women with ER-positive tumors. Therefore we conclude that, in women with ER-negative tumors, the five-protein parameter appears to have strong prognostic value within the first five years. It is recognized that, owing to the use of endocrine therapy, ER-positive disease is less likely to relapse early [23]; therefore longer follow-up will be required to ascertain prognostic utility in this subgroup.
Using a combination of mass spectrometry and immunological methods, the proteins were identified as a fragment of ApoH, ApoCI, C3a-desArg, TTR, and ApoAI. Among this five-peak panel, three (C3a-desArg, TTR and ApoH) were increased in sera of the breast cancer patients compared to that of HV subjects, while ApoCI and ApoAI were decreased in cancer. Each of these serum proteins has previously been associated with breast cancer in various studies, but this study is the first to identify the unique prognostic value of combining their serum concentrations into a single parameter. The combined value was also significantly associated with tumor size (P = 0.018) and lymph node involvement (P = 0.016).
Human complement C3 is the most abundant complement protein in human serum. C3 convertase exists in two forms (C3bBb and C4bC2a) and cleaves only C3, a central molecule of the complement system, between residues 726 and 727 (Arg-Ser), generating C3b and an N-terminal fragment, C3a (8.9 kDa) [24]. C3a has high biological activity and is able to trigger the degranulation of mast cells and basophils, which produces a local inflammatory response. The desArg form represents a stable inactivated form of complement C3a. C3a-desArg was previously observed to be higher in BC sera compared to healthy controls in several studies [14,21,25,26], with an m/z range of 8900 to 8941 observed on IMAC-Ni protein chips. Increased C3a-desArg serum levels have also been reported in hepatocellular and colorectal cancer [27,28]. In our study, we identified this protein at m/z 8916 on Q10 chips alone, with significant discrimination between breast cancer patients and healthy controls.
Transthyretin (TTR, also known as prealbumin) is a liver-derived secreted protein and is the major serum carrier of thyroid hormones, thyroxine and tri-iodothyronine. TTR is also involved in the transport of retinol through its interaction with retinol-binding proteins. Differential levels of TTR in serum have been linked to several cancers, including breast [29,30], ovarian [31] and hepatocellular carcinomas [32]. Five isoforms of TTR have been previously demonstrated by MALDI analysis after immunoaffinity capture [33]: full-length TTR (13,758 Da), a form truncated N-terminally by 10 residues (12,210 Da), and the three modified isoforms (Cys-TTR at m/z 13876, CysGly-TTR at m/z 13924, and glutathionylated-TTR at m/z 14062). In our study, we identified the peak at m/z 13870 as full-length TTR; however, a peak at m/z 13756 detected by Q10 protein chip was also significantly upregulated in the serum of breast cancer patients (Table S1 in Additional file 2). Only the isoform that most likely corresponds to Cys-TTR (m/z 13870 in this study) was computationally selected into the final five-protein panel.
Apolipoproteins bind lipids to form lipoproteins that transport the lipids through the lymphatic and circulatory systems. Serum and plasma lipoprotein metabolism is regulated and controlled by the specific apolipoprotein (Apo-) constituents of the various lipoprotein classes such as ApoAI, ApoCI, ApoH (beta2 glycoprotein) and others. Several classes of apolipoprotein in serum or plasma have been discovered as putative breast cancer biomarkers using proteomic techniques including SELDI-TOF, MALDI-TOF/TOF, 2D-iTRAQ-LC-MS/MS, and 2D-LC MS/MS [19-21,30,34]. We observed that levels of ApoAI and ApoCI were significantly downregulated in breast cancer patients, while a peptide identified as a fragment of ApoH was significantly higher in BC. A previous study also identified both ApoAI and ApoCI by SELDI-TOF as part of a multiprotein panel evaluated as a predictor of metastatic relapse in high-risk BC patients [20]. Decreased serum ApoAI has also been found in other types of cancer including ovarian [31] and bladder carcinomas [30]. ApoAI and ApoAI mimetic peptides have been shown to inhibit tumor development in a mouse model of ovarian cancer, suggesting that ApoAI may not only have potential as a biomarker, but may also have therapeutic utility in this disease [35]. Serum ApoCI has also been previously found to be decreased in breast cancer patients compared to healthy control groups [21]. ApoH or beta2 glycoprotein was recognized immunologically over 30 years ago as being increased in the serum of breast cancer patients [36], but the 3808 Da ApoH fragment that we found to be increased in breast cancer sera has not been reported previously.
Competing interests
All authors declare that they have no competing interests.
Authors’ contributions
LC collected the MS data and performed the statistical analysis, contributed substantially to data interpretation, and drafted the first manuscript. KM contributed to study design, acquisition of data on patient outcomes, and data interpretation. LP contributed to acquisition of data for protein biomarker identification. FMB and DJM contributed to study design and data interpretation. RCB coordinated the study design, data interpretation, and manuscript preparation and revision. All authors read and approved the final manuscript.