Background
Breast cancer is one of the most common cancers in women worldwide [1]. Patients whose breast cancer is detected and treated early have a good prognosis, but current diagnostic techniques such as mammography, MRI and PET are not sufficient to detect early-stage breast cancer efficiently [2]. Finding a panel of relevant biological markers for early detection, diagnosis and monitoring of the therapeutic response could substantially improve our ability to manage breast cancer [3-5].
An emerging goal of proteomics is to discover novel biological markers that can be applied to early detection, disease diagnosis and prediction of response to therapy [6]. Proteomics has advanced the direct profiling of proteins that are differentially expressed between diseased and control samples, or at various stages of disease under particular conditions [7], and has thus become a key technology in the biomarker development pipeline. The biomarker pipeline can be divided into four phases: discovery, qualification, verification and validation [8]. The discovery phase is an unbiased and semiquantitative process, usually comprising simple binary comparisons between diseased and normal states. Tissues, body fluids, or even model cell lines are used as proteome sources. The 'products' of the discovery phase are confirmed in the subsequent qualification phase, for which immunoassays with commercially available antibodies can be used. In the verification phase, the analysis is extended to a larger number of samples, now incorporating a broader range of cases and controls. Although any proteome source can be used in the discovery phase, biomarkers that are detected and validated in specimens obtained by less invasive techniques, such as plasma or serum, are more desirable [8, 9]. Blood serum or plasma contains an enormous complexity of biological components that reflect spatio-temporal changes in diseased cells, tissues, or organs [10]. Knowing how a specific disease such as cancer changes the composition of blood will help us understand the disease and develop methods for its detection and management.
The objective of our study is to discover new breast cancer biomarkers using blood plasma as the proteome source. We previously analyzed breast cancer tissues [11] and the secretome of a breast cancer cell line [12] to detect cancer-relevant proteins as potential biomarkers. In the current study, we analyzed plasma proteomes using an isotope-coded affinity tagging (ICAT) technique. This method was developed to analyze relative amounts of cysteine-containing peptides in tryptic digests of protein extracts [6, 13]. All plasma samples were depleted of six high-abundance plasma proteins by affinity chromatography. The biomarker candidates discovered were then confirmed and verified by Western blot assays, first with pooled or individual samples and then with a blinded set of multiple samples. This combined ICAT and Western blot strategy enabled us to identify and quantify biotinidase (BTD) as a potential breast cancer biomarker in plasma.
Methods
Subjects
Blood samples were collected from breast cancer patients and normal healthy volunteers at the Seoul National University Hospital (Seoul, Korea). The use of human samples for research purposes was approved by the Institutional Review Board of Seoul National University Hospital, and all patients and volunteers gave signed informed consent. Each plasma sample was depleted of the six most abundant serum proteins using a Multiple Affinity Removal System (MARS) column (Agilent Technologies, Palo Alto, CA, USA) [12], and precipitated with trichloroacetic acid. The pellet was dissolved in ICAT denaturation buffer (6 M urea, 0.05% SDS, 5 mM EDTA, 50 mM Tris-HCl, pH 8.3).
ICAT labeling and sample preparation
A pooled plasma sample from 6 breast cancer patients was labeled with a 'heavy (H)' ICAT reagent (Applied Biosystems, Framingham, MA, USA), whilst another pooled sample from 6 normal healthy women was labeled with a 'light (L)' reagent. Each pool contained equal amounts of protein from the individual samples. Proteins (100 μg) in the denaturation buffer were first reduced with 250 mM tris(2-carboxyethyl)phosphine for 30 min. ICAT labeling was then performed using 350 nmol ICAT reagent with gentle shaking for 2 hr at 37°C, and terminated with 1.75 μmol DTT for an additional 5 min. The H- and L-ICAT-labeled samples were mixed, diluted 10-fold with 50 mM Tris (pH 8.0), and digested with 5 μg of trypsin (Promega, Madison, WI, USA) for 16 hr at 37°C. The reaction was quenched by adding phosphoric acid to 0.5%. The tryptic digest was applied to a polysulfoethyl A strong cation exchange (SCX) column (Western Analytical, Murrieta, CA, USA) equilibrated with 10 mM KH2PO4 in 25% ACN (pH 3.0) using an ÄKTA Explorer system (GE Healthcare Biosciences, Uppsala, Sweden), eluted with a 40-min gradient from zero to 0.4 M KCl, and collected in 40 fractions. The SCX fractions were neutralized by the addition of 10 volumes of 2× PBS, loaded on an ICAT avidin cartridge (Applied Biosystems), and then washed with PBS followed by 50 mM ammonium bicarbonate in 20% methanol, pH 8.3. ICAT-labeled peptides were eluted with a solution of 0.4% TFA in 30% acetonitrile, dried in vacuo, redissolved in 90 μl of 95% TFA, incubated at 37°C for 2 hr to cleave off the biotin moiety from the ICAT label, and finally dried again.
Liquid chromatography and tandem mass spectrometry
An Agilent nanoflow-1200 series HPLC system was connected to a linear ion trap mass spectrometer (LTQ, Thermo Electron, San Jose, CA, USA). The dried ICAT-labeled peptide sample was reconstituted with 20 μL of 0.4% acetic acid, and an aliquot (1 μL) was injected onto a reversed-phase Magic C18aq column (13 cm × 75 μm) equilibrated with 95% buffer A (0.1% formic acid in H2O) + 5% buffer B (0.1% formic acid in acetonitrile). The peptides were eluted in a linear gradient of 10 to 40% acetonitrile over 75 min. The MS survey scan covered 300 to 2000 m/z and was followed by three data-dependent MS/MS scans with the following options: isolation width, 1.5 m/z; normalized collision energy, 25%; dynamic exclusion duration, 180 sec.
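The acquisition settings above describe a data-dependent scheme: after each survey scan, the three most intense precursors are fragmented, and a fragmented precursor is then ignored for 180 sec. The Python sketch below illustrates only this selection logic and is not the instrument's actual algorithm. The function name `select_precursors`, the m/z matching tolerance `mz_tol` (assumed here to be half of the 1.5 m/z isolation width) and the example peak values are hypothetical and do not come from the study.

```python
from typing import Dict, List, Tuple


def select_precursors(
    survey_peaks: List[Tuple[float, float]],  # (m/z, intensity) pairs of one survey scan
    scan_time_s: float,                       # acquisition time of the survey scan
    excluded_until: Dict[float, float],       # m/z -> time at which its exclusion ends
    top_n: int = 3,                           # three MS/MS scans per survey scan
    exclusion_s: float = 180.0,               # dynamic exclusion duration
    mz_tol: float = 0.75,                     # ASSUMPTION: matching tolerance, not from the paper
) -> List[float]:
    """Return up to `top_n` precursor m/z values, most intense first."""
    # Drop exclusion entries that have expired by the time of this scan.
    for mz in [m for m, t in excluded_until.items() if t <= scan_time_s]:
        del excluded_until[mz]

    chosen: List[float] = []
    for mz, _intensity in sorted(survey_peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        # Skip peaks that fall within the tolerance of an excluded m/z.
        if any(abs(mz - ex) <= mz_tol for ex in excluded_until):
            continue
        chosen.append(mz)
        excluded_until[mz] = scan_time_s + exclusion_s
    return chosen


# Hypothetical survey scan repeated at different times.
excluded: Dict[float, float] = {}
peaks = [(500.3, 9e5), (612.8, 7e5), (433.2, 5e5), (720.4, 3e5)]
print(select_precursors(peaks, 0.0, excluded))   # three most intense peaks
print(select_precursors(peaks, 10.0, excluded))  # only the peak not yet fragmented
```

With a long exclusion window, repeated survey scans move on to less intense precursors instead of re-fragmenting the same abundant peptides, which matters for a sample as dominated by a few proteins as plasma.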
Database searches
Peak lists were generated using the Extract-msn program in the Bioworks package v3.1 (Thermo Electron) with the following parameters: minimum ion count threshold, 15; minimum intensity, 100. The peak lists were searched against the human International Protein Index database including known contaminants (IPI, version 3.24, European Bioinformatics Institute, http://www.ebi.ac.uk/IPI) using SEQUEST (TurboSequest version 27, revision 12), allowing two missed cleavages (trypsin) and mass tolerances of ±0.5 Da for MS/MS and ±3 Da for MS. The ICAT option (+227.26 Da fixed modification plus +9 Da variable modification) on cysteine residues was used, and a variable modification of methionine oxidation (+16 Da) was allowed. Peptide assignment and quantification were performed with the Trans-Proteomic Pipeline provided by the Institute for Systems Biology (TPP, version 4.0, http://www.proteomecenter.org). The SEQUEST search output was used as input for the PeptideProphet module, and peptides with probabilities greater than 0.05 were passed to ProteinProphet. Proteins with probabilities greater than 0.5 were subjected to manual inspection to evaluate MS/MS spectral quality [14]. The false discovery rate was 10% at the cut-off value of 0.5 before manual inspection. From a list of 238 proteins, 30 were removed for lack of quantification information and 53 were removed during manual inspection because of unreliable mass spectra. The number of proteins removed by manual inspection (53) exceeded the estimated number of false positives (~24). As a result, the false discovery rate for the final data set of 155 proteins is expected to be close to zero.
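The filtering arithmetic in the preceding paragraph can be checked with a few lines of Python. This is only a bookkeeping sketch: the function name `summarize_filtering` is hypothetical, and the conclusion that the final false discovery rate is near zero rests on the assumption that the proteins rejected by manual inspection include the false positives.

```python
from typing import Tuple


def summarize_filtering(
    n_initial: int, fdr: float, n_no_quant: int, n_manual: int
) -> Tuple[float, int]:
    """Return (expected false positives before inspection, final protein count)."""
    expected_fp = n_initial * fdr                 # false positives implied by the FDR
    n_final = n_initial - n_no_quant - n_manual   # proteins left after both removal steps
    return expected_fp, n_final


expected_fp, n_final = summarize_filtering(
    n_initial=238, fdr=0.10, n_no_quant=30, n_manual=53
)
print(round(expected_fp), n_final)  # 24 155
print(53 > expected_fp)             # True: manual removals exceed expected false positives
```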
Western blot analysis
Plasma samples were resolved on 10% SDS-PAGE gels and electro-transferred to nitrocellulose membranes (Bio-Rad Laboratories, Hercules, CA, USA). When the number of samples exceeded the loading capacity of one gel, two or more gels were placed on a single transfer membrane to minimize experimental bias between Western blots. Immunoblotting analyses were performed as described previously [12], using antibodies against neural cell adhesion molecule L1 (CHL1; Atlas, Stockholm, Sweden), α1-acid glycoprotein 2 (ORM2; Proteintech Group Inc., Chicago, IL, USA), monocyte differentiation antigen CD14 (CD14; Abcam, Cambridge, MA, USA), BTD (GeneTex Inc., San Antonio, TX, USA), and glutathione peroxidase 3 (GPX3; Abcam).
Analysis of gene expression microarray data
Large-cohort tissue microarray datasets of breast cancer patients analyzed on the Human Genome U133A platform (GPL96) were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/projects/geo/). The samples included 1,715 cases of biopsied breast cancer tissues (GSE1456, GSE2034, GSE2990, GSE3494, GSE4922, GSE5364 and GSE11121) and 95 cases of laser-capture microdissected (LCM) breast cancer tissues (GSE5847). The latter 95 samples were considered positive controls for breast cancer, since LCM should effectively remove contaminating normal tissue or blood cells from the breast cancer biopsies. As references, 39 microarray data sets of normal human breast tissues analyzed on the same platform or its upgraded version (GPL570) were obtained from the GEO database. The microarray data were analyzed in R version 2.7.2 using the affy and gcrma packages to normalize the data and adjust the background and average intensities. Gene expression values were averaged over replicate probes. For each tumor sample, the median log2 value of the included normal samples was subtracted from the log2-transformed tumor value to give a tumor-to-normal ratio (fold change) on the log2 scale.
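The tumor-to-normal calculation described above amounts to a subtraction on the log2 scale. The following Python sketch shows it for a single gene; it is not the R code used in the study, and the function name `tumor_to_normal_log2_ratio` and the expression values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def tumor_to_normal_log2_ratio(
    tumor_log2: Sequence[float], normal_log2: Sequence[float]
) -> List[float]:
    """Subtract the median log2 expression of the normal samples from each tumor value."""
    reference = median(normal_log2)
    return [value - reference for value in tumor_log2]


# Hypothetical log2 expression values for one gene (not data from the study).
tumors = [7.0, 9.5]
normals = [8.0, 8.5, 9.0]

ratios = tumor_to_normal_log2_ratio(tumors, normals)
print(ratios)                    # [-1.5, 1.0]
print([2 ** r for r in ratios])  # linear-scale fold changes
```

A negative log2 ratio indicates lower expression in the tumor than in the normal reference, and a ratio of 1.0 corresponds to a twofold increase.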
Statistical analysis
Band intensities of Western blot images were quantified using ImageQuant version 5.2 (GE Healthcare Biosciences) and, because of the small sample size, compared by the Wilcoxon rank-sum test using SPSS 12.0 (SPSS, Chicago, IL, USA). For the gene expression data, Wilcoxon rank-sum tests were performed at the 95% confidence level, because the breast cancer group was much larger than the normal control group and the two data sets had been collected independently, from large-cohort breast cancer studies and from standard human tissue microarray studies, respectively.
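For readers unfamiliar with the Wilcoxon rank-sum test, the sketch below computes an exact two-sided p-value by enumerating every way the pooled ranks can be split between the two groups, which is feasible for small groups such as 6 versus 6. It is an illustration only and does not reproduce the SPSS procedure used in the study; the function names and the band intensities are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (rank sum of group a, exact two-sided p-value)."""
    ranks = midranks(list(a) + list(b))
    n_a = len(a)
    w_obs = sum(ranks[:n_a])
    w_mean = n_a * (len(ranks) + 1) / 2  # expected rank sum under the null hypothesis
    dev_obs = abs(w_obs - w_mean)
    total = extreme = 0
    for subset in combinations(ranks, n_a):
        total += 1
        if abs(sum(subset) - w_mean) >= dev_obs - 1e-9:
            extreme += 1
    return w_obs, extreme / total


# Hypothetical band intensities (arbitrary units), not data from the study.
cancer = [0.42, 0.51, 0.38, 0.47, 0.55, 0.40]
control = [0.81, 0.77, 0.92, 0.69, 0.88, 0.73]
w, p = rank_sum_exact(cancer, control)
print(w, p)
```

Enumeration grows combinatorially, so for groups as large as the microarray cohorts a normal approximation or a statistics package would be used instead.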
Discussion
In this study we discovered serum BTD as a potential breast cancer biomarker through a biomarker development pipeline encompassing mass spectrometry-based screening and independent downstream immunoblot assays. Biomarker candidates discovered by ICAT analysis of plasma from 6 breast cancer patients and 6 age-matched normal healthy controls were examined by Western blot in the same sample set. The two candidates confirmed by this approach, BTD and GPX3, were next tested by immunoblot assay in a blinded set of breast cancer and control samples to ascertain the markers' ability to differentiate the two groups.
The ICAT method applied here to screen for differentially expressed proteins has low throughput and is not suitable for a large number of samples. A sample pooling strategy was therefore employed to overcome this drawback. Although pooling reduces the expense of costly assays, there remains a possibility of biased quantification stemming from individual variation, which necessitates an independent downstream assay in individual samples. Indeed, in the immunoblot assays that followed the ICAT discovery phase, we had to drop some candidate markers such as GPX3. Our results thus suggest that the sample pooling strategy has both advantages and disadvantages. We adopted immunodepletion of the six most abundant proteins to dig deeper into low-abundance proteins, since plasma proteins span a wide dynamic range of concentrations. This antibody-based separation system has demonstrated high efficiency in removing the specifically targeted proteins, as well as reproducibility and selectivity [16-19]. In our study it was effective enough to detect proteins present at only a few μg/ml in plasma, such as L-selectin [20].
Differential profiling by the ICAT method enabled us to identify and quantify a total of 155 plasma proteins. This number was much smaller than the number of proteins identified by the same method in a tissue extract proteome [15], which is accounted for by the small number of proteins identified on the basis of a single peptide match. In contrast, spectral counts for a given number of unique peptides were distributed more widely: for example, spectral counts ranged from 2 to 23 for the proteins identified with two unique peptides. This most likely reflects the wide concentration range of the plasma proteome. Our results covered between 3 and 4 orders of magnitude, from L-selectin at 0.67 μg/ml, the lowest, to α-2-macroglobulin at 1.8 mg/ml, the highest [20]. The lower detection limit was similar to that of another plasma proteome study [21], which indicates that ICAT analysis of immunodepleted plasma samples is effective for discovering biomarkers among medium-abundance proteins. When the current plasma proteome list was compared with those of our previous studies [11, 12] and a recent report on multiplex serum biomarkers [22], several proteins, such as transthyretin, vitamin D-binding protein, and endorepellin, were found in common. However, in the current study these proteins showed little change between breast cancer patients and normal healthy controls (Additional file 1), and hence were excluded from further analysis. This is probably due to differences in sample sources and in the screening method: 2DE or SELDI-TOF was used in the previous studies, whereas we adopted the ICAT strategy in this study. As mentioned earlier, protein quantification by ICAT is based only on cysteine-containing peptides, whereas 2DE reflects whole-protein features including proteolytic processing. In addition, the previous study could not draw a decisive conclusion about the discriminatory power of transthyretin and vitamin D-binding protein [22]. In the case of endorepellin, we discovered the LG3 fragment of the protein as a biomarker in the previous study [12]. In contrast, the single ICAT peptide matched to endorepellin in the current study lay outside the LG3 fragment. Moreover, common proteins such as haptoglobin, serum albumin and transferrin were detected not because of differential regulation in breast cancer but because of incomplete depletion of these proteins from plasma during immunoaffinity chromatography (in this study) or incomplete removal of blood vessels during the preparation of breast tissues (in the previous study [11]). Such proteins were therefore not carried into the qualification phase of this study, while the proteins chosen for qualification were seemingly novel in relation to breast cancer.
Among the proteins showing differential levels between breast cancer patients and normal healthy women, we chose five proteins for further verification. In this experimental flow, 4 biomarker candidates, ORM2, CD14, BTD and GPX3, showed relatively similar quantification results in ICAT and Western blot. In the case of CHL1, we observed an inconsistency between the two quantification results (Table 2). In the ICAT result, CHL1 was quantified on the basis of a single peptide (LHMLELHCESK; Additional file 2) in which the methionine residue was oxidized. Methionine oxidation occurs frequently during sample preparation and handling, so it seems inappropriate to quantify proteins on the basis of a single oxidized peptide. The increased levels of ORM2 [23] and CD14 [24] and the down-regulation of GPX3 [25] appear reasonable when interpreted in relation to their biological functions. However, we could not confirm their ICAT fold changes in the subsequent verification steps. It is likely that the expression levels of ORM2, CD14 and GPX3 vary with age or with other factors that are currently unknown. Other possibilities, such as stochastic variation, cannot be excluded. Interestingly, the mRNA level of BTD in breast cancer tissues also differed from that in normal breast tissues. There is no evidence, and it is unlikely, that the change in BTD expression in breast cancer tissues affects its level in plasma. The change in BTD abundance observed in the plasma of breast cancer patients therefore need not match the change in tissue mRNA abundance. Nevertheless, it is worth investigating BTD for its potential use as a tissue biomarker in future studies.
Among the five candidates, BTD persisted through the last verification step of the biomarker pipeline. BTD is known to catalyze the release and recycling of endogenous biotin [26]. It is known to be secreted into the blood circulation [27], and is highly active in the serum, liver, kidney and adrenal glands. Its deficiency results in various disorders such as seizures, hypotonia and hyperammonemia [28]. We report here for the first time that its level was consistently down-regulated in breast cancer plasma. It is still difficult to explain how BTD is down-regulated in breast cancer plasma. Interestingly, transcript levels of BTD in breast cancer tissues were also down-regulated, so further studies are merited to elucidate the down-regulation of BTD in relation to the behavior of breast cancer cells. Despite many uncertainties about the cellular and molecular mechanism of BTD, our results show that BTD is down-regulated in breast cancer plasma. It may be useful in the clinic, alone or in combination with other biomarkers, for detecting breast cancer by less invasive techniques using plasma samples.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
UBK performed all studies and drafted the manuscript. YA performed Western blot analysis. JWL collected the plasma samples used in this study. YHK downloaded and analyzed GEO microarray data. JK and MHY provided analytical platforms. DYN and CL designed the study. CL revised the manuscript. All authors were involved in the conception of the study and data interpretation. All authors have read and approved the final manuscript.