Background
Breast cancer is a heterogeneous disease driven by a continuum of mutations and abnormal gene/protein expression that control the tumourigenic phenotype and the molecular mechanisms underpinning the complexity of its clinical behaviour [1]. To select systemic therapies, current treatment guidelines combine traditional prognostic factors (stage, tumour size, histologic grade, nodal status) with estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (Her2) expression status. However, these conventional prognostic algorithms cannot capture the biologic diversity of breast cancer and thereby impede the effective tailoring of individualised treatment strategies [2]. In the post-genomic era, advances in prognostic and predictive models are beginning to capture this heterogeneity, not least with the recent generation of a new molecular classification consisting of at least ten different breast cancer subtypes [3‐6]. Molecular profiling of cancer tissues has aided the development of targeted therapies, improved our understanding of treatment resistance, and helped to better predict patient prognosis. This knowledge has made personalised breast cancer therapeutic regimens an achievable goal.
The cornerstone of molecular profiling has historically been transcriptomics, which has transformed our understanding of the complexity of the underlying signalling pathways and interactions within a breast tumour and has enabled the identification of gene expression signatures associated with patient outcome [4, 7]. Consequently, clinical development of transcriptomic profiling tools has escalated dramatically, augmenting the standard diagnostic and prognostic information obtained from traditional clinicopathological variables [8]. The most clinically advanced prognostic gene expression signatures in breast cancer are MammaPrint [7, 9] and OncotypeDx [10], which are currently the subject of large-scale prospective randomised controlled trials assessing their utility for the stratification of breast cancer patients [11‐13].
Whilst transcriptomic approaches have undoubtedly accelerated translational pathology and provide an excellent platform for omic-based discovery [13, 14], reservations have been raised regarding the clinical applicability of gene expression studies, given their prohibitive cost, frequent reliance on frozen tissue, quality assurance issues and the advanced technical expertise required to use the technology [2]. Crucially, mRNA transcription does not necessarily translate to protein expression, and discrepancies between mRNA and protein levels are not uncommon [15, 16]. As proteins are among the primary effectors of the cell, protein-based assays may be more clinically relevant as biomarkers in personalised medicine. Effective implementation of personalised cancer therapy depends upon the successful identification and translation of informative biomarkers to guide treatment. In a prior review, we described the contribution of antibody-based proteomics to fast-tracking the development of new diagnostic assays that are crucial to achieving personalisation of cancer therapy [17]. The systematic generation and validation of specific antibodies offers a high-throughput mechanism for the functional exploration of the proteome and a logical route for fast-tracking the translation of identified biomarkers [17]. Whilst DNA microarray technology provides an excellent platform for biomarker discovery, it now appears that immunohistochemistry (IHC) and genomic sequencing may play an increasingly important role in the clinical management of breast cancer [2]. Tissue microarrays (TMAs) are an ideal platform for the rapid development of an IHC profile, allowing multiple targets to be assessed systematically and accelerating the translation of an assay into clinical use [3‐5, 8, 18‐23].
In this proof-of-concept study, we used a novel high-throughput system based on affinity-purified, mono-specific antibodies to translate protein targets identified in gene expression studies into clinically applicable IHC-based prognostic panels for breast cancer.
Methods
Selection of candidate biomarkers from transcriptomic datasets
Thirty-one genes were selected from an in-house analysis of the van ’t Veer study [7], in which a Between Group Analysis (BGA) method was used to identify the top 100 good and poor prognosis genes [24, 25]. From this list, we considered the top 15 genes associated with good prognosis and the top 16 genes associated with poor prognosis. A further 25 genes of interest were selected from a transcriptomic study of progression from ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC), with a particular focus on transcripts that were up-regulated in the invasive component [26] (Additional file 1: Table S1).
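The selection step amounts to taking genes from both extremes of a ranked list. The Python sketch below assumes, hypothetically, that the BGA output has been reduced to one signed score per gene on the between-group axis, with negative scores indicating the good-prognosis group and positive scores the poor-prognosis group. The function name and the gene labels in the usage line are placeholders, not the in-house analysis or genes from Table S1.

```python
def select_extremes(scores, n_good=15, n_poor=16):
    """Pick the most discriminating genes from both ends of a ranked list.

    scores: {gene: signed score on the between-group axis} (hypothetical input;
    by assumption negative = good prognosis, positive = poor prognosis).
    """
    ranked = sorted(scores, key=scores.get)  # most negative score first
    good = [g for g in ranked[:n_good] if scores[g] < 0]
    # Slice explicitly so that n_poor=0 yields an empty tail, then walk the
    # positive end from the most extreme score inward.
    tail = ranked[max(len(ranked) - n_poor, 0):]
    poor = [g for g in reversed(tail) if scores[g] > 0]
    return good, poor


# Usage with placeholder genes:
good, poor = select_extremes({"G1": -2.0, "G2": -0.4, "G3": 0.7, "G4": 3.1},
                             n_good=1, n_poor=2)
print(good, poor)  # ['G1'] ['G4', 'G3']
```

The sign filters guard against a short list in which the "top N" on one side would otherwise spill into genes that favour the other group.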
Patients
The TMAs used in this study were derived from a reference cohort of 512 consecutive invasive breast cancer cases diagnosed at the Department of Pathology, Malmö University Hospital, Malmö, Sweden, between 1988 and 1992, and have been previously described [27‐29]. The median patient age was 65 years (range 27–96) and the median follow-up time for disease-specific and overall survival was 11 years (range 0–17). Duplicate cores for each patient were reported as consensus scores. Each patient was assigned a unique identifier that was then linked to an anonymised, ethics board-approved database containing follow-up information. Patients with recurrent disease or previous systemic therapies were excluded. Two hundred and sixty-three patients were deceased at the last follow-up date (December 2004), 90 of whom were classified as breast cancer-specific deaths. Ethical permission was obtained from the Local Ethics Committee at Lund University (Dnr 613/02), whereby informed consent was deemed not to be required, but patients were given the option to opt out.
TMA construction
The TMAs were constructed using a manual tissue arrayer (MTA-1, Beecher Inc., WI, USA). PBK and PDZK1 were screened on a TMA inclusive of all 512 cases from the reference cohort with 0.6 mm duplicate tissue cores extracted from each donor block. ANLN was screened on a second generation TMA inclusive of 498 cases from the reference cohort, with 1.0 mm duplicate tissue cores extracted from each donor block and transferred to the recipient block. The total number of cores per block was limited to ~ 200 (100 patients), with a total of 5 blocks arrayed.
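The block count for the second-generation TMA follows from simple arithmetic: with duplicate cores per patient and roughly 200 cores per block, each block holds about 100 patients. The sketch below treats the "~200" limit as a hard cap of 200 and ignores any orientation or control cores; both are simplifying assumptions, and `tma_layout` is an illustrative helper, not software used in the study.

```python
import math


def tma_layout(n_patients, cores_per_patient=2, max_cores_per_block=200):
    """Return (patients per block, blocks needed, total cores) for a TMA design."""
    patients_per_block = max_cores_per_block // cores_per_patient
    n_blocks = math.ceil(n_patients / patients_per_block)
    return patients_per_block, n_blocks, n_patients * cores_per_patient


print(tma_layout(498))  # (100, 5, 996)
```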
Antibody generation
The Human Protein Atlas (HPA) [30] uses a high-throughput method to generate affinity-purified, mono-specific antibodies raised against all non-redundant human proteins [31]. Protein epitope signature tag (PrEST)-specific antibodies target unique regions of each protein. Rabbits were immunised with His6ABP-PrEST antigens derived from a subset of the 56 targets of interest described above (Additional file 1: Table S1), and the resulting polyclonal antisera were purified by a two-step immunoaffinity protocol to obtain pure mono-specific antibodies [32].
Cell culture
A panel of breast cancer cell lines was selected to test antibody specificity: MCF-7, BT474, T47D, SKBR3, MDA-MB-231 and Hs578T. The invasive Hs578T (i8) subclone was a kind gift from Dr. Susan McDonnell (School of Chemical & Bioprocess Engineering, University College Dublin, Ireland) and was derived from the parental Hs578T cell line (also denoted Hs578T(P)) by sequential selection through the BD Matrigel® Invasion Chamber assay system [33]. All remaining cell lines were purchased from the European Collection of Cell Cultures (Wiltshire, UK). The MCF-7, BT474, T47D, SKBR3 and MDA-MB-231 cell lines were cultured in DMEM supplemented with 10% (w/v) foetal calf serum, 2 mM L-glutamine, 50 IU/ml penicillin and 50 μg/ml streptomycin sulphate. The medium for the Hs578T variants was additionally supplemented with 10 μg/ml bovine insulin. Cells were maintained at 37°C in humidified air with 5% CO2. Studies of protein expression were performed on cells at 70–80% confluence. All cell lines were routinely screened for Mycoplasma contamination.
Western blot analysis
Total protein was extracted from sub-confluent cells by the addition of radioimmunoprecipitation assay (RIPA) buffer, followed by centrifugation at 16,000 × g for 20 min at 4°C. The supernatants were removed and protein concentrations were determined using the bicinchoninic acid (BCA) method (Pierce, IL). Aliquots containing 50 μg of protein were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) on a 12% polyacrylamide gel under reducing conditions. Following electrophoresis, proteins were transferred to a polyvinylidene fluoride membrane. Membranes were blocked in 5% non-fat milk for 1 hr at room temperature. Protein expression was detected using rabbit mono-specific polyclonal anti-human antibodies (HPA, Sweden) applied overnight at 4°C (PDZK1 1:1000; PBK and ANLN 1:500). Membranes were washed in TBS-T (Tris buffered saline with 0.1% Tween 20) and incubated for 1 hr with horseradish peroxidase (HRP)-conjugated anti-rabbit immunoglobulin (1:5000 dilution for all antibodies). The blots were washed again in TBS-T, and HRP was detected using Enhanced Chemiluminescence plus (Amersham Biosciences, UK) with autoradiography on X-ray film. Membranes were then stripped and re-probed with anti-β-actin (1:5000 dilution; Abcam, UK) as a loading control.
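Two routine calculations sit behind this protocol: the lysate volume that delivers 50 μg of protein once the BCA assay has given a concentration, and the volume of antibody stock needed for a given dilution. The sketch below assumes a hypothetical lysate concentration of 2.5 μg/μl and a hypothetical 10 ml of antibody solution per blot, and treats "1:N" as one part stock in N parts total (for dilutions of 1:500 and above the difference from one part in N parts diluent is negligible).

```python
def load_volume_ul(target_ug, lysate_conc_ug_per_ul):
    """Volume of lysate (μl) containing target_ug of protein."""
    return target_ug / lysate_conc_ug_per_ul


def antibody_volume_ul(dilution, final_volume_ml):
    """Stock antibody (μl) for a 1:dilution working solution of final_volume_ml."""
    return final_volume_ml * 1000 / dilution


# Hypothetical lysate at 2.5 μg/μl and 10 ml of antibody solution per blot.
print(load_volume_ul(50, 2.5))       # 20.0 μl of lysate per lane
print(antibody_volume_ul(1000, 10))  # 10.0 μl of PDZK1 antibody
print(antibody_volume_ul(500, 10))   # 20.0 μl of PBK or ANLN antibody
print(antibody_volume_ul(5000, 10))  # 2.0 μl of HRP-conjugated secondary
```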
Cell pellet arrays
To validate the Western blotting results in the IHC setting, a cell pellet array was constructed from the same panel of breast cancer cell lines and stained by IHC. Cells were trypsinised, fixed for 1 hr in 10% formalin, centrifuged at 500 × g for 10 minutes, washed twice with PBS and re-suspended in 0.8% agarose. The cell-containing agarose plugs were processed through a graded alcohol series, cleared in xylene and infiltrated with molten paraffin. The cell pellets were then embedded in paraffin and arrayed as quadruplicate 1.0 mm cores using a manual tissue arrayer (MTA-1, Beecher Inc, WI). IHC was carried out on 5 μm sections.
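Centrifugation steps in this protocol are specified as relative centrifugal force (× g) rather than rotor speed, so reproducing them on a given centrifuge requires converting via the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The 10 cm radius in the example is a hypothetical benchtop rotor, not the instrument used in the study.

```python
import math

K = 1.118e-5  # RCF = K * radius_cm * rpm**2 (standard approximation)


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) that produces the requested relative centrifugal force."""
    return math.sqrt(rcf / (K * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (× g) produced at a given rotor speed."""
    return K * radius_cm * rpm ** 2


# Hypothetical rotor with a 10 cm radius, for the 500 × g pelleting step:
print(round(rpm_for_rcf(500, 10)))  # 2115
```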
Immunohistochemical analysis
Sections of cell pellet arrays or TMAs were deparaffinised in xylene and rehydrated through a descending alcohol series. Heat-mediated antigen retrieval was performed using 10 mM sodium citrate buffer (pH 6.0) in a PT module (LabVision, UK) for 15 min at 95°C. The LabVision IHC kit (LabVision, UK) was used for staining. Endogenous peroxidase activity was blocked by incubation with 3% hydrogen peroxide for 10 min. Sections were blocked for 10 min in UV blocking agent. Rabbit polyclonal anti-human antibodies (HPA, Sweden) were applied at individually optimised dilutions for 1 hr (PDZK1 1:50; PBK and ANLN 1:150). Sections were washed in phosphate buffered saline with 0.1% Tween 20 (PBS-T). Primary antibody enhancer was then applied for 20 min, and sections were washed again in PBS-T. Sections were then incubated with HRP polymer for 15 min, washed in PBS-T and developed for 10 min using diaminobenzidine (DAB) solution (LabVision, UK). After antigen retrieval, all incubations and washes were carried out at room temperature. Sections were counterstained in haematoxylin, dehydrated in alcohol and xylene and mounted using an automated coverslipper (Leica, Germany). As a negative control, the primary antibodies were replaced with PBS-T.
Evaluation of immunohistochemical staining
Slides were scanned at 20× magnification using a ScanScope XT slide scanner (Aperio Technologies, CA). To avoid manual selection bias, cores containing less than 30% tissue or fewer than 100 cells were discarded. Tumour samples were evaluated by at least two independent observers, including one pathologist, and the maximum value of the two cores was used. All discordant cases were re-evaluated and a consensus was reached between both observers. ANLN, a nuclear marker, was categorised by the percentage of stained nuclei: 0 = ≤1%, 1 = 2–25%, 2 = 26–75% and 3 = >75%. PDZK1, a cytoplasmic marker, was scored on a semi-quantitative scale of cytoplasmic staining intensity from 0 to 3, where 0 is negative, 1 is weakly positive, 2 is moderately positive and 3 is strongly positive. The cytoplasmic marker PBK was scored using the intensity distribution (ID) method, which combines staining intensity with the percentage of stained cells [34].
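The core exclusion rule, the ANLN percentage bins and the "maximum of the two cores" step can be written down directly as code. The sketch below encodes only those three rules; the PBK ID score is not implemented because its exact formula is given in reference [34] rather than here. How fractional percentages between 1% and 2% are binned is an assumption of this sketch (they fall into category 1), and all function names are illustrative.

```python
def core_is_evaluable(tissue_fraction, n_cells):
    """Exclusion rule: keep cores with at least 30% tissue and at least 100 cells."""
    return tissue_fraction >= 0.30 and n_cells >= 100


def anln_category(pct_positive_nuclei):
    """Map % of stained nuclei to the 0-3 ANLN category.

    Values between 1% and 2%, which the published bins do not cover,
    fall into category 1 here (an assumption).
    """
    if pct_positive_nuclei <= 1:
        return 0
    if pct_positive_nuclei <= 25:
        return 1
    if pct_positive_nuclei <= 75:
        return 2
    return 3


def patient_score(cores):
    """cores: list of (tissue_fraction, n_cells, score).

    Returns the maximum score over evaluable cores, or None if no core passes.
    """
    valid = [score for frac, n, score in cores if core_is_evaluable(frac, n)]
    return max(valid) if valid else None


# The second core is discarded (only 20% tissue), so the first core's score stands.
print(patient_score([(0.5, 250, anln_category(10)), (0.2, 300, anln_category(80))]))  # 1
```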
Annotation of gene expression data and hybridisation probes
Gene expression data sets were downloaded from the Gene Expression Omnibus [35] or the authors’ websites, in the form of raw data files where possible (Additional file 1: Table S2) [36‐43]. Relevant gene expression and clinical data were extracted from ten publicly available datasets comprising approximately 1,300 samples. Where raw data were not available, the normalised data published by the original study were used. For the Affymetrix datasets (.cel files), gene expression values were called using the robust multichip average method and quantile normalised using the Bioconductor package affy [44, 45]. For the dual-channel platforms, data were loess normalised using the Bioconductor package limma [46]. Hybridisation probes were mapped to Entrez gene IDs to gene-centre the data [47]. The Entrez gene IDs corresponding to the array probes targeting the genes of interest were obtained from the Gene database at NCBI [48] (ANLN: 54443, PBK: 55872, PDZK1: 5174). Where multiple probes mapped to the same gene, their values were averaged. All calculations were carried out in the R statistical environment [49].
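The gene-centring step collapses probe-level data to one value per Entrez gene ID and sample by averaging all probes that map to that gene. The study performed this in R; the Python sketch below only illustrates the averaging rule. The probe identifiers are hypothetical, while the Entrez IDs are those listed above.

```python
from collections import defaultdict
from statistics import mean


def gene_centre(probe_values, probe_to_entrez):
    """Collapse probe-level data to gene level by averaging probes per Entrez ID.

    probe_values: {probe_id: [expression per sample]}
    probe_to_entrez: {probe_id: entrez_id}; unmapped probes are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        entrez = probe_to_entrez.get(probe)
        if entrez is not None:
            grouped[entrez].append(values)
    # zip(*rows) walks sample by sample across all probes of one gene
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


probes = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 6.0], "p4": [9.0, 9.0]}
mapping = {"p1": 55872, "p2": 55872, "p3": 54443}  # hypothetical probes: p1/p2 -> PBK, p3 -> ANLN
print(gene_centre(probes, mapping))  # {55872: [2.0, 3.0], 54443: [5.0, 6.0]}
```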
Statistical analysis of transcriptomic meta-analysis data
Gene expression data from ten publicly available datasets were included in a meta-analysis to evaluate the individual prognostic significance of candidate proteins at the transcriptomic level, as previously described (Additional file 1: Table S2) [36‐43]. Because of variability across platforms, each dataset was considered separately when assigning samples to groups: the median mRNA level within each dataset established the cut-off for high and low expression of each biomarker. Once every sample had been assigned to a group, the ten datasets were combined and a global survival analysis was performed. Recurrence-free survival (RFS) was the survival end point. Survival curves were generated from Kaplan-Meier estimates, and the dichotomised groups were compared using the log-rank test. Cox regression analysis was used to calculate hazard ratios (HR) and to adjust for all available clinical parameters, which across the meta-analysis were lymph node status, tumour grade and ER status.
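The pipeline above (a per-dataset median split, pooling, then Kaplan-Meier curves and a log-rank comparison) can be sketched in plain Python. The sample layout (dicts with `dataset` and `expr` keys) and the function names are hypothetical; assigning samples exactly at the median to the "low" group is an assumption; and the Cox adjustment is deliberately left out, since in practice it would be fitted with a survival package.

```python
import math
from statistics import median


def dichotomise(samples):
    """Label each sample 'high' or 'low' relative to the median of its own dataset."""
    values = {}
    for s in samples:
        values.setdefault(s["dataset"], []).append(s["expr"])
    cut = {ds: median(v) for ds, v in values.items()}
    return ["high" if s["expr"] > cut[s["dataset"]] else "low" for s in samples]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event time, survival probability)]."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank(t1, e1, t2, e2):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    event_times = sorted({t for t, e in zip(t1 + t2, e1 + e2) if e})
    o_minus_e = var = 0.0
    for t in event_times:
        n1 = sum(1 for x in t1 if x >= t)
        n2 = sum(1 for x in t2 if x >= t)
        d1 = sum(1 for x, e in zip(t1, e1) if x == t and e)
        d = d1 + sum(1 for x, e in zip(t2, e2) if x == t and e)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n  # observed minus expected events, group 1
        if n > 1:
            var += d * n1 * n2 * (n - d) / (n * n * (n - 1))  # hypergeometric variance
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Toy data: group 1 relapses early, group 2 late.
chi2, p = logrank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(p < 0.01)  # True
```

Splitting within each dataset before pooling keeps a platform with systematically higher intensities from placing all of its samples in the "high" group.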
Statistical analysis of consecutive cohort data
The χ2 test and Fisher’s exact test were used to evaluate associations between protein expression and clinicopathological variables in the cohort. Pearson’s correlation coefficient was used to evaluate the correlation between expression levels of the three markers. Kaplan-Meier analysis and the log-rank test were used to illustrate differences in RFS and breast cancer-specific survival (BCSS) according to protein expression. Cox proportional hazards regression was used to estimate hazard ratios for individual protein expression and other clinicopathological variables in both univariate and multivariate models. The clinicopathological variables available for the consecutive cohort included tumour size, age at diagnosis, histological type, grade, and nodal, ER, PR, Ki67 and Her2 status. All calculations were carried out using IBM SPSS Statistics version 20.0.
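For small 2×2 tables, such as marker-high versus marker-low against ER-positive versus ER-negative, Fisher’s exact test is preferred over the χ2 test. The analyses here were run in SPSS; the pure-Python sketch below shows the standard two-sided version of the test for the 2×2 case only, and the counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so tables tied with p_obs are not lost to rounding
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: rows = marker high/low, columns = ER positive/negative.
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```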
Discussion
Gene expression profiling has yielded new insights into the biologic diversity of breast cancer, identifying several distinct molecular subtypes (such as luminal A, luminal B, basal and Her2) that differ markedly in prognosis and in the repertoire of therapeutic targets they express [4, 5, 50]. Importantly, these intrinsic subtypes play a key role in predicting disease recurrence and treatment response, and provide new insights into oncogenic pathways and metastatic progression [51]. It is striking that, in what is considered a heterogeneous tumour, molecular signatures of tumour subtypes consistently emerge across independent cohorts with diverse genetic and environmental backgrounds [52‐54]. This reproducibility suggests that the intrinsic subtypes are robust primary descriptors of disease phenotype early in the course of disease, providing key prognostic and predictive information.
Antibody-based proteomics occupies a pivotal position within the cancer biomarker discovery and validation pipeline, facilitating the high-throughput evaluation of candidate markers [17]. In this context, various groups have shown IHC-based high-throughput technology to be an effective platform for identifying protein surrogates of these intrinsic breast cancer subtypes [23, 53]. For example, a panel of 5 proteins detected by IHC was shown to be prognostic in ER-positive breast cancer [8]. Validated IHC surrogates should provide more clinically applicable assays in the future, owing to their accessibility, low technical demand, cost-effectiveness and applicability to formalin-fixed, paraffin-embedded (FFPE) tissue. Despite these advances, the development of IHC-based assays has been impaired globally by the limited availability of high-quality antibodies and the lack of rigorous validation of emerging biomarkers. However, the development of comprehensive antibody resources and the streamlining of reporting standards promise to help overcome these obstacles [31, 55].
In this study, we sought to determine whether insights from gene expression studies relating to breast cancer progression could be translated into a robust prognostic protein model using a discrete set of IHC markers. This proof-of-concept strategy generated a prognostic panel by combining high-throughput biomarker screening with a purpose-designed panel scoring technique. Using a meta-analysis of publicly available breast cancer transcriptomic datasets, we confirmed that a high panel score was significantly associated with reduced RFS (p < 0.0001; n = 1,038). The panel was an independent prognostic marker in multivariate Cox regression analysis (p = 0.018, HR = 1.49, 95% CI = 1.080-2.054, n = 699). This strategy revealed a novel 3-marker prognostic model, based on ANLN, PDZK1 and PBK expression, that was significantly predictive of RFS.
Next, we validated this signature on a protein-based platform using TMA technology. The 3-protein panel score correlated with known pathological prognostic variables, including tumour grade, lymph node status, and ER, Her2 and Ki67 status. Univariate Cox regression analysis of RFS demonstrated that high panel scores, indicative of poor prognosis, were significantly associated with reduced RFS. However, multivariate analysis demonstrated that the 3-marker panel score was not a significant predictor of either BCSS (HR = 6.38; 95% CI = 0.79-51.26, p = 0.082) or RFS (HR = 1.46; 95% CI = 0.66-3.19, p = 0.348) when adjusted for other well-established variables. We noted that the 3-marker panel score became an independent predictor of BCSS (HR = 11.66; 95% CI = 1.50-90.68, p = 0.019) when all variables except PR status were adjusted for. This may be due to marginal associations of our individual markers with these variables (e.g. PDZK1 and ER status: p = 0.041; PDZK1 and PR status: p = 0.074). Since both PDZK1 and PR are surrogate markers of ER activity, the strength of this panel may be skewed by the inclusion of PDZK1. We therefore hypothesise that additional or alternative markers, interrogated using this high-throughput platform, may further improve the prognostic accuracy of this algorithm to a point that allows implementation in routine clinical practice.
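Hazard ratios, confidence intervals and p-values of this kind can be cross-checked against each other, assuming a Wald interval computed on the log scale as log(HR) ± 1.96·SE: the geometric midpoint of the interval should reproduce the HR, and the interval width yields the SE and hence an approximate p-value. The sketch below applies this check to the three TMA estimates, reading the lower bound of the adjusted BCSS interval as 1.50. Under that assumption, the recovered p-values agree closely with the reported 0.082, 0.348 and 0.019. The helper names are illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def wald_p_from_ci(hr, lower, upper):
    """Approximate two-sided p-value from an HR and its 95% CI,
    assuming the CI was computed as exp(log(HR) ± 1.96 * SE)."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))


def ci_centre(lower, upper):
    """Geometric midpoint of the CI, which should approximately equal the HR."""
    return math.sqrt(lower * upper)


for hr, lo, hi in [(6.38, 0.79, 51.26), (1.46, 0.66, 3.19), (11.66, 1.50, 90.68)]:
    print(hr, round(ci_centre(lo, hi), 2), round(wald_p_from_ci(hr, lo, hi), 3))
```

Because the reported figures are rounded, small discrepancies in the third decimal place are expected; a larger mismatch would point to a transcription error in the HR or its interval.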
Interestingly, the 3 proteins that comprise this panel are associated with distinct pathways in cancer biology. ANLN, initially characterised as a human homologue of anillin, a Drosophila actin-binding protein, is essential for the organisation of actin cables in the cleavage furrow and plays a key role in cytokinesis and cell cycle progression [56‐59]. ANLN has been identified as a marker of poor prognosis associated with aggressive cancer phenotypes [60]. In breast cancer, a transcriptomic study of DCIS to IDC progression identified ANLN up-regulation in invasive tumour specimens relative to the pre-invasive phenotype [26]. Our study confirms the role of ANLN as a marker of poor prognosis, at the protein level, in an independent breast cancer cohort. PBK phosphorylates p38MAPK during mitosis, is considered a marker of cellular proliferation and is also implicated in DNA damage sensing and repair [61, 62]. PBK is associated with poorer prognosis in lung cancer [63], is up-regulated in IDC relative to DCIS at the transcriptomic level [26], and may be a promising molecular target for the treatment of breast cancer [64]. Our findings further support the role of PBK as a marker of poor prognosis in breast cancer, with PBK expression also associated with the proliferation markers Ki67 and tumour grade. PDZK1 is a known estrogen response gene in breast cancer, with proposed roles in signal transduction, cell polarity and ion exchange gating [65, 66]. An in-house statistical re-analysis of the genes assessed by van ’t Veer and colleagues in the development of the 70-gene prognostic signature identified PDZK1 as a marker of good prognosis in breast cancer [24], which we confirmed at the protein level in this study. The present study thus validates these gene expression findings at the mRNA level and also translates them to the protein level.
However, further in vitro and in vivo studies are warranted to interrogate the functional role of each of these markers in breast cancer progression. These findings will also need to be validated in additional independent cohorts to meet accepted international validation guidelines [55]. Although the literature is conflicting with regard to the best way to incorporate histopathology, IHC phenotypes and gene expression data into an accurate classification system, our findings further support the key role of IHC prognostic models in current breast cancer management.
Competing interests
The authors (Mathias Uhlén, Fredrik Pontén, Karin Jirström) currently hold a patent for the use of ANLN protein as an endocrine treatment predictive factor in breast cancer (Patent US20110269797).
Authors’ contributions
KJ coordinated the collection of patient tissue and constructed the tissue microarray. FP and MU managed the production of all antibodies. CK, SP and RTD carried out the Western blotting, cell pellet arrays and immunohistochemical analysis. POL, SM, DJB and ER performed the statistical analysis. POL and RTD helped to draft the manuscript. MJD, RZ, AHMc, MRK, KJ and WMG provided critical reading and revision of the manuscript. WMG and KJ conceived of the study, and participated in its design and coordination and helped to draft the manuscript. All authors have read and approved the final manuscript.