Background
Prostate cancer (PCa) is a major health problem, being the most frequent cancer in men and a leading cause of cancer-related death in North America [
1]. Despite an increase in the ability to detect cancer, the clinical behavior of prostate cancer remains hard to predict and ranges from an indolent course to aggressive evolution with metastasis and death. Gleason score, tumor stage and pre-operative PSA are the only prognostic factors currently available, and are used, alone or in combination, to predict the evolution of this disease [
2,
3]. Nonetheless, none of these factors has enough prognostic ability to identify which patients should be additionally treated, given their individual risk of biochemical recurrence or death after radical prostatectomy.
In order to avoid over- or under-treatment, the ability of several histologic measurements (including histological type, androgen receptor status [
4], expression of specific cancer markers such as p53, Ras oncogene, and BCL2 [
5‐
7]) to improve the prediction of individual prognosis has been explored. The expression of the Ki-67 antigen, a nuclear protein expressed during the G1, S, G2 and M phases of the cell cycle but not during the resting phase (G0) [
8,
9], has emerged as the preferred marker of tumor proliferation in different tissue types [
10‐
13], offering interesting predictive value in breast cancer [
14] and neuroendocrine digestive tumors [
15]. Clinical studies have also shown the ability of this marker to distinguish aggressive from non-aggressive PCa in cohorts with different treatment modalities [
16‐
26]. However, the integration of this analysis in the current predictive arsenal to determine the appropriate treatment strategy to adopt given the risk of progression remains a challenge, mainly due to issues related to the reproducibility of measurement methods and application of a threshold value.
Manual counting of a tumor proliferation index is time-consuming and tedious, and is subject to intra- and inter-observer variability. In this context, digital image analysis (DIA) offers a potential adjunct to conventional visual scoring, with the promise of increasing the reproducibility, accuracy and speed of the quantification process. Indeed, a growing number of reports describe DIA reproducing visual scoring at an acceptable level [
27,
28]. However, it is unknown whether one method is superior to the other for assessing the Ki-67 index, even in tumors for which experience with DIA is more extensive, such as breast cancer [
29]. DIA therefore still needs to be validated against traditional visual assessment for a broader range of patients and tissues.
Our objective was to compare the Ki-67 measurements obtained with DIA and with visual scoring. To achieve this objective, we first compared the measurement distributions of the two scoring methods and then compared their ability to predict clinically useful endpoints (biochemical failure and death from prostate cancer) over time. We took advantage of a well-constructed and well-characterized cohort of PCa patients who underwent radical prostatectomy and benefited from extended follow-up.
Methods
Patients and data collection
Patients included in this study were selected randomly from a cohort of Caucasian men who were treated by radical prostatectomy (RP) for localized prostate cancer at L’Hôtel-Dieu de Québec (Québec City, Canada) between 1990 and 2002. Men were included if they had a prostate adenocarcinoma of any stage treated with RP. Patients who received neo-adjuvant androgen deprivation therapy were excluded. All men provided informed written consent to participate in this study. The Institutional Ethical Research Committee approved the protocol.
The patients’ medical files were reviewed to collect clinical characteristics (age, stage, histology, Gleason score, preoperative and follow-up serum PSA levels, status at last follow-up). Clinical follow-up data were regularly updated until June 2012 to document the occurrence of progression or death after surgery. Clinical follow-up was performed at least every year and included PSA testing and physical examination. PSA recurrence was defined as two consecutive PSA values of at least 0.3 ng/mL, or one PSA value of at least 0.3 ng/mL followed by androgen deprivation therapy or radiation therapy, or a PSA value below 0.3 ng/mL in a patient receiving androgen deprivation therapy. Time to biochemical recurrence (BCR) after radical prostatectomy was defined as the period of time elapsed between the prostatectomy and the first date on which the criteria for PSA recurrence were met. Time to prostate cancer-specific death was defined as the interval between the prostatectomy and death from prostate cancer. For the purpose of statistical analysis, Gleason scores were dichotomized (<7 vs ≥7) to group patients with intermediate and less favorable prognosis. Pathological stages were also grouped (pT2 vs pT3) to compare organ-confined with extra-prostatic disease.
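The PSA recurrence criteria above can be sketched as a small function. This is an illustration only, not the authors' procedure: the function name, the input layout (times in months since RP) and the choice of which date counts as "meeting the criteria" (the confirming second PSA value, or the start of salvage treatment) are assumptions made for the example.

```python
THRESHOLD = 0.3  # ng/mL, PSA cut-off stated in the text


def time_to_psa_recurrence(psa_series, adt_time=None, rt_time=None):
    """Return the earliest time at which a PSA recurrence criterion is met.

    psa_series: list of (time, psa) tuples sorted by time (months since RP).
    adt_time / rt_time: start of androgen deprivation or radiation therapy,
    or None if never given. Returns None when no criterion is met.
    All timing conventions here are assumptions, not taken from the paper.
    """
    candidates = []

    # Criterion 1: two consecutive values >= threshold.
    # Assumed to be met at the second (confirming) measurement.
    for (_, v1), (t2, v2) in zip(psa_series, psa_series[1:]):
        if v1 >= THRESHOLD and v2 >= THRESHOLD:
            candidates.append(t2)
            break

    # Criterion 2: one value >= threshold followed by radiation therapy.
    first_high = next((t for t, v in psa_series if v >= THRESHOLD), None)
    if rt_time is not None and first_high is not None and rt_time >= first_high:
        candidates.append(rt_time)

    # Criteria 2 and 3: androgen deprivation therapy counts whether or not
    # PSA had reached the threshold beforehand.
    if adt_time is not None:
        candidates.append(adt_time)

    return min(candidates) if candidates else None
```

For example, a series of 0.1, 0.35 and 0.4 ng/mL at 6, 12 and 18 months with no salvage treatment would be flagged at 18 months under these assumptions, whereas an isolated elevated value with no treatment would not be flagged.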
TMA construction
For the preparation of tissue microarrays (TMA), one to three paraffin blocks from each tumor were selected. Six representative 0.6 mm tumor cores were taken and placed 0.4 mm apart on a recipient paraffin block along with appropriate alignment and staining controls using a tissue arrayer (Beecher Instruments, Sun Prairie, WI, USA). All primary tumor slices were examined and graded by a pathologist (BT). Spots were selected from haematoxylin and eosin slides to represent the proportion of each Gleason pattern among dominant nodules and to include a tertiary pattern when present. Gleason scores were also evaluated on TMA cores to ensure the representativeness of the sample with regard to the prostatectomy specimen.
Immunohistochemistry and slide digitization
Five-micron-thick sections were cut from the tissue array blocks to perform immunohistochemistry. Sections were deparaffinized and heat-induced antigen retrieval was performed by microwave pre-treatment in citrate (0.01 M, pH 6.0). Endogenous peroxidase was blocked by pre-incubation with 3 % hydrogen peroxide in phosphate-buffered saline for 10 minutes. Sections were then incubated with a monoclonal mouse anti-human Ki-67 antibody (Clone MIB-1; Dako, Carpinteria, CA, USA) at room temperature for 1 h at a 1/400 dilution. Chromogenic detection was carried out using a peroxidase-conjugated secondary antibody and DAB reagents provided with the IDetect Super Stain System HRP (ID Labs, London, ON, Canada), and slides were counter-stained with Harris’ hematoxylin. Positive controls were slides of lympho-epithelial tissue. Phosphate-buffered saline was used instead of primary antibody in negative controls.
Digital images of IHC-stained TMA slides were obtained at 20× magnification using a slide scanner (NanoZoomer 2.0-HT; Hamamatsu, Bridgewater, NJ, USA) and visualized with the software ndpi.viewer (ver1.2.25; Olympus). These images were used both for the visual scoring and the digital image analysis.
Visual scoring
Ki-67 nuclear positivity was evaluated on digital images by two independent observers (PD, HH) blinded to the clinical data. The total number of Ki-67 positive tumor nuclei was counted in malignant cells on each individual TMA spot, regardless of the intensity of immunostaining or the Gleason score. This count was expressed relative to the total number of malignant cells, calculated by using a 100-cell template moved over the whole tissue core. A TMA spot was rejected when neoplastic glands covered less than 30 % of the tissue. Moreover, a minimum of 500 tumor cells per patient was required to keep the sample for analysis. The mean percentage of Ki-67 positive cells for each patient was calculated and used in further analysis as the Ki-67 labeling index.
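As a rough illustration of the visual scoring rules just described, the sketch below computes a per-patient labeling index with the two quality filters (30 % tumor coverage per spot, 500 tumor cells per patient). The `Spot` structure and function name are hypothetical, and it assumes that the patient-level index is the mean of the per-spot percentages and that the 500-cell minimum is applied to retained spots only; the text does not specify either point.

```python
from dataclasses import dataclass

MIN_TUMOR_FRACTION = 0.30      # spot rejected below 30 % neoplastic glands
MIN_CELLS_PER_PATIENT = 500    # minimum tumor cells needed per patient


@dataclass
class Spot:
    positive_nuclei: int   # Ki-67 positive tumor nuclei counted on the spot
    tumor_cells: int       # total malignant cells counted on the spot
    tumor_fraction: float  # share of the core covered by neoplastic glands (0 to 1)


def labeling_index(spots):
    """Mean percentage of Ki-67 positive tumor cells over retained spots.

    Returns None when the patient does not reach the minimum cell count.
    Averaging per-spot percentages (rather than pooling counts) is an
    assumption of this sketch.
    """
    kept = [s for s in spots
            if s.tumor_fraction >= MIN_TUMOR_FRACTION and s.tumor_cells > 0]
    if sum(s.tumor_cells for s in kept) < MIN_CELLS_PER_PATIENT:
        return None
    percentages = [100.0 * s.positive_nuclei / s.tumor_cells for s in kept]
    return sum(percentages) / len(percentages)
```

A patient with two acceptable spots of 300 cells each, showing 6 and 3 positive nuclei, would get an index of 1.5 %; a third spot with only 10 % tumor coverage would simply be ignored.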
Digital image analysis
Automated-IHC measurements were performed using Calopix software (developed by TRIBVN [Chatillon, France] and distributed by Agfa Healthcare [Toronto, ON, Canada]). For the purpose of tissue recognition and segmentation, the Ilastik 5.0 Interactive Learning and Segmentation Toolkit within Calopix was used to create a tissue mask and conserve only the malignant epithelial component for analysis (see Additional file
1: Figure S1). Then, the Calopix «immuno-object» software was applied to each segmentation result. The algorithm recognizes individual nuclei («objects») by isolating brown (DAB-stained) and blue (hematoxylin counter-stained) nuclei, and reports their numbers. Within the algorithm, immunostaining is divided into intensity categories (0 = none, 1 = weak, 2 = moderate, 3 = strong), for which thresholds are set by visual assessment. Segmentation and quantification algorithms were run at 10× resolution, and all the results were reviewed visually. When the tissue segmentation was judged unsatisfactory (poor isolation of tumor glands), a gross manual segmentation was done before re-launching the Ilastik and immuno-object algorithms successively. For each tissue core, the total number of objects detected (i.e. tumor cell nuclei), the percentage of immunostained objects in each of the four staining intensity categories and the global percentage of immunostained objects were computed. Those data were used to calculate a mean H-score for each patient [
30]. This composite score is obtained by multiplying the percentage of immunostained objects in each intensity class by the class’s category value (0, 1, 2 or 3). The results are then added, giving a maximum value of 300.
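The H-score arithmetic described above can be written out as a short sketch. The function names and the dictionary input format are illustrative choices, not part of the Calopix software, and averaging the per-core scores to obtain the patient-level value is an assumption.

```python
def h_score(percent_by_intensity):
    """Compute an H-score from the percentage of nuclei in each intensity class.

    percent_by_intensity maps intensity class (0 = none, 1 = weak,
    2 = moderate, 3 = strong) to the percentage of detected nuclei in that
    class. The percentages are expected to sum to 100.
    """
    if set(percent_by_intensity) - {0, 1, 2, 3}:
        raise ValueError("intensity classes must be 0, 1, 2 or 3")
    total = sum(percent_by_intensity.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError("percentages must sum to 100")
    # Weight each percentage by its class value and add the results.
    return sum(cls * pct for cls, pct in percent_by_intensity.items())


def mean_h_score(cores):
    """Average the H-score over a patient's tissue cores (assumed convention)."""
    return sum(h_score(core) for core in cores) / len(cores)
```

For a core with 97 % unstained, 2 % weak, 0.5 % moderate and 0.5 % strong nuclei, the score is 2 × 1 + 0.5 × 2 + 0.5 × 3 = 4.5; a core in which every nucleus stains strongly reaches the maximum of 300.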
Statistical analysis
First, Mann–Whitney-Wilcoxon tests were used to compare the distribution of the mean number of Ki-67 positive cells per patient obtained with the visual and automated methods. Second, Cox regression models were used to estimate the effect of the Ki-67 labeling index on BCR or death from prostate cancer (DPCa) after RP according to the measurements obtained with the two methods. PSA, Gleason score and age were used as the adjustment covariates in multivariable analyses. In addition, we created a high-risk predictor by identifying patients who had at least one high-risk feature: high pathological stage (pT3), nodal involvement or positive margins (pT3N+M+). Third, we built ROC curves, assessed the AUC at 12.5 years after RP and estimated the C-index over 12.5 years of follow-up after RP. The C-index was used to assess the prediction accuracy of three measures of Ki-67 over time for the occurrence of BCR and DPCa. This time point (12.5 years) was chosen because, beyond it, the prediction accuracy is confounded by naturally occurring death from other causes. The statistical analyses were conducted with R software (version 3.0.2; Vienna, Austria), using a two-sided alpha value of 0.05 to declare statistical significance.
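To make the C-index concrete, here is a minimal sketch of Harrell's concordance index with an optional follow-up horizon. It is not the authors' R code, and the text does not state which C-index estimator was used; the simple truncation at the horizon (without censoring weights) and the handling of ties are assumptions of this example.

```python
def harrell_c_index(times, events, scores, horizon=None):
    """Harrell's concordance index for right-censored data.

    times:   follow-up time for each patient
    events:  1 if the event (e.g. BCR or DPCa) was observed, 0 if censored
    scores:  marker value, higher meaning higher predicted risk
    horizon: if given, only events occurring up to this time anchor a pair

    A pair is comparable when the patient with the shorter time had an
    observed event. It is concordant when that patient also has the higher
    score; tied scores count for one half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        if horizon is not None and times[i] > horizon:
            continue
        for j in range(n):
            if j == i or times[j] <= times[i]:
                continue
            comparable += 1
            if scores[i] > scores[j]:
                concordant += 1.0
            elif scores[i] == scores[j]:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, so one minus the C-index can be read as the proportion of comparable pairs that the marker orders incorrectly.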
Discussion
Uncontrolled proliferation is a hallmark of malignancy, and the measurement of Ki-67 antigen by immunohistochemistry is the most widely performed assessment of the proliferative potential of tumors. Several studies support a role for this marker in the prediction of clinically significant outcomes in prostate cancer, such as biochemical recurrence and death, but it is not routinely used in PCa prognostic models. As with other histopathological factors, clinical adoption is hampered by the weak reproducibility of measurement methods, poor standardization of assays and the need for clinical agreement on an appropriate measurement method, as noted by a group evaluating Ki-67 measurement in breast cancer [
29]. Variability between pathologists’ assessments of Ki-67 index was reported to be high, reflecting the differences inherent to subjective counting in terms of positivity threshold interpretation, field selection as well as other factors [
31]. In the absence of harmonized methodology, the integration of Ki-67 in a prognostic composite index is very difficult.
In this TMA study of PCa patients who underwent RP, we measured the expression level of Ki-67 in 225 patients by both DIA and visual scoring. With the DIA method we used, the mean labeling index (2.05 %) was similar to the one measured by visual scoring (2.23 %). Nonetheless, since similarities in the labeling index measurements between the two methods do not guarantee similar associations with clinical outcomes, we compared them based on time-dependent AUC (C-index) and showed that both methods performed similarly in predicting BCR and DPCa. Not only was DIA able to reproduce the relationship with clinico-pathological outcomes obtained by visual scoring of the Ki-67 index, it also had a similar pattern of prediction. Indeed, the C-index indicates that both methods allow one to discriminate between those who will die from their prostate cancer and those who will not, even when the heterogeneity of the population is taken into account, with an error rate as low as 19 %. Therefore, our results suggest that DIA might be a reliable method to assess the proliferation index in patients with high-risk PCa.
We observed a weak independent predictive role of Ki-67 positivity on BCR. These results are consistent with prior studies that have shown Ki-67 labeling index to be associated with BCR in prostate cancer patients treated with radical prostatectomy [
23,
32‐
34], as recently highlighted in a large multicenter study [
35]. The association with BCR was also observed using Ki-67 integrated in a three-marker composite index of proliferation [
36], although it could be restricted to tumors with ERG- status [
37]. One strength of the current study is that this association was observed for both a dichotomous (≤1.7 % or >1.7 %) and a continuous measurement of Ki-67. The low cut-off point obtained could be related to TMA sampling, to the sharp tissue segmentation we aimed at, and to the fact that we performed an exhaustive count of tumor cells rather than using a «hot spot» approach, all of which might prevent overestimation of the proliferative index. We are confident that the data generated are valid because they were confirmed by two modalities of quantification (visual scoring and DIA).
In our cohort, Ki-67 was a much more potent predictor of DPCa than of BCR. When looking only at cohorts of patients undergoing radical prostatectomy, the association between Ki-67 and DPCa has been shown in some [
16,
22] but not all [
38] studies. Many studies considered only biochemical failure and did not assess DPCa, while others included patient populations with heterogeneous treatments or measured proliferative index in nodal metastases instead of primary tumors [
39‐
41]. This is in contrast with the strong relationship between death and Ki-67 proliferative index observed in radiotherapy or watchful waiting cohorts (reviewed in Fisher 2013 [
42]). Our own study confirmed that Ki-67 expression is positively associated with DPCa in a cohort of closely followed RP patients. Interestingly, when we considered the association between Ki-67 and death from causes other than prostate cancer, we found a negative association in multivariate analysis (HR 0.26, 95 % CI 0.13-0.77; p = 0.04). This observation suggests that Ki-67’s role in predicting death from prostate cancer might be very specific.
There are a growing number of reports of automated Ki-67 quantification data generated with software algorithms and of agreement with visual counts in several tissue and tumor types [
43‐
46]. When different techniques of measurement of Ki-67 were compared in breast cancer, DIA was found to be even prognostically stronger than visual counting [
31]. DIA has also been used successfully to quantify Ki-67 in PCa [
24‐
26,
35,
47], but few studies have directly compared the two modalities. In one radiation therapy cohort in which the proliferative index was measured on prostate biopsy samples, there was a stronger correlation between clinical outcomes and visual scoring than with DIA [
48]. This contrast with our results might reflect differences in sampling (biopsy vs TMAs), analysis (dichotomous vs continuous) and DIA methods. Our study also shows that assessing Ki-67 as a continuous variable is preferable to assessing it as a dichotomous variable because the HR estimates are more precise (the variance is smaller than in the dichotomous model), supporting the notion that the capacity to acquire continuous data is one advantage of DIA technology. Finally, the algorithm used for our DIA measurements provided a direct nuclei count with fractionation of staining into intensity classes. It allowed us to assess whether a weighted H-score would perform better than the raw percentage. Our results suggest that subtle intensity variations in Ki-67 staining, as detected with DIA and transposed into an H-score, add no clinically useful information in prostate cancer.
The DIA system we used was successfully applied in morphologically complex tumors such as ovarian carcinomas [
49], but required a high number of manual adjustments for tissue segmentation in our cohort of prostate carcinoma. This result reflects the subtle morphologic differences between malignant and benign glands in prostate cancer, as we aimed to perform sharp neoplastic gland segmentation, and the difficulty DIA modalities have in discriminating them without visual verification. It also underscores that an accurate assessment of tumor cells is mandatory when faced with a tumor with a low mean proportion of proliferating cells, since small variations may have a larger impact on data interpretation. The time needed for a pathologist to make a careful objective count is currently estimated at approximately 20 minutes, similar to the estimated time needed to perform DIA [
31], especially if manual correction of tissue segmentation is required. It is reasonable to expect that with experience and the availability of new software, quantitative DIA will be easier to apply and will lead to more consistent and reproducible results, as the machine is subject neither to fatigue nor to inter- and intra-observer variability. In addition, pre-analytical aspects inherent to tissue processing and immunohistochemistry techniques remain critical to optimizing the performance of DIA [
29].
There are potential limitations to our study that merit consideration. First, because of the relatively small number of patients, we could not perform a validation step. Therefore, the results obtained with our cohort of 225 patients will need to be validated in an independent cohort of patients. Second, the relatively high rate of manual adjustments of segmentation with the DIA technique certainly helped to increase the precision of our findings but also increased the technical burden of our approach. More methodological work is needed to facilitate the application of our approach in the clinic. We also observed a low rate of DPCa in our cohort. To compensate for this, an analysis taking the occurrence of positive margins and nodal involvement (high-risk features) as a surrogate marker of dismal evolution was performed and reproduced our results (multivariate model; HR 3.49, 95 % CI 1.19-10.23; see Table
2). In addition, most of the patients in our cohort had a high-stage (T3) tumor, and our results should be validated in a cohort with a more evenly distributed pathological stage to ensure that our findings are generalizable to patients with lower-stage prostate cancer. However, 31 % of the patients in our cohort had low-stage disease and our analysis accounts for the stage effect, rendering confounding by stage unlikely. Finally, the use of TMAs rather than whole tissue sections might limit the transposition of our results to prostatectomy specimens. Tissue microarrays offer tissue sampling that is independent of the biomarker’s expression, thus reducing the opportunity for selection bias (e.g. in the choice of counting areas), and allow for a reduction in the number of immunohistochemistry runs, providing more homogeneous conditions for marker quantification. For Ki-67 measurements in PCa, specifically, it was suggested that as few as three TMA spots were sufficient to predict PSA failure [
50]. However, since prostatectomy whole tissue sections are more convenient for direct clinical use, a validation of our results with such specimens should be done.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
PD participated in the design of the study, the acquisition of the data, the interpretation of the data and was involved in drafting the manuscript. HH participated in the acquisition of the data and was involved in reviewing the manuscript. MN-M was involved in data analysis, interpretation of the data and drafting the manuscript. CL was involved in the interpretation of the data and in drafting the manuscript. AC was involved in data analysis and interpretation of the data. LL was involved in study design, data interpretation and in reviewing the manuscript. YF was involved in data interpretation and in reviewing the manuscript. VF was involved in the study conception and design, in the interpretation of the data and in reviewing the manuscript. All authors read and approved the final manuscript.