Background
Despite endorsements by several international guidelines [1, 2], KI67 is yet to gain widespread application as a prognostic and/or predictive marker in breast cancer [3]. This is due, largely, to methodological variability in KI67 scoring (such as antibody type, specimen type, type of fixative, antigen retrieval methods, method of scoring, etc.), and limitations in the design and analyses of studies that have reported on this marker [3–7].
In the majority of settings, KI67 is evaluated visually by a pathologist, even though there is yet to be consensus regarding which regions to score: the invasive edge, hot spots, or the entire spectrum of the whole section or tumour core [8]. As a result, both the intra-observer and, especially, the inter-observer reproducibility of visually derived KI67 scores have been shown to be poor [9–11]. This has not only hampered inter-study comparability for KI67, but has also fuelled concerns regarding its analytical validity [3]. To address some of the methodological issues related to KI67 assessment, the International KI67 in Breast Cancer Working Group published recommendations aimed at the standardisation of the analytical processes for KI67 evaluation [8]. This panel, however, fell short of making recommendations regarding the preferred method of scoring for KI67 between visual and automated. Several reports suggest that automated methods could address some of the problems associated with visual scoring [11–19]. These methods are high throughput and are not limited by intra-observer variability. However, concerns exist regarding the accuracy of automated methods and the prognostic power of KI67 derived using these methods relative to that derived visually by pathologists. Only a few, relatively small, studies have reported a head-to-head comparison between scores derived using both methods in terms of prognostic properties, and the results from these are conflicting [11, 17–19].
The majority opinion regarding the prognostic property of KI67 derives mostly from reviews and meta-analyses, which support its prognostic role in breast cancer [4–7, 20]. The meta-analyses by de Azambuja et al. [6], involving 12,155 patients, and by Stuart-Harris et al. [7], which included over 15,000 patients, represent two comprehensive analyses on this subject. These are limited, however, by reported evidence of publication bias, by significant between-study heterogeneity and by the fact that most of the included studies utilised different methodological approaches for KI67 evaluation. Furthermore, while the analysis by de Azambuja et al. [6] was limited by its inclusion of only univariate hazard ratios, that by Stuart-Harris et al. [7] was limited by the small intersection between the sets of covariates in the included studies. In a population-based cohort of a cancer registry, Inwald et al. [21] examined the prognostic role of KI67 in 3658 patients for whom KI67 was routinely measured in clinical practice and reported significant associations between KI67 and overall survival [21]. An important strength of this analysis was that it utilised routinely assessed KI67 measurements in a clinical setting. However, it was also limited by the heterogeneity of the KI67 analytical processes in the different laboratories involved in the study. Nonetheless, KI67 has found use in a variety of clinical and epidemiological scenarios, including its endorsement by a number of international guidelines for use in treatment decision-making in ER-positive breast cancer [1, 2] and its incorporation as part of emerging prognostic tools such as the IHC4 score [22, 23] and PREDICT, a breast cancer treatment benefit tool [24].
In this study, we evaluate the value and robustness of automated scoring of KI67 for large-scale, multicentre studies of breast cancer prognostication. We centrally generated an automated KI67 score from stained tissue microarrays (TMAs), and assessed its prognostic value overall and for different subtypes of breast cancer. We also compared the prognostic performance of automated and visually derived KI67 scores in a subset of patients.
Discussion
Findings from our analysis provide strong evidence in support of a prognostic relationship for automated KI67 scoring in ER-positive (node-negative and node-positive) patients that is independent of tumour grade and other prognostic factors. Even though our data suggested a larger magnitude of the association between KI67 and survival among the node-negative patients, the difference between node-positive and node-negative was not statistically significant. Involving over 8000 patients from multiple centres internationally, this represents the largest study that has evaluated the prognostic value of automated KI67 scoring in breast cancer to date. Furthermore, the large sample size allowed us to evaluate its prognostic value in a number of breast cancer subtypes including ER+ (node-negative and node-positive), ER–, ER+ and/or PR+ (HER2+ or HER2–), ER–/PR– and HER2+ (i.e. HER2-enriched) and triple-negative breast cancers.
Our findings suggest that automated KI67 scoring is an analytically valid approach to generating KI67 scores. This is particularly noteworthy given the growing need to incorporate measures of KI67 in prognostic tools such as the IHC4 score and PREDICT [23, 24]. These tools are relatively cheap, readily available and utilise routinely measured IHC markers and, in the case of PREDICT, other routinely available patient data to provide information that can help clinicians and patients make informed decisions regarding the course of treatment. It is acknowledged that prognostication in breast cancer is becoming increasingly sophisticated and that a number of multigene assays [28, 29] have been validated for this purpose; however, their costs and proprietary concerns limit their use in a large number of settings. Moreover, findings from previous studies suggest that some multigene assays may not perform better than routinely measured IHC markers. For instance, Cuzick et al. [23] reported similar prognostic properties for the Genomic Health recurrence score (GHI-RS, Oncotype DX), a 21-gene panel test, and the IHC4 score in their analysis of 1125 women from the TransATAC study, and notably KI67 was assessed by image analysis in that study [23]. Nonetheless, the relative performance of visual and automated KI67 scores in relation to the IHC4 score or PREDICT can only be assessed in studies that are specifically designed for that purpose.
In addition to lack of analytical validity, the prognostic performance of KI67 has also been questioned due to the design and analysis of studies that have reported previously on this protein [3]. Our evaluation is a large-scale, multicentre analysis which has adopted the recommended laboratory processes for the staining and scoring of KI67 [8]. All TMAs in our analysis were stained using the MIB1 antibody (even though not all of them were centrally stained in our centre) and scored using a single automated algorithm. Our estimates of ~2-fold and ~1.5-fold increased risk of mortality at baseline for high versus low KI67 in univariate and multivariate analyses, respectively, are similar to those reported by de Azambuja et al. (HR = 1.95) and Stuart-Harris et al. (HR = 1.42) [6, 7] in their univariate and multivariate meta-analyses, respectively. Stratification of our analysis according to other IHC markers (in addition to ER) showed automated KI67 to be prognostic in hormone receptor-positive cancers. These findings, together with our observation of the prognostic value of KI67 in both node-negative and node-positive ER-positive patients, support the decision by the St Gallen International Expert Consensus to endorse KI67 for treatment decision-making in ER-positive early (1–3 axillary nodes) breast cancer patients [1]. We also observed modest evidence in support of poorer survival outcomes among high, relative to low, KI67 expressing triple-negative subtypes of breast cancer. This finding is in support of a previous report by Keam et al. [30]. Our population of triple-negative breast cancers (N = 1001), however, was 9.5 times larger than that of Keam et al. (N = 105).
Comparative analysis of visual and automated KI67 scores showed a stronger survival association for the visual over the automated scores; however, differences were generally modest. Given the advantages of automated versus visual scoring in terms of its potential for standardisation, reproducibility and throughput, automated methods appear to be promising alternatives to visual scoring for KI67 assessment. A potential limitation to the adoption of automated KI67 scoring in the clinical setting is that misclassification of positive nuclei as negative or malignant nuclei as benign could lead to attenuation of prognostic associations, an observation that has been reported previously for ER and PR [31] and one which we have also observed for KI67 in this analysis. This can be mitigated, however, by stringent quality control processes or by the adoption of a synergistic approach that combines the benefits of both the automated and visual scoring methods. One such approach is the CAV scoring method which we developed for the visual counting of negative and positive malignant nuclei. This approach, a variation of which has been reported previously [15], exploits the advantages of both visual and digital imaging tools by enabling the visual counting of KI67-positive cells in well-defined areas of a tumour within a computer microenvironment. This method is limited, however, by the observation that it is time consuming; as such, it may not be efficient if adopted for the large-scale scoring of KI67 in epidemiological studies, clinical trials or biomarker discovery studies. Nonetheless, efforts are currently underway to standardise the methods for the visual scoring of KI67 in core-cuts.
We centrally generated KI67 scores on TMAs and determined a threshold of 12 % positive cells of prognostic relevance in our study population. However, due to possible variations in the distribution of KI67 scores according to specimen type and among different laboratories, this cut-off point may not apply to other types of clinical samples or to other laboratories. As a result, pending international standardisation of the KI67 analytical processes, setting local laboratory-specific cut-off points as recommended by international guidelines [1] remains a pragmatic approach to determining ‘high’ and ‘low’ KI67. Furthermore, although our automated cut-off point of 12 % positive cells was determined to correspond to a visual score of 25 %, this may be related, at least in part, to the fact that automated systems generally count more cells than the visual evaluator, a reason that has been proposed to explain differences in KI67 scores between visual and automated scoring and different automated scoring approaches [26]. Nonetheless, findings from a recent meta-analysis that assessed the prognostic value of different cut-off levels of KI67 suggest that a visual cut-off point >25 % provides greater discrimination in mortality risk than other cut-off points [32].
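As an illustration of how a laboratory-specific cut-off point might be applied, the following minimal Python sketch dichotomises KI67 scores (percentage of positive nuclei) into ‘high’ and ‘low’. The function name `classify_ki67`, the default of 12 % (the automated TMA threshold reported above) and the convention that a score at or above the cut-off counts as ‘high’ are assumptions made for illustration; they are not taken from the scoring protocol used in this study.

```python
from typing import Iterable, List


def classify_ki67(scores_percent: Iterable[float],
                  cutoff_percent: float = 12.0) -> List[str]:
    """Label each KI67 score (percentage of positive nuclei) as 'high' or 'low'.

    Assumption: a score equal to the cut-off is treated as 'high'.
    The cut-off should be set locally; 12 % is only the TMA-based
    automated threshold quoted in the text.
    """
    labels = []
    for score in scores_percent:
        # A percentage outside 0-100 indicates a data error, so fail loudly.
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"KI67 score out of range: {score}")
        labels.append("high" if score >= cutoff_percent else "low")
    return labels


if __name__ == "__main__":
    # Hypothetical automated scores for four tumours.
    print(classify_ki67([3.5, 12.0, 24.0, 40.0]))
    # The same scores against a hypothetical cut-off of 25 %.
    print(classify_ki67([3.5, 12.0, 24.0, 40.0], cutoff_percent=25.0))
```

Passing the cut-off as a parameter, rather than hard-coding it, reflects the point made above that a threshold derived on TMAs in one laboratory may not transfer to other specimen types or laboratories.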
Some limitations of our analysis include the lack of data on specific chemotherapeutic or endocrine agents received by each patient, as a result of which we were unable to account for the impact of a specific treatment regimen on survival or to examine whether or not KI67 is predictive of response to specific chemotherapeutic and/or endocrine agents. We were, however, able to account for whether or not patients received adjuvant systemic treatment in all our analyses because more than two-thirds of the patients had information on treatment. This also allowed us to perform stratified analysis according to whether or not chemotherapy was administered. Also, we did not have data on disease-free survival, which may have been a more informative end point than BCSS in early breast cancer. Our assessment of KI67 on TMAs may mean that direct inference cannot be drawn from our findings on other types of clinical samples, especially whole sections [8]. This is because KI67 scores are speculated to be lower for TMAs than for whole sections and not many studies have assessed the correlation between KI67 scores on TMAs and those on whole sections. However, one such study by Kobierzycki et al. [33] involving 51 archival paraffin blocks of invasive ductal carcinoma showed excellent correlation (r = 0.91) between the TMAs and whole sections. That study utilised three 0.6 mm core punches, however, and this may explain the high correlation between KI67 scores on TMAs and whole sections that was observed. Nonetheless, the fact that more than half (n = 4431; 55 %) of the patients in our analysis had KI67 scores on two or more cores, with 83 % of these showing concordant KI67 status, should limit the impact of intra-tumour heterogeneity of KI67 scores on our findings.
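One way in which concordance of KI67 status across cores could be summarised is sketched below in Python. The definition used here (a patient is concordant when all of their cores fall on the same side of the cut-off), the function name `core_concordance`, the 12 % default and the example data are assumptions made for illustration; the text does not specify how concordance was computed in this analysis.

```python
from typing import Dict, List


def core_concordance(core_scores: Dict[str, List[float]],
                     cutoff_percent: float = 12.0) -> float:
    """Return the fraction of multi-core patients with concordant KI67 status.

    core_scores maps a (hypothetical) patient identifier to the KI67
    scores, in percent, of that patient's TMA cores. Patients with
    fewer than two cores are excluded from the calculation.
    """
    multi_core = {pid: scores for pid, scores in core_scores.items()
                  if len(scores) >= 2}
    if not multi_core:
        raise ValueError("no patient has two or more scored cores")

    concordant = 0
    for scores in multi_core.values():
        # Set of high/low calls across cores; one element means agreement.
        statuses = {score >= cutoff_percent for score in scores}
        if len(statuses) == 1:
            concordant += 1
    return concordant / len(multi_core)


if __name__ == "__main__":
    # Hypothetical data: three patients with two or more cores, one with one.
    example = {
        "P1": [5.0, 8.0],         # both low: concordant
        "P2": [20.0, 30.0, 15.0], # all high: concordant
        "P3": [5.0, 40.0],        # low and high: discordant
        "P4": [10.0],             # single core: excluded
    }
    print(f"{core_concordance(example):.2f}")
```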
Acknowledgements
The authors acknowledge support from Will Howat and Leigh-Anne McDuffus of the Cancer Research UK Cambridge Institute, University of Cambridge and from Lila Zabaglo of the Academic Biochemistry Laboratory of the Institute of Cancer Research, London during development of the automated scoring protocol. The authors wish to thank Heather Thorne, Eveline Niedermayr, all of the kConFab research nurses and staff, the heads and staff of the Family Cancer Clinics, and the Clinical Follow Up Study (which has received funding from the NHMRC, the National Breast Cancer Foundation, Cancer Australia, and the National Institutes of Health (USA)) for their contributions to this resource, and the many families who contribute to kConFab. kConFab is supported by a grant from the National Breast Cancer Foundation, and previously by the National Health and Medical Research Council (NHMRC), the Queensland Cancer Fund, the Cancer Councils of New South Wales, Victoria, Tasmania and South Australia, and the Cancer Foundation of Western Australia.
Funding
The ABCS study was supported by the Dutch Cancer Society (grants NKI 2007–3839; 2009 4363); BBMRI-NL, which is a Research Infrastructure financed by the Dutch government (NWO 184.021.007); and the Dutch National Genomics Initiative.
The ESTHER study was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. Additional cases were recruited in the context of the VERDI study, which was supported by a grant from the German Cancer Aid (Deutsche Krebshilfe).
The KBCP study was financially supported by the special Government Funding (EVO) of Kuopio University Hospital grants, Cancer Fund of North Savo, the Finnish Cancer Organizations, the Academy of Finland and by the strategic funding of the University of Eastern Finland.
The MARIE study was supported by the Deutsche Krebshilfe e.V. (70-2892-BR I, 106332, 108253, 108419), the Hamburg Cancer Society, the German Cancer Research Center (DKFZ) and the Federal Ministry of Education and Research (BMBF) Germany (01KH0402).
The MCBCS was supported by an NIH Specialized Program of Research Excellence (SPORE) in Breast Cancer (CA116201), the Breast Cancer Research Foundation, the Mayo Clinic Breast Cancer Registry and a generous gift from the David F. and Margaret T. Grohne Family Foundation and the Ting Tsung and Wei Fong Chao Foundation.
The ORIGO authors thank E. Krol-Warmerdam and J. Blom; the contributing studies were funded by grants from the Dutch Cancer Society (UL1997-1505) and the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL CP16).
The PBCS study was funded by Intramural Research Funds of the National Cancer Institute, Department of Health and Human Services, USA.
The RBCS study was funded by the Dutch Cancer Society (DDHK 2004–3124, DDHK 2009–4318).
The SEARCH study is funded by a programme grant from Cancer Research UK (C490/A10124, C490/A16561) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. Part of this work was supported by the European Community’s Seventh Framework Programme under grant agreement number 223175 (grant number HEALTH-F2-2009-223175) (COGS). The authors acknowledge funds from Breakthrough Breast Cancer, UK, in support of MG-C at the time this work was carried out and funds from Cancer Research UK in support of MA at the Institute of Cancer Research, London.