Background
Precision medicine has been the focus of breast cancer research in recent decades. As breast cancers are detected at earlier stages and treatment has improved, there is increasing emphasis on avoiding overtreatment as well as undertreatment [1]. Currently, the majority of primary breast cancers are treated with breast-conserving surgery (BCS), and the patient is generally offered adjuvant treatment. Prognostic and treatment-predictive biomarkers based on traditional immunohistochemistry (IHC), or on more modern molecular techniques such as gene expression profiling, are presently used to guide the use of adjuvant endocrine therapy, chemotherapy and anti-human epidermal growth factor receptor 2 (HER2)-directed therapy [2]. However, there is no diagnostic procedure to guide adjuvant radiotherapy (RT) after BCS, even though RT is administered to the majority of patients. This is despite the knowledge that most patients who undergo BCS will remain recurrence-free without RT for at least 10 years, and that around 20% will suffer a recurrence within 10 years despite RT [3]. Traditional clinicopathologic variables and IHC markers have been unable to identify patients who could be spared RT [3–5], although studies are ongoing to find patients with a risk of recurrence low enough to omit RT (e.g. the LUMINA study, NCT02653755, and the PRIMETIME study [6]).
Several attempts have been made to create gene expression-based classifiers that predict response to RT after BCS, or that estimate the risk of recurrence with or without RT [7–11]. Most recently, Speers et al. presented the radiosensitivity signature (RSS), a 51-gene random forest model that classifies tumors as radioresistant or radiosensitive [12]. Tramm et al. presented a 4-gene classifier predicting the response to RT after mastectomy [13]. Torres-Roca et al. presented the radiosensitivity index (RSI), a linear model based on the rank of genes in individual samples, which has been validated in several cancer types, including breast cancer [8]. The same authors have also extended the model by combining RSI with the linear-quadratic model to derive the genomic-adjusted radiation dose (GARD) [14]. In addition, genome instability is considered to sensitize cancer cells to treatment in general, and a centromere and kinetochore gene expression score has been suggested to predict response to RT [15]. Taken together, promising results have been presented, but no profile or marker is yet in clinical use.
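As an illustration of how RSI and the linear-quadratic (LQ) model can be combined, the relations below sketch the GARD construction under stated assumptions: n is the number of fractions, d the dose per fraction, and α and β the LQ parameters. We assume that RSI is interpreted as the surviving fraction after a single 2 Gy dose (SF2) and that β is held at a fixed constant; the exact constants used by the original authors are not reproduced here.

```latex
% Linear-quadratic model: surviving fraction after n fractions of dose d
S(n, d) = \exp\!\left[-n\left(\alpha d + \beta d^{2}\right)\right]

% GARD taken as the LQ effect, i.e. the negative log surviving fraction
\mathrm{GARD} = -\ln S(n, d) = n d\,(\alpha + \beta d)

% Assumption: RSI approximates SF2 (n = 1, d = 2 Gy), which fixes alpha per patient
\mathrm{RSI} \approx \exp(-2\alpha - 4\beta)
\quad\Longrightarrow\quad
\alpha = -\frac{\ln \mathrm{RSI}}{2} - 2\beta
```

Under these assumptions, a lower RSI (a more radiosensitive tumor) gives a larger α and therefore a higher GARD for the same physical dose nd.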
There are several reasons why gene expression profiles have not been introduced into clinical routine. First, their clinical value and cost-effectiveness have not been proven: the reported profiles lack extensive independent validation, and to date no prospective trial or study based on existing randomized clinical trials has been presented, except in the mastectomy setting [13]. Second, few of the current profiles have been tested on technical platforms able to handle samples with low-quality RNA, such as RNA extracted from formalin-fixed paraffin-embedded (FFPE) tissue; such compatibility would greatly improve their clinical utility. Third, it has been hard to validate profiles across platforms, although attempts have been made, e.g. by scaling (RSS) or by rank-based models (RSI). Finally, breast cancer is a heterogeneous disease, and the response to RT and the pathways associated with radioresistance may differ between subgroups. Indeed, when Torres-Roca et al. presented the follow-up study of RSI in estrogen receptor-positive (ER+) and estrogen receptor-negative (ER-) breast cancer, they could only validate their previous findings in ER- tumors [16]. Interestingly, RSI was recently shown to correlate with immune response genes, which may partly explain its subgroup-specific performance, as the immune response is more important for prognosis in ER- breast cancer [17, 18].
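The cross-platform argument for rank-based models such as RSI can be illustrated with a minimal Python sketch: genes are ranked within each sample and the score is a linear combination of those ranks, so any strictly increasing transformation of the whole sample (for instance, linear intensities versus log2 values) leaves the score unchanged. The gene values and weights below are hypothetical placeholders, not the published RSI genes or coefficients.

```python
import math


def within_sample_ranks(values):
    """Rank the genes within a single sample (1 = lowest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next gene has the same expression value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for m in range(pos, end + 1):
            ranks[order[m]] = average_rank
        pos = end + 1
    return ranks


def rank_linear_score(values, weights):
    """Weighted sum of within-sample ranks (weights are hypothetical)."""
    return sum(w * r for w, r in zip(weights, within_sample_ranks(values)))


# Hypothetical weights and expression values for four genes, for illustration only.
HYPOTHETICAL_WEIGHTS = [0.5, -0.2, 0.1, -0.4]
linear_intensities = [120.0, 15.0, 640.0, 42.0]
log2_values = [math.log2(v) for v in linear_intensities]

score_linear = rank_linear_score(linear_intensities, HYPOTHETICAL_WEIGHTS)
score_log2 = rank_linear_score(log2_values, HYPOTHETICAL_WEIGHTS)
```

Both scores are computed from the identical rank vector, which is why no cross-platform scaling step is needed for this kind of model.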
In this study, we aimed to address these issues and created a targeted radiosensitivity gene expression assay using the Nanostring nCounter platform, which is suitable for low-quality RNA samples. Based on the targeted assay, we created single-sample predictors (SSPs) using a k-top scoring pairs (k-TSP) algorithm [19]. The SSPs were validated as prognostic for ipsilateral breast tumor recurrence (IBTR) in samples of low RNA quality from a study cohort, and further validated in public data. The SSPs also showed potential to stratify patients for RT. In addition, the panel included the genes described for RSS and the genes of a surrogate score for RSI (referred to as the 10-gene signature, 10-GS). The previously reported signatures were prognostic for IBTR and partially predictive of RT benefit, but their performance was dependent on ER status. Finally, we showed that the biology behind the different models and predictors may explain this difference.
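The k-TSP principle can be sketched in a few lines of Python. The sketch below is a simplified illustration rather than the implementation used in this study: for every gene pair (i, j) it scores how much the probability P(X_i < X_j) differs between two classes, greedily keeps the k highest-scoring disjoint pairs, and classifies a new sample by majority vote using only comparisons between genes within that sample. Full k-TSP implementations additionally use a secondary score to break ties and choose k by cross-validation; both are omitted here, and the toy data are hypothetical.

```python
from itertools import combinations


def rule_frequency(samples, i, j):
    """Fraction of samples in which gene i is expressed below gene j."""
    return sum(s[i] < s[j] for s in samples) / len(samples)


def train_ktsp(class_a, class_b, k=3):
    """Select up to k disjoint gene pairs whose rule X_i < X_j best separates the classes."""
    n_genes = len(class_a[0])
    scored = []
    for i, j in combinations(range(n_genes), 2):
        p_a = rule_frequency(class_a, i, j)
        p_b = rule_frequency(class_b, i, j)
        # Score = |P(X_i < X_j | A) - P(X_i < X_j | B)|; remember which class the rule favours.
        scored.append((abs(p_a - p_b), i, j, p_a > p_b))
    # Highest scores first; ties keep enumeration order (no secondary score here).
    scored.sort(key=lambda item: item[0], reverse=True)
    pairs, used = [], set()
    for _, i, j, a_if_less in scored:
        if i in used or j in used:
            continue  # enforce disjoint pairs: each gene is used at most once
        pairs.append((i, j, a_if_less))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict_ktsp(pairs, sample):
    """Majority vote of the pairs, using only within-sample comparisons."""
    votes_a = sum((sample[i] < sample[j]) == a_if_less for i, j, a_if_less in pairs)
    return "A" if votes_a > len(pairs) / 2 else "B"


# Hypothetical toy data: 3 samples per class, 6 genes each.
class_a = [[1, 5, 2, 6, 3, 7], [2, 6, 1, 5, 4, 8], [1, 4, 3, 9, 2, 6]]
class_b = [[5, 1, 6, 2, 7, 3], [6, 2, 5, 1, 8, 4], [4, 1, 9, 3, 6, 2]]

PAIRS = train_ktsp(class_a, class_b, k=3)  # an odd k avoids tied votes
```

In this toy example the selected pairs are genes (0, 1), (2, 3) and (4, 5), and every training sample is assigned to its own class. Because each vote depends only on the ordering of two genes in the same sample, no normalization against a reference cohort is required.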
Discussion
In this study, we developed and validated single-sample predictors (SSPs) that were prognostic for IBTR, using a targeted gene expression panel applicable to samples of lower RNA quality. We also presented a proof of concept for applying the SSPs to stratify patients into treatment groups, which showed promising potential. Two previously published radiosensitivity signatures [8, 12] were also tested in our data; their performance was ER status-dependent, which may be explained by the biology behind the different models.
The treatment of primary breast cancer is highly individualized, and tests are available to guide the use of adjuvant endocrine therapy, chemotherapy and anti-HER2 treatment [37, 38]. However, no test is available to guide the use of adjuvant RT, which remains an urgent unmet clinical need. Several attempts have been made towards this aim, but no test has been introduced into clinical use. The main reasons are the lack of follow-up studies and validation, the inability to handle samples of lower RNA quality, which are typical of FFPE samples under clinical conditions, and the cohort dependence of the models. Here, we present a novel approach that aims to overcome these problems and move individualized RT closer to clinical use. First, we built on previous biological knowledge by including genes previously described in the literature as associated with radioresistance, in addition to our newly discovered set of genes. Our final SSP models consist of genes from these different sources and are highly prognostic for IBTR, both in our validation data and in independent public data. In addition, the targeted assay includes genes from two previously described radiosensitivity signatures, which gave us an opportunity to evaluate surrogate scores for these two profiles; in our data, these were indeed validated for prognostication in certain subgroups. Importantly, the 10-GS was also treatment predictive for RT. Second, most clinical samples are handled and stored as FFPE tissue, and an assay able to process RNA extracted from FFPE samples would greatly facilitate clinical routine use. Here, we used the Nanostring nCounter platform for our targeted assay; this platform has shown good performance in FFPE samples and is FDA approved for such use with the Prosigna assay [39], and we validated our targeted radiosensitivity panel in samples of lower RNA quality. Although the panel has not yet been tested directly in FFPE samples, our samples of lower RNA quality are similar to RNA extracted from FFPE samples in terms of RNA integrity number (RIN) and fragment length (data not shown). Third, we used a machine learning algorithm (k-TSP) that relies only on the relative expression of genes within a sample, which should in theory make it independent of both platform and cohort. Indeed, we validated the SSPs in data from partly degraded samples and from fresh-frozen tumor cohorts, without any scaling or other measures to make the data comparable.
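Why within-sample pairwise comparisons should be robust across platforms, and why this holds only in theory, can be checked with a small hypothetical Python example. A strictly increasing transformation applied to a whole sample (here log2 plus a constant offset, standing in for a different platform's scale) leaves every pairwise comparison unchanged, whereas a gene-specific bias, such as one probe reading systematically higher on one platform, can flip a pair. The gene pairs and values are assumptions made for illustration.

```python
import math


def pair_outcomes(sample, pairs):
    """Return the within-sample comparison sample[i] < sample[j] for each gene pair."""
    return [sample[i] < sample[j] for i, j in pairs]


# Hypothetical gene pairs and expression values, for illustration only.
EXAMPLE_PAIRS = [(0, 1), (2, 3)]
sample_platform_a = [200.0, 850.0, 95.0, 40.0]

# A strictly increasing transform of the whole sample (log2 plus a sample-wide
# offset) preserves the ordering of every gene pair.
sample_platform_b = [math.log2(v) + 3.0 for v in sample_platform_a]

# A gene-specific bias (gene 3 reads 10x higher) is not a monotone transform
# of the whole sample, so it can reverse a pairwise comparison.
sample_probe_bias = list(sample_platform_a)
sample_probe_bias[3] *= 10.0
```

Here the first two samples give identical pair outcomes, while the biased probe flips the second pair; validation across platforms therefore remains necessary even for rank-based predictors.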
Furthermore, the aim of a radiosensitivity predictor in early breast cancer is to stratify patients and offer treatment only to those in whom RT has a clinically significant effect. However, patients who do not benefit from RT after BCS may be either those with the least aggressive tumors, who remain recurrence-free even without RT (calling for de-escalation of treatment), or those with the most aggressive and radioresistant tumors (calling for escalation of treatment). This may complicate the analysis, since these two groups of tumors most likely differ in their transcriptomic profiles. A strength of this study is therefore that we developed classifiers that address both of these reasons for not benefitting from RT, creating three groups for treatment stratification. The results were highly significant in the validation cohort, although we acknowledge the small sample sizes and the need for further validation in larger cohort studies or randomized trials.
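The three-group stratification can be expressed as the consecutive application of two binary predictors. The following Python sketch is hypothetical: the order in which the predictors are applied, the group labels and the stand-in predictor functions are assumptions made for illustration, not the exact decision rule of this study.

```python
from enum import Enum
from typing import Callable, Sequence


class TreatmentGroup(Enum):
    DE_ESCALATE = "low risk: candidate for omitting RT"
    STANDARD_RT = "standard RT"
    ESCALATE = "radioresistant: candidate for treatment escalation"


# A predictor takes one sample's expression profile and returns True or False.
Predictor = Callable[[Sequence[float]], bool]


def stratify(sample: Sequence[float], is_low_risk: Predictor,
             is_radioresistant: Predictor) -> TreatmentGroup:
    """Assign a treatment group by applying two predictors in sequence (assumed order)."""
    if is_low_risk(sample):
        # Least aggressive tumors: likely to remain recurrence-free even without RT.
        return TreatmentGroup.DE_ESCALATE
    if is_radioresistant(sample):
        # Aggressive tumors unlikely to benefit from standard RT.
        return TreatmentGroup.ESCALATE
    return TreatmentGroup.STANDARD_RT


# Hypothetical stand-in predictors, each based on a single gene-pair comparison.
def toy_low_risk(sample: Sequence[float]) -> bool:
    return sample[0] < sample[1]


def toy_radioresistant(sample: Sequence[float]) -> bool:
    return sample[2] > sample[3]
```

In practice each stand-in would be replaced by a trained single-sample predictor, such as a k-TSP classifier for the corresponding endpoint.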
However, although we have shown reproducible classifiers for IBTR prognostication and RT treatment stratification, it must be noted that RT is an effective and cost-effective treatment with relatively mild side effects, which raises the threshold for withholding it. High predictive accuracy is therefore required of any radiosensitivity predictor for it to be clinically useful. Although promising, the performance of our proposed SSPs and of the previously published profiles shows that they are not yet ready for clinical use. Validation in additional cohorts may be a next step, but further classifier development is likely needed. Indeed, as a proof of concept, our SSPs were intentionally trained with default settings using the majority of genes in the panel. There is great potential to further optimize the models, e.g. by reducing the number of gene pairs or by weighting the gene pairs. For a final clinical decision tool, one alternative may be to include additional parameters in the models, e.g. by combining gene expression data with clinicopathologic variables, intrinsic subtype and other molecular data into mixed classifiers. Combining gene expression data with additional information has indeed already been suggested [16, 40]. However, this dataset, especially after validation of a locked profile, is not sufficient for extensive classifier optimization or evaluation of other clinicopathologic variables.
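To illustrate what weighting the gene pairs could mean, the Python sketch below replaces the unweighted majority vote of a k-TSP-style classifier with a weighted vote normalized to the range [-1, 1]; setting a pair's weight to zero is equivalent to removing that pair. The pairs, weights and sample are hypothetical, and how such weights would be estimated is left open.

```python
from typing import Sequence, Tuple

# Each pair is (gene_i, gene_j, class_a_if_less): the pair votes for class A when
# the comparison sample[gene_i] < sample[gene_j] equals class_a_if_less.
GenePair = Tuple[int, int, bool]


def weighted_pair_score(sample: Sequence[float], pairs: Sequence[GenePair],
                        weights: Sequence[float]) -> float:
    """Weighted vote in [-1, 1]; positive favours class A, negative favours class B."""
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    score = 0.0
    for (i, j, class_a_if_less), w in zip(pairs, weights):
        votes_for_a = (sample[i] < sample[j]) == class_a_if_less
        score += w if votes_for_a else -w
    return score / sum(weights)


# Hypothetical pairs and sample: the first two pairs vote for A, the third for B.
EXAMPLE_PAIRS = [(0, 1, True), (2, 3, True), (4, 5, True)]
example_sample = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]
```

With equal weights the score is 1/3 (two votes against one); with weights 3, 2 and 1 it rises to 2/3, and with weights 1, 1 and 0 the dissenting pair is effectively dropped and the score becomes 1.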
One limitation of our study is the case-control design, in which RT was not administered in a randomized fashion. This limits the analyses that can be performed; e.g. the proposed method of using a Cox model with an interaction term between treatment and gene expression [41] is not feasible in this dataset. Further, the cohort is enriched for patients with IBTR, so the Kaplan-Meier curves and HR estimates presented are not representative of the risk of recurrence in an unselected population, and should only be interpreted as an indication of how the different models perform in the specific datasets. Non-randomized treatment is not unique to our dataset but is a general problem in the development of an RT-predictive gene expression signature. The publicly available datasets analyzed here were also non-randomized for RT: the dataset presented by van de Vijver et al. included patients who underwent either modified radical mastectomy or BCS, while the dataset by Servant et al. contained only patients who underwent BCS. The proportion of patients given RT also differs between the public datasets: in the dataset of Servant et al., all patients were given RT, whereas this was not the case in the van de Vijver et al. cohort. This may explain the observed differences between the datasets when we validated our SSPs. Further, systemic adjuvant treatment was allowed in our study and was not specified in the inclusion criteria, which may introduce bias and make it difficult to compare classifier performance with that in other cohorts. Indeed, the proportions of patients given chemotherapy and endocrine therapy differ between the discovery and validation cohorts (Table 1, Additional file 9: Table S2). To account for this, we performed multivariable Cox regression adjusting for tumor characteristics (subtype, size and positive lymph nodes) and treatment (endocrine therapy and chemotherapy), both for the prognostic SSPs and for the consecutive use of SSPs to stratify patients for treatment; this did not alter the main findings (Additional file 2).
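For reference, a Cox model with a treatment-by-expression interaction, of the general kind referred to above, can be written as follows. Here T is an indicator of RT (1 = RT given), G is a gene expression score and h_0(t) is the baseline hazard; the notation is ours and is given only as an illustration.

```latex
% Cox proportional hazards model with a treatment-by-expression interaction
h(t \mid T, G) = h_0(t)\,\exp\!\left(\beta_T T + \beta_G G + \beta_{TG}\, T G\right)

% Hazard ratio for RT versus no RT at a given expression level G
\mathrm{HR}_{\mathrm{RT}}(G) = \frac{h(t \mid 1, G)}{h(t \mid 0, G)}
  = \exp\!\left(\beta_T + \beta_{TG}\, G\right)
```

A non-zero interaction coefficient β_TG means that the effect of RT depends on G, i.e. that G is treatment predictive. Interpreting β_T and β_TG in this way presupposes that RT allocation is independent of G and of other prognostic factors, which randomization provides and a non-randomized case-control sample does not.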
We chose to develop separate models for ER+ and ER- breast cancer, as ER status is a major determinant of breast cancer biology [42]. Indeed, when we analyzed the previously reported RSS and 10-GS signatures, they did not perform uniformly in ER+ and ER- disease. We therefore investigated the biological basis of the models, focusing on proliferation and immune response, which have been described as the major drivers of breast cancer biology [18]. As our SSPs developed in ER+ breast cancer were correlated with proliferation, one might suspect that they simply captured the difference between luminal A and luminal B tumors, which is defined mainly by proliferation, and that our high-risk tumors were mainly luminal B tumors. However, the rates of high-risk and low-risk predictions were similar in luminal A and luminal B tumors. Although the performance of the SSPs was slightly higher in luminal A tumors, the difference was not significant. Furthermore, multivariable modeling including subtype did not alter the findings (Additional file 2). RSS was also correlated with proliferation, and it was trained in a cohort of mainly ER+ tumors, all treated with RT, which may explain why it could only be validated in ER+ RT+ patients. More interestingly, the 10-GS could only be validated in ER- RT+ patients, and ER+ RT- tumors predicted as radioresistant actually had a lower risk of IBTR, which is consistent with the follow-up study by the original authors [16]. As the 10-GS is negatively correlated with proliferation and immune response, as was also recently shown by the original authors [17], the tumors predicted as radioresistant were mainly slowly proliferating, and it is therefore plausible that ER+ tumors predicted as radioresistant have a better outcome. Further, the tumors predicted as radioresistant have a lower immune response, which may explain why ER- tumors predicted as radioresistant have a worse outcome, as the immune response is more important in highly proliferating and ER- tumors.
Acknowledgements
We gratefully thank Sara Baker, Carina Forsare, Kristina Lövgren and Anna-Lena Borg for excellent technical assistance. We also thank the biobanks of the South Sweden Breast Cancer Group (SSBCG), the Biobank at the Department of Oncology and Pathology, Lund University, the biobank at Cancer Center Karolinska, and the Biobank at Akademiska sjukhuset in Uppsala and the Department of Pathology, Uppsala University, for collecting the samples and making them available for studies. We thank the strategic cancer research program BioCARE for providing an excellent learning environment and SCIBLU Genomics for performing the Illumina HT12 analyses. Finally, we thank Dr. Lori J Pierce, Dr. Felix Y Feng, Dr. Corey Speers, Dr. S Laura Chang and Dr. Shuang G Zhao for assistance in calculating the RSS.