Background
A wealth of clinical data has confirmed the role of KRAS mutational status in selecting patients with advanced-stage colorectal cancer (CRC) to receive anti-EGFR monoclonal antibody (mAB) therapy [1–7]. Activating KRAS mutations are strong independent negative predictors of response to such treatment, and KRAS mutational testing has been incorporated into colorectal cancer practice guidelines. Interestingly, KRAS mutations may also predict lack of response to EGFR tyrosine kinase inhibitors (TKI) in lung cancer, suggesting a common mechanism of resistance to anti-EGFR therapies in these two tumor types [8–10]. Importantly, a large percentage of lung cancer and CRC patients harboring wildtype KRAS do not benefit from EGFR-targeted agents [1,3,5,7]. Therefore, additional methods of patient stratification are required to better tailor EGFR-targeted therapy in these diseases.
We have previously published a gene expression predictor of response (GEPR) to erlotinib in lung cancer [11]. The 180-gene model was built on Affymetrix microarray data, and its genes were selected and weighted based on expression data from a series of lung cancer cell lines with known sensitivities to erlotinib. The model was externally validated in additional lung cancer cell lines as well as in human tumors (reference 11 and unpublished data). Given the correlation between KRAS mutational status and response to both EGFR-mAB and EGFR-TKI therapy in lung and colorectal tumors, we hypothesized that our previously published GEPR could predict response to cetuximab in metastatic CRC.
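A "weighted" gene signature is commonly applied by combining each gene's expression with its training weight into a single score and comparing that score with a threshold. The sketch below shows only that generic weighted-sum idea; the GEPR's actual scoring rule is described in reference 11 and may differ. The gene names, weights, threshold, and sample values are hypothetical, and the expression values are assumed to be already normalized the same way as the training data.

```python
from typing import Mapping


def signature_score(expression: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of a sample's expression values over the signature genes.

    A gene missing from the sample raises KeyError, so an incomplete
    signature is never scored silently.
    """
    return sum(w * expression[gene] for gene, w in weights.items())


def classify(expression: Mapping[str, float],
             weights: Mapping[str, float],
             threshold: float = 0.0) -> str:
    """Call a sample 'sensitive' if its score exceeds the (hypothetical) threshold."""
    score = signature_score(expression, weights)
    return "sensitive" if score > threshold else "resistant"


# Hypothetical 3-gene signature and one hypothetical, pre-normalized sample.
# Genes outside the signature (here "OTHER") are simply ignored.
weights = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
sample = {"GENE_A": 7.1, "GENE_B": 9.4, "GENE_C": 5.0, "OTHER": 3.2}
print(classify(sample, weights))  # sensitive (score = 2.48)
```

Because the weights are fixed at training time, the same function can be applied unchanged to a new dataset, which is what makes an external validation of this kind possible.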
Khambata-Ford and colleagues conducted a study of over 100 CRC patients in which metastatic sites were biopsied, KRAS mutational status was determined, and gene expression data were generated [12]. Following biopsy, patients were treated with cetuximab monotherapy, and response and progression-free survival (PFS) were recorded. The purpose of that study was to identify predictive biomarkers of response to cetuximab.
The publication of these data presented an excellent opportunity to test our hypothesis that the 180-gene GEPR to erlotinib, generated in lung adenocarcinoma cell lines, was portable to KRAS-wildtype CRC for predicting response to cetuximab. Because the data published by Khambata-Ford and colleagues were not available until almost a year after the publication of our predictive model, they could be used to perform a true external validation, essentially equivalent to an independent prospective study given the sequence and timing of the two publications.
The primary objective of our study was to test the ability of our predictive algorithm to segregate cetuximab responders from non-responders in the KRAS-wildtype population of the Khambata-Ford study. We found that our GEPR of erlotinib response was strongly predictive of cetuximab response without any gene-weighting adjustment or additional gene selection. However, reducing the signature to 26 of the 180 genes, based on the correlation of those genes with survival in the Khambata-Ford dataset, significantly improved predictive accuracy and Kaplan-Meier curve separation. Importantly, the refined signature retained the original weights from the non-small cell lung cancer (NSCLC) model-training data, reducing the likelihood of over-fitting.
The most significant finding of this study was that the GEPR could predict progression-free survival in a tumor type other than the one on which the model was built, and with a different EGFR-targeted agent. Other groups have likewise reported the portability of gene expression signatures [13,14]. We believe that this model could be highly useful in predicting response to cetuximab in patients with KRAS-wildtype CRC. Furthermore, additional studies to validate the predictive capacity of the model in other appropriate tumor types are underway.
Discussion
The anti-EGFR monoclonal antibodies cetuximab and panitumumab are frequently used in metastatic CRC and improve overall survival when used in unselected populations [15–19]. A number of independent studies have established the association between activating KRAS mutations and lack of response to EGFR-targeted agents, and stratifying patients by KRAS status should improve overall survival by enriching the treated population for responders [1–6]. However, a substantial number of KRAS-wildtype patients do not benefit from treatment, so additional methods of enriching the treated population for responders are needed to reduce unnecessary toxicity and cost while maximizing the therapeutic benefit of these agents. Indeed, in one of the largest analyses to date of the association between KRAS status and clinical outcome with cetuximab in CRC, Karapetis and colleagues concluded that additional biomarker approaches are needed to identify the KRAS-wildtype patients who will benefit from cetuximab [7].
In this study, we applied a GEPR for erlotinib, an EGFR-TKI, which was generated in lung cancer cell lines, to test its predictive capacity in KRAS-wildtype metastatic CRC (mCRC) patients treated with the anti-EGFR mAB cetuximab. It is important to note that the GEPR was generated in lung cancer cell lines and does not depend on either KRAS or EGFR mutation status. Further, the genes included in the signature are biologically associated with pathways downstream of EGFR, including both the PI3K/AKT and MAPK pathways [11].
Application of our model to the CRC dataset represents a true external validation of the GEPR, since the validation set was not available until well after our GEPR model was reported. This dataset allowed us to determine whether the GEPR could predict response to an alternative EGFR-targeted agent, whether KRAS status could be used to enhance its predictive power, and whether it functions across tumor types (CRC versus non-small cell lung cancer) [11,12]. Surprisingly, the unaltered 180-gene model was highly capable of identifying KRAS-wildtype CRC patients who achieved disease control or response with cetuximab treatment. These findings were reinforced by significant separation of the survival curves of the predicted 'sensitive' and predicted 'resistant' groups.
Importantly, these results were achieved even though the genes in the model were selected and weighted based on expression in lung cancer cell lines. Unlike the markers reported by Khambata-Ford and colleagues, neither amphiregulin (AREG) nor epiregulin (EREG) is included in our GEPR. Further, RNA isolation from metastatic CRC biopsies of unknown tumor cell content, and the subsequent microarray hybridizations, were all performed at a facility other than our own.
In the original report, Khambata-Ford and colleagues used AREG and EREG expression to stratify KRAS-wildtype patients and found significantly improved PFS in the 'high' ligand expressers (EREG: P = .0002, hazard ratio [HR] = 0.47, and median PFS, 103.5 v 57 days, respectively; AREG: P < .0001, HR = 0.44, and median PFS, 115.5 v 57 days, respectively) [12]. The differences in median PFS reported in that study are greater than those identified in our study using the original 180-gene model. However, it is not surprising that those authors demonstrated separation of the survival curves between high and low ligand expressers, because AREG and EREG were chosen as biomarkers post hoc: they were selected from over 600 genes after response and progression-free survival in the study population had already been determined. Optimal expression cutoffs were obtained from a receiver operating characteristic (ROC) curve, and changes in median PFS were then calculated on the same data used to derive these cutoffs. It has yet to be shown whether AREG and EREG have any external validity as predictors of cetuximab response. In contrast, our predictive model was generated before the Khambata-Ford data were reported, so its application to these data constitutes a true external validation. The improvement in PFS that we identified in the predicted 'sensitive' KRAS-wildtype mCRC patients was approximately 1 month. Given that cetuximab monotherapy yields an overall PFS benefit of 1.5 months in CRC, and given the high cost of treatment, these findings should be considered clinically important [20].
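Why a cutoff chosen from an ROC curve on the same data flatters the result can be shown with a small simulation. The sketch below assumes Youden's J (sensitivity + specificity − 1) as the ROC criterion, which may not be the criterion Khambata-Ford and colleagues used, and uses a hypothetical biomarker that is pure random noise with no relationship to outcome. Choosing the best cutoff in-sample guarantees J ≥ 0 there and typically yields a clearly positive J, while the same cutoff applied to fresh noise tends to perform near chance.

```python
import random


def youden_j(scores, labels, cutoff):
    """Sensitivity + specificity - 1 when calling score >= cutoff 'positive'.

    Assumes both classes (label 1 and label 0) are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn) + tn / (tn + fp) - 1


def best_cutoff(scores, labels):
    """Choose the cutoff maximizing Youden's J on the *same* data (post hoc)."""
    return max(sorted(set(scores)), key=lambda c: youden_j(scores, labels, c))


rng = random.Random(0)
n = 40
labels = [i % 2 for i in range(n)]           # hypothetical, balanced outcomes
train = [rng.gauss(0, 1) for _ in range(n)]  # biomarker is pure noise
test = [rng.gauss(0, 1) for _ in range(n)]   # fresh, independent noise

c = best_cutoff(train, labels)
print(f"in-sample J  = {youden_j(train, labels, c):.2f}")
print(f"fresh-data J = {youden_j(test, labels, c):.2f}")
```

The gap between the two printed values is the optimism that a true external validation, such as applying a previously published model to a later dataset, is designed to avoid.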
Given the cost of microarray analysis, we next attempted to reduce the number of genes necessary to achieve both response prediction and PFS stratification, using data from the Khambata-Ford et al study. In doing so, we found that refining the GEPR to a subset of 26 of the original 180 genes greatly improved its sensitivity and specificity. Furthermore, the refined 26-gene GEPR significantly increased the difference in median PFS between the predicted-sensitive and predicted-resistant groups and yielded better predictions than those reported by Khambata-Ford et al. Distinct differences in expression patterns across the 26-feature signature are visible in the color scheme of the heat map in Figure 3, clearly revealing a trend that corresponds to PFS. However, the variability of these patterns on a per-gene basis highlights the need to use multiple features to capture tumor heterogeneity. As with the Khambata-Ford analysis, the predictive accuracy of our refined model must be interpreted with caution: because information from the validation set was used in feature selection, over-fitting remains possible. The refined GEPR reported here retains the original weights of the 26 genes, which reduces the chance of over-fitting. Additional validation will test whether the refined model generalizes and determine whether the 26-gene GEPR can be used in a qRT-PCR assay rather than on an Affymetrix platform.
While the 180-gene GEPR was useful for stratifying KRAS-wildtype patients, we also wished to determine whether it could stratify patients independently of KRAS status. Statistical significance was retained in both the median PFS and log-rank analyses when patients were not stratified by KRAS status, suggesting that the signature is an independent predictor of benefit from cetuximab therapy in mCRC. However, patients with KRAS-mutant tumors who were predicted 'sensitive' did not have longer PFS than those predicted 'resistant', although this could be due to the small sample size of this particular analysis. Of note, one patient with a KRAS-mutant tumor was reported by Khambata-Ford et al to have had a PFS of > 1 year on cetuximab, although radiographic response in this patient was not recorded. Our 180-gene GEPR classified this patient as 'sensitive', offering additional support for the independence of our test from KRAS mutational status. However, a substantial number of non-responding KRAS-mutant patients were called 'sensitive' by the GEPR, resulting in a poor positive predictive value in this group.
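The log-rank test mentioned above compares observed and expected events in one group at every event time across the two groups. A minimal, dependency-free two-group version is sketched below for illustration only; it is not the implementation used in the study, and the example data are hypothetical. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper-tail probability is erfc(sqrt(x/2)).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 df.

    events_* use 1 for an observed progression and 0 for censoring.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
            [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0        # hypergeometric variance, summed over event times
    for t in event_times:
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical PFS (days): group A progresses early, group B late.
chi2, p = logrank_test([10, 20, 30, 40, 50], [1, 1, 1, 1, 1],
                       [60, 70, 80, 90, 100], [1, 1, 1, 1, 1])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Unlike a comparison of medians alone, this statistic uses the whole curve, which is why both analyses are typically reported together.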
To further explore the relationship between GEPR prediction status and KRAS status, we performed a χ² test. No association with KRAS status was found for the predictions of either the 26-gene (p = 0.2) or the 180-gene signature (p = 0.3); thus, our test appears to be independent of KRAS status. We also examined, on a per-gene basis, whether any of the 180 genes were differentially expressed between the KRAS-wildtype and KRAS-mutant cohorts. Of the 180 genes, 32 had p < 0.05 by a two-tailed t-test, although none remained significant after Bonferroni correction. Only 3 of these 32 genes were included in the final 26-gene signature (ATP2C1, P2RY5, and TNFRSF10B). This observation may explain why the 26-gene signature showed better predictive accuracy than the 180-gene signature: most of the genes associated with KRAS activation appear to have been removed during gene-list filtering.
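For a 2 × 2 table of prediction status (sensitive/resistant) against KRAS status (wildtype/mutant), the χ² statistic and its 1-degree-of-freedom p-value can be computed directly, and the Bonferroni threshold is simply alpha divided by the number of tests (0.05/180 ≈ 0.00028 for 180 genes). The sketch below assumes a Pearson χ² without continuity correction; the source does not state which variant was used, and the table counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: predicted sensitive / resistant; columns: KRAS wildtype / mutant.
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical tables: one with no association, one with a strong association.
print(chi2_2x2(10, 10, 10, 10))  # (0.0, 1.0)
print(chi2_2x2(20, 5, 5, 20))
print(bonferroni_threshold(0.05, 180))
```

With the Bonferroni threshold, a gene needs a per-gene p-value below about 0.00028 to count as differentially expressed, which is why nominal p < 0.05 results can all disappear after correction.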
Our GEPRs, 180- or 26-gene, may be best used in tandem with KRAS mutational testing. Importantly, our methodology could easily be combined with KRAS mutational testing by biopsying metastatic sites and allotting tissue cores for both RNA and DNA purification. The high sensitivity and negative predictive value of the test suggest that the model could be used to significantly enrich the responding patient population while minimizing the number of potential responders (i.e., false negatives) who would be diverted from receiving cetuximab.
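The trade-off described here, high sensitivity and negative predictive value alongside a weaker positive predictive value, follows directly from the confusion-matrix definitions. The sketch below computes the four metrics from hypothetical counts chosen only for illustration; they are not the study's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a 'sensitive' vs 'resistant' call.

    tp: responders called sensitive      fn: responders called resistant
    fp: non-responders called sensitive  tn: non-responders called resistant
    """
    return {
        "sensitivity": tp / (tp + fn),  # responders correctly called 'sensitive'
        "specificity": tn / (tn + fp),  # non-responders correctly called 'resistant'
        "ppv": tp / (tp + fp),          # 'sensitive' calls that respond
        "npv": tn / (tn + fn),          # 'resistant' calls that do not respond
    }


# Hypothetical cohort: 10 responders and 40 non-responders.
m = diagnostic_metrics(tp=9, fp=20, tn=20, fn=1)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical cohort only 1 of 10 responders would be diverted from treatment (sensitivity 0.90, NPV about 0.95), and the responder fraction among patients called 'sensitive' rises from the baseline 10/50 = 0.20 to 9/29 ≈ 0.31, even though most 'sensitive' calls are still non-responders.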
Competing interests
JM Balko and EP Black are pursuing patent protection on the use of the predictive method reported here.
Authors' contributions
JMB co-authored the manuscript and performed the data analysis. EPB provided scientific direction and co-authored the manuscript.