Introduction
Cancer prognosis and treatment plans rely on a collection of clinicopathological variables that stratify cancer outcomes by stage, grade, responsiveness to adjuvant therapy, and so on. Despite stratification, cancer’s enormous heterogeneity has made precise outcome prediction elusive and the selection of the optimal treatment for each patient a difficult and uncertain choice. Over the past two decades, advances in molecular biology have allowed molecular signatures to become increasingly obtainable [
1] and incorporated into determining cancer prognosis and treatment [
2]. For some cancer types, like breast cancer, gene expression signatures are now routinely used prognostically, with many research groups having identified signatures that predict cancer outcome or indicate whether patients will benefit from adjuvant therapy following surgical resection [
3]-[
9]. Surprisingly, however, there is little overlap in genes between the various signatures, whether they are derived from different tissues or from the same tissue (for example, breast cancer), raising questions about their biological meaning. Furthermore, even with gene expression signatures’ successes in cancer outcome prediction, improvement is possible, as the majority of these signatures are applicable only to early-stage cancers without lymph node (LN) metastasis or prior chemotherapy. As cancer is fundamentally a disease of genetic dysregulation, specifically analyzing a tumor’s regulatory actors, such as transcription factors (TFs), may provide additional prognostic insight [
10],[
11], since transcription factors are relatively universal among different cell lines when compared to the tissue-specific gene clusters from which most gene signatures are made.
TFs are proteins that relay cellular signals to their target genes by binding to the DNA regulatory sequences of these genes and modulating their transcription [
12]. They play major roles in many diverse cellular processes [
13]-[
17]. Unsurprisingly, aberrant expression or mutation of TFs or of their upstream signaling proteins has been implicated in an array of human diseases, including cancer [
18]-[
20]. Given their central regulatory functions, monitoring of TFs is widely regarded as a potentially useful and biologically sensible method for the prediction of cancer and disease outcome [
1].
While differences in the transcriptional expression level of a TF do not necessarily correspond to differences in its regulatory activity, differences in the expression levels of a TF’s target genes do [
21]-[
23]. We have previously developed an algorithm to make this inference of a TF’s regulatory activity from the expression of its target genes, called REACTIN (REgulatory ACTivity INference) [
24]. REACTIN can calculate the activity level of a TF on each individual sample in a given dataset. By calculating these levels and generating individual regulatory activity scores (iRASs) for a given TF and sample, REACTIN reveals a given TF’s activity level for each individual sample relative to all others in a dataset, thereby enabling the incorporation of a TF’s activity level into regression-based analyses. For example, by combining these iRAS TF activity levels with survival data, Cox proportional hazard (PH) models can be employed to examine how TF activity levels correlate with survival outcomes.
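To make the idea of a per-sample activity score concrete, the following is a minimal sketch of a deliberately simplified stand-in. It is not the REACTIN algorithm, whose details are given in reference [24]; here we merely assume that a TF's activity in a sample can be approximated by how far its target genes' standardised expression sits above that of all other genes. The function names (`zscore_rows`, `toy_activity_scores`) and the toy expression values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardise each gene's expression across samples (one list per gene)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no information; score it as 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def toy_activity_scores(expr, targets):
    """Per-sample score: mean z-score of target genes minus that of the rest.

    Illustrative stand-in only; this is NOT the REACTIN method.
    """
    z = zscore_rows(expr)
    target_genes = [g for g in z if g in targets]
    other_genes = [g for g in z if g not in targets]
    if not target_genes or not other_genes:
        raise ValueError("need at least one target and one non-target gene")
    n_samples = len(next(iter(z.values())))
    scores = []
    for j in range(n_samples):
        target_mean = mean(z[g][j] for g in target_genes)
        other_mean = mean(z[g][j] for g in other_genes)
        scores.append(target_mean - other_mean)
    return scores


# Hypothetical expression matrix: 4 genes measured in 3 samples.
expr = {
    "TARGET_1": [1.0, 2.0, 3.0],
    "TARGET_2": [2.0, 4.0, 6.0],
    "OTHER_1": [3.0, 2.0, 1.0],
    "OTHER_2": [5.0, 5.0, 5.0],
}
scores = toy_activity_scores(expr, targets={"TARGET_1", "TARGET_2"})
```

Because each score is relative to the other samples in the same dataset, such a score can be entered directly as a continuous covariate in a regression model, which is the property the text relies on.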
In this study, we define an E2F4 signature based on its target genes identified by chromatin immunoprecipitation sequencing (ChIP-seq) experiments. Based on the signature, E2F4 activity is inferred in breast cancer samples and used for predicting clinical outcomes. We focus on the E2F4 signature, because we have previously identified it as being prognostic in breast cancer in a large-scale computational screening analysis [
24]. Further, in other work we have found E2F4’s activity level to be the most important of all TFs in predicting cell cycle phase in the HeLa and K562 cell lines, suggesting an essential role for E2F4 in cell cycle regulation [
25]. Beyond our work, it is broadly considered that E2F4 plays an important role in both cell cycle arrest [
26] and in modulating cell proliferation [
27]. Furthermore, transgenic mice overexpressing E2F4 develop tumors, and mutated E2F4 has been reported in several types of cancers, including cancers of the gastrointestinal tract and prostate [
26],[
28], suggesting a broad tumorigenic role for E2F4.
Using E2F4 with our REACTIN method and clinical outcome data, we examine E2F4’s regulatory activity in detail as a predictor of survival outcome for breast cancer. With a collection of eight publicly available datasets containing gene expression and survival data for over 1,900 breast tumor samples, we show that E2F4 regulatory activity is strongly prognostic and remains so even after adjusting for other molecular markers, clinicopathological variables, clinical risk scores, Oncotype DX stratification, and differences in patient treatment. E2F4 activity level also correlates with the classification of breast cancers into their intrinsic subtypes. Extending beyond breast cancer, we preliminarily analyze E2F4 regulatory activity levels in bladder cancer, colon cancer, non-small cell lung cancer, glioblastoma, acute myeloid leukemia, and Burkitt’s lymphoma, and find that they appear prognostic in colon cancer, glioblastoma, and bladder cancer. E2F4 regulatory activity level predicts breast cancer survival outcome and may be of use in augmenting prognosis in other cancer types.
Discussion
Several breast and other cancer prognostic methods rely on gene expression signatures as predictors of survival or the need for adjuvant therapy for a patient. While these methods have seen prognostic success and improved decision-making regarding patient treatment plans, they intrinsically suffer from problems of overfitting and multiple comparisons that raise questions of whether the genes selected in the signatures are of biological and etiological significance. Indeed, much concern has been raised about the small degree of overlap between different prognostic gene signatures and the degree to which the given microarray platform (Affymetrix vs. Agilent, and so on) affects the gene composition of each signature [
52],[
53].
In this manuscript, we present the result of an alternative and more robust method of deriving a prognostic gene signature: using ChIP-seq data from multiple cell lines, we identify a TF’s set of gene targets, whose differential expression in patient samples can be used to calculate the TF’s regulatory activity in these samples. By examining TF activity, the biological significance of the signature is preserved. Further, by inferring this TF activity through the expression levels of its set of gene targets, the TF’s actual functional activity is assessed (TFs work by altering their target genes’ expression levels), and in a way that allows the use of widely available microarray data regardless of platform type. Hence, the signature is fundamentally derived from mechanistic relationships of genetic regulation and is easily measured with current techniques.
With E2F4 ChIP-seq data in the three cell lines of GM06990, K562, and HeLa S3, we have identified a set of 199 genes as significantly targeted (
P <0.01) by E2F4 across the three cell lines (Figure
1A). To confirm this gene set’s ability to infer E2F4’s regulatory activity level, we have used the gene set in conjunction with REACTIN to generate E2F4 iRASs in a cell cycle phase dataset [
29], finding that E2F4 iRASs exhibit a periodic pattern and greatly outperform E2F4 gene expression level in correlating with cell cycle phase (Figure
1B). As E2F4 is a known critical regulator of the cell cycle [
25], this result suggests that E2F4 iRASs reflect E2F4 functional activity and with much better accuracy than E2F4 gene expression level alone.
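The selection of a shared target set can be sketched as follows. This is a minimal illustration under one plausible reading of the text, namely that a gene is retained only if its binding P value is below 0.01 in every cell line; the actual selection procedure may differ. The function name `shared_targets`, the gene labels, and the P values are hypothetical.

```python
def shared_targets(pvalues_by_line, alpha=0.01):
    """Return genes whose P value is below alpha in every cell line.

    pvalues_by_line maps a cell line name to a {gene: P value} dictionary.
    A gene missing from any cell line is treated as not significant there.
    """
    significant = [
        {gene for gene, p in pvals.items() if p < alpha}
        for pvals in pvalues_by_line.values()
    ]
    return set.intersection(*significant) if significant else set()


# Hypothetical per-cell-line binding P values (not real ChIP-seq results).
pvalues_by_line = {
    "GM06990": {"GENE_A": 0.001, "GENE_B": 0.200, "GENE_C": 0.004},
    "K562": {"GENE_A": 0.003, "GENE_B": 0.002, "GENE_C": 0.009},
    "HeLa S3": {"GENE_A": 0.0005, "GENE_C": 0.030},
}
targets = shared_targets(pvalues_by_line)
```

Requiring significance in all cell lines favours targets that are bound regardless of cellular context, which is in keeping with the stated aim of a relatively tissue-independent signature.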
Using this method of generating E2F4 iRASs for given samples, we turned to breast cancer based on our prior work [
24] to examine E2F4’s inferred functional activity through iRASs and its ability to predict survival prognosis. In all publicly available, sufficiently sized datasets containing survival data, we have found that E2F4 iRASs are significant predictors of survival outcome (Figures
2,
3 and
4A), with a higher E2F4 iRAS predicting shorter survival. Importantly, pooled analyses further show that this predictive power remains robust even after adjusting for clinicopathological data through Cox PH multivariate regression (Table
2) and stratification, including by pharmacological treatment status (Figure
4). In contrast, most of the currently available prognostic gene signature tools apply only to early-stage, untreated cancers. Interestingly, in all Cox PH models (Table
2), the E2F4 iRAS has the largest HR and smallest
P value of all covariates, suggesting it not only provides prognostic prediction beyond the other variables but that it is additionally the most important driver of survival outcome. The finding that it remains prognostic even with stratification by treatment status further indicates that current pharmacological therapy does not disrupt the E2F4 functional program, perhaps suggesting a potential future treatment target.
Beyond examination with clinicopathological data, we examined E2F4 iRASs and their relationship to histological and molecular subtypes. We evaluated confounding of our results by patient ER and PR status, based on literature linking E2F4 with progression of estrogen-dependent breast cancer cell lines. Interestingly, E2F4 was a significant predictor of survival only in ER-positive and PR-positive patients, and not in ER-negative or PR-negative patients (Figure
5A and B), suggesting that E2F4’s regulatory activity plays a role in steroid-dependent but not steroid-independent cancers. A connection between ER-mediated regulation of the cell cycle and E2F4 has been previously suggested: Carroll
et al. proposed a mechanism for anti-estrogen drug effects on cell cycle arrest involving the phosphorylation of E2F4, which then induced cell cycle arrest in the MCF7 breast cancer cell line [
54]. Dhillon
et al. showed that MCF7-CycE, a breast cancer cell line overexpressing cyclin E (which, in turn, binds E2F4), was capable of overriding tamoxifen-mediated growth arrest in comparison to its wild-type MCF7 counterpart [
55]. These results imply a connection between E2F4 activity and ER-mediated effects on the cell cycle, which agrees with our observation of ER status as a confounder of E2F4 activity in breast cancer survival.
Additionally, we have compared the results from the E2F4 signature and those from the Oncotype DX method [
49] (Figure S2 in Additional file
4). From the Ur-Rehman breast cancer metadata, we selected 557 samples that were ER-positive, LN-negative and had known RFS information. We calculated the ‘recurrence score’ of these samples using the Oncotype DX method, and then divided samples into low- (200 samples), intermediate- (124 samples) or high- (233 samples) risk groups. Survival analysis indicates that the three groups are significantly different in their RFS (
P = 2e-12). The E2F4 signature achieved comparable results in the same sample set - the iRAS >0 group (195 samples) displayed significantly shorter RFS than the iRAS <0 (362 samples) group (
P = 8e-11). There is a correlation between the Oncotype DX groups and the E2F4 groups. In the high, intermediate and low Oncotype DX groups, the fractions of samples with E2F4 iRAS >0 are 64%, 19% and 11%, respectively, consistent with their expected prognosis. More importantly, our results indicate that the E2F4 signature can further improve the Oncotype DX classification results. When the 124 intermediate Oncotype DX samples were further stratified into iRAS >0 and iRAS <0 subgroups based on their E2F4 scores, the positive subgroup showed significantly shorter RFS than the negative subgroup (
P = 0.0004). This suggests that the E2F4 signature can be used in conjunction with the Oncotype DX system to achieve better performance.
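As a consistency check on the figures reported above, the short calculation below combines the Oncotype DX group sizes with the reported fractions of iRAS >0 samples and compares the implied total with the 195 iRAS >0 samples stated in the text. Only numbers given in this section are used; the variable names are illustrative, and because the reported percentages are rounded the agreement can only be approximate.

```python
# Group sizes and fractions of samples with E2F4 iRAS > 0, as reported above.
groups = {
    "high": (233, 0.64),
    "intermediate": (124, 0.19),
    "low": (200, 0.11),
}

# Total number of ER-positive, LN-negative samples with RFS information.
total_samples = sum(size for size, _ in groups.values())

# Number of iRAS > 0 samples implied by the rounded percentages.
implied_positive = sum(size * fraction for size, fraction in groups.values())

# Counts stated directly in the text for the two E2F4 groups.
reported_positive, reported_negative = 195, 362

print(total_samples, round(implied_positive, 2))
```

The group sizes sum to 557, matching the number of selected samples, and the implied number of iRAS >0 samples (about 194.7) rounds to the reported 195.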
Beyond E2F4 regulatory activity, cell proliferation can be captured by other molecular features. For instance, the Ki-67 protein (encoded by the MKI67 gene) is strictly associated with cell proliferation and has been used as a cellular marker for proliferation [
56]. Its prognostic value has been demonstrated in multiple tumor types including breast cancer [
57]-[
59]. Compared to the E2F4 signature, which is based on multiple genes, however, the single-gene MKI67 marker is less stable and generally shows lower predictive accuracy. Specifically, when samples are stratified into two groups with high and low MKI67 expression, respectively, the two groups show a significant survival difference with
P = 0.0001 (data not shown), which is less predictive than the E2F4 signature (
P = 7e-9, Figure
2C).
An emerging method for breast cancer prognosis relies on classification of breast cancer into intrinsic subtypes based on their gene expression profiles, with the five subtypes of normal-like, basal-like, luminal A and B, and HER2-enriched most frequently used (Parker
et al. 2009 [
47]). Analysis of E2F4 levels in these subtypes showed a significant variation among them, with lower levels of E2F4 activity seen in a greater fraction of cancer samples classified into intrinsic subtypes with better prognosis (Figure
6). Still, E2F4 could serve to improve the prognostic power of the current molecular classifications by adding a further factor of classification: negative (low) versus positive (high) E2F4 activity. For example, when luminal B samples were further stratified into two groups based on E2F4 iRAS, the 50% of samples with lower E2F4 activity exhibited significantly longer RFS times than the remaining 50% of samples with higher E2F4 activity (
P = 0.02).
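The kind of comparison described above, splitting samples at the median activity score and testing whether the two halves differ in survival, can be sketched with a standard two-group log-rank test. This is a minimal pure-Python illustration, not the code used in this study; the function names (`median_split`, `logrank_test`) and all scores, follow-up times and event indicators are hypothetical.

```python
import math
from statistics import median


def median_split(scores):
    """Label each sample 'high' or 'low' relative to the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    events: 1 if the event (e.g. recurrence) was observed, 0 if censored.
    Returns (chi-square statistic, P value on 1 degree of freedom).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    first = labels[0]
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        # Samples still under observation just before time t.
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == first)
        d = sum(1 for _, ti, e in at_risk if e and ti == t)
        d1 = sum(1 for g, ti, e in at_risk if e and ti == t and g == first)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of the chi-square distribution with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical activity scores, follow-up times (months) and event flags.
scores = [-1.5, -0.8, -0.3, -0.1, 0.2, 0.6, 0.9, 1.4]
times = [60, 55, 58, 50, 20, 15, 25, 10]
events = [0, 0, 1, 0, 1, 1, 1, 1]

labels = median_split(scores)
chi2, p_value = logrank_test(times, events, labels)
```

In this toy data the high-score half experiences its events early, so the test reports a difference between the two halves; with real cohorts the same split would be applied within a subtype, as done for the luminal B samples.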
Since our derived E2F4 signature was selected with the intent of being relatively tissue-independent, consisting of genes that play a role in cell cycle progression across three cell lines, we decided to evaluate E2F4 in six other cancers: bladder cancer, colon cancer, lung cancer (NSCLC), brain cancer (glioblastoma), leukemia (AML) and Burkitt’s lymphoma. Our results show that E2F4 was significantly correlated with survival time in bladder cancer, glioblastoma, and colon cancer, but not in NSCLC, AML or Burkitt’s lymphoma. These are preliminary results, and further investigation is needed to understand more precisely the role of the E2F4 regulatory program in tumorigenesis and progression of cancer types beyond breast cancer.
This study has several limitations. First, the numbers of effective samples differ widely among the cancer datasets, which influences the power of the statistical analysis. This also restricts the ability to examine the E2F4 signature in certain breast cancer subtypes that account for only a small fraction of samples. Second, the breast cancer datasets used in this study are diverse in terms of sample selection, platforms for gene expression measurement, genetic background of patients, and treatments given to patients. As such, it is difficult to identify the confounding variables in each dataset and correct for their effect on prognosis. Finally, the quality of survival information in each dataset can vary considerably depending on the length of follow-up and other factors. This will also impact the results of the prognostic predictions.
Going forward, we aim to extend the application of the E2F4 signature in several directions. First, to refine the signature, we will select a subset of core genes from the E2F4 signature while maintaining comparable predictive power. Second, it will be useful to examine the effectiveness of this signature in more specific breast cancer subtypes, for example in ER+ LN+ (LN-positive) and ER+ LN- (LN-negative). Third, it will also be interesting to test whether the E2F4 signature can predict sensitivity to a specific drug or treatment, for example the CDK inhibitors, which can repress the E2F4 regulatory program. Finally, it will be useful to more thoroughly examine its effectiveness in other cancer types.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
CC and JD conceived and designed the analysis. CC, EA, SK, and MU collected the data and performed the analysis. CC, EA, SK, MU, and JD wrote the manuscript. All authors have read and approved the final version of this manuscript.