Background
Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world, accounting for approximately one million deaths, with the number of new cases increasing annually [1-3]. Surgical resection is regarded as the standard curative treatment of HCC [3]. However, prognosis following surgery varies substantially. This variation is a hurdle in the search for effective therapies and cancer management strategies. There is an ongoing search for predictive biomarkers of cancer prognosis, in which pathological parameters, protein biomarkers, mRNA expression levels, and genomic DNA abnormalities have been surveyed [4-9].
On two independent Hong Kong HCC cohorts that we previously described [10], HCC prognosis was significantly associated with clinicopathologic parameters including tumor size, number of tumor nodules (NOTN), tumor stage (new AJCC and pTNM), venous infiltration status, serum albumin level (ALBU), and serum α-fetoprotein level (AFP). These parameters were further summarized into a linear score that was demonstrated to partially predict disease-free survival (DFS; time to tumor recurrence) and overall survival (time to death) [10]. A natural path to further enhance this prediction model would be to incorporate molecular-level biomarkers, for example, gene expression profiles in the tumor or adjacent normal tissues. Currently, such efforts have been limited by the availability of fresh frozen tissues [4], forcing some studies to use paraffin-embedded samples [9, 11]. Most importantly, the search for gene signatures should be conducted by conditioning on the clinicopathologic parameters and should focus on the identification of novel variance components that improve the prognosis prediction beyond that achieved by the clinicopathologic features alone.
Herein we carried out a carefully designed search for gene expression signatures underlying prognosis of HCC using tumor and adjacent normal tissue expression profiles. We identify a gene expression signature that significantly enhances our ability to predict HCC prognosis. Additionally, we demonstrate that this HCC prognosis signature is related to widespread changes we previously identified in the liver tissue network that are associated with HCC, providing additional mechanistic insights into tumorigenesis.
Discussion
Numerous studies have reported the ability of clinicopathologic parameters [2, 3, 10, 19] and gene expression traits [4, 9, 13, 14] to predict HCC prognosis. However, the sample sizes of these previous studies were small, and there were no systematic efforts to compare the performance of these two types of predictors or to combine them in one unified model. As we previously showed, several clinicopathologic parameters that are easily and routinely measured provided excellent predictive power for outcome in HCC [10] and resulted in predictions that were readily applicable to clinical practice. Given their utility, it is natural to attempt to further enhance the clinicopathology-based prediction model by adding gene expression data. We conducted a head-to-head performance comparison between gene expression predictors derived from normal and tumor tissue (denoted as h_gene-expression) vs. predictors derived solely from clinicopathology (h_pathology; Materials and Methods) and benchmarked them in a LOO framework (Figures 1 and 2, and Additional File 10, Figure S4). Note that the genes used in the prediction models might differ between normal and tumor tissue expression, as well as between LOO iterations. Overall, h_gene-expression and h_pathology performed similarly. The h_gene-expression of tumor tissue was better than h_pathology in predicting DFS, but h_gene-expression slightly underperformed h_pathology in all other scenarios. Overall, gene expression was not superior to clinicopathology in predicting prognosis. One reason might be that gene selection primarily identified genes correlated with clinicopathologic parameters (e.g. cancer stage). To assess whether expression variables could be identified that enhance predictive power beyond the clinicopathologic parameters, we performed a stratified analysis and computed h_gene-expression within the good- and poor-prognosis subgroups defined by clinicopathology. A combination of these two types of data resulted in the identification of a group of patients with near perfect survival after surgery (blue curve, left panel, Figure 3). These patients had both favorable clinicopathologic and gene expression profiles (they enjoyed a 90% survival rate over 100 months). In contrast, DFS over 30 months for patients with both poor clinicopathologic and gene expression profiles was lower than 10%.
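The survival figures quoted above read like points on a survival curve. As a minimal sketch, assuming the curves in Figure 3 are Kaplan-Meier (product-limit) estimates, which this section does not state, the following Python shows how such a curve and a reading such as "survival at 100 months" could be computed for one stratum. The function names and the toy follow-up data are illustrative only and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time in months for each patient
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving this time point.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exit_counts[t]
    return curve


def survival_at(curve, t):
    """Read the step function: estimated survival probability at time t."""
    s = 1.0
    for time, prob in curve:
        if time > t:
            break
        s = prob
    return s


# Hypothetical stratum: six patients, two of them censored (event = 0).
toy_times = [5, 8, 12, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
print(survival_at(toy_curve, 10))
```

Running the same function separately on each stratum (for example, favorable clinicopathology plus favorable expression) would give one curve per group, which is how statements such as "90% survival over 100 months" can be read off.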
The focus of this study is stratified modeling, which is a natural extension of our previous work. Alternatively, we can build a single model incorporating clinicopathologic parameters and gene expression data simultaneously (Additional File 11, Figure S5). The prediction framework is identical to the above analyses except that the multivariate Cox model included both the clinicopathologic parameters and the top 6 PCs. In the gene selection step we also included clinicopathologic parameters in the Cox model and then picked the 100 genes with the smallest p-values. The overall prediction was better than using gene expression alone (Additional File 10, Figure S4), indicating that clinicopathology captured valuable information beyond gene expression. However, compared with Figures 1 and 2, adding gene expression only enhanced prediction in one scenario (tumor gene expression improved the prediction of survival). A possible explanation is that different gene sets were associated with prognosis across the strata defined by clinicopathology, and these gene sets offer varying predictive value. For example, as shown in Figure 2, normal tissue expression was useful for prediction in the good-survival stratum but not in the poor-survival stratum. In the single model approach, genes with little predictive value also entered the model, introducing noise and reducing the performance. Lastly, we also evaluated a single model incorporating clinicopathologic parameters and expression profiles of both normal and tumor tissue (Additional File 12, Figure S6). We reduced the expression profile of each tissue to 6 PCs, so that a total of 12 PCs entered the model. Overall, such models did not greatly outperform the models based on clinicopathology alone (Figures 1 and 2) or models based on clinicopathology and expression profiles (Additional File 11, Figure S5). Again, this lack of improvement could be attributable to noise introduced into the prediction.
HCC tumor tissue and adjacent non-tumor tissues harbor distinct prognosis-associated signatures, which lead to differences in predictive power. Importantly, we noted that the gene signatures derived from the two tissues overlapped significantly. Consistent findings were also reported for Chinese and Belgian HCC patients [14], Asian patients [13] and Singaporean patients [4]. However, for 82 Japanese patients, Hoshida et al. found that gene-expression profiles of tumor tissue failed to yield a significant association with survival [9]. This result is inconsistent with the fact that gene expression traits in tumor tissues were correlated with cancer stage, and cancer stage was strongly associated with survival [10]. The failure to detect gene expression traits with predictive power in this instance could be due to the small sample size (N = 82) and the use of formalin-fixed, paraffin-embedded tissues [9].
The mechanistic basis whereby gene expression traits predict the aggressiveness of a tumor remains to be defined. One of the striking features of this analysis and others [4, 9, 13] was the finding that signatures in normal tissue adjacent to the tumor are highly predictive of prognosis. It has been suggested that mechanistically these signatures represent a so-called "field effect" capturing damage to liver tissue and the state of inflammation related to the likelihood of subsequent tumors arising [9, 20]. In other words, the field effect hypothesis implies that the signatures do not relate directly to processes in the tumors per se but rather to the environment from which tumors might arise. An alternate hypothesis supported by our results is that the signatures are mechanistically connected to tumor-specific events in some way, given that the genes associated with survival (p < 2e-159, fold enrichment 31.01) and DFS (p < 5e-152, fold enrichment 29.31) in adjacent normal and in tumor tissues significantly overlap.
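The overlap statistics above pair a p-value with a fold enrichment. This section does not say which test produced them; a common choice for gene-set overlap is the one-sided hypergeometric test, and the sketch below assumes that choice. The function name, argument names, and toy counts are hypothetical, and no attempt is made to reproduce the reported values because the size of the gene universe is not given here.

```python
from fractions import Fraction
from math import comb


def overlap_enrichment(n_universe, n_a, n_b, n_overlap):
    """Fold enrichment and upper-tail hypergeometric p-value for a gene-set overlap.

    n_universe : number of genes tested in both tissues
    n_a, n_b   : sizes of the two prognosis-associated gene sets
    n_overlap  : number of genes shared by the two sets
    """
    # Overlap expected if the two sets were drawn independently.
    expected = n_a * n_b / n_universe
    fold = n_overlap / expected

    # P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_a, n_b).
    # Exact integer arithmetic avoids underflow for very small p-values.
    tail = sum(
        comb(n_a, i) * comb(n_universe - n_a, n_b - i)
        for i in range(n_overlap, min(n_a, n_b) + 1)
    )
    p_value = float(Fraction(tail, comb(n_universe, n_b)))
    return fold, p_value


# Toy example: 10 genes in total, sets of size 4 and 5 sharing 3 genes.
fold, p = overlap_enrichment(10, 4, 5, 3)
print(f"fold enrichment = {fold:.2f}, p = {p:.4f}")
```

Summing exact binomial coefficients and converting to a float only at the end is a deliberate choice here: p-values on the order of 1e-150 remain representable, whereas multiplying many small floating-point probabilities can lose precision.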
To address whether the "field effect" or this alternate hypothesis was better supported by the data, we first examined the evidence that tumors directly affect the surrounding normal tissues via secreted factors. If that were the case, we would expect the adjacent normal gene expression patterns to be correlated with DNA copy number abnormalities (CNA) in the tumor tissue, given that we previously showed CNA was strongly connected to tumor gene expression [10]. No significant associations beyond what would be expected by chance were found. We were also able to exclude significant invasion of tumor cells into the adjacent normal tissues, given that we observed no significant associations between normal tissue derived CNA and normal tissue gene expression [10].
Given this result, how might signatures in normal tissues mechanistically relate to tumor events? Herein, we took advantage of previous work on this dataset, in which we described the massive gene expression network changes that occur during HCC tumorigenesis and reported that such rearrangements were likely driven by tumor CNA [10]. In brief, we defined differentially connected gene pairs as pairs that were significantly correlated in one setting and significantly less correlated in the other [10]. Using stringent cut-offs, there were 8,736 genes differentially connected between the adjacent normal and tumor tissues, with ~86% of cases representing a loss of connectivity in the tumor (LOC) and the remaining ~14% representing a gain of connectivity (GOC) [10]. We therefore tested whether the predictive gene expression traits from the adjacent normal tissue were enriched for differentially connected genes and found that indeed they were. The genes associated with survival in adjacent normal tissue (Cox p < 0.01; Additional Files 6, 7, 8 and 9, Table S3) were enriched for genes participating in differential connections (p < 1.98e-80, fold enrichment 2.69). Similarly, gene expression traits in adjacent normal tissue associated with DFS (Cox p < 0.01; Additional Files 6, 7, 8 and 9, Table S3) were enriched for genes identified as differentially connected in the tumor tissue (p < 5.2e-69, fold enrichment 2.52).
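The idea of a gene pair that is correlated in one tissue and significantly less correlated in the other can be illustrated with a Fisher z-transformation test for the difference between two correlation coefficients. This is a swapped-in, textbook technique used purely for illustration; the actual procedure and cut-offs are those of reference [10] and are not described in this section. The function names, the significance threshold `alpha`, and the example correlations below are assumptions.

```python
from math import atanh, erfc, sqrt


def differential_correlation(r1, n1, r2, n2):
    """Fisher z-test for H0: two independent correlations are equal.

    r1, r2 : Pearson correlations of one gene pair in the two tissues
    n1, n2 : number of samples used to estimate each correlation
    Returns the z statistic and the two-sided p-value.
    """
    # atanh maps r to an approximately normal scale with variance 1 / (n - 3).
    z = (atanh(r1) - atanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return z, p_value


def classify_pair(r_normal, n_normal, r_tumor, n_tumor, alpha=0.001):
    """Label a gene pair as LOC, GOC, or unchanged (alpha is an assumed cut-off)."""
    _, p_value = differential_correlation(r_normal, n_normal, r_tumor, n_tumor)
    if p_value >= alpha:
        return "unchanged"
    # Weaker correlation in tumor than in normal tissue: loss of connectivity.
    return "LOC" if abs(r_normal) > abs(r_tumor) else "GOC"


# Hypothetical pair: strongly co-expressed in normal tissue, weakly in tumor.
print(classify_pair(0.8, 100, 0.2, 100))
```

Under this sketch, a gene would count as differentially connected if it takes part in at least one pair labeled LOC or GOC; whether reference [10] uses that rule or a stricter one is not stated here.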
Our previous reports also documented that a large fraction of expression variation can be explained by CNA in the tumor tissues [10]. Therefore, we asked whether the predictive genes (Additional Files 6, 7, 8 and 9, Table S3) were enriched for genes associated with cis-acting CNAs in tumors. We found that both survival- and DFS-associated genes in normal tissue were enriched for genes associated with CNAs in cis in the tumor tissue (p-values were 2.99e-8 and 1.72e-11, respectively). Since genes in adjacent normal tissues were measured entirely separately from genes in the tumor, there was no a priori reason for them to behave similarly unless there was a mechanistic connection. We found that predictive genes from adjacent normal tissue were selectively enriched in network rearrangements and enriched for genes associated with CNA-associated tumorigenesis, strongly suggesting that these genes represent important functions targeted for alteration in tumors. Stated another way, the predictive signatures in adjacent normal tissue are a measure of the ability of the tissue to alter expression networks to enter a more aggressive state.
If the predictive genes were the determinant of the normal-to-tumor network reconfiguration, then we would expect that genetic perturbations of these genes would also be associated with HCC prognosis. We examined the normal liver tissue eSNPs underlying two sets of genes: (1) genes that are differentially connected between the normal and tumor states in the HCC cohort, and (2) genes whose expression levels were significantly explained by CNA in the liver tumor tissue. We found that the eSNPs controlling the expression levels of these two sets of genes were enriched for association with HCC survival and DFS. This result supports the hypothesis that the mechanism by which genes in normal tissue are predictive of prognosis reflects their ability to facilitate the transition from a normal tissue network to a tumor network, where this transition determines cancer progression.
We have demonstrated the excellent predictive power of our approach of combining clinicopathologic parameters and gene expression profiles. As a result, we expect this approach to provide valuable guidance for HCC treatment and management in clinical practice. More importantly, based on our previous work on the architecture of coexpression networks in adjacent normal and tumor tissues [10], we proposed a general mechanism of how predictive genes influence HCC prognosis. The massive rearrangement of expression networks plays a central role in HCC progression, which is reflected in the ability of such genes to predict HCC prognosis. This also explains why the predictive genes significantly overlapped between the adjacent normal and tumor tissues, since such genes would ostensibly continue to reflect the ongoing alterations in network state.
Declaration of competing interests
The authors declare that they have no competing interests.
Authors' contributions
All authors read and approved the final manuscript. KH, all figures and tables, experimental design and manuscript preparation. JL, MM, DG, MDF, CM, EES, HD and JML, experimental design. CZ, processing the mRNA microarray readings and calculation of the expression values. TX and EC, processing the SNP arrays and calculation of the CNV values. KW, BZ and HZ, construction of the expression networks. NPYL, ION, PCS, RTPP, and JML, collection of tissue samples and clinicopathologic parameters.