Background
Liver cancer was the sixth most common cancer and the fourth leading cause of cancer-related death worldwide according to the 2018 global cancer statistics [1]. In the United States, approximately 42,030 people are diagnosed with liver cancer and 31,780 die of it annually, according to the 2019 cancer statistics [2]. Hepatocellular carcinoma (HCC) is the main type of primary liver cancer, comprising 75–85% of cases [3]. Although diagnostic approaches and therapeutic efficacy for HCC have gradually improved, most patients with HCC are still diagnosed at an advanced stage with severe hepatic dysfunction because of the asymptomatic nature of the disease. Accordingly, the 5-year overall survival (OS) and recurrence-free survival (RFS) rates of advanced-stage HCC patients remain extremely low, and approximately 70% of HCC patients experience recurrence or extrahepatic metastasis within 5 years [4, 5]. Given these poor outcomes, many researchers have sought prognostic factors based on clinicopathological and molecular features to help increase life expectancy and improve quality of life. However, more reliable biomarkers tied to the molecular mechanisms that mediate prognosis are still needed for early diagnosis and optimized therapy.
Less than 2% of the genome has been reported to code for proteins, so research on noncoding RNA transcripts, including long noncoding RNAs (lncRNAs), microRNAs (miRNAs) and circular RNAs (circRNAs), has become increasingly popular [6]. In recent years, emerging evidence has indicated that lncRNAs, which are longer than 200 nucleotides, play a vital role in a wide variety of biological processes, including gene transcription, chromosome modification, the cell cycle, and cell differentiation and migration [7–9]. Numerous studies have shown that miRNAs, which are approximately 22 nucleotides long, may participate in tumor initiation, progression, and invasion by post-transcriptionally downregulating target gene expression through complementary binding to miRNA response elements (MREs) on messenger RNA (mRNA) [10–12]. Moreover, the competing endogenous RNA (ceRNA) hypothesis proposed by Salmena et al. [13] describes a post-transcriptional regulatory mechanism in which ceRNAs act as miRNA sponges: they carry the same MREs as a target mRNA, compete with it for miRNA binding, and thereby relieve miRNA-mediated repression of that mRNA. Since then, numerous experiments have supported the hypothesis and shown that this type of indirect regulation is involved in tumorigenesis and progression [14–16]. Several studies on ceRNAs have reported valuable factors for predicting the OS of HCC patients [17–19]; however, the molecular mechanisms underlying the occurrence, progression, recurrence and metastasis of HCC have not yet been fully elucidated. The mechanisms that mediate recurrence in particular remain unclear and require further investigation.
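The sponge logic of the ceRNA hypothesis can be made concrete with a small Python sketch. It is illustrative only and is not the pipeline used in this study: the function name `build_cerna_triplets`, the input format, and every RNA name in the usage example are hypothetical. Under the hypothesis, a lncRNA and an mRNA that share a miRNA are expected to change in the same direction, while the shared miRNA changes in the opposite direction.

```python
from collections import defaultdict


def build_cerna_triplets(lnc_mir_pairs, mir_mrna_pairs, direction, network):
    """Return sorted (lncRNA, miRNA, mRNA) triplets consistent with the ceRNA hypothesis.

    lnc_mir_pairs  : iterable of (lncRNA, miRNA) predicted interactions
    mir_mrna_pairs : iterable of (miRNA, mRNA) predicted interactions
    direction      : dict mapping each RNA name to "up" or "down"
    network        : "up"   -> up lncRNA, down miRNA, up mRNA
                     "down" -> down lncRNA, up miRNA, down mRNA
    """
    if network == "up":
        sponge_dir, mir_dir = "up", "down"
    elif network == "down":
        sponge_dir, mir_dir = "down", "up"
    else:
        raise ValueError("network must be 'up' or 'down'")

    # Index the mRNA targets of each miRNA for quick lookup.
    targets = defaultdict(set)
    for mir, mrna in mir_mrna_pairs:
        targets[mir].add(mrna)

    triplets = []
    for lnc, mir in lnc_mir_pairs:
        # The lncRNA and the miRNA must change in opposite directions.
        if direction.get(lnc) != sponge_dir or direction.get(mir) != mir_dir:
            continue
        for mrna in targets.get(mir, ()):
            # The mRNA must change in the same direction as the lncRNA.
            if direction.get(mrna) == sponge_dir:
                triplets.append((lnc, mir, mrna))
    return sorted(triplets)


# Toy usage with made-up names (not real interactions).
lnc_mir = [("lncA", "miR-1"), ("lncB", "miR-2"), ("lncC", "miR-1")]
mir_mrna = [("miR-1", "geneX"), ("miR-1", "geneY"), ("miR-2", "geneZ")]
direction = {
    "lncA": "up", "lncB": "down", "lncC": "down",
    "miR-1": "down", "miR-2": "up",
    "geneX": "up", "geneY": "down", "geneZ": "down",
}
up_network = build_cerna_triplets(lnc_mir, mir_mrna, direction, "up")
down_network = build_cerna_triplets(lnc_mir, mir_mrna, direction, "down")
```

Keeping the two directions as separate calls mirrors the idea of building an upregulated and a downregulated network rather than one mixed network.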
In this study, microarray and sequencing data from a large number of patients with HCC were collected from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases and used to identify differentially expressed genes (DEGs) in HCC. Two predicted ceRNA networks, an ‘upregulated’ network and a ‘downregulated’ network, were then constructed based on the ceRNA hypothesis. Twenty mRNAs involved in the ceRNA networks were identified as recurrence-related genes. LASSO-penalized regression was then used to screen the recurrence-related genes and establish a prognostic signature consisting of ADH4, DNASE1L3, HGFAC and MELK. More importantly, quantitative real-time PCR was used to verify this signature in an external cohort, in which it showed good predictive performance. These analyses aimed to reveal the molecular regulatory mechanisms underlying HCC tumorigenesis and progression and to develop a prognostic signature that can be used to predict the RFS of HCC patients.
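For readers unfamiliar with the method, a LASSO-penalized survival model is commonly written as below. This is the standard textbook form for a Cox proportional hazards model without tied event times, and it is given here as an assumption: this section does not state which model or penalty parameter was used, and the symbols are introduced for illustration only.

```latex
\hat{\beta} = \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[
    x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
  \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

Here $x_i$ is the expression vector of the $p$ candidate genes for patient $i$, $\delta_i = 1$ if recurrence was observed, $R(t_i)$ is the set of patients still at risk at time $t_i$, and $\lambda \ge 0$ controls the strength of the penalty. The $L_1$ penalty shrinks some coefficients exactly to zero, which is how a small subset of genes can be retained from a larger candidate list.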
Discussion
HCC, which has high morbidity and mortality worldwide, is the main pathological type of liver cancer. Surgical treatments, the major interventional measures, can effectively improve the prognosis of patients with early-stage HCC; however, a large number of HCC patients are diagnosed at an advanced stage, are thus unsuitable for such treatments, and eventually experience recurrence and metastasis. It remains a clinical challenge to identify patients who are at an early stage and to predict which patients are at risk for recurrence after resection for HCC. Therefore, studies on the molecular mechanisms underlying the tumorigenesis and progression of HCC are needed to identify reliable markers that can be used to assess the risk of recurrence and guide the development of personalized therapeutic strategies. With such markers, HCC could be diagnosed at earlier stages, and at-risk patients could receive close surveillance and novel interventional treatments. Moreover, for patients who are not at risk but exceed the Milan criteria, liver transplantation may become an alternative treatment option.
Nevertheless, neither the widely accepted Barcelona Clinic Liver Cancer (BCLC) staging system nor the AJCC staging system for HCC includes molecular information, which can act as a complement to optimize therapeutic strategies and improve the clinical prognosis of HCC. Emerging evidence demonstrates that ceRNAs involved in signaling pathways are of significance in the tumorigenesis and progression of HCC, indicating that molecular markers based on the ceRNA network are equally important in the prediction of HCC recurrence.
Comprehensive analyses of large-scale microarray data and high-throughput sequencing data from public databases are often used to explore molecular biological mechanisms and identify potential molecular markers to aid diagnosis and predict prognosis. Gene signatures have allowed the accurate prediction of prognosis, and many studies have addressed prognostic prediction in HCC using array-based gene expression profiling. For example, Hoshida et al. [27] studied tissues from 307 HCC patients and used a Cox regression model to discover and validate a gene expression signature associated with OS. They found that the gene expression profiles of the surrounding nontumor liver tissues were highly correlated with OS not only in a training set of 82 Japanese patients but also in an independent group of 225 patients from the United States and Europe. Villanueva et al. [28] assessed 287 HCC patients undergoing resection and tested genome-wide expression platforms using tumor (n = 287) and adjacent nontumor tissues to identify independent predictors of tumor recurrence based on Cox modeling. They developed a composite prognostic model for HCC recurrence that can predict early and overall recurrence in patients with HCC and complement findings from clinical and pathological analyses.
Many studies have explored possible molecular regulatory pathways and prognostic signatures for predicting RFS in HCC, but few have built such signatures on ceRNA regulatory networks or validated them in independent cohorts. Lv et al. [29], for instance, constructed a lncRNA-based classifier from the expression profiles of seven lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861, and LINC02084) to predict early recurrence in HCC after curative resection but did not establish a ceRNA network for HCC. Ye et al. [30] utilized penalized Cox regression to develop a novel four-lncRNA (WARS2-IT1, AL359878.1, AL357060.1, and PART1) expression-based RS system for predicting the RFS of patients with HCC. Unfortunately, the RS systems were not further verified experimentally. Li et al. [31] partially compared the 1-year recurrence group (n = 56) with the nonrecurrence group (n = 60) of HCC patients from the TCGA database and constructed a hsa-mir-150-5p-centric ceRNA network and two effective prognostic nomogram models for predicting recurrence. They likewise did not validate the results in an external cohort.
In the present study, three datasets with paired samples from studies on HCC were downloaded from the publicly available GEO database. The original published studies from which the data were obtained are as follows. Makowska et al. [20] found that gene expression profiling of HCC biopsies has limited potential to direct therapies that target specific driver pathways but can identify subgroups of patients with different prognoses. Grinchuk et al. [21] developed a prognostic stratification approach to identify common oncogenic pathways and significant prognostic variables in HCC patients with resectable primary tumors. Yang et al. [22] discovered and characterized an expanded landscape of lncRNAs based on high-throughput sequencing technology and bioinformatics analysis of matched samples from HCC patients.
To fully exploit these datasets, multistep processing and integrated analyses were applied to reveal prognostic genes. Combined with the results from the TCGA database, a total of 116 dysregulated mRNAs (14 upregulated and 102 downregulated) were identified as DEGs and used for subsequent analyses (Fig. 3). In the GO functional analysis, the DEGs were predominantly enriched in extracellular areas and oxidation-reduction processes. In the KEGG pathway enrichment analysis, the DEGs were mainly enriched in metabolism-related pathways, in accordance with the findings of a previous study [32]. The visualized protein-protein interaction (PPI) network showed that interactions among the DEGs clustered almost separately within the upregulated genes and within the downregulated genes. Subsequently, two biologically predicted ceRNA networks were constructed across three RNA levels (lncRNAs, miRNAs and mRNAs) based on the competitive relationships of the ceRNA hypothesis to elucidate the interactions and regulatory mechanisms of the DEGs. The upregulated ceRNA network consisted of 6 upregulated DElncRNAs, 3 downregulated DEmiRNAs and 5 upregulated DEmRNAs, and the downregulated network included 4 downregulated DElncRNAs, 12 upregulated DEmiRNAs and 67 downregulated DEmRNAs. A total of 20 DEmRNAs involved in the ceRNA networks were found to be closely associated with recurrence by Kaplan-Meier analysis and the log-rank test using the gene expression profiles and survival information from the TCGA database; four upregulated DEmRNAs were risk factors and sixteen downregulated DEmRNAs were protective factors. Considering the direction of expression together with the survival analysis, the dysregulation of all 20 genes may therefore be harmful: the risk genes are overexpressed and the protective genes are underexpressed in HCC. Based on the 20 recurrence-related DEmRNAs, we used LASSO-penalized regression to establish a four-gene signature (ADH4, DNASE1L3, HGFAC and MELK), which was assessed by time-dependent ROC curve analysis and showed a clear relationship with RFS. The AUCs of the prognostic signature for HCC in the TCGA cohort were 0.812, 0.751, 0.751 and 0.779 for 1-year, 2-year, 3-year, and 5-year RFS, respectively. The AUCs in the GSE76427 validation cohort were 0.710, 0.700, 0.700 and 0.677 for 1-year, 2-year, 3-year, and 5-year RFS, respectively. The AUCs in the TFAHCQMU validation cohort were 0.887, 0.854, 0.854 and 0.936 for 1-year, 2-year, 3-year, and 5-year RFS, respectively. Univariate and multivariate Cox analyses also indicated that the RS system was a significant independent predictor of RFS in patients with HCC.
Therefore, the present study screened several recurrence-related mRNAs and developed a prognostic signature with datasets from the TCGA database that was further verified by using two independent external validation cohorts from GSE76427 and TFAHCQMU.
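How well a risk score separates patients at a given time point can be summarized by an AUC. The following sketch computes a simplified fixed-horizon AUC: patients who recurred by the horizon are cases, patients still recurrence-free and under follow-up after the horizon are controls, and patients censored before the horizon are dropped. This is a simplification for illustration; time-dependent ROC estimators treat censoring more carefully, and the function name and the toy records are hypothetical.

```python
def auc_at_horizon(records, horizon):
    """Simplified AUC of a risk score for recurrence by `horizon`.

    records : iterable of (risk_score, follow_up_time, recurred) tuples,
              where `recurred` is True if recurrence was observed at that time.
    horizon : time point of interest, in the same unit as follow_up_time.
    """
    records = list(records)
    # Cases: recurrence observed at or before the horizon.
    cases = [s for s, t, recurred in records if recurred and t <= horizon]
    # Controls: still recurrence-free and followed beyond the horizon.
    controls = [s for s, t, _ in records if t > horizon]
    # Patients censored before the horizon fall into neither list.
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # Probability that a random case scores higher than a random control,
    # counting ties as one half.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy records: (risk score, months of follow-up, recurrence observed).
toy_records = [
    (2.0, 6, True),
    (1.0, 10, True),
    (0.5, 30, False),
    (1.5, 40, True),    # recurs after the horizon, so a control at 12 months
    (-1.0, 8, False),   # censored before the horizon, so excluded
    (-0.5, 36, False),
]
toy_auc = auc_at_horizon(toy_records, horizon=12)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the cohort-level values reported in the text.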
Several of the recurrence-related genes identified in the present study have been reported to be cancer-related genes. More importantly, all four genes included in the prognostic signature have also been reported to be associated with the prognosis of HCC.
Alcohol dehydrogenase 4 (ADH4) is an important member of the ADH family that metabolizes a wide variety of substrates, including ethanol and retinol. Wei RR et al. [33] found that the expression of ADH4 at both the mRNA and protein levels was markedly reduced in HCC tumor tissues. As in our study, HCC patients with lower ADH4 expression had shorter survival times, and multivariate Cox analysis showed that ADH4 expression was an independent predictor of prognosis. Liu XY et al. [34] comprehensively analyzed the prognostic implications of ADH family genes in HCC using bioinformatic methods. They found that ADH4 expression was significantly downregulated in HCC tissues compared with normal tissues and identified ADH4 as an independent factor for the survival of HCC patients. In addition, high expression of ADH4, along with several other ADHs, was significantly associated with an improved prognosis in HCC patients and with negative regulation of oncogenic signaling pathways. Luo J et al. [35] recently reported that the expression of key alcohol-metabolizing enzymes is repressed in patients with alcoholic hepatitis and revealed a new regulatory mechanism for ADH genes: the non-canonical positive regulation of ADH4 by miR-148a. In short, in vitro experiments in HepG2 cells showed that miR-148a promotes ADH4 expression by binding directly to the ADH4 coding sequence and increasing mRNA stability in an AGO1-dependent manner. In turn, the secondary structure of the ADH4 transcript affects target accessibility and the binding of miR-148a-3p. These findings offer new insight into the miRNA-mediated mechanisms underlying the expression of alcohol-metabolizing enzymes.
Deoxyribonuclease 1 like 3 (DNASE1L3) expression is significantly downregulated in numerous types of gastrointestinal cancer, especially HCC. Chen QY et al. [36] demonstrated that DNASE1L3 expression was frequently downregulated in tumor tissues compared with normal tissues and was significantly associated with tumor size, thrombus formation, overall survival and disease-free survival in HCC patients. In addition, ectopic expression of DNASE1L3 suppressed cell growth and inhibited activation of the PI3K/AKT signaling pathway following C3a receptor agonist treatment. Zhang JJ et al. [37] established a comprehensive mRNA-miRNA-lncRNA triple ceRNA network in which all RNAs, including DNASE1L3, were significantly linked to the prognosis of patients with HCC. Wang S et al. [38] showed that DNASE1L3 is downregulated at both the mRNA and protein levels in HCC tissues compared with adjacent normal tissues. Patients with positive DNASE1L3 expression had significantly longer overall survival than patients with negative expression. Moreover, multivariate Cox analysis revealed that positive DNASE1L3 expression, along with higher differentiation, was an independent prognostic factor.
Hepatocyte growth factor activator (HGFAC), an activator of hepatocyte growth factor (HGF), has been reported to be involved in liver regeneration in response to injury and in several types of cancer. Yin et al. [39] reported that HGFAC expression at the transcriptional and translational levels was decreased in liver cancer compared with normal tissues and that patients with lower HGFAC expression had shorter OS. Fukushima T et al. [40] reviewed current knowledge regarding HGFAC-mediated proHGF activation and its roles in tissue regeneration and repair. HGF is secreted as an inactive precursor (proHGF) and requires proteolytic activation to initiate HGF-induced signaling; HGFAC is a serum activator of proHGF and produces robust HGF activity in injured tissues.
Xia et al. [41] previously performed gene expression profile analysis on HCC samples and found that maternal embryonic leucine zipper kinase (MELK) was highly overexpressed, which correlated with early recurrence and poor overall survival. They further explored the functional roles of MELK and demonstrated that silencing MELK inhibited the cell growth, invasion, stemness and tumorigenicity of HCC cells by inducing apoptosis and mitosis, suggesting that MELK is a promising molecular target for therapeutic strategies against HCC [42]. Zhang X et al. [43] analyzed the therapeutic effect of OTSSP167, a targeted inhibitor of MELK, on glioblastoma multiforme (GBM). They found that OTSSP167 significantly inhibited cell proliferation, colony formation, invasion, and migration of GBM cells. Furthermore, OTSSP167 effectively prolonged the survival of tumor-bearing mice and inhibited tumor cell growth in in vivo mouse models. OTSSP167 treatment also reduced protein kinase B (AKT) phosphorylation levels, thereby disrupting the proliferation and invasion of GBM cells. In conclusion, MELK inhibition suppresses the growth of GBM by blocking AKT signaling. Targeted inhibition of MELK may thus have potential as a novel treatment not only for GBM but also for other diseases.
The remaining recurrence-related genes identified in the current study, however, need further investigation.
In contrast to previous studies, the present study has several strengths. First, we used large-scale microarray and sequencing data of HCC patients from both the GEO and TCGA databases. Second, we included paired samples from three GEO datasets to reduce inter-patient variation, since the expression level of each gene varied substantially among patients. Third, whereas previous studies constructed only one mixed ceRNA network, we separately constructed two predicted ceRNA networks across three RNA levels (lncRNAs, miRNAs and mRNAs) based on the competitive relationships of the ceRNA hypothesis. Fourth, we not only established a prognostic signature but also conducted independent external validations, which support the reliability of the results. Although the results presented in this paper were based on sufficient samples, rigorous processes and appropriate methodology, this study has several limitations. First, a larger sample and a longer follow-up period should be used. Second, although most AUCs of the RS for recurrence exceeded 0.7 (the 5-year value in GSE76427 was 0.677), they were still relatively low. Finally, further in-depth study of the molecular mechanisms of the identified ceRNA networks is needed to verify our work.