Introduction
According to the 2018 global cancer statistics, there were an estimated 841,080 new liver cancer cases and more than 780,000 liver cancer deaths worldwide, and China accounted for nearly half of all cases and deaths [1, 2]. Approximately 70–90% of all primary liver cancers are hepatocellular carcinoma (HCC) [3, 4].
The treatment of HCC has made encouraging progress over the past few decades and primarily consists of surgical resection, chemotherapy, molecularly targeted therapy, and liver transplantation [5]. Surgery remains the most effective treatment and has markedly improved the overall survival (OS) of HCC patients, but long-term survival is still poor: approximately 60% of patients experience recurrence or distant metastasis within 5 years [3]. Given this poor prognosis, several prognostic factors have been identified that can be used to predict OS after surgery, including patient characteristics (e.g., age and sex) and tumor-related factors (e.g., tumor grade) [6, 7]. However, effective prognostic factors are still lacking.
Although several studies have highlighted valuable biomarkers, they had limitations, such as single-center cohorts, small sample sizes, and reliance on single molecular markers. More importantly, most studies did not validate their findings in an independent cohort, so the results could not be generalized. As a result, few biomarkers have been adopted in clinical practice.
The competing endogenous RNA (ceRNA) hypothesis describes a regulatory mechanism in which mRNAs and long noncoding RNAs (lncRNAs) communicate with each other by competing for shared microRNAs (miRNAs) through microRNA response elements (MREs), forming a regulatory network across the whole transcriptome. This mechanism plays a significant role in cancer [8, 9], for example in oral carcinoma [10] and cholangiocarcinoma [11]. Accordingly, there is a great need to explore the regulatory relationships among lncRNAs, miRNAs, and mRNAs during HCC initiation and progression. Wang et al. identified a prognostic signature for the OS of HCC patients based on the expression profiles of six genes, using independent screening and Cox-penalized regression [12]. To the best of our knowledge, no study has yet examined the involvement of lncRNAs in the regulation of miRNAs and mRNAs in HCC using large-scale, high-throughput sequencing data.
In this study, we obtained lncRNA, mRNA, and miRNA expression profiles from The Cancer Genome Atlas (TCGA) database and constructed a ceRNA network for HCC. We identified 20 differentially expressed mRNAs (DEmRNAs) in the ceRNA network that individually predicted the OS of HCC patients, termed “OS-genes”. We then analyzed the OS-genes together using logistic least absolute shrinkage and selection operator (LASSO) penalized regression to generate a four-gene signature (PBK, CBX2, CLSPN, and CPEB3) associated with OS in HCC. We validated this signature in the internal set and two external validation cohorts, analyzed it in subgroups of HCC patients, and showed that it was an independent prognostic indicator. Thus, we identified and validated a new candidate marker that predicts HCC OS by classifying patients into low- and high-risk groups.
Discussion
Growing evidence shows that genetic alterations and dysregulated signaling pathways contribute to the tumorigenesis and progression of HCC, suggesting that molecular markers are also important for predicting HCC OS. Indeed, many molecular markers have been identified for this purpose. Jin et al. found that SUOX (sulfite oxidase), an independent prognostic factor in HCC, was more strongly associated with OS and time to recurrence (TTR) when combined with serum alpha-fetoprotein (AFP) in different cohorts [21]. Tao et al. found that BTBD7 expression combined with microvessel density better predicted HCC prognosis in Cox regression analysis [22]. However, most recent studies have focused on the expression of a single gene, a specific protein, lncRNAs, or miRNAs. Meanwhile, evidence is rapidly emerging on the vital functional role of molecular networks in HCC initiation and progression, indicating that prognostic markers should be analyzed as a whole. Such analyses often involve high-dimensional data, and in this setting LASSO regression is a method of choice for improving prediction accuracy. LASSO has two important properties. The first is feature selection: it automatically removes uninformative features by shrinking their coefficients exactly to zero, which is especially useful for high-dimensional data. The second is interpretability: the resulting model is easier to explain because, when there are many independent variables, it identifies those that carry the most important information [23–26]. Li et al. identified 13 differentially expressed miRNAs in the serum of HER2+ metastatic breast cancer (MBC) patients with distinct responses to trastuzumab using miRNA microarrays, and constructed a four-miRNA signature to predict survival using a LASSO model [27]. Backes et al. [28] used multivariable LASSO regression to develop models that identify the patients most likely to benefit from adjuvant surgery by projecting their case–control data onto the entire cohort. In colon cancer, transcriptome profiling revealed that an integrated signature incorporating 15 mRNAs and three lncRNAs was a powerful predictor of early relapse and predicted OS better than TNM staging [29].
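To make the feature-selection property concrete, the following minimal Python sketch fits a least-squares LASSO by cyclic coordinate descent. It is an illustration under stated assumptions, not the study's model: the study used penalized logistic (LASSO) regression on its own data, whereas this sketch uses a squared-error loss, small centred toy data, and a hand-picked penalty `lam`. The function names `soft_threshold` and `lasso_cd` are hypothetical. The soft-thresholding step is what sets an uninformative coefficient exactly to zero.

```python
"""Minimal LASSO sketch: cyclic coordinate descent with soft-thresholding.

Assumption: least-squares LASSO on centred toy data, used only to show how the
L1 penalty zeroes out uninformative coefficients. This is NOT the logistic
LASSO model used in the study.
"""


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; return exactly 0.0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; X and y are assumed centred,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each feature j (denominator of the update).
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that
            # excludes feature j's own current contribution.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


if __name__ == "__main__":
    # Toy data: feature 0 drives y (y = 2 * x0); feature 1 is orthogonal noise.
    X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
    y = [-4, -2, 0, 2, 4]
    print(lasso_cd(X, y, lam=0.5))  # [1.75, 0.0]
```

Here the noise coefficient is set exactly to zero, while the informative coefficient is shrunk from its unpenalized least-squares value of 2.0 to 1.75.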
In the present study, we conducted a comprehensive analysis of whole-transcriptome sequencing data and its value for predicting HCC OS. First, we identified differentially expressed genes (DEGs), including 1993 differentially expressed mRNAs (DEmRNAs), 1071 differentially expressed lncRNAs (DElncRNAs), and 170 differentially expressed miRNAs (DEmiRNAs). After constructing and visualizing the ceRNA network, we found that it contained 39 DEmRNAs, 83 DElncRNAs, and 20 DEmiRNAs. Some of these have been reported to be cancer-related genes, such as CCNB1 [30], EZH2 [31, 32], AXIN2 [33], and FOXF2 [34]. We also found several important HCC-associated lncRNAs in our ceRNA network, such as HOTAIR [35, 36] and HOTTIP [37]. Interestingly, lncRNA LINC00221 interacted with 12 miRNAs, suggesting that it may serve as a key regulator. Next, we studied its specific biological functions and regulatory mechanisms in HCC. Notably, miR-137 was associated with HCC OS, and its corresponding mRNA in the network was PTGS2, a key oncogene in HCC [38]. Its candidate corresponding lncRNAs were HOTTIP, CLLU1, and GPC6-AS1. In the future, we will conduct an in-depth study of the regulatory mechanisms underlying the lncRNA–miR-137–PTGS2 network.
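The lncRNA–miRNA–mRNA relationships discussed above can be assembled by joining lncRNA–miRNA pairs and miRNA–mRNA pairs on their shared miRNA, and a lncRNA's number of distinct miRNA partners (12 for LINC00221) is a simple measure of how central it is in the network. The Python sketch below shows that join under stated assumptions: the only edges included are the miR-137 relationships named in this paragraph, the helper names `build_cerna_triplets` and `lncrna_degree` are hypothetical, and the target-prediction step that produced the study's full edge lists is not reproduced here.

```python
from collections import defaultdict


def build_cerna_triplets(lnc_mir_edges, mir_mrna_edges):
    """Join lncRNA-miRNA and miRNA-mRNA edges on the shared miRNA.

    Returns a sorted list of (lncRNA, miRNA, mRNA) ceRNA triplets.
    """
    targets = defaultdict(set)  # miRNA -> set of mRNAs it is paired with
    for mir, mrna in mir_mrna_edges:
        targets[mir].add(mrna)
    triplets = set()
    for lnc, mir in lnc_mir_edges:
        # .get() avoids inserting empty entries for miRNAs with no mRNA edge.
        for mrna in targets.get(mir, ()):
            triplets.add((lnc, mir, mrna))
    return sorted(triplets)


def lncrna_degree(lnc_mir_edges):
    """Count distinct miRNA partners per lncRNA (a simple hub measure)."""
    partners = defaultdict(set)
    for lnc, mir in lnc_mir_edges:
        partners[lnc].add(mir)
    return {lnc: len(mirs) for lnc, mirs in partners.items()}


if __name__ == "__main__":
    # Edges taken from the miR-137 example in the text; all others omitted.
    lnc_mir = [("HOTTIP", "miR-137"), ("CLLU1", "miR-137"), ("GPC6-AS1", "miR-137")]
    mir_mrna = [("miR-137", "PTGS2")]
    for triplet in build_cerna_triplets(lnc_mir, mir_mrna):
        print(triplet)
    print(lncrna_degree(lnc_mir))
```

With the full edge list, sorting the output of `lncrna_degree` in descending order would surface highly connected lncRNAs such as LINC00221.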
Subsequently, using LASSO penalized regression, we identified a four-gene signature (a weighted combination of PBK, CBX2, CLSPN, and CPEB3) that effectively predicted OS in HCC patients. PBK (PDZ-binding kinase) phosphorylates p38 MAPK and plays a crucial role in the activation of lymphoid cells. Phosphorylated PBK interacts with TP53, leading to TP53 destabilization and decreased expression following doxorubicin-induced DNA damage [39, 40]. CBX2 (chromobox protein homolog 2) is a component of the multiprotein PRC1-like complex, which represses the transcription of many genes, including the HOX genes [41]. Although CBX2 has been less studied in cancer research, its molecular profile suggests that it plays an oncogenic role [42]. CLSPN, which monitors the integrity of DNA replication forks, is essential for checkpoint-mediated cell cycle arrest in response to UV-induced DNA damage [43]. Choi et al. reported that, in response to radiation, CLSPN positively affected cancer cell survival and negatively affected metastasis in their model [44]. CPEB3 (cytoplasmic polyadenylation element-binding protein 3) contains an intron-encoded self-cleaving ribozyme that is structurally and biochemically related to the hepatitis delta virus (HDV) ribozyme and regulates its own translation [45]. CPEB3 suppresses Stat5b-dependent EGFR gene transcription in neurons [46]. All four genes may serve as key regulators of cell behavior and function, but their specific functions in HCC have not yet been elucidated. In the future, we will conduct an in-depth study of the regulatory mechanisms of these four genes (PBK, CBX2, CLSPN, and CPEB3) based on the ceRNA network described in the present study.
Although we constructed the predictive model using OS data, we found that it may also predict disease-free survival (DFS) to some extent (data not shown): a low score corresponded to longer DFS, whereas a high score suggested poorer DFS. More cohort studies are needed to confirm this.
OS in HCC is multifactorial and cannot be determined by gene expression alone. HCC development is driven by interactions among genetic predisposition, environmental factors (metabolic syndrome, alcohol, and aflatoxin B1), and viruses (HBV and HCV). Hepatocarcinogenesis is a multistep process, and the driving forces of hepatocyte transformation and HCC development and progression include chronic inflammation, DNA damage, epigenetic modifications, senescence and telomerase reactivation, chromosomal instability, and early neoangiogenesis [47]. In recent years, genome-wide technologies and next-generation sequencing have enabled the identification of molecular signatures that classify HCC subgroups and stratify patients by prognosis. Unraveling the patterns of genomic alterations in HCC is pivotal for identifying targeted therapies [48, 49]. We therefore sought to build an OS-associated model based on genomic alterations that could help formulate individualized treatment and follow-up strategies and, to some extent, meet the requirements of precision medicine. Consider two HCC patients, X and Y, with the same age, sex, and BCLC stage. Both are assigned to the same disease stage, which is associated with specific outcomes, yet it is widely acknowledged that they will probably have different prognoses; how to quantify these differences remains unresolved. Our model calculates an individual signature score for each patient from molecular data, and different scores correspond to different prognoses. Patients with higher scores would receive closer follow-up and more intensive medical treatment.
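The individual score described above is a weighted sum of the four genes' expression values. The Python sketch below illustrates that calculation and a split into high- and low-risk groups for patients X and Y. The coefficients in `COEFS` (both signs and magnitudes), the expression values, and the use of the median score as the cutoff are hypothetical assumptions made purely for illustration; the fitted weights and the cutoff used in the study are not given in this section.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only. The study's fitted LASSO
# weights for PBK, CBX2, CLSPN and CPEB3 are not reported in this section.
COEFS = {"PBK": 0.10, "CBX2": 0.20, "CLSPN": 0.15, "CPEB3": -0.05}


def risk_score(expr, coefs=COEFS):
    """Signature score = sum over genes g of coef_g * expression_g."""
    return sum(coefs[g] * expr[g] for g in coefs)


def stratify(patients, coefs=COEFS):
    """Score each patient and label them 'high' or 'low' risk.

    Assumption: the median score is the cutoff; scores equal to the
    median are labelled 'low'. Returns (labels, scores) dictionaries.
    """
    scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    labels = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return labels, scores


if __name__ == "__main__":
    # Hypothetical expression values for the two patients in the text.
    patients = {
        "X": {"PBK": 5.0, "CBX2": 4.0, "CLSPN": 3.0, "CPEB3": 1.0},
        "Y": {"PBK": 1.0, "CBX2": 1.0, "CLSPN": 1.0, "CPEB3": 4.0},
    }
    labels, scores = stratify(patients)
    print(labels)  # {'X': 'high', 'Y': 'low'}
```

A fixed cutoff taken from the training cohort, rather than a median recomputed in each new cohort, would be needed to score a single new patient; which approach the study used is not stated here.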
Similar to our investigation, Wang et al. identified a prognostic signature for the OS of HCC patients based on the expression profiles of six genes (SRL, TTC26, CPSF2, TAF3, C16orf46, and CSN1S1), using independent screening and Cox-penalized regression [12]. Compared with previous studies, our study has several strengths. First, we used large-scale, high-throughput sequencing data from the TCGA database rather than data from a single medical center, reducing the bias inherent in single-center cohorts. Second, we established a lncRNA–miRNA–mRNA ceRNA network from the DEGs between tumor tissues and normal liver tissues. Third, we performed an in-depth screen of DEmRNAs that were not only involved in the ceRNA network but also associated with the OS of HCC patients, followed by LASSO regression, whereas previous studies used only one method to select prognostic markers. Fourth, we conducted internal validation and independent external validations, making the results more reliable and useful.
Survival analysis showed that serum AFP, TNM stage, T stage, N stage, and M stage were significantly associated with HCC OS. Because the high-score and low-score groups were imbalanced with respect to these clinical features, we further examined subgroups defined by individual clinicopathological features. Significant correlations between the signature and OS were maintained in Asian patients and in patients with serum AFP ≥ 20 ng/ml. The four-gene signature was an independent prognostic factor in multivariate Cox regression and subgroup analyses, particularly in Asian patients with serum AFP ≥ 20 ng/ml.
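The statistical procedures behind these survival comparisons are not described in this section. One common way to compare OS between high-score and low-score groups is the Kaplan–Meier product-limit estimator, sketched below in Python as an assumption rather than as the study's exact method. The function name `kaplan_meier` and the toy data are hypothetical, and in practice such curves are usually paired with a log-rank test and the Cox models mentioned above.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    Returns [(t, S(t))] at each distinct time with at least one death, where
    S(t) = product over death times t_i <= t of (1 - d_i / n_i),
    d_i = deaths at t_i and n_i = patients still at risk just before t_i.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients (deaths and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Everyone observed at t leaves the risk set afterwards.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical group: follow-up in months, 1 = death, 0 = censored.
    for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]):
        print(t, round(s, 3))
```

For this toy group the estimated survival falls to 0.8, 0.6, and 0.3 at times 2, 3, and 5; censored patients leave the risk set without lowering the curve. Running the function separately on the high-score and low-score groups gives the two curves to compare.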
Our study has several limitations. First, the multivariable survival analysis included only the basic prognostic factors available in the GEO database and could not account for other potentially relevant clinical factors, such as the status of metastatic lesions and patient performance status. Second, extensive evidence indicates that HCC is an extremely heterogeneous tumor at the genetic and molecular levels; however, owing to the limitations of the available data, gene expression in the TCGA, GEO, and SYMH cohorts was measured in a single piece of HCC tissue from each patient. In the future, we will measure the expression of the four genes by single-cell sequencing or quantitative RT-PCR in multiple HCC specimens from each patient to determine whether the four-gene signature is a reliable and workable OS prediction marker for HCC. In addition, we will seek cooperation with other hospitals to obtain more patients and tissue samples for validating the gene model.
Acknowledgements
Yongcong Yan: conceptualization, formal analysis, investigation, visualization, and writing-original draft. Yingjuan Lu: formal analysis, investigation, visualization, and writing-original draft. Kai Mao: investigation, visualization, and writing-original draft. Mengyu Zhang: investigation, methodology, and writing-review and editing. Haohan Liu: investigation and visualization. Qianlei Zhou: software and visualization. Jianhong Lin: data curation and methodology. Jianlong Zhang: methodology, resources, and supervision. Jie Wang: conceptualization, data curation, funding acquisition, investigation, methodology, project administration, resources, supervision, and writing-review and editing. Zhiyu Xiao: conceptualization, data curation, funding acquisition, investigation, methodology, project administration, resources, supervision, and writing-review and editing.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.