Background
Recurrence and metastasis after hepatic resection of hepatocellular carcinoma (HCC) usually contributes to poor long-term patient survival [
1]. HCC has strong invasiveness and metastasis abilities, thus enhancing its recurrence rate [
2]. Although novel therapies have been developed in recent years, the mortality of HCC patients has not decreased [
3]. The identification of potential indicators associated with recurrence and metastasis can improve treatment response in clinical trials and the quality of life of HCC patients. Transcriptome data from very large cohorts were used in the present study to show that paraoxonase 1 (PON1) might represent a potential signature that is strongly correlated with HCC recurrence and metastasis.
Evidence has shown that PON1 is a member of the paraoxonase family, and it encodes a protein of an enzyme with lactonase and ester hydrolase activity [
4]. It is an antioxidant defensive factor that is relevant in the pathogenesis of several inflammatory diseases [
5]. Increasing evidence has demonstrated that PON1 plays a significant role in atherosclerosis as its downregulation may lead to the disease [
6,
7]. In addition, its clinical application value in cancer studies has been gradually discovered [
8]. Increasing evidence has demonstrated that PON1 could serve as a significant clinical indicator for breast cancer and lung cancer [
9,
10]. Meanwhile, several studies have also found that serum PON1 is highly fucosylated (Fuc-PON1) and could be utilized as a novel diagnostic biomarker of early-stage HCC [
11]. However, the number of studies on PON1 in HCC is limited, and its latent prognostic value and potential in clinical applications, especially its correlation with metastasis, recurrence, overall survival (OS) and other clinical risk factors, has not been observed.
In the present study, we performed a bioinformatics analysis based on sequencing data of 1432 patients’ tissues to find reliable prognostic signatures. A weighted correlation network analysis (WGCNA) was utilized using The Cancer Genome Atlas (TCGA) for a blind selection in our research. Latent prognostic indicator was selected for further analysis. Expression difference of selected indicator in tumor tissue and nontumor tissue was validated in 7 Gene Expression Omnibus (GEO) datasets from multicenters. We also established a 5-year predictive model that was visualized with a nomogram for predicting 5-year survival. Subsequently, we validated our results on protein levels via experimental tools based on clinical data and our own samples. Finally, we detected and discussed related biological pathways and how they are affected in HCC with PON1 downregulation.
Materials and methods
Transcriptome data of HCC patients in the present study
Data retrieved from multiple research centers were used for integrated analysis in this study, including data from The Cancer Genome Atlas (TCGA) project and Gene Expression Omnibus (GEO) databases. We systematically analyzed the expression profiles of transcriptomes from the following datasets to ensure the credibility of the current study: TCGA [n = 423, National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), USA], GSE14323 (n = 124, Virginia Commonwealth University), GSE14520 (n = 481, National Cancer Institute, Laboratory of Human Carcinogenesis, USA), GSE6764 (n = 75, Mount Sinai School of Medicine), GSE51401 (n = 64, Zhongshan Hospital affiliated with Fudan University), GSE41804 (n = 40, Kanazawa University Graduate School of Medical Sciences), GSE45436 (n = 134, National Yang-Ming University) and GSE62232 (n = 91, INSERM, UMR U-1162, Université Paris Descartes). The sequencing data used in the present study were collected on Affymetrix microarray and Illumina platforms by different researchers. Complete data on the clinicopathologic characteristics of HCC patients in TCGA were obtained from Cbioportal (
http://www.cbioportal.org/), including 7th American Joint Committee on Cancer (AJCC) staging rules, serum alpha-fetoprotein (AFP) level, disease-free time, OS time, living status and recurrence time.
Identification of differentially expressed genes
Increasing evidence has demonstrated that the expression levels of cancer-related genes are abnormally changed during the initiation of HCC. Thus, discovering these genes for further analysis is critical. Based on 373 tumor tissues and 50 adjacent nontumor tissues in TCGA datasets, we identified key differentially expressed genes (DEGs) via R/language (edgeR package, R version 3.34). Varying degrees of gene expression between tumor tissues and adjacent non-tumor tissues were evaluated by log2(fold-change). Genes with abs(log2(fold-change)) > 1 (absolute value) and P < 0.05 were selected as candidate signatures for further analysis.
Weighted correlation network analysis for discovering recurrence-related gene modules
To uncover the recurrence-related DEGs, we conducted a WGCNA using DEGs [
12,
13]. In general, genes with similar expression patterns are likely to exhibit coexpressed relationships and similar molecular functions. According to the connectivity of the coexpressed genes, WGCNA clustered thousands of genes into several gene modules using a soft thresholding power [
14,
15]. The relations between different gene modules and clinicopathologic characteristics of prognosis were revealed by Pearson correlation. In the present study, we investigated gene modules that were the most relevant to recurrence and metastasis. We performed the WGCNA analysis based on clinical data and sequencing data of 373 HCC samples from TCGA dataset. Samples with the duplicated TCGA barcodes and uncomplete survival data were excluded. In total, 368 HCC patients were involved in the WGCNA, and the clinical information of these patients was also available for the WGCNA. The soft thresholding power was identified using the WGCNA algorithm and sequencing data of DEGs. Then, DEGs were clustered in different gene modules using the soft thresholding power, and the recurrence-related gene module was investigated.
Indicator selection procedure
The Kaplan–Meier method was performed for every gene in the selected gene module from the WGCNA. The gene with the most significantly different log-rank p value was retained as our final selected indicator, and its expression difference between tumors and nontumors was validated using the GEO datasets. The prognostic value of our candidate indicator was also investigated. The relationship between the clinicopathologic characteristics and expression of the candidate indicator was also investigated. Student’s t-test was utilized for continuous variables, and Fisher’s exact test was used for categorical variables. In total, 369 patients with complete clinicopathologic information were evaluated. Density plots were illustrated for clear views.
Prognostic nomogram with selected indicator
Patients with HCC usually have a poor prognosis due to tumor recurrence. Thus, establishing a recurrence-related model for predicting survival is significant. We established a nomogram to predict 5-year survival. Multivariate Cox proportional hazards regression was used for the nomogram. TCGA datasets of 368 HCC patients were used [
16]. Compared with the traditional method, PON1 expression (log2 transformed, Htseq-counts) was used as a novel variable for the establishment of a nomogram. For other variables, we selected several survival-related indicators and basic information, including age, sex, AJCC staging indicators, tumor differentiation, vascular invasion, and Child–Pugh classification. Five-year survival prediction performance was examined with the C-index. To avoid the potential bias caused by tumor heterogeneity, we performed internal validation by extracting 60% of samples randomly as the validation dataset for 10 times. C-index was calculated for testing the robustness. Calibration curves were also illustrated to test the predicting accuracy. Univariate and multivariate COX proportional hazards tests were performed for conventional clinical features, and the receiver operating characteristic (ROC) curves of conventional features for predicting 5-year survival were also displayed for comparison.
Immunohistochemical staining for validation at protein level
To validate the selected indicator at protein level, we performed immunohistochemical staining in the samples of human tissues. The slides of paraffin-embedded HCC tissues from 67 patients were obtained after surgical resection. All patients received radical resection without preoperative chemotherapy or radiotherapy in the First Affiliate Hospital of Sun-yat Sen University. All diagnoses were confirmed by pathology. The slides were incubated for 2 h at 65 °C, deparaffinized, and rehydrated. Retrieval of the heat-mediated antigen was conducted in 10 mmol/L Tris-citrate buffer (pH 7.0) with a pressure cooker. Blocking of endogenous peroxidase activity was performed by incubating sections with 3% hydrogen peroxide for 10 min at room temperature. Each section was then incubated with 5% normal goat serum in phosphate-buffered saline containing 0.1% Tween 20 for 30 min at room temperature to block nonspecific binding of the primary antibody. The slides were incubated with primary antibodies (diluted 1:250) against PON1 (Abcam, ab92466) overnight at 4 °C. After washing, each slide was incubated with the appropriate horseradish peroxidase (HRP)-labeled secondary antibody and then developed with DAB solution (DAKO, Agilent) before counterstaining with hematoxylin. Staining intensity was scored as 0, 1, 2, or 3 for negative, weak, moderate, or strong, respectively, and the staining percentage was given a score of 0 (absent) for < 5% positive staining, 1 (focal) for 5% to < 50% positive staining, or 2 (diffuse) for ≥ 50% positive staining. The sum of the intensity and distribution scores was then used to determine PON1 immunoreactivity. A score of 1 or 0 was considered to show low expression, whereas higher scores were considered to indicate high expression. Two pathologists assessed the specimens independently. Images were obtained using an Olympus BX63 microscope (Olympus, Japan).
Biological pathway and internal mechanism detection
To investigate the pathways and biological parameters correlated with PON1, we conducted a gene set enrichment analysis (GSEA) [
17,
18]. The 368 HCC patients in the TCGA dataset were separated into high-expression and low-expression groups according to the median value of PON1 and retained as the phenotypes. GSEA software was obtained from GSEA website (
http://software.broadinstitute.org/gsea/index.jsp). One thousand permutations were performed to determine the statistically significant and ensure the credibility of the results. Pathways with a FDR < 0.05 and P-value < 0.05 were selected as the enriched terms. The molecular signatures database from the GSEA were utilized as the annotation file. Pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome and Pathway Interaction Database (PID) were used. A Gene Ontology enrichment analysis was also performed to discover biological functions in HCC. Correlated biological processes, molecular functions, and cellular components were identified.
Statistical software and data format
DEGs were identified using edgeR (Bioconductor/R version 3.34) [
19,
20]. Htseq-counts (TCGA) and RSEM (GEO, log2 scaled) were used as the sequencing data formats. WGCNA was performed by the WGCNA package (R version 3.3.4). Nomogram was established using Regression Modeling Strategies (RMS, R version 3.34). Student’s
t-test and Fisher’s exact test were performed in R (version 3.34).
Discussion
Although many HCC-related prognostic biomarkers have been identified in recent years, most do not exhibit tissue specificity [
28]. Thus, these biomarkers might be affected by various factors, and they lack significant value in clinical applications. We performed a systematic analysis that discovered the prognostic value of PON1 in patients with HCC. We found that PON1 downregulation in HCC suggests worse tumor differentiation, higher recurrence rate, stronger invasiveness, and poorer outcomes. Researchers in recent years have demonstrated that serum PON1 levels can assist in diagnosing AFP-negative HCC at an early stage [
29,
30]. Moreover, PON1 played an important role in the initiation of non-alcoholic fatty liver disease (NAFLD) [
31,
32]. Meanwhile, NAFLD was found to be a potential risk factor for HCC, especially in patients without hepatitis virus infection [
33,
34]. However, discoveries about PON1 in HCC were mostly based on small cohorts from a single data source in retrospective studies, which ignored diversity in terms of race, age, and hepatitis virus infection. The prognostic application of PON1 was also rarely investigated or discussed. Finding crucial cancer biomarkers from thousands of genes is usually difficult. In the present study, we introduced WGCNA algorithm for discovering recurrence-related indicators. Compared with other algorithms, WGCNA systematically combined the sequencing data and clinical information. Using of WGCNA could assist in discovering underlying clinical significance of some key genes. WGCNA also illustrated the connection of genes with similar expression patterns. Moreover, WGCNA algorithm was gradually used in analyzing the single-cell transcriptome data by researchers [
35]. Meanwhile, our research was conducted on 1432 samples from multicenter data sources, and the prognostic and predictive values were validated at both the gene expression and protein level. We also detected possible internal mechanisms and biological processes related to HCC. Therefore, we promoted other researchers’ studies and provided more credible results.
Even though the expression of PON1 is negatively correlated with tumor recurrence, metastasis, and invasion, we did not define it as a tumor suppressor. The PON1 gene encodes an enzyme that can be released from normal liver cells into the blood circulation and is of great antioxidant significance [
36,
37]. According to our pathway detection results, PON1 downregulation may not directly affect the invasiveness and metastasis of tumor cells as PON1 is located downstream of cancer-related biological processes. In our view, PON1 downregulation in HCC might be induced by losing the original function of normal liver cells in PON1 secretion. In GSEA analysis, the results indicated that several carcinogenesis-related pathways were enriched by varying degrees by PON1, including the p53 downstream pathway, gap junction, cell cycle, apoptosis, and DNA replication. Therefore, PON1 downregulation might be increased by complicated internal mechanisms related to cell division, proliferation, and migration, which could also explain why the expression of PON1 was related to tumor recurrence and clinical outcomes. However, a decrease in PON1 might still cause changes in tumor cells, especially tumor-derived inflammation, autophagy, and apoptosis [
38,
39]. However, we still need more experimental evidence to prove our conclusions from sequencing data and pathological results. In addition, further detection of these mechanisms will be very necessary.
In recent years, next-generation sequencing (NGS) use has gradually increased [
40,
41]. However, NGS in tumors is still based on tissue obtained from surgical resection, which limits the clinical application of this method. Although some novel test methods, including sequencing of tumor-derived circulating DNA and exosomes in the blood, have been discovered, their practicability still needs to be validated [
42‐
44]. Noninvasive methods, including blood tests, are still very important. Published research has shown that PON1 is an enzyme that is mainly synthesized in the liver and released into the circulatory system [
45,
46]. The expression of PON1 is mainly affected by liver cells, and the use of PON1 in a predictive model may reduce other possible interference factors. Therefore, we built a PON1-related nomogram with a Cox proportional hazards regression to investigate 5-year patient survival. In our model, PON1 gene expression was quantified and served as an indicator to promote its specificity. Compared with conventional clinical indicators, our nomogram exhibited excellent prediction accuracy and effectiveness. However, the limitation of our model is that we did not examine serum PON1 levels or use them as a variable. The model will be more useful if serum PON1 levels are tested and used as a variable.
Authors’ contributions
XH and RW designed the studies and were responsible for correspondence. ZY, QO, FC, JB, WL, JYM processed the data and created the figures. ZY and QO helped write the manuscript. ZY and QO contributed equally to the present study. All authors read and approved the final manuscript.