Introduction
Pancreatic cancer (PC) has one of the worst prognoses among malignant tumors: the overall 5-year survival rate of patients with pancreatic cancer is less than 5% [1]. Because of atypical symptoms, the lack of sensitive biomarkers for early diagnosis, and the particular anatomical structure of the pancreas, only about 20% of patients are candidates for surgical resection at diagnosis [2]. Investigators are currently pursuing comprehensive treatment plans that combine immunotherapy, targeted drugs, radiotherapy, and chemotherapy [3]. Because of tumor heterogeneity, patients respond differently to the same treatment regimen, so clinicians must adjust the regimen according to each patient's response during treatment. A prognostic assessment method with good sensitivity and specificity therefore plays an important role in the treatment of patients.
In prognostic assessment, prognosis-related gene expression has been reported to perform better than patient clinical characteristics [4]. The development of next-generation sequencing and gene chip technology provides a convenient, accurate, and inexpensive way to detect prognosis-related genes [5]. A growing number of researchers use next-generation sequencing or gene chips to detect prognosis-related genes and then establish a prognostic model to guide the treatment of patients [6]. The modeling approaches used include logistic regression, Poisson regression, Cox regression, lasso regression, and ridge regression [7]. Combining these modeling approaches can improve the specificity and sensitivity of prognostic models.
Here, we obtained differentially expressed genes (DEGs) using the TCGA-PAAD dataset, GTEx, and two GEO datasets. Before further analysis, we normalized all datasets and removed batch effects. Eight prognosis-related genes were screened in the TCGA-PAAD dataset using univariate Cox regression and lasso regression. All combinations of the eight genes were then enumerated, and the AUC value of each combination was calculated separately. The combination with the optimal AUC value was then selected using the Gaussian model and validated in the validation set. Ultimately, we found five genes that performed well in evaluating the prognosis of pancreatic cancer patients in both the training set and the validation set.
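The combination screen described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a binary outcome label per patient, where the study would more plausibly use time-dependent ROC analysis. It scores each subset with fixed, hypothetical coefficients, where the study would presumably refit a model per subset. The gene names `G1` to `G3` and all values are toy data, and the Gaussian-model selection step is not reproduced.

```python
from itertools import combinations


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    # Count a win as 1 and a tie as 0.5 over all positive/negative pairs.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def all_gene_subsets(genes):
    """Yield every non-empty subset of the candidate genes."""
    for k in range(1, len(genes) + 1):
        yield from combinations(genes, k)


def rank_subsets(expr, coefs, labels):
    """Return (AUC, subset) pairs sorted by AUC, best first.

    expr maps a gene name to per-patient expression values; coefs maps a
    gene name to a hypothetical weight (placeholder, not from the study).
    """
    n = len(labels)
    results = []
    for subset in all_gene_subsets(sorted(expr)):
        scores = [sum(coefs[g] * expr[g][i] for g in subset) for i in range(n)]
        results.append((auc(scores, labels), subset))
    # Ties are broken in favor of smaller subsets, then alphabetically.
    results.sort(key=lambda r: (-r[0], len(r[1]), r[1]))
    return results


# Toy data: 4 patients, label 1 = poor outcome (assumption for illustration).
expr = {
    "G1": [5.0, 4.0, 1.0, 0.5],
    "G2": [1.0, 3.0, 2.0, 4.0],
    "G3": [2.0, 2.5, 1.0, 1.5],
}
coefs = {"G1": 1.0, "G2": 1.0, "G3": 1.0}
labels = [1, 1, 0, 0]
ranked = rank_subsets(expr, coefs, labels)
```

With eight candidate genes the same enumeration yields 2^8 - 1 = 255 non-empty subsets, which is small enough to evaluate exhaustively.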
Discussion
In this study, we used TCGA and GEO datasets to construct a 5-mRNA signature (ANKRD22, ARNTL2, DSG3, ITGB6, TRHDE) that is associated with the prognosis of pancreatic cancer patients. The performance of the 5-mRNA signature was verified using the validation dataset. Our results suggested that patients with a higher risk score, calculated from the 5-mRNA signature, had shorter survival times.
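A risk score of this kind is commonly computed as a weighted sum of expression values, with patients then split into high-risk and low-risk groups. The sketch below assumes that form and a median cutoff; the text does not state either. The gene names `GENE_A` and `GENE_B`, the coefficients, and the expression values are hypothetical placeholders rather than values from the study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values (assumed linear form of the score)."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def stratify(scores):
    """Label each patient high or low risk using the median score as cutoff."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical coefficients; a real model would take them from a fitted Cox regression.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical expression values for four patients.
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 1.0, "GENE_B": 2.0},
    {"GENE_A": 4.0, "GENE_B": 0.0},
    {"GENE_A": 3.0, "GENE_B": 4.0},
]

scores = [risk_score(p, coefs) for p in patients]
groups = stratify(scores)
```

A positive coefficient raises the score as expression increases, and a negative one lowers it, so the sign of each weight indicates whether higher expression of that gene is associated with higher or lower risk in the fitted model.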
Ankyrin repeat domain 22 (ANKRD22) is a novel mitochondrial membrane protein. Several studies show that the expression of ANKRD22 is significantly elevated in various tissues and cells, such as macrophages of patients with acute rejection after renal transplantation [8], peripheral blood mononuclear cells of pancreatic cancer patients [9], type I basal-like breast cancer tissues [10], and non–small cell lung cancer (NSCLC) tissues [11]. In colorectal cancer cells, ANKRD22 plays a role in promoting glycolysis and reducing ATP levels [12]. Several studies suggest that the expression level of ANKRD22 is related to the prognosis of pancreatic cancer [13], endometrial carcinoma [14], and hepatocellular carcinoma [15].
Aryl hydrocarbon receptor nuclear translocator like 2 (ARNTL2), which encodes a basic helix–loop–helix transcription factor, is a member of the PAS (PER, ARNT, SIM) superfamily. ARNTL2 plays a role in biological processes such as the hypoxia response and circadian rhythm [16]. Several malignant tumors, including lung adenocarcinoma [17], colorectal cancer [18], and breast cancer [19], are associated with dysfunction of ARNTL2.
Desmoglein 3 (DSG3) is an adhesion protein in desmosomes and a member of the cadherin superfamily. Recent studies indicate that DSG3 plays a key role in several processes, including cell adhesion, proliferation, morphogenesis, differentiation, and migration [20, 21]. Recent evidence suggests that DSG3 might play an important role in the prognostic assessment of head and neck squamous cell carcinoma [22], skin cutaneous melanoma [23], and triple-negative breast cancer [24].
Integrin subunit beta 6 (ITGB6), a member of the integrin family, shows increased expression in biological processes such as wound healing, fibrosis, and malignant tumor formation [25]. ITGB6 regulates many basic cellular processes, such as ECM degradation and proliferation [26]. ITGB6 tends to be regarded as an oncogene: it is upregulated in several solid tumors and is associated with poorer prognosis and increased invasiveness [27].
Thyrotropin releasing hormone degrading enzyme (TRHDE) is the only downregulated gene in the 5-mRNA signature. The protein it encodes inactivates thyrotropin releasing hormone (TRH) extracellularly [28]. However, its role in cancer has not been elucidated, and only a limited number of studies have addressed its role in tumor prognosis [29].
In clinical practice, clinicians tend to use the tumor TNM stage to evaluate the prognosis of pancreatic cancer patients. With the development of imaging technology, prognostic assessment combined with imaging data is also a feasible method. In either case, current prognostic assessment methods demand a high level of diagnostic skill from doctors and a certain amount of time for learning and training. At the beginning of the project, we wanted to find a simple, low-cost, widely applicable way to perform prognostic assessment. Our prognostic assessment model includes only 5 genes, which makes it easier and cheaper to test. By calculating the risk score, clinicians can more easily assess the prognosis of patients, make clinical decisions, and select drugs.
Because prognostic evaluation is important for the treatment of pancreatic cancer patients, many studies have focused on the role of prognosis-related genes in this evaluation. Luo et al. identified a 7-gene signature (ARNTL2, DSG3, PTPRR, ANLN, S100A14, ANKRD22, and TSPAN7) using TCGA, ICGC, and GEO datasets. The 7-gene signature was assessed using ROC curves, as in our study [13]. Wu et al. constructed a 5-gene signature (AADAC, DEF8, HIST1H1C, MET, and CHFR) whose genes were potential molecular targets for the overall survival of resectable pancreatic cancer patients [30]. Beyond gene expression data, DNA methylation data can also be used to evaluate the prognosis of pancreatic cancer patients and has achieved good results [31]. Noncoding RNA expression data can likewise be used to assess the prognosis of pancreatic cancer patients with similar accuracy [32]. Our study used not only the lasso regression and multivariate Cox regression commonly used by other researchers but also a Gaussian mixed model to further screen variables. The results were evaluated using ROC curves and showed that the 5-gene signature performed well in both the training set and the validation set. The comparison of AUC values showed that our 5-gene signature was comparable or superior to those of previous studies, and this signature has not yet been reported by other researchers. This suggests that our prognostic gene screening method is effective and provides a better model for pancreatic cancer prognosis evaluation.
However, our study also has limitations. The main one is that we did not use a larger external validation dataset for testing; all validation datasets are from public databases. In addition, we did not explore the biological functions of the five genes, which need to be verified by further in vivo and in vitro experiments and will be the focus of our future research.