Introduction
Clear cell renal cell carcinoma (ccRCC) is a major histological subtype of renal cell carcinoma (RCC), accounting for approximately 60%-85% of RCC cases. ccRCC arises from epithelial cells of the renal proximal convoluted tubules [1, 2]. Early-stage ccRCC is typically asymptomatic, and approximately 25%-30% of patients exhibit metastasis at the time of diagnosis [3]. The relapse or distant metastasis rate for ccRCC patients after radical nephrectomy exceeds 20%. Furthermore, the resistance of ccRCC to radiotherapy and chemotherapy results in a poor prognosis [4, 5]. Thus, improved prognosis prediction for advanced ccRCC patients would greatly assist clinicians in decision-making. Moreover, the identification of key genetic drivers of progression can aid in the development of new treatments.
Relevant prognostic factors have been observed in addition to the American Joint Committee on Cancer (AJCC) Tumor Node Metastasis (TNM) stage [6]. Notably, gene expression profiling has the potential to classify different tumor types because of the significant involvement of genes in tumor development and metastasis [7]. The rapid development of gene sequencing technology has made the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases increasingly important in bioinformatics analysis [8, 9]. These databases offer sequencing data for discovering new functional genes and analyzing their impact on prognosis. Thus, nomograms that combine clinical variables with gene expression can serve as an effective tool in the development of individualized patient treatment strategies.
Nomograms are established based on Cox regression analysis results and are widely used for cancer prognosis, primarily because they reduce statistical predictive models to a single numerical estimate of the probability of an event, such as death or recurrence [10, 11]. In ccRCC patients, a nomogram combining a differentiation-related gene (DRG)-based risk score with prognostic clinicopathological variables was previously constructed to provide a visual method for determining prognosis [12]. Another nomogram based on a risk gene signature and clinical features may provide a practical method for recurrence prediction and facilitate personalized management of ccRCC patients after surgery [13]. However, prediction models that account for tumor stage require further investigation.
This study identified key prognostic genes at different stages of ccRCC. Phosphoserine aminotransferase 1 (PSAT1) is an enzyme of the class V pyridoxal phosphate-dependent aminotransferase family. Mutations in PSAT1 lead to metabolic and genetic disorders such as phosphoserine aminotransferase deficiency, serine deficiency, and Neu-Laxova syndrome, in which patients require postnatal serine and glycine supplementation for symptom alleviation. In recent years, an increasing number of investigations have shown that PSAT1 is closely associated with the occurrence, development, treatment, and prognosis of various cancers [14-18]. Therefore, the objective of this study was to conduct a comprehensive bioinformatics analysis to identify the prognostic genes in patients at different stages of ccRCC and to develop a new nomogram model for predicting overall survival (OS) in patients with late-stage ccRCC based on data from the GEO and TCGA databases.
Materials and methods
Acquisition of microarray data
The discovery phase involved the identification of datasets comparing mRNA expression in tissues of patients with late-stage (stage III + stage IV) ccRCC with that in tissues of patients with early-stage (stage I + stage II) ccRCC. Gene expression profiles of GSE73731 (256 samples), GSE89563 (16 samples), and GSE150404 (60 samples) were obtained from the National Center for Biotechnology Information (NCBI) GEO database (https://www.ncbi.nlm.nih.gov/geo/). The GSE73731 dataset was based on the GPL570 platform, whereas GSE89563 and GSE150404 were based on the GPL17692 platform.
Screening for integrated differentially expressed genes (DEGs) at different stages
The GEO2R tool provided by the GEO database, which relies on the R package “limma”, was used to identify DEGs in each dataset. The cut-off criteria for screening over-expressed DEGs were an adjusted p-value < 0.05 and log2FC > 1. The significantly up-regulated genes were then extracted separately.
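The screening thresholds can be illustrated with a minimal Python sketch. This is not the GEO2R workflow itself; the record type `GeneStat`, the function `upregulated`, and the example values are hypothetical and serve only to show how the two cut-offs combine.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    symbol: str
    log2_fc: float
    adj_p: float


def upregulated(rows, adj_p_max=0.05, log2_fc_min=1.0):
    """Return symbols with adjusted p < adj_p_max AND log2FC > log2_fc_min."""
    return {r.symbol for r in rows
            if r.adj_p < adj_p_max and r.log2_fc > log2_fc_min}


# Made-up values for illustration only; not taken from any GEO dataset.
example_rows = [
    GeneStat("GENE_A", 1.8, 0.001),   # passes both thresholds
    GeneStat("GENE_B", 0.6, 0.001),   # fold change too small
    GeneStat("GENE_C", 2.3, 0.20),    # not significant after adjustment
    GeneStat("GENE_D", -1.5, 0.001),  # down-regulated, so excluded
]
up_genes = upregulated(example_rows)
print(sorted(up_genes))
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted p-value is not retained.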
Genes over-expressed in all datasets were identified by constructing a Venn diagram of the three lists of up-regulated genes using an online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). The expression levels of all genes and the survival analysis of selected genes at different stages were verified using the Assistant for Clinical Bioinformatics (ACBI) tool (https://www.aclbi.com). The expression levels of potential hub genes were visualized as a heatmap created with R software.
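The overlap that a Venn diagram displays is a set intersection, which can be sketched as follows. The function name `shared_upregulated` and the three input lists are hypothetical toy data, not the actual GEO2R output; only the symbols PSAT1, PRAME, and KDELR3 are taken from this paper.

```python
def shared_upregulated(*gene_lists):
    """Return the genes present in every one of the supplied lists."""
    sets = [set(genes) for genes in gene_lists]
    return set.intersection(*sets) if sets else set()


# Toy lists for illustration; GENE_X, GENE_Y, and GENE_Z are placeholders.
up_gse73731 = ["PSAT1", "PRAME", "KDELR3", "GENE_X"]
up_gse89563 = ["PSAT1", "PRAME", "KDELR3", "GENE_Y"]
up_gse150404 = ["PSAT1", "PRAME", "KDELR3", "GENE_X", "GENE_Z"]

hub_candidates = shared_upregulated(up_gse73731, up_gse89563, up_gse150404)
print(sorted(hub_candidates))
```

A gene that is up-regulated in only two of the three datasets (such as the placeholder GENE_X) falls outside the central region of the diagram and is not carried forward.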
Collection of clinical and bioinformatics data
The TCGA database was accessed on June 9, 2023, and clinical data together with tumor RNA expression data of 532 ccRCC patients were collected (https://tcga-data.nci.nih.gov/). Clinical parameters included sex, age, race, pathologic T stage, pathologic N stage, pathologic M stage, grade, neoadjuvant therapy, vital status, and follow-up duration (days). According to disease stage, we divided the patients into late-stage and early-stage groups using the ACBI module of TCGA (https://www.aclbi.com/static/index.html#/tcga). The RNA sequencing expression profiles and corresponding clinical information by stage group, downloaded from the TCGA dataset (https://portal.gdc.com), were matched with the TCGA-ccRCC dataset (https://tcga-data.nci.nih.gov/). Considering the influence of surgical factors, we excluded patients whose follow-up time was less than 30 days. Within each of the two groups, the median RNA expression value was used as the cut-off to classify RNA expression levels as high or low.
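The exclusion rule and the within-group median split can be sketched in Python as below. The field names, the function `label_expression`, and the patient records are hypothetical. The handling of values exactly equal to the median (labelled "low" here) is an assumption, since the text does not specify it.

```python
from statistics import median


def label_expression(patients, min_follow_up_days=30):
    """Drop patients with short follow-up, then label expression as
    high/low relative to the median of their own stage group."""
    kept = [p for p in patients if p["follow_up_days"] >= min_follow_up_days]
    groups = {}
    for p in kept:
        groups.setdefault(p["stage_group"], []).append(p)
    labels = {}
    for members in groups.values():
        cutoff = median(m["expression"] for m in members)
        for m in members:
            # Assumption: values equal to the median are labelled "low".
            labels[m["id"]] = "high" if m["expression"] > cutoff else "low"
    return labels


# Made-up records for illustration only.
patients = [
    {"id": "P1", "stage_group": "late", "expression": 2.0, "follow_up_days": 100},
    {"id": "P2", "stage_group": "late", "expression": 4.0, "follow_up_days": 200},
    {"id": "P3", "stage_group": "late", "expression": 6.0, "follow_up_days": 10},
    {"id": "P4", "stage_group": "late", "expression": 8.0, "follow_up_days": 400},
    {"id": "P5", "stage_group": "early", "expression": 1.0, "follow_up_days": 50},
    {"id": "P6", "stage_group": "early", "expression": 3.0, "follow_up_days": 60},
]
labels = label_expression(patients)
print(labels)
```

Because the median is computed after the exclusion step and separately per group, the same expression value can be "high" in one stage group and "low" in the other.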
Development of risk prediction model
According to the TCGA data, we developed a nomogram combining gene expression with clinical information (new model) for the prediction of 3-year and 5-year OS in individuals with different stages of ccRCC. Another nomogram only using clinical variables was developed for a head-to-head comparison with the first comprehensive model in ccRCC patients at different stages.
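A nomogram is a graphical form of a fitted Cox model: the total score corresponds to the linear predictor, which is mapped to a survival probability through the baseline survival, S(t | x) = S0(t)^exp(lp). The sketch below shows only this mapping. The coefficients, covariate names, and baseline survival values are hypothetical placeholders and are not the values fitted in this study.

```python
import math


def predicted_survival(covariates, coefficients, baseline_survival):
    """Cox model prediction: S(t | x) = S0(t) ** exp(sum(beta_k * x_k))."""
    linear_predictor = sum(coefficients[name] * value
                           for name, value in covariates.items())
    return {t: s0 ** math.exp(linear_predictor)
            for t, s0 in baseline_survival.items()}


# Hypothetical log hazard ratios; NOT estimates from the TCGA cohort.
coefficients = {"psat1_high": 0.5, "high_grade": 0.6, "neoadjuvant": -0.4}
# Hypothetical baseline OS at 3 and 5 years (all covariates equal to 0).
baseline_survival = {3: 0.80, 5: 0.70}

reference = predicted_survival(
    {"psat1_high": 0, "high_grade": 0, "neoadjuvant": 0},
    coefficients, baseline_survival)
high_risk = predicted_survival(
    {"psat1_high": 1, "high_grade": 1, "neoadjuvant": 0},
    coefficients, baseline_survival)
print(reference, high_risk)
```

A clinical-only comparison model would use the same mapping with the gene expression term omitted and the remaining coefficients refitted.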
Patients and tissue specimens
A total of 20 pairs of ccRCC specimens were obtained from patients who underwent radical nephrectomy or partial nephrectomy at the Affiliated Hospital of Jiangnan University. None of the patients in our study received neoadjuvant chemotherapy. In all, 20 matched fresh ccRCC specimens (10 pairs of late-stage cases and 10 pairs of early-stage cases) and adjacent noncancerous renal tissues were selectively used for qRT-PCR, Western blotting, and immunohistochemical analysis. The diagnosis for each patient was confirmed by histopathological analysis. Informed consent was obtained from the patients before inclusion in the study, and the study protocol was approved by the Ethics Committee of the Affiliated Hospital of Jiangnan University.
RNA extraction and qRT-PCR assays
Total RNA was extracted using RNA-easy (RC101, Vazyme, Nanjing, CN) according to the manufacturer's instructions. In all, 1 µg of total RNA was used for cDNA synthesis with a cDNA reverse transcription kit (R323, Vazyme, Nanjing, CN). Real-time PCR was performed in triplicate on a Bio-Rad CFX96 PCR system to detect PSAT1 expression. The results were normalized to the expression of GAPDH. The primer sequences were as follows: PSAT1-F: ACAGGAGCTTGGTCAGCTAAG, PSAT1-R: CATGCACCGTCTCATTTGCG; GAPDH-F: GGAGCGAGATCCCTCCAAAAT, GAPDH-R: GGCTGTTGTCATACTTCTCATGG.
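Normalization to a reference gene is commonly carried out with the comparative Ct method. The sketch below assumes that approach (the text states only that results were normalized to GAPDH) and assumes ideal amplification efficiency, so that one cycle corresponds to a two-fold change. The function names and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct):
    """2^-dCt, where dCt = mean target Ct - mean reference-gene Ct."""
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


def fold_change(tumor_target, tumor_ref, normal_target, normal_ref):
    """2^-ddCt: tumor expression relative to matched adjacent tissue."""
    return (relative_expression(tumor_target, tumor_ref)
            / relative_expression(normal_target, normal_ref))


# Made-up triplicate Ct values for one tissue pair (illustration only).
tumor_psat1, tumor_gapdh = [24.0, 24.0, 24.0], [18.0, 18.0, 18.0]
normal_psat1, normal_gapdh = [26.0, 26.0, 26.0], [18.0, 18.0, 18.0]

fc = fold_change(tumor_psat1, tumor_gapdh, normal_psat1, normal_gapdh)
print(fc)
```

In this toy example the target gene reaches threshold two cycles earlier in the tumor sample at equal GAPDH Ct, which corresponds to a four-fold higher relative expression.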
Immunohistochemistry analysis
The ccRCC tissue samples were obtained from the Affiliated Hospital of Jiangnan University according to institutional guidelines. Paraffin-embedded tissue blocks were sectioned and stained with antibodies specific to PSAT1 (10501-1-AP; 1:400, Proteintech, Wuhan, CN), followed by scanning with a Pannoramic Scanner (3DHISTECH, Budapest, Hungary).
Western blotting
The kidney tissues were disrupted and then lysed by boiling for 10 min in sample buffer (2% SDS, 10% glycerol, 10% β-mercaptoethanol, bromophenol blue, and Tris-HCl, pH 6.8). The lysates were fractionated by SDS-PAGE, and the separated proteins were transferred to PVDF membranes (Millipore, IPVH00010, NH, US). The blots were probed with specific primary antibodies followed by a secondary antibody, and the signals were then detected by ECL (Sigma, WBULS0500, MO, US). PSAT1 (10501-1-AP; 1:10000) and GAPDH (66009-1-Ig; 1:10000) antibodies were purchased from Proteintech Group (IL, US). Secondary antibodies were conjugated with HRP (Proteintech Group; SA00001-1, SA00001-2; 1:10000). Uncropped Western blots are shown in Fig. S3.
Statistical analyses
Statistical analyses were conducted using SPSS version 27.0 (SPSS Inc., IBM Corp., Armonk, NY, USA) and R software for Windows, version 4.2.3. Data are presented as mean ± SD or median and range. Student’s t test was performed for normally distributed continuous variables, while the Mann-Whitney U test was performed for non-normally distributed data. The chi-square or Fisher’s exact test was applied to compare categorical variables.
The Cox proportional hazards regression model was used to estimate the hazard ratio and its 95% confidence interval (CI) for each potential risk factor, and the results were visualized as forest plots. In the stepwise multivariate Cox regression analysis, a type I error of 0.1 was used as the criterion for both variable inclusion and exclusion.
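For reference, the hazard ratio and its Wald-type 95% CI follow directly from a Cox coefficient and its standard error: HR = exp(β) and CI = exp(β ± z·SE). The short sketch below illustrates this relation; the function name and the numeric inputs are hypothetical and are not estimates from this study.

```python
import math


def hazard_ratio_ci(beta, se, z=1.959964):
    """Return (HR, lower, upper) for a Cox coefficient beta with
    standard error se; z = 1.959964 gives a two-sided 95% interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Hypothetical coefficient and standard error, for illustration only.
hr, lower, upper = hazard_ratio_ci(beta=math.log(2.0), se=0.1)
print(f"HR = {hr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

A factor is conventionally read as a risk factor when the whole interval lies above 1 and as a protective factor when it lies below 1.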
Discrimination reflects the ability of a model to distinguish events from non-events correctly, and it was assessed using C-statistics. The concordance index (C-index) is analogous to the area under the receiver operating characteristic (ROC) curve (AUC). The predictive capacity of the models was summarized using ROC curves [19]. Calibration refers to the closeness between the predicted probabilities and the actual outcomes, and it was assessed using calibration plots [20].
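To make the notion of concordance concrete, the following is a minimal sketch of Harrell's C-index for right-censored data. It is an illustrative implementation rather than the routine used in this study; pairs with tied survival times are simply skipped, which is a simplifying assumption, and the example data are made up.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the patient with the
    shorter observed survival has the higher predicted risk.
    A pair (i, j) is comparable when i had an observed event
    strictly before time j; ties in risk score count as 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up data: follow-up in months, event flag (1 = death), model risk.
times = [1, 2, 3, 4]
events = [1, 1, 1, 0]
perfect = harrell_c_index(times, events, [4, 3, 2, 1])
reversed_order = harrell_c_index(times, events, [1, 2, 3, 4])
print(perfect, reversed_order)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the C-index is read in the same way as the AUC.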
A two-sided p-value of < 0.05 was considered statistically significant.
Discussion
The prognosis of ccRCC patients is closely associated with tumor stage, with later stages often indicating a poor prognosis [21]. Treatment options for advanced-stage disease are limited, and effective therapeutic strategies for recurrent and metastatic ccRCC are currently scarce [22]. Thus, the development of a new prognostic tool is crucial for identifying high-risk patients requiring additional treatment and attention. Moreover, finding a promising therapeutic target is crucial for developing anti-tumor drugs and improving the survival rate of patients with advanced ccRCC.
With the advancement of bioinformatics, an increasing number of genes have been identified as closely associated with ccRCC occurrence and development [23, 24]. Accordingly, we focused on gene expression at different stages of ccRCC using the GEO datasets and verified our findings with ACBI-TCGA. In our study, we found three DEGs (PSAT1, PRAME, and KDELR3) between late-stage and early-stage patients across three mRNA arrays. The post-match data included complete variables for comparison in both stage groups. In addition to stage, another principal clinical factor for ccRCC prognosis is grade [25, 26]. We observed that patients in the late-stage group also tended to have a higher disease grade. This further supports the view that tumor staging and histological grading are the main parameters associated with ccRCC prognosis [27].
In the survival analysis, we found that patients in the late stage of the disease showed a poor prognosis, and PSAT1 was the only gene associated with OS in this group. Moreover, the relative expression of target genes may be involved in prognosis [23]. The ccRCC clinicopathological information downloaded from TCGA was processed via univariate and multivariate Cox regression analysis. Age, PSAT1 expression, grade, and neoadjuvant therapy were found to be significant independent prognostic factors associated with OS. Neoadjuvant therapy was found to be a protective factor. However, in previous studies, the benefit of neoadjuvant therapy for locally advanced ccRCC with currently available therapeutic agents has been controversial [28]. Christopher et al. [29] suggested that neoadjuvant treatment with pazopanib is effective in treating patients with localized ccRCC, which is consistent with the findings of our study.
Various nomograms have been developed to determine the prognosis of ccRCC patients. For instance, Xia et al. [12] constructed a prognostic nomogram based on a prognostic risk signature and clinicopathological characteristics, which exhibited high accuracy and robust predictive performance. Zhu et al. [30] reported that combining methylation risk scores with conventional clinical covariates improved the prediction of clinical prognosis in ccRCC patients. In our study, we developed a new graphical nomogram that combines PSAT1 expression with clinicopathological data for predicting OS in ccRCC patients at different stages. By assigning values to clinical variables and PSAT1 expression for each patient, we calculated a total score that predicts the OS of late-stage patients at 3 and 5 years. These estimates can be used for patient counseling and informed decision-making. The C-index, AUC, and calibration curves from the entire TCGA set confirmed the discriminative accuracy of our nomogram, possibly making it a preferred predictive model. Moreover, the predictive power of our new model was higher than that of the model based on clinical variables alone.
Of the three hub genes identified in our study, PSAT1 was finally included in the model to predict the survival of late-stage ccRCC patients. PSAT1 is a metabolism-related gene, and previous research reported that dysregulation of PSAT1 activity may alter glucose and glutamine utilization in serine biosynthesis, promoting tumorigenesis and chemoresistance in colorectal cancers [31, 32]. Increased transcription of PSAT1, caused by promoter hypomethylation, was also linked to a poor response to tamoxifen therapy and cancer recurrence in early-stage breast cancer [33, 34]. Furthermore, studies have shown that the up-regulation of PSAT1 promotes cell proliferation and is associated with a poor outcome in patients with non-small cell lung cancer [35, 36]. These studies indicate a strong correlation between PSAT1 levels and tumor progression as well as prognosis.
The association of PSAT1 expression levels with ccRCC suggests its involvement in tumor metabolism, development, and progression. Zhang et al. [37] screened ccRCC-related glycolytic genes in public databases and constructed a prediction model of 13 genes including PSAT1, which could be valuable for diagnosing and predicting ccRCC. Cheng et al. [38] introduced a new gene signature, including PSAT1, to determine ccRCC prognosis in TCGA cohorts based on amino acid metabolism-related genes. According to the GEPIA2 analysis, PSAT1 is also highly expressed in several other tumors (BLCA, CESC, COAD, DLBC, GBM, LGG, LUAD, LUSC, OV, PRAD, READ, STAD, THYM, UCEC, and UCS) (Fig. S4), which supports the possibility of studying PSAT1 as a biomarker in other tumors. However, for advanced-stage ccRCC patients, the role of PSAT1 remains elusive. In our research, PSAT1 was found to be highly expressed in patients with late-stage ccRCC and to affect the OS of these patients. Our model further demonstrated the value of PSAT1 in determining the prognosis of advanced ccRCC patients.
The results of the mRNA and protein level validation may provide guidance for clinical decision-making. For example, in clinical practice, when postoperative histopathology indicates that a patient's renal cell carcinoma is of the clear cell type, PSAT1 expression can be further assessed by immunohistochemistry. If PSAT1 expression is positive, the prognosis is likely to be poor, which provides clinicians with a reference for the subsequent treatment of the patient.
To the best of our knowledge, this is the first nomogram to predict OS in patients with different stages of ccRCC by combining genetic information and clinical data. Furthermore, the significance of PSAT1 in late-stage ccRCC was confirmed in this study, providing more specific and precise insights on its role.