Introduction
Bladder cancer (BLCA), commonly referred to as urothelial carcinoma, is one of the most common urological malignancies, with more than 90% of cases originating in the uroepithelium. It is estimated that more than 550,000 new cases are diagnosed and more than 200,000 deaths occur per year [1]. It has become the fourth and tenth most common malignancy among men and women, respectively [2]. Pathologically, there are two main types of BLCA: non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC). Approximately 75% of patients initially present with NMIBC, while the remaining 25% are diagnosed with MIBC [3]. Evidence-based guidelines recommend radical cystectomy with pelvic lymphadenectomy as the mainstay of treatment for patients with high-risk NMIBC or MIBC [4]. Although patients receive aggressive treatment including surgery, immunotherapy, chemotherapy, and radiotherapy, the 5-year overall survival (OS) rate remains unsatisfactory, with a median OS of approximately 14 months [5]. Reasons for this poor prognosis include delayed diagnosis and the lack of effective therapy. Most importantly, the unsatisfactory prognosis is closely related to the aggressive and highly proliferative capacity of cancer cells, as well as to the heterogeneity of disease characteristics. Therefore, there is a pressing need to uncover the molecular mechanisms involved in tumourigenesis and thereby explore novel molecular biomarkers, which are essential for the early diagnosis, targeted therapy, and prognostic assessment of BLCA patients.
With the rapid development of cancer genomics in recent decades, bulk transcriptome sequencing (bulk RNA-seq) has become a major tool for transcriptomics, and a growing number of gene alterations have been identified as effective treatment targets for BLCA [6]. For instance, Xie et al. found that the exonic circular RNA circPTPRA could inhibit cancer progression by endogenously blocking the recognition of m6A-modified RNAs by IGF2BP1 [7]. In addition, Yang et al. indicated that exosome-derived circTRPS1 could modulate the intracellular reactive oxygen species balance and CD8+ T cell exhaustion via the circTRPS1/miR141-3p/GLS1 axis in BLCA [8]. However, in contrast to bulk RNA-seq or microarray experiments, which probe average gene expression across cell populations, single-cell RNA-seq (scRNA-seq) resolves cellular transcriptomic heterogeneity, allowing us to access the underlying gene expression distributions [9, 10]. Using scRNA-seq, we can develop personalized therapeutic strategies that are potentially useful for cancer diagnosis and for addressing therapy resistance during cancer progression [11]. Integrative analyses of scRNA-seq and scATAC-seq by Xu et al. revealed CXCL14 as a key regulator of lymph node metastasis in breast cancer, which improved our understanding of the mechanism of tumor metastasis [12]. Besides, Obradovic et al. demonstrated that HNCAF-0/3 could reduce TGFβ-dependent PD-1+TIM-3+ exhaustion of CD8+ T cells, increase CD103+NKG2A+ resident memory phenotypes, and enhance the overall cytolytic profile of T cells [13]. Given this advantage, numerous studies have focused on identifying potential biomarkers for BLCA by integrating bulk RNA-seq and scRNA-seq analysis, which could allow patients to be stratified and recognized more precisely.
In this study, we performed systematic bioinformatics analyses using scRNA-seq and bulk RNA-seq data to construct a prognostic model for BLCA patients, and used two external validation cohorts to validate its ability to stratify risk. We also outlined the immune infiltration landscape and examined how it contributes to the development of BLCA. Moreover, we examined the relationship between the risk model and infiltrating immune cells to gain a better understanding of the potential molecular immune processes during the progression of BLCA. Overall, our study provides novel insight that may benefit the clinical management of BLCA.
Materials and methods
Data sources and processing
Bulk RNA-seq data, clinical information, and SNP mutation site data of TCGA-BLCA were downloaded from the TCGA database (https://portal.gdc.cancer.gov/), containing 19 normal tissue samples and 411 BLCA samples. Samples with incomplete survival or clinical information were excluded to obtain a training set of 406 BLCA patients for this study. The scRNA-seq dataset GSE129845 of BLCA was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/), containing scRNA-seq data of paracancerous tissues from 3 BLCA patients; the patient information and sequencing statistics are shown in Additional file 9: Table S1. GSE13507, containing 165 BLCA patients with complete survival information, was also downloaded as external data to validate the feasibility of the model. The samples were integrated using the anchor-based method in the R package "Seurat" [14], and core cells were obtained by filtering the scRNA-seq data. Genes detected in 3 or fewer cells and low-quality cells with fewer than 200 detected genes were excluded from subsequent analysis. Gene expression of core cells was normalized using a linear regression model, and the top 2000 highly variable genes were then screened by ANOVA. Principal component analysis (PCA) was performed on the single-cell samples, and the top 20 principal components (PCs) were selected for subsequent analysis. The UMAP algorithm [15] was used for overall dimensionality reduction on the top 20 PCs. Using the R package "singleR" [16], HumanPrimaryCellAtlasData, BlueprintEncodeData, and ImmuneCellExpressionData were used as reference data for auxiliary annotation; marker genes from the CellMarker database [17] and previous studies were then used for manual annotation of the different clusters.
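The gene- and cell-level quality filter described above can be illustrated with a minimal sketch. It is written in plain Python rather than Seurat, the function name `qc_filter` and the toy matrix are hypothetical, and the choice to count detected genes per cell before the gene filter is applied is an assumption rather than something stated in the text.

```python
from typing import Dict, List, Tuple

def qc_filter(counts: Dict[str, List[int]],
              max_cells_for_drop: int = 3,
              min_genes_per_cell: int = 200) -> Tuple[List[str], List[int]]:
    """Return (kept gene names, kept cell indices) for a gene -> per-cell count table."""
    n_cells = len(next(iter(counts.values())))
    # Gene filter: drop genes detected (count > 0) in `max_cells_for_drop` or fewer cells.
    kept_genes = [g for g, row in counts.items()
                  if sum(c > 0 for c in row) > max_cells_for_drop]
    # Cell filter: drop cells with fewer than `min_genes_per_cell` detected genes.
    # Assumption: detection is counted over all genes, before the gene filter.
    genes_per_cell = [sum(row[j] > 0 for row in counts.values())
                      for j in range(n_cells)]
    kept_cells = [j for j, n in enumerate(genes_per_cell)
                  if n >= min_genes_per_cell]
    return kept_genes, kept_cells

# Toy example with a lowered cell threshold so the effect is visible.
toy = {"A": [1, 1, 1, 1, 0], "B": [1, 0, 0, 0, 0], "C": [2, 3, 0, 1, 1]}
genes, cells = qc_filter(toy, max_cells_for_drop=3, min_genes_per_cell=2)
print(genes, cells)
```

With the default arguments the function applies the thresholds given in the text (3 cells, 200 genes); the toy call lowers the second threshold only because the example matrix has three genes.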
Screening of core cells and functional enrichment analysis of their marker genes
The FindAllMarkers function in the Seurat package was used to find marker genes for each cluster with the parameters min.pct = 0.2 and only.pos = TRUE, and the Wilcoxon rank sum test was used to identify DEGs in the process of screening marker genes. Based on the significantly different marker genes for each cell type, ssGSEA [18] scores were calculated for each cell type in the TCGA dataset (BLCA/normal). The differences in scores between BLCA and normal samples for each cell type were analyzed by the Wilcoxon test, and cell types with significant differences (p < 0.05) between the BLCA and normal groups were recorded as core cells. GO and KEGG functional enrichment analyses of the marker genes of core cells were performed using "clusterProfiler" [19] in R. To explain the molecular mechanism of BLCA progression, pseudo-temporal analysis was performed on each of the seven core cell types using the Monocle 2 algorithm. CellPhoneDB v2.0 was used to explore the potential interactions between core cells.
Identification and functional enrichment analysis of DEGs in TCGA-BLCA
Differential analysis was performed between the 19 normal and 411 BLCA samples using the limma package. Genes with p < 0.05 and |log2FC| > 1 were designated as DEGs. The volcano plot and heatmap of DEGs were visualized using the ggplot2 [20] and pheatmap [21] packages, respectively. Subsequently, the most significantly enriched pathways and biological processes of the DEGs were investigated by Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses using the R package "clusterProfiler".
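The DEG thresholds stated above can be expressed as a short filter. The following is an illustrative sketch only: it assumes the per-gene log2 fold changes and p-values have already been produced by limma, and the function name `select_degs` and the toy values are hypothetical.

```python
from typing import Dict, Tuple

def select_degs(stats: Dict[str, Tuple[float, float]],
                p_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> Dict[str, str]:
    """stats maps gene -> (log2FC, p-value); returns gene -> 'up' or 'down' for DEGs."""
    degs = {}
    for gene, (log2fc, p_value) in stats.items():
        # Both conditions from the text must hold: p < 0.05 and |log2FC| > 1.
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs

# Toy values, not results from the study.
toy_stats = {
    "G1": (2.0, 0.01),    # passes both cutoffs, up-regulated
    "G2": (-1.5, 0.001),  # passes both cutoffs, down-regulated
    "G3": (0.5, 0.001),   # fold change too small
    "G4": (3.0, 0.2),     # not significant
    "G5": (1.0, 0.01),    # |log2FC| is not strictly greater than 1
}
degs = select_degs(toy_stats)
print(degs)
```

Both inequalities are strict, matching the thresholds as written, so a gene with |log2FC| exactly equal to 1 is not retained.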
WGCNA analysis
In the training set, genes associated with BLCA were screened using the R package WGCNA [22]. First, the goodSamplesGenes function of the "WGCNA" package was used to check whether any samples or genes needed to be filtered out, and a suitable soft threshold was selected. Then, the co-expression network was constructed with the minimum number of genes per module set to 300, according to the criteria of the hybrid dynamic tree-cutting algorithm. Finally, Pearson correlation coefficients were used to analyze the association of module eigengenes (MEs) with BLCA.
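The final step, correlating each module eigengene with the BLCA/normal trait, can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the WGCNA implementation: the eigengene values are toy numbers, the module names are placeholders, and the trait is assumed to be coded as 0 (normal) and 1 (BLCA).

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def module_trait_correlation(eigengenes: Dict[str, List[float]],
                             trait: List[float]) -> Dict[str, float]:
    """Correlate every module eigengene with the sample trait vector."""
    return {module: pearson(values, trait) for module, values in eigengenes.items()}

# Toy data: 4 samples, trait 0 = normal, 1 = BLCA (assumed coding).
trait = [0, 0, 1, 1]
eigengenes = {
    "brown": [-0.5, -0.4, 0.4, 0.5],
    "blue": [0.1, -0.1, 0.1, -0.1],
}
cors = module_trait_correlation(eigengenes, trait)
# The key module is taken here as the one with the largest absolute correlation.
key_module = max(cors, key=lambda m: abs(cors[m]))
print(cors, key_module)
```

Choosing the module with the largest absolute correlation is one common convention; the text does not state the exact selection rule used.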
Construction and validation of a prognostic model
The intersection of the marker genes of core cells, the BLCA-related genes, and the DEGs was taken, and the resulting genes were defined as candidate genes. The 406 samples in the TCGA-BLCA dataset were divided into a training set (285 cases) and a validation set (121 cases) at a ratio of 7:3. Univariate Cox proportional hazards regression analysis was performed on the candidate genes in the training set to screen the characteristic genes associated with prognosis. Variables with p-values < 0.05 were included in the least absolute shrinkage and selection operator (LASSO) regression analysis, which was performed with the R package "glmnet" [23] to reduce the number of genes in the final risk model. The prognostic model was constructed according to the formula: risk score = gene expression 1 × β1 + gene expression 2 × β2 + … + gene expression n × βn (gene expression denotes the gene expression value and β denotes the corresponding LASSO regression coefficient). Patients' survival curves and risk maps were visualized with the R packages "survminer" and "ggrisk". ROC curves were plotted using the "survROC" package [24] to assess the performance of the risk score in predicting OS at 1, 2, 3, 4, and 5 years in BLCA patients. In addition, the validity of the prognostic model was verified in the internal validation set and in the external datasets GSE13507 and GSE32548.
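The risk score formula above is a weighted sum of expression values, which the following sketch makes concrete. The coefficients and expression values are hypothetical placeholders, not the fitted LASSO coefficients, and splitting patients at the median score is an assumption, since the cutoff is not specified in the text.

```python
from statistics import median
from typing import Dict

def risk_score(expression: Dict[str, float], coefs: Dict[str, float]) -> float:
    """risk score = sum over model genes of (expression value x LASSO coefficient)."""
    return sum(expression[gene] * beta for gene, beta in coefs.items())

def stratify(scores: Dict[str, float]) -> Dict[str, str]:
    """Label patients 'high' or 'low' risk using the median score as cutoff (assumed)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical coefficients for a 3-gene model; real values come from the LASSO fit.
coefs = {"MAP1B": 0.1, "PCOLCE2": 0.2, "ELN": 0.05}
# Toy expression values for four hypothetical patients.
patients = {
    "P1": {"MAP1B": 2.0, "PCOLCE2": 1.0, "ELN": 4.0},
    "P2": {"MAP1B": 1.0, "PCOLCE2": 0.0, "ELN": 2.0},
    "P3": {"MAP1B": 5.0, "PCOLCE2": 3.0, "ELN": 2.0},
    "P4": {"MAP1B": 0.0, "PCOLCE2": 1.0, "ELN": 0.0},
}
scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
groups = stratify(scores)
print(scores, groups)
```

Because the same coefficients are applied unchanged to every cohort, the score can be computed in the internal and external validation sets without refitting the model.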
Analysis of subtype clinical characteristics
Samples were classified into subtypes according to multiple clinicopathological characteristics, including age (> 60 and ≤ 60), sex, M stage, T stage, TME, stage, and OS status. Within each subtype, cancer samples were divided into two risk groups (high and low). The distribution of clinicopathological characteristics among subtypes was assessed using the Kruskal–Wallis test or the Wilcoxon rank sum test. To better understand the correlation between clinicopathological characteristics and survival, a stratified survival analysis by clinical factor was performed for the high- and low-risk groups.
Independent prognostic analysis
Univariate analysis was used to assess the predictive value of the risk model and of each clinical parameter (age, T, M, N stage, state, RiskScore), whereas multivariate Cox analysis for OS was used to identify independent risk factors. To predict the overall survival of BLCA patients, a nomogram was drawn from the independent prognostic factors identified by multivariate Cox analysis, using the "cph" function in R, to visualize the prediction model and to predict the 1-, 3-, and 5-year survival probabilities of patients. Calibration curves were used to verify the validity of the nomogram.
GSEA enrichment analysis
GSEA enrichment analysis was performed using the clusterProfiler package on all genes in samples from the high- and low-risk groups in TCGA to explore the differences in function and associated pathways between the two groups. The 50 human cancer hallmark gene sets were downloaded from the Molecular Signatures Database (MSigDB) (http://www.gsea-msigdb.org/gsea/index.jsp), and GSVA [25] enrichment analysis was performed on all genes from samples in the high- and low-risk groups. The differences in GSVA scores between high- and low-risk samples were analyzed using the limma package.
Immune microenvironment analysis
To analyze the immune cell characteristics of the different risk groups, we used ssGSEA, as implemented in the R package gsva, to obtain the infiltration status of 28 immune cell types for each sample in TCGA-BLCA. The correlation between the risk score and immune infiltrating cells was analyzed using the Pearson correlation coefficient. In parallel, the T-cell inflammatory gene expression profile (GEP; 18 inflammatory genes) associated with ICB response was introduced to assess the predictive potential of the risk score for cancer immunotherapy. We also performed GO enrichment analysis of the GEP genes and used Cytoscape to plot the gene interaction regulatory network of the 4 most significantly enriched pathways, together with the PPI network. Finally, we extracted the expression levels of four immune checkpoints (PD-1, PD-L1, CTLA-4, and TIGIT) in BLCA and assessed their expression differences between the high- and low-risk groups using the Wilcoxon test. Differences in mutations between the high- and low-risk groups were analyzed using the "maftools" R package [26].
Chemotherapy drug sensitivity analysis
To further explore the potential guidance that risk scores could provide for chemotherapy, the IC50 values of drugs in the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Therapeutics Response Portal (CTRP) databases were obtained using the R package oncoPredict. The correlation between drug IC50 values and risk scores was assessed by Spearman's correlation analysis to screen drugs. We then compared the differences in IC50 between the high- and low-risk groups for drugs whose absolute correlation coefficient was greater than 0.4. The results were visualized as box plots and lollipop plots using the R package ggplot2.
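The drug screening step can be sketched as a Spearman correlation followed by the |ρ| > 0.4 filter. The example below is a plain-Python illustration with toy numbers; the drug names `drugA` and `drugB` and the helper names are hypothetical, and the IC50 values are assumed to have been predicted beforehand (for example by oncoPredict).

```python
import math
from typing import Dict, List

def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def screen_drugs(risk: List[float], ic50: Dict[str, List[float]],
                 cutoff: float = 0.4) -> Dict[str, float]:
    """Keep drugs whose |rho| between IC50 and risk score exceeds the cutoff."""
    rhos = {drug: spearman(risk, values) for drug, values in ic50.items()}
    return {drug: rho for drug, rho in rhos.items() if abs(rho) > cutoff}

# Toy data for five hypothetical patients.
risk = [1.0, 2.0, 3.0, 4.0, 5.0]
ic50 = {"drugA": [5.0, 4.0, 3.0, 2.0, 1.0], "drugB": [2.0, 1.0, 2.0, 1.0, 2.0]}
selected = screen_drugs(risk, ic50)
print(selected)
```

A negative ρ, as for `drugA` in the toy data, means that the predicted IC50 falls as the risk score rises, that is, higher-risk samples are predicted to be more sensitive to the drug.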
Discussion
BLCA is one of the most common malignancies worldwide, and its incidence is on the rise in many countries. Despite the many recent efforts toward the management of BLCA, its heterogeneous and aggressive characteristics still limit prognostic assessment [27]. Therefore, screening novel biomarkers to help develop patient-specific therapies and improve prognosis remains critical and urgent. Unlike bulk RNA-seq, which measures the average expression level of genes across cells, scRNA-seq has emerged as a useful tool for transcriptional stratification, making it possible to define cell subpopulations and to identify specific biomarkers and heterogeneity among different cell types in various cancers, including BLCA [28]. Therefore, in this study, we conducted a comprehensive analysis of bulk RNA-seq and scRNA-seq data to develop a risk model that exhibited excellent prognostic and predictive efficacy for immunotherapy response in BLCA.
First, we identified 7 core cell types in the scRNA-seq profile containing 13,490 cells, namely fibroblasts, B cells, T cells, monocytes, endothelial cells, smooth muscle cells, and epithelial cells. Cellular communication among them was highly frequent, but their expression levels were generally down-regulated in tumor samples; it is precisely such heterogeneity and interaction with the TME that are essential in tumorigenesis and therapy resistance [29]. The GO/KEGG analysis showed that the DEGs obtained in TCGA were mainly enriched in the cell cycle, the PI3K-Akt signaling pathway, and the MAPK signaling pathway, which may contribute to the proliferation and progression of BLCA. Existing studies indicate that alterations in cyclins, TP53, and Rb genes are ubiquitous in BLCA, with higher frequency in MIBC, and that therapy targeting aberrant cell-cycle regulators may be beneficial in BLCA. Numerous studies have also confirmed that activation of the PI3K-Akt and MAPK signaling pathways plays an essential role in the initiation and progression of BLCA [30–32]. Moreover, using WGCNA, we recognized the brown module, composed of 2334 genes, as the key module. A total of 123 candidate genes were collected by taking the intersection of the above three gene sets to enhance the stability of the signature.
Next, a 3-gene prognostic model involving MAP1B, PCOLCE2, and ELN was established by univariate Cox regression analysis and the LASSO algorithm. The ROC curve results demonstrated that it had promising predictive efficacy for prognosis, and it was an independent prognostic factor for OS in BLCA. In contrast to other models [33], our signature was derived by integrating multiple datasets and algorithms, was validated in internal and external validation sets independent of the training set, and showed AUC values between 0.590 and 0.813, suggesting higher reliability and relevance. We also analyzed the relationship between the model and clinicopathological characteristics; the results revealed that risk scores were significantly associated with patients' lymph node metastasis and tumor stage, indicating that the predictive value of the model is not limited to OS. Besides, we observed that the 3 signature genes are related to the tumor microenvironment. MAP1B is one of the microtubule-associated proteins (MAPs), which are involved in cytoskeleton composition. It has been reported that MAP1B is remarkably overexpressed in BLCA tissues and positively correlated with pathological tumor stage, grade, lymph node metastasis, and vascular invasion, and that knockdown of MAP1B could reverse chemoresistance by interrupting the cell cycle [34]. PCOLCE2 is a collagen-binding protein that functions as a pivotal component in tumor microenvironment remodeling [35], and a previous study also demonstrated that down-regulation of PCOLCE2 expression resulted in better OS [36]. ELN is a crucial element of the extracellular matrix that promotes breast cancer progression by enhancing the activation of matrix metalloproteinases (MMPs), but it is scarcely documented in BLCA [37]. We also considered that clinical characteristics could affect patient prognosis, so they were subjected to multivariate Cox analysis. The findings revealed that stage and risk score independently influenced the OS of BLCA patients; these factors were further used to construct a nomogram, and a calibration curve verified its favorable predictive ability.
Furthermore, all samples were divided into low- and high-risk groups according to the calculated risk score. We observed that the high-risk group was mainly enriched in immune processes and immune-related pathways, and therefore hypothesized that the risk score could be a potential predictive indicator for BLCA patients undergoing immunotherapy. We approached this from several perspectives, evaluating TMB, immune infiltration, immune checkpoints, and associated inflammatory genes. The high-risk group presented higher TMB and significantly elevated infiltration of multiple immune cell types, but significantly lower expression of inflammatory genes and immune checkpoints than the low-risk group, supporting the view that patients with low risk scores are more likely to benefit from immunotherapy.
Finally, potential druggable targets and corresponding compounds for BLCA patients were identified from the GDSC and CTRP databases on the basis of the developed prognostic model, primarily including cell cycle inhibitors (staurosporine and RO-3306), PI3K/mTOR pathway inhibitors (XL765, TGX-221, AZD8186, and AZD8055), and Wnt pathway inhibitors (CCT036477 and SB216763), which were consistent with the pathway enrichment results of the DEGs. Owing to the potency and promiscuity of these drugs, they have not yet been adopted in the clinic, but they may become promising antitumor drugs in the future as the technology advances.
By taking into account tumor heterogeneity, the interactions of each cell population, immune infiltration, TMB, and clinical characteristics, this work identified and constructed a novel prognostic model capable of accurately discriminating survival outcomes and immunotherapeutic response in BLCA, and its findings provide direct evidence for the stratified and precise treatment of BLCA patients. However, this study has several unavoidable limitations: (1) the sample size of the scRNA-seq data is relatively small; (2) the regulatory mechanisms of the signature genes in BLCA remain unclear, which is exactly what future work arising from this study should continue to explore.