Background
The latest GLOBOCAN 2020 statistics showed that gastric cancer (GC) was the fifth most frequently diagnosed cancer, with an especially high incidence in Eastern Asian countries. GC is also highly lethal: its mortality ranked fourth among cancer-related deaths, after lung, colorectal, and liver cancer [1, 2]. Targeted therapy and immunotherapy, now successfully applied in clinical practice, offer new strategies for advanced GC. However, their therapeutic efficacy has fallen short of the desired improvement, which emphasizes the need to focus not only on the tumor cells themselves but also on their surroundings: the tumor microenvironment (TME), which comprises all nontumor cells and their noncellular components, such as the extracellular matrix (ECM) and soluble molecules [3]. The crosstalk between tumor cells and the TME directly influences tumor cell growth and cancer progression. Among the different nontumor cells, cancer-associated fibroblasts (CAFs) deserve special attention.
Activated fibroblasts, also known as CAFs, have long been considered to coevolve with tumor cells as the dominant component of the tumor stroma [4]. CAFs secrete a variety of cytokines, growth factors and chemokines that form fertile soil for tumor growth; for example, CAF-secreted interleukin-6 (IL-6) or interleukin-11 (IL-11) promotes tumor progression and the development of chemoresistance [5–7]. In turn, tumor cells secrete numerous factors, such as transforming growth factor-β (TGF-β), epidermal growth factor (EGF) and C-X-C motif chemokine ligand 12 (CXCL12), that activate and educate CAFs [8]. Accumulating studies have shown that CAFs are involved in almost every aspect of tumor biology, including tumorigenesis, metabolism, invasion, metastasis and drug resistance, making CAFs an attractive therapeutic target [9–11].
However, progress in developing viable CAF-targeted therapies has been hampered by the highly dynamic heterogeneity of CAFs. CAFs have diverse potential cellular origins, including resident fibroblasts, mesenchymal stem cells, adipocytes, epithelial cells, mesothelial cells and endothelial cells, and form various subpopulations in different tumor types [8, 10, 12]. CAF heterogeneity might also arise from a common precursor, with cells at various stages of differentiation adopting distinct states in response to signaling cues inside and outside the TME. Currently, α-smooth muscle actin (αSMA), fibroblast-specific protein 1 (FSP1), fibroblast activation protein (FAP), platelet-derived growth factor receptor-α (PDGFRα), PDGFRβ, discoidin domain-containing receptor 2 (DDR2), insulin-like growth factor-binding protein 7 (IGFBP7), caveolin-1 (CAV1), CD90 (Thy1), tenascin-C (TNC), periostin (POSTN), podoplanin (PDPN), decorin (DCN), desmin, vimentin and integrin β1 are considered markers of activated CAFs, but no single biomarker can capture the whole CAF population or distinguish CAFs from all other cell types [8, 10, 13, 14]. As a result, identifying CAFs is difficult, which poses a major challenge for CAF-targeted treatment.
The prognostic value of CAFs is also an important reference for individualized treatment, and numerous studies have attempted to validate CAFs as pathological indicators of tumor prognosis. In this regard, αSMA has served as a hallmark prognostic marker. Immunohistochemical (IHC) analysis of hepatocellular carcinoma (HCC) showed significantly shorter disease-free survival in patients whose tumors overexpressed αSMA [15, 16], and the same negative correlation was observed in colorectal cancer (CRC) and breast cancer [17, 18]. Furthermore, signatures of genes differentially expressed in CAFs can be used as prognostic tools. In CRC, Alexandre et al. showed that high expression of a 4-gene signature from the CAF cluster identified patients with poor prognosis [19]. Zou et al. reported a 12-gene CAF signature whose high expression was significantly correlated with pathological features and increased clinical events of tumor progression in HCC [20]. However, these CAF-related signatures do not overlap, raising the same concern about nonspecificity. Therefore, clarifying the relationship between CAFs and prognosis, and the value of CAFs in predicting survival, will help accelerate the transition from basic CAF research to clinical application, and the exploration of new CAF biomarkers is of considerable significance.
In this study, we aimed to identify the gene signature of CAFs in GC through an integrated analysis of single-cell RNA sequencing (scRNA-seq) and bulk transcriptome RNA sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Based on CAF-related genes, we constructed a risk score for prognostic prediction by LASSO and found that the risk score can serve as an independent prognostic factor. We then established a nomogram that combines the risk score with other clinicopathological features into a quantitative score. In addition, we aimed to identify promising small-molecule drugs targeting the CAF-related gene signature in GC patients.
Materials and methods
Data acquisition
We downloaded the TCGA expression matrix of 414 GC and 36 normal gastric samples, of which 387 GC samples had overall survival (OS) data. The clinical information, including age, gender, pathologic stage, grade and fraction of genome altered, was obtained from the UCSC Cancer Genomics Browser. As a validation set, we downloaded the GSE62254 dataset, which includes 300 GC samples with OS information profiled on the GPL570 platform. In addition, we downloaded the single-cell transcriptome expression profiles of 158,641 cells from 40 samples (29 GC and 11 normal samples) in GSE183904 from the GEO database.
Estimation of immune infiltration
The Microenvironment Cell Populations-counter (MCP-counter) package has been applied to study the cellular composition of the microenvironment [21]; it uses a gene expression matrix to generate abundance scores for immune and stromal cell populations [22]. We therefore used the MCP-counter package in R to convert the mRNA data into estimated infiltration levels of nontumor cells within the TME.
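For readers unfamiliar with MCP-counter, the scoring idea can be sketched as follows. This is a simplified Python illustration, not the R package's implementation: it assumes that each population score is the mean of log2-transformed expression over that population's marker genes (equivalently, the log2 of a geometric mean). The marker lists, expression values, and the +1 pseudo-count are hypothetical.

```python
import math

# Hypothetical marker lists for illustration only; they are NOT the
# curated MCP-counter marker sets.
MARKERS = {
    "Fibroblasts": ["COL1A1", "COL3A1", "FAP"],
    "T cells": ["CD3E", "CD3D"],
}


def population_scores(expr, markers):
    """Score each cell population in one sample.

    expr maps gene -> linear-scale expression. Each score is the mean of
    log2(expression + 1) over the population's markers present in expr,
    i.e. the log2 of a geometric mean (the +1 pseudo-count is an assumption).
    """
    scores = {}
    for population, genes in markers.items():
        present = [g for g in genes if g in expr]
        if not present:
            scores[population] = float("nan")  # no markers measured
            continue
        logs = [math.log2(expr[g] + 1.0) for g in present]
        scores[population] = sum(logs) / len(logs)
    return scores


sample = {"COL1A1": 255.0, "COL3A1": 63.0, "FAP": 15.0, "CD3E": 7.0, "CD3D": 3.0}
print(population_scores(sample, MARKERS))  # {'Fibroblasts': 6.0, 'T cells': 2.5}
```

Scores computed this way are comparable across samples for the same population, which is how fibroblast abundance can be related to survival, but they are not proportions and should not be compared across populations.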
Processing of single-cell RNA-seq data
We created a “Seurat” object from the transcriptome sequencing data of 158,641 cells using the “Seurat” package [23]. The 2000 most highly variable genes, which account for cell-to-cell differences, were identified by variance analysis and then scaled and centered. These genes were used for linear dimensionality reduction by principal component analysis (PCA). The top 35 principal components (PCs) were used for graph-based clustering (resolution = 0.4) to identify distinct groups of cells, and the clusters were visualized with the uniform manifold approximation and projection (UMAP) dimensionality reduction method. Clusters were annotated with the “SingleR” package using the reference of 713 samples provided by the “HumanPrimaryCellAtlasData” function [24].
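The selection and scaling steps above can be illustrated with a small Python sketch. It is a simplification: Seurat's default variable-feature method fits a mean–variance trend, whereas this sketch substitutes a plain ranking by sample variance of (assumed log-normalized) expression, followed by per-gene z-scoring. The toy matrix is hypothetical, and k = 2 stands in for the 2000 genes used in the study.

```python
import statistics


def top_variable_genes(matrix, k):
    """Return the k genes with the largest sample variance across cells.

    matrix maps gene -> list of (log-normalized) expression values per cell.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(matrix, key=lambda g: (-statistics.variance(matrix[g]), g))
    return ranked[:k]


def scale_and_center(values):
    """Z-score one gene across cells (mean 0, SD 1); constant genes become 0."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Toy matrix: 3 genes x 4 cells (hypothetical values).
toy = {
    "GENE_A": [0.0, 5.0, 0.0, 5.0],
    "GENE_B": [1.0, 1.0, 1.0, 1.0],
    "GENE_C": [2.0, 3.0, 2.0, 3.0],
}
hvgs = top_variable_genes(toy, k=2)  # the study kept k = 2000
scaled = {g: scale_and_center(toy[g]) for g in hvgs}
print(hvgs)  # ['GENE_A', 'GENE_C']
```

The scaled matrix of selected genes is what would then be passed to PCA; centering matters because PCA components are computed around the mean of each feature.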
Risk assessment model construction and evaluation
The least absolute shrinkage and selection operator (LASSO) regression model is widely used to build clinical prediction models [25]. Based on the gene signature selected by LASSO, we calculated the risk score for each patient using the following formula:
$$\text{Risk score}=\sum_{i=1}^{n}\beta_i\,x_i$$
where \(\beta_i\) is the coefficient of gene \(i\), \(x_i\) is the expression value of gene \(i\), and \(n\) is the number of selected genes.
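The risk score can be illustrated with a minimal Python sketch. The gene names, coefficients, and expression values are hypothetical placeholders, not the 9-gene model fitted in this study; the function simply evaluates the weighted sum of each selected gene's expression and its coefficient for one patient.

```python
def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the genes selected by LASSO.

    expression:   {gene: expression value for one patient}
    coefficients: {gene: LASSO coefficient}; genes not listed are ignored.
    """
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing expression for: {sorted(missing)}")
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


# Hypothetical coefficients and expression values (not the study's model).
coefs = {"GENE_1": 0.5, "GENE_2": -0.25, "GENE_3": 0.125}
patient = {"GENE_1": 2.0, "GENE_2": 4.0, "GENE_3": 8.0, "OTHER": 3.0}
print(risk_score(patient, coefs))  # 1.0
```

Genes with negative coefficients lower the score, so a high score reflects the balance of risk-associated and protective genes rather than overall expression level.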
Clinical value of the risk assessment model
The samples were divided into high-risk and low-risk groups using the median risk score as the cutoff, and the two groups were compared for differential expression of human leukocyte antigen (HLA) genes and immune checkpoint genes.
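The median split can be sketched in a few lines of Python. The patient identifiers and scores are hypothetical, and assigning scores equal to the median to the low-risk group is our assumption, since the text does not specify how ties are handled.

```python
import statistics


def split_by_median(scores):
    """Assign each patient to 'high' or 'low' risk at the median score.

    scores maps patient_id -> risk score. Patients strictly above the median
    are 'high'; ties with the median go to 'low' (an assumption).
    """
    cutoff = statistics.median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return cutoff, groups


# Hypothetical risk scores for four patients.
scores = {"P1": 0.2, "P2": 1.5, "P3": -0.4, "P4": 0.9}
cutoff, groups = split_by_median(scores)
print(groups)  # {'P1': 'low', 'P2': 'high', 'P3': 'low', 'P4': 'high'}
```

A median cutoff yields two groups of roughly equal size, which keeps the downstream survival and expression comparisons balanced, at the cost of not being an optimized threshold.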
A nomogram was constructed with the R package “rms” to calculate each individual’s probability of OS. In the nomogram, each sample was assigned points according to the risk assessment model and clinical indicators, and the total points were mapped to the corresponding 1-, 3-, and 5-year survival probabilities. The calibration curve was drawn by comparing the survival probability predicted by the nomogram with the Kaplan–Meier estimate of the observed survival probability.
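Calibration relies on the Kaplan–Meier estimate of observed survival. The Python sketch below implements only the estimator itself, using hypothetical follow-up data for one group of patients; the binning of patients by predicted probability and the resampling performed by the “rms” package are not reproduced.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct death time. Subjects censored at a
    death time are counted as still at risk at that time (usual convention).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk  # conditional survival at t
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) from the last death time <= t (1.0 before any)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up (years) for one calibration group of patients.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
observed_3y = survival_at(curve, 3)  # compare with the mean predicted 3-year OS
```

On a calibration plot, each group contributes one point: the mean nomogram-predicted probability on one axis and this observed estimate on the other, with points on the diagonal indicating good calibration.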
Gene set enrichment analysis (GSEA)
To explore potential mechanisms underlying the differences between the risk groups (split at the median risk score), we performed GSEA [26] using the Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets from the C2 collection (“c2.cp.kegg.v7.4.symbols”) to identify enriched pathways [27]. P < 0.01 and false discovery rate (FDR) q < 0.05 were considered statistically significant.
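The core GSEA statistic is a running-sum enrichment score (ES). The Python sketch below computes that statistic for a toy ranked list and gene set, both hypothetical; the permutation-based P values, normalized enrichment scores, and FDR q values to which the significance thresholds are applied are not reproduced.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score, as in GSEA.

    ranked:   list of (gene, ranking metric), sorted from most positive
              (e.g. up in the high-risk group) to most negative.
    gene_set: set of gene symbols (e.g. one KEGG pathway).
    weight:   exponent on |metric| for hits (1 is the GSEA default).
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(m) ** weight for (_, m), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the list and not cover all of it")
    running, es = 0.0, 0.0
    for (_, metric), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(metric) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical ranked list (gene, signal-to-noise-like metric).
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es_top = enrichment_score(ranked, {"A", "B"})     # set concentrated at the top
es_bottom = enrichment_score(ranked, {"E", "F"})  # set concentrated at the bottom
```

A positive ES means the pathway's genes cluster toward the high-risk end of the ranking, and a negative ES means they cluster toward the low-risk end.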
Identification of potential small molecule drugs
The Connectivity Map (CMAP) database (http://www.broadinstitute.org) was used to predict potential drugs that may reverse or induce the biological states of GC based on the differentially expressed genes (DEGs). The DEGs were submitted to the CMAP database to search for small-molecule drugs that could be used for GC treatment. Enrichment scores range from −1 to 1; a negative score indicates that a drug tends to reverse the GC expression signature and may therefore be beneficial for GC treatment.
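To make the sign convention concrete, the Python sketch below follows the Kolmogorov–Smirnov-based connectivity score described by Lamb et al. (2006) for the original CMAP. We assume this scheme for illustration only; the current CLUE platform uses a different, weighted scoring method, and the gene ranks and drug labels below are hypothetical.

```python
def ks_score(tag_positions, n):
    """Signed KS statistic for one query tag list.

    tag_positions: 1-based ranks of the query genes in an instance's list of
    n genes ranked from most up- to most down-regulated by the drug.
    """
    v = sorted(tag_positions)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def raw_connectivity(up_positions, down_positions, n):
    """Raw score: ks_up - ks_down, or 0 when both KS scores share a sign."""
    ks_up = ks_score(up_positions, n)
    ks_down = ks_score(down_positions, n)
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


def scale_to_unit(raw):
    """Rescale raw scores to [-1, 1] by the largest and most negative scores."""
    p = max(raw.values())
    q = min(raw.values())
    scaled = {}
    for drug, s in raw.items():
        if s > 0:
            scaled[drug] = s / p
        elif s < 0:
            scaled[drug] = -(s / q)
        else:
            scaled[drug] = 0.0
    return scaled


n_genes = 10  # toy ranked-list length (hypothetical)
raw = {
    "mimics_GC": raw_connectivity([1, 2], [9, 10], n_genes),    # GC-up genes on top
    "reverses_GC": raw_connectivity([9, 10], [1, 2], n_genes),  # GC-up genes at bottom
    "no_link": raw_connectivity([1, 2], [1, 3], n_genes),       # same-sign KS scores
}
print(scale_to_unit(raw))  # {'mimics_GC': 1.0, 'reverses_GC': -1.0, 'no_link': 0.0}
```

A drug that pushes the genes upregulated in GC to the bottom of its ranked list, and the downregulated genes to the top, receives a negative score, which is why negative scores flag candidate therapeutics.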
Validation of the gene signature expression
The human gastric mucosal epithelial cell line GES-1 and six GC cell lines (AGS, HGC-27, MKN-45, SGC-7901, MGC-803 and BGC-823) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) or Roswell Park Memorial Institute (RPMI) 1640 medium with 10% fetal bovine serum under the recommended culture conditions. Total RNA was extracted using TransZol Up and reverse-transcribed into cDNA, which was mixed with primers (Supplementary Table S1) and amplified by quantitative real-time PCR following the manufacturers’ protocols. The relative mRNA expression of the signature genes was calculated with the \(2^{-\Delta\Delta C_t}\) method, using glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as the internal reference gene.
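The 2^−ΔΔCt calculation can be sketched in Python as follows. The Ct values are hypothetical, GES-1 is assumed to serve as the calibrator, and averaging technical replicates before computing ΔCt is our assumption rather than a stated part of the protocol.

```python
import statistics


def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression of a sample versus a calibrator by 2^-ΔΔCt.

    ΔCt  = Ct(target gene) - Ct(reference gene, e.g. GAPDH)
    ΔΔCt = ΔCt(sample) - ΔCt(calibrator)
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical triplicate Ct values; replicates are averaged first.
gc_line = {"target": [24.0, 24.1, 23.9], "GAPDH": [18.0, 18.1, 17.9]}
ges1 = {"target": [26.0, 26.1, 25.9], "GAPDH": [18.0, 17.9, 18.1]}
fc = fold_change(
    statistics.fmean(gc_line["target"]), statistics.fmean(gc_line["GAPDH"]),
    statistics.fmean(ges1["target"]), statistics.fmean(ges1["GAPDH"]),
)
print(round(fc, 2))  # 4.0
```

Here the target reaches threshold two cycles earlier in the GC line after GAPDH normalization (ΔΔCt = −2), which corresponds to roughly fourfold higher expression, assuming near-100% amplification efficiency as the method requires.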
Twenty pairs of GC tissues and matched adjacent normal tissues were used to validate the mRNA expression of the gene signature, following the same experimental and statistical procedures described above.
The protein expression levels of the signature genes in normal and malignant tissues were compared using The Human Protein Atlas (HPA; https://www.proteinatlas.org/).
Statistical analysis
All statistical analyses were performed using R 4.0.2. The “limma” package was used to identify DEGs between tumor and normal samples, and the “survival” package was used to assess the association of each gene with survival. The predictive accuracy of the risk assessment model for survival was assessed by time-dependent receiver operating characteristic (ROC) curve analysis, survival rates were calculated using the Kaplan–Meier method, and differences between survival curves were tested with the log-rank test. Student's t-test was used for comparisons between two groups. All P values were two-tailed.
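The two-group log-rank test used to compare survival curves can be sketched in Python as below. This is an illustrative re-implementation with hypothetical data, roughly what survival::survdiff computes for two groups, not the R code used in the study; the P value uses the upper tail of a chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times; events_*: 1 = death observed, 0 = censored.
    Returns (chi-square statistic with 1 df, p value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this time
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the two groups")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return chi2, p_value


# Hypothetical groups: high-risk patients die earlier than low-risk ones.
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 3))  # 5.052
```

The test weights every death time equally, so it is most sensitive when the hazards of the two groups differ proportionally over follow-up.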
Discussion
Emerging clinical applications of targeted therapy and immunotherapy underscore the importance of the TME, whose complex regulatory network poses great challenges to therapeutic efficacy. As the dominant component of the TME, with marked functional heterogeneity, CAFs have become an intense area of research. On the one hand, further exploration of CAF-related tumor mechanisms is important for developing novel therapeutic targets; on the other hand, prognostic prediction is a vital part of clinical decision-making. However, the specific markers and origin of CAFs remain controversial. In this study, we integrated single-cell and bulk RNA sequencing analyses to identify feature genes of CAFs and developed a novel CAF-related signature in GC.
As the dominant component of the tumor stroma, CAFs secrete various components that help construct and remodel the ECM. We observed that the higher the abundance of fibroblasts, the poorer the survival of patients with GC. This may be because the dense ECM forms a physical barrier that promotes tumor progression and prevents drug penetration [28]. Consistent with our results, fibroblast content has been validated as a prognostic predictor in numerous tumors [29–33]. In particular, scirrhous gastric cancer (SGC), a highly aggressive GC subtype with a very poor prognosis, is characterized by rapid infiltration and proliferation of tumor cells with extensive stromal fibrosis [34]. To study the biological behavior of SGC within this fibrotic TME, researchers have established SGC cell lines and mouse models [35], gradually delineating the crosstalk between tumor cells and CAFs [34, 36].
CAFs have been shown to promote migration and epithelial–mesenchymal transition (EMT) in GC by activating the JAK2/STAT3 signaling pathway through IL-6 secretion [5] and the ERK1/2-SP1-ZEB2 pathway through IL-33 secretion [37]. Other CAF-derived factors, such as IL-11, IL-22, IL-17a, FGF9, TGFβ1, lumican, LOXL2, SDR1 and CXCL12, are also involved in GC migration and invasion [38–40]. Likewise, CAF-derived galectin-1 and HGF can promote angiogenesis, supporting GC progression [41, 42]. Acquired drug resistance severely worsens patient prognosis, and numerous studies have shown that CAFs play an important role in mediating drug resistance [43]. CAFs can regulate drug resistance through IL-11 secretion, which activates the gp130/JAK/STAT3/Bcl2 pathway [7], and can activate the PI3K/AKT signaling pathway by producing IL-8, leading to NF-κB activation and cisplatin resistance [44]. In addition, Yang et al. found that CAFs promote chemoresistance by mediating VEGF/NRP2 signaling via CXCL12 secretion [45]. Emerging evidence indicates that CAFs can also affect tumor progression and drug resistance by releasing extracellular vesicles (EVs). CD9-positive exosomes derived from CAFs are taken up by SGC cells and promote cancer cell migration and invasion by activating the MMP2 signaling pathway [46]. Similarly, CAF-derived exosomal circ_0088300 promotes GC malignancy through the miR-1305/JAK/STAT1 axis [47], and annexin A6 in CAF-derived EVs induces drug resistance by activating β1 integrin-FAK-YAP signaling [48]. In contrast, CAF-derived exosomal miRNA-34 and miRNA-139 can inhibit GC progression [49, 50]. Collectively, the diverse biological functions of CAFs are gradually being unraveled, and their clinical value merits further exploration.
In this study, we analyzed scRNA-seq profiles to reveal the fibroblast subset and identify CAF-related marker genes. A total of 280 feature genes were obtained by intersecting the stable CAF-related DEGs with the prognosis-related genes. Using LASSO Cox regression, we constructed and validated a novel 9-gene CAF-related signature to predict the prognosis of GC, and univariate and multivariate Cox regression analyses confirmed the signature as an independent predictor of OS. Most of these CAF-related genes are associated with tumorigenesis and cancer progression, including GLT8D1 [51], GPX3 [52], NRP1 [53], PPP1R26 [54], SERPINE1 [55], and TMSB15A [56]. We focused on SERPINE1, one of the upregulated genes in the signature, which encodes plasminogen activator inhibitor-1 (PAI-1). SERPINE1 overexpression has been implicated in progression and unfavorable outcomes in various cancers [55]. Sakamoto et al. showed that CAF-derived PAI-1 stimulated esophageal squamous cell carcinoma (ESCC) cell migration and invasion by binding LRP1 and inducing phosphorylation of Akt and Erk1/2 [57]. Furthermore, CAFs induced M2 polarization of macrophages by secreting CXCL12, which in turn induced PAI-1 secretion and enhanced the malignant behavior of HCC [58]. Thus, our gene signature may serve as a reference for targeting CAFs in tumor research, although the detailed mechanisms in GC warrant further investigation.
Subsequently, we sought promising small-molecule drugs targeting the CAF-related gene signature in GC patients. Traditional Chinese herbal extracts have been shown to slow GC progression. For example, triptonide, a small molecule (MW 358) extracted from Tripterygium wilfordii Hook F, efficiently inhibits GC development and metastasis by blocking the oncogenic Notch1 and NF-κB signaling pathways [59]. In a series of studies, Wang et al. found that several natural products inhibit CAF activity: by correcting aberrant microRNA expression, astragaloside IV and treponil restricted the malignancy-promoting capacity of CAFs [60, 61]. In contrast, paeoniflorin suppressed the malignancy-promoting capacity of CAFs by reducing their IL-6 secretion [62]. In this study, we identified three candidate small-molecule drugs targeting CAFs and focused on the one with the most significant P value, genistein, a phytoestrogen that occurs naturally, primarily in legumes. Genistein has anticancer properties, and studies have shown that it can suppress the growth of various cancer cells by targeting distinct biological processes [63]. In GC, genistein inhibits tumor cell proliferation by suppressing cancer stem cell-like properties and inducing G2/M arrest [64, 65], and it improves chemosensitivity by inhibiting ERK1/2 activity [64]. Nevertheless, the practical application of these potentially therapeutic small-molecule compounds requires further exploration and validation.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.