Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide, with non-small cell lung cancer (NSCLC) accounting for about 85% of cases and lung adenocarcinoma (LUAD) being the most common subtype [
1]. Despite advancements in clinical treatments and individualized therapies, the 5-year overall survival (OS) rate for LUAD remains low at around 16% [
2]. Identifying appropriate therapeutic strategies for LUAD patients therefore remains an active research focus.
Genomic instability, which increases the likelihood of acquiring mutations [
3], is a hallmark of most cancers, including lung cancer [
4], and is caused by factors such as smoking, air pollution, and radiation exposure. The total number of somatic mutations can be quantified by tumor mutation burden (TMB) [
5], which is associated with poor prognosis in certain cancers, including NSCLC [
6]. Abnormalities in transcriptional and post-transcriptional regulation are related to genomic instability, suggesting the potential of molecular markers as a quantitative measure of genomic instability [
7]. For example, Habermann et al. analyzed the gene expression profiles of 48 breast cancer specimens and identified 12 genes characterized by genomic instability [
8]. Geng et al. established a seven-gene signature associated with genomic instability that could predict the prognosis of patients with LUAD [
9]. Genomic instability is thus crucial for the occurrence and development of LUAD, and the effects of genes derived from genomic instability on LUAD progression warrant further exploration.
The tumor microenvironment (TME) refers to the internal and external environment in which tumors arise, grow, and metastasize. The TME comprises immune cells, stromal cells, and various cytokines [
10], among which immune cells play a crucial role in tumor development. For instance, current studies suggest that tumor-infiltrating B lymphocytes (TIL-B) can promote anti-tumor immunity through their unique antigen-presenting mode, leading to the persistence of an immune “hot” TME involving T cells, bone marrow cells, and natural killer cells [
11]. In addition, the expression of immune checkpoints on tumor cells helps them evade host immune surveillance [
12], whereas inhibiting immune checkpoints with immune checkpoint inhibitors (ICIs) can restore immune cell function. The expression levels of PD1 and PD-L1 significantly affect LUAD patients’ response to ICI therapy [
13]. Moreover, the roles of specific chemokines and their corresponding receptors in the immune therapy response for LUAD are complex. They may affect tumor immune cell infiltration, immune regulation, tumor growth, and metastasis [
14]. However, the association between genes derived from genomic instability and the composition of TME, expression of immune checkpoints and chemokines, and ICI efficacy remains ambiguous.
In this study, we constructed a genetic signature associated with genomic instability (GSAGI) comprising five genes: ANLN, RHOV, KRT6A, SIGLEC6, and KLRG2. The GSAGI showed favorable prognostic performance in LUAD patients, and a Nomogram model built on it was more accurate. In addition, patients in the two subgroups distinguished by the GSAGI may differ in TME because of different intercellular communication patterns. Significant differences in chemokine and immune checkpoint expression profiles may also influence the optimal treatment of LUAD patients.
Materials and methods
Data collection and preprocessing
The RNA-Sequencing, somatic mutation, and clinical data of LUAD patients were downloaded from the TCGA (
https://portal.gdc.cancer.gov) database. The Ensembl (
http://www.ensembl.org/) database was used to annotate mRNA. A total of 499 clinical samples with mRNA expression profile data and survival data were randomly divided into a “training set” (n = 251) and a “testing set” (n = 248) with the R package “caret”. In addition, LUAD patients with paired RNA-seq data and clinical data in GSE31210 (n = 246), GSE30219 (n = 85), GSE50081 (n = 127), GSE42127 (n = 133), and GSE41271 (n = 182) from the GEO (
https://www.ncbi.nlm.nih.gov/geo/) database served as independent external testing sets. Detailed information on these patients is listed in Table S1 (Additional file 1: Table S1). GSE126045 (n = 16) was used as an independent external testing set for immunotherapy prediction. The R package “DESeq2” was used to screen for differentially expressed genes.
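The random division into training and testing sets can be illustrated with a minimal sketch. This is not the authors' code: the study used the R package “caret”, whose partitioning function may stratify on the outcome, whereas the sketch below performs a plain seeded shuffle. The sample identifiers, the seed, and the function name `split_train_test` are all hypothetical.

```python
import random


def split_train_test(sample_ids, n_train, seed=0):
    """Randomly split sample IDs into a training list and a testing list."""
    if not 0 <= n_train <= len(sample_ids):
        raise ValueError("n_train must be between 0 and the number of samples")
    shuffled = list(sample_ids)
    # A dedicated, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 499 TCGA-LUAD samples.
samples = [f"sample_{i:03d}" for i in range(499)]
train, test = split_train_test(samples, n_train=251, seed=42)
print(len(train), len(test))  # 251 248
```

Fixing the seed is what makes the 251/248 partition repeatable across runs; a different seed gives a different, equally valid split.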
Functional enrichment analysis
KEGG is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information [
15]. It is now one of the most utilized biological databases because of its practical values. Together with an improved annotation procedure for KEGG Orthology assignment, an increasing number of eukaryotic genomes have been included in KEGG for better representation of organisms in the taxonomic tree [
16,
17]. The R packages “clusterProfiler” and “org.Hs.eg.db” were used to perform GO and KEGG (
www.kegg.jp/kegg/kegg1.html) pathway enrichment analysis of all candidate differential genes [
18]. The threshold for significant pathway enrichment was set to P-value < 0.05, and the results were visualized using the R package “ggplot2”.
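Over-representation analysis of this kind is commonly based on a hypergeometric test, which asks how likely it is to see at least the observed overlap between a gene list and a pathway by chance. The sketch below illustrates that test only; it does not reproduce the internals of “clusterProfiler”. The function name and the toy numbers are assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background universe
    n_in_pathway: background genes annotated to the pathway
    n_selected:   size of the candidate gene list
    n_overlap:    candidate genes that fall in the pathway
    """
    upper = min(n_in_pathway, n_selected)
    # Count the ways to draw k pathway genes and (n_selected - k) other genes.
    numerator = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / comb(n_background, n_selected)


# Toy example: 10 background genes, 4 in the pathway, 3 selected, 2 overlapping.
p = enrichment_p_value(10, 4, 3, 2)
print(p < 0.05)  # False
```

With these toy numbers the probability is 40/120, well above the 0.05 threshold, so the pathway would not be called enriched.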
Construction and validation of the GSAGI
First, univariate Cox regression analysis was performed on the candidate differential genes in the TCGA-LUAD training set using the R package “survival” to screen for genes that were significantly correlated (P-value < 0.05) with survival in LUAD patients. The genes with prognostic value were then filtered by the least absolute shrinkage and selection operator (LASSO) algorithm, and the penalty parameters were adjusted by 10-fold cross-validation with the R packages “glmnet” and “survival”. Finally, multivariate Cox regression analysis was performed on the genes screened by the LASSO algorithm to obtain the best candidate genes. We constructed the following risk score formula using the expression levels (expr) of the best candidate differential genes and the regression coefficients (coef) from the multivariate Cox regression analysis: \(\text{Risk score}=\sum_{i=1}^{n}\text{expr}_{i}\times \text{coef}_{i}\)
Here, \(\text{expr}_{i}\) represents the expression level of the ith gene, and \(\text{coef}_{i}\) represents the coefficient of the ith gene. Patients were divided into high-risk and low-risk groups using the median risk score of the samples in each data set as the threshold value. Survival curves were plotted by the Kaplan-Meier method. The R packages “survival” and “survminer” were used to compare the survival of patients in the high-risk and low-risk groups. P-value < 0.05 indicates significance. The predictability of the prognostic model was assessed by plotting time-dependent receiver operating characteristic (ROC) curves with R package “survROC”.
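The risk score and the median split can be sketched as follows. The coefficients and expression values are placeholders chosen for easy arithmetic, not the fitted values from this study, and the rule that a score exactly equal to the median falls in the low-risk group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median


def risk_score(expr, coef):
    """Sum of expression level times Cox coefficient over the signature genes."""
    return sum(expr[gene] * coef[gene] for gene in coef)


def assign_risk_groups(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores.values())
    # Assumption: scores equal to the median are assigned to the low-risk group.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Placeholder coefficients (NOT the values estimated in the study).
coef = {"ANLN": 0.5, "RHOV": 0.25, "KRT6A": 0.25, "SIGLEC6": -0.5, "KLRG2": -0.25}

# Hypothetical expression levels for four patients.
patients = {
    "P1": {"ANLN": 4.0, "RHOV": 2.0, "KRT6A": 4.0, "SIGLEC6": 1.0, "KLRG2": 2.0},
    "P2": {"ANLN": 1.0, "RHOV": 2.0, "KRT6A": 0.0, "SIGLEC6": 4.0, "KLRG2": 4.0},
    "P3": {"ANLN": 2.0, "RHOV": 0.0, "KRT6A": 2.0, "SIGLEC6": 1.0, "KLRG2": 0.0},
    "P4": {"ANLN": 2.0, "RHOV": 2.0, "KRT6A": 2.0, "SIGLEC6": 2.0, "KLRG2": 2.0},
}

scores = {pid: risk_score(expr, coef) for pid, expr in patients.items()}
groups = assign_risk_groups(scores)
print(scores)  # {'P1': 2.5, 'P2': -2.0, 'P3': 1.0, 'P4': 0.5}
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

Because the cutoff is recomputed from whichever cohort is passed in, the same function mirrors the per-data-set median threshold described above.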
Nomogram model construction and validation
In the TCGA-LUAD set, we performed a multivariate Cox regression analysis on patients’ age, gender, disease stage, smoking history, EGFR mutation status, and grouping information based on the GSAGI. P-value < 0.05 indicates statistical significance. Finally, the GSAGI and the disease staging were used to construct a Nomogram model as a quantitative analysis tool. The R package “rms” [
19] was used to produce it. Calibration curves and ROC curves validated the predictive performance of this Nomogram model.
Gene set enrichment analysis (GSEA)
To explore this gene signature’s impact on LUAD patients’ biological function, we downloaded “c5.all.v7.0.entrez.gmt” from the MSigDB database (
http://www.gsea-msigdb.org/gsea/downloads.jsp) for GSEA annotation. Selected pathways were visualized with the R package “enrichplot” [
20].
Predicting chemotherapy response levels
The R package “pRRophetic” [
21] was used to infer the sensitivity of LUAD patients to chemotherapeutic agents. The RNA expression profiles of 68 LUAD cell lines were obtained from the Broad Institute’s Cancer Cell Line Encyclopedia (CCLE,
https://portals.broadinstitute.org/ccle/) [
22]. IC50 values of LUAD cell lines to chemotherapeutic drugs were obtained from Genomics of Drug Sensitivity in Cancer (GDSC,
https://www.cancerrxgene.org/) [
23].
Cell viability assay
LUAD cells were seeded in 96-well plates at 5,000 cells per well and incubated overnight. The cells were treated with different concentrations of Etoposide (20, 40, 60, 80, 100, 120, 150, and 180 µM), followed by 24-hour incubation. Next, the cells were treated with WST-8 from CCK-8 (NCM Biotech, Suzhou, China) for 0.5-1 h, and their viability was then measured by detecting the absorbance at 450 nm.
In a 6-well plate, 3000 LUAD cells were plated in triplicate and incubated overnight, then grown for ten days in a growth medium with Etoposide (0, 0.5, 1.0, 1.5, and 2.0 µM). We then washed the cells thrice with PBS, fixed them in cold methanol for 20 min, and cleaned and stored them. Fixed cell colonies were visualized by incubating the cells with 0.5% (w/v) crystal violet for 0.5 h, and excess crystal violet was removed by washing with PBS. Visible colonies formed by LUAD cell growth were identified with ImageJ software (version 1.8.0.112). Colony numbers reflect cell survival and proliferation.
Evaluation of immune cell infiltration
The ESTIMATE algorithm calculated the immune score and tumor purity [
24]. The R package “CIBERSORT” evaluated the infiltration of 22 kinds of immune cells in LUAD patients with different risk scores [
25]. The differences in the degree of B-cell infiltration between patients in high-risk and low-risk groups were compared using the TIMER database (
http://timer.cistrome.org/) [
26].
Prediction of the response to immunotherapy
The Immunophenoscore (IPS) was obtained according to The Cancer Immunome Atlas (TCIA,
https://tcia.at/home), and the higher the IPS, the more responsive the patients were to immunotherapy [
27]. In addition, the TIDE website (
http://tide.Dfci.harvard.edu/) was used to predict the degree of response to immunotherapy in patients with different risk scores based on their transcriptomic data [
28].
Single-cell data quality control and identification of major cell types
Twenty-six LUAD samples from the GSE148071 dataset were used for single-cell level analysis. After removing cells with gene expression values below 200 or above 5000 and discarding cells with mitochondrial content higher than 10%, 4593 cells remained for downstream analysis. The “NormalizeData” and “ScaleData” functions in the “Seurat” R package were used to normalize the expression matrix. The “FindVariable” function was then applied to select the top 2000 variable genes, followed by principal component analysis. The first ten principal components and a resolution of 0.5 were used for cell clustering with the “FindClusters” function. The “FindVariable” function generated groups of differentially expressed genes (DEGs). We manually annotated cell types for each cell cluster based on the normalized expression of DEGs in conjunction with typical markers from the CellMarker (
http://xteam.xbio.top/CellMarker/) website [
29] (Additional file 2: Table
S2). Visualization was performed by uniform manifold approximation and projection (UMAP).
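The quality-control filter described above can be sketched as a simple per-cell predicate. The sketch assumes that the 200 and 5000 cutoffs apply to the number of genes detected per cell, which is the usual convention but is not stated explicitly in the text. The barcodes and metric values are hypothetical, and the study itself ran this step in the “Seurat” R package.

```python
def passes_qc(n_genes, pct_mito, min_genes=200, max_genes=5000, max_pct_mito=10.0):
    """Return True if a cell survives the gene-count and mitochondrial filters."""
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


# Hypothetical per-cell quality metrics.
cells = [
    {"barcode": "cell_A", "n_genes": 150, "pct_mito": 2.0},    # too few genes
    {"barcode": "cell_B", "n_genes": 1800, "pct_mito": 4.5},   # passes
    {"barcode": "cell_C", "n_genes": 6200, "pct_mito": 3.0},   # too many genes
    {"barcode": "cell_D", "n_genes": 2500, "pct_mito": 12.0},  # high mitochondrial content
]

kept = [c["barcode"] for c in cells if passes_qc(c["n_genes"], c["pct_mito"])]
print(kept)  # ['cell_B']
```

The upper gene-count bound is generally intended to drop likely doublets, while the mitochondrial bound drops stressed or dying cells.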
Cell-cell communication analysis
The R package “CellChat” was used to infer cellular communication between tumor cells and other cell types [
30]. The “netVisual_bubble” function was used to compare the receptors and ligands through which tumor cells interact with other cells between patients in the low-risk and high-risk groups.
Cell lines and RNA extraction and real-time quantitative PCR (qRT-PCR)
The human lung adenocarcinoma cell lines (A549, PC-9, NCI-H1299, and NCI-H1975) and the human bronchial epithelial cell line BEAS-2B were gifted by Zhiyou Fang’s group at the Center for Basic Medicine, Institute of Health and Medical Technology, Hefei Institute of Material Science, Chinese Academy of Sciences. All cell lines were cultured in RPMI-1640 containing 1% (100×) streptomycin/penicillin and 10% FBS in a humidified environment at 37 °C with 5% CO2. Total RNA was extracted from the cell lines with an RNA preparation kit (TransGen Biotech, Beijing). The reverse transcription reaction system consisted of total RNA, 2 µg; Anchored Oligo(dT)18, 1 µL; 2×TS Reaction Mix, 1 µL; TransScript RT/RI Enzyme Mix, 10 µL; and RNase-free water, added to a total volume of 20 µL. The cDNA was prepared by placing the reverse transcription system in HiScript II Q RT SuperMix for qPCR (+ gDNA wiper) (Vazyme Biotech, Nanjing) at 42 °C for 5 min and then 85 °C for 5 s. ChamQ Universal SYBR qPCR Master Mix (Vazyme Biotech, Nanjing) was used for quantitative RT-PCR analysis. RT-PCR analysis was performed in triplicate using an X 960 Real-time PCR system (Heal Force). The three-step amplification procedure was as follows: first, denaturation at 94 ℃ for 30 s; second, 40 cycles of denaturation at 95 ℃ for 10 s, annealing at 55–60 ℃ for 20 s, and extension at 72 ℃ for 20 s; third, a final extension at 72 ℃ for 20 s. Finally, the melting curve was output. Relative mRNA expression was calculated using the \(2^{-\Delta\Delta CT}\) method. β-actin was used as the internal control gene. Primers used in the study (Additional file 3: Table S3) were purchased from Sangong Bioengineering Co Ltd (Shanghai, China).
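The relative quantification step can be made concrete with a short worked sketch of the standard 2^−ΔΔCT calculation. The CT values below are hypothetical, and the use of BEAS-2B as the calibrator sample is an assumption for illustration; β-actin as the reference gene follows the text.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the 2^(-ddCT) method.

    Each dCT is the target CT minus the reference-gene CT (here beta-actin);
    ddCT is the sample dCT minus the calibrator dCT.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: a tumor cell line versus an assumed BEAS-2B calibrator.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=16.0,          # dCT = 6
    ct_target_calibrator=25.0, ct_ref_calibrator=16.0,  # dCT = 9
)
print(fold_change)  # 8.0
```

A ΔΔCT of −3 corresponds to an eightfold higher expression in the sample than in the calibrator, because each cycle represents one doubling under the method's assumption of near-100% amplification efficiency.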
Tissue samples and immunohistochemistry (IHC)
Tissue sections from six LUAD patients were collected at the Hefei Cancer Hospital of the Chinese Academy of Sciences, and IHC staining was performed on normal and tumor samples. Tissue sections were deparaffinized and rehydrated through graded ethanol. Citrate buffer was used for antigen retrieval. 3% H2O2 was used to block endogenous peroxidase activity. 5% bovine serum albumin (BSA) was used to block non-specific binding. The sections were then incubated with primary antibodies against the protein of interest overnight at 4 °C. The primary antibody was detected with a secondary antibody conjugated with horseradish peroxidase (HRP) and visualized using diaminobenzidine (DAB) as a substrate. The sections were counterstained with hematoxylin and mounted with coverslips. The primary antibodies used were mouse monoclonal anti-human ANLN antibody (1:200; Santa Cruz, sc-271814), mouse monoclonal anti-human RHOV antibody (1:200; Santa Cruz, sc-515072), and rabbit monoclonal anti-human KRT6A antibody (1:200; Proteintech, 10590-1-AP). To quantify the positive staining of IHC slides, we used ImageJ software [
31].
Statistical analysis and visualization
R version 4.1.2, GraphPad Prism version 9.0, and ImageJ version 1.8.0.112 were used for statistical analysis and visualization. |Log2 fold change (FC)| ≥ 1 and adjusted P-value < 0.05 were used as the thresholds for differentially expressed genes. P-value < 0.05 was considered statistically significant. Kaplan-Meier curves were used to determine differences in survival between patient groups, and log-rank tests were used to calculate statistical significance. Non-normally distributed continuous variables were analyzed with the Wilcoxon test. Pearson’s correlation coefficient was used to explore the correlation between two continuous variables. One-way ANOVA was used to evaluate differences among more than two groups.
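The differential-expression threshold stated above amounts to a two-condition filter on each gene. A minimal sketch follows, using hypothetical gene names and statistics in place of real “DESeq2” output; the function name `is_deg` is likewise an assumption.

```python
def is_deg(log2_fold_change, adjusted_p, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Apply the |log2FC| >= 1 and adjusted P-value < 0.05 thresholds."""
    return abs(log2_fold_change) >= lfc_cutoff and adjusted_p < padj_cutoff


# Hypothetical results: gene -> (log2 fold change, adjusted P-value).
results = {
    "GENE_A": (2.3, 0.001),    # up-regulated and significant
    "GENE_B": (-1.0, 0.04),    # down-regulated, exactly at the fold-change cutoff
    "GENE_C": (0.8, 0.0001),   # significant but effect too small
    "GENE_D": (-3.1, 0.20),    # large effect but not significant
}

degs = [gene for gene, (lfc, padj) in results.items() if is_deg(lfc, padj)]
print(degs)  # ['GENE_A', 'GENE_B']
```

Taking the absolute value of the fold change keeps both up-regulated and down-regulated genes, and the inclusive comparison matches the "≥ 1" wording of the threshold.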
Discussion
The development of LUAD is highly complex and closely related to the abnormal expression of specific genes. In the past few decades, many therapeutic targets and predictive biomarkers have been identified due to the continuous development of high-throughput sequencing technologies [
42]. LUAD is a tumor type with high genomic instability, which, as an underlying basis of cancer hallmarks, can accelerate the acquisition of genetic diversity and promote the emergence of various cancer characteristics. Genomic instability is crucial for the progression and recurrence of cancer and is associated with poor prognosis, metastasis, and treatment resistance [
43]. Therefore, an in-depth understanding of the molecular mechanisms that affect the genomic instability of LUAD can provide more accurate biomarkers for diagnosing and treating tumors. However, there are no reliable biomarkers to detect the association between genes related to genomic instability and the tumor microenvironment and immune features of LUAD patients.
In this study, we used the U.S. Food and Drug Administration (FDA) criterion of TMB ≥ 10 to define high TMB. The 98 LUAD samples with TMB ≥ 10 from the TCGA database were used as the high TMB group, and the 98 samples with the lowest TMB values were used as the low TMB group. We identified 347 genes that are associated with the occurrence of LUAD and genomic instability. Based on this, we established a GSAGI consisting of five genes that can more accurately predict the prognosis of LUAD patients. We found that patients in the high-risk group had higher levels of genomic instability. This is consistent with the conclusion of Owada-Ozaki et al. that NSCLC patients with lower TMB levels may have a better prognosis [
8]. Further, we found that patients in the high-risk group were more suitable for chemotherapy, and the abnormal activation of microtubule, microfilament, and other pathways in these patients may underlie their better response to chemotherapy. Moreover, the predicted results from the TIDE website suggest that LUAD patients in the low-risk group are better suited to receive immunotherapy. This contradicts the previous view that immunotherapy efficacy is better in LUAD patients with higher TMB levels [
44]. Surprisingly, a study by Nie W et al. in 2020 showed that NSCLC patients with low TMB may be more suitable for anti-PD-1/PD-L1 immunotherapy because of significantly higher levels of Th1 and Th17 cells [
45].
Further studies found that LUAD patients in the low-risk group had higher levels of immune cell infiltration, especially B-cell infiltration. The significantly higher levels of immune cell infiltration in the low-risk group may have contributed to the persistence of a “hot” tumor immune microenvironment. Thus, the low-risk group appeared to be more suitable for ICI treatment. In addition, PD-L1 expression and TMB were not significantly correlated in most cancer subtypes, and the non-overlapping effects of PD-L1 expression and TMB on the response rate to PD-1/PD-L1 inhibitors could be widely used to classify immunosubtypes of cancers. PD-L1 expression and TMB may each provide information for the use of ICI [
46]. Previous studies have also shown that PD-L1 is a crucial indicator for ICI treatment in LUAD patients [
47]. The negative correlation between the risk score and the expression of immune checkpoints, chemokines, and their receptors may also be one of the reasons why LUAD patients in the low-risk group are more suitable for ICI treatment. Accordingly, we believe that although high TMB levels may increase the chance of the immune system recognizing and attacking tumor cells, TMB is not the only factor affecting patients’ immune response. Immune checkpoints, chemokines and their receptors, and the TME are also important factors influencing the efficacy of ICI in patients with LUAD. At the same time, the GSAGI may better identify LUAD patients who are more suitable to receive immunotherapy.
Findings at the single-cell level suggest that this GSAGI may influence the TME of LUAD. Tumor cells tended to receive higher GSAGI scores than immune cells. CellChat analyses showed stronger interactions between tumor cells and B cells, T cells, and macrophages in patients in the low-risk group. The receptor-ligand pairs mediating interactions between tumor cells and immune cells in LUAD patients (MDK-NCL, ANXA1-FPR1, LGALS9-CD45, LGALS9-CD44, CXCL-CXCR, etc.) may serve as targets for immunotherapy.
The high expression of ANLN, RHOV, and KRT6A is associated with significantly worse survival in LUAD patients (Additional file 4: Figure
S6A,
S6B, and
S6C). ANLN is a myosin-binding protein whose expression level and localization are regulated by the cell cycle, and it is an essential component of cytokinesis [
48]. The loss of ANLN may affect the progression of the cell cycle. ANLN has been reported to be significantly upregulated in various tumors [
49]. Genomic instability arising from alterations in the cell cycle is one of the characteristics of many cancers [
50]. This may explain the existence of some degree of positive correlation between ANLN and TMB. RHOV is an atypical member of the Ras superfamily of small GTPases. It regulates the cell cycle, promotes cell differentiation, and affects cell adhesion and migration [
51]. It has been reported that RHOV activates the JNK/c-Jun pathway, leading to the metastasis of LUAD. Similarly, the effect of RHOV on the cell cycle may partially influence genomic instability in LUAD patients, resulting in a weak positive correlation between RHOV expression and TMB. KRT6A, a member of the keratin family, has been shown to influence the epithelial-mesenchymal transition [
52], and its overexpression promotes the proliferation and invasion of NSCLC cells. Epithelial cells acquire a mesenchymal phenotype during epithelial-mesenchymal transition, and recent studies have linked epithelial-mesenchymal transition to many cellular functions including genomic instability, cancer cell drug resistance, and metabolic adaptations [
53]. Moreover, a study by Chantapet et al. found that fragment 19 of another member of the keratin family, KRT19 (CYFRA 21-1), can serve as a serum biomarker for diagnosing NSCLC [
54]. Therefore, it is necessary to explore further whether KRT6A can be used as a serum biomarker for diagnosing NSCLC. According to our research, SIGLEC6 is a protective factor for LUAD (Additional file 4: Figure
S6D). Current research indicates that SIGLEC6 is specifically expressed in B cells, monocytes, and placental trophoblasts [
55]. SIGLEC6 belongs to the sialic acid-binding immunoglobulin-like lectin family, a family of immune regulatory receptors [
56]. The interaction between sialylated glycans and SIGLEC6 can modulate immune cell function during tumorigenesis, resulting in an immunosuppressive tumor microenvironment. Another gene in the GSAGI, KLRG2, also showed a favorable impact on the prognosis of LUAD patients (Additional file 4: Figure
S6E). KLRG2 plays a vital role in carbohydrate recognition and binding [
57]. KLRG2 contains a C-type lectin/C-type lectin-like domain (CTL/CTLD), and its receptor is expressed on various immune cells. In addition, this domain is involved in cell adhesion, migration, pathogen recognition, and intercellular signaling. We evaluated the expression of KLRG2 using the Tumor Immune Single-cell Hub 2 (TISCH2) database, which focuses on the tumor microenvironment. In an NSCLC dataset (GSE99254) on the GEO platform, KLRG2 was mainly expressed on Mono/Macro cells (Additional file 4: Figure
S6F). The results of CIBERSORT analysis showed significantly higher levels of M2 macrophages, resting mast cells, and monocyte infiltration in the high KLRG2 expression group of LUAD patients (Additional file 4: Figure
S6G). These findings suggest that KLRG2 may play a significant role in regulating the tumor immune microenvironment. However, the two protective factors in the GSAGI, SIGLEC6 and KLRG2, have not yet been studied in the context of genomic instability. The negative correlation between SIGLEC6 and TMB levels and the modest positive correlation between KLRG2 and TMB levels found in our study require further confirmation, and the association of SIGLEC6 and KLRG2 with genomic instability warrants in-depth exploration.
In summary, the GSAGI identified in this study provides a direction for prognosis prediction and individualized treatment of LUAD patients. Although this study has revealed significant findings, it also has some limitations. Firstly, our analysis is based on public databases, which limits how convincing the conclusions can be. Although the GSAGI performed well in several external testing sets, prospective studies are necessary to validate the feasibility of this gene signature. Secondly, although IHC validation confirmed the high expression of ANLN, RHOV, and KRT6A in LUAD patients, a larger clinical sample is needed to increase the reliability of the results. The specific mechanisms by which these risk factors promote cancer also need further exploration. Finally, the prediction of personalized treatment for LUAD patients with different risk stratification needs to be further confirmed. The in-depth exploration of molecular features associated with genomic instability will provide direction for the diagnosis and personalized treatment of LUAD.