Background
Lung cancer is one of the most common malignant tumours in the world and has the greatest morbidity among all cancers. Lung cancer has become the leading cause of death from malignant tumours in China's urban population [1]. Most cases of lung cancer are non-small-cell lung cancer (NSCLC) [2]. NSCLCs account for approximately 80% of lung cancers, of which approximately 30% are lung squamous cell carcinomas (LUSCs) [3, 4]. Although many effective therapies have been applied, including surgery, chemotherapy, radiotherapy and targeted therapy, the prognosis of LUSC patients remains poor [5]. It is estimated that more than 60% of clinical stage I and II LUSC patients die within 5 years after surgery due to relapse. Furthermore, approximately 75% of patients have stage III or stage IV disease at diagnosis, and only 5% of these patients survive 5 years after surgery [6]. Platinum-based chemotherapy is currently used as the basic treatment for patients with LUSC, but chemoresistance is a major obstacle leading to clinical failure [6]. Thus, it is necessary to identify novel molecular indicators in LUSC to predict survival and identify chemoresistance in LUSC patients.
DNA damage develops in various kinds of cells during life. Cells have DNA repair mechanisms to avoid the fatal effects of DNA damage [7]. If the repair mechanism does not work properly, it leads to genome instability, cell apoptosis, cell cycle arrest, and even tumorigenesis [8]. Many kinds of DNA repair gene mutations exist in lung squamous cell cancer [9–11]. DNA damage repair is implicated not only in regulating the development of LUSC but also in resistance to chemoradiotherapy [12]. For instance, Ji W et al. evaluated the sensitivity of BRCA1- and BRCA2-deficient NSCLC cells to PARP inhibitors. However, few studies have concentrated on the relationships between DNA damage repair genes and the outcomes of LUSC patients.
In the present study, prognostic predictors were identified by performing Cox regression analysis of DNA repair genes. Risk scores were calculated based on the expression levels of ten DNA damage repair genes related to LUSC patient prognosis. Based on the expression levels of the ten genes and other clinical factors, we constructed a prognostic prediction model and a nomogram. We hope this research will identify potential molecular targets for predicting the prognosis and chemotherapy response of LUSC patients.
Method
Consensus clustering of DNA repair genes
The LUSC tissue data were clustered into k (2 to 9) groups with the ConsensusClusterPlus package in R software based on DNA repair genes. The k value was optimized according to the unsupervised clustering results, and LUSC cancer tissues showed consistent clustering. Two subgroups were obtained and verified by principal component analysis (PCA). The survival of patients in the two subgroups was compared by Kaplan–Meier analysis.
Acquisition of DNA damage repair genes and clinical information of LUSC patients from the TCGA dataset
The DNA damage repair gene expression data and the clinical information of the patients from whom the LUSC samples were derived were downloaded from the TCGA database. The information can be found in additional file Table S1. In total, 504 lung squamous cell cancer tissues were included in this study. A list of 150 DNA damage repair genes was downloaded from the hallmark gene set of the GSEA database to screen the gene expression matrix.
Screening of differentially expressed DNA damage repair genes
The expression levels of DNA damage repair genes were compared between the normal and tumour groups by the Wilcoxon rank-sum test. The screening criteria were a false discovery rate (FDR) < 0.05 and |log2 fold change| > 1. The results of the differential DNA repair gene analysis are presented as volcano plots, heatmaps and box plots.
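The screening criteria above can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the text does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment is an assumption, and the function names, gene labels and numbers are hypothetical. The per-gene p values are taken as given (for example, from a Wilcoxon rank-sum test).

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def screen_genes(genes, p_values, mean_tumour, mean_normal,
                 fdr_cut=0.05, lfc_cut=1.0):
    """Keep genes with FDR < fdr_cut and |log2 fold change| > lfc_cut.

    mean_tumour and mean_normal are assumed to be strictly positive
    mean expression values on a linear (not log) scale.
    """
    fdr = benjamini_hochberg(p_values)
    selected = []
    for gene, q, t, n in zip(genes, fdr, mean_tumour, mean_normal):
        log2_fc = math.log2(t / n)
        if q < fdr_cut and abs(log2_fc) > lfc_cut:
            selected.append(gene)
    return selected

# Hypothetical toy input: three genes with made-up p values and means.
genes = ["GENE_A", "GENE_B", "GENE_C"]
print(screen_genes(genes, [0.001, 0.02, 0.5], [8.0, 3.0, 5.0], [2.0, 2.5, 5.0]))
```

In this toy input only GENE_A passes both thresholds; GENE_B has a small adjusted p value but a fold change below the cutoff.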
Construction of the prognostic model
First, univariate Cox regression with the Wald χ² test was used to establish the relationship between overall survival (OS) and DNA damage repair genes in LUSC patient tumour tissue. DNA repair genes with Wald χ² test p values less than 0.05 were considered statistically significant. According to the median expression level of DNA damage repair genes, the patients were divided into two groups: high and low expression groups. The overall survival of the two groups was analysed by the log-rank test, and survival curves were drawn. The multivariate Cox regression model was constructed from all the variables that were statistically significant in the univariate Cox regression and was optimized by the Akaike information criterion (AIC) value in a stepwise algorithm. Then, a risk score based on the significant prognosis-related DNA damage repair genes was developed for LUSC patients: \(\mathrm{riskscore}=h_{0}(t)\exp\left(\sum_{j=1}^{n}\mathrm{Coef}_{j}\times X_{j}\right)\), where \(n\) is the number of selected genes, \(h_{0}(t)\) is the baseline risk function, \(\mathrm{Coef}_{j}\) is the coefficient of each DNA repair gene, and \(X_{j}\) is the relative expression level of each DNA damage repair gene. The survival of LUSC patients with different risk scores was evaluated with prognostic hazard curves. Then, the significant prognosis-related DNA damage repair genes were combined with other clinical factors to construct a prognostic model by multivariate Cox regression analysis. The predictive ability of the risk score and other clinical features was evaluated by time-dependent receiver operating characteristic (ROC) curve and area under the curve (AUC) analyses. The survivalROC package in R software was applied to draw the ROC curve. The AUC value, which indicates the sensitivity and specificity of the predictive indicators, varies from 0.5 to 1, and the predictive ability of a prognostic indicator increases with increasing AUC.
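As a rough illustration of the risk score formula above, the following Python sketch computes the linear predictor (the sum of coefficient times expression) and its exponential for each patient, then splits patients at the median. The baseline term h0(t) is shared by all patients and is omitted here, since it does not change how patients rank against each other; that omission, the rule that scores strictly above the median count as high risk, and all gene names, coefficients and expression values are assumptions made for the example, not values from the study.

```python
import math
from statistics import median

def linear_predictor(coefs, expression):
    """Sum of Coef_j * X_j over the genes in the model."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)

def relative_risk(coefs, expression):
    """exp(linear predictor); the shared baseline h0(t) is left out."""
    return math.exp(linear_predictor(coefs, expression))

def split_by_median(scores):
    """Label each patient 'high' if the score is above the median, else 'low'."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}

# Hypothetical coefficients and expression values for two placeholder genes.
coefs = {"GENE_A": 0.5, "GENE_B": -0.2}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 3.0},
    "P3": {"GENE_A": 0.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 4.0, "GENE_B": 0.0},
}
scores = {pid: relative_risk(coefs, expr) for pid, expr in patients.items()}
print(split_by_median(scores))
```

A positive coefficient raises the score as expression increases, while a negative coefficient lowers it, which is how a single signature can contain both unfavourable and protective genes.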
The prognostic prediction model was ultimately developed into a nomogram, the calibration of which was measured with a calibration curve, and the discriminative ability was measured by C-index analysis.
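To make the C-index mentioned above more concrete, here is a minimal Python sketch of Harrell's concordance index for right-censored data. It is a simplified illustration rather than the routine used in the study: pairs with tied survival times are ignored, tied scores count as half concordant, and the input values are hypothetical.

```python
def concordance_index(times, events, scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if censored
    scores : predicted risk (higher means worse expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: four patients, the third one censored.
c_index = concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1 to perfect ranking, which parallels the interpretation of the AUC.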
External validation of the risk score
To validate the prognostic predictive value of the risk score calculated based on the prognosis-related DNA repair genes, a gene expression data matrix of lung cancer tissues with corresponding patient clinical data was downloaded from the GEO database (GSE31210). The risk score was calculated with the formula constructed from the TCGA dataset. The prognosis-predicting ability of the risk score was estimated by time-dependent ROC curve analysis. According to the median risk score, the lung cancer patients in the GSE31210 dataset were divided into two groups: a high-risk group and a low-risk group. Kaplan–Meier curves of the two groups were drawn and compared by the log-rank test. Subsequently, the prognostic value of the risk score was estimated by univariate Cox proportional hazards regression. Furthermore, multivariate Cox proportional hazards regression was used to test whether the risk score was an independent prognostic predictor.
Immune and DNA repair genes in LUSC
Single-sample gene set enrichment analysis was performed with the "GSVA" package in R software, using the "ssGSEA" method, to calculate the infiltration scores of 16 types of immune cells. The infiltration scores of each tumour sample from LUSC patients in the high-risk group and low-risk group were calculated and compared by the Wilcoxon rank-sum test. The infiltration scores of each type of immune cell in each patient group were displayed as box plots, as were the activities of 13 immune-related pathways (see additional file Table S2) [13, 14].
Anticancer Agent Sensitivity Analysis
The IC50 values of six kinds of anticancer agents (etoposide, imatinib, methotrexate, rapamycin, vinorelbine, and vorinostat) were estimated for each lung squamous carcinoma sample. The pRRophetic package [15] in R software was applied to calculate the IC50 of each drug based on data from the Genomics of Drug Sensitivity in Cancer (GDSC) website [16]. The half maximal inhibitory concentrations (IC50) of the drugs were compared between the high-risk and low-risk groups by the Wilcoxon rank-sum test.
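The group comparison described above can be sketched in Python as a Wilcoxon rank-sum (Mann-Whitney U) test. This is an illustration under simplifying assumptions, not a reimplementation of the R routine used in the study: the p value comes from a normal approximation without tie or continuity correction, and the IC50 values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Return (U statistic for x, z score, two-sided p value)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    # Assign average ranks to tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return u_x, z, p

# Hypothetical estimated IC50 values for one drug in the two risk groups.
low_risk = [1.0, 2.0, 3.0]
high_risk = [4.0, 5.0, 6.0]
print(rank_sum_test(low_risk, high_risk))
```

For small groups such as this toy input, an exact test would normally be preferred over the normal approximation.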
Real-Time Quantitative PCR
Total RNA from each specimen was extracted with TRIzol (Invitrogen, USA). Then, RNA was reverse-transcribed into cDNA (complementary DNA) with the PrimeScript® RT Reagent Kit with gDNA (genomic DNA) Eraser (Takara, Japan). Real-time quantitative PCR was performed using a SYBR green master mix kit (ABI technology, USA) on the QuantStudio System (Q6, Applied Biosystems, USA). All samples were normalized to endogenous GAPDH (glyceraldehyde-3-phosphate dehydrogenase) with the \(2^{-\Delta\Delta \mathrm{Ct}}\) method. Primers for each gene were provided by GenScript (China).
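The GAPDH normalization above follows the standard comparative Ct calculation, which the short Python sketch below spells out. The function name and the Ct values are hypothetical, and equal amplification efficiency for the target and reference genes is assumed.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_ref_* are the Ct values of the reference gene (GAPDH here).
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: tumour (target 24, GAPDH 18) vs. control (target 26, GAPDH 18).
print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

With these made-up values the target reaches threshold two cycles earlier in the tumour sample after normalization, which corresponds to a fourfold higher relative expression.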
Immunohistochemistry
The LUSC tissue microarrays were incubated with antibodies (anti-RAE1, anti-POLR2H, anti-RAD51, anti-ZWINT and anti-RFC4) for immunohistochemical staining. Both the intensity and the extent of staining were taken into consideration by the scoring system. Staining intensity was classified as 0 (negative), 1 (weak), 2 (moderate), or 3 (strong). The IHC score was stratified as follows: 0 to 1, negative (-); 2 to 4, weakly positive (+); 5 to 8, moderately positive (++); and 9 to 12, strongly positive (+++).
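The stratification above can be written as a small Python sketch. The text does not state how intensity and extent are combined or how extent is graded; the product of intensity (0 to 3) and an extent grade from 0 to 4 is an assumption chosen because it yields the stated 0 to 12 range. The weakly positive grade is taken to be a single "+", and the function names are hypothetical.

```python
def ihc_score(intensity, extent):
    """Combine staining intensity (0-3) and an assumed extent grade (0-4)."""
    if intensity not in range(4):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    if extent not in range(5):
        raise ValueError("extent grade must be 0, 1, 2, 3 or 4")
    return intensity * extent

def ihc_grade(score):
    """Map an IHC score (0-12) to the four categories described in the text."""
    if not 0 <= score <= 12:
        raise ValueError("score must be between 0 and 12")
    if score <= 1:
        return "-"      # negative
    if score <= 4:
        return "+"      # weakly positive
    if score <= 8:
        return "++"     # moderately positive
    return "+++"        # strongly positive

# Example: moderate intensity (2) with an extent grade of 3 gives a score of 6.
print(ihc_grade(ihc_score(2, 3)))
```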
Discussion
Genomic DNA damage caused by smoking or exposure to harmful chemical and physical factors is believed to be the first stage of carcinogenesis in lung cancer. It has been reported that cancer development can be greatly affected by the expression levels of DNA repair genes in tumour tissues, which help sustain the stability of the cancer cell genome [17]. A case–control study showed that lung cancer patients had a reduced DNA repair capacity (DRC) [18]. On the other hand, another case–control study pointed out that lung cancer patients with higher DNA repair capacity had elevated chemoresistance [19]. These previous reports are consistent with our finding that DNA repair genes may have both protective and unfavourable effects in the development of LUSC in specific patients [20]. In light of the important role that DNA repair genes play in the initiation and development of lung cancer, we performed bioinformatics analysis to identify significant prognosis-related DNA repair genes in LUSC.
Our research uncovered and evaluated the prognostic value of ten DNA repair genes (POLD4, MRPL40, ITPA, ERCC3, TK2, POLR3GL, VPS28, CANT1, SDCBP, and CCNO). The function of these genes in lung adenocarcinoma has been reported in previous studies. POLD4 has an important role in genomic instability, DNA double-strand breaks (DSBs) and lung cancer. POLD4 decreases the intrinsically high induction of γ-H2AX, a marker of DSBs [21]. The expression levels of TK2 were significantly associated with prognosis in lung cancer tissues: higher TK2 levels were associated with better prognosis in LUSC patients [22]. Higher CANT1 expression was closely related to the TN stage. High expression levels and promoter demethylation of CANT1 were related to worse prognosis in LUSC [23, 24]. Other papers have also shown that CCNO is a key protein in lung physiology and that CCNO mutations result in lung disease [25]. Moreover, CCNO upregulation is significantly associated with reduced overall survival in lung cancer patients [26]. In our study, prognostic predictors were identified via Cox regression analysis based on DNA repair genes. The risk score of each LUSC patient was calculated based on the expression levels of the ten prognosis-related DNA repair genes. Overall, the prognostic model based on these ten genes was a useful tool for predicting the prognosis of LUSC patients.
Many DNA repair-related genes have been proven to be involved in the progression of distinct kinds of cancer. Such genes have been applied as signatures for determining the prognosis of cancer. Wang et al. identified eleven genes that were able to predict the survival of patients with colon cancer [27]. Hu et al. constructed a prognostic prediction model based on 13 DNA repair genes for lung adenocarcinoma patients [28]. Twenty-eight DNA repair genes related to the prognosis of patients with ovarian cancer were identified, and some of them were applied to construct a prognostic model of ovarian cancer [29]. A set of seven genes was used to predict the survival of patients with hepatocellular carcinoma [30]. Liu et al. discovered that a set of nine DNA repair genes had prominent clinical implications for prognosis evaluation and could predict the survival of patients with endometrial carcinoma. Similarly, a DNA repair gene signature was applied to establish a prognostic nomogram for predicting the biochemical recurrence-free survival of prostate cancer patients [31]. However, the relationship between the expression levels of DNA repair genes and the prognosis of LUSC patients remains unclear. In this study, we created a novel prognostic prediction model based on DNA repair genes for lung squamous carcinoma. Our model provides clinicians with a way to evaluate the survival of lung squamous carcinoma patients.
Chemotherapy with cisplatin is currently used as a basic treatment for patients with LUSC, but chemoresistance is a major obstacle leading to clinical failure [32, 33]. In fact, LUSC is the least sensitive to chemotherapy among the types of NSCLC. How to select a suitable chemotherapeutic drug so that patients obtain more benefit is therefore an important question. DNA repair has been reported to be involved in the progression and chemoresistance of LUSC. In our study, prognostic predictors were identified by performing Cox regression analysis of DNA repair genes. Patients with a low risk score may be more sensitive to etoposide, methotrexate and vinorelbine, whereas patients in the high-risk group may be more sensitive to imatinib, vorinostat and rapamycin, suggesting that different groups of patients have different sensitivities to drugs. Therefore, we hope that this novel prognostic model based on DNA damage repair gene expression can be used to predict therapeutic efficacy in LUSC patients.