Introduction
Head and neck squamous cell carcinoma (HNSCC) is the most common malignancy of the head and neck, arising from the mucosal epithelium of the mouth, pharynx, and larynx. Each year, more than 870,000 new cases of HNSCC are diagnosed worldwide, and about 440,000 people die of the disease [1]. Smoking, alcohol consumption, human papillomavirus (HPV) infection, and exposure to environmental pollutants are all risk factors for HNSCC [2, 3]. Although HNSCC can be treated with surgical resection supplemented with radiotherapy or chemotherapy plus radiotherapy, the 5-year survival rate of HNSCC patients remains low because few cases are diagnosed early [4]. Therefore, to develop new therapies and improve patient prognoses, it is essential to identify new predictive biomarkers for HNSCC.
Over the past few years, high-throughput sequencing and RNA sequencing have made it easier to develop molecular markers, which in turn have made individualized treatment and better cancer prognosis possible [5]. Hypoxia is an intrinsic characteristic of solid tumors, and the role of hypoxia-associated genes (HAGs) in cancer is receiving increasing attention [6]. The role of hypoxia in tumor angiogenesis, cell proliferation, differentiation, and apoptosis has been established [7, 8]. Hypoxia also influences the immune microenvironment and is linked to a poor patient prognosis [9]. In HNSCC, hypoxia induces epithelial-mesenchymal transition (EMT), which is a powerful driver of tumor progression [10], and enhances the proliferation, migration, and invasion of tumors [11, 12]. Ding et al. reported immune cell infiltration in HNSCC across a variety of risk groups [13].
In this study, public databases were used to evaluate the mRNA profiles and associated clinical characteristics of HNSCC patients. Univariate and multivariate Cox regression analyses were used to screen the HAGs related to the prognosis of HNSCC in The Cancer Genome Atlas (TCGA) database. A prognostic signature was then established in the TCGA database and validated using Gene Expression Omnibus (GEO) datasets. Afterward, we explored potential mechanisms underlying prognosis by examining the relationship between the risk model and immune status. Finally, the expression levels of core genes in HNSCC were validated by quantitative real-time polymerase chain reaction (qRT-PCR).
Methods
Data sources
The raw RNA sequencing (RNA-seq) data and corresponding clinical parameters of 502 HNSCC patients and 44 adjacent normal tissues were obtained from the TCGA database on 23 May 2022. A total of 499 HNSCC patients with complete clinical information and survival data were included for further analysis. For external validation, RNA-seq data and clinical parameters of HNSCC patients were obtained from HNSCC-related mRNA datasets (GSE27020, GSE41613, GSE42743, and GSE117973) in the GEO database. The clinical baseline data of all included patients are shown in Additional file 1: Table S1 (Patient baseline information table) and Additional file 2 (Details of patients in the TCGA database).
Identification and functional enrichment analysis of differentially expressed genes associated with hypoxia
A total of 200 HAGs were downloaded from the Molecular Signatures Database (https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_HYPOXIA.html). The same approach was used to obtain HAGs in a previously published study [14]. The "limma" package in R was used to identify the differentially expressed HAGs between HNSCC and adjacent normal tissues [15]. P values were adjusted for the false discovery rate (FDR) using the Benjamini–Hochberg method. FDR < 0.05 and |logFC| > 1.0 were set as the cut-off criteria for differentially expressed HAGs. To explore the biological functions of the hypoxia-associated gene signature, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed with the R package "clusterProfiler" (version 4.0) [16, 17].
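The filtering step described above can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it only shows how Benjamini–Hochberg adjustment and the FDR < 0.05, |logFC| > 1.0 cut-offs combine. The function names and the toy gene values are hypothetical and chosen for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(results, fdr_cutoff=0.05, logfc_cutoff=1.0):
    """results: list of (gene, logFC, raw p-value) tuples."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, logfc, _), q in zip(results, fdr)
        if q < fdr_cutoff and abs(logfc) > logfc_cutoff
    ]


# Hypothetical values for illustration; these are not results from the study.
toy_results = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.5, 0.010),
    ("GENE_C", 0.4, 0.002),
    ("GENE_D", 1.8, 0.200),
]
selected = select_de_genes(toy_results)
print(selected)
```

In this toy input, GENE_C passes the FDR cut-off but fails the fold-change cut-off, and GENE_D shows the opposite pattern, so both criteria are needed.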
Development and validation of prognostic features
To establish a prognostic model, the R package "glmnet" was used to perform the univariate Cox regression and LASSO analyses on the differentially expressed genes screened previously. The optimal penalty parameter λ was selected by the minimum criterion. Next, a prognostic risk model was developed by multivariate Cox regression, and the risk score was calculated as $\text{Risk Score} = \sum_{i=1}^{N} \text{Coef}_i \times \text{Exp}_i$, where $\text{Coef}_i$ is the regression coefficient of gene $i$, $\text{Exp}_i$ is its expression level, and $N$ is the number of genes in the model. The 499 HNSCC patients from the TCGA database were classified into low and high risk subgroups based on the median risk score. Subsequently, the overall survival (OS) curves of the subgroups were compared by Kaplan–Meier analysis, OS at 1, 3, and 5 years was evaluated by time-dependent receiver operating characteristic (ROC) analysis, and the area under the curve (AUC) was used to assess the model's predictive power. Finally, the prognostic value of this gene signature for predicting OS in HNSCC patients was validated using the datasets GSE27020, GSE41613, GSE42743, and GSE117973.
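As a hedged illustration of the risk score and the median split described above, the following Python sketch computes a weighted sum of expression values per patient and then labels patients as high or low risk. The coefficients, gene names, and patient values are hypothetical, and assigning scores equal to the median to the low risk group is an assumption, since the text does not state how ties are handled.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the genes in the signature."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    # Assumption: scores equal to the median are placed in the low risk group.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 0.0, "GENE_B": 4.0},
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},
}
scores = {pid: risk_score(expr, coefs) for pid, expr in cohort.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

A negative coefficient, as for the hypothetical GENE_B, means that higher expression lowers the risk score, which is how protective genes enter a model of this kind.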
Prognostic value of the 8-gene prognostic model independent of other clinical characteristics
To clarify the association between the prognostic model and various clinical characteristics (stage, grade, age, gender, T stage, N stage, and M stage), the TCGA-HNSCC samples were divided into two subgroups for each characteristic: stage I/II and III/IV, grade I/II and III/IV, age < 60 and age ≥ 60, male and female, T0-T2 and T3/4, N0 and N+, and M0 and M1 + Mx. To confirm the independent prognostic value of the 8-gene signature, Kaplan–Meier survival analysis was performed within each of these subgroups.
Analysis of immune infiltration
Immune differences between the two risk groups in the TCGA cohort were assessed using several computational methods for estimating immune infiltration and function, including ESTIMATE [18], TIMER [19], MCP-counter [20], CIBERSORTx [21], and single-sample gene set enrichment analysis (ssGSEA). A two-sample Wilcoxon test was applied to compare immune infiltration and immune-related functions between the high and low risk groups.
Construction and evaluation of the nomogram
Based on the TCGA HNSCC cohort, a nomogram combining the signature risk score with other clinical characteristics was developed to better predict HNSCC prognosis. Clinical characteristics significantly associated with HNSCC prognosis were first screened by univariate Cox regression analysis. These variables were then included in a multivariate Cox regression analysis to identify independent predictors of OS in HNSCC patients, and the nomogram was constructed from the statistically significant variables. Finally, the predictive performance of the nomogram, the risk score, and the clinical characteristics was compared using decision curve analysis (DCA), calibration curves, ROC curves, and the concordance index (C-index).
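To make the C-index mentioned above concrete, here is a minimal Python sketch of Harrell's concordance index for right-censored survival data. It is a simplified illustration, not the implementation used in the study: pairs with tied survival times are ignored, and all input values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data, for illustration only.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.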
Identification of risk score-related genes and functional enrichment analysis
To better understand the biological processes involving the hypoxia-associated genes, the genes most strongly correlated with the risk score (Pearson |R| > 0.5, P < 0.05) were identified in the TCGA database, and functional enrichment analysis was performed using the R package "clusterProfiler". Subsequently, gene set variation analysis (GSVA) [22] and gene set enrichment analysis (GSEA) [23] were used to screen the TCGA HNSCC cohort for signaling pathways significantly associated with the risk groups, with significance defined as a false discovery rate (FDR) < 0.05. The Benjamini–Hochberg method was used to control the FDR.
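The correlation filter described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it applies the |R| > 0.5 threshold and omits the P < 0.05 test, which would require a t-distribution and is left out to keep the example dependency-free. Gene names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def risk_related_genes(risk_scores, expression_by_gene, r_cutoff=0.5):
    """Keep genes whose expression correlates with the risk score (|R| > cutoff)."""
    return [
        gene
        for gene, values in expression_by_gene.items()
        if abs(pearson_r(risk_scores, values)) > r_cutoff
    ]


# Hypothetical risk scores and expression values, for illustration only.
risk = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "GENE_X": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with risk
    "GENE_Y": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with risk
    "GENE_Z": [3.0, 1.0, 3.0, 1.0, 3.0],   # no linear relationship
}
related = risk_related_genes(risk, expression)
print(related)
```

Using the absolute value of R keeps both positively and negatively correlated genes, which matches the |R| notation in the text.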
Validation by the quantitative real-time polymerase chain reaction
The Zhongnan Hospital of Wuhan University provided nine pairs of HNSCC and adjacent non-cancerous tissues for this study, and their use was approved by the ethics committee. Patients with HNSCC who underwent surgical resection and had not received chemotherapy or radiotherapy were recruited for the study. Additional file 1: Table S2 presents the baseline information of the included clinical samples. Total RNA was extracted from tissues using TriQuick Reagent (Solarbio, Beijing, China, R1100); reverse transcription was carried out using a Prime Script RT kit (TaKaRa, Dalian, China, RR037A); and quantitative PCR was performed using standard protocols from the SYBR Green PCR kit (Toyobo, Osaka, Japan, QPK-201). The primer sequences for PCR are shown in Additional file 1: Table S3. The relative mRNA levels of candidate genes were normalized to GAPDH mRNA expression and calculated using the $2^{-\Delta\Delta Ct}$ method, and differences were compared using paired t-tests. Data are presented as the mean ± standard error of the mean (SEM) from at least three independent experiments. To further verify the differences in the eight core genes between high and low risk groups for HNSCC, tumor samples from 16 HNSCC patients were relatively quantified using qRT-PCR. The relative expression levels of the 8 genes were entered into the risk model to calculate the risk score, and the patients were divided into high and low risk groups according to the median risk score. The qRT-PCR results were then used to compare gene expression between the high and low risk groups.
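The relative quantification step can be illustrated with a short Python sketch of the ΔΔCt calculation. GAPDH as the reference gene comes from the text; the function name and the Ct values are hypothetical and serve only as a worked example.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target gene in tumor versus normal tissue (2^-ddCt)."""
    # Normalize the target gene to the reference gene (GAPDH) in each sample.
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Compare the tumor sample with its paired normal sample.
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for one tumor/normal pair, for illustration only.
fold_change = relative_expression(
    ct_target_tumor=24.0, ct_ref_tumor=18.0,
    ct_target_normal=26.0, ct_ref_normal=18.0,
)
print(fold_change)
```

Here the tumor ΔCt is 6 and the normal ΔCt is 8, so ΔΔCt is -2 and the target gene is estimated to be four times more abundant in the tumor sample. The calculation assumes roughly 100% amplification efficiency for both genes.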
Statistical analysis
Student's t-test and one-way ANOVA were applied to compare continuous variables, while the chi-square test was used to compare categorical variables. Kaplan–Meier survival curves were compared using the log-rank test. Hazard ratios (HRs) and 95% confidence intervals (CIs) for hypoxia-associated genes and clinical parameters were calculated using univariate and multivariate Cox regressions. Statistical analyses were performed using R software (v4.0.2). P < 0.05 was considered statistically significant.
Discussion
HNSCC is characterized by late diagnosis, a propensity for metastasis, relapse, and resistance to treatment. The 5-year survival rate of patients is very low, and the disease seriously endangers patients' health [24]. Although great progress has been made in diagnosis and treatment strategies in the past decades, the overall survival rate of HNSCC patients has not improved significantly [25]. Therefore, it is necessary to explore new prognostic prediction schemes to accurately assess tumor progression and the survival status of patients. Hypoxia is a common phenomenon in tumor tissues and has a wide impact on tumor angiogenesis, proliferation, and migration, and on the prognosis of cancer patients [26, 27]. So far, hypoxia-associated genes have been used as risk factors to establish prognostic risk models for a variety of tumors, including liver cancer, breast cancer, and bladder cancer [28–30]. In HNSCC, the effect of hypoxia on tumor progression has been demonstrated. However, the exact molecular mechanism of hypoxia-associated genes in HNSCC is still unclear, and their prognostic value has yet to be elucidated.
In this study, 54 differentially expressed hypoxia-associated genes were identified in HNSCC patient samples from the TCGA database by differential expression analysis. Eight hypoxia-associated genes (SELENBP1, CSRP2, ISG20, TGFBI, STC2, HOXB9, DTNA, and HS3ST1) with high predictive value were then selected by univariate Cox regression, LASSO regression, and multivariate Cox regression analyses, and a prognostic risk model for HNSCC was constructed. The prognostic value of the risk model was validated in four HNSCC datasets (GSE27020, GSE41613, GSE42743, and GSE117973) from the GEO database. Kaplan–Meier survival analysis and ROC curve analysis supported the model's performance: the AUC values for 5-year survival were 0.672 (TCGA), 0.687 (GSE27020), 0.680 (GSE41613), 0.789 (GSE42743), and 0.736 (GSE117973), higher than that of a previously reported HNSCC prognostic model (AUC = 0.607) [31]. The prognostic model we constructed contains 8 hypoxia-associated genes, which makes it more parsimonious than the 24-gene prognostic model constructed by Ding et al. [13]; the smaller number of genes reduces the difficulty of detection. In addition, Ding et al. only compared the prognostic value of risk scores and clinicopathological factors as independent prognostic factors. Our study not only assessed the predictive value of risk scores and clinicopathological factors as independent prognostic factors, but also combined risk scores with independent prognostic factors such as age and N stage to build a nomogram. Most importantly, unlike Ding et al., we used clinical samples to validate differences in the predictive genes between tumors and adjacent normal tissues and between high and low risk groups.
The eight hypoxia-associated genes identified in this study have all been studied previously in cancer. As a member of the family of selenium-binding proteins, SELENBP1 is a tumor suppressor, and its low expression has been reported to contribute to a poor prognosis in lung [32], ovarian [33], and colorectal [34] cancers. CSRP2 is involved in tumor cell proliferation, migration, and invasion in breast cancer [35], gastric cancer [36], and lymphocytic leukemia [37]. CSRP2 is associated with a better prognosis in oral squamous cell carcinoma [38]. ISG20 is a 3'-5' exonuclease that can degrade viral RNA in vitro [39]. In human gliomas, patients with high ISG20 expression had a poor prognosis, which is inconsistent with our study [40]. This inconsistency may reflect cancer-type specificity: gliomas are of non-epithelial origin, whereas HNSCC is an epithelial malignancy. These studies revealed the tumor suppressor role of SELENBP1, CSRP2, and ISG20, whose low expression is often a risk factor for poor tumor prognosis. In this study, we found that low expression of SELENBP1, CSRP2, and ISG20 in HNSCC predicts a poor prognosis. Therefore, SELENBP1, CSRP2, and ISG20 were included as components of the prognostic prediction model in this study.
In addition, TGFBI is a secreted extracellular matrix (ECM) protein that is induced by transforming growth factor β (TGFβ). It has been reported that partial EMT (p-EMT)-related genes, including TGFBI, were highly expressed in HNSCC samples compared to normal tissue, and this was linked to a poor prognosis [41]. Hypoxia-induced EMT has been demonstrated in a variety of tumors, and whether TGFBI upregulated under hypoxia affects the prognosis of HNSCC patients through EMT requires follow-up studies. Under hypoxic conditions, the expression of the glycoprotein hormone STC2 is activated [42], which drives the growth, proliferation, and tumorigenesis of tumor cells. HOXB9 is a HOX gene involved in the regulation of several human cancers [43]. DTNA encodes a scaffolding protein that keeps muscle cells structurally intact. A previous study identified DTNA as a valuable diagnostic marker for colon adenocarcinoma [44]. In our database analysis, DTNA was highly expressed in tumor tissues, whereas in the qRT-PCR validation, DTNA expression was downregulated in tumor samples; this inconsistency may be due to the selection of advanced HNSCC tissues for validation. HS3ST1 is a rate-limiting enzyme involved in the biosynthesis of heparan sulfate. Conditional deletion of HS3ST1 significantly inhibited tumor development in colorectal cancer [45]. These studies suggest that TGFBI, STC2, HOXB9, DTNA, and HS3ST1 promote tumors. In our study, high expression of TGFBI, STC2, HOXB9, DTNA, and HS3ST1 was associated with poor prognosis in HNSCC. Whether these genes contribute to the progression of HNSCC by a similar mechanism requires further investigation.
It is reasonable to infer that the eight genes we identified, taken together, have high prognostic value for HNSCC. Additionally, we combined the signature risk score and clinical staging to develop a nomogram. The calibration curves indicated that the nomogram predicted the OS of HNSCC patients more accurately. Meanwhile, we used ESTIMATE, TIMER, MCP-counter, CIBERSORTx, and single-sample gene set enrichment analysis (ssGSEA) to assess the immune state of the risk groups. The results showed differences in immune cell scores between the high and low risk groups, suggesting that risk scores may influence the prognosis of HNSCC patients through the tumor microenvironment. Finally, qRT-PCR was performed on the eight genes (HOXB9, SELENBP1, DTNA, ISG20, STC2, HS3ST1, CSRP2, and TGFBI). HNSCC samples were grouped into high and low risk groups according to their risk scores, and STC2, TGFBI, and HOXB9 were found to be significantly upregulated in the high-risk group.
In recent years, several studies [46–48] have been dedicated to establishing hypoxia-related prognostic signatures for HNSCC patients using the TCGA database. Our study differs from these in its approach: it includes validation in multiple datasets, immune infiltration analysis with multiple algorithms, integration of patients' clinicopathological features, and validation in clinical samples.
This study is based on database mining and preliminary qPCR validation and therefore has some limitations. First, the limited sample size may lead to selection bias; larger studies including more clinical cases are needed to make the results more robust. Second, this study verified the expression trends of the 8 HAGs at the transcript level, between tumor and normal tissues and between high and low risk groups, using qPCR. Whether corresponding trends exist at the protein level needs further study. In addition, large-sample, multi-center clinical trials and prospective studies are still needed to confirm the prognostic value of the model before clinical application. Finally, functional and mechanistic studies of the identified hypoxia-related genes in HNSCC may offer new directions for research.