Background
According to the most recent reports, head and neck squamous cell carcinoma (HNSCC) is increasingly diagnosed, with approximately 890,000 new cases and 450,000 deaths per year [1, 2]. Early-stage HNSCC can be resected surgically and treated with postoperative radiotherapy, whereas once the tumor progresses to an advanced stage, the prognosis is extremely poor and the 5-year survival rate is below 50% [3-5]. For patients with HNSCC, prognostic prediction is necessary to guide clinical treatment [5, 6]. Recently, immunotherapy has shown encouraging results in improving quality of life and prolonging the overall survival of tumor patients [7]. Among these approaches, immune checkpoint inhibitor (ICI) therapy, which activates the patient's own immune defense system, is widely used in tumor management [8, 9]. However, few patients benefit from immunotherapy because of immune escape and the complex tumor immune microenvironment (TIME) [10, 11]. For HNSCC patients, ICI therapy offers promising therapeutic prospects and the possibility of improving prognosis; nevertheless, the TIME of each individual patient requires systematic and accurate evaluation before an immunotherapeutic schedule is formulated. Therefore, it is crucial to develop a reliable predictive signature to assess the TIME of patients [5, 6].
5-Methylcytosine (m5C) methylation, first reported in 1925, is regarded as a form of mRNA modification [12-14]. Previous studies have recognized that this RNA modification is regulated by writers, readers, and erasers and plays an important role in biological processes by influencing RNA stability, transcription efficiency, and localization [15-17]. m5C has been reported to affect tumor progression, prognosis, and the TIME, as well as resistance to immunotherapy and chemotherapy [15-18]. As investigated by Zhang et al., DNMT1 could enhance radiotherapy sensitivity in HPV-positive HNSCC patients [19]. Additionally, compared with normal samples, NSUN2 is more enriched in tumor lesions and can significantly influence the cell cycle [20]. Similarly, long non-coding RNAs (lncRNAs) are crucial in affecting tumor progression, invasion and metastasis, and the TIME [3, 21]. lncRNAs are therefore also considered promising biomarkers and potential targets for tumor diagnosis and may provide a novel strategy to guide individualized precision treatment for tumor patients. Increasing evidence indicates that m5C can regulate related lncRNAs that participate in and influence biological processes [15-17, 22]. Previous studies have shown that NSUN2 can alter gene and lncRNA expression as well as enhance protein synthesis and translation [14, 23]. It is recruited by the lncRNA forkhead box protein C2 (FOXC2)-AS1, and its upregulation leads to a shorter survival time in HNSCC patients [15, 24]. Similarly, significantly upregulated expression of NSUN5 was also found in tumor samples [12], and the lncRNA X-inactive specific transcript can be regulated by m5C genes [25, 26]. These findings suggest that m5C-related lncRNAs (mrlncRNAs) could serve as potential biomarkers to predict prognosis and immune infiltration. However, more evidence is needed to clarify the detailed mechanisms and relationships among m5C, lncRNAs, and HNSCC.
Hence, in this study, we used bioinformatics analysis to establish an m5C-related lncRNA signature to predict prognosis and immune infiltration and to identify tumor subtypes in HNSCC patients.
Methods
Obtaining the RNA-seq matrix and mrlncRNAs
RNA sequencing data for HNSCC were downloaded from The Cancer Genome Atlas (TCGA) database in fragments per kilobase of transcript per million mapped reads (FPKM) format, comprising 504 tumor tissue samples and 44 paracancerous normal tissue samples. Clinicopathologic features and tumor mutation data for each HNSCC patient were also extracted from the TCGA-HNSC cohort. Subsequently, HNSCC patients from the entire cohort were randomly divided into two groups (train set and test set) at a ratio of 1:1 for model establishment and data analysis.
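The random 1:1 allocation described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function name `split_cohort`, the patient identifiers, and the seed value are hypothetical.

```python
import random

def split_cohort(patient_ids, seed=0):
    """Randomly split patient IDs into train and test sets at a 1:1 ratio."""
    ids = sorted(patient_ids)      # sort first so the split depends only on the seed
    rng = random.Random(seed)      # local generator; the seed value is an assumption
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical identifiers standing in for the patients of the cohort.
cohort = [f"P{i:03d}" for i in range(504)]
train_ids, test_ids = split_cohort(cohort, seed=42)
```

Fixing the seed makes the allocation reproducible, which matters when the downstream model is validated on the held-out half.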
Additionally, based on previous studies [12-26], we obtained 15 m5C genes to identify their related lncRNAs: 11 writers (NOP2, NSUN2, NSUN3, NSUN4, NSUN5, NSUN6, NSUN7, DNMT1, TRDMT1, DNMT3A, and DNMT3B), 2 readers (ALYREF and YBX1), and 2 erasers (TET2 and TET3). Pearson correlation analysis was then performed to identify the relevant mrlncRNAs with the criteria of |Pearson R coefficient| > 0.04 and p value < 0.001 [3].
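The correlation screen can be illustrated with a short sketch. It is not the authors' code: the expression values and the names `lncA` and `lncB` are invented, and the p value is approximated with the Fisher z-transform instead of the exact t-test that statistical packages normally use. The cutoffs default to the values stated above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                     # a constant vector carries no correlation signal
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))      # guard against floating-point overshoot

def approx_p_value(r, n):
    """Two-sided p value via the Fisher z-transform (an approximation)."""
    if n <= 3:
        return 1.0
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def related_lncrnas(gene_expr, lnc_expr, r_cut=0.04, p_cut=0.001):
    """Return lncRNAs correlated with at least one m5C gene under both cutoffs."""
    hits = []
    for lnc, lnc_values in lnc_expr.items():
        for gene_values in gene_expr.values():
            r = pearson_r(gene_values, lnc_values)
            if abs(r) > r_cut and approx_p_value(r, len(lnc_values)) < p_cut:
                hits.append(lnc)
                break
    return sorted(hits)

# Invented expression values for ten samples.
gene_expr = {"NSUN2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
lnc_expr = {
    "lncA": [2 * v + 1 for v in gene_expr["NSUN2"]],   # perfectly correlated
    "lncB": [5, 1, 4, 2, 6, 3, 5, 2, 4, 3],            # essentially uncorrelated
}
hits = related_lncrnas(gene_expr, lnc_expr)
```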
Construction of a prognostic model and validation of predictive effects
Using the expression of mrlncRNAs and overall survival (OS) data, univariate Cox (uni-Cox) hazard regression was performed to identify survival-related mrlncRNAs at a threshold of p value < 0.05. Least absolute shrinkage and selection operator (LASSO) regression analysis was then performed with tenfold cross-validation and 1000 cycles to avoid overfitting. The expression correlation between m5C genes and model mrlncRNAs was calculated by the Pearson correlation test and displayed in a heatmap using the “limma” and “pheatmap” packages. Subsequently, the coefficient of each eligible lncRNA was calculated by multivariate Cox (multi-Cox) regression analysis, and a score was calculated for patients in both the train and test cohorts with the following formula: m5C-related lncRNA risk score (MLRS) = ∑i coef(mrlncRNAi) × exp(mrlncRNAi), where coef is the regression coefficient and exp is the expression level of the i-th model mrlncRNA. Patients were then classified into two risk groups (low-MLRS and high-MLRS) according to the median MLRS. Expression of the model mrlncRNAs was compared between normal and HNSCC samples, and survival analysis was performed using the optimal cutoff value. Subsequently, Kaplan-Meier (K-M) analysis was conducted to compare survival between the low-MLRS and high-MLRS groups in the train, test, and entire sets, including OS, progression-free survival (PFS), disease-free survival (DFS), and disease-specific survival (DSS). The risk scores and the expression of the prognostic model mrlncRNAs were calculated and plotted. Furthermore, the areas under the curves (AUCs) of the survival receiver operating characteristic (ROC) curves for 1-, 3-, and 5-year survival status in the train, test, and entire sets were calculated and compared to assess the predictive performance of the MLRS system.
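The risk score formula and the median split can be sketched as below. The coefficients, lncRNA names, and expression values are invented for illustration, and assigning scores equal to the median to the low group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median

def risk_score(coefs, expression):
    """MLRS for one patient: sum of coefficient x expression over the model lncRNAs."""
    return sum(coefs[name] * expression[name] for name in coefs)

def assign_groups(scores):
    """Split patients into low/high groups at the median score (ties go to low)."""
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return groups, cutoff

# Invented multi-Cox coefficients and expression values.
coefs = {"lncA": 0.5, "lncB": -0.2}
patients = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 1.0, "lncB": 3.0},
    "P3": {"lncA": 4.0, "lncB": 0.0},
    "P4": {"lncA": 0.0, "lncB": 1.0},
}
scores = {pid: risk_score(coefs, expr) for pid, expr in patients.items()}
groups, cutoff = assign_groups(scores)
```

In practice the cutoff would be derived from one cohort and then applied unchanged to the others, so that the test set is scored with a threshold it did not influence.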
In addition, uni- and multi-Cox survival analyses were performed to select independent predictive factors (p value < 0.05), and a survival nomogram for predicting prognosis was constructed based on the MLRS system and these independent clinicopathologic indicators. A calibration plot was used to estimate the consistency between actual observations and nomogram predictions, and the concordance index (C-index) was also applied to test and compare the reliability of the predictions.
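For readers unfamiliar with the C-index, a minimal sketch of Harrell's concordance index follows. It is a simplified version that skips pairs with tied survival times, and the example data are invented; it is not the implementation used to evaluate the nomogram.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk.

    A pair is comparable when the patient with the shorter time had an event.
    Equal risk scores count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Invented follow-up times, event indicators (1 = death), and risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by risk.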
The distribution of MLRSs across different clinicopathological features was compared via the Wilcoxon test. Subsequently, patients were divided into subgroups, and OS was compared between the low-MLRS and high-MLRS groups within each subgroup by K-M survival analysis.
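The K-M estimator underlying these survival comparisons can be sketched as follows. This is a bare product-limit estimate for a single group on invented data; it omits confidence intervals and the log-rank test that a full comparison between groups would require.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival probability) at event times."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving                # censored patients leave the risk set silently
    return curve

# Invented follow-up times and event indicators (1 = death, 0 = censored).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```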
Biological function analysis
Using the LncSEA database, we pooled the eight model mrlncRNAs to investigate their potential influence on tumor survival and function at a threshold of p value < 0.05 [27]. To further explore the functions related to the risk model, differentially expressed genes (DEGs) between the two MLRS groups were identified with the criteria of |logFC| > 0.585 and false discovery rate (FDR) < 0.05. A protein-protein interaction (PPI) network among the DEGs was constructed using the STRING database and subsequently visualized with Cytoscape (version 3.6.2). The top 10 hub DEGs were selected using cytoHubba. To explore the potential biological functions of these DEGs, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted via the “clusterProfiler” and “bioconductor” R packages. Furthermore, gene set enrichment analysis (GSEA) was performed with the GSEA software and a related gene set to investigate the pathways enriched in the MLRS groups. Pathways with an FDR < 0.05 were considered eligible in the two MLRS groups.
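The DEG thresholds can be illustrated with a short sketch. It assumes that logFC is a base-2 log fold change (so 0.585 corresponds to roughly a 1.5-fold change) and that the FDR is obtained by the Benjamini-Hochberg procedure; neither detail is stated in the text, and the gene names and statistics below are invented.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(genes, log_fc, pvals, lfc_cut=0.585, fdr_cut=0.05):
    """Keep genes passing both the |logFC| and the FDR cutoffs."""
    fdr = benjamini_hochberg(pvals)
    return [g for g, lfc, q in zip(genes, log_fc, fdr)
            if abs(lfc) > lfc_cut and q < fdr_cut]

# Invented differential expression statistics.
genes = ["G1", "G2", "G3", "G4"]
log_fc = [1.2, -0.9, 0.3, 0.7]
pvals = [0.001, 0.01, 0.0005, 0.2]
degs = select_degs(genes, log_fc, pvals)
```

Here G3 is dropped for a small fold change despite a low FDR, and G4 is dropped for a high FDR despite a sufficient fold change.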
Exploration of the relationship of MLRS, tumor mutation burden (TMB), and stemness
The relationship between MLRS and TMB was explored using the “limma” and “maftools” R packages. The Wilcoxon signed-rank test was used to compare the mutation frequencies of the top 20 genes between the low-MLRS and high-MLRS groups, and survival analysis combining TMB and MLRS was also performed. In addition, the correlation between MLRS and stem cell-like features, including the DNA stemness score (DNAss) and RNA stemness score (RNAss), was assessed using the Spearman test.
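The Spearman correlation used for the stemness analysis can be sketched as the Pearson correlation of average ranks. The MLRS and RNAss values below are invented, and the sketch returns only the coefficient, not a p value.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Invented risk scores and RNA stemness scores for five patients.
mlrs = [0.2, 0.5, 1.1, 1.8, 2.4]
rnass = [0.61, 0.58, 0.55, 0.57, 0.40]
rho = spearman_rho(mlrs, rnass)
```

A negative coefficient in this toy example mirrors the kind of inverse relationship between risk score and stemness that such an analysis would test for.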
Assessment of the tumor immune infiltrated microenvironment and clinical treatment
To further assess the TIME, immune-related analyses, including immune cell infiltration, immune function activation, tumor microenvironment (TME) scores, and expression of immune checkpoint-related genes, were conducted and compared between the two MLRS groups. Among them, immune cell infiltration status was assessed by multiple algorithms obtained from the TCGA pan-cancer dataset. Correlation analysis was conducted using Spearman's test, and the results are summarized in a bubble plot. In addition, immune cells and functions were assessed by single-sample GSEA (ssGSEA). Furthermore, TME scores, consisting of immune scores, stromal scores, ESTIMATE scores, and tumor purity, were calculated for each sample in the TCGA-HNSC cohort with the “estimate” R package.
The expression of immune checkpoint genes was compared between the low-MLRS and high-MLRS groups to predict the potential immunotherapeutic response. Additionally, differences in immunotherapy response between the two MLRS groups were predicted and compared using the immunophenoscore (IPS) from the TCIA database. In addition, the sensitivity of HNSCC patients to five commonly used chemotherapeutic agents was evaluated according to their half-maximal inhibitory concentration (IC50) values.
Identification of tumor subtypes based on the model mrlncRNAs
To further identify tumor subtypes and assess the TIME, patients in the TCGA-HNSC cohort were grouped into clusters using the “ConsensusClusterPlus” R package. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) analysis were used to assess the distribution of the clusters, and survival comparison, TIME analysis, and immunotherapy prediction were also performed for the clusters.
Discussion
Multiple studies have determined that lncRNAs can be regulated by m5C-related genes and participate in the biological processes of tumors by regulating RNA localization, stability, and transcription efficiency [28]. According to the list summarized by Cusenza et al., a large number of lncRNAs have been verified as signatures modified by m5C in malignancies, especially squamous cell neoplasms [29]. For HNSCC patients, a reliable signature for prognostic prediction and immune infiltration assessment is necessary to develop individualized and precise treatment [3]. In this study, a novel MLRS system with eight prognostic mrlncRNAs was established to conduct a comprehensive evaluation. Among these model mrlncRNAs, ALMS1-IT1 can accelerate malignant tumor progression (e.g., in lung adenocarcinoma) via AVL9-mediated activation of the cyclin-dependent kinase pathway [30]. In addition, a previous analysis identified it as one of four lncRNAs for survival prediction in HNSCC [31]. SLC7A11-AS1 can confer malignant progression by repressing miR-4775 and TRAIP expression in lung cancer and reduce tumor growth via the ASK1-p38 MAPK/JNK pathway in gastric cancer [32-34]. Moreover, downregulated expression of this lncRNA is involved in cisplatin resistance in gastric tumors via the SLC7A11-AS1/xCT axis [35]; nevertheless, downregulation of SLC7A11-AS1 can significantly decrease NRF2/SLC7A11 expression and inhibit the progression of colorectal cancer [36]. As investigated by Yang et al., this lncRNA can also promote chemoresistance by blocking SCFβ-TRCP-mediated degradation of NRF2 in pancreatic cancer [37]. In cervical cancer, knocking down MIR9-3HG inhibits tumor cell proliferation and promotes apoptosis via EP300 [38]. Similarly, MIR9-3HG can promote carcinogenesis of squamous cell carcinoma by affecting LIMK1 mRNA and protein levels via sponging miR-138-5p and recruiting TAF15, and it has also been considered a predictive biomarker in HNSCC in multiple machine learning studies and by qRT-PCR [39-42]. Based on previous studies and the results from the LncSEA database, the model mrlncRNAs are considered specifically related to HNSCC and play important roles in tumors.
As indicated by our risk model, patients with low MLRSs displayed better OS, PFS, and DFS, indicating that an increasing MLRS may raise the risk and shorten survival time. Similarly, K-M survival analysis across clinicopathological subgroups showed that HNSCC patients with lower MLRSs had a better prognosis than those with higher MLRSs. The AUC values of the 1-, 3-, and 5-year ROC curves showed that the MLRS model had more reliable predictive effects than the other clinicopathological characteristics, and it could be used to establish a predictive nomogram with the highest C-index. In addition, although there was no statistically significant correlation between TMB and the MLRS groups, combining TMB and MLRS allowed patients to be stratified into distinct prognostic states. Furthermore, Spearman correlation analysis indicated that RNAss decreased as MLRS increased, suggesting that the high-MLRS group has fewer stem-like cells. Previous studies have noted that stem-like cells are strongly associated with chemotherapy response and are considered the main determinant of drug resistance [43, 44]. This correlation may therefore explain the chemotherapeutic sensitivity results, in which patients with higher MLRSs exhibited a better response to chemotherapy agents.
Furthermore, as indicated by the functional enrichment analysis, the DEGs between the low-MLRS and high-MLRS groups were associated with biological processes and pathways of the immune response. Similarly, GSEA supported the finding that the low-MLRS group was enriched in immune-related biological processes. In addition, the comparison of the TIME, including immune cells, functions, and related scores, indicated that patients in the low-MLRS group were more sensitive to immunotherapy. For patients with low MLRSs, greater CD8+ T cell infiltration promotes better cancer cell killing and disruption of immune tolerance [45, 46]. This is supported by the TCIA comparisons, in which the low-MLRS group exhibited higher IPSs under PD-1 inhibitor treatment, and by the higher expression of ICI-related genes (e.g., CD274) in the low-MLRS group.
In addition, when patients were regrouped into novel tumor subtypes based on the prognostic model mrlncRNAs, those in cluster 1 had a better prognosis but an immunosuppressive status, with less immune cell infiltration and lower stromal scores. Patients in cluster 2, as determined from the TCIA database, were more sensitive to immunotherapy and can be considered the hot tumor subtype [47, 48]. Therefore, for patients diagnosed with HNSCC, our risk model can assess risk and identify the tumor subtype, and may help guide immunotherapy for those with hot tumors and a poor prognosis.
However, although our predictive model performed satisfactorily, our study has several limitations. There is a lack of external lncRNA cohorts to verify the predictive effects of the model. Prospective studies with experimental assays and clinical information are necessary for further exploration and verification. Nevertheless, we built the model based on m5C-related lncRNAs in the TCGA-HNSC cohort and validated it internally through random allocation, which enhanced the reliability of our results. In addition, we investigated its predictive value for immune infiltration and immunotherapy using various algorithms. The consistent trends suggest that our signature may also serve as a treatment response indicator and indirectly support the reliability of this predictive tool. Moreover, we applied multiple methods to assess the biological functions, TIME, and clinical therapy to cross-check the predictions, and these results were consistent and mutually corroborating. Hence, this mrlncRNA risk model can be considered useful and reliable for prognostic prediction.