Background
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common tumor in human [
1], numbering eighth in the list of causes of tumor-related death [
2] and accounting for over 500,000 new cases each year worldwide [
3,
4].
The two traditional main risk factors for HNSCC are alcohol and smoking, while the past decades witnessed an increasing population of HNSCC patients with persistent infection of human papillomavirus (HPV) [
5,
6], which is lately responsible for 60–80% of the oropharyngeal cancer incidence in the United States and Europe [
7‐
9]. Patients who develop HPV + ve HNSCC generally end up with a better prognosis when compared to patients with HPV-ve HNSCC [
10‐
13]
Previous studies have demonstrated that immune system plays a significant role in the development of HNSCC [
5], and there are significant differences in the composition of tumor microenvironment (TME) [
14‐
16] between HPV-infected and non-HPV-infected HNSCC. TME, consisting of tumor tissue, micro vessels, cytokine and tumor-infiltrating immune cells (TICs) etc., are essential for tumor progression. Differences in the compositions of TICs, including cytotoxic T cells, helper T cells, dendritic cells (DCs), as well as related inflammatory pathways lead to different clinical tumor behavior [
17]. However, immune landscape and its interaction with HPV status is still unclear.
Currently, the majority of patients are commonly provided with the standard treatment of surgery, radiotherapy, chemotherapy, or a combination of these therapies, but recurrence and resistance to following treatment will occur in about 40–60% of treated patients [
16]. Therefore, the 5-year OS rate of HNSCCs hasn’t changed much over the past decade [
16]. Studies on the immune infiltration feature and underlying molecular mechanism of HNSCC population may provide potential molecular targets that improve therapeutic selection and better predict the therapeutic response.
With the development of bioinformatics tools, it is possible to process a huge scale of data at one time. In terms of immune system, a great quantity of algorithms has emerged, such as Estimation of STromal and Immune cells in Malignant Tumor tissue Expression (ESTIMATE), Single sample Gene Set Enrichment analysis (ssGSEA), Cibersort, Tumor Immune Dysfunction and Exclusion (TIDE),etc. Weighted gene co-expression network analysis (WGCNA) has been applied in various types of tumor, in which it is used to look for interaction between gene expression features and clinical characteristics. ESTIMATE is used to assess the extent of immune cells and stromal cells infiltration of tumor tissue by gene expression signatures.
In this study, we obtained HNSCC samples containing both HPV status and sequencing data from public available databases. Combined with the algorithms mentioned, we aimed to identify hub genes associates with immune cell infiltration and HPV status in HNSCC population. We also assessed the infiltrative immune cell type and function, willing to explore the effect of hub genes on immune microenvironment of this specific tumor subtype. In addition, we analyzed the response to immune check point treatment and drug sensitivity in samples with high hub genes expression.
Methods
Data procession of gene expression omnibus
We downloaded two gene expression microarrays (GSE65858 and GSE40774) of HNSCC from NCBI Gene Expression Omnibus (GEO;
https://www.ncbi.nlm.nih.gov/geo/). GSE65858 contained 268 samples (73 HPV + ve, 196 HPV-ve), and GSE40774 contained 134 (58HPV + ve, 76 HPV-ve) samples. We used R package limma to normalize the data and screen the differently expressed genes (DEGs).
Data procession of TCGA
We filtered the clinical characteristics and gene expression data of the Cancer Genome Atlas Project database (TCGA,
https://cancergenome.nih.gov/) HNSCC cohort, in which we collected 102 samples with both HPV status imformation and sequencing data (30 HPV + ve, 72 HPV-ve). We used Variance Stabilizing Transformation function of R package DEseq2 to normalize data and do differential expression analysis between HPV-infection and none-HPV-infection group.
Immune microenvironment assessment
Among TICs, the two major type of cells are immune and stromal cells. Immune and stromal scores were calculated by analyzing specific gene expression signature, and combined to represent a measurement of tumor purity. In this study, we used R package ESTIMATE [
18] to assess the immune infiltration of HNSCC samples.
We implemented CIBERSORT [
19] (
http://cibersort.stanford.edu/) to quantify the tumor infiltrating lymphocytes (TILs) in HNSCC samples. CIBERSORT is a deconvolution algorithm. By using signature matrix (referring to gene expression values), which minimally represents each cell type, as well as support vector regression, it measures the population of each type of cell from bulk tumor samples. We uploaded the standardized gene expression matrix to the website of CIBERSORT (mentioned above), then ran the algorithm under LM22 signature and 1000 permutations. The LM22 gene signature includes over 500 genes which features high sensitivity and specification of common human immune cell types such like neutrophils, NK cells, DCs, macrophages, T cells, eosinophils, B cells, etc. Patients with
P value under 0.05 were enrolled for further analysis.
We then quantified the infiltration levels of immune cell types by ssGSEA in R package gsva [
11,
20]. The ssGSEA uses the scoring result to individual tumor samples. In our study, we enrolled 24 immune cells of both innate immunity and adaptive immunity. We further analysised the function status of TILs by Tumor Immune Dysfunction and Exclusion (TIDE) algorithm [
21].
Weighted co-expression network construction
The gene expression microarray GSE65858 was further used for WGCNA. We selected the top 25% genes (ranked by SD from high to low) to construct co-expression network. We calculated the adjacency coefficient (aij) through R package WGCNA, which represented the association extent between each two genes as follows:
$$ {\mathrm{S}}_{\mathrm{ij}}=\left|\mathrm{cor}\left(\mathrm{xi},\mathrm{xj}\right)\right|{\mathrm{a}}_{\mathrm{ij}}={{\mathrm{S}}_{\mathrm{ij}}}^{\upbeta} $$
Supposing that X
i and X
j refer to expression value vectors for gene i and j, s
ij refer to the Pearson’s correlation coefficient of X
i and X
j, and is exponentially transformed into a
ij, which measured the network association extent between gene i and j. The soft-thresholding borderline was set by soft threshold function of R package WGCNA to construct a scale-free network. The power of β = 4 (scale-free
R2 = 0.95) was set as the parameter of soft-thresholding. We gathered genes with high absolute value of correlations in one module in the network. We calculated the topological overlap measure (TOM) by the following equation:
$$ {\mathrm{TOM}}_{\mathrm{i},\mathrm{j}}=\frac{\sum_{K=1}^N Ai,k\ast Ak,j+ Ai,j}{\min \left( Ki, Kj\right)+1- Ai,j} $$
Modules are identified through hierarchical clustering of the weighting coefficient matrix. TOM refer to the overlap in neighboring genes of i and j.
HPV/immune-associated key modules and gene signature identification
We analyzed the relation between modules and their clinical or genetic features in order to identify the HPV/immune-associated key modules. We first assessed the relation between module eigengenes (MEs) and certain feature. ME is the first principal component of each module, of which the expression represents all genes in the module. Then, we calculated the gene significance (GS) respectively for each gene (GS = lgP) in the linear regression between sample features and gene expression. Then we calculated the module significance (MS), referring to the average GS of all genes in a module. We selected the module with highest absolute MS value to be the most relevant module of selected sample features.
We downloaded the single cell sequencing data of profile GSE103322 and gene signatures of tumor-related pathways from CancerSEA (
http://biocc.hrbmu.edu.cn/CancerSEA/). GSE103322 included 2105 tumor cells. The expression of tumor-associated pathways (including apoptosis, angiogenesis, differentiation, cell cycle, DNA damage, EMT (epithelial to mesenchymal), DNA repair, inflammation, invasion, quiescence and stemness pathways) in tumor cell was analyzed by through Gene Set Enrichment Analysis (GSEA). We applied k-NN (k = 15) algorithm from R package “FNN” to calculate the nearest neighbor of each cell at gene expression level and generate the nearest neighbor graph. The neighbors of each cell were taken as input for the visualization. Fruchterman Reingold algorithm was applied to calculate the force-directed layout on the k-NN graph via Gephi Toolkit. The red color referred to the pathway with high expression in the cell, while blue color referred to a low expression.
Hub genes correlated with HPV-infection and immune infiltration microenvironment were expected to be candidate genes. We identified the hub genes by two steps. Firstly, in GSE65858, the module membership (MM) of each gene in key module was calculated. The genes with MM over 0.8 was considered to be associated with the module features, as is described above. Secondly, we used R package limma for GSE40774 and DEseq2 for TCGA HNSCC to do differential expression analysis respectively. The cut off value was set as log2FC > |1|, and adjusted P-value < 0.01. We picked the intersection between the results from GSE65858, GSE40774 and TGCA database, among which we identified top eight hub genes. Differential expression analysis of hub genes was performed between tumor and normal samples, and also among different grade of tumor.
Prediction of therapy response
The expression degree of hub genes in GSE65858, GSE40774 and TCGA HNSCC were selected for further analysis. We used GEPIA2 (
http://gepia2.cancer-pku.cn/) to anlyze survival outcome of hub genes; P under 0.05 was considered to be statistically significant.
Immune checkpoint blockade (ICB) is a novel tumor therapy, but the effective population is still unsure. To measure ICB response, we used the TIDE algorithm and subclass mapping. Two immune blockade drugs, CTLA4 and PD-1, were chosen. Besides, in order to explore whether hub genes could add information to exsisting criteria in therapy chosen, we applied a multivariate analysis. We firstly divided the patients into 6 subgroups(p16 positive/negative, CD274 score high/low, immune cell score high/low). Then we divided each subgroup into two groups based on the expression of these hub genes. Finally, we applied subclass mapping algorithm and pRRophetic algorithm for each subgroup in three independent datasets to test the independency and robustness of the hub gene marker.
We predicted the chemotherapeutic response of our samples on the pharmacogenomics database [the Genomics of Drug Sensitivity in Cancer (GDSC),
https://www.cancerrxgene.org/]. Four widely used chemotherapeutic agents for HNSCC, bleomycin, docetaxel, methotrexate, cisplatin, were selected. We used R package “pRRophetic” for prediction. The half-maximal inhibitory concentration (IC50) of the samples, as the parameter, was calculated by ridge regression. Rediction accuracy was assessed by 10-fold cross-validation, basing on the GDSC training set. Subgroup analysis was performed to test the independency and robustness of the hub gene marker in chemotherapy. The subgroups chosen was immune cell score high/low, HPV + ve/−ve and CD274 score high/low.
Discussions
In this study, we used a combination of several bioinformatics tools, including WCGNA, ssGSEA, CIBERSORT, TIDE,etc., to explore significant genes both related to HPV infection status and immune cell infiltration in HNSCC patients. A total of eight hub genes was identified, including LTB, CD19, CD3D, SKAP1, KLRB1, CCL19, TBC1D10C and ARHGAP4. By analyzing downloaded single cell sequencing data, we identified the underlying function of these hub genes. We found that in the HNSCC population, the hub genes had a stable co-expression, which was related to immune cell infiltration, especially CD8+ T cells, and the infiltrative cells were in a dysfunctional status, which had corresponded with several previous studies [
23,
24]. HPV + ve. Samples with high hub genes expression presented with better response to immune check point block (ICB) therapy, bleomycin and methotrexate.
In total we got 8 hub genes, which were LTB, CD19, CD3D, SKAP1, KLRB1, CCL19, TBC1D10C, ARHGAP4. Previous studies has researched on some of these genes respectively. LTB gene encodes a type II membrane protein of the TNF family lymphotoxin-β. Hsu DS [
25] has reported that LTB Interacts with Methylated EGFR to mediate acquired resistance to Cetuximab in HNSCC. KLRB1 protein is expressed by NK cells and may be involved in the regulation of NK cell function. Previous studies shows that high expression of KLRB1 is closely related to the better prognosis of HNSCC patients [
26]. CCL19, as the ligand of CCR7, induces several CCR7 activated pathways in metastatic HNSCC [
27‐
38]. TBC1D10C protein is an inhibitor of both the Ras signaling pathway and calcineurin, and is reported to be related to the immune response, inflammatory response and formation of the tumor microenvironment [
39]. In our research, we developed a combination of these eight genes, in which the hub genes had a stable co-expression in HNSCC population among different sample sets. The co-expression of hub genes may indicate immune microenvironment of HNSCC patients. Besides, some studies also found other underlying genetic markers. Previous studies [
40,
41] demonstrated the relation of NF-kappaB proteins and AP-1 super-family proteins with oral cancer. An activation and differential expression of NF-kappaB proteins in HPV infected oral cancer was found, especially p50 and p65, leading to heterodimerization of p50/p65, which may be related to improved differentiation and better prognosis. P50:p50/NF-kappaB homodimer formation may has a cross-talk together with overexpression of AP-1 pathways in oral cancers. The relation of NF-κB proteins and AP-1 family proteins with HPV infection in tongue squamous cell carcinoma (TSCC) was mentioned by Gupta S et al. [
42,
43] as well. Further exploration of the association between hub genes and NF-κB proteins, AP-1 family proteins need to be applied.
We found that expression of hub genes has a positive association with T cell dysfunction in tumor. HPV + ve HNSCC population has been found to have a typical T cell infiltration in, especially CD4+ and CD8+ T cells [
14‐
16], and feature a high level of immune cells infiltration but high degrees of immunosuppression [
23,
24,
44]. Our result corresponded previous studies. The hub genes we found related to CD8+ T cell infiltration in HPV + ve HNSCC, and the infiltrative immune cells were in dysfunction status.
While HPV infection status could divide HNSCC patients into two clinically, genetically, and immunologically different subtypes, current treatment for both subtypes are still similar [
10]. Immune cell infiltration have been shown to be significant in the progression of HPV + ve HNSCC and in the favorable prognosis of HPV + ve HNSCC. Thus, ICB therapy could be a more promising treatment. HPV + ve HNSCC was reported to express higher PD-L1 proteins compared with HPV-ve HNSCC [
45]. Based on this situation, we further assessed PD1/PD-L1 targeting therapy response in HPV + ve HNSCC patients. The TIDE result showed that hub genes expression was significantly associated with the activation of CD274(PD-L1), and the subclass mapping result was similar. This suggested that HPV + ve HNSCC population may be more suitable to ICB treatment. We also predicted the chemotherapy response of patients with high hub gene expression. Among all common used drugs, we found a positive association between HPV infection and response to bleomycin and methotrexate, which may be related to the better prognosis of HPV + VE HNSCC. To summarize, the eight hub genes we found stably co-express in immune cell infiltration of HPV + ve HNSCC population. The co-expression of hub genes related to an immune microenvironment featuring an increase in immune cells, especially CD8+ T cells, but high degree of immune dysfunction status. Patients with high hub gene expression had a better response to ICB treatment, bleomycin and methotrexate. The concrete mechanism of hub genes function demands further exploration.
This study has several strengths. We applied a combination of several bioinformatic tools to explore the underlying genetic mechanism of the better prognosis in HPV + ve HNSCC. Besides, we found the relation of hub genes with therapeutic response, which may help with immunotherapy and chemotherapy chosen. The weaknesses are as follow: Firstly, different prognosis may exist among HPV + ve samples with different hub genes expression. However, due to the incidence of HPV negative HNSCC is higher, we didn’t find an appropriate cohort containing enough HPV positive samples with survival information to draw a reliable conclusion. Secondly, hub genes expression has positive association with T cell dysfunction leading to treatment failure and tumor progression, but in contrast HPV + ve HNSCCs show better prognosis following standard treatment. The correlation between T cell dysfunction and the better prognosis of HPV + ve HNSCC toward standard treatment should be further studied. Finally, the relation between hub genes expression and response of docetaxel was still unclear according to our study. Since docetaxel has been part of one-line therapy for HNSCC, further research is needed to explore the possibility of hub genes expression in predicting the response to docetaxel.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.