Introduction
Glioma is the most common primary brain tumor and arises from glial cells within the central nervous system (CNS). The World Health Organization (WHO) classifies glioma on a grading scale of I, II, III, and IV. Low-grade glioma (LGG) typically ranges from grades I–III, while high-grade glioma (HGG) is categorized as grades III–IV [1]. Glioblastoma multiforme (GBM) is a grade IV glioma subtype that is highly invasive, making tumor recurrence certain even after complete resection [2]. With the current standard of care, the median survival of patients diagnosed with GBM is approximately 12–15 months. Thus, there is a substantial need to discover more effective therapies to improve patient outcomes.
Microglia, the resident macrophages of the CNS, are directly derived from yolk-sac erythro-myeloid precursor (EMP) cells during embryonic development [3, 4]. These mononuclear cells account for 5–10% of cells and are distributed throughout the brain; their functions include regulating immune responses, supporting neuronal homeostasis, and maintaining the integrity of the blood–brain barrier (BBB) [4]. In the healthy brain, microglia show little turnover, whereas blood macrophages exhibit a high turnover rate. Although these two immune cell subpopulations are both major components of the brain immune system, microglia and peripheral macrophages have been observed to function differently in brain pathology, and opposing effects of the two populations have been reported in GBM tumors [5–7]. Furthermore, the genes and proteins used to distinguish these two populations are not exclusively expressed by either microglia or macrophages but are only enriched in one of them, which makes it more challenging to explore microglia-specific biological roles.
The glioma microenvironment consists of various non-neoplastic cells that play an important role in tumor growth, progression, and immune evasion [8–10]. Among these cells, microglia make up about 30% of the tumor mass [10] and interact closely with neoplastic cells. In addition, glioma cells secrete various cytokines that act as polarizing factors on resident microglia. Recently, many single-molecule signatures linking microglia and glioma cells have been identified by low-throughput experimental strategies. Sarkar et al. identified a novel factor, Gas1, through which microglia arrest the growth of brain tumor-initiating cells and display anti-tumor properties [11]. Miyauchi et al. showed that Nrp1 could modulate the immune functions of macrophages or microglia and further exhibit anti-glioma biological roles [12]. In terms of patient prognosis, low-throughput methods indicated that M2-type microglia have unfavourable prognostic value in glioma [13]. However, most of these previous studies identified a limited number of microglia signatures at the gene level and confirmed their glioma relevance using a limited number of datasets. Moreover, the biological differences between microglia and macrophages, as well as their cell-specific involvement in glioma events, were not examined.
Here, we performed a systematic integrated analysis based on several glioma-related microglia datasets. By comprehensively considering the differences between glioma-related and normal microglia profiles, as well as between glioma-related microglia profiles and macrophage profiles, we explored the biological mechanisms involving microglia in the glioma condition. Meanwhile, gene- and subpathway-level signatures were specifically identified for microglia, and the close connections between these signatures and glioma biology were revealed. Finally, a global drug-subpathway network was constructed to explore the complex drug-target relationships and to identify candidate treatment target regions. Based on a comprehensive analysis of large-scale microglia and glioma datasets, several novel gene and functional signatures were identified that link microglia features to glioma biological events, with potential for further clinical application.
Discussion
In this study, we performed a comprehensive analysis to identify microglia-specific gene and subpathway signatures and to explore their associations with glioma biology based on large-scale transcriptomic datasets. Among the gene signatures, P2RY2 displayed predictive performance in glioma patients when other clinical factors such as tumor purity were taken into account. Furthermore, a subpathway-level risk model, SubP28, was constructed as a functional signature for predicting patient prognosis and clinical treatment response. Finally, the complex associations between these subpathways and candidate drugs were explored. Together, these findings provide evidence connecting microglia and glioma and offer core signatures for glioma prognosis, drug response, and clinical treatment guidance.
In the MicT/MicN group, nine consistently up-regulated genes were identified across four datasets, and these genes displayed close associations with glioma formation, recurrence, and prognosis. Notably, the P2RY2 signature was associated with glioma in both low-purity and high-purity samples. Several subtypes of P2Y receptors and their functions have been identified in microglia. A previous study found that P2RY2 is upregulated in macrophages/microglia after spinal cord injury [30]. P2RY2 expression was also found to be increased in activated microglia. In mouse models, P2RY2 is an important receptor for the recruitment and activation of microglia [31]. Among the purinergic receptors activated by ATP, P2RY2 can regulate cell proliferation in various tumors, such as lung and bladder cancer [32, 33]. Moreover, P2RY2 up-regulation occurs in response to stress or injury in blood vessels and epithelium and has been linked to the stimulation of smooth muscle growth [34, 35]. Thus, the mechanisms of P2RY2 up-regulation and function in the nervous system warrant further investigation, which may provide new strategies for the treatment and management of the corresponding brain diseases.
Microglia-specific subpathways were identified based on the global random walk. Subpathways, which are distinct regions within a whole pathway, displayed more specificity than whole pathways and more robustness than gene-level signatures. As shown in Fig. 3A, different subpathway signatures derived from the same pathway indeed displayed different patterns, such as Path: 04810_15. Similarly, we observed that this subpathway displayed higher activity in samples of the mesenchymal type (Additional file 13: Fig. S13A) and in IDH1 wild-type samples (Additional file 13: Fig. S13B). Similar results were also observed in another independent dataset (Additional file 13: Fig. S13C).
Glioma cells secrete various cytokines and chemokines that act as chemo-attractants and polarizing factors on resident microglia [10]. In the tumor microenvironment, infiltrating microglia adopt activation states ranging between the antitumor M1 and protumor M2 phenotypes, and these functional phenotypes are defined by differential expression of surface markers, secreted cytokines, and roles in immunoregulation [36]. Activated microglia assume the M1 phenotype, which is characterized by the expression of STAT1, and are capable of stimulating antitumor immune responses by presenting antigens to adaptive immune cells, producing proinflammatory cytokines, and phagocytosing tumor cells. In comparison, the alternatively activated M2 phenotype is characterized by expression of scavenger receptors, intracellular STAT3, and the production of immunosuppressive cytokines. M2 polarization prevents the production of cytokines required to support tumor-specific CD8+ T, CD4+ Th1, and Th17 cells and promotes the function of tumor-supportive CD4+ regulatory T cells [37]. Recent studies indicate that glioma cells induce a mixed population of glioma-associated microglia/macrophages (GAMs) expressing both M1- and M2-related molecules [38]. In this study, the microglia-specific SubP28 score was positively related to the macrophage M2 state, which was also a high-risk factor for glioma prognosis (see Fig. 4E). The concept that GAMs play a key role in glioma pathobiology has been verified in recent studies that demonstrated a reduction of glioma growth after macrophage ablation or pharmacological inhibition [39]. Furthermore, we systematically compared our SubP28 signatures with several glioma functional gene sets [40–42]. As shown in Additional file 14: Fig. S14, SubP28 shared many genes with microglia/macrophage gene sets and with the glioma NPC2 and MES states. In contrast, there were no associations between the SubP28 signatures and the cell cycle gene sets (G1/S and G2/M).
To comprehensively explore the biological roles specific to microglia rather than macrophages, we searched for and obtained datasets from brain tissue that contained both microglia/macrophage and glioma/normal information. Datasets derived from blood or other fluid tissues were removed because they provide limited resources for exploring the function of resident microglia. As more datasets become available, the clinical roles of the gene and subpathway signatures can be further confirmed. We conclude that targeting the microglia signatures identified here, which reflect immune roles within the glioma microenvironment, may lead to novel therapies that improve the outcome of patients suffering from this terrible disease.
Materials and methods
Gene expression data sets
In this study, gene expression datasets were identified from the Gene Expression Omnibus (GEO) database by selecting only studies that performed gene expression analysis of both resident microglia and macrophages from brain tissue in the glioma condition. The datasets used for the integrated microglia analysis were GSE65868, GSE86573, and GSE80338. The disease condition (glioma or normal) and cell population (microglia or macrophage) were obtained from the original studies. Two species, mouse and human, were included, and gene ID conversion was performed using the R package org.Hs.eg.db. In GSE86573, blood samples were not considered, and samples from brain tissue were regarded as normal samples. In GSE80338, epilepsy and postmortem samples were both regarded as normal samples.
To explore the biological roles of microglia gene and functional signatures in glioma pathology, a large number of gene expression datasets with glioma clinical information were obtained from the GEO, The Cancer Genome Atlas (TCGA), Chinese Glioma Genome Atlas (CGGA), and Pan-Cancer Analysis of Whole Genomes (PCAWG) databases, for a total of 26 datasets. These datasets covered three kinds of glioma events: glioma versus normal samples, recurrent versus primary samples, and samples with prognostic information. A detailed description of all gene expression datasets mentioned above is given in Additional file 15: Table S1.
Differential expression analyses
We used two methods to perform differential expression analyses for the RNA sequencing and microarray datasets, respectively. For the RNA sequencing datasets (GSE86573 and GSE80338), we used the R DESeq2 package to identify differentially expressed genes based on the raw count matrix. For the microarray dataset (GSE65868), we identified differentially expressed genes based on the FPKM expression matrix by combining fold change and t-test methods. For all datasets, differentially expressed genes were defined by an absolute log2 fold change > 1.5 and a false discovery rate (FDR) or adjusted P-value < 0.05.
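As an illustration only, the thresholding step and the later selection of genes shared across datasets could be sketched in Python as follows. The function names, the input layout (gene symbol mapped to a log2 fold change and an adjusted P-value), and the toy values are assumptions made for this sketch; the statistics themselves would come from DESeq2 or the t-test described above.

```python
def select_de_genes(results, lfc_cutoff=1.5, fdr_cutoff=0.05):
    """Split genes into up- and down-regulated sets.

    `results` maps gene symbol -> (log2 fold change, adjusted P-value).
    The cutoffs mirror the thresholds stated in the text.
    """
    up, down = set(), set()
    for gene, (log2_fc, adj_p) in results.items():
        if adj_p < fdr_cutoff and abs(log2_fc) > lfc_cutoff:
            (up if log2_fc > 0 else down).add(gene)
    return up, down


def consistent_genes(gene_sets):
    """Return the genes present in every dataset (set intersection)."""
    gene_sets = list(gene_sets)
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical toy results for two datasets (all values are made up).
dataset_a = {"GENE1": (2.3, 0.001), "GENE2": (-1.8, 0.02), "GENE3": (0.4, 0.0001)}
dataset_b = {"GENE1": (1.9, 0.03), "GENE2": (-1.2, 0.01), "GENE3": (2.5, 0.2)}

up_a, down_a = select_de_genes(dataset_a)
up_b, down_b = select_de_genes(dataset_b)
shared_up = consistent_genes([up_a, up_b])
```

In this toy case only GENE1 passes both cutoffs with the same direction in both datasets, so it is the only consistently up-regulated gene.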
Functional exploration for microglia specific genes
Based on the consistent differentially expressed genes shared by 2 MicT/MacT datasets and by 2 or 3 MicT/MicN datasets, we further performed functional enrichment analysis using the R clusterProfiler package [43], considering Gene Ontology (GO) Biological Process (BP) terms. In addition, tumor hallmark gene sets were obtained from the Molecular Signatures Database (MSigDB) for functional association analysis. The hypergeometric test was used to evaluate the associations between the consistent genes and known hallmarks, and the P-values were calculated as follows:
$$P = 1 - \sum\limits_{x = 0}^{r - 1} \frac{\binom{t}{x}\binom{m - t}{n - x}}{\binom{m}{n}}$$
where m was the number of genes in the human whole genome, t was the number of genes in one hallmark gene set, n was the number of consistent signature genes, and r was the number of those n genes that were included in the hallmark gene set.
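A minimal Python sketch of this hypergeometric P-value is given below, using only the standard library. The function name is hypothetical, and the upper tail is summed directly rather than as one minus the lower tail; the two forms are algebraically identical, but the direct sum avoids loss of precision for very small P-values.

```python
from math import comb


def hypergeom_enrichment_p(m, t, n, r):
    """P(X >= r) for a hypergeometric draw.

    m: number of genes in the background (whole genome)
    t: number of genes in the hallmark gene set
    n: number of consistent signature genes
    r: number of signature genes that fall in the hallmark set
    """
    total = comb(m, n)
    # Sum the upper tail x = r .. min(t, n); this equals
    # 1 - sum_{x=0}^{r-1} C(t, x) * C(m - t, n - x) / C(m, n).
    tail = sum(comb(t, x) * comb(m - t, n - x) for x in range(r, min(t, n) + 1))
    return tail / total


# Small worked case: m = 10, t = 4, n = 3, r = 2
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3
p_small = hypergeom_enrichment_p(10, 4, 3, 2)
```

The same function can be reused wherever the text applies the hypergeometric test, with the background size chosen to match the analysis.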
A novel framework for identifying microglia specific subpathways
To explore the biological roles of microglia at the functional level, we developed a novel framework to identify microglia-specific subpathways. As shown in Additional file 3: Fig. S3, we first obtained a high-quality PPI network from a previous study [19], which contained protein–protein interactions supported by at least two data resources. Based on this global network, we performed a global impact analysis to rank candidate mRNAs using the random walk algorithm [44]. The consistent gene signatures of microglia, shared by two datasets in the MicT/MacT group and three datasets in the MicT/MicN group, were used as seed nodes. As a result, a total of 55/36 up-regulated/down-regulated genes from the MicT/MacT group and 59/5 up-regulated/down-regulated genes from the MicT/MicN group were mapped into the network as seeds. The random walk algorithm was performed four times, once for each seed set, to evaluate the global impact of the seed nodes from different aspects as follows:
$$P^{t + 1} = \left( {1 - r} \right)WP^{t} + rP^{0}$$
where \(W\) was the column-normalized adjacency matrix of the global network, whose unnormalized entries were 0 or 1. \(P^{t}\) was a vector in which each node in the network held the probability of the walker being at that node at step \(t\). The initial probability vector, \(P^{0}\), was constructed so that equal probabilities were assigned to all seed nodes and their probabilities summed to 1. At each step, the walker restarted with probability \(r\) (\(r = 0.7\)). When the difference between \(P^{t}\) and \(P^{t+1}\) fell below \(10^{-6}\), the probabilities were considered to have reached a steady state. Finally, each gene in the network was given a score according to its value in the steady-state probability vector, \(P^{\infty}\). After the random walk algorithm, each candidate gene received four scores (score_up and score_down from each of the MicT/MacT and MicT/MicN groups).
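The iteration can be sketched in plain Python as below. This is an illustrative sketch rather than the implementation used in the study: the function name and the dense list-of-lists matrix are assumptions, and because the text does not state which norm measures the difference between successive vectors, the L1 norm is assumed here.

```python
def random_walk_with_restart(adjacency, seeds, r=0.7, tol=1e-6, max_iter=1000):
    """Iterate P^{t+1} = (1 - r) * W * P^t + r * P^0 until convergence.

    adjacency: symmetric 0/1 matrix as a list of lists
    seeds: set of seed node indices (must be non-empty)
    r: restart probability (0.7 in the text)
    tol: convergence threshold on the L1 difference (norm is an assumption)
    """
    n = len(adjacency)
    # Column-normalize the adjacency matrix so each non-empty column sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    w = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    # Equal initial probability on every seed node, summing to 1.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - r) * sum(w[i][j] * p[j] for j in range(n)) + r * p0[i]
               for i in range(n)]
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:
            break
    return p


# Toy path network 0 - 1 - 2 with node 0 as the only seed.
toy_adjacency = [[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]
toy_scores = random_walk_with_restart(toy_adjacency, seeds={0})
```

On the toy path network the steady-state score decreases with distance from the seed. For a network of realistic size, a sparse matrix representation would be preferable to the dense lists used here.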
We then obtained a list of subpathways from the R subpathwayMiner package [26], keeping those with at least three gene components among the 1773 subpathways. For each subpathway, we calculated the subpathway score based on the gene scores from the random walk analysis, using the following formula:
$${\text{Score}}_{{{\text{subpath}}}} = \overline{{Score_{up} }} - \overline{{Score_{down} }}$$
where \(\overline{{Score_{up} }}\) and \(\overline{{Score_{down} }}\) were the mean scores of the genes within one subpathway, calculated for the MicT/MacT and MicT/MicN groups. For each subpathway, we further performed 5000 random perturbations, each time randomly assigning scores to the same number of genes as in the original subpathway, and calculated the significance as follows:
$${\text{P-}}value_{{{\text{subpath}}}} = \frac{{\left| {Score_{Random} > Score_{True} } \right|}}{5000}$$
where \(Score_{True}\) was the true subpathway score and \(Score_{Random}\) was the score from a random perturbation, so that the numerator counted the perturbations whose score exceeded the true score. From this analysis, subpathways with P-value < 0.001 were identified as microglia-specific subpathways. Based on these subpathways, a subpathway network was also constructed by connecting any two subpathways that shared more than seven gene components.
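The scoring and perturbation test might be sketched as follows. The text does not fully specify the perturbation scheme, so this sketch assumes that each perturbation draws a random gene set of the same size from all scored network genes; the function names and the dictionary inputs are likewise hypothetical.

```python
import random


def subpathway_score(genes, score_up, score_down):
    """Mean up score minus mean down score over the genes of a subpathway."""
    mean_up = sum(score_up[g] for g in genes) / len(genes)
    mean_down = sum(score_down[g] for g in genes) / len(genes)
    return mean_up - mean_down


def perturbation_p_value(genes, score_up, score_down, n_perm=5000, seed=0):
    """Fraction of random same-size gene sets scoring above the true score.

    Assumption: random gene sets are sampled without replacement from all
    genes that carry a random-walk score.
    """
    rng = random.Random(seed)
    universe = sorted(score_up)
    true_score = subpathway_score(genes, score_up, score_down)
    exceed = 0
    for _ in range(n_perm):
        random_genes = rng.sample(universe, len(genes))
        if subpathway_score(random_genes, score_up, score_down) > true_score:
            exceed += 1
    return true_score, exceed / n_perm
```

With 5000 perturbations, the threshold of P-value < 0.001 corresponds to at most four random sets scoring above the true subpathway score.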
The associations between microglia subpathways and glioma biology
For all glioma datasets, we considered three types of comparison: tumor versus normal samples, recurrent versus primary samples, and high-risk versus low-risk samples. The R limma package was used to perform the differential expression analysis. We identified differentially expressed genes (both up-regulated and down-regulated) based on adjusted P-value < 0.05 and absolute log2 fold change > 0.5. Univariate Cox analysis was performed to identify high-risk or low-risk genes based on P-value < 0.01 and the hazard ratio (HR > 1 for high-risk genes and HR < 1 for low-risk genes). Datasets without significant genes were removed. The overlap between the genes within each subpathway and the up-regulated (high-risk) or down-regulated (low-risk) genes from each dataset was then evaluated using the hypergeometric test, and P-value < 0.05 was considered a significant association.
Identification of subpathway prognostic model
We constructed a prognostic model using microarray expression profiles as the training set. Because there were systematic deviations between microarray datasets generated by different laboratories at different times, we first used the ComBat function in the R sva package to remove batch effects and form a merged training set. Based on the training set, we calculated the normalized enrichment score (NES) for each subpathway. A generalized linear model fitted by maximum likelihood estimation with the l1 penalty (Lasso), implemented in the R glmnet package, was then applied. The optimal parameter λ was identified by choosing the minimum over a grid, and subpathway signatures with non-zero coefficients were selected.
Construction of a drug-subpathway network based on the HGCC resource
Based on the HGCC resource [28], we obtained drug IC50 information as well as gene expression, methylation, and CNV data for each GBM cell line. First, using the median IC50 value as the cutoff, we defined two cell line groups, a high-IC50 group and a low-IC50 group. Then, based on these two groups, we identified drug-related genes at the gene expression, methylation, and CNV levels, respectively. The t-test was used for gene expression profiles, and the Wilcoxon rank-sum test was used for methylation and CNV data. The cutoff for the differential expression (DE) analysis was set at adjusted P-value < 0.05. Finally, we evaluated the associations between the DE genes and the 28 microglia subpathways using the hypergeometric test, and results with P-value < 0.05 were considered significant associations. An integrated drug-subpathway network was constructed by considering the three levels of omics data.
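The median split that defines the two cell line groups can be illustrated with a short Python sketch. The function name, the cell line labels, and the IC50 values are hypothetical, and the handling of values exactly equal to the median (assigned to the low-IC50 group here) is an assumption, since the text does not specify it.

```python
from statistics import median


def split_by_median_ic50(ic50_by_cell_line):
    """Split cell lines into high- and low-IC50 groups at the median.

    ic50_by_cell_line maps cell line name -> IC50 value for one drug.
    Assumption: values equal to the median go to the low-IC50 group.
    """
    cutoff = median(ic50_by_cell_line.values())
    high = sorted(c for c, v in ic50_by_cell_line.items() if v > cutoff)
    low = sorted(c for c, v in ic50_by_cell_line.items() if v <= cutoff)
    return cutoff, high, low


# Hypothetical IC50 values for four cell lines treated with one drug.
toy_ic50 = {"line1": 1.0, "line2": 2.0, "line3": 3.0, "line4": 4.0}
cutoff, high_group, low_group = split_by_median_ic50(toy_ic50)
```

The two groups would then be compared at each omics level with the tests named above to obtain the drug-related genes.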