Introduction
Diffuse glioma in the cerebellum is infrequent, accounting for 0.6–3.3% of all gliomas [1, 10, 18]. Previous studies reported that patients with diffuse cerebellar glioma (DCG) are younger in general, and that DCGs have a relatively smaller tumor volume compared to cerebral gliomas [1, 18].
Recent comprehensive genetic analyses of gliomas demonstrated that the common alterations contributing to tumorigenesis differ according to the tumor's region of origin in the central nervous system as well as the patient's age [47]. For example, the K27M mutation in H3F3A, which encodes the replication-independent histone 3 variant H3.3, is predominantly found in pediatric and young adult high-grade gliomas located in midline structures such as the brainstem, thalamus, or spinal cord, whereas the G34R/V mutation is associated with adolescent glioblastoma (GBM) in the cerebral hemispheres [2, 16, 43, 48]. Ependymoma, a different histological type of glioma, has also been shown to have distinct molecular profiles according to the anatomical region of the tumor: oncogenic fusions involving RELA or YAP1 are generally seen in supratentorial ependymomas, whereas posterior fossa ependymomas carry an extremely low number of mutations, and their pediatric subset shows a characteristic DNA methylation pattern [26, 31, 32]. Importantly, tumors with different molecular backgrounds respond differently to therapy, leading to different prognoses. Furthermore, identifying the tumor-driving molecular alterations in each case would allow the selection of relevant molecularly targeted drugs as they become available through extensive research in the era of precision medicine. Thus, it is increasingly important to clarify the molecular background of tumors that may have specific biological traits. However, DCGs, which may be biologically different from common gliomas such as those located in the cerebral hemispheres, have not been well characterized molecularly, partly because of their rarity. Consequently, it remains unclear whether the diagnostic and therapeutic approaches for cerebral gliomas are applicable to cerebellar gliomas.
To determine the characteristics of cerebellar glioma, here we performed comprehensive molecular profiling of these tumors, including whole-exome sequencing (WES), Infinium methylation arrays, and RNA sequencing, and compared their profiles with those of gliomas arising in other anatomical regions. We demonstrated that DCGs have a characteristic, region-related molecular profile that may shed light on their cellular origin and may also be amenable to specific targeting in future treatment strategies.
Materials and methods
Clinical samples
Clinical samples were obtained from individuals who underwent surgery at The University of Tokyo Hospital, Kyorin University Hospital, Dokkyo Medical University Hospital, Saitama Medical University International Medical Center, Tokyo Women’s Medical University Hospital, Yokohama City University Hospital, and the National Cancer Center Hospital, with the patients’ informed consent. This study was approved by the ethics committee of each institution.
We used only samples that were radiographically confined to the cerebellum; cases with multiple lesions outside the cerebellum or with a tumor extending into the brainstem were excluded (Online Resource 1: Fig. S1). Samples were histologically diagnosed according to the 2016 World Health Organization (WHO) classification by an experienced neuropathologist at each hospital and were further reviewed by a senior neuropathologist (J.S.) [25]. Of the 27 DCGs available in this study, 22 were freshly frozen tumors and five were formalin-fixed paraffin-embedded (FFPE) tissues. Matched normal blood was obtained for 17 of the 22 freshly frozen tumors. Only these 17 samples could be analyzed by WES and methylation array; such comprehensive analyses were not possible for the remaining 10 cases because only a small amount of DNA, or DNA of low quality, was obtained from them.
For comparison of the gene expression profile, eight cerebral GBM samples were also analyzed. Detailed information of the samples used in this study is provided in Online Resource 2: Table S1.
DNA and RNA extraction
The DNeasy Blood and Tissue Kit (Qiagen) was used to extract DNA from tumor tissue and paired normal blood according to the manufacturer’s instructions, and the RNeasy Mini Kit (Qiagen) was used to extract RNA from freshly frozen tumor tissue. Double-stranded DNA concentration was measured with a Qubit fluorometer (Life Technologies), and RNA quality was assessed with a TapeStation (Agilent Technologies).
Sanger sequencing
Sanger sequencing was performed to detect hotspot mutations in IDH1 (R132), IDH2 (R172), the TERT promoter (C228 and C250), and H3F3A (K27). The oligonucleotide primers used for PCR amplification of these regions and the annealing temperature for each primer set are listed in Online Resource 2: Table S2. PCR was performed with the high-fidelity DNA polymerase KOD-Plus-Neo (Toyobo) under optimized thermal conditions. For each primer set, the PCR amplicon was gel-purified and then sequenced. Sanger sequencing was also used to validate mutations identified by WES.
Immunohistochemistry
Immunohistochemical analysis was performed on 4-μm-thick FFPE tumor tissue sections. Briefly, after deparaffinization, antigen retrieval was performed for 30 min in citrate buffer (pH 6.0). The slides were then incubated with the following primary antibodies: anti-H3 K27M (Millipore, ABE419, 1:500) and anti-H3K36 trimethylation (Abcam, ab9050, 1:2000).
WES
WES was performed for 17 DCGs and matched blood samples (Online Resource 2: Table S1) as previously described [3, 20, 51]. In brief, DNA was fragmented using a Covaris SS Ultrasonicator, and exome capture was performed with Agilent SureSelect V6 plus COSMIC (Agilent Technologies). Each sample was sequenced on a HiSeq 2000 (Illumina) as 100-bp paired-end reads. Sequencing data are summarized in Online Resource 2: Table S3. The Burrows–Wheeler Aligner (BWA) and NovoAlign (Novocraft Technologies) were used to align the reads to the human reference genome GRCh37/hg19. After removal of PCR duplicates, the Short-Read Micro re-Aligner (SRMA) [17] was used to improve variant discovery through local realignment.
Mutation detection and copy number analysis
To detect somatic mutations and copy number variations and to estimate tumor purity, we used integrated genotyper software (karkinos: http://github.com/genome-rcast/karkinos) as previously reported [3, 20, 51]. For each sample, tumor purity was estimated from allelic imbalance between the matched tumor and normal samples with a program that examined the allelic fractions of heterozygous single nucleotide polymorphisms (SNPs) in regions of loss of heterozygosity (LOH); this algorithm is similar to one described previously by another group [9]. In cases in which no LOH regions were detected, tumor content was estimated from the distribution of mutant allele frequencies. When both approaches failed to estimate tumor cellularity, we assumed a value of 0.2 for the correction of mutant allele frequencies. Somatic mutations whose mutant allele frequency, after adjustment for the estimated tumor content, was ≥15% were retained. Artifacts originating from sequencing and mapping errors were removed by heuristic filtering and Fisher’s exact test. To eliminate germline variants, all samples were analyzed as paired tumor and normal samples from the same patient, and only somatic events detected exclusively in tumor tissue were extracted. Mutations were validated by Sanger sequencing or RNA sequencing; for validation by RNA sequencing, variant allele reads in each RNA-sequencing BAM file were counted using SAMtools v1.2 mpileup (http://www.htslib.org/).
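The purity correction described above can be illustrated with a minimal sketch. It assumes, for simplicity, a copy-neutral LOH region, where the major-allele fraction of a germline heterozygous SNP in the tumor sample is (1 + p)/2 for purity p; karkinos fits a richer model, and the fallback based on the mutant allele frequency distribution is not shown. Dividing the observed allele frequency by purity further assumes a heterozygous mutation in a diploid region. The 0.2 fallback and the 15% threshold come from the text; the function names and the input format are hypothetical.

```python
from statistics import median

FALLBACK_PURITY = 0.2      # value assumed in the text when estimation fails
MIN_ADJUSTED_VAF = 0.15    # retention threshold from the text


def purity_from_cnloh(tumor_major_fracs):
    """Estimate purity from tumor major-allele fractions of heterozygous SNPs
    in a copy-neutral LOH region (assumed model: fraction = (1 + p) / 2)."""
    if not tumor_major_fracs:
        return None
    purity = 2.0 * median(tumor_major_fracs) - 1.0
    return min(max(purity, 0.0), 1.0)


def adjusted_vaf(observed_vaf, purity):
    """Correct an observed mutant allele frequency for tumor content."""
    if purity is None or purity <= 0:
        purity = FALLBACK_PURITY
    return min(observed_vaf / purity, 1.0)


def retain(mutations, purity):
    """Keep mutations whose purity-adjusted VAF is >= 15%.
    mutations: list of (name, observed_vaf) tuples (hypothetical format)."""
    return [name for name, vaf in mutations
            if adjusted_vaf(vaf, purity) >= MIN_ADJUSTED_VAF]


purity = purity_from_cnloh([0.79, 0.80, 0.81])   # roughly 0.6
print(retain([("A", 0.06), ("B", 0.12)], purity))
```

With a purity of about 0.6, an observed frequency of 6% corresponds to an adjusted 10% and is dropped, whereas 12% becomes about 20% and is kept.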
To analyze copy number changes, the read depth was compared between normal and tumor for each capture target region. After normalizing by the number of total reads and the GC content bias, the tumor/normal depth ratio was calculated, and values were smoothed using a moving average. Copy number peaks were then estimated using wavelet analysis, and each peak was approximated using complex Gaussian models. A hidden Markov model with calculated Gaussian models was constructed, and copy number peaks were linked to genomic regions. The allelic imbalance for each copy number peak was then calculated, and imbalance information and peak distance were further analyzed by model fitting, yielding integer copy number annotation and tumor purity.
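The first steps of the copy number procedure above (library-size normalization, GC-bias correction, tumor/normal ratio, moving-average smoothing) can be sketched as follows. The GC correction here is a crude per-GC-bin median normalization that assumes most targets in each bin are copy-neutral. It stands in for whatever GC model karkinos actually uses, and the wavelet, Gaussian-model, and hidden Markov model steps are not reproduced. The function name and inputs are hypothetical.

```python
import math
from statistics import median


def smoothed_log_ratios(tumor, normal, gc, n_gc_bins=10, window=5):
    """tumor, normal: positive read depths per capture target (same order);
    gc: GC fraction (0-1) of each target. Returns smoothed log2 T/N ratios."""
    t_total, n_total = sum(tumor), sum(normal)
    # 1. Normalize each depth by the total reads of its own library.
    ratios = [(t / t_total) / (n / n_total) for t, n in zip(tumor, normal)]
    # 2. Crude GC-bias correction: divide each ratio by the median ratio
    #    of targets that fall in the same GC-content bin.
    bins = [min(int(g * n_gc_bins), n_gc_bins - 1) for g in gc]
    bin_median = {
        b: median(r for r, rb in zip(ratios, bins) if rb == b) for b in set(bins)
    }
    log_ratios = [math.log2(r / bin_median[b]) for r, b in zip(ratios, bins)]
    # 3. Centered moving average; the window shrinks at the edges.
    half = window // 2
    smoothed = []
    for i in range(len(log_ratios)):
        segment = log_ratios[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Ten targets, the last four at twice the normal depth (a one-copy gain).
print(smoothed_log_ratios([100] * 6 + [200] * 4, [100] * 10, [0.5] * 10, window=1))
```

With window=1 no smoothing is applied, so the copy-neutral targets come out near 0 and the gained targets near 1 on the log2 scale.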
RNA sequencing
RNA sequencing was performed as previously described [22] for 14 DCG and eight cerebral GBM samples with RNA of sufficient quality and quantity (Online Resource 2: Table S1). RNA-sequencing libraries were prepared using the TruSeq Stranded mRNA LT Sample Prep Kit (Illumina) according to the manufacturer’s protocol. Briefly, poly(A)+ RNA was purified from 1 µg of total RNA using oligo-dT magnetic beads and fragmented at 94 °C for 2 min. cDNA was synthesized using SuperScript II (Invitrogen), and adapter-ligated cDNA was amplified with 12 cycles of PCR. Libraries were sequenced on a HiSeq 2000 with four libraries per flowcell lane, yielding an average of 59.2 million 101-cycle reads per sample. RNA-sequencing reads were aligned to a human transcriptome database (UCSC genes) and the reference genome (GRCh37/hg19) using BWA. If a gene had multiple annotated isoforms, the longest isoform was selected. After transcript coordinates were converted to genomic positions, the better mapping result from transcript or genome mapping was chosen by comparing the minimal edit distance to the reference. Local realignment was then performed with an in-house short-read aligner using a small seed size (k = 11). Finally, fragments per kilobase of exon per million mapped fragments (FPKM) values were calculated for each UCSC gene, taking strand-specific information into account. The gene set used for Gene Set Enrichment Analysis (GSEA) consisted of 320 genes that were upregulated in “PDGFRA-amplified GBMs” and was used in previous reports [34, 38]. The “Proneural GBMs” gene set was obtained from the GSEA website (http://www.broadinstitute.org/gsea/index.jsp).
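The FPKM values mentioned above follow the standard definition: fragments assigned to a gene × 10^9 / (exonic length in bp × total mapped fragments). A minimal sketch, using the longest isoform's exonic length as the text describes, is shown below. The alignment and strand-specific counting steps are not shown, and the function names are hypothetical. The total is passed separately because it counts all mapped fragments, not only those assigned to the genes listed.

```python
def longest_isoform_lengths(isoforms):
    """isoforms: dict gene -> list of exonic lengths (bp) of its isoforms.
    As in the text, the longest isoform represents each gene."""
    return {gene: max(lengths) for gene, lengths in isoforms.items()}


def fpkm(fragment_counts, exon_lengths_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (exonic length in bp * total mapped fragments)."""
    return {
        gene: count * 1e9 / (exon_lengths_bp[gene] * total_mapped_fragments)
        for gene, count in fragment_counts.items()
    }


lengths = longest_isoform_lengths({"A": [800, 1000], "B": [3000]})
print(fpkm({"A": 100, "B": 300}, lengths, total_mapped_fragments=400))
```

Gene B has three times the fragments of gene A but is also three times longer, so both end up with the same FPKM.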
Fusion transcript detection and validation
Fusion analysis was performed with RNA-sequencing data from the DCGs in this study and from 173 GBM samples obtained from The Cancer Genome Atlas (TCGA) website (https://tcga-data.nci.nih.gov). FASTQ files from RNA sequencing were used to detect fusion genes using Genomon-fusion (https://genomon-project.github.io/GenomonPagesR/) with default parameters. To call a fusion transcript, we required at least 12 bases matching each side of the fusion in each read and more than four reads spanning the candidate breakpoint. When both sides resided on the same chromosome, we required a minimum distance of 100,000 bp between them to reduce read-through transcripts. To validate fusion transcripts, tumor RNA was reverse-transcribed using SuperScript III (Invitrogen) according to the manufacturer’s instructions, and the resulting cDNAs were used as PCR templates. Oligonucleotide primers for PCR amplification of the three fusion isoforms were designed to amplify only the fusion transcripts. The primers, the annealing temperature for each set, and the expected sizes of the PCR products are shown in Online Resource 2: Table S2. PCR was performed with KOD-Plus-Neo under optimized thermal conditions. PCR products were evaluated on an agarose gel, and purified products were sequenced to confirm the presence of the fusion product.
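The three calling thresholds above can be expressed as a simple filter. This is a minimal sketch, not Genomon-fusion's implementation. The `FusionCandidate` record, its per-read overhang representation, and the function name are hypothetical; only the numeric thresholds come from the text. "More than four reads" is encoded as at least five.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FusionCandidate:
    """Hypothetical record for one candidate breakpoint pair."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    # (bases aligned on side A, bases aligned on side B) for each spanning read
    read_overhangs: List[Tuple[int, int]] = field(default_factory=list)


def passes_fusion_filters(c, min_overhang=12, min_reads=5, min_distance=100_000):
    """Thresholds from the text: >= 12 matching bases on both sides of the
    breakpoint per read, more than four such reads, and, for
    intrachromosomal pairs, >= 100,000 bp between the two sides."""
    supporting = sum(1 for a, b in c.read_overhangs
                     if a >= min_overhang and b >= min_overhang)
    if supporting < min_reads:
        return False
    if c.chrom_a == c.chrom_b and abs(c.pos_a - c.pos_b) < min_distance:
        return False  # likely a read-through transcript
    return True


inter = FusionCandidate("chr17", 60_600_000, "chr20", 1_000_000, [(20, 30)] * 5)
print(passes_fusion_filters(inter))  # True
```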
Microarray data processing
The gene expression microarray data (Affymetrix U133 Plus 2.0 platform) reported by Sturm et al. were obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) and are accessible through GEO Series accession number GSE36245. These data were normalized to examine the correlation between SOX10 promoter methylation and expression [48]. Expression data from samples that also had methylation data (GSE36278) were used for the correlation analysis.
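The text does not state which correlation coefficient was used. The sketch below assumes Spearman's rank correlation, computed only on samples present in both the expression and methylation data sets. Microarray normalization is not shown, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def methylation_expression_spearman(promoter_beta, expression):
    """promoter_beta, expression: dicts keyed by sample ID. Only samples
    present in both data sets are used. Returns (rho, n_samples)."""
    shared = sorted(promoter_beta.keys() & expression.keys())
    x = [promoter_beta[s] for s in shared]
    y = [expression[s] for s in shared]
    return pearson(average_ranks(x), average_ranks(y)), len(shared)


meth = {"a": 0.1, "b": 0.5, "c": 0.9, "d": 0.3}
expr = {"a": 10.0, "b": 5.0, "c": 1.0, "e": 7.0}
print(methylation_expression_spearman(meth, expr))
```

A strongly negative coefficient would be consistent with promoter methylation silencing the gene.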
Methylation analysis
The Infinium MethylationEPIC BeadChip (Illumina) was used to analyze the genome-wide methylation profiles of 17 DCGs (Online Resource 2: Table S1) and one non-neoplastic frontal lobe sample as a control, following the manufacturer’s instructions. The beta-value of each CpG site was calculated as previously reported [3], using the following equation [5]:

$$\beta = \frac{M}{U + M + 100}$$

where M and U are the intensities of the methylated and unmethylated alleles, respectively. The beta-value ranges from 0 (unmethylated) to 1 (fully methylated) and reflects the methylation level of the CpG site represented by each probe.
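A direct translation of the beta-value formula is shown below. The offset of 100 comes from the text. Clipping negative background-corrected intensities to zero is an added assumption, included so that the result stays within [0, 1).

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta = M / (U + M + offset). The offset keeps the ratio stable when
    both intensities are low. Negative intensities are clipped to 0
    (an assumption, not stated in the text)."""
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (u + m + offset)


print(beta_value(900, 0))      # 0.9: nearly all signal from the methylated allele
print(beta_value(4900, 5000))  # 0.49: roughly half methylated
```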
For clustering analysis, Methylation450K BeadChip data from 210 high-grade gliomas and normal cerebellum samples (two normal adult brains and four normal fetal brains) reported by Sturm et al. were obtained from GSE36278 and from the TCGA website (https://tcga-data.nci.nih.gov) [48]. The tumor location of TCGA samples was obtained from the pathology reports in cBioPortal (http://www.cbioportal.org). Methylation data from 224 gliomas, including 14 DCGs from this study, were used for clustering analysis; three tumor samples (DCG_01, 13, and 14) were excluded because their WES data indicated low tumor content. After excluding probes targeting the X and Y chromosomes and probes associated with SNPs according to TCGA, probes common to the EPIC and 450K arrays were extracted, and the remaining 300,870 probes were used for subsequent analysis. The standard deviation of beta-values was calculated for each probe, and the 8000 most variable probes were selected. Unsupervised consensus clustering was then performed with the R package ConsensusClusterPlus, using the k-means algorithm (10 random starting sets, maximum of 1000 iterations) to calculate the consensus matrix; k = 6 was selected as previously reported [48].
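The probe-selection steps that precede clustering (EPIC/450K intersection, removal of chrX/chrY and SNP-associated probes, then the top probes by standard deviation) can be sketched as follows. The consensus clustering itself (ConsensusClusterPlus in R, k-means, k = 6) is not reproduced here. The input structures and the function name are hypothetical.

```python
from statistics import stdev


def select_variable_probes(betas, epic_probes, hm450_probes, probe_chrom,
                           snp_probes, n_top=8000):
    """betas: dict probe_id -> list of beta-values across samples.
    Keeps probes shared by EPIC and 450K, drops chrX/chrY and SNP-associated
    probes, then returns the n_top probes with the largest standard deviation."""
    common = set(epic_probes) & set(hm450_probes)
    kept = [
        p for p in betas
        if p in common
        and probe_chrom.get(p) not in ("chrX", "chrY")
        and p not in snp_probes
    ]
    kept.sort(key=lambda p: stdev(betas[p]), reverse=True)
    return kept[:n_top]


betas = {"cg1": [0.1, 0.9], "cg2": [0.5, 0.5], "cg3": [0.2, 0.6],
         "cg4": [0.0, 1.0], "cg5": [0.1, 0.8]}
chrom = {"cg1": "chr1", "cg2": "chr2", "cg3": "chr3", "cg4": "chrX", "cg5": "chr5"}
print(select_variable_probes(betas, betas.keys(), ["cg1", "cg2", "cg3", "cg4"],
                             chrom, {"cg2"}, n_top=5))
```

In this toy input, cg2 is dropped as SNP-associated, cg4 as chrX, and cg5 as absent from the 450K set. The remaining probes are ordered by variability.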
Probes within 1500 bp of the transcription start site (TSS) of protein-coding transcripts (UCSC genes, GRCh37/hg19) were considered to be located in the promoter region, and the mean beta-value of all probes in each promoter was calculated for each sample to represent the promoter methylation status of each gene. To identify genes with significantly different promoter methylation between 18 DCGs and 123 cerebral gliomas, the mean promoter beta-value was calculated for each group. Welch’s t test and the Benjamini–Hochberg method were used to calculate p values and q values, respectively. A gene promoter was considered significantly differentially methylated when both of the following criteria were fulfilled: q < 0.01 and a difference in mean beta-value > 0.2. To validate promoter methylation of the significant genes in additional data sets, the Methylation450K BeadChip data of Fontebasso et al. (GSE55712), Zhang et al. (GSE50774), and Aihara et al. (JGAS00000000106) [2, 14, 55] were used.
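The promoter comparison above combines three pieces: promoter averaging, Welch's t test, and Benjamini–Hochberg adjustment. The sketch below computes Welch's t statistic with Welch–Satterthwaite degrees of freedom and BH q-values from given p-values. The p-values themselves would come from the t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False). Reading "difference > 0.2" as an absolute difference in mean beta-values is an assumption, and all function names are hypothetical.

```python
import math
from statistics import mean, variance


def promoter_mean(probe_betas, tss_distances, max_distance=1500):
    """Mean beta-value of probes within max_distance bp of the TSS."""
    vals = [b for b, d in zip(probe_betas, tss_distances) if abs(d) <= max_distance]
    return mean(vals) if vals else None


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvals):
    """BH-adjusted q-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p downward
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def significant(diffs, qvals, q_cut=0.01, min_diff=0.2):
    """Indices with q < 0.01 and |difference in mean beta| > 0.2 (assumed)."""
    return [i for i, (d, q) in enumerate(zip(diffs, qvals))
            if q < q_cut and abs(d) > min_diff]


print(welch_t([1, 2, 3], [4, 5, 6]))                 # t about -3.67, df = 4
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```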
Motif analysis
A total of 224 samples from the studies of Sturm et al. and TCGA, together with the samples from the current study, were divided into three groups according to SOX10 promoter methylation level: a “SOX10 promoter hypomethylation” group (beta-value < 0.5), a “SOX10 promoter intermediate methylation” group (0.5 ≤ beta-value < 0.7), and a “SOX10 promoter hypermethylation” group (beta-value ≥ 0.7) [48]. To select probes in distal elements (distance from the TSS > 1500 bp) that were significantly hypomethylated in the “SOX10 promoter hypomethylation” group compared with the “SOX10 promoter hypermethylation” group, the average beta-value of each probe was calculated for each group. P values were calculated using Welch’s t test, and q values were calculated using the Benjamini–Hochberg method. We chose relatively strict criteria, $q < 1 \times 10^{-10}$ and a difference < −0.25, to select approximately the top 1000 probes, yielding a final total of 1070 probes. Windows of 1000 bp around these probes were searched for motifs, and de novo motif discovery was performed using HOMER (v4.9, 2-20-2017).
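The grouping thresholds, probe criteria, and window construction above can be sketched as follows. The tuple layout of the per-probe statistics and all function names are hypothetical, and the difference is taken as the hypomethylation-group mean minus the hypermethylation-group mean. The resulting coordinates could then be written to a BED-style file for HOMER's findMotifsGenome.pl. That handoff, and the exact coordinate convention, are assumptions not detailed in the text.

```python
def sox10_group(promoter_beta):
    """Thresholds from the text: <0.5 hypo, 0.5 to <0.7 intermediate, >=0.7 hyper."""
    if promoter_beta < 0.5:
        return "hypomethylation"
    if promoter_beta < 0.7:
        return "intermediate"
    return "hypermethylation"


def select_distal_probes(stats, q_cut=1e-10, diff_cut=-0.25, min_tss_distance=1500):
    """stats: list of (probe_id, chrom, pos, tss_distance, q_value, diff), where
    diff = mean beta (hypomethylation group) - mean beta (hypermethylation group)."""
    return [s for s in stats
            if abs(s[3]) > min_tss_distance and s[4] < q_cut and s[5] < diff_cut]


def motif_windows(selected, width=1000):
    """(chrom, start, end, probe_id) windows of `width` bp centered on each probe."""
    half = width // 2
    return [(chrom, max(0, pos - half), pos + half, pid)
            for pid, chrom, pos, _, _, _ in selected]


stats = [("cg1", "chr1", 10_000, 5000, 1e-12, -0.30),   # passes all criteria
         ("cg2", "chr1", 20_000, 800, 1e-12, -0.30),    # promoter-proximal
         ("cg3", "chr2", 30_000, 4000, 1e-5, -0.40),    # q too large
         ("cg4", "chr2", 40_000, 3000, 1e-11, -0.10)]   # difference too small
print(motif_windows(select_distal_probes(stats)))
```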
Statistical analysis
Statistical comparisons of mutation frequencies were performed using Fisher’s exact test. Overall survival curves were estimated using the Kaplan–Meier method, and univariate comparisons of Kaplan–Meier curves were performed using the log-rank test. Gene expression was compared using the Wilcoxon rank-sum test. P values less than 0.05 were considered statistically significant.
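The survival analysis above relies on two standard methods. The Kaplan–Meier estimate multiplies (1 − deaths/at-risk) across event times. The two-group log-rank statistic compares observed and expected events in one group and is referred to a chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(√(χ²/2)). This is a minimal pure-Python sketch with hypothetical function names, not the software the authors used.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 for death and 0 for censoring."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for x in times if x >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p) with 1 df."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n          # observed - expected in group A
        if n > 1:
            var += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))
print(logrank([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```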
Discussion
The genetic analyses in this study supported the concept that the molecular characteristics of adult DCGs differ from those of common cerebral gliomas. Gene alterations frequently observed in adult cerebral GBMs, such as mutations in the TERT promoter, PIK3CA, PTEN, and RB1, were not detected in DCGs, and the rates of chromosome 10 loss, chromosome 7 gain, and EGFR mutation or amplification were much lower than those in common cerebral high-grade gliomas [4, 6]. The IDH1 mutation, which is very frequent in diffuse lower-grade gliomas [8, 29], was rare in DCGs. In addition, the integrated omics analysis in the present study clearly demonstrated the brain region-related distinct characteristics of DCGs.
WES analysis identified recurrent loss-of-function mutations of SETD2 in DCGs. All SETD2 mutations were present in GBMs that had neither the H3F3A K27M nor the G34R/V mutation. A previous report demonstrated that SETD2 mutations, which are frequent in pediatric GBMs located in the cerebral hemispheres, occurred mutually exclusively with H3F3A G34R/V mutations [15]. In this study, we showed that SETD2 mutation was also frequent in DCGs in elderly adults. The frequency of SETD2 mutation in DCGs (24%, 4/17) was significantly higher than the rates reported in previous large-scale genetic analyses of adult gliomas: 1.7% in GBM (5/292, p = 0.0007) and 2.1% in lower-grade glioma (3/283, p = 0.0002) [6, 8]. SETD2 mutation was quite rare in previous reports analyzing brainstem or thalamic gliomas in either adults or children [14, 15, 50, 54, 55].
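The SETD2 frequency comparison above (4/17 DCGs vs 5/292 GBMs) is the kind of 2×2 test that Fisher's exact test handles. The sketch below assumes the common two-sided definition, which sums the probabilities of all tables no more likely than the observed one and is the convention used by SciPy's `fisher_exact`. The helper name is hypothetical. For this table it yields a value close to the reported p = 0.0007.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]
    (rows: cohorts; columns: mutated, not mutated)."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(k):
        # Hypergeometric probability of k mutated cases in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = table_prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    return sum(
        p for p in (table_prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-7)
    )


# SETD2: 4 of 17 DCGs vs 5 of 292 adult GBMs
p = fisher_exact_two_sided(4, 17 - 4, 5, 292 - 5)
print(f"p = {p:.4f}")  # p = 0.0007
```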
In addition to pediatric cerebral GBM, inactivating mutation of SETD2 has been reported as a driver mutation in clear cell renal cell carcinoma (ccRCC), leukemia, and breast cancer [11, 23, 56]. Studies of ccRCC demonstrated that SETD2 mutation causes loss of H3K36 trimethylation, which leads to altered chromatin accessibility and widespread defects in transcript processing that eventually promote cancer development [23, 44]. As in other cancers, reduced H3K36 trimethylation in DCGs with SETD2 mutation was confirmed by immunohistochemistry [39], indicating that epigenetic regulation is altered in these tumors. Because the H3F3A K27M mutation, which we found in three DCGs, also results in the loss of H3K27 trimethylation, such epigenetic alterations may play major roles in the pathogenesis of DCG. Recent studies have identified potential drugs targeting epigenetic alterations such as H3F3A K27M [28, 36]. Another study showed that a WEE1 inhibitor selectively kills H3K36 trimethylation-deficient cancers through dNTP starvation resulting from depletion of ribonucleotide reductase subunit M2 [35]. Therefore, assessment of these mutations may lead to new treatment options for patients with these diseases, which carry a poor prognosis.
We showed that the p53 pathway is frequently disrupted in DCGs and that PPM1D is one of the recurrently altered genes. PPM1D encodes a p53-induced serine/threonine protein phosphatase that negatively regulates molecules involved in cell stress response pathways, such as p53, CHK2, H2AX, and ATM. High-level DNA copy-number amplification or overexpression of PPM1D has been detected in several tumors, including breast cancer, ovarian cancer, and medulloblastoma [7, 27, 49]. Mosaic PPM1D truncating mutations, found in the germline DNA of a small population of patients with breast or ovarian cancer, were recently identified as a genetic risk factor for these cancers [41]. Such truncations were shown to enhance PPM1D protein stability and consequently act as gain-of-function oncogenic mutations. A previous report demonstrated that similar somatic truncating mutations were also frequent in brainstem gliomas: PPM1D truncation and TP53 mutation were found mutually exclusively in six and 19, respectively, of 33 brainstem gliomas, whereas PPM1D truncation was detected in only one of 57 cerebral gliomas and was absent in thalamic gliomas [55]. In this study, we identified a novel PPM1D fusion in addition to truncating mutations in exon 6, neither of which has previously been reported in cerebellar gliomas. This fusion represents a novel mechanism of PPM1D alteration that was identified by RNA sequencing and would have been missed by WES alone. Therefore, similar alterations may be found in brainstem gliomas and other cancers if appropriately examined. Because PPM1D is a target of drug development, novel therapeutic opportunities may become available in the future for cerebellar and brainstem gliomas with PPM1D truncating mutations or fusions [12].
Some transcription factors play critical roles in determining cell fate. For example, SOX10, which is repressed by Polycomb repressive complexes in neural stem cells and induced in oligodendroglial precursor cells, is a key transcription factor for the oligodendroglial lineage [37, 40]. In this study, we demonstrated that the CpG island promoter methylation status of such developmental genes, particularly SOX10 and FOXG1, differed markedly between gliomas arising in different regions. Furthermore, DCGs were characterized by hypomethylation of the SOX10 promoter and hypermethylation of the FOXG1 promoter regardless of the presence or absence of the K27M mutation, resulting in upregulation of SOX10 (SOX10+) and downregulation of FOXG1 (FOXG1−). Previously, Sturm et al. showed that epigenetic silencing of FOXG1 was characteristic of H3 K27M-mutant diffuse midline gliomas located in the brainstem or thalamus, and that this type of glioma has a distinct cell of origin characterized by OLIG1+, OLIG2+, and FOXG1− status [48]. Notably, our analysis revealed not only that the OLIG1+, OLIG2+, FOXG1− status was shared between DCGs and K27M midline gliomas, but also that SOX10 promoter hypomethylation and consequent gene overexpression were common among these tumors, whereas SOX10 expression is repressed by promoter hypermethylation in most other cerebral high-grade gliomas. Because most tumor-specific targets of de novo CpG island methylation are genes silenced by Polycomb repressors, hypermethylation of promoter CpG islands of key developmental transcription factor genes in tumors may reflect their repressed status in the tumor’s tissue of origin; thus, the methylation status of these genes may reflect the regulation of dominant transcription factors during development [42, 45, 53]. In this regard, it is especially interesting that DCGs and K27M midline gliomas showed similar methylation patterns in the promoters of key developmental transcription factors such as SOX10, FOXG1, OLIG1, and OLIG2, suggesting a commonality in their cell of origin or tumor developmental process that appears to be distinct from that of other cerebral gliomas.
In contrast to this similarity in the methylation status of CpG islands of developmental transcription factors, our global methylation profile analysis demonstrated that all adult DCGs clustered into either the “K27” group or the “RTK I (PDGFRA)” group, indicating that two representative epigenetic profiles are present in adult DCGs. In accordance with these methylation patterns, gene expression analysis demonstrated that adult DCGs were significantly enriched for the PDGFRA-associated genes observed in the “PDGFRA-amplified GBMs” of the TCGA project; these GBMs were mostly classified as “Proneural type” GBMs based on their gene expression profiles [34, 52]. Upregulation of SOX10, which positively regulates PDGFRA in the oligodendroglial lineage, may explain why DCGs showed the “RTK I (PDGFRA)” methylation pattern even though only a few had PDGFRA amplification [13, 40]. Indeed, diffuse intrinsic pontine gliomas, which often harbor the H3 K27M mutation and SOX10 upregulation, also frequently show higher expression of PDGFRA and a specific PDGFRA-related gene expression signature, indicating that this signature is shared by “K27” gliomas and DCGs categorized in the “RTK I (PDGFRA)” methylation group [33, 34]. Nonetheless, the prognosis of patients with DCG differed markedly between the “K27” group and the “RTK I (PDGFRA)” group, emphasizing the clinical importance of distinguishing these two groups.
In summary, we demonstrated that, compared with most cerebral gliomas, adult DCGs have characteristic genetic alterations and epigenetic profiles, including frequent SETD2 and PPM1D alterations and PDGFRA-related genetic and epigenetic signatures, and that these DCGs are characterized by upregulation of SOX10 and downregulation of FOXG1, which possibly reflects their cell of origin and developmental course. Notably, this characteristic expression pattern of developmental transcription factors was also observed in diffuse midline glioma, H3 K27M-mutant, a newly defined entity in the 2016 WHO classification of brain tumors. We expect that further studies will clarify differences in the cell of origin among tumors arising in different brain regions and refine tumor classification, and that tailored therapy that considers region-related tumor molecular characteristics will become available in the future.