Introduction
Multiple Myeloma (MM) is characterized by an abnormal clonal plasma cell infiltration in the bone marrow, which may lead to the development of lytic bone lesions and myelosuppression [1, 2]. The etiological genetic features of MM include translocations between the IgH locus and a number of oncogenes, including MMSET/FGFR3 (4p16), CCND1 (11q13), MAF (16q23), MAFB (20q12), or aneuploidy demonstrated in patients with hyperdiploid genomes [3-5]. In addition to etiological events, secondary acquired genetic abnormalities, including recurrent mutations, have been reported. These acquired genetic abnormalities deregulate key oncogenes and tumor suppressor genes in MM [6].
Few studies in myeloma have attempted to clarify the epigenetic drivers and their impact on the underlying disease, with the majority having focused on global alterations in DNA methylation, histone modifications, and noncoding miRNAs [7-11]. Individual epigenetic marks have been investigated using low-throughput techniques, such as methylation-specific PCR and pyrosequencing, and semi-high-throughput 450K methylation arrays [8, 9, 12].
Regarding DNA methylation, we and others have shown that there is a significant change in DNA methylation levels at the transition from monoclonal gammopathy of undetermined significance (MGUS) to MM, resulting in genome-wide hypomethylation while specific genes are hypermethylated [8, 11]. There is also a clear difference in DNA methylation levels in the t(4;14) MM subgroup compared to other subgroups, which is thought to be due to over-expression of the histone methyltransferase MMSET in this group. DNA methylation has also been used to identify genes of prognostic interest, highlighting the importance of this biological process [9]. However, the possible cross-talk between epigenetic regulators at the DNA and histone levels and their combinatorial effects on gene expression patterns in different MM molecular subgroups have not been addressed.
To address this gap, we optimized enhanced reduced representation bisulfite sequencing (eRRBS), complemented with the 850K methylation array (Illumina), in newly diagnosed MM (NDMM) patients from six molecular subgroups to determine the alterations in the DNA methylome of each subgroup relative to healthy donors. Enrichment of promoter- and gene body-associated CpG sites allows robust correlation between DNA methylation at differentially methylated regions (DMRs) and expression of the closest gene. Additionally, we show that these DMRs co-localize with other epigenetic factors, including histone marks and super-enhancer (SE) protein signatures, to impact gene expression dysfunction in MM.
Methods
Patients and sample preparation
Fifty-two NDMM patients gave consent, with IRB approval, for bone marrow aspirates to be used for CD138+ cell selection (RoboSep, StemCell Technologies, Germany), which enriched tumor cells to >90% purity. Patients represented the major translocation and hyperdiploidy subgroups and were compared to CD138+ plasma cells (PCs) isolated from random bone marrow aspirates of four age-matched healthy donors. These patients were well characterized in terms of diagnostic variables and demographic and clinicopathological parameters (Supplementary Table 1). DNA and RNA were extracted using the AllPrep DNA/RNA mini kit (Qiagen, Hilden, Germany), the RNeasy RNA extraction kit (Qiagen), or the Puregene DNA extraction kit (Qiagen). Bisulfite conversion of DNA was carried out using the EZ-DNA methylation kit (Zymo Research, CA, USA).
eRRBS sample processing, library preparation, and sequencing
The eRRBS protocol was optimized with 100 ng of genomic DNA. Briefly, DNA samples were digested overnight with MspI followed by end-repair and A-tailing, methylated adapter ligation, uracil removal treatment, magnetic bead-based size selection, bisulfite conversion, and PCR enrichment [13]. The size and concentration of library fractions were determined prior to sequencing. Samples were multiplexed and sequenced using 75-bp single end reads.
Interpretation of eRRBS data
Quality control of the sequencing reads and methylation base calling were performed using bcl2fastq2 (Illumina) and TrimGalore (v 0.4.4) software, respectively. We obtained an average of 2.198 × 10^7 total aligned reads per sample and measured the methylation levels of an average of 21 million methylated CpG sites per sample from the eRRBS data (Supplementary Table 1). Sequencing data were aligned to the human reference genome hg38/GRCh38 using the Bismark alignment software (v 13.0) (Babraham Bioinformatics, UK). Differential methylation analysis was performed using DMAP (v 1.42) [14], and cytosines with fewer than 10 reads in any sample were discarded from subsequent analyses. Bismark quality control reports for the eRRBS data are listed in Supplementary Table 2.
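The coverage filter described above can be sketched as follows. This is a minimal illustration and not the DMAP implementation; the data layout (a dictionary mapping each CpG position to per-sample read counts), the sample identifiers, and the function name are assumptions made for the example.

```python
MIN_READS = 10  # coverage threshold stated in the text


def filter_by_coverage(coverage, min_reads=MIN_READS):
    """Keep CpG sites covered by at least `min_reads` reads in every sample.

    `coverage` maps a CpG key such as ("chr1", 10468) to a dict of
    {sample_id: read_count}. This layout is hypothetical, and every
    sample is assumed to have an entry for every site.
    """
    return {
        site: counts
        for site, counts in coverage.items()
        if counts and all(n >= min_reads for n in counts.values())
    }


# Hypothetical read counts for two CpG sites in three samples.
example = {
    ("chr1", 10468): {"MM_01": 25, "MM_02": 14, "HD_01": 31},
    ("chr1", 10470): {"MM_01": 25, "MM_02": 7, "HD_01": 31},  # 7 < 10: dropped
}
kept = filter_by_coverage(example)
```

A site is discarded as soon as a single sample falls below the threshold, which mirrors the "in any sample" wording above.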
DMRs containing at least 2 CpG sites were considered for subsequent analyses, provided the methylation percentages of both the control and NDMM groups were not >80%, <20%, or between 40% and 60%. DMRs showing a ≥10% increase (hypermethylation) or decrease (hypomethylation) in methylation in NDMM subgroups compared to control samples, at a false discovery rate (FDR)-adjusted p-value <0.05, were considered significant in all subsequent analyses. DMRs were annotated by identifying the closest transcription start site (TSS) according to RefSeq. Regions extending from 5 kb upstream to 200 bp downstream of the TSS were marked as promoter regions, while the region from immediately downstream of the defined promoter to the 3’ end of the gene was marked as gene body. The relative distance percentage (D) of a DMR from the nearest TSS was calculated using the following formula:
D = (|end base of DMR − TSS start base| / gene length) × 100
Genes with a combination of both hyper- and hypomethylated DMRs at the promoter or gene body were excluded from the present study. Unsupervised hierarchical clustering was performed on the top 5% most variable CpG sites using the hclust function in R.
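The promoter and gene body definitions and the relative distance D given above can be illustrated with a short sketch. It is not the annotation code used in the study: the function names, the use of the DMR midpoint to assign a region, the "intergenic" label, and the example coordinates are all assumptions.

```python
PROMOTER_UPSTREAM = 5000   # bp upstream of the TSS (from the text)
PROMOTER_DOWNSTREAM = 200  # bp downstream of the TSS (from the text)


def signed_offset(pos, tss, strand):
    """Offset of `pos` from the TSS; positive values lie downstream
    in the direction of transcription."""
    return pos - tss if strand == "+" else tss - pos


def classify_dmr(dmr_start, dmr_end, tss, tes, strand):
    """Label a DMR as 'promoter', 'gene_body', or 'intergenic'.

    Assigning the region by the DMR midpoint is an assumption; the text
    does not say how DMRs spanning a boundary were handled.
    `tes` is the 3' end of the gene.
    """
    mid = (dmr_start + dmr_end) // 2
    offset = signed_offset(mid, tss, strand)
    gene_length = abs(tes - tss)
    if -PROMOTER_UPSTREAM <= offset <= PROMOTER_DOWNSTREAM:
        return "promoter"
    if PROMOTER_DOWNSTREAM < offset <= gene_length:
        return "gene_body"
    return "intergenic"


def relative_distance_pct(dmr_end, tss, tes):
    """D = |end base of DMR - TSS start base| / gene length * 100."""
    gene_length = abs(tes - tss)
    if gene_length == 0:
        raise ValueError("gene length must be positive")
    return abs(dmr_end - tss) / gene_length * 100


# Hypothetical plus-strand gene: TSS at 100,000 and 3' end at 120,000.
region = classify_dmr(104_900, 105_000, tss=100_000, tes=120_000, strand="+")
d_pct = relative_distance_pct(105_000, tss=100_000, tes=120_000)
```

In this example the DMR ends 5,000 bp from the TSS of a 20,000 bp gene, so D is 25%.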
MethylationEPIC 850k bead array
The methylation array was performed on 48 patients, comprising 34 of the 52 NDMM patients in the eRRBS dataset and 14 additional NDMM samples. 500 ng of genomic DNA was used as input. Genomic DNA was bisulfite converted and processed on Infinium HumanMethylationEPIC BeadChip arrays (Illumina Inc., CA, USA) per the manufacturer’s protocol. Microarray raw IDAT and annotation files (Infinium MethylationEPIC v1.0_B4 Manifest File) for the EPIC assay were loaded into GenomeStudio software (v 1.9.0, Illumina) for differential methylation analysis. A difference of >0.1 in the average methylation proportion (β value) at a CpG site between control and NDMM patients was considered significant at an absolute DiffScore >13 (equivalent to an adjusted p-value of 0.05). For all downstream analyses, the hg19/GRCh37 coordinates of the EPIC probe sets (the default in GenomeStudio) were converted to hg38/GRCh38 using the UCSC liftOver tool to match and validate the eRRBS data wherever possible.
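The correspondence between a DiffScore magnitude of 13 and a p-value of 0.05 can be checked numerically. The sketch below assumes that the DiffScore magnitude equals -10 × log10(p), with the sign following the direction of the β difference. That definition is an assumption based on how the score is commonly described and should be checked against the GenomeStudio documentation. The function names are hypothetical.

```python
import math

DELTA_BETA_CUTOFF = 0.1   # minimum absolute beta difference (from the text)
DIFFSCORE_CUTOFF = 13.0   # minimum absolute DiffScore (from the text)


def diffscore_magnitude(p_value):
    """Return -10 * log10(p), the assumed magnitude of the DiffScore."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -10.0 * math.log10(p_value)


def is_significant(delta_beta, diff_score):
    """Apply both cutoffs described in the text to one CpG site."""
    return (abs(delta_beta) > DELTA_BETA_CUTOFF
            and abs(diff_score) > DIFFSCORE_CUTOFF)


magnitude_at_005 = diffscore_magnitude(0.05)  # approximately 13.01
```

Under this assumed definition, a p-value of 0.05 gives a magnitude of about 13.01, consistent with the cutoff quoted above.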
Gene expression profiling
Gene expression profiling (GEP) was performed using Affymetrix U133 Plus 2.0 arrays. CEL files were processed using the Transcriptome Analysis Console (v 4.0, Thermo Fisher, CA, USA), where raw intensity values were MAS5 normalized and converted to log2 scale. Genes whose average GEP value was more than 2-fold higher or lower in NDMM patients than in controls were defined as overexpressed or under-expressed, respectively.
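Because the intensities are on a log2 scale, a 2-fold change corresponds to a difference of 1 between group means. The sketch below illustrates the classification rule under that reading. Averaging the log2 values (rather than the raw intensities) is an assumption, and the function name and example values are hypothetical.

```python
import math
from statistics import mean


def classify_expression(patient_log2, control_log2, fold=2.0):
    """Classify one gene from log2 expression values of two groups.

    The group means are taken on the log2 values (an assumption), so a
    `fold`-fold change is a difference of log2(fold) between the means.
    """
    diff = mean(patient_log2) - mean(control_log2)
    threshold = math.log2(fold)
    if diff > threshold:
        return "overexpressed"
    if diff < -threshold:
        return "under-expressed"
    return "unchanged"


# Hypothetical log2 intensities for one gene.
call = classify_expression([10.0, 10.4], [8.0, 8.2])
```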
RNA sequencing
RNA sequencing (RNA-seq) was performed on 45 of the 52 NDMM patients (as specified in Supplementary Table 1), using 100 ng of total RNA with genomic DNA removed using the TURBO DNA-free kit (Ambion). RNA was prepared using the TruSeq stranded total RNA Ribo-zero gold kit (Illumina), and libraries were sequenced using 75 bp paired-end reads on a NextSeq500 (Illumina). RNA-seq data were analyzed using the transcript aligner STAR (v2.5.1b) [15], and transcript-level data were generated by Salmon (v0.7.2) [16]. RNA-seq and microarray data were compared for key genes per NDMM subgroup (Supplementary Figure 11).
Intersect analysis of methylation and expression
A Venn intersection analysis was carried out using Venny (v 2.0) to identify the hypermethylated under-expressed and hypomethylated overexpressed genes at the promoter. Additionally, the methylation and expression correlations were determined on hypomethylated-under-expressed and hypermethylated-overexpressed gene clusters at the gene bodies.
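The intersections described above are plain set operations. The sketch below shows the four combinations using Python sets instead of Venny; the function name and the placeholder gene identifiers are hypothetical and do not represent results from the study.

```python
def intersect_methylation_expression(hyper, hypo, over, under):
    """Intersect gene sets by methylation and expression status.

    All four arguments are sets of gene identifiers for one genomic
    context (promoter or gene body) in one subgroup.
    """
    return {
        "hyper_under": hyper & under,  # promoter analysis in the text
        "hypo_over": hypo & over,      # promoter analysis in the text
        "hypo_under": hypo & under,    # gene body analysis in the text
        "hyper_over": hyper & over,    # gene body analysis in the text
    }


# Placeholder identifiers, not real findings.
result = intersect_methylation_expression(
    hyper={"GENE_A", "GENE_B"},
    hypo={"GENE_C", "GENE_D"},
    over={"GENE_B", "GENE_C"},
    under={"GENE_A", "GENE_D"},
)
```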
Gene ontology and protein interaction prediction
Histone modifications (ChIP-seq) analysis
Chromatin immunoprecipitation and fixation were carried out per the manufacturer’s protocol (Active Motif Inc.) on mycoplasma-free and short tandem repeat (STR)-checked KMS11 and MM1.S cells. Antibodies were used to profile, genome-wide, four activating histone marks (H3K4me3, H3K27ac, H3K4me1, and H3K36me3) and two inactivating histone marks (H3K9me3 and H3K27me3); the data are available under accession number GSE151556. Purchase of ChIP antibodies, experiments, and data analyses were carried out in association with Active Motif Inc. (CA, USA). The histone marks for the U266 cell line were obtained from the Blueprint epigenome consortium (http://www.blueprint-epigenome.eu/). MACS2 was used for peak calling to determine the enrichment of histone marks at both the short and the larger overlapping DMRs [17]. Significantly ChIP-enriched regions were delineated in the form of SICER BED files [18] and uploaded as UCSC custom tracks to align histone marks to the DNA and SE-CTCF modifications.
Annotations for CTCF, super-enhancer, and enhancer sites
Possible overlap of CTCF sites with the DMRs was determined using ChIP-seq data for CTCF in the delta47 myeloma cell line [19]. Possible overlap of SEs with DMRs was determined using ChIP-seq data for BRD4 binding in the MM1.S cell line from Lovén et al. 2013 [20]. Clustering of the SE/CTCF-DMRs was performed on DMR methylation levels using average linkage with the Euclidean distance metric.
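Determining whether a DMR overlaps a CTCF or SE site reduces to an interval-overlap test on the same chromosome. The sketch below shows one way to do this. It assumes half-open BED-style coordinates, and the function names, labels, and coordinates are hypothetical. For genome-scale data a dedicated interval tool would be more efficient than this quadratic loop.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end)
    share at least one base (BED-style coordinates are assumed)."""
    return a_start < b_end and b_start < a_end


def annotate_dmrs(dmrs, features):
    """Attach the labels of overlapping features to each DMR.

    `dmrs` is a list of (chrom, start, end) tuples; `features` is a list
    of (chrom, start, end, label) tuples, e.g. label "CTCF" or "SE".
    """
    annotated = []
    for chrom, start, end in dmrs:
        labels = sorted({
            label
            for f_chrom, f_start, f_end, label in features
            if f_chrom == chrom and overlaps(start, end, f_start, f_end)
        })
        annotated.append(((chrom, start, end), labels))
    return annotated


# Hypothetical coordinates for illustration only.
dmrs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
features = [
    ("chr1", 150, 400, "SE"),
    ("chr1", 190, 210, "CTCF"),
    ("chr2", 200, 300, "CTCF"),  # starts where the chr2 DMR ends: no overlap
]
annotated = annotate_dmrs(dmrs, features)
```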
Statistical analysis
A two-way ANOVA was used to determine significance in the differential methylation analysis. A non-parametric two-tailed Mann-Whitney U test was used for the remaining analyses, and statistical significance was assessed at a p value < 0.001 or < 0.05 (as indicated).
Discussion
In the present study, we combined alterations in genome-wide DNA methylation and histone modifications with regulatory mechanisms of SE-CTCFs to explain the subgroup-specific differential gene expression dysfunctions in NDMM patients. The GEMs reported here may not only serve as epigenetic biomarkers for the early detection of the disease, but also provide insight into successive oncogenic transitions in MM.
Autosomal epigenetic traits in B cells are considered to propagate to the daughter cells in an accumulative pattern through the stages of differentiation [33], which strengthens the importance of differential DNA methylation as a specific predictor of disease progression in MM. We observed that the DNA methylation data points for the top 5% most variable DMRs affected >94% of autosomal CpG sites across the molecular subgroups. Furthermore, we show that, except in t(4;14), DNA hypomethylation was prevalent across the major genomic regions, including promoters, gene bodies, and IGRs, in the MM subgroups compared to age-matched controls. The amount of hypermethylation in t(4;14) relative to non-t(4;14) subgroups may be explained by the over-expression of MMSET, an H3K36 methyltransferase [34]. In contrast, global DNA demethylation in MM subgroups may be attributed to frequent mutations in epigenetic modifiers, especially DNMT3A [35], which may create favorable conditions for genome-wide DNA hypomethylation within and outside the CpG islands.
When we combined methylation data at variable CpGs with gene expression, we found that although DNA methylation is inversely correlated with expression at the promoters of the nearest genes, it may be positively correlated with expression at the gene bodies, as evidenced by the key genes per subgroup. Moreover, regulation of gene expression is not merely influenced by changes in DNA methylation at promoters or within the gene body, but is tightly linked to the overlapping chromatin modifications or regulated from the juxtaposed SE-CTCF loops (Supplementary Table 20). For instance, we observed a series of GEMs in which the DMRs were hypomethylated at the gene body but the genes remained over-expressed in a particular MM subgroup. These events could be correlated with the effect of the overlapping activating histone marks or regulatory control from the SEs (Supplementary Table 21). At the histone level, the majority of the hypomethylated DMRs overlapped with repressive H3K27me3, while the majority of the hypermethylated DMRs within the gene body overlapped with H3K36me3 and H3K4me1 or H3K4me3 marks. These activating histone marks have already been reported to have a regulatory role in alternative splicing mechanisms in cancer [36]. They have also been reported in MM as biomarkers and druggable targets [37-39]. Additionally, locus-specific enrichment of H3K27ac at the DMRs suggests the existence of interstitial SE-like regulators, which may create a transcriptionally active state resulting in over-expression of genes or gene clusters in MM. We found that these acetylated and hypomethylated DMR domains are also the preferential binding sites of BRD4 or MED1. These BRD4/MED1 binding sites are generally demarcated by CTCFs at the termini that presumably form the SE-CTCF loops. SE-CTCF loops spanning the length of an entire gene or gene cassette lead to the aberrant over-expression of genes.
The most common pathways predicted to be affected by the GEMs of different MM subgroups were PI3K/AKT/mTOR, MAPK, Rap1, and the cell cycle. These findings align with the existing literature, in which frequent activation of the PI3K/AKT/mTOR pathway [40] and recurrent mutations and aberrant expression of MAPK [41] or Rap1 [42] genes have been reported in MM patients. Interestingly, six GEMs in the t(4;14) subgroup were upregulated and are part of the Rap-guanine nucleotide exchange family, upstream of Rap1. In contrast, a different set of GEMs in the MAF and HY subgroups, containing both upstream activators and downstream targets of Rap1, was down-regulated. While the upstream regulators mainly constitute membrane receptor kinases, the downstream GEMs are involved in the cellular adhesion, polarity, or migration machinery. Cell-cycle pathway genes were also differentially expressed, including CCND1, CCND2, and CDKN2C. These genes are associated with proliferation and prognosis in MM patients. Therefore, the present study provides deeper insights into the epigenetic control of gene expression and its involvement in different MM signaling pathways.
We also demonstrated epigenetically controlled expression of CCND1 and CCND2 among the MM subgroups. For instance, CCND1 is known to be overexpressed in the t(11;14) subgroup, which occurs through the juxtaposition of the IgH-SE next to CCND1. This results in an epigenetic sweep of the active histone marks from the IgH locus across CCND1. In contrast, CCND2 is not expressed in the t(11;14) subgroup, which may be explained by the enrichment of repressive H3K27me3 marks, the lack of activating H3K4me1 and H3K27ac marks, and hypomethylated DMRs within the gene body of CCND2. Conversely, a weak SE signal is present in the t(14;16) cell line MM1.S at the CCND2 promoter, characterized by H3K27ac, H3K4me3, and BRD4 enrichment. A similar ChIP profile is seen in the t(4;14) cell line KMS11. These active histone marks were found in conjunction with DNA hypermethylation within the gene body of CCND2, indicating a possible interaction between the epigenetic states. An interesting experiment would be to alter the DNA methylation levels within the body of CCND2 and determine whether this alters the chromatin marks and expression of the gene in these high-risk CCND2-expressing subgroups [43].