Background
Changes in DNA methylation patterns are a known hallmark of acute myeloid leukemia (AML) and underlie AML pathogenesis [1]. DNA methylation in patients with AML has been studied extensively and may either reflect specific molecular abnormalities or characterize a group of patients without an apparent molecular aberration. Specific translocations such as PML-RARA, AML1-ETO (RUNX1-RUNX1T1), MLL translocations or CBFB-MYH11 fusion, as well as CEBPA, NPM1, IDH1/IDH2, DNMT3A, TET2 and RUNX1 mutations have been described to display distinct methylation signatures [2]-[4]. These epigenetic profiles are usually accompanied by specific gene expression features. Studying genes that are epigenetically deregulated in different groups of patients may contribute to a more detailed understanding of pathways involved in the leukemic transformation. Importantly, the effect of DNA methylation changes depends greatly on the location of differentially methylated regions (DMRs) [5]. New approaches using next-generation sequencing enable the study of DMRs scattered throughout the genome, and targeted bisulfite sequencing offers a reasonable balance between cost and informativeness (number of CpGs covered) [6]. Linking gene expression to DNA methylation data is needed to find pathologically relevant DNA methylation changes, especially because many (or even the majority of) DMRs reflect the tissue of origin rather than leukemia-specific (cancer-specific) changes [7].
In this study, 84 megabases (Mb) of 14 AML genomes and one CD34+ pool of cells from healthy donors were captured for DNA methylation and gene expression profiling. The aim was to identify differentially methylated genes with potential impact on AML pathogenesis based on the correlation of methylation and expression data.
AML patients with CBFB-MYH11 fusion (CBFB - Core-binding factor, beta subunit; MYH11 - Myosin, heavy chain 11) resulting from the inv(16) rearrangement clustered together in hierarchical DNA methylation and expression analyses. The majority of differentially methylated regions unique to CBFB-MYH11 patients were hypomethylated, and genes assigned to such regions were previously described as overexpressed in inv(16) AML [8]. PBX3 (pre-B-cell leukemia homeobox 3), recently demonstrated to be an important cofactor of HOXA9 in leukemogenesis [9], was validated as a gene whose expression levels correlated with DNA methylation of its putative regulatory region across AML subtypes. The importance of PBX3 is underlined by the fact that PBX3-overexpressing patients relapse more frequently. In summary, we discovered new genomic regions affected by aberrant DNA methylation that are associated with expression of genes implicated in leukemogenesis.
Discussion
Here, we report a CBFB-MYH11-specific, i.e. inv(16)-specific, hypomethylation that may play a role in the upregulation of some genes previously described as overexpressed in inv(16) AML [8]. We analysed DNA methylation and expression data of MN1, SPARC, ST18 and DHRS3 in 55 AML patients and 10 healthy controls. Lower methylation levels of these genes in inv(16) patients versus other AML patients and healthy donors were confirmed. When measured by TaqMan gene expression assays, inv(16)-specific overexpression of MN1, SPARC, ST18 and DHRS3 was found with respect to non-inv(16) AML, but only for ST18 in comparison with healthy donors. This was inconsistent with the Illumina microarray expression data as well as previously published data [8]. Therefore, we repeated the measurements using SybrGreen RQ-PCR and obtained different values. In this RQ-PCR experiment, changes in expression levels (between inv(16) AML and healthy donors) were also significant for MN1 and SPARC. We excluded both PCR nonspecificity and DNA contamination as causes. Interestingly, MN1 and SPARC primers for SybrGreen RQ-PCR, and MN1 and SPARC probes on the expression microarray, are localized within the same exons, while MN1 and SPARC TaqMan probes have different, exon-exon localizations. We cannot claim that this is the only reason for the different results; this issue definitely needs deeper examination in future studies. It is of interest that the results obtained by SybrGreen RQ-PCR are in agreement with the publicly available data (GSE34823) of the study of Bletiere et al. [13].
Average gene expression levels of MN1, SPARC and ST18 extracted from the above-mentioned dataset are higher in inv(16) AML than in healthy donors' bone marrow (4-fold for MN1 and SPARC, 7-fold for ST18), whereas DHRS3 expression is essentially the same in both groups.
Further, we extracted data from The Cancer Genome Atlas (TCGA, https://tcga-data.nci.nih.gov/tcga/), which also confirmed a link between hypomethylation of MN1 and DHRS3 regulatory regions and their overexpression in inv(16) AML when compared with AML samples with normal karyotype (healthy control data were not available). For the regulatory regions corresponding to the remaining genes, TCGA lacked information on methylation status because the HumanMethylation450 BeadChip used in the TCGA study contains no appropriate CpGs [3]. This supports the value of studying DNA methylation using targeted bisulfite sequencing, which provides more comprehensive coverage than microarray-based techniques.
The hypomethylation pattern that we discovered in inv(16) AML patients is also remarkable with respect to the very recently published data of Mandoli and colleagues [14]. Their study revealed, for the first time, the involvement of CBFB-MYH11 not only in repression but also in transcriptional activation. The direct involvement of CBFB-MYH11 in overexpression of MN1, ST18 and SPARC is supported by the 2-fold downregulation of these genes upon CBFB-MYH11 knockdown reported in their work. However, none of the 1874 high-confidence CBFB-MYH11 binding sites [14] overlaps with any of the hypomethylated regions reported here. DHRS3 was among the genes upregulated upon CBFB-MYH11 knockdown, which is in agreement with its disputable upregulation in inv(16) AML. There were great differences in the localization of hypomethylated regions with respect to the transcription start sites (TSSs) of individual genes. With regard to MN1 and SPARC, the hypomethylation was located not far from their TSSs (for location see Additional file 3: Table S2), which makes the assumption of a regulatory role in the expression of these genes more straightforward. Moreover, MN1 and SPARC potential regulatory regions overlap regions of active chromatin (enhancers) in mobilized CD34+ cells as observed in the EpiGenome Browser (http://epigenomegateway.wustl.edu/browser), suggesting a role of these regions in transcription regulation. In contrast, differentially hypomethylated sites assigned by GREAT to ST18 and DHRS3 are located much farther from their TSSs, specifically approximately 275 kilobases (kb) downstream for ST18 and 277 kb upstream for DHRS3. The regulatory regions assigned to ST18 and DHRS3 lie within chromatin marked with low transcription activity and as an enhancer, respectively (in mobilized CD34+ cells, data from EpiGenome Browser).
MN1 expression levels have been shown to stratify the prognosis of cytogenetically normal (CN) AML patients, and MN1 overexpression is connected with a poor outcome in CN-AML patients [15],[16]. Nevertheless, inv(16) AML is generally associated with a good prognosis [17],[18] in spite of frequent MN1 overexpression. Functional studies have shown that overexpression of MN1 cooperates with inv(16) in developing AML in vivo and that neither inv(16) nor MN1 alone is capable of promoting leukemia [19]. According to our results, hypomethylation appears to be present uniquely in inv(16) AML patients in both MN1 assigned regions (none of the other AML subtypes or healthy controls displayed MN1 hypomethylation in our cohort). This supports the theory that upregulation of some genes, possibly including MN1, is crucial for inv(16) leukemogenesis, and hypomethylation may therefore be needed to ensure stable overexpression of critical genes. Apparently, the mechanism of MN1 upregulation is different in non-inv(16) AML patients, or hypomethylation of other regulatory areas located elsewhere may be involved.
Correlation between targeted DNA methylation and microarray expression data of 14 AML patients and a CD34+ pool from healthy controls revealed PBX3 differential methylation and gene expression. PBX3 (pre-B-cell leukemia homeobox 3) belongs to the three amino acid loop extension (TALE) family of transcription factors, which includes the products of the Pbx and Meis genes and whose members are capable of heterodimerization with the Hox proteins [20]. Recently, PBX3 was reported to have a synergistic effect with HOXA9 in transforming normal hematopoietic progenitor cells in vitro as well as in vivo [8]. Moreover, PBX3 is one of the four genes (HOXA6, HOXA9, PBX3 and MEIS1) whose common expression signature was shown to influence overall survival in CN-AML [12]. All evidence points to an important role of PBX3 in leukemogenesis. This is the first report uncovering DNA methylation as a plausible regulator of PBX3 expression. We found a strong negative correlation between levels of PBX3 methylation and expression in 8 healthy donors' samples and 30 AML patients at diagnosis (P < 0.0001 and P = 0.002 for upregulation and downregulation, respectively). The location of PBX3 differential methylation overlaps a TAF1 binding site according to ENCODE ChIP-Seq data from the UCSC genome browser. TAFs (TBP-associated factors) create a stable complex with TBP (TATA-binding protein) and RNAPII to form a preinitiation complex, so we may assume that the DNA methylation status of the TAF1 binding site can directly influence the accessibility of DNA for transcription enzymes. The probability of transcription initiation possibly depends on the methylation level: low DNA methylation corresponds to high expression, high DNA methylation to decreased expression, and intermediate DNA methylation to in-between expression levels. As PBX3 has a CpG island (CGI) overlapping its TSS, we also looked at its methylation status. Based on targeted methylation data, no methylation was present in either AML or healthy controls. Therefore, the methylation status of the downstream control element, rather than CGI methylation, is most likely crucial for PBX3 expression. Further, we focused on the potential prognostic significance of PBX3 expression in terms of overall survival (OS) and incidence of relapse. High PBX3 expression levels were not associated with a different OS compared to AML patients with low PBX3 expression; however, relapse rates were significantly higher in PBX3-overexpressing patients by both univariate and multivariate testing. This suggests a more aggressive phenotype or course of disease in these patients, which is not reflected in the OS, probably due to early and effective treatment of relapses, often followed by bone marrow transplantation. We also showed that PBX3 overexpression did not occur in AML patients within the cytogenetically favourable subgroup.
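The negative relationship between PBX3 methylation and expression described above can be illustrated with a rank correlation. This is a minimal sketch only: the text does not state which correlation coefficient was used, so Spearman's rho is an assumption here, and the methylation and expression values are made-up placeholders rather than study data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values (NOT data from the study).
methylation_pct = [85.0, 72.0, 60.0, 41.0, 30.0, 12.0]
relative_expression = [0.2, 0.5, 0.4, 1.8, 2.5, 6.0]

rho = spearman_rho(methylation_pct, relative_expression)
print(f"Spearman rho = {rho:.3f}")
```

A value of rho close to -1 means that samples with higher methylation tend to have lower expression; a significance test would be needed in addition to the coefficient itself.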
We also validated the methylation/expression correlation stated in Additional file 5: Table S3 for GFI1. Moreover, the observed correlations are further supported by the presence of genes for which the role of DNA methylation has already been published, such as MPO [21], CEBPα, DAPK1, IRF8 and PRDX8 [22],[23].
Materials and methods
Patients
For targeted bisulfite sequencing, 14 AML patients at diagnosis (see Table 1) and pooled CD34+ cells from 4 healthy donors were sequenced. Genes selected based on targeted bisulfite sequencing results were examined using 454 bisulfite pyrosequencing in a larger cohort of AML patients (their characteristics are given in Additional file 7: Table S4). Informed consent was obtained from all patients and healthy blood donors enrolled in the study. The study was approved by the IHBT Institutional Ethics Committee according to the Helsinki Declaration.
Sample preparation
Mononuclear cells (MNC) from peripheral blood (PB) or bone marrow (BM) of the AML patients at diagnosis were separated by Ficoll gradient centrifugation (Histopaque, Sigma-Aldrich, Steinheim, Germany). CD34+ cells were harvested from buffy coats of healthy blood donors using MicroBead kits (Miltenyi Biotec GmbH, Bergisch Gladbach, Germany). The CD34+ pool was created by mixing the separated cells of 4 individual healthy blood donors (all of them men aged 42 to 58 years, median age 45.5). DNA and RNA were extracted using the AllPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany). Bisulfite conversion was performed on 1 μg of DNA with the EpiTect Bisulfite Kit (Qiagen), and the converted DNA was eluted into 40 μl of EB buffer. cDNA was prepared using M-MLV RT (Moloney Murine Leukemia Virus Reverse Transcriptase, Promega, Madison, WI, USA).
Targeted bisulfite sequencing
Preparation of targeted bisulfite libraries started with 3 μg of genomic DNA and was carried out using the SureSelectXT Human Methyl-Seq kit (Agilent Technologies, Santa Clara, CA, USA) according to the manufacturer's instructions. Libraries were multiplexed into 4 pools and each pool was sequenced on 2 HiSeq-2000 (Illumina, San Diego, CA, USA) lanes using 105 bp paired-end sequencing reads, with an average coverage of 83 (range 46 to 131).
454 bisulfite pyrosequencing
Bisulfite-treated (BS) DNA was subjected to 2-round PCR. In the 1st round of PCR, loci-specific primers with M13 universal tails were used to amplify regions of interest. Subsequently, primers specific to the M13 universal tails, now tailed with 454-specific sequencing primers and a unique barcode sequence (MID), were applied in the 2nd PCR. Loci-specific primers were designed with Methyl Primer Express v1.0 software (Applied Biosystems Inc., Foster City, CA, USA; see Additional file 8: Table S5 for primer sequences). HotStarTaq DNA polymerase (Qiagen) and the manufacturer's recommended PCR reaction conditions were used for amplification. 2 μl of BS DNA was added to the 1st PCR and 1 μl of 100× diluted 1st round PCR product was subjected to the 2nd PCR. PCR cycling conditions were as follows: 1st round PCR - initial denaturation (15 min at 95°C), followed by 35 cycles of denaturation (30 s at 94°C), annealing (30 s at Ta °C, Ta - annealing temperature, see Additional file 8: Table S5) and extension (1 min at 72°C), final extension (10 min at 72°C); 2nd round of PCR - initial denaturation (15 min at 95°C), followed by 26 cycles of denaturation (30 s at 94°C), annealing (30 s at 60°C) and extension (1 min at 72°C), final extension (5 min at 72°C). All amplicons after the 2nd round of PCR (up to 288 for one run) were purified using Agencourt AMPure XP magnetic beads (Beckman Coulter, Fullerton, CA, USA) and a Biomek® FXP Laboratory Automation Workstation (Beckman Coulter). Precise concentrations of amplicons were determined using the Quant-iT™ PicoGreen dsDNA Assay Kit (Life Technologies, Carlsbad, CA, USA), and amplicons were pooled equimolarly to obtain an amplicon library with a concentration of 10^9 fragments/μl. Subsequent procedures were carried out according to 454 amplicon sequencing manuals (454 Life Sciences, Roche Applied Science, Branford, CT, USA) on the GS Junior sequencer (Roche). An average overall coverage of 225 reads was observed (ranging from 53 to 659 for individual amplicons).
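The equimolar pooling step can be illustrated with a short calculation that converts a fluorometric concentration (ng/μl) into fragments/μl. This is a sketch under stated assumptions, not the authors' worksheet: the average mass of 660 g/mol per base pair is a common approximation, and the amplicon names, concentrations and lengths are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 660.0  # assumed average molar mass of one dsDNA base pair


def molecules_per_ul(conc_ng_per_ul, amplicon_length_bp):
    """Convert a dsDNA concentration in ng/ul to fragments per ul."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    molar_mass = amplicon_length_bp * BP_MASS_G_PER_MOL  # g/mol of one fragment
    return grams_per_ul / molar_mass * AVOGADRO


def equimolar_volumes(amplicons, molecules_each):
    """Volume (ul) of each amplicon that contributes `molecules_each` fragments.

    `amplicons` maps a name to (concentration in ng/ul, length in bp).
    """
    return {
        name: molecules_each / molecules_per_ul(conc, length)
        for name, (conc, length) in amplicons.items()
    }


# Hypothetical amplicons: name -> (PicoGreen concentration in ng/ul, length in bp).
amplicons = {"amplicon_A": (12.0, 300), "amplicon_B": (6.0, 450)}

volumes = equimolar_volumes(amplicons, molecules_each=1e10)
for name, volume in volumes.items():
    print(f"{name}: pipette {volume:.2f} ul")
```

After pooling, the mixture would be diluted to the target of 10^9 fragments/μl; the dilution factor is simply the pooled fragments/μl divided by 10^9.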
mRNA microarray profiling
Gene expression profiles from 14 AML patients and CD34+ cell samples from 4 healthy controls were generated with the HumanHT-12 v4 Expression BeadChip Kit (Illumina). The chips were scanned with a BeadStation 500 instrument (Illumina).
Quantitative real-time PCR (RQ-PCR)
The expression levels of selected genes were assessed with TaqMan Gene Expression Assays (Life Technologies; see Additional file 9: Table S6 for individual Assay IDs). GAPDH was used as a housekeeping gene. Amplification was carried out with TaqMan Universal Master Mix II (Life Technologies) under the recommended cycling conditions. SybrGreen RQ-PCR was performed using the QuantiTect SYBR Green PCR Kit (Qiagen) and pre-designed KiCqStart® SYBR® Green primers (Sigma-Aldrich). Each sample was run in duplicate on a StepOne instrument (Life Technologies).
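Normalization of a target gene to a housekeeping gene such as GAPDH is often done with the comparative Ct (2^-ΔΔCt) calculation, sketched below. The text does not state which quantification model was applied, so the comparative Ct method, the assumption of roughly 100% amplification efficiency, and all Ct values shown are illustrative assumptions only.

```python
from statistics import mean


def delta_ct(target_ct_replicates, reference_ct_replicates):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_ct_replicates) - mean(reference_ct_replicates)


def relative_expression(sample_dct, calibrator_dct):
    """Fold change 2^-(ddCt), assuming ~100% amplification efficiency."""
    return 2.0 ** -(sample_dct - calibrator_dct)


# Hypothetical duplicate Ct values (NOT data from the study).
patient_dct = delta_ct([24.0, 24.2], [18.0, 18.2])  # target vs GAPDH, patient
control_dct = delta_ct([27.0, 27.2], [18.1, 18.1])  # target vs GAPDH, control

fold = relative_expression(patient_dct, control_dct)
print(f"Relative expression (patient vs control): {fold:.2f}-fold")
```

A lower ΔCt in the patient sample than in the calibrator gives a fold change above 1, i.e. higher relative expression of the target gene.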
Molecular genetics
The presence of internal tandem duplication (ITD) in the juxtamembrane (JM) and tyrosine kinase 1 (TKD1) domains (exons 12-14) of the FLT3 gene and the presence of the CBFB-MYH11 fusion transcript at diagnosis were detected as described previously [24]. Further, we examined mutations in NPM1 [25], CEBPA [26] and DNMT3A [27] and intragenic MLL abnormalities such as partial tandem duplications (MLL-PTD) by direct sequencing [28],[29].
Cytogenetics
For cytogenetic analyses and fluorescence in situ hybridization (FISH), bone marrow samples were cultivated for 24 h in RPMI 1640 medium with 10% fetal calf serum. Twenty G-banded Wright-Giemsa stained mitoses, if available, were evaluated. The karyotypes were described following ISCN 2013 nomenclature. For precise identification of chromosomal aberrations, FISH with locus-specific DNA probes (Vysis, Downers Grove, IL, USA) and multicolor FISH with color kit probes and ISIS computer analysis (both from Metasysteme, Altlusheim, Germany) were used.
Data processing and statistics
Data from targeted bisulfite sequencing were processed and evaluated using freely available programs: (i) FastQC [30] (quality control of reads), (ii) Trimmomatic [31] (removal of adapters/primer-dimers and bases with low-quality scores), (iii) Bismark [32] (methylation-aware alignment of reads to the reference genome and computation of methylation ratios). Differentially methylated target regions were assigned to genes using the on-line annotation tool GREAT [11]. Quantile normalization and background subtraction were applied to the raw microarray expression data in BeadStudio Data Analysis Software (Illumina). Raw data from 454 pyrosequencing were processed using a filter template to relax the stringency of the original valley filter (kindly provided by Dr. Esteban Czwan, Roche). This step was necessary due to the lower complexity of bisulfite-treated DNA containing long homopolymer stretches. The filter template is available on-line (Additional file 10) and its usage is described in Additional file 11. Data from 454 pyrosequencing were aligned to a reference in GS Amplicon Variant Analyzer (AVA) software (Roche) and DNA methylation levels were assessed using the web-based software BISMA [33].
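The methylation ratio mentioned above is, per CpG, the fraction of reads supporting a methylated cytosine. The sketch below shows that arithmetic on per-CpG read counts and averages it over a region. It is an illustration only and does not reproduce Bismark's implementation; the minimum coverage of 10 reads and the example counts are assumptions introduced here.

```python
def methylation_ratio(methylated, unmethylated, min_coverage=10):
    """Fraction of methylated reads at one CpG, or None if coverage is too low.

    `min_coverage` is an illustrative threshold, not a value from the study.
    """
    coverage = methylated + unmethylated
    if coverage < min_coverage:
        return None
    return methylated / coverage


def region_methylation(cpg_counts, min_coverage=10):
    """Mean methylation ratio over the sufficiently covered CpGs of a region.

    `cpg_counts` is a list of (methylated_reads, unmethylated_reads) tuples.
    """
    ratios = []
    for methylated, unmethylated in cpg_counts:
        ratio = methylation_ratio(methylated, unmethylated, min_coverage)
        if ratio is not None:
            ratios.append(ratio)
    return sum(ratios) / len(ratios) if ratios else None


# Hypothetical read counts for three CpGs in one target region.
patient_region = [(8, 2), (5, 5), (1, 1)]  # the last CpG has too few reads
level = region_methylation(patient_region)
print(f"Region methylation: {level:.2f}")
```

Comparing such region-level values between groups of samples is one simple way to flag candidate differentially methylated regions before any statistical testing.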
Kaplan-Meier curves and the two-sided log-rank test were used to estimate overall survival and to compare differences between survival curves. The relations between qualitative parameters were compared in contingency tables using Fisher's exact test. For analyses of quantitative data, medians were determined and non-parametric two-tailed Mann-Whitney tests were performed. All these tests were conducted at a significance level of 0.05 using GraphPad Prism4 software (GraphPad Software, San Diego, CA, USA). Cox regression analysis was performed using the SPSS statistical software (SPSS Inc., Chicago, IL, USA).
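For readers unfamiliar with the Kaplan-Meier estimate used for overall survival, the product-limit calculation can be sketched as follows. The analyses in this study were run in GraphPad Prism and SPSS; this stand-alone function is only an illustration, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `events[i]` is 1 if the event (death) was observed at `times[i]` and 0 if
    the observation was censored. Returns a list of (time, survival) pairs,
    one for each distinct time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up in months and event indicators (NOT study data).
months = [5, 8, 8, 12, 20]
died = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(months, died):
    print(f"month {t}: estimated survival {s:.2f}")
```

Subjects censored at the same time as an event are counted as still at risk for that event, which is the usual convention.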
Competing interests
The authors declare that they have no competing interests.