Introduction

Acute lymphoblastic leukemia (ALL) is the most common childhood malignancy.1 Its genetic basis has been characterized mainly in terms of chromosomal alterations.2 By contrast, little is known about the genetic factors that increase susceptibility to complex ALL. Genome-wide association studies (GWAS) have recently been conducted to identify such factors and have suggested a variety of nucleotide sequence variants associated with ALL.3, 4, 5, 6 In particular, a GWAS with a large sample size (907 cases and 2398 controls) revealed associations of ALL with variants in and around the genes encoding Ikaros family zinc-finger 1 (IKZF1), AT-rich interactive domain 5B (ARID5B) and CCAAT/enhancer binding protein epsilon (CEBPE).5 CEBPE has attracted particular interest because it is targeted by translocation of the immunoglobulin heavy chain in ALL, resulting in lymphoid leukemogenesis.7 A single-nucleotide polymorphism (SNP) in its promoter, rs2239633, was found to be associated with ALL in that GWAS, and this association was confirmed by another GWAS8 and a replication study.9 The function of the SNP, however, has not been established; it is unclear even whether the variant is itself functional or merely linked to a functional variant. The objective of this study was to identify functional nucleotide variants and/or their haplotypes located at the previous GWAS signal in the promoter region of the CEBPE gene.

Materials and methods

Selection of variants

We examined the promoter activity of rs2239633 and its neighboring SNPs in the 5′-upstream regulatory region of the CEBPE gene. The SNPs were selected from the promoter region between −2 and 0 kb on the basis of linkage disequilibrium (LD) mapping using genotypes of 381 Europeans (CEU+TSI+FIN+GBR+IBS) from the 1000 Genomes Project (http://browser.1000genomes.org/index.html). The LD block was constructed from SNPs in Hardy–Weinberg equilibrium (P>0.001) with minor allele frequency >0.001 and r²>0.8, using the confidence interval method of Gabriel et al.10 Haplotypes were estimated from pairwise measures of LD using Haploview (version 4.2),11 and their frequencies were estimated by the expectation–maximization algorithm.
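The r² criterion used for block construction can be illustrated with a minimal sketch. It assumes phased two-SNP haplotypes are already available, whereas Haploview estimates haplotype frequencies from unphased genotypes by expectation–maximization; the function names and the toy haplotypes are hypothetical and are not taken from the study.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    `haplotypes` is a list of two-character strings, one per phased
    chromosome, e.g. "TG" = allele T at SNP 1 and allele G at SNP 2.
    """
    n = len(haplotypes)
    a = haplotypes[0][0]  # reference allele at SNP 1 (arbitrary choice)
    b = haplotypes[0][1]  # reference allele at SNP 2 (arbitrary choice)
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def in_strong_ld(haplotypes, threshold=0.8):
    """True when r^2 exceeds the threshold (0.8 in the text above)."""
    return r_squared(haplotypes) > threshold


# Toy data (hypothetical): the two SNPs always travel together.
linked = ["TG"] * 6 + ["CA"] * 4
# Toy data (hypothetical): all four allele combinations are equally common.
unlinked = ["TG", "TA", "CG", "CA"]
```

With these toy inputs, `in_strong_ld(linked)` is true and `in_strong_ld(unlinked)` is false.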

Promoter reporter constructs

We made a construct of the CEBPE promoter region encompassing all the selected polymorphisms linked to rs2239633. The promoter region was amplified from human genomic DNA by PCR, and the PCR products were ligated into the T-blunt vector (Solgent, Daejeon, Korea) using T4 DNA ligase (Promega, Madison, WI, USA). The fragment containing the CEBPE promoter was released from the T-blunt vector by digestion with the KpnI and HindIII restriction enzymes (NEB, Ipswich, MA, USA), gel-purified and subcloned into the promoterless pGL3-basic vector (Promega).

Site-directed mutagenesis

Each haplotype construct was generated by mutating the target SNP sites using the QuikChange XL Site-Directed Mutagenesis Kit (Stratagene, La Jolla, CA, USA). The following forward primers were used for point mutations: 5′-TCAGGCCCAGGGCGAGGGGCGAAGC-3′ (rs2239630), 5′-GGCTGGAAACGCACTAATATTTGGGCTATTGCACA-3′ (rs2239631), 5′-CTGGAAACGCACTAACATTTGGGCTATTGCACAGCTC-3′ (rs2239632), 5′-CACCACGCAGGCTCGTGTGTAGAGCTTGTTC-3′ (rs2239633), 5′-ACCCAACTGAGGGTTGGCACTGGAAGAGG-3′ (rs2239634), 5′-TGGAGTCCCCTGGCCTCCCAGCAGGGA-3′ (rs2239635) and 5′-CTGACCAAGGAGTGTCTCCAACTGCTGAATAGG-3′ (rs12434881). The corresponding reverse primers were as follows: 5′-GCTTCGCCCCTCGCCCTGGGCCTGA-3′ (rs2239630), 5′-TGTGCAATAGCCCACATATTAGTGCGTTTCCAGCC-3′ (rs2239631), 5′-GCTGTGCAATAGCCCAAATGTTAGTGCGTTTCCAG-3′ (rs2239632), 5′-GAACAAGCTCTACACACGAGCCTGCGTGGTG-3′ (rs2239633), 5′-CCTCTTCCAGTGCCAACCCTCAGTTGGGT-3′ (rs2239634), 5′-TCCCTGCTGGGAGGCCAGGGGACTCCA-3′ (rs2239635) and 5′-CCTATTCAGCAGTTGGAGACACTCCTTGGTCAG-3′ (rs12434881). All constructs were confirmed by direct sequencing.
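Mutagenic primer pairs of this kind are typically designed to anneal to opposite strands at the same site. The short sketch below shows how such a relationship could be checked, using the rs2239633 pair listed above as the only example; the helper name is hypothetical, and the check is an illustration rather than part of the reported protocol.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# rs2239633 primer pair as listed in the text (both written 5' to 3').
forward = "CACCACGCAGGCTCGTGTGTAGAGCTTGTTC"
reverse = "GAACAAGCTCTACACACGAGCCTGCGTGGTG"

# True when the reverse primer is the exact reverse complement of the forward primer.
is_complementary = reverse_complement(forward) == reverse
```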

Cell lines and culture conditions

Human embryonic kidney (HEK) 293 cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM; Wellgene, Daegu, Korea) supplemented with 1% penicillin–streptomycin (Wellgene) and 10% fetal bovine serum (FBS; Wellgene) at 37 °C in a humidified atmosphere containing 5% CO₂.

Dual-luciferase reporter assay

Twenty-four hours prior to transfection, HEK293 cells were seeded at 2.4 × 10⁵ cells per well on six-well plates. The cells in each well were transfected with 1200 ng of the pGL3-basic vector or one of the pGL3-CEBPE haplotype vectors, which carry the firefly luciferase reporter gene, using Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA). Ten nanograms of pRL-CMV vector (Promega) encoding the Renilla luciferase reporter was used as an internal control for transient transfection efficiency. Twenty-four hours after transfection, the cells were washed with PBS, and the firefly and Renilla luciferase activities were measured with a luminometer (CentroPRO LB 962; Berthold, ND, USA) using the Dual-Luciferase Reporter Assay (Promega). Three replicates per CEBPE haplotype-reporter construct were examined in each transfection experiment. The reporter assay was performed in triplicate, and thus a total of nine replicates per haplotype-reporter construct were assayed.
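As an illustration of how the Renilla signal can serve as an internal control, the sketch below divides each firefly reading by the Renilla reading from the same well and then summarizes the replicate wells by their mean and standard error. The exact normalization used in the study is not stated, so this is an assumption; the function names and the readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Firefly signal divided by Renilla signal, well by well."""
    if len(firefly) != len(renilla):
        raise ValueError("need one Renilla reading per firefly reading")
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean (needs at least 2 values)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical luminometer readings for three replicate wells
# (illustrative numbers only, not data from the study).
firefly = [5200.0, 4800.0, 5000.0]
renilla = [100.0, 100.0, 100.0]

ratios = relative_activity(firefly, renilla)
avg, se = mean_and_standard_error(ratios)
```

Dividing by the co-transfected Renilla signal removes well-to-well differences in transfection efficiency before constructs are compared.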

Statistical analysis

Differences in relative luciferase activity among haplotypes were assessed by Tukey’s multiple test with a significance threshold of P<0.05. The statistical analysis was conducted using SPSS software version 20 (Chicago, IL, USA).

Results

An LD block was identified at the previous GWAS signal in the promoter region of the CEBPE gene (Figure 1). The block included 7 of the 16 SNPs in the upstream region of −2 to 0 kb, and these seven SNPs were selected for examination of promoter activity. Haplotypes of the selected SNPs were estimated together with their frequencies, and three haplotypes had frequencies >0.1 (Figure 1). The most frequent haplotype (H1) was CTTTTGT, with a frequency of 0.45. The second most frequent haplotype (H2, TCGCACC) carried the opposite allele to H1 at every site. For each haplotype, a 1036-bp reporter construct of the CEBPE promoter region from −935 to +100 bp was made, encompassing all the polymorphisms (rs12434881, rs2239630, rs2239631, rs2239632, rs2239633, rs2239634 and rs2239635) linked to rs2239633. Luciferase activity was strongest with H2 and weakest with H1 (Figure 2).
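The relationship between H1 and H2, and the idea of a nonnatural haplotype that differs from H1 at a single site, can be sketched as follows. The haplotype strings are those reported above; the function names are hypothetical, and no assumption is made about which string position corresponds to which rs number.

```python
H1 = "CTTTTGT"  # most frequent haplotype, as reported in the text
H2 = "TCGCACC"  # second most frequent haplotype, as reported in the text


def differing_sites(hap_a, hap_b):
    """Zero-based positions at which two equal-length haplotypes differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of sites")
    return [i for i, (x, y) in enumerate(zip(hap_a, hap_b)) if x != y]


def single_site_swaps(base, donor):
    """Haplotypes made by replacing one allele of `base` with the `donor` allele."""
    return [base[:i] + donor[i] + base[i + 1:] for i in differing_sites(base, donor)]


# One single-substitution haplotype per differing site.
nonnatural = single_site_swaps(H1, H2)
```

Because H1 and H2 differ at all seven sites, this yields seven single-substitution haplotypes on the H1 background.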

Figure 1

Single-nucleotide variants and their haplotypes used in the current study. (a) Nucleotide sequence of the CEBPE promoter region used in this study. Shaded letters indicate the selected SNPs with major/minor alleles. (b) Linkage disequilibrium (LD) block constructed in the promoter region. The color scheme represents the estimates of the squared correlation coefficient (r²). (c) Haplotypes and their frequencies. * indicates a nonnatural haplotype. A full color version of this figure is available at the Journal of Human Genetics online.

Figure 2

Estimates of relative luciferase activity in HEK293 cells by haplotype of the CEBPE promoter region. Each estimate represents the average of three values obtained from triplicate measurements, and the error bar represents the corresponding standard error. Similar results were obtained in two other independent experiments. Relative luciferase activity estimates are presented with white, gray and black bars for the pGL3 vector alone, nonnatural haplotypes and natural haplotypes, respectively. Estimates that do not share a letter differ by Tukey’s multiple test (P<0.05).

Subsequently, luciferase activity was examined further to identify the specific functional nucleotide variant among the components of the haplotypes. We found no difference in luciferase activity on substituting the allele of rs12434881, rs2239630, rs2239631, rs2239634 or rs2239635 (P>0.05, Supplementary Figures S1 and S2). However, luciferase activity increased with allelic substitution at rs2239632 and at rs2239633 (P<0.05, Figure 2). In particular, a distinctive difference in luciferase activity was observed for the nonnatural haplotype carrying the T-to-G substitution at rs2239632, which lies 653 bp upstream of the gene. The activity level with the G allele was comparable to that of H2 (P>0.05).

Discussion

The current study identified a functional sequence variant in the promoter of the CEBPE gene by luciferase reporter assay. The study was based on a GWAS signal (rs2239633) in the 5′-upstream region of the CEBPE gene for susceptibility to ALL in English (odds ratio (OR)=1.34, P=2.88 × 10⁻⁷, 907 cases and 2398 controls),5 Germans (OR=1.6, P=5.1 × 10⁻⁵, 1193 cases and 1516 controls),8 and Germans and Italians (OR=1.35, P=4.0 × 10⁻¹⁰, 1438 cases and 2735 controls).9 Functional examination of its sequence variants was prioritized because CEBPE has a crucial role in dysregulation of CEBP, which is a suppressor of leukemogenesis.5 In fact, a variety of rearrangements in the CEBPE gene have been found in tumors derived from patients with T-cell leukemia or lymphoma.12, 13, 14, 15, 16 In addition, genetic associations of ALL with other nucleotide variants of the gene (rs4982729, OR=1.33, P=2.88 × 10⁻¹⁰; rs12887958, OR=1.28, P=3.16 × 10⁻⁸; rs4982731, OR=1.36, P=8.97 × 10⁻¹²; rs8015478, OR=1.32, P=4.35 × 10⁻¹⁰; rs17794251, OR=1.22, P=2.7 × 10⁻⁵) were identified in a more recent GWAS with multiple ethnic populations (1605 cases and 6661 controls).17

We observed heterogeneous gene expression among haplotypes composed of the variants linked to rs2239633 located −653 bases from the transcription start site, which was found to be associated with ALL in the previous study. The most frequent haplotype (CTTTTGT), with a frequency of 0.449, showed lower expression than the haplotype (TCGCACC, frequency=0.335) carrying the opposite allele at every site. Further investigation, in which single nucleotides of the most frequent haplotype were substituted, revealed the causal single-nucleotide variants (rs2239632 and rs2239633) for the distinctive expression. In particular, the difference in expression between the H1 and H2 haplotypes corresponded to that between the two alleles of rs2239632, whose G allele increased expression. The increased CEBPE might be targeted by translocation of the immunoglobulin heavy chain, resulting in lymphoid leukemogenesis,7 and accumulated lymphoid leukemogenesis would finally lead to ALL. This hypothesis is consistent with the GWAS in which the C allele of rs2239633 was found to be associated with increased susceptibility to ALL; the C allele was in complete LD with the G allele of rs2239632. Further studies are needed to confirm the hypothesis and to unravel the detailed mechanisms. For example, this study used the PROMO program (http://alggen.lsi.upc.es) to predict changes in transcription factors and their binding sites resulting from nucleotide substitution of the functional variant, and identified various binding sites that disappeared with the substitution from T to G (Figure 3). Thus, we suspect that the G allele at rs2239632 decreases the binding affinity for these proteins. A study that examines the association between susceptibility to ALL and the haplotypes analyzed in the current study is also warranted. Rigorous efforts to examine the changes in transcription factors and the further mechanisms leading to phenotypic changes would contribute to our understanding of the genetic architecture of susceptibility to ALL.

Figure 3

Transcription factor binding sites predicted for haplotypes of the CEBPE promoter region. The thin horizontal line indicates the transcription factor binding site of H1, and the bold line indicates that of H2. They are displayed in decreasing order of similarity from the top. C/EBPβ, CCAAT/enhancer binding protein β; NF-1, nuclear factor 1; GR, glucocorticoid receptor; PR, progesterone receptor; NF1-CTF, nuclear factor 1/CCAAT-binding transcription factor; HNF-3α, hepatocyte nuclear factor-3α.