Background
Colorectal cancer (CRC) is a prevalent disease, particularly in the Western world, with 1.36 million cases diagnosed worldwide in 2012 [1]. As with all cancers, CRC encompasses multiple molecular subtypes with specific characteristics [2]. The CpG island methylator phenotype (CIMP) is one subtype, and describes tumours with a high frequency of hypermethylation at CpG islands [3].
While there is no consensus on a gene panel to determine the CIMP status of a tumour, one of the most commonly used is the Weisenberger panel, comprising CACNA1G, NEUROG1, RUNX3, SOCS1 and IGF2 [4]. CIMP can be further split into CIMP-high (CIMP-H) and CIMP-low (CIMP-L), which show high and intermediate levels of hypermethylation, respectively [5]. The CIMP-L subtype, defined as tumours with 1/5 to 3/5 of these marker genes methylated, is associated with KRAS mutations and is more common in men [5]. CIMP-H tumours, defined as tumours with hypermethylation at >3/5 marker genes, are significantly associated with mutations in BRAF, female patients and location in the proximal colon [4, 5]. Recently, colorectal tumours have been split into further methylation subtypes. Hinoue et al. identified four subtypes based on hierarchical clustering of DNA methylation at loci exhibiting high inter-tumour variability [6]. Two, representing CIMP-H and CIMP-L tumours, were associated with BRAF and KRAS mutations, respectively. Tumours in the third cluster were associated with TP53 mutations and prevalence in the distal colon, while the fourth cluster was enriched for tumours from the rectum, with low rates of KRAS and TP53 mutations.
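As an illustration only, the panel-based thresholds described above can be written as a small Python function. The function name, the input format (the set of marker genes already called methylated in one tumour) and the CIMP-N label for tumours with no methylated markers are assumptions made for this sketch; how an individual marker is called methylated is outside its scope.

```python
# Weisenberger panel markers as listed in the text.
WEISENBERGER_PANEL = frozenset({"CACNA1G", "NEUROG1", "RUNX3", "SOCS1", "IGF2"})


def classify_cimp(methylated_genes):
    """Assign a CIMP label from the panel markers methylated in one tumour.

    Thresholds follow the definitions in the text: more than 3 of 5 markers
    is CIMP-H and 1 to 3 of 5 is CIMP-L. Labelling tumours with no
    methylated markers as CIMP-N is an assumption of this sketch.
    """
    # Count only genes that belong to the panel; other genes are ignored.
    n_methylated = len(WEISENBERGER_PANEL & set(methylated_genes))
    if n_methylated > 3:
        return "CIMP-H"
    if n_methylated >= 1:
        return "CIMP-L"
    return "CIMP-N"


# Hypothetical tumour with four of the five markers methylated.
example_label = classify_cimp({"CACNA1G", "NEUROG1", "RUNX3", "SOCS1"})
```

Because the rule depends on a simple count, a tumour with three methylated markers and a tumour with four receive different labels even if their genome-wide methylation levels are similar.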
Hypermethylation occurs primarily at CpG islands, the majority of which are unmethylated in normal tissue and are found near the promoter region of approximately 70% of mammalian genes. ChIP-Seq experiments have demonstrated that proteins including KDM2A and CFP1 preferentially bind unmethylated CpG islands [7, 8]. The regions surrounding CpG islands, termed island shores, are important for cellular differentiation and are also targets of aberrant methylation in cancer [9]. Hypermethylation in cancer occurs preferentially at genes that, in embryonic stem cells, exhibit the repressive H3K27me3 histone modification laid down by the Polycomb group (PcG) proteins [10]. Cells lacking members of the PcG complex are unable to complete normal cellular differentiation [11]. Many H3K27me3-marked genes also harbour the activating H3K4me3 mark in embryonic stem cells, a state referred to as ‘bivalent’, and these genes are enriched for roles in development and differentiation [12, 13]. Preferential hypermethylation of developmental and differentiation genes supports the epigenetic switching model, in which developmental regulators that are temporarily silenced by histone modification in stem or progenitor cells are often heavily DNA methylated in cancer [14]. This model proposes that bivalent genes, which would normally lose PcG protein occupancy and become upregulated, are maintained in a stably repressed state by the presence of aberrant DNA methylation, inhibiting differentiation [14, 15].
In this study, we characterized global cancer-specific methylation patterns of 94 CRC tumour samples and matched normal tissues at very high resolution. We found that the frequency of hypermethylation at genes follows a steady continuum from CIMP-N to CIMP-L to CIMP-H tumours. We identified a core set of 132 genes that were hypermethylated in all CIMP-H tumours and were enriched for roles in development and differentiation.
Discussion
In this study we utilized the high-density coverage of the Illumina 450 K methylation array to characterize DNA methylation in CRC. Using this genome-wide approach, and consistent with previous studies, we identified three methylation subtypes (high, intermediate, and low levels of methylation). The CIMP-H subtype was enriched for tumours from the proximal colon and female patients, as previously observed [4, 5]. Notably, only 84% of CIMP-H tumours in our dataset were classed as CIMP-H using the Weisenberger panel [4]. Using the Illumina Infinium HumanMethylation 27 K array, two publications [6, 33] have proposed splitting the CIMP-N subtype into two groups, one of which is enriched for distal tumours and TP53 mutations, and the other enriched for rectal tumours with a low frequency of mutations. Our hierarchical clustering dendrogram did not support the division of the CIMP-N subgroup into two groups, which might reflect the use of different clustering techniques or probe sets. Moreover, a recent publication that performed clustering based on 10,000 CpG probes also identified only three tumour subtypes [34].
An unexpected finding during our analysis was that a small number of tumours classified as CIMP-H had fewer hypermethylated genes than some CIMP-L tumours, and a small subset of CIMP-N tumours had a higher number of hypermethylated genes than some CIMP-L tumours. Thus, there is no distinct boundary in the number of hypermethylated genes between the CIMP subtypes, despite a difference in the average number of hypermethylated genes between the subtypes. The lack of distinct boundaries between CIMP groups might be explained by a variable number of stochastic hypermethylation events that contribute to the overall frequency of hypermethylated genes in each tumour.
The high density of the 450 K arrays enabled us to use the Wilcoxon signed-rank test to interrogate methylation at all CpGs in each island and island shore. The benefit of this method is that it selects for genes with the greatest changes in methylation, an improvement on previous methods that identified hypermethylated genes on the basis of one or two differentially methylated CpGs. We identified over 2000 genes significantly differentially methylated between tumours and matched normal tissue. This is comparable to a previous study that mapped differential methylation to 1465 RefSeq genes [31]. Many of the genes observed to be differentially methylated have been identified previously, including GATA4/5 [35], SFRP2 [29], and the previously proposed serum and stool CRC marker genes EYA4 [36] and TFPI2 [37].
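The per-island test described above can be sketched in plain Python. This is a minimal illustration and not the analysis pipeline used in the study: it assumes the test is applied to paired tumour and matched-normal beta values across the CpGs of one island, uses the normal approximation without tie or continuity corrections, and all names and example values are hypothetical.

```python
import math


def wilcoxon_signed_rank(tumour, normal):
    """Two-sided Wilcoxon signed-rank test on paired values.

    Returns (W, p) where W is the smaller of the positive and negative
    rank sums and p comes from the normal approximation. No tie or
    continuity correction is applied, so small samples are approximate.
    """
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [t - n for t, n in zip(tumour, normal) if t != n]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation to the null distribution of W.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, p


# Hypothetical beta values for eight CpGs in one island.
tumour_betas = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]
normal_betas = [0.10, 0.15, 0.08, 0.22, 0.12, 0.18, 0.09, 0.14]
w_stat, p_value = wilcoxon_signed_rank(tumour_betas, normal_betas)
```

Because every CpG in the island contributes a rank, a single outlying probe has limited influence on the result. The test itself is two-sided, so the direction of change (hypermethylation versus hypomethylation) would need to be read separately, for example from the sign of the median difference.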
EYA4 and TFPI2, along with TLX1, were the three most frequently hypermethylated genes in our dataset. Although, to the best of our knowledge, TLX1 hypermethylation has not been previously associated with CRC, one study showed it is methylated at high frequency in early-stage breast cancers [38]. Activation of TLX1 through either chromosomal translocation [39] or promoter CpG island demethylation [40] is associated with T-cell acute leukaemia. Notably, we observed that multiple members of the LHX, LMX, NKX, PAX and TBX families of transcription factors were hypermethylated in all CIMP-H tumours. These transcription factor families have roles in development, spatial patterning and tissue homeostasis, and the aberrant silencing or expression of these genes has been associated with tumour growth kinetics and malignancy potential. PAX genes are widely expressed and associated with maintaining tissue homeostasis or wound repair and may play a role in maintaining progenitor cell pluripotency. Loss of expression and hypermethylation of PAX genes in cancer were recently reviewed [41].
Mutations observed in tumours are commonly classified as driver or passenger events, the former being few in number, high in frequency and common to multiple tumour types, while the latter occur in many different genes and appear sporadically. This mutation landscape of tumours has been described as comprising ‘mountains’ (the few genes that are mutated frequently) and ‘hills’ (the many genes that are mutated rarely) [42]. Hypermethylation of genes also resembled this model, with many genes hypermethylated in a small number of tumours, and a smaller set of genes hypermethylated in the majority. When we scored only CIMP-H tumours, the number of ‘mountains’ was much higher than observed in CIMP-N tumours. Strikingly, 132 genes were hypermethylated in 100% of CIMP-H tumours. These were highly enriched for roles associated with segment specification, morphogenesis, and development.
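The ‘mountains’ and ‘hills’ view of hypermethylation, and the idea of a core set shared by every tumour in a subtype, can be illustrated with a short Python sketch. The data structure (a mapping from tumour identifier to the set of genes called hypermethylated), the function names, and the toy tumour identifiers and gene calls are assumptions for illustration, not data from this study.

```python
from collections import Counter


def hypermethylation_frequency(calls):
    """Return the fraction of tumours in which each gene is hypermethylated.

    `calls` maps a tumour identifier to the set of genes called
    hypermethylated in that tumour.
    """
    n_tumours = len(calls)
    if n_tumours == 0:
        return {}
    counts = Counter(gene for genes in calls.values() for gene in set(genes))
    return {gene: count / n_tumours for gene, count in counts.items()}


def core_genes(calls):
    """Return the genes hypermethylated in every tumour (frequency of 100%)."""
    gene_sets = [set(genes) for genes in calls.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Toy calls for three hypothetical CIMP-H tumours.
cimp_h_calls = {
    "tumour_A": {"EYA4", "TFPI2", "TLX1", "SFRP2"},
    "tumour_B": {"EYA4", "TFPI2", "TLX1"},
    "tumour_C": {"EYA4", "TFPI2", "SFRP2"},
}
frequencies = hypermethylation_frequency(cimp_h_calls)
core = core_genes(cimp_h_calls)
```

Sorting the frequencies in descending order gives the landscape directly: genes near 1.0 are the ‘mountains’, and the long tail of low-frequency genes forms the ‘hills’.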
To explain the hypermethylation of 132 genes across all of our CIMP-H tumours we consider three potential models. The first model relies on natural selection, following disruption of the strict controls maintaining normal epigenome homeostasis, to shape all tumours towards a similar pattern (convergent evolution). Deletion of transcription factor binding sites leads to normally unmethylated CpG islands becoming progressively more methylated; thus, the accumulation of methylation might represent a marker of dysregulated pathways in which transcription factor binding is no longer occurring [43]. The second model involves mutation of an upstream factor, e.g. a transcription factor or chromatin remodeler, causing a specific set of genes to become hypermethylated (the “instructive model” [44, 45]). A study of gliomas, which also display the CpG island methylator phenotype (G-CIMP), demonstrated that the introduction of a single mutation in IDH1 into primary human astrocytes rearranges the epigenome to match G-CIMP tumours [46]. This demonstrates that, at least in some tumour types, a single mutation can reproducibly induce genome-wide changes in methylation; however, no single mutation that is sufficient to cause CIMP in CRC has yet been identified. Tahara et al. analysed CRC mutations and found that the chromodomain genes CHD7/8 frequently had non-silent mutations in CIMP-H tumours [47]. Further, the authors showed that genes previously identified as differentially methylated in CRC are frequently bound by CHD7. The activating BRAF V600E mutation is tightly correlated with the CIMP-H subtype [4]. BRAF activity has recently been reported to increase activity of the MAFG protein through the RAS-RAF pathway, leading to the recruitment of a repressor complex that facilitates promoter hypermethylation [48]. However, not all CIMP-H tumours carry this mutation, suggesting that either MAFG activity is increased via a different mutation or an alternative mechanism of hypermethylation exists in CIMP-H tumours.
In the third model, CIMP-H tumours reflect the epigenetic state of the tumour-initiating cell. Tumour DNA methylation profiles have previously been shown to reflect their tissue of origin [49]. A plausible explanation for the drastically different epigenetic profiles of CIMP-H compared to CIMP-N tumours is that the tumour-initiating cell was in a different developmental state (e.g. progenitor compared to terminally differentiated). The developmental state of tumour-initiating cells has been shown to influence the characteristics of tumours [50]. Chow et al. used a single event, activation of the Sonic Hedgehog pathway, to transform both neural stem cells (NSCs) and neural progenitor cells (NPCs) into tumour-initiating cells. The tumours derived from NSCs and NPCs displayed different molecular characteristics, demonstrating the association between differentiation stage (or epigenetic state) of a tumour-initiating cell and the tumour subtype. The epigenetic state or differentiation stage of a cell might influence a resulting tumour through the pattern of histone modifications or the complement of transcription factors being expressed. We observed that 89% of the genes hypermethylated in all CIMP-H tumours were PcG targets (data not shown). In non-malignant cells, H3K27me3 (the repressive histone modification laid down by PcG proteins) provides a transient repression of transcription factors that, when activated, cause differentiation [51, 52]. This suggests that tumours derived from cells at different stages of differentiation would acquire different hypermethylation profiles, given the established relationship between H3K27me3 and DNA methylation [13].
Acknowledgements
We thank Dianne Hyndman of AgResearch at the Invermay Agricultural Centre for assistance with the Illumina HumanMethylation arrays. We thank Associate Professor Richie Soong and Dr. Touati Benoukraf at the Cancer Science Institute at the National University of Singapore for their help and guidance, and the Marjorie McCallum scholarship which facilitated travel to Singapore.