Background
Alterations of DNA methylation patterns are the most common epigenetic aberrations in cancer and occur in cells during early breast cancer development and progression [1]. DNA methylation is a reversible biological signal that underlies tissue-specific cell differentiation and cells' adaptability to changes in their environment through regulation of gene expression [2]. Specifically, it is the addition of a methyl group to DNA cytosine bases that occurs predominantly in Cytosine-phosphate-Guanine (CpG) dinucleotides [2]. Approximately 60% of human genes contain a high density of CpG dinucleotides in their promoters [3, 4]. CpG-rich regions are mostly unmethylated in normal cells when located in regulatory regions of housekeeping genes, tissue-specific genes and tumor suppressors [4, 5], while a methylated state of CpG islands located in promoters of some oncogenes leads to their transcriptional silencing [6].
Because the DNA methylation statuses of a large subset of sites are known to be strongly correlated with each other, approaches that capture the dynamics of several sites simultaneously across the entire genome (epigenome-wide studies) are less prone to bias than candidate gene methylation studies [7]. Numerous genome-wide DNA methylation-profiling techniques exist, hindering the comparison of results across studies that have used different methods [8, 9]. While the whole-genome bisulphite sequencing method provides the highest accuracy and single-nucleotide resolution, it is not yet feasible for large cohorts [9]. An acceptable compromise between coverage and precision is to target a comprehensive subset of the genome [9]. As such, the high-throughput and relatively affordable Infinium Human Methylation 450 K (HM450k) and MethylationEPIC (EPIC) BeadChips from Illumina, which target approximately 480,000 and 850,000 CpG sites across the human genome, respectively, with at least 99% coverage of RefSeq genes [9, 10], have been widely used in epidemiological studies.
DNA methylation studies aim to identify high-risk methylation patterns that may have applications in the early diagnosis of breast cancer and in identifying high-risk women for targeted interventions [11]. However, robust evidence of a prospective relationship between DNA methylation patterns and breast cancer risk is still lacking. Previous reviews focused mainly on whole-blood DNA methylation studies, considered all methods of DNA methylation measurement, whose results are inherently different and difficult to compare, and lacked a systematic evaluation of the strengths and weaknesses of included studies. Furthermore, many more epigenome-wide studies of breast cancer risk have been published since, prompting the need for an updated, rigorous and systematic methodological evaluation of all relevant studies. Thus, the objective of the present systematic review is to evaluate and synthesize the results of epigenome-wide association studies that have used the HM450k or EPIC BeadChip, to determine whether global DNA methylation and specific differentially methylated sites are consistently associated with breast cancer risk in women, and to identify what could have limited the consistency of their results.
Discussion
The present systematic review of epigenome-wide DNA methylation and risk of breast cancer indicates a consistent trend toward global hypomethylation of blood-derived DNA and higher estimates of epigenetic age in women who develop breast cancer. None of the differentially methylated CpGs identified in individual studies was consistently associated with breast cancer risk across studies, and sparse data preclude any conclusions from studies of breast tissue DNA methylation.
Although the overall strength of evidence is weak, since most studies were at least at serious risk of bias and the strength of associations is relatively weak, especially for epigenetic age, our findings are more consistent than those observed in studies that have used other global DNA methylation estimation methods, such as the luminometric methylation assay (LUMA), liquid chromatography-mass spectrometry (LC-MS) of 5-methyldeoxycytosine (5-mdC) concentration, or pyrosequencing and MethyLight assays measuring the methylation of repetitive DNA elements (i.e., LINE-1, Alu, or Sat2) [39], indicating that these methods may not capture the global DNA methylation differences between cases and controls.
A growing body of evidence suggests that well-known breast cancer risk factors are associated with global DNA hypomethylation and increased epigenetic age [40], including lifestyle and dietary factors [41, 42], body mass index [43], physical inactivity [44], and hormone exposure [45]. Furthermore, global DNA hypomethylation has been observed in cancers [46], including breast carcinomas, indicating that DNA methylation mediates gene-environment interactions. However, the effect of DNA hypomethylation depends on the genomic location of the hypomethylated CpGs [47]. In fact, while DNA hypomethylation of gene promoters is positively correlated with gene transcription, hypomethylation in repetitive elements may lead to genomic instability and reactivation of expression of transposable elements, whereas hypomethylation within gene bodies may disturb alternative splicing [47]. Even though few studies included in the present systematic review considered CpG location in their analyses, there is some indication that the variability in DNA methylation between breast cancer cases and controls is driven by differential methylation of CpGs located outside CpG islands and promoters.
The lack of evidence for consistent associations between DNA methylation at specific CpGs and breast cancer risk may be explained by methodological biases. Because DNA methylation profiles, unlike the genome, are subject to dynamic changes induced by genetic, environmental and stochastic factors [9], identification of a causal relationship is challenging and requires the use of conventional epidemiological approaches [9], which have been largely overlooked in most included studies.
In addition to the traditional sources of bias inherent in observational designs, an important issue was related to the preprocessing of methylation data. Different data normalization methods have been developed to correct probe design bias, a systematic difference in the distributions of methylation values related to the use of two types of probes with different chemical properties in the HM450k BeadChip. While no single normalization method is considered the best, the functional normalization method, which was used by most included studies, is appropriate for cancer/normal comparisons and vastly different tissue types, where large global methylation differences are expected [48]. When comparing samples of the same tissue type, the functional normalization method is believed to be inappropriate, as it may obscure true differences between individuals [48]. Moreover, few studies reported excluding, prior to analyses, cross-hybridizing probes and probes overlapping SNPs, which are known to generate technical and biological artifacts that could have confounded the results [49].
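As an illustration of the probe exclusion step discussed above, the following is a minimal Python sketch, not the procedure of any included study. It assumes methylation values are held in a dictionary keyed by probe ID and that lists of cross-hybridizing and SNP-overlapping probes are already available; the function name `filter_probes`, the probe IDs and the values are all hypothetical.

```python
def filter_probes(beta_by_probe, cross_reactive, snp_overlapping):
    """Return a copy of beta_by_probe without the flagged probes.

    beta_by_probe: dict mapping probe ID -> list of methylation values.
    cross_reactive, snp_overlapping: iterables of probe IDs to exclude
    (assumed to come from previously published annotation lists).
    """
    excluded = set(cross_reactive) | set(snp_overlapping)
    return {
        probe: values
        for probe, values in beta_by_probe.items()
        if probe not in excluded
    }


# Hypothetical probe IDs and values, for illustration only.
betas = {
    "cg_example_1": [0.82, 0.79],
    "cg_example_2": [0.10, 0.14],
    "cg_example_3": [0.55, 0.47],
}
kept_probes = filter_probes(
    betas,
    cross_reactive={"cg_example_2"},
    snp_overlapping={"cg_example_3"},
)
print(sorted(kept_probes))
```

Applying the exclusion before any association testing keeps flagged probes from contributing spurious signals to downstream results.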
The strengths of the present systematic review include the use of the rigorous Cochrane Reviews methodology, an extensive and highly sensitive search strategy to retrieve as many relevant studies as possible, the use of a pre-established protocol, the assessment of the risk of bias, and the systematic analysis of results in light of the methodological strengths and weaknesses of relevant studies. Limitations include the lack of high-quality evidence and the overall serious risk of bias in included studies, due to selection bias, confounding and data preprocessing.
Although considered relatively stable, DNA methylation is a labile and reversible feature that may vary over time, reflecting variation in environmental exposures [50]. In fact, we observed that differences in follow-up periods may have affected the detection of differences in methylation patterns between breast cancer cases and controls, suggesting that a single point measurement of DNA methylation may not predict lifetime breast cancer risk but could rather be used for short-term prediction of breast cancer risk. It should also be kept in mind that DNA methylation patterns are tissue-specific. While tissue specificity is generally considered of lesser concern in studies aiming to identify biomarkers of exposure or disease risk, DNA methylation patterns obtained from accessible surrogate tissues such as blood cannot be easily extrapolated to breast tissue [11]. In fact, concordance between DNA methylation in different tissues seems to be complex and locus-dependent [51]: although high inter-tissue correlation may be present when methylation changes induced during embryogenesis are propagated soma-wide, changes occurring during adulthood and ageing are more likely to remain tissue-specific [9, 51, 52]. For DNA methylation biomarkers to have the potential to inform interventions based on epigenetic agents for prevention or treatment of breast cancer, it is necessary to demonstrate a mechanistic link between DNA methylation patterns and breast cancer occurrence [11]. Such a mechanistic link could only be supported by identification of tissue-specific DNA methylation changes in normal breast tissue prior to breast cancer occurrence [11].
To overcome some of the observed limitations, epigenome-wide studies should use more conventional epidemiological approaches, including an ethnically homogeneous and representative sampling of breast cancer patients and proper selection of controls to minimize the risk of selection bias (such as the use of nested case-control designs). Moreover, appropriate control of potential confounding (by adjusting or matching for known breast cancer risk factors) should be considered. Studies should also allow for a sufficient lag time (time between sample collection and breast cancer diagnosis) to minimize the risk of reverse causation (effects of an underlying breast cancer not yet diagnosed). In addition, studies should consider the impact of time to diagnosis for cases and length of follow-up in controls, as changes in methylation status due to variation in environmental exposures can occur during long follow-up periods and bias the observations toward the null (toward weaker associations or no association). Finally, data preprocessing should avoid functional normalization methods, which are not suitable for detecting subtle differences between samples from the same tissue type, and should exclude cross-hybridizing probes and probes overlapping SNPs prior to analyses.
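The lag-time recommendation above can be illustrated with a short Python sketch. This is only one possible way to apply a minimum lag, under stated assumptions: the record layout, the helper names `lag_years` and `exclude_short_lag`, the example dates and the one-year threshold are all hypothetical and are not taken from the included studies.

```python
from datetime import date


def lag_years(sample_date, diagnosis_date):
    """Time between sample collection and diagnosis, in years."""
    return (diagnosis_date - sample_date).days / 365.25


def exclude_short_lag(cases, min_lag_years):
    """Keep only cases diagnosed at least min_lag_years after sampling."""
    return [
        case
        for case in cases
        if lag_years(case["sample_date"], case["diagnosis_date"]) >= min_lag_years
    ]


# Hypothetical case records, for illustration only.
cases = [
    {"id": "case_a", "sample_date": date(2005, 1, 1), "diagnosis_date": date(2005, 9, 1)},
    {"id": "case_b", "sample_date": date(2005, 1, 1), "diagnosis_date": date(2011, 1, 1)},
]

# The one-year threshold is an arbitrary illustrative choice.
retained_cases = exclude_short_lag(cases, min_lag_years=1.0)
print([case["id"] for case in retained_cases])
```

In this sketch the case diagnosed eight months after sampling is dropped, while the case diagnosed six years later is retained. Repeating the analysis at several thresholds would be one way to check whether associations are driven by cases diagnosed shortly after sample collection.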
While epigenome-wide DNA methylation methods are particularly suitable for hypothesis generation, as they capture the dynamics of several sites simultaneously across the entire genome, their findings, particularly differential methylation of specific CpG sites and related genes, should be validated using a different measurement method with higher sensitivity and specificity, such as PCR-based methods in a candidate-gene methylation approach. In addition, any detected methylation differences should be supplemented by transcriptional or protein expression analysis to confirm their functional impact and its association with breast cancer occurrence [53]. Once validated, the methylation status of specific CpGs, and the expression values of related genes, could be used in prospective study designs to generate comprehensive predictive models that integrate clinical characteristics and environmental risk factors and accurately predict breast cancer risk for each woman.