Introduction

Autism spectrum disorder (ASD) is a collective term for a heterogeneous group of neurodevelopmental disorders characterized by severe impairments in social interaction and communication, combined with restricted and repetitive interests and behaviors.1 ASD occurs across ethnic, racial and socioeconomic groups at a rate estimated by the CDC in 2012 to be 1 in 88 children.2 Boys with ASD outnumber girls by about 5 to 1, making the prevalence in boys about 1 in 54. Although increased awareness and diagnosis are expected to explain part of the striking increase in ASD prevalence over the last two decades, many of the factors behind the rising rates remain poorly understood.3 This review provides an overview of the evidence for genetic and environmental risk and protective factors in ASD.
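As a rough consistency check on the prevalence figures, assume (for this calculation only) equal numbers of boys and girls in the surveyed population. A 5:1 sex ratio then places five-sixths of all cases among boys, who make up half of the children:

```latex
p_{\text{boys}} = \frac{(5/6)\times(1/88)}{1/2} = \frac{10}{528} \approx \frac{1}{53}
```

This agrees with the reported 1 in 54 to within the rounding of the sex ratio.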

Epigenetic mechanisms such as DNA methylation act at the interface of genetic and environmental factors and may therefore be an important portal into the complexity of autism genetics. Genome-wide technologies have broadened the view of the methylome and revealed the importance of tissue specificity, developmental dynamics and genomic location in the relationship between DNA methylation and gene transcription. I also discuss the potential relevance of genome-wide methylome mapping technologies for understanding the interface between genetic and environmental risk and protective factors.

The complex genetics of ASD

Historically, data from monozygotic twin studies yielded very high heritability estimates for autism, above 90%.4, 5 However, in more recent twin studies, estimates of dizygotic twin concordance have increased and those of monozygotic twin concordance have decreased.6 A recent large twin cohort study estimated the contribution of the shared in utero environment to ASD (30–80%) to be higher than that of genetic heritability (14–67%).7 These studies suggest that although there is a strong genetic basis for ASD in some individual cases, the genetics of ASD is decidedly complex, and many cases are likely to involve interactions between genetic and environmental risk factors.
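Why falling monozygotic (MZ) and rising dizygotic (DZ) similarity shifts variance away from heritability and toward shared environment can be illustrated with Falconer's classical approximation. It partitions liability variance into additive genetic (h²), shared environmental (c²) and unique environmental (e²) components. This is a sketch only: the correlations below are hypothetical, the inputs should be liability (tetrachoric) correlations rather than raw concordance rates, and modern twin studies fit structural equation (ACE) models rather than this shortcut.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Falconer's decomposition of liability variance from twin correlations.

    Assumes MZ twins share all and DZ twins on average half of additive genetic
    effects, and that the shared environment is equally similar for both twin types.
    """
    h2 = 2 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c2 = 2 * r_dz - r_mz    # shared environment (includes the in utero environment)
    e2 = 1 - r_mz           # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}

# Hypothetical liability correlations, not values from the cited studies.
high_mz_low_dz = falconer(r_mz=0.95, r_dz=0.475)
lower_mz_higher_dz = falconer(r_mz=0.80, r_dz=0.60)
print("high MZ, low DZ:    ", high_mz_low_dz)
print("lower MZ, higher DZ:", lower_mz_higher_dz)
```

With the second pair of correlations, h² falls and c² rises, the direction of change described above. Falconer estimates can also fall outside the range 0 to 1 when sampling error is large, which is one reason formal ACE models are preferred.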

A recent study of almost 10 000 dizygotic twin pairs assessed for autistic traits has provided evidence for a genetically based ‘female protective effect’ in ASD.8 Siblings of female probands scoring above the 90th percentile of population-based autistic trait distributions were significantly more likely than siblings of male probands to also score above the 90th percentile. These results support the idea that females may require a greater ‘load’ of familial etiological factors than males to manifest autistic traits, adding to evidence previously obtained from analyses of copy-number variants in large autism cohorts.9, 10 Another potential explanation for the female protective effect comes from a study of individuals with Turner syndrome, who have a single X chromosome, which obtained evidence for an imprinted X-linked locus for social behavior.11 The female protective effect is also expected to influence predictions of female carrier status for inherited risk factors, both for genes on the X chromosome and for autosomal risk factors whose effects are buffered in females.12
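The ‘greater load’ interpretation corresponds to a liability-threshold model with sex-specific thresholds. In the minimal simulation below, every parameter is hypothetical: siblings share a familial liability component, each individual adds a private component, and females must exceed a higher threshold than males to count as affected. Because affected females must on average carry more familial liability, their siblings are expected to show higher recurrence, qualitatively matching the sibling pattern described above.

```python
import random
from statistics import NormalDist

def sibling_recurrence(n_families=100_000, familial_var=0.5,
                       male_prev=0.15, female_prev=0.05, seed=1):
    """Simulate sibling recurrence by proband sex under a liability-threshold model.

    Liability = shared familial component + individual component, scaled so
    total liability is standard normal. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    z = NormalDist()  # standard normal
    # Lower prevalence in females corresponds to a higher liability threshold.
    threshold = {"M": z.inv_cdf(1 - male_prev), "F": z.inv_cdf(1 - female_prev)}
    fam_sd, ind_sd = familial_var ** 0.5, (1 - familial_var) ** 0.5
    # Per proband sex: [number of affected probands, number of affected siblings]
    tally = {"M": [0, 0], "F": [0, 0]}
    for _ in range(n_families):
        familial = rng.gauss(0, fam_sd)  # shared by both siblings
        proband_sex, sib_sex = rng.choice("MF"), rng.choice("MF")
        if familial + rng.gauss(0, ind_sd) <= threshold[proband_sex]:
            continue  # proband unaffected: family not ascertained
        sib_affected = familial + rng.gauss(0, ind_sd) > threshold[sib_sex]
        tally[proband_sex][0] += 1
        tally[proband_sex][1] += sib_affected
    return {sex: sibs / probands for sex, (probands, sibs) in tally.items()}

recurrence = sibling_recurrence()
print(f"Sibling recurrence, male probands:   {recurrence['M']:.3f}")
print(f"Sibling recurrence, female probands: {recurrence['F']:.3f}")
```

The thresholds are placed so that 15% of males and 5% of females exceed them, loosely echoing a top-decile trait cutoff with a male excess; changing `familial_var` changes the size of the gap but not its direction.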

Approaches for investigating the genetic etiology of ASD have included cytogenetic analyses, linkage analyses, genome-wide association studies (GWAS), copy-number variation (CNV) analyses and whole-exome sequencing. Cytogenetic analysis can detect large deletions, duplications or repeat expansions, such as the FMR1 CGG expansion in fragile X syndrome or the >12 Mb duplications of 15q11-q13, each of which accounts for ∼1–2% of ASD cases.13, 14 Microarray technologies allow finer resolution of CNVs, including deletions and duplications ranging in size from several kb to several Mb.15 Although many CNVs are polymorphic between humans, a higher frequency of rare de novo CNVs has reproducibly been observed in ASD cases compared with controls.16, 17, 18, 19 Analyses of CNVs recurrent in autism have implicated several genes and gene pathways, including neuronal cell adhesion (NLGN1, NRXN1 and ASTN2), the ubiquitin pathway (UBE3A, PARK2, RFWD2 and FBXO40)20 and the GTPase/Ras pathway (AGAP1, SYNGAP1 and CDH13).21

However, the total burden of CNVs and their overall size appear to be as important as the individual genes they disrupt. Interestingly, a recent study showed that larger CNV deletions correlated primarily with reduced intelligence quotient (IQ), whereas larger CNV duplications correlated with reduced sociability.22 Furthermore, a recent analysis of highly dynamic CNV hotspots associated with autism or developmental delay syndromes demonstrated that the total burden of both rare and common duplications is significantly associated with autism.23 Together, these studies indicate that large-scale CNVs are an important genetic risk factor for ASD, but highlight the need to look beyond the individual genes disrupted by CNVs and to consider rare and common events, total CNV burden affecting shared genetic pathways, and the impact of large duplications on ASD risk.
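One simple way to make ‘total burden’ concrete is to tally, per individual, the number of deletion and duplication calls and the base pairs they cover, keeping the two classes separate because their sizes track different traits. The sketch below assumes a hypothetical CNV record with 0-based, half-open coordinates and merges overlapping calls of the same type so shared bases are counted once; real burden analyses also filter calls by population frequency, quality and gene content.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CNV:
    """A hypothetical CNV call; coordinates are 0-based, half-open."""
    chrom: str
    start: int
    end: int
    kind: str  # "del" or "dup"

def merged_bp(intervals):
    """Total base pairs covered by (start, end) intervals, counting overlaps once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend block
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def cnv_burden(calls):
    """Per-individual burden: call count and merged bp, separately by CNV type."""
    intervals = defaultdict(list)  # (kind, chrom) -> [(start, end), ...]
    counts = defaultdict(int)
    for c in calls:
        intervals[(c.kind, c.chrom)].append((c.start, c.end))
        counts[c.kind] += 1
    bp = defaultdict(int)
    for (kind, _chrom), ivs in intervals.items():
        bp[kind] += merged_bp(ivs)
    return {kind: {"count": counts[kind], "bp": bp[kind]} for kind in ("del", "dup")}

# Hypothetical calls for one individual (coordinates are illustrative only).
calls = [
    CNV("chr15", 22_000_000, 28_000_000, "dup"),
    CNV("chr15", 27_500_000, 29_000_000, "dup"),
    CNV("chr16", 29_600_000, 30_200_000, "del"),
]
print(cnv_burden(calls))
```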

GWAS have examined the contribution of common single nucleotide polymorphisms (SNPs) to the heritability of ASD. One of the largest GWAS compared ASD cases and controls and identified a significant association at chromosome 5p14.1, in a large gene desert between two cadherin genes (CDH9 and CDH10) involved in embryonic cell adhesion.24 Two other GWAS replicated association to 5p14-15, but not at the same location.25, 26 An additional GWAS found significant association at chromosome 20p12 within the MACROD2 gene locus,27 which encodes an ADP-ribose-binding macro-domain protein implicated in chromatin and DNA repair events within the nucleus.28, 29 In a second-stage GWAS with a larger number of ASD probands, no association reached significance, but the strongest candidate mapped to CNTNAP2 on chromosome 7,30 a gene previously implicated in ASD.31, 32
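The basic single-SNP comparison underlying a case-control GWAS can be sketched as a chi-square test on allele counts. This is a minimal illustration with hypothetical counts, not the pipeline used in the cited studies (which typically use logistic regression or family-based tests with covariates and extensive quality control); the 5 × 10⁻⁸ cutoff is the widely used convention for genome-wide significance.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele table.

    Rows: cases, controls. Columns: alternate allele, reference allele.
    Assumes all row and column totals are non-zero.
    Returns (chi2, p_value, odds_ratio).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts for 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p, odds_ratio = allelic_chi2(1_100, 900, 1_000, 1_000)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}, OR = {odds_ratio:.2f}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

An allelic odds ratio of about 1.2 in this sample is nominally significant but falls far short of the genome-wide threshold, which illustrates why common variants of modest effect require very large cohorts and why replication across studies can be inconsistent.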

More recently, whole-exome sequencing efforts have sought rare variants within protein-coding sequences that are present in ASD cases but absent from controls. The first whole-exome study of 20 individuals with sporadic ASD identified four potentially causative de novo variants, in FOXP1, GRIN2B, SCN1A and LAMC3.33 A second study by the same group later identified CHD8 and SCN2A as causative for ASD.18 Interestingly, this study demonstrated that de novo mutations were predominantly paternal in origin and that the number of mutations correlated positively with paternal age. Therefore, increasing paternal age, an environmental factor, appears to affect the genetic risk for rare causal variants of ASD, similar to the well-known effects of maternal age on aneuploidies such as trisomy 21, which causes the neurodevelopmental disorder Down syndrome.34

Environmental risk and protective factors for ASD

Epidemiological studies continue to identify environmental factors that increase the risk for ASD, both independently and together with genetic factors. Most current evidence has focused on parental, perinatal and obstetric factors. Some historical environmental exposures, though now rare in the population, provide important evidence that exogenous factors can greatly increase risk for ASD. These include rubella infection and exposure during pregnancy to medications such as thalidomide or valproate, each of which has been implicated in a several hundred-fold increase in autism risk.35, 36, 37, 38 More recent epidemiological studies have reinforced the finding that maternal infection, indicated by fever or influenza during pregnancy, significantly increases ASD risk.39, 40 Animal models using artificial immune challenge paradigms have further indicated that the maternal immune challenge itself, rather than a specific viral infection, is likely responsible.41 Further support for immune dysregulation in ASD comes from evidence of maternal autoantibodies and of altered cytokine responses in both mothers and probands with ASD.42, 43, 44

Additional maternal risk factors for ASD include advanced maternal age,45 maternal obesity,46 later birth order47 and shorter interpregnancy interval.48 Maternal exposure to common environmental pollutants is also an emerging concern, as factors such as air pollution5, 49 and pesticides50 have been associated with increased ASD risk. Genetic susceptibility likely contributes to the relatively low odds ratios observed for these environmental risk factors: if their effects are concentrated in genetically susceptible individuals, the association is diluted when averaged across the whole population.

Recently, epidemiological studies have raised the potential importance of transgenerational environmental effects in ASD risk. First, one study revealed a significant association between advanced grandpaternal age and risk of autism in grandchildren, suggesting that autism risk could develop over generations.51 A prior study found increased grandmaternal age to be associated with autistic traits in grandchildren and proposed a meiotic mismatch methylation hypothesis to explain these unexpected results.52 Briefly, this hypothesis suggests that the window of vulnerability to methylation changes during the grandmother’s pregnancy lies in the second-trimester meiosis I events of the fetal ovary, when the paired and recombining grandparental chromosomes could be influenced both by environmental factors and by genetic mutations, such as small deletions and duplications, on the grandparental chromosomes during pairing. However, a transgenerational effect that cannot be explained by fetal meiotic events comes from a large population-based longitudinal cohort study (Nurses’ Health Study II), which identified maternal exposure to child abuse in early life as a risk factor for having a child with autism.53 These findings from human populations are perhaps less surprising in light of emerging evidence that environmental factors such as stress54, 55, 56 or environmental enrichment,57 endocrine disruptors such as vinclozolin58 and BPA,59 and nutrition60, 61 all exhibit transgenerational effects in animal models relevant to ASD.

Interestingly, prenatal folic acid supplementation is a protective factor for both neural tube defects and ASD. Folate and folic acid are major environmental contributors to DNA methylation and protective factors for a number of human diseases, but the epigenomic basis of this protection is poorly understood. Studies have shown that mothers of children with autism were significantly less likely than mothers of typically developing children to take a prenatal supplement around conception, and reported significantly lower average folic acid intake during the first month of pregnancy.62, 63, 64 Furthermore, when mothers did not report taking prenatal supplements periconceptionally and either the mother or the child carried the susceptible MTHFR allele, the child was at much higher risk for autism.62 Because supplemental folic acid has antioxidant properties65 and corrects conditions of oxidative stress and low methylation capacity66, 67 that may be induced by environmental toxins, the reduced risk associated with periconceptional prenatal vitamin use could reflect mechanisms that counter the effects of environmental toxins by supplying methyl donors for DNA methylation.
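A gene-environment pattern like the supplement-by-MTHFR result is usually quantified by comparing each combination of exposure and genotype against a common reference group, here supplement users without the risk allele. The counts below are invented purely for illustration and are not data from the cited studies; the sketch computes each odds ratio with a Wald 95% confidence interval.

```python
import math

def odds_ratio_ci(cases, controls, ref_cases, ref_controls, z=1.96):
    """Odds ratio of one group versus a reference group, with a Wald 95% CI."""
    or_ = (cases * ref_controls) / (controls * ref_cases)
    se_log = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

def joint_exposure_ors(counts, reference):
    """Odds ratio for every (exposure, genotype) group versus one reference group.

    counts maps (exposure, genotype) -> (cases, controls).
    """
    ref_cases, ref_controls = counts[reference]
    return {group: odds_ratio_ci(cases, controls, ref_cases, ref_controls)
            for group, (cases, controls) in counts.items() if group != reference}

# Hypothetical counts (cases, controls), invented for illustration only.
counts = {
    ("supplement", "non-risk"): (40, 80),
    ("supplement", "risk"): (20, 35),
    ("no supplement", "non-risk"): (25, 30),
    ("no supplement", "risk"): (30, 15),
}
results = joint_exposure_ors(counts, reference=("supplement", "non-risk"))
for group, (or_, lo, hi) in results.items():
    print(f"{group}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In this toy table the joint group (no supplement plus risk allele) has by far the largest odds ratio, while each factor alone shows a modest one, which is how a low population-wide odds ratio can coexist with a strong effect in a genetically susceptible subgroup.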

Perinatal life is a critical time for DNA methylation and for susceptibility to environmental factors

Epigenetic marks such as DNA methylation may explain some of the variability observed with both environmental and genetic risk factors in autism. Epigenetic modifications to nucleotides or chromatin have long-lasting effects on gene expression and phenotype without altering the DNA sequence. Epigenetic mechanisms such as DNA methylation can be altered by environmental changes68, 69 and are heritable and stably maintained following environmental exposures, thus providing an important interface between genetic and environmental risk factors in complex disorders such as autism. Several environmental exposures have been correlated with reduced global DNA methylation in humans.67, 69, 70 Nutritional changes can alter DNA methylation with profound phenotypic effects, as in queen determination in honeybees71 and, through altered intake of methyl donors including folate, agouti coat color and obesity in mice.72 Deficiencies in methylation and oxidative stress pathways have been implicated in autism.67 Therefore, the protective effect of folate and other B vitamins likely acts at the epigenetic interface of DNA methylation, where it may counteract environmental factors that reduce DNA methylation levels.

The earliest stages of pregnancy are likely the most critical period for dietary methyl donors. Oocytes, early pre-implantation embryos and embryonic stem cells have higher overall levels of DNA methylation than differentiated cell types, as they use non-CpG methylation in addition to higher levels of CpG methylation.73 Two global waves of demethylation and remethylation occur in early development, first at an early postzygotic stage and again in the fetal primordial germ cells.74 Pre- and periconception stages are likely the most critical windows for folic acid protection against neural tube defects and autism, because these early embryonic events require methyl donors for dynamic DNA methylation changes at a time when the embryo is also most vulnerable to environmental exposures.

The developing human brain is also acutely sensitive to alterations in epigenetic pathways, as shown by the fact that mutations in genes encoding epigenetic effectors can cause human neurodevelopmental disorders.75, 76 Classic examples of ASD-related disorders caused by mutations in epigenetic mediators include Rett syndrome, caused by mutations in the X-linked gene encoding methyl CpG-binding protein 2 (MeCP2), a ‘reader’ of DNA methylation marks,77 and Rubinstein–Taybi syndrome, caused by mutations in the gene encoding the transcriptional coactivator CREB-binding protein.78 MeCP2 protein levels in brain are reduced in 79% of individuals with ASD compared with controls, and reduced levels correlate with increased methylation of the MECP2 promoter in males.79 Altered DNA methylation patterns at the circadian gene RORA and the oxytocin receptor gene OXTR have also been observed in blood from individuals with ASD but not from controls.80, 81

Controlling for genetic susceptibility is likely to be critical to understanding environmental interactions at the epigenetic interface. My laboratory recently used a mouse model of genetic and epigenetic susceptibility to ASD, the Mecp2^308 mutant mouse, and exposed the dams perinatally to human-relevant doses of the persistent organic pollutant polybrominated diphenyl ether (PBDE).82 PBDE exposure of the dams had long-lasting effects on learning and social behaviors of the offspring, primarily in females. Decreased sociability was associated with reduced global DNA methylation in female but not male offspring, and a compounding interaction between PBDE exposure and Mecp2 mutation was observed in a test of long-term spatial memory.

Furthermore, the reciprocally imprinted human disorders Angelman and Prader–Willi syndromes83, 84 have highlighted the importance of parental imprinting in brain development and shown that methylation imprinting errors can be sufficient to cause neurodevelopmental problems.85, 86 Interestingly, imprinting defects causing Angelman syndrome, while rare in the population, are significantly more frequent in offspring from pregnancies achieved through assisted reproductive technologies.87, 88, 89 In addition, the most frequent large CNV observed in ASD is duplication of 15q11-q13 (Dup15q syndrome),13 and my laboratory has recently observed an unexpected but significant association between brain levels of the persistent organic pollutant PCB-95 and Dup15q syndrome.90 Together, these observations suggest that genes and environmental factors are intertwined within epigenetic pathways in the risk for ASD.

Genome-wide analyses of the methylome of relevance to neuronal development and ASD

In the past, DNA methylation analyses were mostly limited to individual gene promoters and CpG islands (CGIs) and relied on time-consuming, low-throughput bisulfite sequencing-based methods. Base-resolution analysis of the whole human DNA methylome by recent next-generation sequencing efforts has revealed striking differences between the epigenomic landscapes of pluripotent and lineage-committed human cells.73, 91 In brief, these studies have confirmed that CGI promoters are strongly depleted of DNA methylation and that promoter methylation is inversely correlated with gene expression. However, the ‘shores’ of CGIs, defined as the regions up to 2 kb upstream and downstream of a CGI, carry the methylation marks most informative for discriminating tissue types.92 Interestingly, gene bodies and intergenic regions show high (>75%) methylation in many human tissues, including embryonic stem cells and cortex.73, 92, 93, 94 In all cell types examined, gene body methylation levels correlate positively with transcript levels, making the methylation-associated silencing of CGI promoters observed for X-inactivated or imprinted genes the exception rather than the rule.
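The island/shore distinction and base-resolution bisulfite data can be combined in a short sketch. Assuming a hypothetical input of per-CpG read counts (methylated and total reads, as produced by bisulfite sequencing, where a methylated C is read as C and an unmethylated C as T), it defines shores as the 2-kb flanks of a CGI and reports the weighted methylation level (methylated reads divided by total reads) for an island and its shores. The record types and the example values are illustrative, not measured data.

```python
from dataclasses import dataclass

SHORE_BP = 2_000  # shores: up to 2 kb on either side of a CGI

@dataclass(frozen=True)
class Region:
    """A hypothetical genomic interval; 0-based, half-open coordinates."""
    chrom: str
    start: int
    end: int

def cgi_shores(cgi: Region, width: int = SHORE_BP) -> tuple:
    """Return the (upstream, downstream) shores flanking a CpG island."""
    upstream = Region(cgi.chrom, max(0, cgi.start - width), cgi.start)
    downstream = Region(cgi.chrom, cgi.end, cgi.end + width)
    return upstream, downstream

def weighted_methylation(cpg_calls, region: Region):
    """Methylated reads / total reads over all CpGs in a region (None if uncovered).

    cpg_calls: iterable of (chrom, position, methylated_reads, total_reads).
    """
    meth = total = 0
    for chrom, pos, m, t in cpg_calls:
        if chrom == region.chrom and region.start <= pos < region.end:
            meth += m
            total += t
    return meth / total if total else None

# Illustrative input: an unmethylated island flanked by highly methylated shores.
island = Region("chr1", 10_000, 11_000)
calls = [("chr1", 10_500, 1, 20), ("chr1", 9_000, 15, 20), ("chr1", 12_000, 18, 20)]
up, down = cgi_shores(island)
for name, region in [("island", island), ("upstream shore", up), ("downstream shore", down)]:
    print(name, weighted_methylation(calls, region))
```

Weighting by read depth (rather than averaging per-CpG fractions) keeps poorly covered CpGs from dominating the estimate; published pipelines make this choice in different ways.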

Some of the best evidence for a positive association between DNA methylation and transcription comes from genome-wide methylome sequencing of multiple eukaryotic species. Gene body methylation is evolutionarily conserved in eukaryotes and shows a parabolic relationship with gene expression, with intermediately expressed genes showing the highest levels of gene body methylation.95 More complex multicellular organisms have higher levels of DNA methylation than simpler organisms within the same kingdom.95 In humans, gene body methylation levels and patterns vary considerably between cell types and developmental stages,92, 96, 97 with pluripotent embryonic stem cells showing drastically different epigenomic landscapes from primary fibroblasts. These studies suggest that increased methylation levels, as well as differential methylation patterns marking cell lineages and developmental stages, are features of recent evolution.

DNA methylation levels also appear to be critically important for the mammalian nervous system. In neurons, increased transcription of neurogenic genes during neuronal differentiation requires the DNA methyltransferase Dnmt3a and the deposition of DNA methylation at regions flanking promoters.98 This suggests that neurogenic transcription requires an unmethylated promoter flanked by highly methylated shores. Multiple studies indicate that proper deposition of methylation patterns is important for brain function, as DNMT1 and/or DNMT3A regulate synaptic function,99 memory formation100 and behavioral plasticity.101

Active demethylation is another process by which DNA methylation patterns can be diversified, and it is hypothesized to be important in activity-dependent transcriptional responses of neurons. Several studies indicate that demethylation may have a role in neurogenesis102 and neurotransmission,103 and implicate TET1-mediated hydroxylation of 5-methylcytosine in the brain.104 In addition, multiple lines of evidence suggest that activation-induced cytidine deaminase (AID or AICDA) is involved in demethylation. AID bound the silent, methylated OCT4 and NANOG pluripotency genes and was required for reprogramming of human cells toward pluripotency.105 AID also contributes to the global demethylation that occurs in mouse primordial germ cells,106 and overexpression of AID in cultured cells results in demethylation of 5-hydroxymethylcytosine.104 Interestingly, AID was first characterized for its role in immunoglobulin class-switch recombination, where it targets transcription-induced R-loops.107, 108 R-loops are RNA–DNA hybrid structures in which a nascent RNA molecule with high G content (G-skew, found frequently at CGIs) pairs with its template DNA strand and displaces the other strand, because the resulting RNA–DNA hybrid is more stable than the original DNA duplex.109 R-loops have recently been shown to protect active CGI promoters genome-wide from de novo DNA methylation.110 Together, these findings suggest that active methylation and demethylation may regulate postnatal neuronal maturation in response to transcriptional changes from early-life exposures and experiences.
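G-skew is commonly quantified as GC skew, (G − C)/(G + C), computed in windows along the non-template strand, which has the same sequence as the nascent RNA; positive values downstream of a promoter indicate that the transcript will be G-rich and prone to R-loop formation. The sketch below is a minimal illustration on toy sequences; window and step sizes are arbitrary assumptions, and published analyses of R-loop-forming sequences apply additional criteria.

```python
from typing import List, Optional

def gc_skew(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """GC skew, (G - C) / (G + C), in windows along one DNA strand.

    Positive values mean the strand is G-rich relative to C.
    Windows containing no G or C are reported as 0.0.
    """
    step = step or window  # non-overlapping windows by default
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

# Toy non-template strand: G-rich first half, C-rich second half.
print(gc_skew("GGGGCCCC", window=4))
print(gc_skew("GGGGCCCC", window=4, step=2))
```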

Using bisulfite conversion followed by high-throughput sequencing (MethylC-seq), my laboratory recently discovered that human SH-SY5Y neuronal cells contain partially methylated domains (PMDs), similar to those previously described in IMR90 lung fibroblasts.111 We developed novel hidden Markov models to map the genomic locations of PMDs computationally in both cell types and showed that autosomal PMDs can exceed 9 Mb in length and cover 41% of the IMR90 genome and 19% of the SH-SY5Y genome. Genomic regions marked by cell line-specific PMDs contain genes expressed in a tissue-specific manner, with PMDs marking repressed transcription. Genes contained within neuronal highly methylated domains (N-HMDs, defined as regions that are PMDs in IMR90 but highly methylated domains (HMDs) in SH-SY5Y) were significantly enriched for neuronal differentiation functions. However, not all neuronally expressed genes were contained within N-HMDs. Instead, N-HMD genes were significantly enriched for specific subsets of neuronal gene functions, including cell adhesion, ion transport, cell–cell signaling, synaptic transmission, transmission of nerve impulses and neuron differentiation. Autism candidate genes were significantly enriched within PMDs, including CNTNAP2, GABRB3, MACROD2 and NRXN1. Interestingly, the largest PMD observed in SH-SY5Y cells marked a 10-Mb cluster of cadherin genes on chromosome 5p14 (including CDH9 and CDH10) with strong genetic association to ASD.24, 111, 112 Beyond cultured cell lines, we have recently performed MethylC-seq on five human tissues (cerebrum, cerebellum, kidney, NK cells and placenta), and placenta was the only tissue containing PMDs.113 These results suggest that PMDs are a developmentally dynamic feature of the human methylome, observable in placenta, a normal human tissue that is a byproduct of every birth.
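The general idea behind HMM-based domain calling can be illustrated with a deliberately simple two-state model. This is not the laboratory's published HMM: it assumes hypothetical Gaussian emission parameters for methylation levels averaged over fixed genomic windows (around 0.80 in HMDs and 0.45 in PMDs), a high probability of staying in the same state between adjacent windows, and Viterbi decoding of the single most likely state path. The fraction of windows labelled PMD is analogous to the genome-coverage figures quoted above.

```python
import math
from itertools import groupby

# Hypothetical emission parameters: (mean, SD) of window methylation level.
STATES = {"HMD": (0.80, 0.08), "PMD": (0.45, 0.10)}

def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi_domains(levels, states=STATES, p_stay=0.99):
    """Most likely HMD/PMD label per window under a two-state Gaussian HMM."""
    names = list(states)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (len(names) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # score[s]: log-probability of the best path ending in state s (uniform start)
    score = {s: -math.log(len(names)) + log_gauss(levels[0], *states[s]) for s in names}
    backptrs = []
    for x in levels[1:]:
        ptr, new_score = {}, {}
        for s in names:
            best = max(names, key=lambda r: score[r] + trans(r, s))
            ptr[s] = best
            new_score[s] = score[best] + trans(best, s) + log_gauss(x, *states[s])
        backptrs.append(ptr)
        score = new_score
    # Trace the best path backwards from the highest-scoring final state.
    state = max(names, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def domains(path, window_bp):
    """Collapse per-window labels into (label, start_bp, end_bp) segments."""
    segments, pos = [], 0
    for label, run in groupby(path):
        n = len(list(run))
        segments.append((label, pos, pos + n * window_bp))
        pos += n * window_bp
    return segments

levels = [0.80, 0.85, 0.82, 0.40, 0.45, 0.42, 0.38, 0.81, 0.84]  # toy input
path = viterbi_domains(levels)
print(domains(path, window_bp=10_000))
print(f"PMD fraction: {path.count('PMD') / len(path):.2f}")
```

The high `p_stay` value is what makes the model call long, contiguous domains rather than flipping state at every noisy window; in practice such parameters would be estimated from the data (for example by Baum-Welch training) rather than fixed by hand.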

Future directions of integrative epigenomic strategies

Our recent results suggest that large-scale methylation domain maps could help interpret and direct future investigations into the elusive etiology of autism. We hypothesize that the transition from PMD to HMD may be an important epigenetic event in neuronal maturation, and that in utero environmental factors affecting the methylome may alter the optimal expression of a network of synaptic genes relevant to ASD. A large number of known autism candidate genes lie in methylation domains that are detectable in placenta, including domains highly methylated in both neurons and placenta but not in lung fibroblasts (L-PMDs) and those highly methylated only in neurons (N-HMDs). These findings suggest that reduced methylation within these defined genomic regions in placenta may predict defects in neuronal methylation and transcription of these important synaptic genes.
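Claims that candidate genes are over-represented in particular methylation domains are commonly evaluated with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below uses hypothetical gene counts, not figures from the cited studies, and the published analyses may have used different tests or gene backgrounds.

```python
from math import comb

def hypergeom_enrichment(k, in_domains, candidates, background):
    """One-sided hypergeometric test for over-representation.

    k:          candidate genes that fall inside the domains
    in_domains: background genes inside the domains
    candidates: candidate genes in the background
    background: total number of genes considered
    Returns (p-value for observing >= k, expected count with no enrichment).
    """
    tail = sum(
        comb(in_domains, i) * comb(background - in_domains, candidates - i)
        for i in range(k, min(in_domains, candidates) + 1)
    )
    expected = candidates * in_domains / background
    return tail / comb(background, candidates), expected

# Hypothetical numbers for illustration only.
p, expected = hypergeom_enrichment(k=50, in_domains=3_000, candidates=200, background=20_000)
print(f"observed 50 vs expected {expected:.1f}; one-sided p = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow and rounding in the tail sum; the choice of background (all genes, or only genes expressed in the relevant tissue) can change the result substantially.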

Genome-wide data from the Encyclopedia of DNA Elements (ENCODE) project (http://www.nature.com/encode) are still at an early stage, providing epigenomic data on a few human cell lines. However, these publicly available data sets will continue to provide rich sources of information about epigenetic differences between tissue types that are expected to be useful in interpreting the vast numbers of individual genome sequences rapidly becoming available. To advance understanding of the causes of ASD, it will be important to integrate the various layers of genetic and epigenetic information with precise measurements of behavior and individual exposures. Human epidemiology studies may need to continue looking across generations to understand how human genomes and epigenomes may be becoming increasingly susceptible to ASD risk through a multitude of environmental factors.

Because the interface of genetics and environmental exposures in humans can be complex enough to overwhelm even the best-designed human studies, animal models in which genetic and environmental factors can be controlled and systematically tested for effects on social behavior will remain important for experimentally validating suspected causal associations. Cell culture systems for studying human neurons, such as patient-derived induced pluripotent stem cells,114 as well as next-generation genome-engineering technologies such as artificial zinc-finger proteins,115 transcription activator-like effectors116 and genome editing with CRISPR/Cas systems,117, 118 are also likely to be important approaches for studying the environmental–epigenetic interface in humans.

There is growing appreciation in the field of ASD research that a single discipline or methodology will not be sufficient to solve the complex etiologies of these increasingly common disorders. While genetic strategies have successfully identified specific genes and gene pathways in rare ASD cases, a more integrated approach that examines genetic, environmental and epigenetic events together will be essential for solving the enigma of ASD etiology and treatment.