Background
Autism spectrum disorders (ASDs) are a class of neurodevelopmental disorders characterized by persistent deficits in social communication/interaction and restricted, repetitive patterns of behaviors, interests, or activities (DSM-5) [1]. The genetic risk factors for ASD are heterogeneous, and up to 1000 genes are estimated to be involved [2]. Recent whole-exome and whole-genome sequencing projects, focused on the identification of rare de novo mutations in the probands of ASD family “trios” or “quads”, have discovered hundreds of genes with functionally disruptive mutations [3‐12], some of which also map to ASD-associated de novo copy number variations (CNVs) [13]. Many of the implicated genes encode proteins involved in synapse formation, transcriptional regulation, and chromatin remodeling [10, 14], indicating a likely convergence of functional pathways despite genetic heterogeneity.
Chromodomain helicase DNA binding protein 8 (CHD8) emerged as a top ASD-candidate gene from multiple exome-sequencing studies [6, 8], which together have analyzed thousands of ASD probands and, in some cases, their families [15, 16]. More importantly, disrupting chd8 during zebrafish development recapitulated multiple features of ASD, including the macrocephaly observed in some ASD cases carrying CHD8 mutations [15, 16]. In addition, macrocephaly is often found in ASD caused by disruption of other candidate genes [17].
CHD8 is a ubiquitously expressed member of the CHD family of ATP-dependent chromatin-remodeling factors, which play important roles in chromatin dynamics, transcription, and cell survival [18]. Previous studies [19‐21] have shown that the CHD8 protein interacts with β-catenin and negatively regulates Wnt signaling; both β-catenin and Wnt signaling play critical roles in normal brain development and in neuropsychiatric disorders [22], including ASD [14]. CHD8 also functionally interacts with p53 [23] and recruits MLL histone methyltransferase complexes to regulate cell cycle genes [24].
To elucidate the roles of CHD8 in neurodevelopment and address the contribution of its disruption to ASD pathogenesis, three groups have recently reported genome-wide CHD8 binding sites and transcriptomic changes upon shRNA-mediated knockdown of CHD8 expression in neural progenitor cells (NPCs) [25], neural stem cells (NSCs) [26], and SK-N-SH neuroblastoma cells [27]. The results showed that CHD8 binds to thousands of genes, preferentially at promoters, in NPCs, NSCs, and developing mammalian brains. Reduced expression of CHD8 potentially disrupted gene networks involved in neurodevelopment, which contained many ASD-risk genes [15, 25‐27]. The transcriptional targets of CHD8 have also been studied in non-neural systems, as CHD8 is also involved in cell cycle regulation, Wnt signaling, and several forms of cancer [19, 21, 24, 28, 29].
As ASD is a developmental disorder with symptoms emerging in early childhood and the causal CHD8 mutations in patients are germline, it is important to establish cell models that mimic the persistent loss of CHD8 function in developing embryos prior to and during neuronal differentiation and brain development. Therefore, we applied CRISPR/Cas9 technology [30] to generate CHD8+/− iPSCs by knocking out one copy of the gene in an iPSC line derived from a healthy male subject. We then differentiated both the wild-type (WT) and the CHD8+/− iPSCs into NPCs and subsequently into neurons, and performed comparative transcriptomic analysis (RNA-seq). Our approach has several advantages: precisely targeted changes at the DNA level, persistent reduction of CHD8 expression, no introduction of extra genetic material (e.g., a viral vector) into the cells, and greater flexibility in the types of differentiated cells that can be generated.
We found that heterozygous CHD8 knockout (KO) disrupted the expression of many genes involved in extracellular matrix formation, neuronal differentiation, and skeletal system development. Interestingly, CHD8-regulated genes were enriched with ASD-risk genes, schizophrenia-risk genes, and genes implicated in regulating head size or brain volume. Furthermore, we found that CHD8-regulated genes significantly overlapped with the downstream targets of several critical genes (TCF4, EHMT1, SATB2, and NRXN1) that have been associated with ASD or other neuropsychiatric disorders. Taken together, our results not only shed light on the molecular roles of CHD8 in neurodevelopment, but also provide evidence of potential convergence of cellular pathways that could be disrupted by mutations of distinct ASD-risk genes.
Methods
Development of iPSCs from skin fibroblasts
We have been developing iPSCs from controls and from patients with the 22q11.2 deletion and a diagnosis of a psychotic disorder [31]. The study and consent forms were approved by the Institutional Review Board of Albert Einstein College of Medicine. Consent was obtained by a skilled member of the research team who had received prior human-subject training. One healthy male control subject (who did not carry the 22q11.2 deletion) was used in the current study. Exome sequencing was performed on DNA extracted from white blood cells of this subject to detect coding variants prior to generating the CHD8 KO. We used GATK [32] for variant calling and ANNOVAR [33] for variant annotation.
The iPSC line used in this study was generated from fibroblasts obtained from a skin biopsy performed by a board-certified dermatologist. The procedures for growing fibroblasts and iPSC reprogramming are detailed in Additional file 1. Pluripotency was confirmed by immunocytochemistry using antibodies (Ab) against Tra-1-60, Tra-1-81, SSEA3, and SSEA4, which are expressed in pluripotent stem cells. In addition, the capacity to differentiate into all three germ layers was established by in vitro assays, as previously described (see Additional file 1 for details) [34, 35].
Design of the CHD8 single guide RNA sequences
Single guide RNA (sgRNA) sequences targeting the region adjacent to the Ser62 codon of CHD8 were selected using the online CRISPR design tool [36] from the Zhang lab at the Broad Institute; the two selected sgRNAs were predicted to have a very low probability of off-target sites. The sgRNA sequences were then cloned into the pSpCas9(BB)-2A-Puro (PX459) vector (a gift from Dr. Feng Zhang, Addgene plasmid # 48139) [30].
Human iPSCs were cultured and fed daily in mTeSR1 (STEMCELL Technologies) on Matrigel (BD)-coated plates at 37 °C/5 % CO2/85 % in a humidified incubator. Cells were maintained in log-phase growth, and differentiated cells were removed manually. To improve cell survival during nucleofection, iPSCs were exposed to 10-μM ROCK inhibitor for ~4 h. The growth medium was then aspirated, and the cells were rinsed with DMEM/F12. iPSCs were dissociated into single cells using Accutase and harvested. Nucleofection was performed using the Amaxa-4D Nucleofector Basic Protocol for Human Stem Cells (Lonza) according to the manufacturer’s instructions. Briefly, 8 × 10^5 cells and 5 μg of the CRISPR/Cas9 plasmid carrying either sgRNA1 or sgRNA2 were nucleofected using the P3 Primary Cell 4D-Nucleofector X Kit L with program CA-137. Cells were resuspended in mTeSR1 + 10-μM ROCK inhibitor and plated in one well of a 6-well Matrigel-coated plate. The following day, and every day thereafter, cells were fed with fresh mTeSR1. On days 4–14, cells were exposed to 0.5 μg/ml puromycin for 6 h. Puromycin-resistant colonies were picked and expanded in mTeSR1 without further puromycin treatment.
Characterizing CHD8 knockout lines
“TA” cloning was used to identify the knockout alleles. A 479-bp PCR amplicon flanking the CRISPR/Cas9-targeted sites was generated using the primers 5’-CTGTAAGACAGGTTGGGCTG-3’ and 5’-CTTGTTTCTTGCCTCTATACTTGA-3’. The PCR product was purified and ligated into pCR™2.1 using a TA Cloning Kit (Life Technologies) following the manufacturer’s protocol. Recombinant plasmids were introduced into competent E. coli and selected on ampicillin. Plasmid DNA was extracted, and the insert was sequenced using one of the PCR primers.
Western blotting confirmed that the CHD8+/− lines expressed lower levels of CHD8 protein. Specifically, cell lysates from NPCs differentiated from WT and KO iPSCs were separated by SDS-PAGE, transferred to PVDF membranes, and blotted with an anti-CHD8 antibody (Bethyl, Cat #A301-224A). An anti-actin antibody (BD Biosciences, Cat # 612656) was used as a loading control.
Neuronal differentiation
Neurons were generated from iPSC-derived NPCs as described by Marchetto et al., with slight modifications [34, 37]. A detailed description of the protocol can be found in Additional file 1. In brief, the protocol yields a mixed population of glutamatergic and GABAergic neurons, from which RNA was extracted and sent for sequencing.
RNA-seq analysis
We obtained 101-bp paired-end RNA-seq reads from an Illumina HiSeq 2500. RNA-seq reads were aligned to the human genome (hg19) using TopHat (version 2.0.8) [38]. The number of RNA-seq fragments mapped to each gene was determined for genes in the GENCODE database (v18) [39]. Exonic/intronic/intergenic rates were calculated using CollectRnaSeqMetrics in Picard [40]. Cufflinks (version 2.2.1) [41] was used to generate gene-expression values as FPKMs (fragments per kilobase of exon per million fragments mapped). We restricted our analysis to 12,843 protein-coding genes with an average FPKM > 1 across all four samples. DESeq2 [42] was used to determine differentially expressed genes (DEGs) in NPCs and neurons; significant DEGs were defined at a false discovery rate (FDR) < 0.05. DAVID [43, 44] was used for Gene Ontology (GO) analysis, with the 12,843 expressed protein-coding genes as background. Ingenuity Pathway Analysis (IPA) [45] was used for canonical pathway analysis and disease association, with the Ingenuity Knowledge Base (genes only) as background. ToppGene [46] was used for human phenotype ontology analysis, with the whole genome as background. The RNA-seq data have been deposited in the Gene Expression Omnibus (GEO; accession # GSE71594).
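As a rough illustration of the expression filter described above, the sketch below computes FPKM from a fragment count, an exon length, and a library size, and keeps genes whose mean FPKM across samples exceeds 1. It assumes the plain textbook FPKM formula; Cufflinks itself estimates FPKM with a likelihood model over transcript isoforms and effective lengths, so its values will differ. Gene names and numbers are hypothetical.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments (textbook formula)."""
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def expressed_genes(fpkm_table: dict[str, list[float]], min_mean_fpkm: float = 1.0) -> list[str]:
    """Return genes whose average FPKM across all samples is strictly greater than the cutoff."""
    return sorted(
        gene for gene, values in fpkm_table.items()
        if sum(values) / len(values) > min_mean_fpkm
    )


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments, 2 kb of exon, 20 million mapped fragments.
    print(fpkm(500, 2_000, 20_000_000))  # 12.5
    # Hypothetical FPKMs for four samples.
    table = {"GENE_A": [2.0, 1.5, 0.8, 1.2], "GENE_B": [0.5, 0.9, 1.1, 0.7]}
    print(expressed_genes(table))  # ['GENE_A']
```

Averaging across all samples (rather than requiring FPKM > 1 in every sample) keeps genes that are expressed in only one cell type or genotype, which matters when the goal is to detect expression changes between conditions.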
To find neurodevelopmental genes specifically affected by CHD8+/−, we added an interaction term to the DESeq2 design (~celltype + genotype + celltype:genotype) to model the interaction between developmental stage (NPCs or neurons) and CHD8 status (WT or KO).
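To make the interaction design concrete, the sketch below writes out treatment-coded model-matrix rows for the formula above, assuming NPC and WT are the reference levels. DESeq2 derives this matrix itself from the design formula in R; this Python version is only illustrative, and the coefficient values are hypothetical log2-scale numbers. With this coding, the genotype coefficient is the KO effect in NPCs, and the interaction coefficient is how much the KO effect in neurons differs from the KO effect in NPCs, which is what a test of the interaction term evaluates.

```python
def design_row(celltype: str, genotype: str) -> dict[str, int]:
    """Treatment-coded model-matrix row for ~ celltype + genotype + celltype:genotype.

    Assumes NPC and WT are the reference levels (a hypothetical choice for illustration).
    """
    is_neuron = int(celltype == "neuron")
    is_ko = int(genotype == "KO")
    return {
        "intercept": 1,
        "celltype_neuron": is_neuron,
        "genotype_KO": is_ko,
        "celltype_neuron:genotype_KO": is_neuron * is_ko,  # nonzero only for KO neurons
    }


def linear_predictor(coefs: dict[str, float], row: dict[str, int]) -> float:
    """Expected log2 expression for one sample given fitted coefficients."""
    return sum(coefs[name] * value for name, value in row.items())


def ko_effect(coefs: dict[str, float], celltype: str) -> float:
    """log2 fold change of KO vs WT within one cell type."""
    ko = linear_predictor(coefs, design_row(celltype, "KO"))
    wt = linear_predictor(coefs, design_row(celltype, "WT"))
    return ko - wt


if __name__ == "__main__":
    coefs = {"intercept": 5.0, "celltype_neuron": 1.0, "genotype_KO": 0.5,
             "celltype_neuron:genotype_KO": -0.8}  # hypothetical values
    print(ko_effect(coefs, "NPC"))     # 0.5
    print(ko_effect(coefs, "neuron"))  # about -0.3
    # The interaction coefficient equals the difference between the two KO effects.
    print(ko_effect(coefs, "neuron") - ko_effect(coefs, "NPC"))  # about -0.8
```

In this toy example the KO raises expression in NPCs but lowers it in neurons; a gene like this would be flagged by the interaction term even if its average KO effect across both stages were close to zero.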
Validating targeted deletions and assessing off-targets using RNA-seq reads
First, we examined whether CHD8 was targeted and edited precisely according to our CRISPR sgRNA design. A 2-bp deletion at chr14:21899785 (hg19) and a 10-bp deletion at chr14:21899722 (hg19) were identified in a proportion of the RNA-seq reads that mapped to the regions targeted by the two CRISPR sgRNAs in CHD8+/− samples (KO1 and KO2, respectively). This was confirmed by DNA sequencing. Neither indel was found in any of the WT samples. We also called short indels (supported by at least five RNA-seq reads) from the RNA-seq reads using samtools [47], but did not detect any additional indels that were present in CHD8+/− samples but absent from the WT controls.
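The off-target check above amounts to a read-support filter followed by a set difference. The sketch below assumes indel calls have already been summarized per sample as a mapping from a hypothetical indel identifier (for example "chrN:100:del2") to its number of supporting reads; the parsing of samtools output and the identifier format are not from the original, and applying the five-read threshold to WT calls as well is an assumption.

```python
def called_indels(sample: dict[str, int], min_reads: int = 5) -> set[str]:
    """Indels in one sample supported by at least `min_reads` RNA-seq reads."""
    return {indel for indel, reads in sample.items() if reads >= min_reads}


def ko_specific_indels(ko_samples: list[dict[str, int]],
                       wt_samples: list[dict[str, int]],
                       min_reads: int = 5) -> set[str]:
    """Indels called in at least one KO sample but in none of the WT samples."""
    in_ko = set().union(*(called_indels(s, min_reads) for s in ko_samples))
    in_wt = set().union(*(called_indels(s, min_reads) for s in wt_samples))
    return in_ko - in_wt


if __name__ == "__main__":
    # Hypothetical per-sample calls: identifier -> supporting read count.
    ko = [{"chrN:100:del2": 12, "chrN:500:ins1": 3, "chrN:900:del1": 8},
          {"chrN:100:del2": 9}]
    wt = [{"chrN:900:del1": 7}, {"chrN:500:ins1": 6}]
    print(ko_specific_indels(ko, wt))  # {'chrN:100:del2'}
```

Only the intended edit survives here: "chrN:500:ins1" lacks read support in KO, and "chrN:900:del1" is also present in WT, so neither would count as a candidate off-target event.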
Quantitative real-time PCR (qPCR)
qPCR was carried out on reverse-transcribed RNA (cDNA) using the 2^−ΔΔCt method, as previously described [48, 49]. A detailed description and the primers used for this analysis can be found in Additional file 1.
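For readers unfamiliar with the 2^−ΔΔCt calculation, here is a minimal sketch. It assumes roughly 100 % amplification efficiency for both the target and the reference gene (the core assumption of the method); the Ct values are hypothetical, and the reference gene and primers actually used are described in Additional file 1.

```python
def fold_change_ddct(ct_target_test: float, ct_ref_test: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ΔΔCt method (assumes ~100 % amplification efficiency)."""
    delta_ct_test = ct_target_test - ct_ref_test  # normalize target to reference gene, test sample
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize target to reference gene, control sample
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies one cycle earlier in KO than in WT,
    # with identical reference-gene Ct values, i.e. a two-fold increase.
    print(fold_change_ddct(24.0, 18.0, 25.0, 18.0))  # 2.0
```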
Definition of CHD8-binding genes
ChIP-seq peaks of CHD8-binding sites in NPCs were obtained from Sugathan et al. [25]. Only peaks replicated with all three antibodies were used. Genes with at least one peak in the region extending from 2 kb upstream of the transcription start site to the transcription terminus were defined as CHD8-binding genes.
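The gene-level definition above can be sketched as a strand-aware interval-overlap test. This assumes that "a peak in the region" means any overlap between the peak interval and the window [TSS − 2 kb, transcription end] (closed coordinates), and that the input peaks are already restricted to those replicated with all three antibodies; the gene records and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int     # transcription start site
    tes: int     # transcription end (terminus)
    strand: str  # "+" or "-"


def gene_window(gene: Gene, upstream: int = 2_000) -> tuple[int, int]:
    """Window from `upstream` bp before the TSS to the transcription end, respecting strand."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tes
    # On the minus strand the TSS is the larger coordinate, so "upstream" means higher positions.
    return gene.tes, gene.tss + upstream


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def chd8_binding_genes(genes: list[Gene], peaks: list[tuple[str, int, int]],
                       upstream: int = 2_000) -> set[str]:
    """Names of genes whose window overlaps at least one (chrom, start, end) peak."""
    bound = set()
    for gene in genes:
        w_start, w_end = gene_window(gene, upstream)
        if any(chrom == gene.chrom and overlaps(w_start, w_end, start, end)
               for chrom, start, end in peaks):
            bound.add(gene.name)
    return bound


if __name__ == "__main__":
    genes = [Gene("GENE_PLUS", "chr1", 10_000, 20_000, "+"),
             Gene("GENE_MINUS", "chr1", 50_000, 40_000, "-"),
             Gene("GENE_FAR", "chr2", 10_000, 12_000, "+")]
    peaks = [("chr1", 8_500, 8_700), ("chr1", 51_500, 51_600), ("chr2", 5_000, 5_500)]
    print(sorted(chd8_binding_genes(genes, peaks)))  # ['GENE_MINUS', 'GENE_PLUS']
```

For genome-scale peak sets an interval tree or a sorted sweep (as in tools like BEDTools) would replace the nested loop, but the inclusion rule stays the same.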
Interaction network analysis
DEGs in NPCs with CHD8 binding were imported into the STRING database v10 [50] to construct protein-protein interaction networks. We retained any interaction (i.e., edge) from experiments and databases that had at least medium confidence (≥0.4).
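A minimal sketch of the edge filter, assuming each STRING interaction has already been exported as (protein A, protein B, score) with a score in [0, 1] computed from the experiments and databases evidence channels only. The tuple format is an assumption, not STRING's file layout; some STRING downloads report scores on a 0 to 1000 scale, where the same cutoff is 400. Protein names are hypothetical.

```python
from collections import Counter


def filter_interactions(edges: list[tuple[str, str, float]],
                        min_score: float = 0.4) -> list[tuple[str, str]]:
    """Keep undirected edges with score >= min_score, dropping self-loops and duplicate pairs."""
    kept = set()
    for protein_a, protein_b, score in edges:
        if protein_a != protein_b and score >= min_score:
            kept.add(tuple(sorted((protein_a, protein_b))))
    return sorted(kept)


def node_degree(edges: list[tuple[str, str]]) -> Counter:
    """Number of retained interaction partners per protein."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


if __name__ == "__main__":
    raw = [("P1", "P2", 0.9), ("P2", "P1", 0.9),   # same pair listed twice
           ("P2", "P3", 0.35),                     # below medium confidence
           ("P3", "P4", 0.4), ("P4", "P4", 0.99)]  # cutoff is inclusive; self-loop dropped
    network = filter_interactions(raw)
    print(network)  # [('P1', 'P2'), ('P3', 'P4')]
    print(node_degree(network))
```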
To detect converged networks of multiple ASD-risk genes, we first collected DEG lists from our CHD8+/− NPCs and from TCF4, EHMT1, MBD5, and SATB2 knockdowns [51, 52], and then imported genes shared by at least two lists into the STRING database to construct gene networks. GO enrichment was calculated using the “Enrichment” function in STRING. Networks were visualized using Cytoscape [53]. The same approach was also applied to DEGs from CHD8 KO, ZNF804A KD, and NRXN1 KD neurons.
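Selecting "genes shared by at least two lists" is a simple counting step; a sketch follows. The list names mirror the studies above, but the gene identifiers (G1 to G6) are hypothetical placeholders, not results. Each list is deduplicated first so that a gene repeated within one list is counted once.

```python
from collections import Counter


def genes_shared_by(deg_lists: dict[str, list[str]], min_lists: int = 2) -> list[str]:
    """Genes that appear in at least `min_lists` of the DEG lists."""
    counts = Counter(gene for genes in deg_lists.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_lists)


if __name__ == "__main__":
    deg_lists = {  # hypothetical gene IDs
        "CHD8_KO": ["G1", "G2", "G3"],
        "TCF4_KD": ["G2", "G4"],
        "EHMT1_KD": ["G3", "G4", "G5"],
        "MBD5_KD": ["G6"],
        "SATB2_KD": ["G1"],
    }
    print(genes_shared_by(deg_lists))               # ['G1', 'G2', 'G3', 'G4']
    print(genes_shared_by(deg_lists, min_lists=3))  # []
```

The same function, applied to lists of enriched GO term IDs with `min_lists=3`, would implement the "shared by ≥3 DEG lists" rule used later for common GO terms.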
Upstream regulator prediction
IPA was used to predict upstream regulators for the 841 DEGs without CHD8 binding in NPCs. In this analysis, the IPA p value, computed with Fisher's exact test, measures the significance of the overlap between the query genes and predefined sets of genes regulated by a specific regulator. Finally, we selected upstream regulators with p < 0.05 that (a) regulated at least five non-CHD8-bound DEGs and (b) were themselves in our NPC DEG list.
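The three selection criteria can be written as a single filter. This sketch assumes the IPA output has been reduced to a mapping from regulator name to (p value, set of non-CHD8-bound DEG targets); that structure and all regulator and gene names are hypothetical, not IPA's export format.

```python
def select_regulators(ipa_results: dict[str, tuple[float, set[str]]],
                      npc_degs: set[str],
                      alpha: float = 0.05,
                      min_targets: int = 5) -> list[str]:
    """Regulators with p < alpha, >= min_targets non-CHD8-bound DEG targets, and themselves DEGs."""
    return sorted(
        regulator for regulator, (p_value, targets) in ipa_results.items()
        if p_value < alpha and len(targets) >= min_targets and regulator in npc_degs
    )


if __name__ == "__main__":
    five = {"g1", "g2", "g3", "g4", "g5"}
    ipa = {  # hypothetical regulators
        "REG_A": (0.001, five),          # passes all three criteria
        "REG_B": (0.01, {"g1", "g2"}),   # too few targets
        "REG_C": (0.2, five | {"g6"}),   # not significant
        "REG_D": (0.003, five | {"g6"}), # not itself a DEG
    }
    print(select_regulators(ipa, npc_degs={"REG_A", "REG_B", "REG_C"}))  # ['REG_A']
```

Requiring the regulator to be a DEG itself links the prediction back to the observed data: it favors regulators whose own expression changed in CHD8+/− NPCs and could therefore plausibly relay indirect effects to genes without CHD8 binding.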
ASD/schizophrenia-risk gene sets
The first ASD gene set was obtained from the SFARI Gene scoring module [54], using genes scored from high confidence to minimal evidence, together with syndromic genes. The second ASD gene set was from the AutismKB [55] core dataset, which includes syndromic and non-syndromic autism-related genes designated as high confidence. High-confidence and probable ASD genes from Willsey et al. [56] were used as the third set (“Willsey_ASD”). Genes predicted from whole-exome sequencing and a co-expression network [57] were used as another set (“Liu_ASD”). The other two lists were derived from large-scale whole-exome sequencing: one (“Iossifov_ASD”) focused on de novo mutations [11], and the other (“DeRubeis_ASD”) combined de novo and inherited mutations to develop a high-confidence list (FDR < 0.1) [10]. Two schizophrenia gene lists were from the SZgene database [58] and a recent GWAS report [59].
Identification of common GO terms for DEG lists from different CHD8 studies
DEGs were determined from the data in four previous studies using the following criteria. For the study by Cotney et al. [26], we selected genes with logFC > 0.1 and log counts per million (logCPM) between 2 and 10 to meet the Poisson assumption, as described by the original authors. However, we repeated the differential expression analysis with a less stringent FDR cutoff, using the Benjamini-Hochberg method instead of Bonferroni to adjust p values. For the study by Sugathan et al. [25], we used ComBat in the sva package [60] to adjust for batch effects, followed by differential expression analysis with DESeq2 [42]. DEGs were selected at p value < 0.05; 96 % of the DEGs in Sugathan’s list were in our reanalysis list. For Wilkinson et al.’s study [27], we used the DEG list provided by the authors.
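To show why switching from Bonferroni to Benjamini-Hochberg loosens the cutoff, here is a small sketch of both adjustments applied to hypothetical p values. It is an illustrative reimplementation, not the code used in the reanalysis; in R the same adjustments are available through `p.adjust` with methods "bonferroni" and "BH".

```python
def bonferroni(pvals: list[float]) -> list[float]:
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Step-up Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down to enforce monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.001, 0.012, 0.02, 0.03, 0.6]  # hypothetical raw p values
    bh = benjamini_hochberg(p)
    bonf = bonferroni(p)
    print(sum(q < 0.05 for q in bh), sum(q < 0.05 for q in bonf))  # 4 1
```

Bonferroni controls the chance of any false positive, so every p value is scaled by the full number of tests; Benjamini-Hochberg controls the expected proportion of false positives among the calls, scaling each p value by m divided by its rank, which retains more genes at the same 0.05 threshold.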
Enriched GO terms for each of the five DEG lists were first determined by ClueGO [61] (p value < 0.05). GO terms shared by ≥3 DEG lists were then considered common and used for subsequent analysis of functional overlap among the five CHD8+/− and knockdown studies. The relationships among the selected GO terms were based on their shared genes, measured using kappa statistics [62]. Two GO terms were connected by an edge if they had a kappa score > 0.4. ClueGO relies on term similarity to define functional groups of multiple terms. In our analysis, we set the initial group size to 5 (default, 2) and the percentage for group merge to 80 % (default, 50 %) to obtain a less redundant summary of functional clusters of common GO terms. Because a GO term can be included in several functional groups, we assigned each term only to the functional group in which it had the most significant group p value, i.e., the group whose gene composition was most similar to that of the term. Terms in the same group were then drawn as a closed circle. In addition to the edges connecting terms to show their relationships, we also added edges between terms and studies to reveal the enrichment of specific GO terms among individual DEG lists.
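The kappa score used to connect GO terms can be sketched as Cohen's kappa between two binary gene-membership vectors. This follows the DAVID-style kappa that ClueGO builds on, but the choice of gene universe here is an assumption (a hypothetical set of ten genes), and ClueGO's exact universe is not specified in the text.

```python
def term_kappa(genes_a: set[str], genes_b: set[str], universe: set[str]) -> float:
    """Cohen's kappa for agreement between two GO terms' gene memberships over `universe`."""
    a = genes_a & universe
    b = genes_b & universe
    n = len(universe)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n  # fraction of genes on which the two terms agree
    p_a = len(a) / n
    p_b = len(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:  # both terms contain all genes, or both contain none
        return 1.0
    return (observed - expected) / (1 - expected)


def connect_terms(genes_a: set[str], genes_b: set[str], universe: set[str],
                  threshold: float = 0.4) -> bool:
    """Draw an edge between two terms when their kappa exceeds the threshold."""
    return term_kappa(genes_a, genes_b, universe) > threshold


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(1, 11)}  # hypothetical 10-gene universe
    term_1 = {"g1", "g2", "g3", "g4"}
    term_2 = {"g1", "g2", "g3", "g5"}
    term_3 = {"g6", "g7"}
    print(round(term_kappa(term_1, term_2, universe), 3))  # 0.583
    print(connect_terms(term_1, term_2, universe), connect_terms(term_1, term_3, universe))  # True False
```

Kappa is preferred over a raw overlap count because it discounts agreement expected by chance: two large terms share many genes simply by being large, and kappa corrects for that.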
Statistics
To test whether DEGs significantly overlapped with (or were enriched for) a specific gene set, the 12,843 expressed genes in our samples were used as the background (all genes) for Fisher's exact test. Statistical tests were conducted in R [63], and multiple-testing correction was applied unless specified otherwise.
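The enrichment test above is a one-sided Fisher's exact test, which for a 2 × 2 table equals the upper tail of the hypergeometric distribution. The sketch below computes it exactly with integer arithmetic; it is an illustrative reimplementation (the analyses were run in R), it assumes the gene set has already been restricted to genes in the background, and all counts are hypothetical. In Python, `scipy.stats.fisher_exact` with `alternative="greater"` gives the same one-sided p value for the corresponding 2 × 2 table.

```python
from math import comb


def fisher_enrichment_p(overlap: int, n_deg: int, n_set: int, n_background: int) -> float:
    """One-sided Fisher exact (hypergeometric upper-tail) p value for DEG/gene-set overlap.

    overlap: DEGs that are also in the gene set
    n_deg: number of DEGs
    n_set: gene-set members within the background
    n_background: all expressed genes used as background
    """
    max_overlap = min(n_deg, n_set)
    total = comb(n_background, n_deg)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_deg - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total  # exact integer arithmetic, one final float division


if __name__ == "__main__":
    # Toy example: 20 background genes, 5 DEGs, 4 set members, 3 shared.
    print(fisher_enrichment_p(3, 5, 4, 20))  # 496/15504, about 0.032
    # Hypothetical counts at the scale of this study's 12,843-gene background.
    print(fisher_enrichment_p(40, 1000, 300, 12_843))
```

Using the expressed genes, rather than the whole genome, as background matters: genes that are never expressed in NPCs or neurons cannot be called DEGs, so including them would inflate apparent enrichment.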
Discussion
To better understand the effect of genetic disruption of CHD8 in ASD subjects, we applied CRISPR/Cas9 technology to knock out one copy of CHD8 in a control iPSC line and studied its effect on gene expression during early neurodevelopment. In comparison to previous studies [25‐27] that used a gene knockdown approach to reduce CHD8 expression, our approach generated heterozygous disruptions that better mimic the germline mutations in ASD patients and allow the study of long-term effects of CHD8 disruption on neurogenesis in vitro. In addition, CHD8+/− iPSCs provide a truly renewable resource for investigators, as opposed to NPCs, which have a finite replicative capacity. Furthermore, iPSCs can differentiate into any cell type, including cerebral organoids [77], whereas NPCs are restricted in their differentiation potential.
Perhaps one of the most important findings emerging from our transcriptomic analysis is that several genes known to be related to head size or brain volume, identified either through human phenotype ontology analysis [46] or through GWAS [67‐69], displayed significant changes in expression in CHD8+/− neurons. Studies examining CHD8 function in zebrafish during embryonic development revealed that the macrocephaly phenotype observed in ASD probands with CHD8 mutations is likely caused by disturbed neuronal proliferation at early developmental stages [15, 25]. By uncovering genes such as HMGA2 and FAT3, which have been associated with head size, our findings provide new molecular insights that may eventually link CHD8 mutation to abnormal neuronal proliferation and macrocephaly. This is not the first association between macrocephaly and ASD to be reported in a genetically defined subgroup: mutations in PTEN are also associated with severe macrocephaly and ASD [78]. Although PTEN itself was not differentially expressed in our CHD8+/− samples, FOXO1 and FOXO3, two critical transcription factors in PTEN signaling, were. Interestingly, IPA analysis of the differentially expressed genes reported an enrichment of genes in PTEN signaling, suggesting that there may be a link between the PTEN and CHD8 pathways and that this link may underlie the observed macrocephaly and ASD. In this regard, PTEN was differentially expressed upon CHD8 knockdown in NPCs in a previous study [25]. Our DEG lists also included additional genes previously suggested to be associated with hippocampal volume, such as BDNF, DISC1, and NRG1.
Dysregulated genes in CHD8+/− cells exhibited significant overlap with previously defined ASD-risk genes, including high-confidence candidates such as ANK2, SCN2A, and SUV420H1 [54] that also showed significant differential expression (Table 2). We further demonstrated that CHD8-regulated genes significantly overlapped with the downstream targets of other ASD-risk genes such as TCF4 and NRXN1, providing transcriptomic evidence that ASD-risk genes have overlapping functions and converge on downstream regulatory pathways. This is particularly important in a genetically heterogeneous condition such as ASD, in which deleterious mutations in any individual candidate gene may account for only 1–2 % of ASD cases [79].
TCF4 is associated with Pitt-Hopkins syndrome (MIM: 610954), which is defined by severe psychomotor delay, epilepsy, daily bouts of diurnal hyperventilation, mild postnatal growth retardation, postnatal microcephaly, and distinctive facial features [80]. It is also a top schizophrenia candidate gene. In NPCs and NSCs, CHD8 binds to the TCF4 gene body in a region that is also enriched for H3K27ac [25, 26], a histone modification associated with active enhancers. TCF4 is significantly upregulated in both CHD8+/− NPCs and neurons. Moreover, TCF4-regulated genes significantly overlapped with CHD8-regulated genes. Taken together, these results suggest a direct connection between the TCF4 and CHD8 regulatory networks. The genes common to the two networks, especially those regulated in opposite directions by CHD8 and TCF4, such as HMGA2, which was upregulated in CHD8+/− cells (Table 1) and downregulated upon TCF4 knockdown [81], are strong candidates for regulating brain size during development.
It was intriguing to find that DEG lists from different CHD8+/− or knockdown studies overlapped only to a limited extent. A recent study comparing knockout and knockdown of egfl7 in zebrafish proposed that compensatory networks are activated to buffer against deleterious mutations in knockouts, a response absent in knockdowns [82], providing a potential explanation for the limited overlap among CHD8 KO and KD findings. However, we found that, at the level of functions and pathways, genes involved in similar functions were affected by reduced CHD8 expression in different contexts. It is conceivable that a limited number of upstream regulators are directly regulated by CHD8 and that the subsequent response of the downstream targets depends mostly on genetic background, cell culture conditions, and other experimental factors. In this regard, five (GDPD4, VPS13B, KMT2C, SETBP1, and CLTCL1) of the ~1000 ASD-risk genes were predicted to be functionally disrupted by premature stop or frameshift variants in the coding exons of the subject used to prepare our WT iPSC line (Additional file 10: Table S9 and S10). However, none of these five genes was differentially expressed in the WT neurons compared with samples from other control iPSC lines derived from six unrelated subjects (data not shown). As neuronal differentiation is a complex process, affected by both environmental cues and intrinsic cellular signaling, our analysis suggests that it is important to study the effects of the same genes under different experimental conditions. Nevertheless, comparison of our results with transcriptomic data from ASD-derived organoids indicates that the effects of reduced CHD8 expression in our CHD8+/− samples are probably more consistent with gene regulation in ASD-developing brains, although the ASD-derived organoids were from patients with unknown genetic mutations.
As most of the functionally disruptive mutations uncovered in the CHD8 coding regions of ASD probands introduce premature stop codons or frameshifts [15, 16], we used CRISPR/Cas9 technology to make small deletions in the first coding exon of CHD8 in this study. Although the 2- or 10-bp deletion is predicted to cause a frameshift and no functional protein is expected from the mutant allele, RNA transcripts from the knocked-out CHD8 copy were observed in our RNA-seq data, indicating that nonsense-mediated mRNA decay of the mutated CHD8 transcripts, if it occurs, is incomplete (Fig. 1).
CHD8 encodes a multi-domain protein, and deleterious mutations in CHD8 have been found in almost every important functional domain [15, 16]. While our data indicate that CHD8 regulates multiple pathways related to neural development, different mutations in ASD individuals may impair distinct and specific aspects of CHD8 function. In the future, it will be valuable to carry out gene-expression profiling using ASD-specific iPSCs to determine whether our current findings are recapitulated in additional iPSC-derived NPCs and neurons. As genetic background likely plays an important role in modulating CHD8 function during brain development, it is important both to perform our current knockout analysis in additional control iPSC lines and to derive patient-specific lines from multiple ASD individuals with CHD8 mutations. In addition, it will be extremely valuable to apply CRISPR technology to correct the CHD8 mutations and to perform transcriptomic analysis and other molecular assays once the patient-specific lines are established.
Competing interests
The authors declare no competing financial interests and no non-financial conflicts of interest for any of the authors.
Authors’ contributions
HML and DZ conceived of the project. HML performed experimental data analysis. PW, ML, and DZ performed the bioinformatics data analysis. EP and AH prepared the iPS lines, neuronal differentiation, RNA-seq samples, and qPCR validation. WG designed the CRISPR guide sequences. ZZ and WG performed the Western blot analysis. PW, ML, HML, and DZ wrote the manuscript. All authors edited and approved the final manuscript.