Background
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders characterized by impairments in social interaction and communication, and by restricted, repetitive, and stereotyped patterns of behavior [1]. The Centers for Disease Control and Prevention (CDC) reported in 2012 that approximately 1 in 88 children in the United States has a diagnosis of ASD [2]. Boys are five times more likely to have ASDs than girls. Although autism is typically thought of as a childhood disorder, some affected patients need care even after they reach adulthood. In fact, a recent study estimated that it can cost about $3.2 million to take care of an autistic individual over his or her lifetime [3]; autism therefore takes a great social and economic toll on society.
Understanding the causes of ASDs is critical for the development of better diagnostic and treatment strategies. ASDs are highly heritable and are indeed among the most heritable neurodevelopmental and neuropsychiatric disorders [4]. The genetic basis of ASDs has been pursued aggressively over the past few decades using cytogenetic studies, linkage analysis, and candidate gene association analysis [5]. With the development of high-throughput SNP genotyping technologies, genome-wide association studies (GWAS) [5‐9] and copy number variation (CNV) studies [10‐13] have been conducted over the past few years, revealing associations between specific candidate genes or loci and ASDs, but with moderate effect sizes.
Recent genetic studies demonstrated that next-generation sequencing (NGS) technology can be a powerful tool to identify the genetic basis of human diseases, especially Mendelian disorders [14‐16]. Unlike GWAS, which rely on proxy association of genotyped variants with unknown disease causal variants, NGS technology enables researchers to interrogate the complete human genome or exome for both common and rare variants. This improves the chance of finding disease causal variants, because each identified variant can potentially be functionally annotated. Several recent studies have examined the use of whole-exome sequencing (WES) to identify genetic risk factors for autism. In 2011, a trio-based study of autism performed WES on 60 individuals from families affected with sporadic ASDs and 20 unaffected control individuals, and suggested that de-novo sequence variants might contribute to the genetic etiology of ASDs [17]. A follow-up study from the same group sequenced 209 families and found that de-novo mutations fall within a highly interconnected β-catenin/chromatin remodeling protein network [18]. A companion paper using WES on 928 individuals, including 200 phenotypically discordant sibling pairs, reported that highly disruptive (nonsense and splice-site) de-novo mutations in brain-expressed genes are associated with ASDs and carry large effects [19]. Another study sequenced 175 trios by WES and nominated CHD8 and KATNAL2 as genuine autism risk factors, but also suggested a more limited role for de-novo mutations in ASD pathogenesis than previously reported [20]. Similarly, an exome sequencing study of 343 families did not find significantly more de-novo missense mutations in affected versus unaffected children, but did identify more gene-disrupting mutations in affected children and found that many of the disrupted genes are associated with the fragile X protein FMRP [21]. The rate of de-novo mutations has recently been linked to paternal age, in a study that sequenced 78 trios including 44 offspring with autism and 21 offspring with schizophrenia [22]. Another study sequenced balanced chromosomal translocations in patients with autism or related neurodevelopmental disorders, and revealed the disruption of 33 loci from four categories, reinforcing a polygenic risk model of autism [23]. These and many other recently published studies suggest that de-novo mutations may play important roles in susceptibility to autism.
However, current exome sequencing studies on autism may not be comprehensive or representative enough. Many of these studies focus only on simplex families or sequence one affected child from multiplex families. More importantly, the published studies do not specifically analyze inherited mutations, despite the fact that ASDs are highly heritable and that the vast majority of the mutations identified are inherited. We note one rare exception, published recently, which demonstrated that some familial ASDs were associated with biallelic mutations in known Mendelian disease genes [24]. Although it is clear that de-novo mutations explain a fraction of autism cases, it is likely that inherited mutations, in combination or in aggregation, explain a higher fraction. Therefore, we attempted to address this problem by performing a pilot sequencing analysis on patients from a multiplex family. We selected a large two-generation family, with parents and eight children, two of whom were diagnosed with autism. DNA samples were available for all subjects, except for one unaffected child. We generated whole-genome sequencing data on the two probands. Not knowing the exact disease model for autism in the family, we applied a series of procedures to remove variants that are less likely to be functionally important and to find candidate disease causal genes. Additionally, we genotyped all members of the pedigree (except for the one unaffected child) using Illumina HumanHap550 SNP arrays with approximately 550,000 SNP markers, to help further reduce the number of candidate genes. We have not yet proven whether these mutations singly or in combination contribute to the development of this disease in the two children in this family, and we discuss the potential implications of our study for the more general use of NGS in the study of autism and other neuropsychiatric disorders.
Discussion
In this study, we performed a pilot sequencing analysis aimed at identifying potential genetic risk factors for autism in a large pedigree, focusing on inherited mutations. We attempted multiple complementary analytical approaches, each of which identified one to a few candidate genes. We were not able to confirm specific disease-causing mutations with certainty, but we uncovered multiple rare mutations unique to the family, as well as several candidate genes that harbor suspected deleterious coding or non-coding mutations. Among them, based on prior literature, ANK3 is a highly plausible candidate gene that may increase the susceptibility to ASDs in this family. Given that autism is a complex neuropsychiatric disease, it is likely that multiple contributing variants in the family may increase susceptibility; therefore, even if a specific candidate gene does contribute to disease risk, we caution that a single candidate gene may not be entirely responsible (that is, necessary and sufficient) for the genetic risk of autism in this pedigree. Although our findings are restricted to this specific family, these new candidates can certainly be evaluated in future sequencing studies to establish their true relevance to autism susceptibility.
We applied a whole-genome sequencing strategy to reveal specific genetic mutations that may confer susceptibility to ASDs in a single family, and these results can also be compared to exome sequencing studies on schizophrenia, attention deficit/hyperactivity disorder (ADHD), and other neurodevelopmental disorders. A recent study suggested that the de-novo mutation rate might play a major role in schizophrenia, and a large excess of non-synonymous changes was identified by whole-exome sequencing of 53 sporadic cases, 22 unaffected controls, and their parents [71]. In another study on schizophrenia, four of the 15 de-novo mutations identified in eight probands were nonsense mutations [72]. In a previous small-scale exome sequencing study screening for ADHD genes in a multiplex pedigree, multiple rare coding variants were identified but were not prioritized based on bioinformatics predictions [42]. In comparison, our study specifically identified rare and family-specific variants rather than de-novo mutations.
We initially focused on inherited mutations that are likely to be recessive, an approach that shares some similarity with a very recent exome sequencing study on ASD families enriched for inherited causes due to consanguinity [24]. Other studies have focused on sporadic mutations in families where the parents have been characterized as most likely ‘unaffected’ by autism [17‐22], and several observations support the hypothesis that the genetic basis for ASDs in sporadic cases may differ from that in families with multiple affected individuals, with some of the former possibly more likely to result from de-novo mutation events rather than inherited variants. As an approach complementary to ongoing exome sequencing studies aiming to detect de-novo mutations in ASDs [17‐22], we specifically selected a multiplex family to test our ability to find inherited mutations that increase risk for ASDs.
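To make the recessive-model focus described above concrete, the following is a minimal sketch of how pedigree genotypes could be screened for variants consistent with simple recessive inheritance. It is an illustration under stated assumptions, not the pipeline used in this study: the function name `fits_recessive_model`, the sample labels, the variant identifiers, and the genotype coding (count of alternate alleles) are all hypothetical, and compound heterozygosity is not handled.

```python
from typing import Dict, List, Optional

# Genotypes are coded as the number of alternate alleles (0, 1, or 2).
# A sample that was not genotyped is simply absent from the dictionary.
Genotypes = Dict[str, int]


def fits_recessive_model(
    gt: Genotypes,
    father: str,
    mother: str,
    affected: List[str],
    unaffected: List[str],
) -> bool:
    """Return True if one variant is consistent with a simple recessive model."""
    # Both parents must be heterozygous carriers.
    if gt.get(father) != 1 or gt.get(mother) != 1:
        return False
    # Every affected child must be genotyped and homozygous alternate.
    if any(gt.get(sample) != 2 for sample in affected):
        return False
    # No genotyped unaffected child may be homozygous alternate;
    # missing genotypes among unaffected children are tolerated.
    for sample in unaffected:
        call: Optional[int] = gt.get(sample)
        if call == 2:
            return False
    return True


# Hypothetical toy pedigree: two affected and two unaffected children.
variants: Dict[str, Genotypes] = {
    "varA": {"father": 1, "mother": 1, "aff1": 2, "aff2": 2, "sib1": 1, "sib2": 0},
    "varB": {"father": 1, "mother": 0, "aff1": 1, "aff2": 1, "sib1": 0, "sib2": 1},
    "varC": {"father": 1, "mother": 1, "aff1": 2, "aff2": 2, "sib1": 2, "sib2": 1},
}

candidates = [
    name
    for name, gt in variants.items()
    if fits_recessive_model(gt, "father", "mother", ["aff1", "aff2"], ["sib1", "sib2"])
]
print(candidates)
```

In this toy example only `varA` passes: `varB` is not carried by both parents, and `varC` is also homozygous in an unaffected sibling. Under incomplete penetrance the last rule would be too strict, so it should be read as one possible filter rather than a definitive criterion.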
In addition to finding inherited mutations, one unique aspect of our study is the use of whole-genome sequence data, which enabled us to perform exploratory analysis on non-coding variants. Given the far larger number of candidate non-coding variants than coding variants, we had to apply highly stringent filtering criteria to focus on those that are most likely to be functionally relevant. These criteria include bioinformatics predictions based on evolutionary constraint [34], as well as experimental evidence from the ENCODE project [66]. As knowledge of non-coding variants and the bioinformatics approaches for analyzing them improve, we may be able to better interrogate the sequencing data to identify disease causal non-coding variants.
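The kind of stringent non-coding filter described above can be sketched as a conjunction of three conditions: high evolutionary constraint, overlap with an experimentally supported functional element, and rarity. The sketch below is illustrative only. The class `NoncodingVariant`, its field names, and the default thresholds are assumptions made for this example and are not the criteria or values used in this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NoncodingVariant:
    variant_id: str
    constraint_score: float       # hypothetical constraint score; higher means more constrained
    in_functional_element: bool   # hypothetical flag: overlaps an experimentally annotated element
    population_freq: float        # alternate allele frequency in a reference population


def prioritize_noncoding(
    variants: List[NoncodingVariant],
    min_constraint: float = 2.0,  # assumed threshold, for illustration only
    max_freq: float = 0.01,       # assumed threshold, for illustration only
) -> List[NoncodingVariant]:
    """Keep rare, constrained variants in functional elements, most constrained first."""
    kept = [
        v
        for v in variants
        if v.constraint_score >= min_constraint
        and v.in_functional_element
        and v.population_freq <= max_freq
    ]
    return sorted(kept, key=lambda v: v.constraint_score, reverse=True)


# Hypothetical toy input.
toy_variants = [
    NoncodingVariant("nc1", 4.2, True, 0.0),
    NoncodingVariant("nc2", 5.0, False, 0.0),   # no experimental support
    NoncodingVariant("nc3", 1.1, True, 0.0),    # weakly constrained
    NoncodingVariant("nc4", 3.0, True, 0.2),    # too common
    NoncodingVariant("nc5", 2.5, True, 0.001),
]

top = prioritize_noncoding(toy_variants)
print([v.variant_id for v in top])
```

Requiring all three conditions at once trades sensitivity for a shorter candidate list, which matches the exploratory aim stated above. Relaxing any single threshold would admit many more variants.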
We also need to emphasize that previous studies all used the Illumina platform, whereas our study used the Complete Genomics (CG) platform, which represents a different type of sequencing technology [28] and generates vastly different types of output files for downstream analysis. As the Illumina platform uses open data formats, a variety of academic and commercial tools have been developed to analyze data from the Illumina sequencers and improve variant calls; in comparison, the CG platform takes a proprietary, ‘black-box’ approach, so that researchers generally have to rely on variant calls and associated quality scores provided by CG. A recent study comprehensively compared these two platforms and found that 12% of the called variants are discordant between platforms, yet >60% of these discordant variants were indeed present in the genome based on Sanger validation [40]. Another recently published study compared data from the 1000 Genomes Project and Complete Genomics, and demonstrated that 19% of the single nucleotide variants (SNVs) reported for genomes common to both are unique to one dataset [73]. Therefore, current sequencing studies on neuropsychiatric diseases, including ours, may all suffer significantly from false-negative variant calls, and may miss a portion of disease causal variants. Combining data from orthogonal platforms may partially mitigate this problem, although at higher sequencing and analytical cost.
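As a rough illustration of how between-platform discordance of the kind quoted above can be summarized, the sketch below compares two call sets for the same genome using set operations. The function name `compare_call_sets`, the variant key (chromosome, position, reference allele, alternate allele), and the toy calls are assumptions for this example. The cited studies may define discordance differently, for instance by also considering genotype or zygosity.

```python
from typing import Dict, Set, Tuple, Union

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def compare_call_sets(calls_a: Set[Variant], calls_b: Set[Variant]) -> Dict[str, Union[int, float]]:
    """Summarize the overlap between two variant call sets for one genome."""
    shared = calls_a & calls_b
    only_a = calls_a - calls_b
    only_b = calls_b - calls_a
    union = calls_a | calls_b
    # Fraction of all called variants reported by only one platform.
    discordant = (len(only_a) + len(only_b)) / len(union) if union else 0.0
    return {
        "shared": len(shared),
        "only_a": len(only_a),
        "only_b": len(only_b),
        "discordant_fraction": discordant,
    }


# Hypothetical toy call sets from two platforms.
platform_a: Set[Variant] = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr3", 400, "T", "C"),
}
platform_b: Set[Variant] = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr4", 500, "A", "T"),
}

summary = compare_call_sets(platform_a, platform_b)
print(summary)
```

Variants unique to one platform are not necessarily false positives: as noted above, many discordant calls were confirmed by Sanger validation, so each platform's unique calls also point to likely false negatives on the other.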
In the current study, we first made the assumption that the ASD in the pedigree might be caused by just a handful of mutations with high penetrance, and under such a model we were able to identify a list of candidate genes. However, in practice, there may be a spectrum of diseases manifesting in each individual, with an as-yet-unknown balance of oligogenic and polygenic modes of inheritance. The approaches that we used were therefore somewhat ad hoc, and we were unable to generate statistical support for these candidate genes. Indeed, the appropriate statistical threshold to determine functional relevance, in the context of prior biological knowledge, is not well developed. In summary, our study represents one of the first examples demonstrating the feasibility of whole-genome sequencing of familial samples and of analyzing inherited mutations in ASDs. Ultimately, we believe that studies focusing on de-novo or inherited mutations can complement each other, and reveal a more comprehensive picture of susceptibility to ASDs, once sufficient sample sizes have been reached by the community.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
LS and KW carried out the data analysis, performed the literature survey, and drafted the manuscript. RG performed alignment and coverage analysis of the whole-genome sequence data, and revised the manuscript. XZ, MH, and GJL interpreted the results and helped with writing of the manuscript. FGO performed the Sanger sequencing validation. CH and CK performed quality control and sample handling. KW and HH conceived the study, guided data analysis, and revised the manuscript. All authors read and approved the final manuscript.