Background
Autism spectrum disorders (ASDs) demarcate the extreme end of a continuum of behavioural difficulties [1], characterised by impairments of social interaction and communication as well as highly restricted interests and/or stereotyped repetitive behaviours [2]. The subthreshold end of this continuum is embodied by ASD-related but milder and non-psychopathological phenotypes, which are, like ASD, highly heritable (h² = 0.36 to 0.87 [3-9]) and highly persistent [10,11] throughout the course of development.
Twin studies have reported no difference in heritability estimates of autistic symptomatology between the extremes of the distribution and normal variation [7,8], suggesting that clinical ASD and autistic-like traits in the general population may be aetiologically linked. It is therefore possible that some variants influencing the expression of autistic traits might indeed represent underlying ASD quantitative trait loci (QTL). This assumption is supported by studies showing that common genetic variation at 5p14 [12] not only carries risk for ASD but is also associated with the expression of social communication spectrum phenotypes in the general population [13]. Candidate gene association studies have furthermore identified CYP11B1 and NTRK1 as possible candidate loci, which may contribute to both risk of autism and the expression of autistic traits [14]. Twin studies, however, have also suggested that there is heterogeneity among the three components of the autistic triad, and that social communication spectrum phenotypes, which are heritable traits [6,15], are potentially aetiologically distinct from other autistic behavioural domains [15,16].
While there are multiple efforts to investigate quantitative traits within autism samples both through linkage [17-20] and association designs [21], there is currently little known about the nature of genetic variants affecting autistic traits in the general population. The largest genome-wide effort to date has been conducted by Ronald and colleagues, using a DNA pooling approach in high- versus low-scoring individuals with respect to social and non-social autistic-like traits [22]. Although one SNP was replicated within an independent sample, the signal did not reach genome-wide significance. This might be related to some (expected) power loss because of inaccurate calls during the DNA pooling stage. Given the possibility of genetic links between the extreme and the subthreshold end of the autistic spectrum, however, a powerful genome-wide analysis of autistic traits analysed dimensionally in the general population may provide an opportunity to gain insights into the common genetic architecture of the autistic dimension. This is important, as common genetic variation identified by genome-wide association studies (GWAS) in ASD samples [12,23-27] has so far either not been replicated in more than one study [28] or has not reached genome-wide significance. Analyses of joint SNP effects suggested furthermore that the effect of common variation on risk for ASD is modest [24], highlighting the importance of study power, while other studies suggested that the lack of replication might be partially due to the underlying genetic heterogeneity of ASD, which in turn might be linked to different ASD subtypes [21]. In this context, it seems surprising that the effect of a common ASD GWAS signal at 5p14 [12] could be detected within a large population-based cohort investigating a continuum of broader ASD-related traits [13]. However, cohort designs offer considerable advantages that can assist in the discovery of common genetic variation: cohort samples are in general large and thus highly powerful study populations, they are robust towards the influence of rare mutations of large effect, and trait information can be uniformly assessed with validated instruments across an entire continuum, including both the subthreshold end and the affected extreme.
Our study aimed to identify common variation in social communication spectrum phenotypes in the general population using GWAS. Association signals were discovered within a large UK population-based birth cohort, the Avon Longitudinal Study of Parents and their Children (ALSPAC), for which the continuity of ASD-related traits has been demonstrated [29,30], and followed up in the Western Australian Pregnancy Cohort (RAINE) Study. Here we report support for single SNP association at 6p22.1 and 14q22.1 based on replication in independent samples.
Discussion
This genome-wide study represents a large quantitative analysis of social communication problems in the general population, analysing a total of 6,948 children of White European descent, and provides support for the implication of common variation in the genetic architecture of these traits. Two of our seven top single SNP signals, at 6p22.1 (rs9257616, meta-P = 2.5E-07) and at 14q22.1 (rs2352908, meta-P = 1.1E-06), were replicated within an independent sample of 11-year-old children with comparable measures from Western Australia, although they fell short of conventional levels of genome-wide significance. Overall, approximately a fifth (about 18%) of the variation in social communication difficulties was explained by joint additive genetic effects of common SNPs (MAF >1%), and our findings support a polygenic mode of inheritance.
Intriguingly, the observed GCTA heritability estimates for social communication traits in the general population are highly similar to recently reported GCTA heritability estimates in relatives of ASD probands [52], strengthening the molecular support for an underlying broader autism phenotype. Based on analyses of the Simons Simplex Collection and the Autism Genome Project samples (contrasting two population control samples), substantial additive genetic influences were identified in fathers (h² = 0.20 to 0.52), mothers (h² = 0.20 to 0.37) and unaffected siblings (h² = 0.16) [52]. The heritability estimates in our study are, however, smaller than previous twin study reports on autistic traits (h² = 0.36 to 0.87 [3-9]) as GCTA estimates reflect only the lower limit of the narrow-sense heritability and depend on the assumption that causal variation is sufficiently represented through the selected set of genotyped SNPs [49]. As such, GCTA estimates may account on average only for about half of the heritability observed within twin designs [53].
The strongest replicated single SNP signal was identified within the olfactory receptor (OR) gene cluster at 6p22.1, which is part of the broader major histocompatibility complex (MHC) region. On a larger scale, this genomic area has been previously related to autistic symptoms through association and linkage of the HLA-A2 class I allele with ASD [54] (approximately 768 kb downstream of the signal). The extensive LD across the MHC region, however, hampers the evaluation of a single locus candidacy. Both regional gene-based analysis in ALSPAC and the presence of functional non-coding variation pointed to TRIM27 (OMIM: 602165 [55]) as a candidate locus, which encodes a member of the tripartite motif (TRIM) family. TRIM27 is a DNA-binding protein associated with the nuclear matrix and interacts with methyl-CpG-binding domain (MBD) proteins [56], including MBD2, MBD3 and MBD4, and rare autism-specific protein-changing alterations have been observed in both MBD3 and MBD4 [57]. Social communication-related variation at 6p22.1 may, however, also involve one of the many OR loci or the uncharacterised ZNF311 gene, as protein-altering variation at these sites has been found in LD with rs9257616. Furthermore, the replicated signals at 14q22.1 might be of interest as this association was supported by secondary analyses, including hearing impairments in both ALSPAC and RAINE. It might be speculated that this may reflect the non-pathological equivalent of an increased frequency of auditory symptoms, such as auditory filtering [58,59] or impairment in hearing [60], which is often observed in individuals with ASD.
Partitioning of the genetic variance into chromosomes supported, furthermore, a polygenic model of inheritance, which may involve multiple loci of weak effect. This is consistent with the proposed role of common variation in ASD [24], which is likely to affect disease risk through a (log)-additive combination of multiple loci of small effect, and also with the implication of common variation in behavioural traits, such as cognitive ability [61]. It is also possible that these findings may extrapolate to other ages, with evidence from both ALSPAC [11,62] and RAINE [63] suggesting that pragmatic language skills are stable across development. However, much larger sample sizes might be required to detect loci of modest individual effects, and failure to replicate or reach conventional levels of genome-wide significance may not necessarily preclude the existence of genuine (but weak) loci. In light of this, the strongest association signals within ALSPAC, including variation at 15q22.2, although not replicated in the smaller RAINE sample, might also be revisited in future studies. In general, chromosome 15 harbours more common social communication-related genetic variation than expected from its size. More specifically, the signal at 15q22.2 was also in LD with variants at RNF111, a gene which has been recently implicated in Asperger disorder through association [25]. However, even if this common signal is genuinely implicated in the genetic architecture of social communication traits, the underlying genetic mechanisms are likely to be different at each end of the autistic continuum, as we found no evidence that the Asperger-related single SNP variation contributes to the association signal within ALSPAC (data not shown). In addition, our findings strengthened the evidence for the presence of an ASD QTL at 5p14. Besides the signal reported by Wang and colleagues [12], which has been previously related to the expression of social communication traits in ALSPAC [13], we also observed association with a second 5p14 signal, identified by Ma and colleagues [26]. Conditional analysis suggested that both SNPs refer to the same underlying causal variation, thus linking both loci to the recently proposed disease mechanism involving the transcription of non-coding RNA [64].
Common genetic effects are implicated within many quantitative traits through a polygenic mode of inheritance [61,65]. While genome-wide genetic association screens for anthropometric phenotypes, such as height, have been highly successful [65], genetic association studies involving complex behavioural traits have so far failed to robustly identify single SNP association signals [61,66]. Our discovery sample had sufficient power (>0.83; Genetic Power Calculator, http://pngu.mgh.harvard.edu/~purcell/gpc/) to detect genetic effects explaining as little as 0.7% of the phenotypic variance, assuming for simplicity a normally distributed phenotype and complete LD between marker and disease locus, in addition to a type I error of α = 5E-08. However, the true inherent power of our study might have been compromised as parent reports of social communication difficulties in children represent a far noisier and less reliable quantitative data source than comparable anthropometric phenotypes [65], making additional data cleaning and analysis steps indispensable. Within our study, we therefore selected a highly similar phenotype definition in both the discovery and the replication cohort. Problems in social communication skills as assessed by the newly defined measure are closely related to difficulties in conversational skills, such as turn taking, topic maintenance and discourse coherence. The newly defined measure had sufficient internal consistency, was highly correlated with the original CCC pragmatic composite scale [33] and was consistent with a previously reported association between social communication traits and common variation at an ASD risk locus at 5p14 [13]. Furthermore, for pragmatic abilities, parent-report has been shown to be a more accurate measurement than self-report, primarily because this method allows for the assessment of communication in a variety of contexts [67]. In addition, we selected a quasi-Poisson regression approach, which specifically modelled the skewed phenotypic data distribution without information loss through transformation. As such, these “power-boosting” measures may have increased the true underlying power of our study through a reduction in measurement noise. Indeed, within the specific context of GWAS of quantitative cognitive/behavioural traits our findings stand out as we identified evidence for social communication-related genetic variation through replication. However, within the general context of GWAS, the reported single SNP signals reached only suggestive levels of genome-wide association and, even under the “power-boosting” circumstances, many more samples might be required to identify common genetic association signals with high confidence. Furthermore, the limited number of items that comprised the SPC (n = 6) may have captured only selected aspects of social communication problems. Thus, further replication efforts may require similar item alignments in order to enhance the comparability of findings across studies.
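The power statement above can be illustrated with a minimal sketch. It is not the Genetic Power Calculator itself and is not expected to reproduce its output exactly. It assumes a 1-df association test whose non-centrality parameter is approximated by N·q²/(1 − q²), where q² is the proportion of phenotypic variance explained, together with a normally distributed phenotype and complete LD between marker and trait locus. The function name and the sample sizes in the usage loop are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def qtl_association_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power of a 1-df association test for a quantitative trait.

    n_samples and variance_explained are illustrative inputs (assumptions),
    not values taken from any specific cohort.
    """
    if not 0.0 <= variance_explained < 1.0:
        raise ValueError("variance_explained must lie in [0, 1)")
    # Regression-based approximation of the non-centrality parameter.
    ncp = n_samples * variance_explained / (1.0 - variance_explained)
    # Two-sided critical value on the z scale for the chosen type I error.
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    shift = sqrt(ncp)
    # A 1-df non-central chi-square variable is the square of N(shift, 1),
    # so the rejection probability is the sum of the two normal tails.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical sample sizes; effect size of 0.7% of phenotypic variance.
    for n in (2000, 4000, 6000, 8000):
        print(n, round(qtl_association_power(n, 0.007), 3))
```

Under these assumptions, power depends on the sample size and the variance explained only through the non-centrality parameter, which is why measurement noise that lowers the effective variance explained reduces power in the same way as a smaller sample.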
Acknowledgements
ALSPAC: The UK Medical Research Council and the Wellcome Trust (092731), and the University of Bristol provided core support for ALSPAC, and Autism Speaks (7132) provided support for the analysis of autistic-trait related data. DME is supported by a Medical Research Council New Investigator Award (MRC G0800582). JPK is funded by a Wellcome Trust four-year PhD studentship (WT083431MA). We are extremely grateful to all the families who took part in the ALSPAC study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. We thank the Sample Logistics and Genotyping Facilities at the Wellcome Trust Sanger Institute and also 23andMe for generating the ALSPAC genome-wide data.
RAINE: The authors would like to acknowledge the National Health and Medical Research Council (NHMRC) for their long-term contribution to funding the study over the last 20 years. Core Management of the RAINE study has been funded by the University of Western Australia (UWA), Curtin University, the UWA Faculty of Medicine, Dentistry and Health Sciences, the RAINE Medical Research Foundation, the Telethon Institute for Child Health Research, and the Women’s and Infants Research Foundation. DNA collection and genotyping were funded by the NHMRC (572613). AJOW is funded by Career Development Fellowships from the NHMRC (1004065). The authors are extremely grateful to all of the families who took part in this study and the whole RAINE Study team, which includes the Cohort Manager, Data Manager and data collection team.
This publication is the work of the authors and they will serve as guarantors for the contents of this paper.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
BSP, AJOW, WQA and NMW carried out the statistical analysis. BSP, DME, JPK, SMR, WLM and NMW were involved in the preparation of the genotype information. BSP, AJOW, CEP and GDS participated in the design of the study. BSP, AJOW, WQA, JTG, KW, NJT, DMW, JPK, JG, HH, CEP and GDS helped to draft the manuscript. All authors read and approved the final manuscript.