Background
Reading disability (RD or developmental dyslexia) and specific language impairment (SLI) are two of the most prevalent neurodevelopmental disorders, with a prevalence of ≈5–8 % among school-aged children (as reviewed in [
1‐
3]). Both RD and SLI are multifactorial disorders with moderate to high heritabilities and are characterized by high comorbidity, also with other neurodevelopmental disorders such as attention deficit hyperactivity disorder (ADHD) and speech sound disorders (SSD) [
2,
4,
5]. It is likely that RD and SLI, as well as the underlying reading- and language-related skills, share some genetic/neurobiological mechanisms [
6,
7].
Candidate genes that have been implicated in reading- and language-related traits include
DYX1C1 (15q21),
KIAA0319 and
DCDC2 (6p22),
MRPL19/GCFC2 (2p12),
ROBO1 (3p12),
CNTNAP2 (7q35),
CMIP and
ATP2C2 (16q23-24), and
FOXP2 (7q31) (see [
8‐
10] for reviews). More recently, genome-wide association scans (GWAS) using measures of both reading and language have reported suggestive associations in
ABCC13 (21q11.2),
DAZAP1 (19p13.3),
ZNF385D (3p24.3),
FLNC (7q32.1), and
RBFOX2 (as reviewed in [
10]). Several of these candidate genes are known to have roles in important processes in central nervous system (CNS) development, such as neuronal migration, axonal guidance, and neurite outgrowth [
8]. Moreover, a link with steroid hormone-related biology has also been hypothesized (see [
11] for further details).
In these genes, most of the variants that have been tentatively associated with reading and/or language traits are single-nucleotide polymorphisms (SNPs), although other types of genetic variants have also been implicated. These include balanced translocations disrupting
ROBO1 [
12] and
DYX1C1 [
13] in dyslexic cases and translocations and deletions affecting
FOXP2 in a severe form of speech and language disorder, involving childhood apraxia of speech (CAS) [
14].
The putative genetic associations reported so far can explain only a small proportion of heritable variance in reading and language skills [
1,
4,
10]. Part of the “missing heritability” may be represented by Copy Number Variants, defined as structural variations in the genome that result in regions larger than 1 kb showing a non-diploid copy number. Several copy number variants (CNVs) have been identified in severe neurodevelopmental and neuropsychiatric disorders, including schizophrenia (SCZ), Autism Spectrum Disorders (ASD), Intellectual Disability (ID) and Developmental Delay (DD) [
15,
16]. However, only a few studies have focused on reading and/or language impairments, which we review briefly here. In the majority of these studies, a perfect co-segregation between CNVs and poor reading/language performance has seldom been observed.
In a recent investigation on ten Indian dyslexic families, de novo CNVs were identified at several loci, namely
GABARAP (17p13.1),
NEGR1 (1p31.1),
ACCN1 (17q11.21),
DCDC5 (11p14.1), and the known SLI candidate gene
CNTNAP2 (7q35) [
17]. In the same families, candidate susceptibility CNVs affecting the
PCDH11X gene (Xq21.31-q21.32) were also identified [
18]. In a Dutch family, Poelmans and colleagues [
19] identified a heterozygous deletion in 21q22.3 co-segregating with RD and encompassing the genes
PCNT,
DIP2A,
S100B, and
PRMT2.
The largest study to date on CNVs in dyslexia involved 376 RD cases and 337 controls. Candidate susceptibility CNVs were found, overlapping
IMMP2L and
AUTS2 (7q11.22) [
20], a well-known ASD susceptibility locus.
With regard to language impairments, Wisznieski et al. [
21] identified a heterozygous deletion disrupting the gene
TM4SF20 (2q36), co-segregating with language delay in 15 Southeast Asian families. In a CNV scan of SLI families, a ~21-kb exonic microdeletion within
ZNF277 (7q31.1, adjacent to the
IMMP2L/DOCK4 locus) was found [
22]. A genome-wide CNV study comparing 127 independent SLI cases from the same dataset, together with first-degree relatives and unrelated controls, reported novel candidate de novo CNVs [
23], disrupting the genes
ACTR2 (2p14),
CSNK1A1 (5q33.1), and the regions typically involved in 22q11.2 and 8p23.1 duplication syndromes. A recent CNV screen in a longitudinal cohort of children with language-related difficulties or family risk of dyslexia revealed a de novo deletion in 15q13.1–13.3, observed in a child with persistent language impairment, normal reading skills, and no evidence of sensory or neurological problems [
24]. This large heterozygous deletion had been previously reported in cases of broader neurodevelopmental delay [
24].
Other CNVs have been associated with poor reading or language performance in the context of other comorbid disorders. A deletion disrupting both
DOCK4 and
IMMP2L (7q31.1) was found to co-segregate with poor reading performance in a family with two ASD cases, and another
DOCK4 exonic deletion co-segregated with RD in a distinct dyslexic family [
25]. Canonical 16p11.2 microdeletions—usually implicated in mild cognitive impairment, general developmental delay, speech and language problems, and ASD—have been associated with CAS by independent studies [
26,
27]. The same microdeletion was hypothesized to act jointly with a 6q22.31 duplication in a subject with CAS and pervasive developmental disorder [
28]. Prader-Willi/Angelman patients, presenting deletions/duplications of the 15q11.2 region, have been reported to frequently exhibit speech and language delays [
29]. Similarly, subjects with 2p15-p16.1 microdeletion syndrome typically show cognitive, linguistic, and psychiatric disabilities. In this region, a de novo deletion encompassing
BCL11A has been implicated in a mild phenotype characterized by apraxia, dysarthria, and expressive language delay [
30].
Recently, Stefansson and colleagues [
31] investigated the effect of several CNVs previously associated with SCZ or ASD (hereafter called “neuropsychiatric CNVs”) on different cognitive traits in a large Icelandic sample (
N~102,000). By comparing SCZ patients, neuropsychiatric CNV carriers, other CNV carriers, and general population controls, they found that neuropsychiatric CNV carriers performed at a level between SCZ patients and controls on several psychometric tests, suggesting an effect of these CNVs on general cognition. Some neuropsychiatric CNVs showed association with cognitive abilities: among these, 16p11.2del and 22q11.21dup were associated with category and letter fluency, while 15q11.2del was associated with a history of dyslexia and dyscalculia.
In the present study, we have further investigated the potential influence of CNVs on reading and language performance through a comprehensive set of analyses, including total genome-wide CNV burden testing and two complementary methods to screen the genome for individual CNVs that may affect these traits. We used a dataset that has been previously included in a SNP-based GWAS meta-analysis (GWASMA) of reading and language traits [
11], composed of children recruited for school history of RD or ADHD, and their unaffected siblings.
Current CNV research in psychiatric genetics often relies on case/control dichotomous classifications and seldom detects perfect co-segregation between CNVs and disease status. When heritable quantitative traits are available that are strongly correlated with a dichotomous definition of a disorder—as in the case of reading/language traits—analyzing the effect of putative CNVs directly on the quantitative trait provides an effective alternative to the analysis of co-segregation between CNVs and the disorder. The former analysis is aimed at detecting variants with reduced penetrance and variable expressivity on traits of interest, while the latter one is aimed at detecting variants with full penetrance and expressivity. We used both approaches in our study.
Discussion
Our research on potential effects of CNVs on reading and language is novel for two main reasons:
First, we investigated the effects of CNVs on a continuous index of reading and language performance, in datasets enriched for the lower tail of the population distribution. Although a similar approach was used in a recent study by Stefansson and colleagues [
31], who investigated the effect of candidate neuropsychiatric CNVs on cognitive traits in a large sample of the Icelandic population, their study analyzed a broad spectrum of cognitive abilities and included general population controls. It was not aimed at capturing shared variation derived from a detailed battery of reading and language measures in a selected population, as was our study.
Second, to detect effects of CNVs on continuous reading and language performance, we used two complementary approaches: one aimed at detecting copy number-dependent effects in a “dosage-dependent” additive model and one aimed at detecting associations with a “CNV-positive” state irrespective of the non-diploid copy number. These two analyses were performed in order to identify potential CNVs with reduced penetrance and variable expressivity and were in turn complementary to our analysis of co-segregation between CNVs and RD status, which was aimed at detecting variants with high penetrance and expressivity.
In our dataset of subjects with school histories of RD/ADHD and their siblings, we did not identify a significant correlation between CNV genomic burden—both in terms of total length and of total number of CNVs per subject—and PC scores representing reading-language performance. Similarly, our case-control burden analysis did not reveal any significant contribution of CNVs to RD. This is in line with a previous study which detected no significant difference in the genomic burden of large rare CNVs between RD cases and controls [
20]. However, our result is in partial contrast with a recent study which reported an increased CNV burden in SLI cases compared to controls [
23]. On balance, it appears likely that CNVs have a relatively limited role in affecting reading-related performance at the population level, whereas they are known to play a more important role in severe neuropsychiatric disorders such as autism, schizophrenia, and ID [
15,
16,
20] and may also affect severe cases of SLI. Further analyses in independent datasets will be needed to clarify the extent to which CNVs may affect cognitive domains that are shared between reading and language.
In this study, we detected a CNV which co-segregated with the dyslexic status in a family with two RD cases—including the most severely impaired subject in our dataset—and one unaffected sibling. This large CNV event spanned ~1.2 Mb in the pericentromeric region 11q11-q12.1, covering several
OR (olfactory receptors) and
TRIM (tripartite motif protein) genes. While TRIM proteins are not well characterized, the role of olfactory receptors in triggering odor perception signals in sensory neurons is well known. Interestingly, olfactory bulbs dysgenesis/agenesis has been previously implicated in ASD [
43] and reduced volumes have been reported in schizophrenic patients [
44]. However, the partial overlap of this CNV with a centromeric region and the relaxed selection at the
OR loci [
45] suggest caution in the biological interpretation of this variant.
Two other CNVs shared between cases were detected, overlapping potential susceptibility genes. These two heterozygous duplications were observed in a pair of affected siblings but were not detected in any unaffected participant. One of them overlapped 9 exons in the 3′ terminal region of the
UTRN gene (utrophin, or dystrophin-related protein 1, 6q24.2) and the other one overlapped 12 exons within
DNAH14 (dynein axonemal heavy chain 14, 1q42.12). Utrophin is a large skeletal muscle protein—also expressed in the CNS—contributing to postsynaptic membrane maintenance and to clustering of acetylcholine receptors in the neuromuscular synapses and possibly playing a role in anchoring the cytoskeleton to the plasma membrane. However, as the partial duplication of
UTRN overlaps its 3′-UTR region, it is possible that this has no effect on the mRNA produced. This possibility may be addressed through future gene functional analysis. Dyneins are microtubule-associated motor proteins with a key role in cilia-mediated cell motility. Independent studies have reported evidence of involvement in cilia-related processes for two RD candidate genes,
DYX1C1 [
46,
47] and
DCDC2 [
48]. This led to the hypothesis that dyslexia may sometimes be a form of ciliopathy [
46], involving abnormal neuronal development and migration [
48].
Pathway-based enrichment testing of CNV calls detected in RD cases revealed no significant associations for three candidate gene sets representing mainstream hypotheses on the etiology of RD, namely axon guidance, neuron migration, and steroids-related processes. This is in line with the pathway enrichment test based on SNP associations in an earlier GWASMA study that we carried out [
11].
The two complementary strategies for genome-wide association testing between CNVs and our principal component reading-language scores revealed partly different but partly consistent results. The first of these analyses—which made use of CNV calls and tested association with CNV state at each probe—was aimed at detecting associations in regions of overlap of CNV calls, irrespective of the abnormal copy number state. The second analysis, in FamCNV, assessed copy number (or allele dosage)-dependent associations between DNA array intensity data and PC1/IQadjPC1. These are practical strategies to detect different kinds of effects of CNVs on continuous traits, both of which have precedence in the literature: in recent studies, copy number-dependent (dosage) effects were reported for continuous traits such as body mass index [
49] and structural brain measures [
31]; while either deletions or reciprocal duplications of specific regions have been reported to result in similar clinical and phenotypic manifestations, as in the case of ASD, language/developmental delays, and other psychiatric disorders [
15,
16]. Both analyses were run probe-by-probe, but results were then interpreted in terms of consecutive probes showing significant associations, which was appropriate for an analysis of CNVs.
Although no associations reached genome-wide significance in PLINK QFAM meta-analysis, some of the top associated regions involve plausible candidate genes. A ~11-kb CNV overlap, associated with IQadjPC1, lay in an intronic region within
CNTN4 (contactin 4, 3p26.3; Additional file
3: Figure S3a). This overlap was shared by three heterozygous duplications and one heterozygous deletion, which all showed concordant positive effects on PC scores. Contactins are Ig cell adhesion molecules with a fundamental role in neuronal development and plasticity. CNVs and structural rearrangements disrupting
CNTN4 have been implicated in severe neurodevelopmental disorders such as ASD [
50,
51] and DD [
52]. Interestingly, the associated region detected in the present study overlaps with CNVs reported in ASD cases in two previous studies [
50,
51], and contactin 4 is widely expressed in the brain, particularly in the cerebellum, thalamus, amygdala, and cerebral cortex [
50]. However, this association was weaker and not significant with PC1.
Another intronic CNV overlap region of ~21 kb, associated with both PC1 and IQadjPC1, was found within
CTNNA3 (catenin alpha 3, 10q21.3; Additional file
3: Figure S3c). This region resulted from the overlap of three deletions and showed a negative effect on PC scores. α-catenins have a crucial role in cell adhesion, and
CTNNA3 has been implicated in ASD etiology through both CNV studies [
53,
54] and GWAS studies [
55,
56]. Our associated region partially overlaps an inherited compound heterozygous deletion encompassing exon 11, found in an ASD patient [
53]. Expression of
CTNNA3 in mouse hippocampus and cortex at postnatal day 0 suggests a specific neuronal role at very early developmental stages [
53].
We also assessed CNV overlaps in regions previously reported to be disrupted by CNVs in reading, language, or more severe neuropsychiatric disorders. Among these, a ~134-kb region of overlap between nine heterozygous duplications and one heterozygous deletion, encompassing several exons in the 3′ region of
CHRNA7 (cholinergic nicotinic receptor alpha 7, 15q13.3; Fig.
2), presented nominally significant association with PC1 in the CLDRC-RD subset (while no CNV calls were detected in CLDRC-ADHD). The association only approached significance after IQ adjustment and the CNV state exerted a positive effect on PC1/IQadjPC1, with both deletion and duplications showing the same direction of effect. Nicotinic cholinergic receptors are ligand-gated ion channels that mediate fast signal transmission at synapses and are ubiquitously expressed in the CNS. Several studies have suggested a possible involvement of
CHRNA7 in language skills. A CNV encompassing this gene was suggested to contribute to the disruption of synaptic pathways in a patient with ID and language impairment [
57]. A genome-wide CNV screen also reported
CHRNA7 among the genes disrupted in a group of unrelated SLI cases, as well as a significant over-representation of the GO category
acetylcholine binding in a pathway-based analysis of these CNVs [
23]. A recent longitudinal study of children with language difficulties implicated a deletion at 15q13.1-13.3 (BP3-BP5) in the etiology of SLI, and the authors hypothesized a role of
CHRNA7 in the phenotypic effects associated to this region [
24]. The 15q13.3 region is also a hotspot of neuropsychiatric CNVs, which have been implicated in several disorders including SCZ, ASD, ADHD, and epilepsy [
15,
16]. CNVs encompassing this gene have also been tested for effects on general cognitive abilities, including school history of mathematical and reading difficulties, but no associations were reported [
31].
Similarly to PLINK QFAM analysis, FamCNV meta-analysis did not reveal any genome-wide significant association. However, we found a series of eight contiguous SNPs associated with both PC1 and IQadjPC1 in the CLDRC-RD analysis, ~6-kb downstream of
ZNF737 (zinc finger protein 737, 19p12, Fig.
3). This ~58-kb region lay within a ~80-kb deletion which was common in our dataset, and the association was also observed at the nominal significance level in the PLINK QFAM analysis of CLDRC-RD. Both FamCNV and QFAM analysis indicated a positive effect of this deletion on PC1/IQadjPC1. Zinc finger protein 737 has not been functionally characterized, but the presence of a zinc finger domain suggests a possible involvement in transcriptional regulation. Interestingly, a microdeletion within another zinc finger gene,
ZNF277, has been suggested as susceptibility CNV for SLI [
22].
In spite of the interesting suggestive associations discussed above, the modest sample size and absence of a replication sample constitute limitations for the present study, and further analyses in larger datasets are warranted. In addition, the localization of CNV breakpoints and functional validation of candidate CNVs can help to validate and extend such associations. Also, the definition of RD cases was necessarily somewhat arbitrary (as in all studies). Nonetheless, for completeness of our analysis, we used this approach to assess co-segregation with CNVs in the sibships. As there is no universal agreement on the diagnostic definition of dyslexia [
1,
3], we used a “performance only”-based criterion, classifying all the participants in the lowest 10 % of the IBGdiscr score distribution as RD cases and considering them as representative of reading impairment. Finally, it may be observed that many of the CNVs identified are in CNV hotspot regions and they may represent benign variants. Nonetheless, the fact that these CNVs are frequently detected in the general population does not rule out potentially modifying effects on reading and language skills. For this reason, we did not exclude CNVs present in the DGV from our analysis.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
CF and SEF conceived and supervised the present study. AG performed the statistical analyses, genotype, and phenotype QC. AG, CF, and SEF designed the experiments and wrote the manuscript. AV and MF provided technical and theoretical assistance in FamCNV analysis and pre-analysis QC. SDS was responsible for DNA samples management. JCD, EGW, RKO, and BFP are CLDRC co-investigators (involved in the collection and preliminary elaboration of reading and language traits). All the authors contributed to the paper and approved the submitted version.