Background
Autism spectrum disorders (ASDs) are a set of heterogeneous neurodevelopmental conditions mainly characterized by impaired communication and repetitive behaviors [1]. ASD is one of the most common neurodevelopmental conditions in humans: according to the latest surveys, the median prevalence of autism worldwide is around 1% [1]. The etiology of autism has a strong genetic component: twin and sibling studies have consistently shown that ASD is one of the most highly heritable complex disorders in humans [2]. In the past decade, several studies exploiting whole-exome sequencing (WES) data have identified both de novo and inherited deleterious mutations that either cause or contribute to the autistic phenotype [3–9]. The increasing scope of WES studies and ever-growing cohort sizes have accelerated the discovery of ASD-associated genes, to the point where up to 1000 genes are estimated to contribute, to varying degrees, to ASD etiology; these genes are collected in the SFARI Gene database [10, 11]. Yet none of these genes accounts for more than 1% of idiopathic ASD cases [12]. This extreme genetic heterogeneity may underlie the phenotypic heterogeneity characteristic of ASD. Hence, identifying subgroups with a more homogeneous molecular profile is essential to understand ASD etiology and to design personalized treatments. Understanding how the large number of genes implicated in ASD susceptibility may converge to affect human brain development is therefore critical [12]. As in other neuropsychiatric disorders, most of the genes involved in ASD encode neuronal components crucial for brain function [10, 11]. In addition, a substantial fraction are involved in general transcriptional regulation and/or chromatin remodeling [10, 13]. The link between deleterious mutations in chromatin-regulatory genes and those in genes involved in synaptic transmission and brain activity remains unclear, although a reasonable hypothesis is that mutations in chromatin regulators alter the transcriptional programs controlling genes involved in synaptic transmission and brain activity.
Recent work on postmortem brain tissue revealed shared abnormalities in gene expression in a large subset of autism cases [14–16]. Namely, two main types of gene co-expression modules were shown to be consistently deregulated: (1) modules of downregulated genes involved in synaptic transmission, encoding neuronal markers and enriched for ASD-associated SFARI genes, and (2) modules of upregulated genes involved in immune and inflammatory responses, enriched for markers of microglia and astrocytes but generally not for genes directly associated with ASD [14–18]. Multiple independent studies have observed perturbations in the distribution of epigenetic marks in postmortem brain tissue of individuals with ASD [19–21]. Both Wong et al. [16] and Nardone et al. [19] reported an altered DNA methylation landscape across multiple brain regions of ASD individuals. Corley et al. [21] showed that epigenetic alterations detected in ASD were preferentially directed at intragenic and bivalently modified chromatin domains of genes predominantly involved in neurodevelopment. Interestingly, the methylation landscape of adult ASD neurons closely resembled that of earlier stages of fetal brain development [21]. These findings suggest that a delay in the epigenetic program may contribute to deleterious transcriptional programs and to the establishment of ASD phenotypes. This model is supported by the identification of mutations in genes that regulate both DNA and histone methylation during brain development [13]. A recent study by Wong et al. [16] integrated gene expression, DNA methylation and histone acetylation data from ASD and control individuals and proposed the existence of two major ASD subgroups: the first recapitulated all known molecular changes typical of ASD [16], whereas the second was indistinguishable from controls in terms of transcriptional and epigenetic alterations. It is therefore tempting to stratify deleterious genomic variants at higher resolution by classifying their molecular effects. In this context, a recent publication identified a set of human regulatory regions that evolved after the divergence from Old World monkeys and showed that these were enriched within genomic regulatory regions altered in ASD [22]. Indeed, hominoid-specific regulatory regions showed a significant overlap with regions that lose the H3K27ac histone mark, typical of active enhancers, in ASD subjects [22]. This evidence suggests that ASD-related epigenetic defects may be caused by altered activity of evolutionarily young regulatory regions.
Transposable elements (TEs) are genomic sequences capable of mobilizing [23] and of shaping the regulatory landscape of the host genome [24–27]. Long Interspersed Nuclear Elements 1 (L1s) are the most abundant autonomous TEs (~17% of the human genome) and the only transposon class known to retain the ability to mobilize autonomously in humans [28]. Full-length (FL) L1 elements are about 6 kb long and mobilize through a copy-and-paste mechanism in which a full-length sequence gives rise to an L1 RNA intermediate that is reverse-transcribed into a new genomic locus. FL L1s contain two open reading frames (ORF1 and ORF2), encoding, respectively, a nucleic acid chaperone and a protein with endonuclease and reverse transcriptase activities that mediates retrotransposition [28]. Retrotransposition events mostly produce 5′-truncated L1s that are unable to re-mobilize. However, L1 sequences can still influence the transcription of flanking genes [29, 30].
Only about 100 of the more than 10,000 full-length L1s found in the human genome [31] are potentially active [32]. To prevent the deleterious effects of aberrant L1 activity, cells have evolved several mechanisms to restrain and fine-tune L1 retrotransposition, including DNA methylation, transcriptional repression and L1 RNA degradation through the PIWI/piRNA pathway [33]. Nevertheless, TEs are known to escape silencing at specific embryonic stages [34], affecting early human development by regulating nearby protein-coding genes. Waves of hypomethylation during embryogenesis are linked to higher rates of L1 transcription and retrotransposition [35]. Somatic L1 retrotransposition has been observed in the neuronal lineage, leading to brain mosaicism, and L1 activity has been shown to be deregulated in a variety of neurodegenerative and neurodevelopmental diseases [35–40], although its extent and functional significance remain unclear [35, 40, 41]. Recently, attention has turned to the functional role of L1s independent of retrotransposition [42]. Most L1 RNAs are retained in the nucleus, where they can function as regulatory long non-coding RNAs (lncRNAs) controlling transcriptional and chromatin landscapes. For example, L1 RNAs are required for mouse embryonic stem cell (mESC) self-renewal and for pre-implantation development [43]. L1 RNA expression and mobilization should therefore be considered two independent activities, governed by distinct regulatory pathways and with different functional outcomes [44, 45].
Altered DNA methylation levels within L1 sequences have been shown in multiple neurodevelopmental diseases, including ASD [46]. Reduced methylation and increased L1 expression have been reported in postmortem ASD brains [46, 47]. In ASD samples, trimethylation of histone H3K9 (H3K9me3), a mark responsible for the formation of condensed heterochromatin that prevents L1 activation, was significantly reduced at L1 ORF1 and ORF2 sequences but not at the 5′-UTRs [46]. Furthermore, Tangsuwansri et al. [47] demonstrated that, in lymphoblastoid cell lines derived from a subset of ASD subjects with severe language impairment, the overall methylation level of L1 elements was decreased compared to controls, and that this decrease was inversely correlated with the expression level of L1-containing genes [47].
In this study, we aim to quantify the expression of young, transcriptionally active L1 elements in ASD, assessing whether specific individuals affected by ASD or in vitro models of ASD show altered L1 expression. Furthermore, we characterize the transcriptional dynamics of L1 elements and explore the possible impact of their expression on ASD-relevant genes.
Discussion
ASD is a highly heterogeneous group of neurodevelopmental disorders. Identifying common molecular targets is therefore instrumental in defining homogeneous groups of individuals affected by ASD for clinical diagnosis and personalized medicine. Several studies have shown that transcriptional deregulation affecting both coding and non-coding genes occurs in ASD [14–18]. However, TE transcription has often been overlooked, with only a few studies reporting a general alteration of L1 expression and epigenetic regulation in ASD [46, 47]. Here, we devoted special attention to the expression pattern of evolutionarily young FL L1s, since they appear to be pervasively transcribed and both regulate, and are regulated by, the epigenetic status of a cell [34, 40, 70, 71]. The main aim of this work was to assess whether L1 expression is altered in ASD brains, in an in vitro model consisting of iPSCs and differentiated neurons knocked out (KO) for several genes known to be directly involved in ASD etiology, and in the blood of ASD-discordant siblings. Moreover, we aimed to evaluate the impact of L1 expression on the transcription of protein-coding genes.
Our results show that all ASD/KO samples display a modest positive net number of upregulated L1s. However, a consistent increase was evident only in three postmortem ASD anterior cingulate cortex (ACC) samples and in ATRX-KO iPSCs and differentiated neurons. These samples showed widespread L1 upregulation, with 30–50% of the analyzed L1s displaying significantly higher expression than controls. ATRX is a SFARI level 1 gene encoding a protein with an ATPase/helicase domain belonging to the SWI/SNF family of chromatin remodeling proteins [52, 68]. ATRX KO cells were previously shown to have increased chromatin accessibility at genomic loci occupied by retrotransposons [68]. By analyzing public ATRX ChIP-seq data from human cell lines, we found a strong enrichment of ATRX binding at the L1HS/L1PA sequences upregulated in both iPSCs and differentiated neurons, suggesting that ATRX loss of function may directly increase the transcription of young FL L1 elements.
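The two summary measures used above, the net number of upregulated L1s and the fraction of analyzed L1s that are significantly upregulated, can be illustrated with a minimal Python sketch. It assumes per-locus differential expression results carrying a log2 fold change and an adjusted p-value, a significance cutoff of 0.05, and that the net number is upregulated minus downregulated loci; the `L1Result` structure and `summarize_l1_changes` function are hypothetical and are not the pipeline used in this study.

```python
from dataclasses import dataclass


@dataclass
class L1Result:
    """Differential expression result for one L1 locus (hypothetical structure)."""
    locus_id: str
    log2_fold_change: float
    padj: float  # multiple-testing adjusted p-value


def summarize_l1_changes(results: list[L1Result], alpha: float = 0.05) -> tuple[int, float]:
    """Return (net_upregulated, fraction_upregulated) for one ASD/KO vs control comparison.

    Assumptions (not from the study): a locus is significant when padj < alpha,
    and the sign of log2_fold_change gives the direction of change.
    """
    up = sum(1 for r in results if r.padj < alpha and r.log2_fold_change > 0)
    down = sum(1 for r in results if r.padj < alpha and r.log2_fold_change < 0)
    # Fraction of all analyzed loci that are significantly upregulated.
    fraction_up = up / len(results) if results else 0.0
    return up - down, fraction_up
```

For example, four loci of which two are significantly up, one significantly down and one unchanged yield `(1, 0.5)`: a net gain of one upregulated L1, with half of the analyzed loci upregulated.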
Large-scale genomic investigations have contributed to the identification of almost one thousand genes putatively involved in ASD [11]. These genes may be roughly divided into two large functional groups: (a) genes with a crucial role in synaptic function and (b) genes involved in transcriptional regulation and/or chromatin remodeling, including ATRX. It is therefore tempting to speculate that mutations in a subset of the latter may directly influence L1 transcriptional regulation, since these genes are better positioned to have a genome-wide impact on the epigenetic status of chromatin and, therefore, a widespread effect on the transcriptional landscape of cells.
Upregulation appears to occur in a tissue-specific manner, since the individuals with the strongest L1 increase in the ACC do not show the same expression pattern in the PFC. Furthermore, no changes are observed in the blood of ASD subjects compared with their healthy siblings. However, the analyzed blood dataset has considerably lower coverage (~10–15 million reads of 50 bp per sample) than the RNA-seq data retrieved from Velmeshev et al. and Deneault et al. (~100–150 million reads of 150 bp per sample). Further studies are therefore needed to address this issue.
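A rough way to quantify this coverage gap, assuming coverage can be approximated by the total number of sequenced bases per sample (reads × read length) and ignoring paired-end layout, mapping rates and library type, which are not specified here:

```latex
\begin{aligned}
\text{blood:}\quad & (10\text{--}15)\times 10^{6}\ \text{reads} \times 50\ \text{bp} = (0.5\text{--}0.75)\times 10^{9}\ \text{bp} \\
\text{brain, iPSC/neurons:}\quad & (100\text{--}150)\times 10^{6}\ \text{reads} \times 150\ \text{bp} = (15\text{--}22.5)\times 10^{9}\ \text{bp} \\
\text{ratio:}\quad & \frac{15\times 10^{9}}{0.75\times 10^{9}} = 20 \quad\text{to}\quad \frac{22.5\times 10^{9}}{0.5\times 10^{9}} = 45
\end{aligned}
```

Under these assumptions, the blood samples carry roughly 20- to 45-fold fewer sequenced bases per sample, which may limit the power to detect expression changes at individual L1 loci.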
In the cell line analysis, only ATRX-KO differentiated neurons showed a high number of differentially expressed (DE) genes, although iPSCs KO for each of the ten ASD-related genes showed some degree of gene deregulation. ATRX-KO neurons were also the neuronal sample with the strongest L1 upregulation. This result is consistent with the observation that the brain samples with the strongest L1 upregulation are also among those with the highest number of DE genes. Increased L1 RNA expression in the brain thus appears to be a biological marker associated with only a subset of ASD cases, characterized by the deregulation of a large number of canonical coding and non-coding genes and by increased intron retention, all features consistent with general chromatin dysregulation. The possible existence of two molecularly distinct groups of ASD subjects is in line with the work by Wong et al. [16], which suggested the existence of two major ASD subgroups: while the first recapitulated the known molecular changes typical of ASD [16], the second was indistinguishable from control samples in terms of transcriptional and epigenetic alterations.
Most upregulated L1s are intronic, and some of them might be transcribed independently of their host genes. In one of the ACC samples showing L1 upregulation, a small number of upregulated L1s were hosted in significantly downregulated protein-coding genes. The analysis of an independent dataset [70] supported the possible existence of an inverse L1/host gene expression relationship for a subset of genes, and suggested that this pattern might be a feature of genes with neuronal functions expressed in specific brain areas. However, all types of expression patterns were observed: a higher number of loci showed concomitant upregulation of L1s and their host genes, while at other loci increased L1 expression seemed to have no effect on the host gene. Examining the transcriptional relationship between L1s and their host genes is crucial to understand the effects of L1s on the transcriptional output of the genome in cis, and warrants further investigation.
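The expression patterns described above (inverse, concordant, or no apparent effect on the host gene) can be made explicit with a small classification sketch. This is an illustration under stated assumptions, not the procedure used in the study: each intronic L1 is paired with its host gene, a change is called significant when the adjusted p-value is below 0.05, and the sign of the log2 fold change gives the direction. `Pattern`, `classify_pair` and `tally_patterns` are hypothetical names.

```python
from collections import Counter
from enum import Enum


class Pattern(Enum):
    """L1/host-gene expression relationships (illustrative labels)."""
    INVERSE = "L1 up, host down"
    CONCORDANT = "L1 up, host up"
    L1_ONLY = "L1 up, host unchanged"
    NOT_UP = "L1 not upregulated"


def classify_pair(l1_lfc: float, l1_padj: float,
                  host_lfc: float, host_padj: float,
                  alpha: float = 0.05) -> Pattern:
    """Assign one intronic L1 and its host gene to an expression pattern.

    Assumption: a feature is changed when padj < alpha; the sign of the
    log2 fold change (lfc) gives the direction of the change.
    """
    if not (l1_padj < alpha and l1_lfc > 0):
        return Pattern.NOT_UP
    if host_padj < alpha and host_lfc < 0:
        return Pattern.INVERSE
    if host_padj < alpha and host_lfc > 0:
        return Pattern.CONCORDANT
    # L1 is upregulated but the host gene shows no significant change.
    return Pattern.L1_ONLY


def tally_patterns(pairs, alpha: float = 0.05) -> Counter:
    """Count patterns over (l1_lfc, l1_padj, host_lfc, host_padj) tuples."""
    return Counter(classify_pair(*p, alpha=alpha) for p in pairs)
```

Tallying all pairs in a sample gives counts per pattern, which is one way to compare how often inverse, concordant and L1-only relationships occur.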
We also observed a significant correlation between the number of reads mapped to introns and FL L1 expression (p = 1.4 × 10⁻⁸, r = 0.75). We are aware that an increased number of intronic reads might cause an apparent overexpression of intronic TEs [72]. In polyA+ samples, this bias can be introduced by differential intron retention; in that case, however, the apparently upregulated TEs reside in retained introns. In our analysis, we found only a single overlap (out of ~800 retained introns) between upregulated TEs and retained introns, which suggests that the upregulated L1s might derive from independently transcribed units.
Several models can be proposed for how L1s might regulate the expression of their host genes, influence neuronal differentiation and homeostasis, and play a crucial role in triggering neuronal dysfunction. The function of TEs as regulatory non-coding RNAs is currently under intense investigation. The identification of a large number of DNA/RNA hybrids at L1 loci suggests that TEs might exert their function in cis [69]. They may also act by locally organizing chromatin domains within the same locus or by recruiting sequences located on other chromosomes. Because they can interact with different proteins, they can recruit protein complexes to specific genomic regions. In this context, a recent study showed that L1 RNAs were essential for the binding of the Nucleolin-KAP1 complex to its target chromatin, promoting rRNA synthesis and ESC self-renewal [73]. According to this model, expressed L1 sequences can both control and be controlled by the deposition of epigenetic marks, and may promote the silencing and subsequent reactivation of specific gene sets during development. Within a gene/L1 pair, L1 expression may cause transcriptional interference with the host gene or may guide the establishment of novel epigenetic marks. Given the increasing evidence that epigenetic alterations occur in ASD [16, 20, 21, 47], possibly directly at L1 genomic sequences, a direct link may be hypothesized between the epigenetic control of L1 RNAs by regulatory genes mutated in ASD and the expression of their host genes.
Our study focuses on L1 fragments longer than 5 kb, including FL L1s that retain the potential to retrotranspose. It therefore remains possible that increased L1 expression gives rise to uncontrolled retrotransposition and, consequently, to somatic mosaicism in ASD brains [35, 38, 40]. Recently, ASD brains have been shown to carry somatic single-nucleotide mutations [74, 75]. In-depth analysis of genomic sequences from postmortem ASD brains is therefore required to fully understand the molecular consequences of L1 RNA upregulation.
Our results may also have relevant clinical implications. If deleterious mutations in ATRX, or in a defined set of genes with similar roles in L1 transcriptional control, indeed underlie the molecular etiology of a subset of ASD subjects, WES-derived data could be used for personalized medicine. Effective medications for the core symptoms of ASD are still lacking. Most of the drugs currently in development for ASD derive from knowledge of genes implicated in monogenic disorders associated with altered neurodevelopmental trajectories and autistic symptoms, such as Fragile X, Landau–Kleffner and Rett syndromes [76, 77]. As a consequence, the therapeutic approach to ASD subjects has traditionally focused on associated conditions [76], with little impact on core symptoms. These drugs typically target synaptic pathways, such as dopaminergic and glutamatergic receptors [76]. Interestingly, a recent study revealed that treatment with a low dose of romidepsin rescued social deficits in animal models of autism [78]. Romidepsin is a histone deacetylase inhibitor [78], thereby restoring the expression of genes that are involved in neuronal signaling and downregulated in ASD.
Our results should be considered exploratory and need to be replicated in larger cohorts. If validated, they may provide a basis for stratifying ASD cases for clinical treatment with drugs that modify the epigenetic status of cells or interfere with L1 RNA expression. Recent observations on the potential therapeutic use of manipulating TEs in diseases affecting other tissues [79] suggest that molecular tools interfering with TE expression could represent a new strategy for the personalized treatment of neurodevelopmental disorders.
Conclusions
The analysis of TE expression is technically very challenging, and caution is needed when interpreting the results. However, this should not prevent the exploration of the behavior of these elements at both the transcriptional and the genomic level. Much information on TE expression in disease is likely already present in the large amount of data collected so far and, at least in part, made available to the community. Limited knowledge and the lack of widely adopted standard pipelines for TE analysis often mean that TEs are not analyzed in the original clinical studies. Nevertheless, TE-specific exploration of public datasets should be considered an important step toward exploiting the full potential of genomics.
The importance of TEs lies in the fact that these elements show a high degree of activity in the brain and that their dysregulation appears to be associated with disease. Given the heterogeneity of neurological diseases and the paucity of studies specifically addressing this issue, it is not surprising that current results, based on small cohorts, appear to provide conflicting information. Here, we present evidence suggesting that L1 dysregulation in ASD is not a feature common to all ASD subjects but is restricted to a subgroup of them, refining recent observations that proposed L1 dysregulation as a common feature of ASD subjects. Identifying subgroups of subjects in neurological diseases is crucial and might have therapeutic implications, for instance by providing a basis for stratifying cases for specific clinical treatments and informing the choice of specific drugs. The pattern of L1 expression observed in our analysis could indicate a mechanistic relationship between L1 expression and broader gene expression regulation, or it could represent a marker of widespread expression dysregulation. While our analysis largely rules out technical biases, the results should be interpreted with appropriate caution and validated in larger cohorts.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.