Background
Autism spectrum disorders (ASD) are heterogeneous neurodevelopmental disorders, both in terms of clinical manifestations and genetic risk factors [
1]. Disease frequency among siblings of affected children is approximately 2% to 8%, which is much higher than the prevalence in the general population. Monozygotic twins show 60% concordance for classic autism and 92% for broader autistic phenotypes. Together, these observations indicate that genetic inheritance is the predominant causative factor [
2]. Genetic studies show that ASD can arise from rare, but highly penetrant, mutations and genomic imbalances [
3,
4] with more than a hundred disease-associated genes and genomic loci having been reported [
5,
6]. Such mutations may contribute to ASD etiology by affecting conventional genes directly or indirectly by altering the function of non-protein coding RNAs (ncRNAs) expressed in the same genomic loci. Recent evidence has implicated such ncRNAs in neurodevelopmental and neurodegenerative disorders including autism [
7‐
15].
Large transcriptomic consortia such as ENCODE [
16] and FANTOM [
17,
18] have demonstrated that the human genome is pervasively transcribed and that ncRNAs are its primary output. Through diverse mechanisms, these ncRNAs control protein production and function at multiple levels, including epigenetic control of their corresponding or distant loci [
19,
20], alteration of localization, stability or processing of targets [
20,
21], or modulation of translational efficiency through binding to the 3’ UTR of transcripts, as in the case of microRNAs [
22,
23]. Natural antisense transcripts (NATs) are a conserved class of long (>200 nt in length) ncRNA molecules that are transcribed from the opposite DNA strand of a sense RNA partner with which they have sequence complementarity [
18,
24]. Such antisense RNAs can exert
cis-regulatory functions to increase (concordant) or decrease (discordant) expression levels of their corresponding sense mRNA [
21]. These regulatory functions can also operate
in trans by affecting genes from distant genomic loci.
Here, we developed an algorithm to mine existing public transcriptomic repositories for the presence of NATs that are produced from ASD candidate genes. We believe that ncRNA information processing systems involving such transcripts represent a critical but under-appreciated dimension of the cell machinery that must be considered in order to identify pathological events and facilitate novel therapeutic development strategies for ASD.
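As a rough illustration of the kind of search described above, the following sketch flags non-coding transcripts that lie on the opposite strand of a candidate gene and overlap its genomic interval. It is not the algorithm used in this study: the `Transcript` record, the function names, the coordinates and the transcript names are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    coding: bool


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if both records are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_nats(genes, transcripts):
    """Map each gene name to the non-coding, opposite-strand transcripts that overlap it."""
    result = {}
    for gene in genes:
        hits = [
            t.name
            for t in transcripts
            if not t.coding and t.strand != gene.strand and overlaps(gene, t)
        ]
        if hits:
            result[gene.name] = hits
    return result


# Hypothetical coordinates, for illustration only.
genes = [
    Transcript("GENE_A", "chr1", 1000, 5000, "+", True),
    Transcript("GENE_B", "chr2", 2000, 8000, "-", True),
]
transcripts = [
    Transcript("GENE_A-AS", "chr1", 4500, 6000, "-", False),  # opposite strand, overlapping
    Transcript("lnc_same", "chr1", 1200, 1800, "+", False),   # same strand, ignored
    Transcript("lnc_far", "chr2", 9000, 9500, "+", False),    # opposite strand, no overlap
]

nats = find_nats(genes, transcripts)
print(nats)
```

A strict interval overlap would miss antisense transcripts that arise from the promoter without reaching the gene body; extending each gene interval by an upstream window is one possible way to capture them, although the window size would itself be an assumption.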
Methods
Ethics statement
The University of Miami Institutional Review Board deemed this study exempt from full review because it used de-identified human post-mortem brain samples, with no possibility of tracing the identity of the donors. No animal studies were involved in this work.
Postmortem brain tissue and RNA extraction
Tissue samples were provided by the National Institute of Child Health and Human Development (NICHD) at the University of Maryland. A complete description of the samples is provided in Additional file
1: Table S2.
For RNA extraction, ~100 mg of brain tissue was lysed in TRIzol (Life Technologies), 200 μL of chloroform was added, and the sample was incubated at room temperature for 10 minutes. The samples were then centrifuged for 20 minutes at 4°C. The supernatant (aqueous phase) was transferred to a new tube containing 1.5 volumes of 100% ethanol. The ethanol/RNA mixture was loaded onto an RNeasy column (Qiagen) and purified as per the manufacturer’s instructions, including on-column DNase treatment. Typical yields from both non-ASD and ASD subjects were about 10–12 μg of total RNA per 100 mg of tissue.
Primer design
Primers were designed using Primer 3 software with the sequences from AceView and synthesized by Integrated DNA Technologies (Additional file
2: Table S3). Primers were designed for a splice junction when possible; when primers were designed for an exon they were designed either for a region of the antisense transcript that does not overlap the sense gene or for a region where the antisense overlaps an intron of the sense transcript (Additional file
3: Figure S1). In these cases, strand-specific quantitative real-time RT-PCR was utilized to avoid amplifying the transcript encoded on the opposite strand of DNA.
Quantitative real time RT-PCR (qRT-PCR)
For qRT-PCR, total RNA was reverse transcribed using the High-Capacity cDNA Reverse Transcription Kit (Life Technologies). The cDNA was then diluted 1:5 and was used as a template for both SYBR Green (Life Technologies, 4368706) and TaqMan qPCR using the ABI 7900 (Life Technologies). TaqMan probes for human
PGK1 from Life Technologies (Hs00943178_g1) were used to measure gene expression of the endogenous control. Three technical replicates were performed for each reaction. No-template controls were included in each reaction and the melting curve was analyzed to assess the specificity of each primer (Additional file
4: Appendix 1). When primers were designed within a single exon and did not span a splice junction, no-RT controls were included so that samples contaminated with genomic DNA could be identified and excluded. The results of the quantitative real-time RT-PCR were analyzed with SDS 2.3 software from Life Technologies.
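The text does not state how Ct values were converted into relative expression. A common choice for this kind of design is the 2^-ΔΔCt method, sketched below under the assumption of roughly 100% amplification efficiency, with PGK1 as the endogenous control. The function names and Ct values are hypothetical and are not taken from the study.

```python
from statistics import mean


def delta_ct(target_cts, control_cts):
    """Mean Ct of the target minus mean Ct of the endogenous control (technical replicates)."""
    return mean(target_cts) - mean(control_cts)


def relative_expression(sample_dct, calibrator_dct):
    """Fold change by the 2^-ddCt method; assumes ~100% amplification efficiency."""
    return 2 ** -(sample_dct - calibrator_dct)


# Hypothetical Ct values from three technical replicates each.
pgk1_cts = [20.0, 20.1, 19.9]                            # endogenous control
calibrator = delta_ct([26.0, 26.1, 25.9], pgk1_cts)      # reference sample
sample = delta_ct([25.0, 25.1, 24.9], pgk1_cts)          # sample of interest

fold_change = relative_expression(sample, calibrator)
print(round(fold_change, 2))
```

In this toy input the target crosses threshold about one cycle earlier in the sample than in the calibrator, which corresponds to an approximately twofold higher relative expression.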
Strand-specific qRT-PCR
To measure antisense transcript expression in a strand-specific manner, we designed primers for a region of the antisense transcript that overlaps an intron or the promoter of the sense gene. We then used the one-step RNA-to-Ct SYBR Green Kit (Life Technologies, 4389986). The reverse transcription (RT) step was performed in a 384-well optical plate using only the reverse primers, so that antisense RNA was specifically reverse-transcribed and the sense pre-mRNA was not measured. Samples were then incubated at 95°C for 5 minutes to inactivate the reverse transcriptase. Forward primers were then added to the reaction and quantitative PCR was performed on the same plate. No-RT and no-template controls were included for each primer set to control for non-specific binding.
Statistical analysis
For all qRT-PCR reactions, three technical replicates were performed. To compare the expression of antisense RNAs across the three brain regions, GraphPad Prism software was used to perform ANOVA followed by Tukey's post-hoc test. A p value below 0.05 was considered statistically significant. Student’s t-test was used to compare expression between normal and ASD brain samples.
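The analysis itself was run in GraphPad Prism. Purely to make the two test statistics concrete, the sketch below computes a one-way ANOVA F statistic and a pooled-variance Student's t statistic using only the Python standard library. The expression values are hypothetical, the Tukey post-hoc step is not implemented, and p values would have to come from the F and t distributions in a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def student_t(a, b):
    """Return (t, df) for a two-sample Student's t-test with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t_stat = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t_stat, na + nb - 2


# Hypothetical relative expression values for three brain regions.
region_a = [1.0, 2.0, 3.0]
region_b = [2.0, 3.0, 4.0]
region_c = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([region_a, region_b, region_c])

# Hypothetical control versus ASD comparison.
t_stat, df_t = student_t([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
print(f_stat, df_b, df_w, t_stat, df_t)
```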
Cellular fractionation
SH-SY5Y cells were fractionated using a modified NE-PER Kit (Pierce) to isolate RNA from the cytosol, nucleoplasm and chromatin. Briefly, the cells were collected and washed twice with PBS. Cell membranes were lysed in a hypotonic buffer, the lysate was ultracentrifuged to pellet the nuclei, and the cytosol was recovered from the supernatant. The nuclei were then lysed and centrifuged to pellet the insoluble chromatin, and the nuclear extract was recovered from the supernatant. The insoluble chromatin pellet was solubilized in PBS with mild sonication. RNA was extracted from each of the three compartments using a combination of two protocols: TRIzol LS (Invitrogen) and the RNeasy Mini Kit (QIAGEN). Each sample was dissolved in the appropriate amount of TRIzol LS (1 mL per 300 μL of sample) and incubated for 10 min at room temperature. Chloroform (200 μL per 1 mL of TRIzol LS) was added and the sample was centrifuged for 20 minutes at 4°C. The aqueous phase of the supernatant was transferred to a new tube and mixed with 1.5 volumes of absolute ethanol. The sample was then loaded onto the column provided with the QIAGEN kit and on-column DNase treatment was performed as per the manufacturer’s protocol. RNA quality was verified using the Agilent Bioanalyzer RNA 6000 Nano kit.
Library preparation
RNA samples were prepared for directional RNA sequencing using a modified version of the Illumina sample preparation protocol. Briefly, 1 μg of total RNA was processed using the Ribo-Zero™ rRNA Removal Kit to remove ribosomal RNAs. The rRNA-depleted RNA was treated with phosphatase and then with T4 polynucleotide kinase (PNK). PNK-treated RNA was purified with the QIAGEN RNeasy column purification kit, and 3’ and 5’ RNA adapters were ligated to the ends of the RNA in separate reactions. Next, the RNA was reverse transcribed and PCR amplified. PCR products were purified using AMPure beads. RNA sequencing libraries were validated using the Agilent Bioanalyzer High Sensitivity DNA kit and sequenced on the Illumina HiSeq2000 platform at the Genomics sequencing core at the University of Miami. Each sample was run in a single flowcell to increase depth of sequencing.
RNA-seq analysis
The sequencing reads were pre-processed with a custom Python script to trim library adapters. This yielded 62,500,000 reads per sample on average, which provided acceptable coverage and sequencing depth. The trimmed reads were then aligned to the human transcriptome assembly GRCh37 from ENSEMBL using TopHat version 2.0.4 [
25]. TopHat was run with default parameters, and Samtools [
26] was used to calculate the alignment statistics for each sample. The BAM files generated by TopHat were then used as input for Cufflinks [
27] to perform
ab initio transcriptome assembly. The assembled fragments were then annotated with the Cuffcompare module of Cufflinks, using an AceView database file as the reference. Fragments that originated from introns and incompletely spliced RNAs were filtered out, and the fragments per kilobase of transcript per million mapped reads (FPKM) values of the fragments transcribed from each locus were summed to obtain locus expression.
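The last step above, summing FPKM values per locus, can be sketched as follows. The `fpkm` function uses the textbook definition rather than the statistical model that Cufflinks applies internally, and the locus names, counts, lengths and library size are hypothetical.

```python
from collections import defaultdict


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments (textbook definition)."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def locus_expression(fragments):
    """Sum FPKM values over all assembled fragments assigned to the same locus."""
    totals = defaultdict(float)
    for locus, value in fragments:
        totals[locus] += value
    return dict(totals)


# Hypothetical assembled fragments: (locus, fragment count, length in bp).
total_mapped = 50_000_000
assembled = [
    ("LOCUS_1", 500, 2000),
    ("LOCUS_1", 250, 1000),
    ("LOCUS_2", 100, 4000),
]

per_fragment = [(locus, fpkm(n, length, total_mapped)) for locus, n, length in assembled]
expression = locus_expression(per_fragment)
print(expression)
```

Filtering of intronic and incompletely spliced fragments would happen before the summation; it is omitted here because the filtering criteria are not described in enough detail to reproduce.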
Discussion
Despite overwhelming evidence for the genetic causes of ASD, an exact mode of inheritance has not been elucidated and the wide phenotypic variability of ASD likely reflects the disruption of multiple gene networks and complex regulatory circuits within the genome. Recent data indicate that multiple genomic loci and several rare and highly penetrant gene variants (e.g.,
NLGN3, NLGN4, SLC9A9, NRXN1, RPL10, SHANK2, SHANK3, CNTNAP2, PTCHD1, and
PTEN among others) are involved in ASD [
5,
6,
47]. These genes and loci may account for 20–25% of children with ASD, but none of them can individually explain more than 2% of the cases [
48].
Most of the eukaryotic genome is transcribed as ncRNAs of various sizes, ranging from 20 nucleotides to over 100 kb [
16]. The number of ncRNAs in eukaryotic genomes increases as a function of developmental complexity [
17,
49,
50]. Furthermore, many ncRNAs are expressed in the nervous system where they are thought to mediate fundamental biological functions [
51,
52]. Natural antisense transcripts have been reported for more than 70% of transcriptional units within the human genome [
17] and include primate-specific or human-specific [
53] as well as other evolutionarily conserved transcripts [
54]. Aberrant expression of regulatory antisense RNAs might have defined consequences on the expression and/or function of protein-coding transcripts [
15] and in some cases on the epigenetic status of entire genomic loci [
40,
55]. Transcriptomic as well as
in vivo studies have revealed the importance of several long ncRNAs in the maturation of neuronal cell subtypes [
38,
56,
57]. These recent findings have raised the possibility of a more extensive role for long ncRNAs in regulating gene expression during neuronal differentiation and CNS development. Indeed, it was recently reported that several ncRNAs play functional roles in ASD. For example, a long ncRNA disrupted in schizophrenia 2 (
DISC2) is a NAT overlapping the
DISC1 gene and has been implicated in schizophrenia, bipolar disorder [
58] and autism [
59]. A more recent report has indicated the presence of a non-protein-coding antisense RNA corresponding to a suspected ASD locus at 5p14.1 [
60]. This antisense RNA was shown to be strongly increased in post mortem brain tissue of ASD patients compared to control individuals and mechanistic studies suggested its role in regulating the level of the MOESIN protein. Moreover, mutations in an X-chromosome gene
PTCHD1 (
Patched Domain Containing 1) were reported in several families with ASD and intellectual disability. Interestingly, deletions of the 5’-flanking region of the gene, which contains a non-coding RNA, were detected in several males with ASD but not in controls [
61]. Non-protein-coding antisense transcripts are reported in the Fragile X Mental Retardation gene (
FMR1) locus. Fragile X syndrome (FXS) is the leading genetic cause of autism and intellectual disability among boys [
62]. Although FXS is considered a monogenic disorder, there is evidence that supports an alternative model in which other ncRNAs contribute to FXS pathogenesis and to the observed phenotypic variations among patients [
7,
63]. We previously reported an ncRNA transcribed from the
FMR1 locus,
FMR4, a 2.4 kb primate-specific transcript that resides upstream of
FMR1 and may have an anti-apoptotic function [
7]. Therefore, the abundance of RNA produced by transcriptional events from nearly every region of the genome, combined with the enrichment of ncRNA transcripts in the central nervous system, makes regulatory RNAs a prime target for mechanistic studies of neurodevelopmental disorders.
In the current study, we explored and validated the expression of ncRNAs in several reported ASD-related genomic loci utilizing bioinformatics and molecular biology approaches. Our bioinformatics pipeline allowed us to identify 71 noncoding antisense RNAs that overlap 38 of 103 genes previously implicated in ASD. These findings indicate that a large proportion of genomic loci implicated in ASD have a complex structure with transcription arising from both the plus and minus strands of DNA. Antisense transcripts can exert regulatory roles on gene expression in
cis and
trans and can be affected by mutations. Knockdown or blockade of endogenous antisense transcripts can have multiple outcomes, with the corresponding sense transcript concentration showing either an increase (discordant regulation) or a decrease (concordant regulation). It has been proposed that discordant de-repression of sense transcript expression, resulting in upregulation of sense RNA expression, can be achieved by removal or steric blockade of many but not all antisense transcripts. Here, we noticed that two exonic antisense RNAs,
SYNGAP1-AS and
PQBP1-AS, have tissue expression patterns that are discordant to that of their protein-coding partners, whereas two other promoter-associated antisense RNAs,
NIPBL-AS and
FOXG1-AS, have concordant tissue expression patterns with their sense genes. These findings suggest a possible functional regulation exerted by these antisense RNAs on their sense counterparts, a phenomenon already described for a subset of sense-antisense pairs [
21]. Discordant pairs might interfere with transcription initiation from the opposite strand, alter the epigenetic structure of the locus, or form double-stranded RNAs. Concordant pairs may share the same regulatory elements, the antisense transcript may alter the stability of the sense mRNA, or the sense and antisense transcripts may be co-regulated, as recently described for the majority of divergently transcribed long ncRNA/mRNA gene pairs expressed during embryonic stem cell differentiation [
41]. Thus, the presence of these transcripts in several ASD candidate genes suggests a complex genomic structure at these loci and warrants functional studies that include both protein-coding genes and regulatory long noncoding antisense transcripts.
We demonstrated that 12 of the 18 randomly selected antisense RNAs overlapping ASD candidate genes (ASD-NATs) are expressed in the human brain, where they can show region-specific expression, suggesting a possible region-specific function of these RNAs. Differential expression analysis of NATs in the PFC, STG and cerebellum revealed a significant increase in SYNGAP1-AS expression in the PFC and STG of autistic patients compared to control individuals. We also observed a statistically significant negative correlation between SYNGAP1-AS and SYNGAP1 expression in the PFC of non-ASD individuals and a similar trend in the PFC of ASD patients. These data, together with the observed discordant regulation of SYNGAP1-AS and SYNGAP1 mRNA, suggest a possible scenario in which upregulation of antisense RNAs leads to dysregulation of protein-coding gene expression.
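To illustrate how a negative sense-antisense relationship of this kind can be quantified, the sketch below computes a Pearson correlation coefficient across samples and applies a coarse label. The type of correlation used in the study is not specified here, so Pearson is an assumption; the 0.5 threshold, the function names and the expression values are hypothetical, and no significance test is performed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally sized lists of values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def classify_pair(r, threshold=0.5):
    """Coarse label for a sense-antisense pair; the threshold is arbitrary."""
    if r <= -threshold:
        return "discordant"
    if r >= threshold:
        return "concordant"
    return "unclear"


# Hypothetical relative expression values in five samples.
antisense = [1.0, 2.0, 3.0, 4.0, 5.0]
sense = [10.0, 8.0, 7.0, 5.0, 4.0]

r = pearson_r(antisense, sense)
label = classify_pair(r)
print(round(r, 3), label)
```

A correlation across samples only describes co-variation; showing that the antisense transcript actually regulates its sense partner would require perturbation experiments such as knockdown.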
Many noncoding RNAs function at the chromatin level, acting as scaffolds for the recruitment of functionally related epigenetic enzymes to specific loci [
35,
64‐
66]. The expression of these ncRNAs is usually restricted to the nucleus and chromatin, where they exert their function. RNA sequencing of the cytoplasm, nucleoplasm and chromatin fractions of the SH-SY5Y neuroblastoma cell line showed that some ASD-NATs localize clearly to the nucleoplasm or chromatin. This distinctive subcellular localization implies that these antisense RNAs may have functional roles in the nucleus and further supports their functionality in the cell. Among the chromatin-associated antisense RNAs, we found
SYNGAP1-AS, providing additional support to the hypothesis that this NAT might have a regulatory function on its sense mRNA partner by mediating the epigenetic modifications of the regulatory elements controlling
SYNGAP1 expression.
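One simple way to summarize localization from fractionated RNA-seq data is to express each compartment's FPKM as a share of the total for that transcript. The sketch below does this under stated assumptions: the FPKM values are hypothetical, the 0.5 cutoff is arbitrary, and the calculation ignores differences in RNA content between compartments, so it should not be read as the procedure used in this study.

```python
def compartment_fractions(fpkm_by_compartment):
    """Convert per-compartment FPKM values into fractions of the transcript's total."""
    total = sum(fpkm_by_compartment.values())
    if total == 0:
        return {name: 0.0 for name in fpkm_by_compartment}
    return {name: value / total for name, value in fpkm_by_compartment.items()}


def predominant_compartment(fpkm_by_compartment, min_fraction=0.5):
    """Return the compartment holding at least min_fraction of the signal, else None."""
    fractions = compartment_fractions(fpkm_by_compartment)
    name, fraction = max(fractions.items(), key=lambda item: item[1])
    return name if fraction >= min_fraction else None


# Hypothetical FPKM values for two transcripts.
chromatin_enriched = {"cytosol": 1.0, "nucleoplasm": 2.0, "chromatin": 7.0}
evenly_spread = {"cytosol": 3.0, "nucleoplasm": 3.0, "chromatin": 4.0}

print(predominant_compartment(chromatin_enriched))
print(predominant_compartment(evenly_spread))
```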
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DV developed the bioinformatics pipeline, performed data mining, qRT-PCR, RNA-seq analysis and statistical analysis, prepared the figures and drafted the manuscript. MM processed clinical samples, performed RNA extraction, helped in the coordination of the project and drafted the manuscript. MAF conceived of the study, helped in its design and coordination, drafted the manuscript and performed RNA sequencing. All authors read and approved the final manuscript.