Background
Developments in transcriptomics have contributed to the discovery of a vast number of RNAs, the majority of which are non-coding RNAs [
1]. Long non-coding RNAs (lncRNAs) do not code for functional proteins, but are known to play significant regulatory roles that in turn affect various biological processes, including development, differentiation, and metabolism [
2‐
5]. While only a small number of lncRNAs have been extensively characterized, they are largely thought to act through interactions with other biomolecules in the cell: DNA, RNA and protein [
6‐
10]. Although lncRNA-miRNA and lncRNA-protein interactions have been discussed previously [
7,
11,
12], the interaction of lncRNAs with DNA has not been studied extensively [
13]. Previous studies have reported that the binding sites of the well-known lncRNAs HOTAIR and HOTTIP are enriched for a DNA sequence motif. Chromatin isolation by RNA purification followed by sequencing (ChIRP-seq) revealed that HOTAIR preferentially occupies a GA-rich DNA motif, leading to disruption in Polycomb occupancy and thereby regulating the chromatin state [
14]. In contrast, Xist does not interact with DNA directly; rather, it harnesses the sequence-specific YY1 transcription factor to attach to sites on the X chromosome [
14,
15]. The lncRNA Fendrr, which is expressed in the lateral mesoderm of mid-gestational mouse embryos, interacts with both PRC2 and TrxG/Mll complexes in vivo via dsDNA/RNA triplex formation at target regulatory elements, and directly increases PRC2 occupancy at these sites [
16]. In another example, the region between the minor and major transcript initiation sites (~400 bp) of the DHFR gene gives rise to a non-coding RNA transcript. This lncRNA transcript has been shown to repress transcription of the downstream protein-coding gene by forming a purine–purine–pyrimidine triplex motif with the DHFR promoter [
17,
18].
The lncRNA PARTICLE, which is expressed after low-dose radiation exposure, was reported to interact with the promoter of the tumor suppressor methionine adenosyltransferase (MAT2A) through triple helix formation. It was also observed to interact with the transcription-repressive complex proteins G9a and SUZ12 (a subunit of PRC2) and to cause transcriptional repression of the MAT2A gene via methylation of this promoter [
19]. Recent reports show the presence of GA-rich sequences at MEG3 binding sites, which modulate the interaction of the lncRNA through RNA–DNA triplex formation; this is assumed to be a characteristic of target gene recognition by chromatin-interacting lncRNAs [
20]. A similar report describes RNA:DNA–DNA triplex formation at the
SPHK1 (Sphingosine kinase 1) promoter by the antisense RNA khps1. This leads to upregulation of the
SPHK1 gene via histone acetylation and ultimately causes an increase in cell proliferation [
21]. Thus, lncRNAs could function through interactions with genomic DNA by forming DNA–RNA triplexes, in which the lncRNA acts as the third strand [
7].
Intermolecular triple helix formation has long been implicated as a possible mechanism for controlling cellular processes, such as inhibiting protein-DNA interaction, and functional processes including transcriptional regulation, chromatin modification, post-transcriptional RNA processing and DNA repair, as revealed mainly by in vitro experiments [
22]. A large number of proteins, such as helicases, heterogeneous ribonucleoproteins (hnRNP), cytoplasmic type III intermediate filaments (IF), transcription factors (TFs) and high mobility group (HMG) box proteins, as well as proteins involved in the cell cycle and DNA repair, have been shown to associate with triplex sites [
23]. Traditionally, the presence of triplexes in vitro has been investigated by gel retardation assays, circular dichroism (CD) and UV absorbance spectroscopy. A gel shift in the gel retardation assay, two melting peaks in UV melting and a sharp negative peak at 210 nm in CD spectroscopy are characteristic features that help in the detection and validation of triple helical structures in vitro [
22,
24]. A number of experimental approaches have been used in recent years to identify RNA–DNA interaction sites, such as chromatin isolation by RNA purification (ChIRP) [
25] and capture hybridization analysis of RNA targets (CHART) [
26]. These approaches are based on affinity capture of the target RNA:chromatin complex or RNA using designed antisense oligonucleotides, followed by high-throughput sequencing, which produces a map of genomic binding sites. Computational approaches, for instance Triplexator [
27], R-loop finder [
28] and Triplex-Inspector [
29], offer considerable promise for predicting such interactions computationally on a genome scale. Thus, emerging methodologies for discovering triplexes in the genome can aid the study of functional triplexes and the roles played by non-coding RNAs.
Depending on the sequence composition and relative orientation of the RNA strand (i.e. the third strand interacting with duplex DNA), triplex structures can form three types of motifs: (1) the pyrimidine motif (Y), in which the third strand is composed of pyrimidine (C, T) bases bound parallel to the purine strand of the DNA; (2) the purine motif (R), in which the third strand is composed of purine (A, G) bases bound antiparallel to the purine strand of the DNA; and (3) the mixed motif (M), in which guanines and thymines bind either parallel or antiparallel to the purines in the duplex [
24,
27]. Using an exhaustive computational approach, we screened the human genome for potential lncRNA-mediated triplexes to understand their possible functions.
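The three motif classes described above can be illustrated with a minimal sketch that labels a candidate third strand by base composition alone. This is a simplification for illustration and not the algorithm used by Triplexator or in this study: the function name `classify_motif` is hypothetical, strand orientation and mismatch tolerance are ignored, and uracil in RNA is assumed to stand in for thymine.

```python
from typing import Optional

# Bases permitted in the third strand for each motif class, following the
# definitions in the text. Order matters: homopolymers such as poly-T or
# poly-G fit two classes, and the first match wins (an arbitrary choice here).
MOTIF_ALPHABETS = [
    ("Y", {"C", "T"}),  # pyrimidine motif
    ("R", {"A", "G"}),  # purine motif
    ("M", {"G", "T"}),  # mixed motif
]


def classify_motif(third_strand: str) -> Optional[str]:
    """Return 'Y', 'R' or 'M' if the strand uses only that motif's bases, else None."""
    # Assumption of this sketch: RNA uracil is treated like thymine.
    seq = third_strand.upper().replace("U", "T")
    if not seq:
        return None
    bases = set(seq)
    for label, alphabet in MOTIF_ALPHABETS:
        if bases <= alphabet:
            return label
    return None
```

For example, `classify_motif("CUCUUC")` falls in the pyrimidine class and `classify_motif("GAGGAA")` in the purine class, while a strand containing all four bases is rejected. A realistic predictor would additionally score orientation, minimum length and tolerated interruptions.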
In the present study, we explore whether lncRNAs could modulate genomic regulation by interacting with DNA through the formation of highly stable DNA:DNA:RNA triplexes. The enrichment of potential triplex forming sequence stretches (PTS) in the promoters of genes suggests a role in gene regulation. The same was evident when we constructed a correlation network between lncRNAs and genes, consistent with the known role of some lncRNAs in the transcriptional and epigenetic regulation of genes. To the best of our knowledge, this is the first comprehensive genome-wide computational analysis of PTS mediated through lncRNAs.
Discussion
Current understanding of lncRNA function is limited to a few candidate lncRNAs compared with the large number of lncRNAs annotated to date [
40]. Increasing evidence suggests that lncRNAs play critical roles. Although they show low sequence conservation, their promoter regions are largely conserved relative to their exons, as in the case of mRNA promoters [
41,
42]. Current functional associations of lncRNAs rely largely on ‘guilt by association’ methods. The integration of omics datasets can help predict the potential functional roles of lncRNAs. For example, a diverse set of lncRNAs has been closely associated with p53 and found to be regulated by p53, in turn contributing to the maintenance of cellular stability [
43]. Computational approaches can provide valuable insights into the biogenesis, regulation and function of lncRNAs and could thus give considerable impetus to the field. Sequencing-based approaches for studying biomolecular interactions, whether protein:RNA, RNA:RNA or RNA:DNA, are now available and were described earlier in this manuscript.
Previous studies have identified a limited number of lncRNAs, such as MEG3, DHFR, FENDRR and HOTAIR, that can participate in triplex formation with duplex DNA. We employed a computational approach to screen the human genome for possible lncRNA-mediated triple helix formation. We screened 23,898 transcripts annotated as lncRNAs in the GENCODE annotation (v19) across the human genome (hg19) for potential triplex forming sequence stretches (PTS). The calculated PTS frequencies were compared across five major features, namely 5′UTR, CDS, 3′UTR, introns and promoter, as well as the 1000 bases downstream of the transcription termination sites. Additionally, these regions were annotated by mapping experimental regulatory regions, different classes of repeat regions and transcription factors (TFs) derived from the UCSC Table Browser. A number of lncRNAs have been shown to interact with transcription factors, as evident from their higher frequency in promoter regions. To elucidate possible functional roles of triplex-forming lncRNAs, we cross-checked them against the functionally annotated lncRNAs in the lncRNAdb database. Of the 184 lncRNAs in the lncRNAdb database (last accessed in April 2017) [
44], only nine showed triplex-forming capability, namely MEG3 (ENST00000453837.1; R motif), HOTAIRM1 (ENST00000434063.3; R motif), ATXN8OS (ENST00000414504.2; R motif), BCYRN1 (ENST00000418539.1; R motif), LINC00599 (ENST00000521242.1; Y motif), OTX2-AS1 (ENST00000534909.2; Y motif), TINCR (ENST00000448587.1; Y motif), SNHG16 (ENST00000448136.1; M motif) and NEAT1 (ENST00000501122.2; M motif). However, detailed functional analyses of many such genes are needed to arrive at a clearer picture of the roles of lncRNAs. The motif sequences of the previously known and unknown triplex-forming lncRNAs, categorized by the type of motif they form, are given in Table
3.
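The comparison of PTS frequencies across genic features described above can be sketched as a count of predicted sites per feature class, normalised by the total length of that class. The per-kilobase normalisation, the interval representation and all names here (`pts_density_per_kb`, the tuple layouts) are assumptions made for illustration; the text does not specify how the frequencies were computed.

```python
from collections import defaultdict


def pts_density_per_kb(features, pts_sites):
    """Count PTS overlapping each feature class, per kilobase of that class.

    features:  list of (feature_class, chrom, start, end), half-open intervals
    pts_sites: list of (chrom, start, end), half-open intervals
    Returns a dict mapping feature_class to PTS per kb.
    """
    counts = defaultdict(int)
    lengths = defaultdict(int)
    for cls, chrom, start, end in features:
        lengths[cls] += end - start
        for p_chrom, p_start, p_end in pts_sites:
            # Two half-open intervals overlap if each starts before the other ends.
            if p_chrom == chrom and p_start < end and p_end > start:
                counts[cls] += 1
    return {
        cls: 1000.0 * counts[cls] / lengths[cls]
        for cls in lengths
        if lengths[cls] > 0
    }
```

The nested loop is quadratic and only suitable for small inputs; a genome-wide analysis would normally rely on sorted intervals or an interval tree. A site spanning a feature boundary is counted once for each feature it touches.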
Table 3
Classification of previously known and unknown lncRNA into three types of motifs
R | | 10 | MEG3 [K] | 3 |
| | 3 | HOTAIRM1 [UK] | 6 |
| | ~1 | ATXN8OS [UK] | 10 |
| | | BCYRN1 [UK] | 6 |
Y | | ~1 | TINCR [UK] | 32 |
| | | OTX2-AS1 [UK] | 2825 |
| | | LINC00599 [UK] | 37 |
M | – | – | NEAT1 [UK] | 596 |
| | | SNHG16 [UK] | 204 |
| | | SCARNA9 [UK] | 63 |
| | | FMR1-AS1 [UK] | 118 |
Computationally, we found that promoter regions harbor more putative triplex sites, and we proceeded to validate the triplex-forming capability of selected promoter-interacting lncRNA at target gene promoters using biophysical techniques. Among the target genes, KIAA1324, for example, has been found to correlate with survival in certain carcinomas [
46] and may be important for cellular response to stress [
47]. This points to regulation of such genes by an lncRNA-mediated triplex structure at the promoter region, which could reveal the functional role of a novel lncRNA and serve as a valuable source of information.
Our analysis suggests significant enrichment of specific DNA-binding proteins, namely NRSF and CTCF, at PTS sites. Incidentally, these proteins are also key components that participate in chromatin organization and regulation [
38,
39]. The transcriptional repressor NRSF/REST (repressor-element-1-silencing transcription factor) is responsible for inhibiting the expression of neuron-specific genes. Guardavaccaro and colleagues proposed that degradation of this protein in the G2 phase of the cell cycle is necessary to derepress genes involved in mitosis. They showed that degradation of REST by the ubiquitin ligase SCF β-TrCP in G2 phase allows transcriptional derepression of Mad2, an essential component of the spindle assembly checkpoint. CTCF (CCCTC-binding factor) has previously been shown to be involved in the regulation of transcription by binding to chromatin insulators, thereby preventing direct interaction between promoters and enhancers/silencers. More recently, Ong and Corces highlighted the involvement of CTCF in forming boundaries between topologically associating domains in chromosomes, which facilitates interactions between transcription regulatory sequences. Our independent analysis of PTS sites at interaction domains indicates significant enrichment, suggesting that one of the major roles of PTS-forming lncRNAs could be in chromatin organisation through close association with CTCF/NRSF proteins.
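One common way to assess an enrichment of the kind described above is a one-sided hypergeometric test on the overlap between PTS-containing regions and regions bound by a given protein. The text does not state which statistical test was used, so the choice of test, the function name `hypergeom_enrichment_p` and the way regions are counted are illustrative assumptions only.

```python
from math import comb


def hypergeom_enrichment_p(total, bound, pts, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    total:   number of genomic regions considered
    bound:   regions carrying a binding site for the protein of interest
    pts:     regions containing at least one PTS
    overlap: regions that are both bound and PTS-containing
    """
    upper = min(bound, pts)
    # math.comb returns 0 when k > n, so impossible configurations drop out.
    tail = sum(
        comb(bound, k) * comb(total - bound, pts - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, pts)
```

A small p-value indicates that the observed overlap is larger than expected if PTS were distributed independently of protein binding. In practice, multiple-testing correction across proteins and an appropriate choice of background regions would also be needed.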
Further studies will be necessary to fully elucidate and validate the functional interactions predicted in our analysis. With advanced high-throughput approaches, secondary structure, protein-binding motifs and other features of the primary sequence could be determined in detail, presenting a global landscape of elements in lncRNAs.
Conclusions
Our study focuses on the computational identification of potential triplex forming sites mediated through lncRNAs. In total, we screened 23,898 lncRNA transcripts for their PTS frequencies across five major genic features, namely 5′UTR, CDS, 3′UTR, introns and promoter, as well as the 1000 bases downstream of the transcription termination sites, and observed enrichment in promoter and intronic regions. Because the computational analysis revealed enrichment of PTS within gene promoter regions, we went on to validate the presence of triple helical structures formed by a lncRNA with the promoter regions of its three target genes, using biophysical methods including gel retardation, UV absorbance and CD spectroscopy. In addition, we observed enrichment of CTCF and NRSF at PTS sites. As these proteins are known to play crucial roles in chromatin organisation, we hypothesize that PTS could contribute to 3D chromatin organisation and its regulation through these proteins. Our study presents the genome-wide distribution of lncRNA-mediated PTS sites across the human genome and their possible functional roles.
Authors’ contributions
Study concept and design: VS. Acquisition of data: SJ. Analysis and interpretation of data: SJ. Experimental design and validation: AS, SM. Statistical analysis: SJ. Drafting of the manuscript: SJ, AS. Critical revision of the manuscript for important intellectual content: SJ, VS, SM. Study supervision: VS. All authors read and approved the final manuscript.