Multiple lncRNAs increase the invasiveness of cancer cells and facilitate metastasis. Examples include H19 [
114], MALAT1 in colorectal and nasopharyngeal carcinoma [
115], SPRY4-IT1 in melanoma [
106], HOTAIR [
116], AFAP1-AS1 [
117], and CCAT2 [
118] in lung cancer, lincRNA-RoR in breast cancer [
119], LEIGC in gastric cancer [
120] and lncRNA-ATB in hepatocellular carcinoma [
121]. Of these, only lincRNA-RoR and lncRNA-ATB have a suggested mechanism of action in tissue invasion. lincRNA-RoR likely serves as a “sponge” for miR-145, which regulates ADP-ribosylation factor 6, a protein involved in the invasion of breast cancer cells [
119]. Similarly, lncRNA-ATB acts as a ceRNA that sequesters the miR-200 family, relieving repression of its targets ZEB1 and ZEB2, two transcription factors that promote cell motility and metastasis [
121].
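The "sponge" (ceRNA) mechanism described above depends on the lncRNA carrying sites complementary to the miRNA seed region. The following is a minimal sketch of how candidate seed sites could be located in a transcript, assuming the common convention that the seed spans nucleotides 2-8 of the miRNA. The sequences `toy_mirna` and `toy_transcript` are invented for illustration and are not the real miR-145, miR-200 or lncRNA sequences; the function names are likewise hypothetical.

```python
from typing import List

# Watson-Crick pairing for RNA (G:U wobble pairs are ignored in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, transcript: str) -> List[int]:
    """Return 0-based start positions in `transcript` that are perfectly
    complementary to the miRNA seed (assumed here: nucleotides 2-8)."""
    seed = mirna[1:8]                 # 7-nt seed, miRNA positions 2-8
    site = reverse_complement(seed)   # sequence the transcript must contain
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        start = transcript.find(site, start + 1)  # allow overlapping hits
    return positions


# Invented toy sequences, for illustration only.
toy_mirna = "AUGCCGUAACGGAUUCGAGCUA"
toy_transcript = "GGAUACGGCAUUCCGAUACGGCAAG"

print(seed_sites(toy_mirna, toy_transcript))
```

A transcript with several such sites is only a candidate sponge; perfect seed complementarity alone does not demonstrate functional sequestration, which requires experimental validation.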
lncRNAs are also involved in a number of other cancer-related processes. Some lncRNAs promote a metabolic switch to glycolysis and lactic acid fermentation, termed the Warburg effect [
122]. lincRNA-p21 regulates the Warburg effect by preventing ubiquitination of hypoxia-inducible factor-1 (HIF-1), a key transcription factor that promotes upregulation of glycolysis and downregulation of oxidative phosphorylation [
123]. Several lncRNAs have been reported to be essential for DNA repair by homologous recombination (HR): ANRIL, PCAT1 and DDSR1. Although the mechanism of ANRIL in HR remains unknown, PCAT1 post-transcriptionally represses BRCA2 [
124], while DDSR1 is suggested to interact with BRCA1 [
125]. Finally, lncRNAs can influence the response to cancer therapy by regulating the expression of drug exporters. For example, MRUL promotes expression of ABCB1, which is essential for multidrug resistance in gastric cancer cell lines [
126].
Novel techniques for lncRNA interrogation
The number of annotated transcribed genomic elements has increased by 100 % in the last decade; most of these lie in the non-coding space, and fewer than 1 % have a defined function [
56]. Such a vast number of novel genetic players presents great potential for clinical applications, especially in view of cancer as a genomic disease. However, it also requires a thorough rethinking of our basic premises on biological systems, pathway structure and information transfer, as well as a clear technological strategy for identifying their function.
The first challenge is the lack of an exhaustive definition of the full cancer transcriptome, regardless of the cell or tissue type. Currently, a major obstacle to analysis of the cancer transcriptome is the alignment of sequence reads to the consensus human genome. Ideally, all reads would be aligned to a genome sequenced by single-molecule DNA sequencing, but the cost and quality of this technology still keep it out of mainstream research. The next issue is the limited dynamic range of transcript detection in RNA sequencing. This can already be addressed with the recently developed CaptureSeq method for targeted enrichment of transcripts from specific regions of interest [
41]. Furthermore, long-read sequencing will be essential for the discovery of lncRNA isoforms and novel exons [
127]. In combination with single-cell sequencing, it will allow identification of individual lncRNA species from cancer subpopulations, avoiding the heterogeneity of tissue mixtures.
After defining the non-coding elements of the transcriptome, the second challenge is the systematic identification of lncRNA properties that could point to their cellular function. This can be achieved by investigating their location in cellular compartments, their structural properties and their possible interactors.
Quantified localisation of lncRNAs through microscopy techniques can provide important information about their properties. RNA-FISH, an established technique for RNA localisation, has recently been used to identify the subcellular location of multiple lncRNAs, in addition to their expression across a population of cells, spatio-temporal behaviour and coexpression with proximal mRNAs [
128].
The structure of biological molecules is vital to their function, and several techniques have been developed to investigate the secondary and tertiary structures of lncRNAs. Techniques such as Parallel Analysis of RNA Structure (PARS) [
129] and Fragmentation Sequencing (FragSeq) [
130] sequence RNAs after specific cleavage of single-stranded (FragSeq) or both single- and double-stranded (PARS) nucleic acids, allowing identification of loops in RNA structure. Another way to investigate structure is to tag the flexible 2′-hydroxyl groups in the RNA backbone by Selective 2′-Hydroxyl Acylation analysed by Primer Extension (SHAPE) [
131]. Finally, similarly to DNA, RNA can carry chemical modifications that alter its structure and binding properties. Two established methods can be used to identify methylated RNA sites: Methylated RNA Immunoprecipitation with next-generation sequencing (MeRIP) [
132], or its adaptation for hydroxymethylcytosine sites, hMeRIP [
133]. Another common RNA modification is the conversion of adenosine to inosine, which can be detected by inosine chemical erasing sequencing (ICE-seq) [
134].
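The PARS readout described above, with separate cleavage of double- and single-stranded regions, is commonly summarised per nucleotide as a log ratio of the two read counts. The sketch below assumes that form of the score, with `v1_counts` holding reads from the double-strand-specific cleavage and `s1_counts` holding reads from the single-strand-specific cleavage. The function name, the pseudocount and the example counts are illustrative assumptions, and the library-size normalisation that a real analysis would need is omitted.

```python
import math
from typing import List, Sequence


def pars_scores(v1_counts: Sequence[float],
                s1_counts: Sequence[float],
                pseudocount: float = 1.0) -> List[float]:
    """Per-nucleotide log2 ratio of double-strand to single-strand reads.

    Positive values suggest a paired (double-stranded) position,
    negative values an unpaired one (e.g. a loop). The pseudocount is an
    assumption added here to avoid division by zero.
    """
    if len(v1_counts) != len(s1_counts):
        raise ValueError("count vectors must cover the same positions")
    return [
        math.log2((v + pseudocount) / (s + pseudocount))
        for v, s in zip(v1_counts, s1_counts)
    ]


# Invented counts for three positions, for illustration only.
print(pars_scores([7, 0, 3], [1, 3, 3]))
```

Positions with few reads in both libraries give scores near zero that carry little information, so a coverage threshold would normally be applied before interpreting the profile.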
The function of lncRNAs can also be assessed by identifying their binding partners, using a method suited to the type of interaction. Binding of RNA to DNA or proteins can be assessed with ChIRP-seq or ChIRP-MS, respectively (Chromatin Isolation by RNA Purification followed by sequencing or mass spectrometry) [
135,
136]. The specificity of ChIRP is ensured by selecting only those RNAs that are bound by biotinylated oligonucleotides, similar to RAP [
137] and CHART [
138], as well as by crosslinking of RNA with DNA or protein by UV or formalin. A recent modification to the protocol can detect individual RNA domains that interact with DNA, RNA or proteins [
139]. Instead of biotinylated oligonucleotides, RNA-guided chromatin conformation capture (R3C) reverse-transcribes RNA bound to DNA into cDNA with biotin labelling and joins it with the adjacent genomic DNA with T4 DNA ligase, allowing for streptavidin selection and sequencing [
140]. Furthermore, identification of lncRNAs that bind to a protein of interest such as PRC2 [
141] can be performed through RNA Immunoprecipitation [
142], which was later coupled with sequencing (RIP-Seq) [
65]. The specificity of RIP has been improved by UV crosslinking of RNA and protein in Cross-Linking ImmunoPrecipitation (CLIP) [
143] and its later modifications with sequencing (HITS-CLIP) [
144] and iCLIP [
145]. Finally, the affinity of a protein for multiple RNAs can be assessed in a high-throughput manner. This can be achieved either on a microfluidic platform by RNA Mechanically Induced Trapping of Molecular Interactions (RNA-MITOMI) [
146], or on a flow cell by RNA on a massively parallel array (RNA-MaP) [
147].
The final challenge in defining lncRNA functions is developing loss- and gain-of-function studies for lncRNAs. RNA interference technology is being supplemented by the powerful CRISPR/Cas9 system, a newly developed genome-editing technology that allows easier manipulation of lncRNA behaviour [
148]. CRISPR allows multiple types of manipulation, from deletion of various parts of genomic lncRNA loci to insertion of promoters and novel exons. A recent modification of the CRISPR technique, developed in the Rinn group, allows insertion of RNA domains into genomic loci, enabling identification of the in cis behaviour of lncRNAs [
149].