Background
Only approximately 2% of the human genome encodes proteins. However, based on ENCODE data, approximately 80% of the human genome is now thought to be functional. This 80% contains regulatory elements as well as noncoding RNA genes [1]. The noncoding RNAs discovered so far include miRNAs (microRNAs), LncRNAs (long noncoding RNAs), and pseudogenes [2-4].
Compared with the widely studied miRNAs, the functions and mechanisms of LncRNAs and pseudogenes have not been fully elucidated [3, 4]. LncRNAs are noncoding RNAs longer than 200 nucleotides. Increasing evidence shows characteristic abnormal expression of LncRNAs in many tumours [5]. LncRNAs regulate oncogenes and tumour suppressor genes, and thus affect the phenotype of cancer cells and biological behaviours including proliferation, differentiation, invasion, and angiogenesis [5]. Pseudogenes, on the other hand, have DNA sequences similar to those of coding genes but have lost the original functions because of mutations [6]. Nevertheless, a growing number of studies have shown that pseudogenes have important biological functions [7, 8]. Pseudogenes have been described as miRNA sponges and ceRNAs (competing endogenous RNAs) that regulate other genes. There are likely many additional, currently unexplored mechanisms by which pseudogenes act [9]. Pseudogenes can also give rise to endogenous small interfering RNAs that inhibit the expression of functional genes [4].
The role of LncRNAs and pseudogenes in renal cell carcinoma (RCC) has been reported but not yet fully elucidated [9]. RCC is the most common primary renal neoplasm. Worldwide studies have indicated increasing incidence and mortality of RCC [10]. Approximately one-third of RCC patients present with advanced cancer at the time of diagnosis, and almost half of patients will develop metastatic RCC [11]. In addition, patients with advanced RCC have a poor prognosis, as RCC is resistant to chemotherapy and radiotherapy [12]. Thus far, the value of molecular markers of RCC for early diagnosis and prognosis remains controversial [13]. It is therefore essential to better understand RCC and to develop new molecular markers.
In the present study, we analysed months of survival (MS) and months disease-free (MDF) in RCC in relation to alterations of LncRNAs and pseudogenes. We also identified signatures of LncRNAs and pseudogenes and investigated their potential benefit, based on the data in the cBioPortal database [14]. We then validated the relative levels of these LncRNAs and pseudogenes in the serum of 32 patients. Our findings suggest that 6 of these can be non-invasive biomarkers of RCC. Among all the genes, PIK3CD-AS1 is the only one that is closely related to all of the important clinical features examined. The characteristics of PIK3CD-AS1 in RCC also suggest that it may promote metastasis.
Discussion
Few LncRNAs and pseudogenes have been characterized, although increasing numbers of them have been identified. In addition, few have been included in signatures for the diagnosis and prognosis of RCC. We took advantage of the provisional dataset in cBioPortal, which includes data for LncRNAs and pseudogenes as well as clinical features.
The provisional dataset of RCC includes 538 cases and provides mRNA data in 534 cases, as well as complete data in 446 cases. In all cases, alterations of 2553 LncRNAs and 8901 pseudogenes, including mutation, copy number alteration and expression, were investigated. We found that some of the LncRNAs and pseudogenes were closely related to survival and recurrence. Among them, we included a few genes in the signatures based on the Cox model. These signatures have several characteristics. First, every gene in a signature can separately predict the survival and recurrence of RCC, and the signatures that combine the genes are considered to be more accurate based on the P-values. Second, the signatures are based on the large sample dataset described above. Third, we have different signatures of LncRNAs and pseudogenes for the prediction of overall survival and of recurrence. Together, these points suggest that the signatures might work as potential prognostic markers and are worth further investigation.
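To illustrate how a Cox-model-based signature can be turned into a patient-level prediction, the following is a minimal sketch in Python. It assumes the common approach of summing each gene's expression weighted by its Cox coefficient and splitting patients at the median score. The gene names, coefficients, expression values and the median cut-off are hypothetical placeholders, not values or procedures taken from this study.

```python
from statistics import median

# Hypothetical Cox coefficients (log hazard ratios); NOT values from this study.
COEFFS = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}


def risk_score(expression, coeffs=COEFFS):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in coeffs.items())


def stratify(patients, coeffs=COEFFS):
    """Label each patient 'high' or 'low' risk, using the median score as cut-off."""
    scores = {pid: risk_score(expr, coeffs) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression values for four illustrative patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
    "P2": {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 2.0},
    "P4": {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_C": 0.5},
}
groups = stratify(patients)
```

A positive coefficient raises the score as expression increases, whereas a negative coefficient lowers it, so a combined score can capture genes with opposite prognostic directions. The resulting high-risk and low-risk groups would then be compared for MS or MDF.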
Molecular biomarkers are currently under investigation in RCC, but biomarkers for therapy have not yet been established [19]. Previous studies focused on VEGF and cytokines. For example, clinical research on sorafenib suggested that VEGF is an important molecular marker for progression-free survival and overall survival in advanced RCC cases [20]. Patients are reported to have a better prognosis if they have lower expression of interleukin 6 and hepatocyte growth factor [21]. On the other hand, another study showed the limits of cytokines in RCC [22]. Among other biomarkers, high levels of HIF-2α alone may indicate resistance to most of the targeted therapies [23].
An increasing number of LncRNAs and pseudogenes have been identified as prognostic markers in human cancers. For instance, increased serum MALAT1 indicated a poor prognosis in gastric cancer, and further research confirmed that knockdown of MALAT1 inhibited cell growth and invasion [24]. LINC01133 was considered an inhibitor of EMT and metastasis that acts by directly targeting SRSF6. Based on a clinical study, LINC01133 may be a valuable biomarker and a therapeutic target worth further investigation [25]. Increasing research also suggests that pseudogenes play important roles in the pathogenesis and progression of cancer. Chen X uncovered the role of the pseudogene CTNNAP1 and its cognate gene CTNNA1 in colorectal cancer [26]. Another study found plasma INTS6P1 useful for identifying and screening hepatocellular carcinoma (HCC). Lower expression of plasma INTS6P1 was revealed in HCC. The authors suggested that INTS6P1 might be a valuable biomarker in HCC when the AFP level is lower than 20 ng/ml [27].
Of note, in this study, we provide a number of LncRNAs and pseudogenes that can predict not only MS but also MDF. Based on the clinical features of the large number of cases in cBioPortal, 27 LncRNAs and 45 pseudogenes were selected after screening the entire database. They appeared to be closely related to both MS and MDF. These LncRNAs and pseudogenes are therefore thought to be valuable prognostic markers in RCC, as alterations in them were determined in a large number of clinical samples. We also studied clinical features other than prognosis, focusing on metastasis, pathologic tumour stage and tumour pathologic PT. According to this analysis, some genes were closely related to one of the three clinical features. Interestingly, PIK3CD-AS1 is the only one that is related to all three clinical features. PIK3CD-AS1 might be a promising LncRNA in RCC, as upregulation of PIK3CD-AS1 might increase invasion ability and be related to poor prognosis. In addition, PIK3CD-AS1 might be involved in multiple biological processes, including P53 signalling, the PI3K/AKT/mTOR pathway, and RAS/RAF/MEK/ERK signalling. This analysis thus provides new insights into the mechanism of PIK3CD-AS1-related poor prognosis in RCC, and these signalling pathways are a starting point for learning more details of the mechanism.
We defined four different signatures of LncRNAs and pseudogenes, which separately predict MS and MDF. Although these signatures of LncRNAs and pseudogenes in RCC have not been validated, their associations with cancer death or recurrence are clear. We also entered a reported serum-circulating LncRNA signature into this database and found that it was not related to MDF. Several factors may contribute to the conflicting results. First, that LncRNA signature is based on serum samples rather than tissues, as our signatures are. Second, adding any LncRNA that is not related to MDF to a signature significantly decreases its predictive ability. Finally, that signature was designed to discriminate clear cell RCC (ccRCC) patients from healthy controls, and there are not enough data and analysis to support its association with MS and MDF. Other signatures, including miRNA and coding gene signatures, were also analysed in the database. Although they might predict MS and MDF, we found our signatures of LncRNAs and pseudogenes to be more beneficial. First, the P-values of our signatures were much lower than those of the others, suggesting that our signatures are more dependable. Second, the miRNA alterations in the signature occurred at a lower level; thus, the miRNA signature might be difficult to detect in most clinical cases, which would hinder effective diagnosis and analysis. Third, the pseudogene signatures, which have never been reported before, might introduce new methods to diagnose RCC by detecting them in serum and urine. We therefore further determined the levels of these LncRNAs and pseudogenes in serum. Although only a few of them were detectable in serum and found to be significantly different, this result is of great interest given the potential clinical roles of these LncRNA and pseudogene signatures. The increased levels of six LncRNAs and pseudogenes therefore suggest a novel, effective, non-invasive method to diagnose RCC.
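The comparison of signatures by P-value can be made concrete with a short sketch. The text does not state which test produced the P-values; the example below assumes a two-group log-rank test, which is commonly used to compare survival between altered and unaltered (or high-risk and low-risk) groups. The function name `logrank_test` and all follow-up times are illustrative assumptions, not data from this study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up in months (e.g. MS or MDF)
    events_* : 1 if death/recurrence was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up follow-up times (months); every patient has an observed event.
high_risk = [2, 3, 4, 5, 6]
low_risk = [20, 21, 22, 23, 24]
chi2, p = logrank_test(high_risk, [1] * 5, low_risk, [1] * 5)
```

Under this assumption, a signature that separates patients into groups with clearly different survival yields a large chi-square statistic and a small P-value, whereas a signature that does not separate them yields a P-value close to 1.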
Increasing evidence suggests that pseudogenes play important roles in cancers. For instance, alterations in the expression of OCT4 pseudogenes (OCT4-pg) in different cancers and pluripotent cell lines were observed [28]. In 2007, Lin [29] found that OCT4-pg could inhibit the growth and differentiation of mesenchymal stem cells. In human glioma and breast cancers, expression of OCT4-pg was not observed; however, expression of oct4-pg was confirmed and important roles were uncovered [7, 8, 30]. Kastler found that the pseudogene oct4-pg1 was a member of the Oct4 family and the only one that was expressed in prostate cancer cells. In addition, oct4-pg1, which encodes a protein containing 359 amino acids, maintains the unlimited proliferation and self-renewal of cancer cells. Pseudogenes regulate the expression of functional genes by competitive binding with miRNA to inhibit or promote the occurrence of cancer. For instance, pseudogene TUSC2p1 protects the expression of tumour suppressor gene TUSC2 by competitive binding with miRNA, and thus inhibits the proliferation of breast cancer cells [31].
In summary, this study provides a valuable approach for screening the increasing numbers of LncRNAs and pseudogenes. With this public dataset, which includes extensive clinical features, researchers can easily identify the LncRNAs and pseudogenes closely related to overall survival and disease-free months, and can then focus on the few with valuable clinical significance. The signatures that we found based on this dataset provide new insights into the diagnosis and prognosis of RCC. Finally, given that PIK3CD-AS1 is related to all three clinical features, we expect that it may be a specific therapeutic target in RCC.
Authors’ contributions
BC, WH and TG drafted and revised the paper. CW and JZ developed the design, analysed the data and drafted the paper. YZ was involved in the analysis and interpretation of data. All authors read and approved the final manuscript.