Background
Acute myeloid leukemia (AML) is a heterogeneous disease at both the molecular and phenotypic levels, caused by malignant transformation of hematopoietic progenitor cells. During pre-leukemic evolution and disease progression, affected hematopoietic cells gradually accumulate a range of molecular alterations, including somatic mutations, cytogenetic abnormalities, epigenetic alterations, and transcriptomic changes [
1,
2]. Numerous recurrent point mutations, epigenetic changes, and cytogenetic abnormalities have been identified through next generation sequencing technology [
1,
3]. Cytogenetics together with mutation status of
NPM1,
CEBPA, and
FLT3 internal tandem duplications (FLT3-ITD) form the basis of the European LeukemiaNet (ELN) risk classification system [
4], which provides a means of risk stratification for AML patients. However, almost half of patients are classified into the intermediate risk group. Further improvements in the risk stratification of AML patients could therefore support better therapy decisions.
Long non-coding RNAs (lncRNAs) are defined as transcribed RNA molecules longer than 200 nucleotides that do not code for protein. It has been estimated that more than 58,000 lncRNAs are encoded in the human genome [
5,
6]. LncRNAs are involved in a multitude of biological processes that are central in tumorigenesis and progression of cancer, including cell cycle regulation, proliferation, apoptosis, migration, and genomic stability [
5,
7]. LncRNAs have multiple modes of action, including involvement in controlling chromatin condensation, regulation of transcription, regulation of RNA splicing, controlling RNA stability, and promoting or inhibiting translation of mRNAs to proteins [
8].
Most large-scale genomic analyses of cancer patient data have focused on the protein-coding regions of the genome. However, estimates from the ENCODE study suggest that up to 75% of the human genome is transcribed into RNA, whereas only about 3% of the human genome is protein coding [
9,
10]. LncRNAs are a class of non-coding RNAs that several recent discoveries have linked to cancer [
11‐
13]. For example,
HOX transcript antisense intergenic RNA (HOTAIR) is known to act as an epigenetic regulator in breast and colorectal cancer [
14‐
16]. Several other lncRNAs are known to play a functional role as oncogenes or tumor suppressors and have clear prognostic potential [
14,
17]. Multiple studies have highlighted the role of lncRNA in hematopoietic cellular development and malignancies. In T cell acute lymphoblastic leukemia (T-ALL), the lncRNA LUNAR1 (leukemia-induced non-coding activator RNA) promotes cell growth via enhanced
IGF1R expression [
18]. The IRAIN lncRNA, located within the
IGF1R locus, directly interacts with the
IGF1R promoter [
19]. IRAIN has been shown to be downregulated in leukemia cell lines and in high-risk AML patients. Garzon et al. [
7] have previously reported lncRNA expression results from a study consisting of cytogenetically normal acute myeloid leukemia (CN-AML) patients using a custom microarray platform for lncRNA expression profiling, with a focus on assessing association with routine clinical phenotypes and mutations. In that study, lncRNAs were reported to be associated with recurrent mutations in several genes in CN-AML patients, including
NPM1,
CEBPA,
IDH2,
ASXL1, and
RUNX1, and FLT3-ITD [
7,
20]. LncRNA expression has previously also been shown to be associated with treatment response and survival in several other cancer types [
5,
21‐
23].
Despite growing evidence for the potential importance of lncRNAs as prognostic and diagnostic markers across a multitude of cancers, including AML, lncRNA expression in AML has not yet been comprehensively characterized with a focus on ascertaining whether prognostic lncRNA-based AML subtypes exist. In this study, we applied whole-transcriptome RNA sequencing (RNA-seq) with the aim of identifying prognostic lncRNAs, defining novel lncRNA-based AML subtypes, and ascertaining their prognostic value and relevance for risk stratification of AML patients. Furthermore, the novel lncRNA expression-based subtypes were validated in an independent patient cohort.
Discussion
The present study is the most comprehensive lncRNA expression study in AML to date. We characterized lncRNA expression using RNA sequencing in a cohort of 274 AML patients (data included in Additional file
6) with the aim of determining whether individual lncRNAs were associated with AML outcome and whether lncRNA-based prognostic subtypes of AML could be defined. The findings were subsequently validated in the independent TCGA-AML cohort (Additional file
7).
In the Clinseq-AML cohort, 33 individual lncRNAs were found to carry independent prognostic information, and four robust lncRNA-based subtypes of AML were discovered that are prognostic of overall survival. Some of the established clinical and genetic factors of AML were found to be associated with the lncRNA expression subtypes, although the subtypes did not display a high degree of concordance with any of the clinical or genetic factors. Similarly, lncRNA-based subtypes were not found to be concordant with mRNA-based subtypes, suggesting that lncRNA expression represents an independent source of molecular information. Subtype G1 displayed the longest overall survival. This group is dominated by patients with intermediate ELN risk and normal karyotypes, and it harbors a high frequency of CEBPA double mutations. In de novo AML, CEBPA double mutations are known to have favorable prognostic significance [
27,
28]. Subtypes G2 and G3 represent prognostically poorer AML subtypes. Both of these subtypes have a high frequency of patients with intermediate risk based on the ELN risk classification. In comparison to subtype G1, they carry more cytogenetic abnormalities. Subtype G4 represents a group of AML patients with poor prognosis, with the highest frequency of TP53 single and double mutations. When we assessed the prognostic value of the lncRNA subtypes after accounting for ELN risk classification (which includes cytogenetic classification) and genetic mutations, the lncRNA subtype model was confirmed to provide significant independent prognostic value. We have also developed a subtype prediction biomarker panel consisting of 35 lncRNAs (Additional file
2), which provided classification equivalent to that of the full set of lncRNA features considered in this study and could be seen as a candidate biomarker panel for lncRNA-based subtyping in AML.
We have validated our lncRNA expression-based subtype model in the independent TCGA-AML cohort. Our results show that, as in the Clinseq-AML cohort, the lncRNA-based subtypes in the TCGA-AML cohort are significantly associated with overall survival. In particular, subtype G1 is associated with a more favorable outcome and subtype G4 with a worse outcome. These associations are evident in both cohorts even after adjusting for known prognostic factors through multivariate analysis.
The Clinseq-AML and TCGA-AML cohorts have a similar percentage of cytogenetically normal patients, 47.4% and 45.1%, respectively. Cytogenetic abnormalities such as del7 (9.9% in Clinseq-AML, 9.9% in TCGA-AML) and del5 (6.2% in Clinseq-AML, 5.6% in TCGA-AML) are distributed very similarly in the two cohorts. However, the frequencies of recurrent genetic abnormalities such as inv(16) (3.3% in Clinseq-AML, 7.7% in TCGA-AML) and inv(3) (1.8% in Clinseq-AML, 0% in TCGA-AML) differ between the cohorts. Notably, the Clinseq-AML cohort contains both de novo and non-de novo AML patients, whereas the TCGA-AML cohort consists entirely of de novo AML cases. We performed differential gene expression analysis between de novo and non-de novo samples in the Clinseq-AML cohort (Additional file
8). However, we did not find any significant difference in lncRNA expression pattern between de novo and non-de novo AML: no lncRNA was significantly differentially expressed (FDR < 0.05).
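To illustrate what an FDR threshold of 0.05 means in a differential expression setting, the following is a minimal sketch of the Benjamini-Hochberg adjustment. It assumes Benjamini-Hochberg as the FDR procedure, which is not stated here, and the p-values are hypothetical placeholders rather than results from this study; the function name is likewise illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Assumption: BH is used only as an example of an FDR procedure.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity: each q-value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)  # restore original order
    return adjusted

# Hypothetical per-lncRNA p-values (not data from this study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
significant = q_values < 0.05  # lncRNAs called differentially expressed
```

Under this procedure, a statement such as "no lncRNA was significant at FDR < 0.05" corresponds to every adjusted value being at or above 0.05, even if some unadjusted p-values fall below that threshold.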
We stress that the Clinseq-AML and TCGA-AML cohorts differ in several respects, including sequencing protocol, batch effects, and the frequency of recurrent genetic abnormalities, as discussed above. Our analysis shows that despite these sources of heterogeneity and cohort differences, lncRNA expression-based subtypes are consistent and significantly associated with survival. Previously, Garzon et al. [
7] studied lncRNA expression in cytogenetically normal acute myeloid leukemia (CN-AML) patients using a custom microarray platform, with a focus on assessing the association of lncRNAs with routine clinical phenotypes and mutations. In contrast, the present study contains a more representative set of AML patients and ascertains the presence of lncRNA-based molecular subtypes in AML. Furthermore, the present study is almost twice as large as the previously published study [
7], which included only CN-AML patients. We also note that RNA sequencing, which is employed here, provides a less biased and more comprehensive approach to lncRNA profiling than targeted microarray-based expression profiling, which may be limited by selection bias during design of the array. Despite such differences, similar to Garzon et al. [
7], our results show that pathways such as mRNA processing, immune system process, and chromosome organization are enriched in lncRNA subtypes G1, G3, and G4, respectively (Fig.
6 and Additional file
3).
We have also compared lncRNA expression-based subtypes with mRNA expression-based subtypes (C1 to C7). The mRNA subtypes were generated using the same methodology as lncRNA expression-based subtypes (for details, see Additional file
5). Our analysis shows that lncRNA-based subtypes are not directly correlated with mRNA-based subtypes and lncRNA subtypes provide independent prognostic information.
Although the present study is the largest lncRNA expression study in AML reported to date, its sample size might limit the discovery of additional lncRNA subtypes that are rare (i.e., present in a low proportion of AML patients), since too few representative cases would be present in this cohort. Furthermore, the RNA-seq-based lncRNA profiling method applied in this study has limitations in quantifying lncRNA molecules at very low abundance. These limitations could be addressed by using a larger sample size and deeper sequencing.