Background
Leukaemia, a cancer of the blood cells, is the most common type of cancer in children and adolescents. Leukemic cells arise from the abnormal, clonal proliferation of haematopoietic progenitor cells, which disrupts normal marrow function and leads to haematopoietic failure. In addition, leukemic cells move rapidly through the bloodstream and crowd out healthy blood cells, increasing the risk of infection and other complications. There are two main subtypes of paediatric acute leukaemia: the more common acute lymphoblastic leukaemia (ALL) and the rarer acute myeloid leukaemia (AML) [1]. ALL arises from the malignant transformation and aberrant proliferation of B-cell progenitors (about 85% of cases) or T-cell progenitors (about 15% of cases). B-cell acute lymphoblastic leukaemia (B-ALL) is the most common form of ALL; it is associated with distinct gene expression profiles and driven by three main types of initiating genetic alteration: (i) chromosomal aneuploidy, (ii) rearrangements that deregulate oncogenes or encode chimeric transcription factors, and (iii) point mutations [2]. T-cell acute lymphoblastic leukaemia (T-ALL) is less frequent than B-ALL and has a worse prognosis. Although current chemotherapy protocols and stem cell transplantation have achieved good results, about 20–30% of paediatric T-ALL patients relapse, with a 5-year survival of approximately 20% [3, 4]. Childhood T-ALL is characterised by recurrent alterations that mostly deregulate three pathways: (i) expression of T-lineage transcription factors, (ii) NOTCH1/MYC signalling, and (iii) cell-cycle control [5, 6]. The etiopathogenic mechanisms leading to leukemic transformation are still largely unknown, but genetic, immunologic, viral, and environmental factors have been implicated [7–9]. Today, the classification of ALL into risk groups is based on minimal residual disease, assessed by molecular biology and cytometry during treatment, combined with the analysis of genetic aberrations associated with poor prognosis (e.g., t(4;11) and t(17;19)) [10].
Childhood AML is a more heterogeneous disease associated with poor outcomes. It is characterised by the proliferation and aberrant differentiation of immature clonal myeloid cells [11]. This haematologic malignancy encompasses a wide spectrum of genomic insults and molecular alterations that influence clinical outcomes and provide potential targets for personalised therapy [12]. In both ALL and AML, classification into risk groups is the first and crucial step towards tailored patient management, because it guides the choice of the most appropriate treatment. The last decade has witnessed great advances in our understanding of the genetic and biological basis of childhood acute leukaemia, the improvement of experimental models to probe mechanisms and evaluate new therapies, and the development of more efficacious treatment approaches, such as the recently introduced molecularly targeted therapy and immunotherapy [13, 14]. The advent of high-throughput sequencing and bioinformatic approaches has revolutionised our understanding of the molecular taxonomy of childhood leukaemia [15]. These applications of next-generation sequencing (NGS) technology have uncovered considerable heterogeneity and molecular complexity within this paediatric haematological disease, based on the interplay of genomic mutations, epigenetic remodelling, transcriptome misregulation, and aberrant cell signalling and proliferation pathways [16]. Many of these alterations may have important implications for diagnosis, which highlights the value of implementing genome and transcriptome characterisation in the clinical management of acute leukaemia to support more accurate risk stratification and, in some cases, targeted therapy.
Recent transcriptome-wide gene expression studies have not only characterised the mRNA misregulation in ALL that results from aberrant functioning of transcription factors, epigenetic rearrangements, structural variants, or chromosome mutations [17], but have also uncovered significant relationships between long non-coding RNA (lncRNA) dysregulation and malignant haematopoietic transformation, with specific lncRNAs gaining interest as diagnostic biomarkers, novel therapeutic targets, and predictors of clinical outcomes [18, 19]. LncRNAs are transcripts that are usually longer than 200 nucleotides and lack an open reading frame. They can alter gene expression by acting on different steps of regulation, including chromatin modification, transcription, splicing, RNA transport, and translation [20, 21]. However, the precise role that lncRNA expression plays in the pathogenesis of paediatric ALL has been scarcely studied and remains poorly understood.
Here we present a transcriptome-wide analysis of polyadenylated long non-coding RNA profiles in B-ALL and T-ALL cases compared with a control population of normal cord blood-derived T and B cells. We identified a specific lncRNA signature that distinguishes B-ALL, T-ALL, normal B and T lymphocytes, and AML from one another.
Methods
Study population
The procedures followed in the present study are in line with the Declaration of Helsinki and have been approved by the local ethical committees of the IRCCS-SDN (Ethical Committee IRCCS Pascale, Naples, Italy; protocol number 5/19 of 19/06/2019) and the AORN Santobono-Pausilipon (Ethical Committee Cardarelli/Pausilion, Naples, Italy; protocol number 07/20 of 03/06/2020). Both parents signed informed consent, and all participants provided informed assent. All children were enrolled at the time of diagnosis. Patients' clinical features are presented in Tables 1 and 2 and Additional file 1: Dataset S1 (B-ALL patients) and in Tables 3 and 4 and Additional file 2: Dataset S2 (T-ALL patients).
Table 1
Clinical information of B-ALL patients used for RNA-seq experiment
Sex | Age (years) | Ethnicity | WBC at diagnosis |
F | 12 | Caucasian | 1750 |
F | 6 | Caucasian | 15,200 |
F | 4 | Caucasian | 29,820 |
F | 7 | Caucasian | 518,000 |
F | 14 | Caucasian | 16,570 |
F | 3 | Caucasian | 5960 |
M | 15 | Caucasian | 192,900 |
M | 17 | Caucasian | 131,000 |
M | 5 | Caucasian | 4500 |
Table 2
Clinical information of B-ALL patients used for validation experiments
Sex | Age (years) | Ethnicity | WBC at diagnosis |
M | 5 | Caucasian | 1260 |
F | 6 | Caucasian | 1370 |
F | 3 | Caucasian | 5160 |
M | 2 | Caucasian | 18,650 |
M | 3 | Caucasian | 67,020 |
M | 17 | Caucasian | 131,000 |
M | 5 | Caucasian | 4500 |
M | 10 | Caucasian | 31,520 |
M | 3 | Caucasian | 11,940 |
F | 4 | Caucasian | 1700 |
Table 3
Clinical information of T-ALL patients used for RNA-seq experiment
Sex | Age (years) | Ethnicity | WBC at diagnosis |
M | 8 | Caucasian | 368,120 |
F | 6 | Caucasian | 212,850 |
M | 2 | Caucasian | 500,000 |
M | 8 | Caucasian | 447,000 |
M | 17 | Asian | 303,440 |
F | 4 | Syrian | 583,000 |
Table 4
Clinical information of T-ALL patients used for validation experiment
Sex | Age (years) | Ethnicity | WBC at diagnosis |
F | 16 | Caucasian | 1960 |
M | 9 | Caucasian | 6280 |
M | 2 | Caucasian | 500,000 |
M | 7 | Moroccan | 52,840 |
M | 16 | Caucasian | 120,700 |
M | 0 | Caucasian | 177,000 |
M | 10 | Caucasian | 447,000 |
M | 13 | Caucasian | 30,970 |
M | 9 | Caucasian | 16,280 |
F | 11 | Caucasian | 262,080 |
RNA sequencing
Total RNA was extracted from leukemic cells derived from the bone marrow blood of B-ALL and T-ALL patients and from B and T lymphocytes purified from healthy-donor cord blood, using TRIzol reagent (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. RNA concentration and quality were determined using a Qubit fluorometer (Thermo Fisher Scientific, MA, USA). RNA-seq libraries were prepared with the 3′-DGE approach and sequenced in single-end mode with 100-bp reads (SE×100) on an Illumina NovaSeq platform.
Reads in FASTQ format were aligned to the GRCh38 human genome with STAR v. 2.7.1a [22]. Raw counts were obtained using HTSeq v. 2.0.0 [23]. Normalisation and differential expression analysis were performed with DESeq2 v. 1.36.0 [24]. LncRNAs were annotated with biomaRt v. 2.52.0 [25]. Hierarchical clustering and heatmap representations were performed as in Buono et al. [26]. Functional enrichment analysis of lncRNA clusters was performed with gProfiler [27]. Expression correlations were calculated with the Pearson method in R. The statistical significance of gene overlaps was calculated using Fisher's exact test through the R package GeneOverlap v. 0.99.0. GO enrichment analyses for correlated genes were performed with EnrichR [28], using as input the list of all positively correlated genes with p-value < 0.001.
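The two statistics named above, Pearson correlation between expression profiles and Fisher's exact test on a gene-set overlap, can be illustrated with a minimal Python sketch. This is not the code used in the study, which relied on R and GeneOverlap; the function names, the gene identifiers, the expression values and the universe size are hypothetical, and a one-sided test for over-representation is assumed.

```python
from math import comb, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def overlap_fisher_p(set_a, set_b, universe_size):
    """One-sided Fisher's exact test for over-representation of an overlap.

    Returns P(overlap >= observed) under the hypergeometric null in which
    len(set_b) genes are drawn at random from `universe_size` genes,
    len(set_a) of which belong to set_a.
    """
    a, b = len(set_a), len(set_b)
    if universe_size < max(a, b):
        raise ValueError("universe_size must be at least as large as each set")
    observed = len(set_a & set_b)
    total = comb(universe_size, b)
    tail = sum(
        comb(a, i) * comb(universe_size - a, b - i)
        for i in range(observed, min(a, b) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical normalised expression of one lncRNA and one gene
    # across five samples (illustrative numbers, not study data).
    lncrna = [1.2, 3.4, 2.2, 5.1, 4.0]
    gene = [0.9, 3.0, 2.5, 4.8, 4.2]
    print("Pearson r:", pearson_r(lncrna, gene))

    # Hypothetical gene lists and background size.
    correlated = {"geneA", "geneB", "geneC"}
    pathway = {"geneB", "geneC", "geneD"}
    print("Overlap p-value:", overlap_fisher_p(correlated, pathway, 10))
```

The choice of `universe_size` (the background gene set) strongly affects the overlap p-value, so it should match the set of genes that could have been detected in the experiment.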
Real time PCR analyses
Total RNA was extracted from B-ALL and T-ALL samples and from PBMCs derived from cord blood using the TRIzol reagent protocol. After extraction, RNA was quantified using a NanoPhotometer NP80 (Implen, USA). Next, 1 µg of total RNA from each sample was reverse-transcribed into cDNA using the SuperScript III First-Strand Synthesis SuperMix kit (Thermo Fisher Scientific) according to the manufacturer's protocol. The expression level of selected lncRNAs was measured by qRT-PCR on a C1000 Touch Thermal Cycler (Bio-Rad, Hercules, CA, USA) using iQ SYBR Green Supermix (#1708882, Bio-Rad) and calculated with the formula 2^(−ΔCt). The level of Ribosomal Protein S18 (RPS18) was used as an endogenous control to normalise lncRNA expression. The following primers were used:
RPS18: fw 5′-CGATGGGCGGCGGAAAATA-3′; rev 5′-CTGCTTTCCTCAACACCACA-3′
LINC00958: fw 5′-TGCAGCAAGATAGCTCCAGG-3′; rev 5′-CCTGGCGTCTGTGTAGTGTT-3′
LINC00114: fw 5′-TAGAGGCCTGATGGAGTGGA-3′; rev 5′-CTGCCCAGGAAACTGTAGGT-3′
AL713998: fw 5′-AACATTTGGTGCCGAAAGCC-3′; rev 5′-GCGAGGGAAGTCTCTTGCAT-3′
AC008060: fw 5′-CGAGGCTTGGACAAATGCAG-3′; rev 5′-CAGTCCCAAAGGAAGCGGAT-3′
AL590226: fw 5′-GAATCCACAGATGGCGTGTG-3′; rev 5′-TCAGGTAGCTGCGAGTTCAA-3′
PCAT18: fw 5′-GTCCCAGCACTTCACTGGTT-3′; rev 5′-AGCTGGGATATGGTAGCAGC-3′
HHIP-AS1: fw 5′-TCACACCACCACTGAGCAAC-3′; rev 5′-AGCTCTGCTTGGTGAATGGA-3′
AC247036: fw 5′-TGTCCTGTGGTGGGAAAAACA-3′; rev 5′-ACCCGGGAGTCATCTGAACA-3′
LINC01222: fw 5′-AGCAGGGGTAACATTATGGGC-3′; rev 5′-AGCTGCTCCCCCTTTATCTTC-3′
AC116351.1: fw 5′-TGGAAAGTCCAGCGACAGAC-3′; rev 5′-GTCTCCCTTCACAGTGGCAA-3′
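The 2^(−ΔCt) calculation described above can be sketched in a few lines of Python. This is an illustration rather than the authors' analysis script: the function name and the Ct values are hypothetical, and averaging technical replicates before taking the difference is an assumption not stated in the text.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """Return 2^(-dCt), with dCt = mean Ct(target) - mean Ct(reference).

    `target_cts` holds replicate Ct values for the lncRNA of interest and
    `reference_cts` those for the endogenous control (RPS18 in the text).
    Averaging replicates first is an assumption of this sketch.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, not study data.
    lncrna_cts = [27.9, 28.1, 28.0]
    rps18_cts = [18.0, 18.2, 17.8]
    print("Relative expression:", relative_expression(lncrna_cts, rps18_cts))
```

Because each PCR cycle ideally doubles the product, a target that crosses the threshold five cycles later than the reference gives a relative expression of 2^(−5), that is, about 3% of the reference level.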
Discussion
Thanks to improvements in diagnostics and treatment protocols, the outcome for paediatric patients with acute leukaemia is now quite favourable. In B-ALL in particular, about 80% of children recover fully. However, in some relapsed cases standard therapies are ineffective, leading to a poor prognosis. Paediatric T-cell leukaemias, in contrast, often have a poorer prognosis owing to their aggressiveness and resistance to many standard treatments [36]. It is therefore crucial to identify novel targets in paediatric leukaemia to allow an accurate and timely choice of the treatment protocol most appropriate for the patient's clinical situation. This choice can be difficult because childhood leukaemias are heterogeneous diseases. The advent of NGS has strongly boosted the identification of new biomarkers for use in diagnosis and/or therapy. These advances, however, have been rapid but uneven. While some aspects have been studied in detail, such as cell surface proteins and protein-coding genes that could be targeted in therapy protocols, others have been overlooked, such as the non-coding molecular footprint underlying the disease. Our work aims to help bridge this gap by finely characterising the lncRNA landscape of paediatric acute leukaemias. LncRNAs are a class of biomarkers of growing interest in haematology and oncology [19, 29–31]. They do not encode proteins and have been reported by several studies to modulate gene expression at the transcriptional, post-transcriptional, and epigenetic levels [20]. In particular, owing to their involvement in key oncogenic processes such as differentiation, proliferation, migration, angiogenesis, and apoptosis, lncRNAs have attracted much attention as potential diagnostic and prognostic biomarkers in leukaemia [40, 41].
Starting from NGS transcriptome analyses of B-ALL and T-ALL patients in comparison with B and T lymphocytes from cord blood, we identified a specific lncRNA signature that discriminates B-ALL and T-ALL not only from healthy controls but also from each other. We selected candidate lncRNAs that had never been associated with ALL and tested their expression in a larger cohort of patients. For most of them, this experiment confirmed the absence of expression in healthy controls and a significant upregulation in a specific type of ALL, hinting at a potential diagnostic application in clinical practice. Further, we found a significant negative correlation of AC247036.1 with white blood cell (WBC) count at diagnosis, which is historically considered a risk factor for treatment failure [42–44]. These data highlight AC247036.1 as a possible favourable prognostic factor for T-ALL treatment success.
Furthermore, we showed that the lncRNA landscape is specific not only to the two paediatric lymphoblastic leukaemias (B-ALL and T-ALL) but also to myeloid ones. Interestingly, the T-ALL lncRNA signature is somewhat more closely related to AML than B-ALL, despite the great etiopathological difference between the two diseases. This finding was unexpected. However, T-ALL and AML may share common traits in leukemic transformation. Specifically, in AML transformation, ectopic expression of T-cell-associated antigens (such as CD2, CD5 and CD7) is more likely than that of B-cell antigens (CD19, CD20) [45]. In particular, the CD7 antigen was found to be expressed in 30% of de novo AML, and some authors have proposed using the ectopic expression of this antigen to design CAR-T therapy specific for AML blasts. Identifying common traits between AML and T-ALL in terms of lncRNAs could open a new avenue for investigating the altered pathways that lead to leukemogenesis and the characteristics of aggressiveness [46]. Our data showed a certain similarity between these two diseases in the lncRNA landscape as well. To investigate the issue further, we distinguished AML with high CD7 content from AML with low CD7 content. Highlighting these two AML groups in the PCA together with T-ALL and B-ALL, we found that, although the spatial distribution remains heterogeneous, high CD7-content AML patients are the closest to T-ALL patients, sometimes even intermingling in the same cluster (Additional file 11: Fig. S3).
In the final part of this work, since the role of many lncRNAs involved in childhood acute lymphoblastic leukaemia is still unknown, especially in T-ALL, we performed correlation analyses to identify the potential role of these lncRNAs in the disease. Our in silico analyses revealed that genes positively correlated with AC247036.1 are enriched in gene ontology terms for key T-cell differentiation pathways, suggesting that this lncRNA may modulate some of the genes involved in these processes, which are associated with leukemogenesis. Further, we found that the lncRNA PCAT18 was associated with the expression of the CD3D antigen, a well-known T-cell marker used in diagnostics for monitoring minimal residual disease by flow cytometry. This finding is new and confirms the relationship between PCAT18 expression and T-cell lineage commitment; however, additional functional experiments are needed to evaluate the role of PCAT18 in sustaining leukemic growth. Last, our data reveal a highly significant positive correlation between the lncRNA HHIP-AS1 and its corresponding sense protein-coding transcript HHIP [33]. This probably happens because HHIP-AS1 is actively transcribed from an SHH-responsive bidirectional promoter shared with the SHH signalling intermediate HHIP. In SHH-driven tumours, the knockdown of HHIP-AS1 induces mitotic spindle deregulation and a consequent reduction of tumorigenicity in vitro and in vivo [47]. Taken together, these data suggest that HHIP-AS1 is a suitable candidate for further functional studies exploring its possible role in enabling the pro-mitotic effects of SHH pathway activation in childhood T-ALL.
Ultimately, our work makes available to the research community a comprehensive map of the lncRNA landscape of the various types of paediatric leukaemia, useful not only for diagnostic purposes but also, after appropriate ad hoc functional studies, for therapeutic purposes. However, it is important to stress the pilot nature of this study, owing to its small sample size. Further studies with a larger cohort of patients will be needed to consistently correlate the expression levels of the candidate lncRNAs with patients' clinical information, in order to reveal a possible prognostic role for lncRNAs in risk stratification and, therefore, to improve the clinical management of paediatric patients. In this respect, it is important to note that lncRNAs are easily detectable; their detection could be included in routine clinical practice, strengthening the diagnostic process and improving paediatric patient management. Our study is thus a resource for the scientific community, laying the foundations for future functional and clinical studies.
Conclusion
In conclusion, we have presented an extended analysis of the lncRNA profile of B-ALL and T-ALL, as well as of cord blood-derived T and B cells. Specific lncRNA signatures were detectable in both B-ALL and T-ALL. In T-ALL, PCAT18 was strongly associated with the expression of CD3D, a T-cell lineage-specific antigen. Moreover, HHIP-AS1 appeared to be associated with the SHH pathway, which is frequently deregulated in T-ALL. Despite its observational nature, our study makes available to the research community a comprehensive map of the lncRNA landscape of the various types of paediatric leukaemia, useful not only for diagnostic purposes but also, after appropriate ad hoc functional studies, for therapeutic purposes. In this respect, it is important to note that lncRNAs are easily detectable. Their detection could be included in routine clinical practice, strengthening the diagnostic process and improving paediatric patient management. Increasing the patient cohort could help to correlate the expression of the candidate lncRNAs identified by our study with clinical information and to test their potential prognostic value in stratifying patients according to their clinical characteristics. Our study is thus a resource for the scientific community, laying the foundations for future functional and clinical studies.