Introduction
Normal karyotype acute myeloid leukemia (NK-AML) is a complex, heterogeneous disease whose pathogenesis, evolution and prognosis have not been fully elucidated. NK-AML is the most common cytogenetic type of AML, accounting for 40–49% of adult AML and 20–25% of pediatric AML diagnoses [1, 2]. In NK-AML, Bullinger et al. first distinguished two groups: the FAB M1/M2 subtype with FLT3 mutation, in which the GATA2, NOTCH1, DNMT3A and DNMT3B genes are highly expressed, and the FAB M4/M5 subtype, in which the abnormally expressed genes are associated with granulocyte and monocyte differentiation, the immune response, and hematopoietic stem cell survival. Of note, patients classified into these groups have different outcomes [3]. Leukemia stem cells (LSCs) are a minor fraction of self-renewing cells that are capable of initiating and maintaining leukemia [4]. AML LSCs have been demonstrated to exhibit self-renewal, relative quiescence, apoptosis resistance, and increased drug efflux, which likely render these cells less susceptible to conventional therapies aimed at bulk proliferative disease [5]. Therefore, to eradicate AML and achieve long-term remission, treatment must eliminate the LSC population [6]. Recently, high-throughput sequencing has increased our knowledge of the genomic and transcriptomic landscapes of AML [7, 8], but merging these data with the in vivo biology of LSCs is still in its early stages. Thus, a powerful approach is needed to characterize malignant cell populations, such as LSCs, and their biological information.
In AML, traditional sequencing technologies mask the characteristics of minor populations of leukemic cells, whereas single-cell sequencing can resolve cell-to-cell differences in the genomic [9, 10], transcriptomic [11, 12] and epigenomic [13] landscapes of this disease. The application of single-cell RNA sequencing (scRNA-seq) has led to the identification of new LSC populations and new markers in AML patients and has also identified independent factors of a poor prognosis and strategies to prevent AML recurrence [11, 12]. In AML, scRNA-seq has identified intratumoral heterogeneity and distinguished malignant AML cells from normal cells. In brief, this technique provides a powerful means to address questions related to stemness, developmental hierarchies, and interactions between malignant and immune cells [14]. However, the application of scRNA-seq in AML, the most common hematological malignant tumor in adults, is still in its infancy, especially in NK-AML, and there is no systematic description in the literature.
Here, we adapted scRNA-seq technology to acquire transcriptional data for thousands of single cells from bone marrow (BM) aspirates. We profiled 36,865 cells from 5 NK-AML (M4/M5) patients and 2,423 cells from 1 healthy donor by scRNA-seq and acquired a total of 18 cell subpopulations, thereby constructing a single-cell transcriptome atlas of BM cells from the NK-AML (M4/M5) patients and the healthy donor. In addition, we revealed the existence of a key cell subset, which could be a group of LSC-like cells that may play key roles in the initiation and maintenance of NK-AML, as this population coexpressed multiple genes related to AML pathogenesis and a poor prognosis. Finally, by combining clinical data from the GEO database with qRT-PCR results, we verified that integrin subunit α-4 (ITGA4), inositol 1,4,5-trisphosphate receptor type 2 (ITPR2), adhesion G protein-coupled receptor E2 (ADGRE2), ankyrin repeat domain 28 (ANKRD28), lysine demethylase 5B (KDM5B) and cyclin-dependent kinase 6 (CDK6) were significantly upregulated in NK-AML and that ITGA4 and ITPR2 may be biomarkers for predicting NK-AML prognosis.
Methods
Human specimen procurement and isolation
Five patients who were pathologically diagnosed with NK-AML (M4/M5) and one healthy volunteer at The First Affiliated Hospital of Guangxi Medical University between 2019 and 2020 were enrolled in this study. None of the patients were treated with chemotherapy, radiation or any other antitumor medicines prior to BM sample collection. This study was approved by the Ethics Committee of The First Affiliated Hospital of Guangxi Medical University. Written informed consent was obtained from every participant in accordance with the Declaration of Helsinki.
BM mononuclear cells were isolated using density gradient centrifugation according to the manufacturer’s instructions. In brief, 2 ml of fresh BM aspirate and 2 ml of 1× DPBS (Gibco) were collected in an EDTA anticoagulant tube and subsequently layered onto Lymphocyte Separation Medium. After centrifugation, BM mononuclear cells (the third layer) were carefully transferred to a new tube and washed with 1× DPBS. After supernatant removal, the cell pellets were suspended in red blood cell lysis buffer (Solarbio) and incubated on ice for 10 min to lyse red blood cells. After washing twice with 1× DPBS, the cell pellets were resuspended in cell freezing medium (90% fetal calf serum supplemented with 10% dimethyl sulfoxide (DMSO)). Finally, the BM mononuclear cells were viably frozen and stored in liquid nitrogen or a −85 °C freezer.
Viably frozen cells were thawed using standard procedures. First, frozen cells were thawed at 37 °C, suspended in Dulbecco’s Modified Eagle Medium (DMEM) and washed with DMEM. After resuspension in DMEM at a concentration of 1–2 million cells per ml, the cell suspension was passed through a 40 µm filter to obtain a single-cell suspension. Cell counts and viability were determined with a hemocytometer with trypan blue staining (Gibco). Samples were analyzed on a Chromium system (10X Genomics) according to the manufacturer’s instructions for an expected capture rate of 20,000 single cells per patient.
Sample processing with the 10X Genomics platform and cDNA library preparation
To process the single-cell suspensions described above, we loaded the single-cell sample, gel beads and partitioning oil onto 10X Genomics Single Cell A Chip Kits and generated gel beads in emulsion (GEMs). The GEMs were reverse transcribed using a Bio-Rad C1000 Touch thermal cycler. The conditions were as follows: 53 ℃ for 45 min, 85 ℃ for 5 min, and hold at 4 ℃. The heated lid was set to 53 ℃. After reverse transcription, cDNA was recovered using Recovery Agent, which was provided by 10X Genomics, and then purified with Silane DynaBeads as outlined in the user guide. Purified cDNA was amplified and then cleaned using SPRIselect beads to eliminate short fragments. The PCR conditions were as follows: 98 ℃ for 45 s; 12 cycles of 98 ℃ for 20 s, 67 ℃ for 30 s, and 72 ℃ for 1 min; and 72 ℃ for 1 min. The cDNA concentrations of samples were quantified using the Qubit 3.0 Fluorometer (Invitrogen). The cDNA libraries were constructed using the Chromium Single Cell 5′ Library Kit. The constructed libraries were sequenced on an Illumina HiSeq2500 sequencer in the PE150 mode.
scRNA-seq data processing
The Chromium-prepared sequencing data were demultiplexed and converted to FASTQ files with 10X Genomics Cell Ranger software (version 3.1.0). The same software package was used for filtering, alignment, and count quantification. The FASTQ files were aligned to the human reference genome GRCh38. These preliminary data were then analyzed with the R package Seurat. Cells with too many or too few detected genes or too much mitochondrial RNA were filtered out, as these might represent doublets or low-quality cells. Specifically, cells with < 500 or > 4000 detected genes, a UMI count ≥ 8000 or a mitochondrial gene percentage ≥ 10% were removed. The expression value of each gene in a given cluster was compared against that in the rest of the cells using the Wilcoxon rank-sum test. Significantly upregulated genes were identified using the following criteria: (1) gene expression ≥ 1.28-fold higher in the target cluster than in the remaining cells, (2) gene expressed in > 25% of the cells belonging to the target cluster, and (3) p value < 0.05. After rigorous quality control, a total of 39,288 cells and an average of approximately 2 × 10⁴ genes were retained in the six samples for subsequent scRNA-seq analysis.
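The filtering and marker-gene criteria above can be sketched as two simple predicates. This is an illustrative Python sketch rather than the Seurat code used in the study: the class and function names are hypothetical and the example cells are invented.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell summary statistics (field names are illustrative)."""
    n_genes: int      # number of genes detected in the cell
    n_umis: int       # total UMI count for the cell
    pct_mito: float   # percentage of UMIs from mitochondrial genes


def passes_qc(cell: CellQC) -> bool:
    """Keep a cell only if it clears every threshold quoted in the text."""
    if cell.n_genes < 500 or cell.n_genes > 4000:
        return False
    if cell.n_umis >= 8000:
        return False
    if cell.pct_mito >= 10.0:
        return False
    return True


def is_marker(fold_change: float, frac_expressing: float, p_value: float) -> bool:
    """Apply the three criteria for a significantly upregulated gene."""
    return fold_change >= 1.28 and frac_expressing > 0.25 and p_value < 0.05


# Invented cells, one per filtering outcome.
cells = [
    CellQC(n_genes=1800, n_umis=5200, pct_mito=3.5),   # kept
    CellQC(n_genes=350, n_umis=900, pct_mito=2.0),     # too few genes
    CellQC(n_genes=4500, n_umis=7900, pct_mito=4.0),   # too many genes
    CellQC(n_genes=2500, n_umis=8000, pct_mito=5.0),   # UMI count at the cutoff
    CellQC(n_genes=1200, n_umis=3000, pct_mito=12.0),  # high mitochondrial share
]
kept = [c for c in cells if passes_qc(c)]
```

In this toy example only the first cell is retained. The retained total reported above is consistent with the cell counts given in the Introduction: 36,865 + 2,423 = 39,288.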
After removing poor-quality cells from the dataset, we employed the global-scaling normalization method “LogNormalize” to normalize gene expression. Subsequently, the most variable genes were identified, and a linear dimensionality reduction approach (principal component analysis, PCA) was performed with these genes. The principal components were then used as input to a graph-based clustering algorithm. For visualization purposes, a nonlinear dimensionality reduction approach (t-distributed stochastic neighbor embedding, t-SNE) was used, and the t-SNE plots were colored according to the clusters determined in the previous step. Ultimately, we identified 18 clusters from the scRNA-seq data.
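As a minimal sketch of what "LogNormalize" does to a single cell: each gene count is divided by the cell's total count, multiplied by a scale factor, and transformed with log(1 + x). The scale factor of 10,000 used below is Seurat's documented default and is an assumption here, since the text does not state the value used; the function name and the toy counts are hypothetical.

```python
import math
from typing import Dict


def log_normalize(counts: Dict[str, int], scale_factor: float = 10_000.0) -> Dict[str, float]:
    """Global-scaling normalization of one cell: log1p(count / total * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no expression to normalize.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Invented UMI counts for one cell.
cell = {"CD34": 3, "KIT": 1, "ACTB": 96}
normalized = log_normalize(cell)
```

Dividing by the per-cell total removes differences in sequencing depth between cells, so that the downstream variable-gene selection and PCA compare relative rather than absolute expression.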
TCGA and GTEx databases
We obtained BM mRNA expression data and clinical parameters from normal donors in the GTEx database and downloaded the mRNA expression data and clinical parameters of AML patients in the TCGA database. Then, we integrated and normalized the data from these two databases using the R package “GTEx.merge.R” and acquired differentially expressed genes (DEGs) by comparing the AML and normal control data. The R package “GTEx.Survival.R” was used to obtain prognosis-related genes and survival curves by survival analysis. P < 0.05 was considered statistically significant.
Functional analysis
Genes with a P value < 0.01 and an absolute log2-fold change (log2FC) > 0.36 between a target cluster and other clusters were used for GO and KEGG pathway enrichment analyses. In addition, preranked gene set enrichment analysis (GSEA) was performed. The required input files were extracted from the expression matrix, and the enrichment analyses were performed using OmicShare tools. Interactions between DEGs were analyzed using the Gene Multiple Association Network Integration Algorithm (GeneMANIA; http://www.genemania.org/). The Search Tool for the Retrieval of Interacting Genes (STRING; https://string-db.org/) was used to investigate the protein–protein interactions between DEGs.
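For orientation, the log2FC cutoff used for enrichment analysis can be related to the 1.28-fold criterion used for marker genes. The conversion below is plain arithmetic and not an additional analysis step from the study:

```latex
\[
\log_2(1.28) = \frac{\ln 1.28}{\ln 2} \approx 0.356,
\qquad
2^{0.36} \approx 1.283,
\qquad
2^{-0.36} \approx 0.779
\]
```

An absolute log2FC above 0.36 therefore corresponds to expression that is more than about 1.28-fold higher, or below about 0.78-fold, in the target cluster relative to the other clusters.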
Survival analysis
The transcriptome data and clinical information of the GSE106291 dataset were downloaded from the GEO database. After normalizing the read counts and log2-transforming the gene expression values, the gene expression matrix of each patient was obtained. A median threshold of gene expression was used to categorize patients into high- and low-expression groups. A total of 250 AML patients and the top 76 DEGs from a cluster were selected for survival analysis to explore the relationship between the expression of target genes and patient clinical outcome. GraphPad Prism 7 software (GraphPad Software Inc., La Jolla, CA, USA) was used to visualize the Kaplan‒Meier estimates of survival curves. P < 0.05 was considered statistically significant.
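The median split and the Kaplan–Meier estimate described above can be sketched as follows. This is a minimal standard-library illustration, not the GraphPad Prism workflow used in the study: the function names and the toy cohort are invented, and assigning patients whose expression equals the median to the low group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median
from typing import List, Tuple


def split_by_median(expression: List[float]) -> List[str]:
    """Label each patient 'high' if expression exceeds the cohort median, else 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival probability) at each event time.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (invented values, not data from GSE106291).
expression = [2.1, 5.3, 3.8, 7.0, 1.2, 6.4]
months = [30.0, 8.0, 24.0, 5.0, 36.0, 12.0]
died = [0, 1, 1, 1, 0, 1]

groups = split_by_median(expression)
high = [i for i, g in enumerate(groups) if g == "high"]
km_high = kaplan_meier([months[i] for i in high], [died[i] for i in high])
```

Comparing the resulting curves between the two groups would additionally require a test such as the log-rank test, which is omitted from this sketch.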
Quantitative real-time PCR (qRT-PCR)
A total of 30 BM samples were obtained from 20 chemotherapy-naive NK-AML (M4/M5) patients and 10 healthy volunteers between 2013 and 2016 at The First Affiliated Hospital of Guangxi Medical University. The NK-AML (M4/M5) patients were classified according to the French-American-British (FAB, 2016) Criteria. All NK-AML (M4/M5) patients received regular follow-up, and the follow-up period ended in December 2017. Patients who had other hematological diseases or malignant tumors were excluded. The healthy volunteers had no obvious abnormalities in any examination indexes. Written informed consent was obtained from all the participants according to the Declaration of Helsinki prior to BM collection. Detailed clinical features of the 30 samples are provided in Additional file 2: Table S1C.
Up to 2 ml of BM sample was extracted from each participant, and BM mononuclear cells were separated by density gradient centrifugation. Total RNA was isolated from the BM cells using TRIzol reagent (Invitrogen, USA) according to the manufacturer’s instructions. RNA was then reverse transcribed into cDNA with a SuperScript™ III Reverse Transcriptase kit (Invitrogen: 18080-044) on a GeneAmp PCR System 9700 (Applied Biosystems). qRT-PCR was performed on a ViiA 7 Real-time PCR System (Applied Biosystems) using the 2X PCR Master Mix Kit (Arraystar). The following reaction conditions were used: 95 °C for 10 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 1 min. ACTB was used as an endogenous reference gene, and the primer sequences used in the present study were as follows: ACTB forward: 5′-GTGGCCGAGGACTTTGATTG-3′ and reverse: 5′-CCTGTAACAACGCATCTCATATT-3′; ITPR2 forward: 5′-TGCGCCAATCAGCTACTTCT-3′ and reverse: 5′-TCAGGATTAAGCTCTGCAGCTA-3′; ADGRE2 forward: 5′-GGTCCTGGAACCTGAGAAGC-3′ and reverse: 5′-AGGTGCTGGTGTTCTGGATG-3′; ANKRD28 forward: 5′-TGGTCACCGTCTATGTCTTCAG-3′ and reverse: 5′-AGGGCTTATTGTTGCTCTATTATC-3′; KDM5B forward: 5′-AATAGAACCCGAGGAGACAACG-3′ and reverse: 5′-GACAGACATACAGGTCCACAGCA-3′; ITGA4 forward: 5′-CTGGGTAGCCCTAATGGA-3′ and reverse: 5′-ATGCCCACAAGTCACGAT-3′; and CDK6 forward: 5′-CATTCAAAATCTGCCCAACC-3′ and reverse: 5′-GGTCCTGGAAGTATGGGTGA-3′. The relative expression of target genes was calculated with the comparative 2^(−ΔΔCt) method. As the data did not exhibit a normal distribution, the relative expression of target genes was compared among different groups using the Mann–Whitney U test.
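The comparative 2^(−ΔΔCt) calculation and the Mann–Whitney U statistic can be illustrated with a short standard-library sketch. The Ct values below are invented, the function names are hypothetical, and using the mean ΔCt of the healthy controls as the calibrator is an assumption, since the text does not state which calibrator was used.

```python
from statistics import mean
from typing import List


def relative_expression(ct_target: float, ct_ref: float, calibrator_dct: float) -> float:
    """2^-ddCt, where dCt = Ct(target) - Ct(reference) and ddCt = dCt - calibrator dCt."""
    delta_ct = ct_target - ct_ref
    return 2.0 ** -(delta_ct - calibrator_dct)


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    """U statistic for x: number of (x, y) pairs with x > y, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u


# Invented (target gene Ct, ACTB Ct) pairs for illustration only.
controls = [(27.0, 18.0), (27.5, 18.5), (26.8, 18.0)]
patients = [(25.0, 18.0), (24.6, 18.1), (25.9, 18.4)]

# Assumed calibrator: mean dCt of the control group.
calibrator = mean([t - r for t, r in controls])
control_rq = [relative_expression(t, r, calibrator) for t, r in controls]
patient_rq = [relative_expression(t, r, calibrator) for t, r in patients]
u_patients = mann_whitney_u(patient_rq, control_rq)
```

In this toy example every patient value exceeds every control value, so U equals the number of pairs (3 × 3 = 9). The sketch stops at the statistic; obtaining a p value requires the U distribution, for which a statistics library routine such as `scipy.stats.mannwhitneyu` could be used.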
Discussion
At present, the treatment of NK-AML remains a challenge. The standard therapy for AML is still cytarabine- and anthracycline-based regimens (the "3 + 7" regimen) to achieve and maintain complete remission (CR) and cure AML. Approximately 60–80% of young people and 40–60% of elderly people (60 years or older) achieve initial remission after chemotherapy. However, at least 40% of these patients relapse with refractory disease, and the five-year survival rate is approximately 30% to 40% [20, 21]. LSCs are a key factor in cancer treatment failure and disease evolution. In AML, there are a small number of cells, including LSCs, side population cells and other stem cell-like cells, that possess the biological characteristics of quiescence, multilineage differentiation, self-renewal and disease maintenance, leading to disease relapse [22, 23]. Therefore, characterization of stem cell-like populations and the development of therapeutic strategies targeting these cells are the basis for achieving long-term remission of AML.
In this study, scRNA-seq was performed on BM mononuclear cells from 5 patients with NK-AML (M4/M5) and 1 normal control. The transcriptome profiles and gene expression patterns of the NK-AML (M4/M5) patients and the healthy individual were characterized at the single-cell level. We captured a broad distribution of cell types, including monocytes, T cells, B cells, dendritic cells (DCs), erythrocytes, and multiple leukemia cell populations. All 18 cell types were identified in all donors. Our results are consistent with the differential distribution of BM cells in healthy controls and AML patients and provide more evidence of the heterogeneity in NK-AML.
Due to cell scarcity, it is difficult to distinguish cancer stem cells (CSCs) using classic next-generation sequencing. In the scRNA-seq data, we found that cluster 12, which was rare in number, highly expressed not only multiple LSC markers, such as CD34, CD96 and CD133, but also the nontraditional LSC markers CD38 and CD46. CD38 is highly expressed mainly in multiple myeloma and chronic B-cell leukemia [24, 25]. CD46 is a key regulator of the classical and alternative complement activation cascades in the innate immune system and is associated with a variety of immune inflammatory diseases [26]. GO and KEGG enrichment analyses showed that the DEGs of this cluster had biological functions related to the malignant pathways of AML. The first study of CSCs in AML was published in the late twentieth century, when Bonnet and Dick isolated a subset of CD34+/CD38− leukemia cells. Unlike CD34+/CD38+ and CD34− cells, CD34+/CD38− cells can initiate AML in nonobese diabetic/severe combined immunodeficient (NOD/SCID) mice [27]. To date, the CD34+/CD38− phenotype is still a recognized marker for LSC isolation. Our data showed that cluster 12, with high expression of CD34 and CD38, may be an LSC-like subpopulation, such as a side population. Hence, we speculated that the cluster might be an LSC-like group that plays important roles in NK-AML initiation and maintenance.
To explore gene expression patterns in this cluster, we combined our data with data from the TCGA and GTEx databases and identified 12 genes that were upregulated in AML and associated with prognosis. Studies on KIT, HGF and ERG in AML have demonstrated that these genes are all upregulated and that high expression predicts an unfavorable outcome [28–31]. Moreover, KIT mutations are common in AML. KIT is expressed on more than 10% of blasts in 64% of de novo AML cases and 95% of relapsed AML cases. Thus, KIT represents a potential therapeutic target in AML [32]. FCHSD2 is also highly expressed in AML, and its overexpression significantly increases cellular chemotherapy resistance. It was shown that FCHSD2 is a predictor of outcome in AML patients and that the determination of FCHSD2 expression at the time of diagnosis could help to predict the responses of AML patients to chemotherapy [33]. There are few studies on the DPYD, ARHGEF6, MCTP2 and SPN genes in AML, but these genes are known to be differentially expressed in other tumors, such as colorectal cancer, lung cancer and hepatocellular carcinoma. In addition, ITGA4 and ITPR2 were validated in AML datasets from the GEO database. Low expression levels of ITGA4 and ITPR2 indicated a poor outcome. The prognostic significance of ITPR2 was the opposite of that reported in the literature, which may be due to tissue differences and requires further validation. Furthermore, the functional and prognostic significance of the remaining genes requires future experimental clarification. qRT-PCR analysis further demonstrated that the expression levels of ITGA4, ITPR2, ADGRE2, ANKRD28, KDM5B, and CDK6 in NK-AML (M4/M5) were significantly higher than those in normal controls, suggesting that these genes may be biomarkers of NK-AML.
ITGA4, a member of the integrin alpha chain family of proteins, is considered to be an adverse prognostic factor for chronic lymphocytic leukemia (CLL) with an invasive course and short time to treatment, and ITGA4 gene hypermethylation is characteristic of CLL compared with healthy controls [34, 35]. Protein–protein interaction network analysis showed that ITGA4 was associated with PROM1, KIT, CD34 and CD38, which are closely related to AML. GO and KEGG enrichment analyses showed that ITGA4 was enriched in cell adhesion, leukocyte migration and hematopoietic processes and participated in the PI3K-Akt signaling pathway, which is related to tumor cell migration, adhesion, tumor angiogenesis and extracellular matrix degradation. ITPR2 is a key regulator of calcium ion transmembrane transport and plays critical roles in cell migration, cell division, the cell cycle and proliferation. ANKRD28 is located at 3p25.1, and its function remains unclear. ANKRD28 is widely expressed in human tissues, especially in the BM, brain and testis. ITPR2 and ANKRD28 were demonstrated to be novel biomarkers for worse prognosis in NK-AML [36, 37]. ADGRE2 encodes a protein that is expressed mainly in myeloid cells and promotes cell–cell adhesion. In an analysis of publicly available genomic data, upregulation of ADGRE2 was significantly associated with shorter OS in AML [38]. KDM5B encodes a lysine-specific histone demethylase that belongs to the jumonji/ARID domain-containing family of histone demethylases. This protein plays a role in the transcriptional repression of certain tumor suppressor genes and is upregulated in certain cancer cells [39]. Downregulation of KDM5B produces efficient antileukemic effects in MLL-rearranged AML cells and HL-60 cells, suggesting that KDM5B may be a potential epigenetic target for AML treatment [40, 41]. CDK6 is a member of the CMGC family of serine/threonine protein kinases and is important for cell cycle regulation. CDK6 is highly expressed in AML patient samples and represents a promising target in MLL fusion-expressing, FLT3-ITD-positive and NUP98 fusion protein-driven AML [42, 43].
In summary, we leveraged the combination of single-cell transcriptomics and qRT-PCR to parse leukemia cells in NK-AML (M4/M5). Our results provide insight into the transcriptomic profiles of NK-AML (M4/M5) BM samples for the first time and identify a distinct LSC-like cell population with possible biomarkers in AML with a normal chromosomal karyotype. These findings describe a new population of LSCs and offer potential biomarkers and prognostic predictors for clinical applications in NK-AML. However, our experiment had a limited sample size, and the data may not be representative. In addition, the results have not yet been verified. Therefore, our next research step is to use flow cytometry to isolate the LSC-like cells according to the cell-surface markers identified in the current data. Quantitative real-time PCR and Western blotting will be performed to detect the mRNA and protein expression of candidate marker genes to further verify our research results. The data and findings can guide therapeutic strategies aimed at malignant cells and target genes in NK-AML. Finally, the present study provides evidence that scRNA-seq plays an important role in the study of NK-AML and suggests that strategies promoting scRNA-seq may be valuable approaches for hematological malignancy therapy.