Background
Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin's lymphoma. About 60% of DLBCL patients experience effective remission after rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) regimens. However, approximately 30–40% of patients eventually relapse and 10% are primary refractory cases [
1]. The International Prognostic Index (IPI), which is widely used to evaluate the prognosis of DLBCL, depends mainly on five traditional clinicopathological features: age, Eastern Cooperative Oncology Group (ECOG) performance status, Ann Arbor stage, lactate dehydrogenase (LDH) level, and number of extranodal sites. It does not consider the molecular characteristics and microenvironmental differences in lymphoma. Because DLBCL is a highly heterogeneous tumor, patients with the same clinicopathological features do not always have the same prognosis [
2]. Therefore, the IPI score alone is not sufficient to predict prognosis accurately [
3]. New strategies are needed to stratify risk among DLBCL patients more reliably and thereby personalize treatment. Recent studies have shown that risk models based on multi-gene expression are a reliable choice [
4‐
6].
Metabolic reprogramming in tumor cells (notably aerobic glycolysis, glutamine catabolism, macromolecular synthesis, and redox homeostasis) supports the requirements of exponential growth and proliferation [
7]. The upregulation of many metabolism-associated genes (MAGs) is driven by the activation of oncogenes. For example, the proto-oncogene c-MYC can activate most glycolytic enzyme genes (principally hexokinase 2 (HK2), phosphofructokinase (PFK)-M1, lactate dehydrogenase (LDH)-A, and pyruvate kinase M2 (PKM2)) to provide fuel for aerobic glycolysis, which subsequently enhances oxidative phosphorylation (OXPHOS). Another oncogene closely related to MAGs is AKT, which directly promotes aerobic glycolysis by upregulating the expression of HK2, PFK1/2, and glucose transporters (GLUT). It can also activate mitochondrial hexokinase (mHK) to promote the coupling of glycolysis and OXPHOS [
8,
9]. Therefore, MAGs are considered promising diagnostic markers and potential therapeutic targets. In addition, recent studies have focused on the relationship between metabolism and survival: pan-cancer studies have indicated that tumor subtypes with different MAG expression patterns lead to significantly different survival [
10,
11]. Moreover, several risk models based on MAGs have been proposed for breast cancer [
6], colorectal cancer [
12], gastric cancer [
13] and osteosarcoma [
14]. However, the value of MAGs in DLBCL subtype identification and prognostic prediction remains unclear.
The tumor microenvironment, in which the tumor develops, is heavily infiltrated by immune cells. Reflecting the complexity of this microenvironment, immune cells recruited to tumor tissues can have both tumor-promoting and tumor-antagonizing effects [
15]. The immune microenvironment plays a key role in tumor development and treatment. Studies have shown that metabolic reprogramming is closely related to the tumor immune microenvironment [
16]. Metabolites derived from tumor cells can influence the composition and distribution of cells in the immune microenvironment in many ways, ultimately leading to immune dysfunction and tumor progression [
17]. For example, metabolic reprogramming can affect the differentiation subtypes and functions of T cells and the polarization and function of macrophages [
18]. However, studies on the relationship between MAGs and the immune microenvironment in DLBCL remain limited.
In the present study, we used multiple bioinformatics methods to comprehensively analyze MAGs, identified metabolism-associated molecular subtypes in DLBCL patients, constructed a novel MAG-based risk model evaluating the prognostic value of MAGs in DLBCL, and explored the relationships between MAGs and the immune microenvironment. Finally, two hub genes in the risk model were selected and verified using our own tissue microarray (TMA), and their potential utility as therapeutic targets and diagnostic markers was discussed. Our study may provide new clues on mechanisms and metabolic targets in DLBCL, and it may lay the foundation for accurate immunotherapy that targets metabolic pathways in DLBCL.
Methods
Data sources and preprocessing
The GSE10846 Series Matrix File data were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (annotation platform: GPL570). Data from 412 DLBCL patients with a complete mRNA expression profile and survival time > 0 were extracted from the GSE10846 dataset. Of these 412 patients, 232 had received R-CHOP treatment and 180 had received CHOP only. We randomly divided the 412 cases in a 4:1 ratio into a training cohort (n = 330) and a testing cohort (n = 82). Additionally, data on DLBC were downloaded from The Cancer Genome Atlas (TCGA) database (
https://portal.gdc.cancer.gov/), and 47 DLBCL cases with a complete mRNA expression profile and survival time > 0 were obtained. We used the TCGA dataset (n = 47) as an external validation cohort to evaluate the predictive efficacy and robustness of the prognosis-associated risk model. Grouping information and clinicopathological features are shown in Table
1.
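The 4:1 random split described above can be illustrated with a minimal sketch. The sample identifiers, the seed, and the helper name `split_cohort` are hypothetical; the source does not report how the random division was seeded or implemented.

```python
import random


def split_cohort(sample_ids, train_fraction=0.8, seed=0):
    """Shuffle sample IDs reproducibly and split them into training and testing lists."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # the seed is an arbitrary choice, not taken from the paper
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers standing in for the 412 GSE10846 cases
samples = [f"case_{i:03d}" for i in range(412)]
train_ids, test_ids = split_cohort(samples)
print(len(train_ids), len(test_ids))  # 330 82
```

With 412 cases and a training fraction of 0.8, rounding gives the cohort sizes reported in the text (330 and 82).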
Table 1
Clinicopathological characteristics of the DLBCL cases in GSE10846 and TCGA datasets
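Characteristic | GSE10846 (n = 412) | Training cohort (n = 330) | Testing cohort (n = 82) | TCGA (n = 47) |
Gender | | | | |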
Male | 222 (53.9%) | 173 (52.4%) | 49 (59.8%) | 26 (55%) |
Female | 172 (41.7%) | 141 (42.7%) | 31 (37.8%) | 21 (45%) |
NA | 18 (4.3%) | 16 (4.8%) | 2 (2.4%) | |
Age (year) | | | | |
≤ 60 | 188 (45.6%) | 144 (43.6%) | 44 (53.7%) | 26 (55%) |
> 60 | 224 (54.4%) | 186 (56.4%) | 38 (46.3%) | 21 (45%) |
ECOG-PS | | | | |
< 2 | 296 (71.8%) | 239 (72.4%) | 57 (69.5%) | |
≥ 2 | 93 (22.6%) | 75 (22.7%) | 18 (22.0%) | |
NA | 24 (5.8%) | 17 (5.2%) | 7 (8.5%) | |
Cell-of-origin (COO) subtype | | | | |
ABC | 167 (40.5%) | 132 (40.0%) | 35 (42.6%) | |
GCB | 182 (44.2%) | 148 (44.8%) | 34 (41.4%) | |
NA | 63 (15.3%) | 50 (15.2%) | 13 (15.9%) | |
LDH level | | | | |
Normal | 173 (42.0%) | 134 (40.6%) | 39 (47.6%) | |
Elevated | 177 (43.0%) | 143 (43.3%) | 34 (41.5%) | |
NA | 62 (15.0%) | 53 (16.1%) | 9 (11.0%) | |
Ann Arbor stage | | | | |
I–II | 188 (45.6%) | 147 (44.5%) | 41 (50.0%) | |
III–IV | 217 (52.7%) | 176 (53.3%) | 41 (50.0%) | |
NA | 7 (1.7%) | 7 (2.1%) | | |
Extranodal sites | | | | |
< 2 | 297 (72.1%) | 236 (71.5%) | 61 (74.4%) | |
≥ 2 | 23 (5.6%) | 19 (5.8%) | 4 (4.9%) | |
NA | 92 (22.3%) | 75 (22.7%) | 17 (20.7%) | |
MAGs were obtained from the GeneCards database (
https://www.genecards.org/). We identified 92 candidate prognosis-related MAGs in GSE10846 by univariate Cox regression. Based on these 92 MAGs, the 412 patients were divided into subgroups with different metabolic expression patterns by unsupervised consensus clustering using the "ConsensusClusterPlus" R package.
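The core quantity behind consensus clustering is the consensus index of a pair of samples: the fraction of resampling runs in which both samples were drawn and assigned to the same cluster. The sketch below computes it from a list of clustering runs that are assumed to be given; the sample names, labels, and the function `consensus_matrix` are hypothetical and do not reproduce the internals of the R package.

```python
from itertools import combinations


def consensus_matrix(samples, runs):
    """Map each sample pair to the fraction of runs (containing both) that co-cluster them."""
    together = {pair: 0 for pair in combinations(samples, 2)}
    sampled = {pair: 0 for pair in combinations(samples, 2)}
    for labels in runs:  # each run maps a subsample of the samples to a cluster label
        for a, b in together:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    # Pairs that were never drawn together have no defined consensus index
    return {pair: together[pair] / sampled[pair] for pair in together if sampled[pair] > 0}


# Hypothetical toy input: four samples, four subsampled clustering runs
samples = ["s1", "s2", "s3", "s4"]
runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s2": 0, "s4": 1},
    {"s1": 1, "s3": 0, "s4": 0},
    {"s2": 0, "s3": 1, "s4": 1},
]
cm = consensus_matrix(samples, runs)
print(cm[("s1", "s2")], cm[("s1", "s3")])  # 1.0 0.0
```

Pairs with a consensus index near 1 are stably grouped together, and pairs near 0 are stably separated; a clean split of the indices toward these two extremes supports the chosen number of clusters.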
Construction and validation of a MAG-based risk model
The 92 candidate prognosis-related MAGs were entered into least absolute shrinkage and selection operator (LASSO) regression to construct a prognostic model. The risk score was defined as the sum of the expression of each included gene weighted by its LASSO regression coefficient: \(\text{risk score}=\sum_{i=1}^{n}\text{coef}_{i}\times \text{expression}_{i}\), where n is the number of model genes. The risk score of each patient was then calculated. Using the median risk score as the cutoff, the training cohort was divided into low- and high-risk groups. Survival curves were generated by the Kaplan–Meier method and the two groups were compared using the log-rank test. Time-dependent receiver operating characteristic (ROC) curve analysis was used to assess the predictive accuracy of the model. Cox regression was used to assess the independent prognostic value of the risk score and other clinicopathological features. To provide a reference for predicting the prognosis of DLBCL patients, we used the "rms" R package to construct a nomogram based on the risk score and clinicopathological features, and a calibration plot was used to assess the predictive performance of the nomogram.
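The risk score and the median split can be sketched as follows. The gene names, coefficients, and expression values are hypothetical placeholders, not the 14 model genes or their fitted coefficients. Assigning patients whose score equals the median to the low-risk group is also an assumption, since the source does not state how ties were handled.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of each model gene's expression weighted by its regression coefficient."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients and expression values
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 0.0, "GENE_B": 4.0},
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = assign_risk_groups(scores)
print(scores)  # {'P1': 0.0, 'P2': 2.0, 'P3': -1.0, 'P4': 3.0}
print(groups)  # {'P1': 'low', 'P2': 'high', 'P3': 'low', 'P4': 'high'}
```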
Immune analyses
The Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) method was used to calculate the stromal score, immune score, ESTIMATE score, and tumor purity. Next, the Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) algorithm was used to analyze the RNA-Seq data of DLBCL patients in order to determine the relative proportions of 22 infiltrating immune cell types. Furthermore, to quantify the immune cell infiltration in each sample, single-sample Gene Set Enrichment Analysis (ssGSEA) was used to assess the enrichment of 28 immune cell types in the tumor samples. We then calculated the correlations between the risk score and immune regulatory genes, especially immune checkpoints.
Drug sensitivity analysis and construction of a competing endogenous RNA (ceRNA) network
Based on the Genomics of Drug Sensitivity in Cancer (GDSC) database (
https://www.cancerrxgene.org/), which is the largest pharmacogenomics database, we used the "pRRophetic" R package to predict the chemotherapy sensitivity of each tumor sample. The half-maximal inhibitory concentration (IC50) of each chemotherapy drug was estimated by regression, and the accuracy of the regression and prediction was tested by 10-fold cross-validation on the GDSC training set. All parameters were left at their default values, including "combat" for batch-effect removal and averaging of duplicate gene expression values. Furthermore, we used FunRich (v3.1.3) and NPInter (v4.0) to construct a ceRNA network based on the model genes.
Gene Set Variation Analysis (GSVA) and functional enrichment analyses
GSVA is a non-parametric and unsupervised method for evaluating the enrichment of gene sets in relation to mRNA expression data. In this study, gene sets were downloaded from the Molecular Signatures Database (v7.0). Each gene set was comprehensively scored by the GSVA algorithm, and the potential differences in biological functions between the high- and low-risk groups were evaluated. Additionally, to explore the functions of the prognosis-associated MAGs, the "clusterProfiler" R package was used to annotate the genes with their predicted functions based on Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. GO terms and KEGG pathways with p and q values < 0.05 were deemed statistically significant.
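GO and KEGG enrichment of a gene list is commonly tested with a hypergeometric over-representation test followed by a multiple-testing correction. The sketch below assumes that approach; the source does not state which test was applied, the Benjamini–Hochberg adjustment is used only as a stand-in for the q value, and all numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N background genes, K of which are in the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy case: 3 query genes, 2 of them in a 4-gene term, 10 background genes
p = hypergeom_pvalue(k=2, n=3, K=4, N=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p, adjusted)
```

In the toy case the p value is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120, so this small overlap would not be called significant at the 0.05 threshold.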
Weighted Gene Co-expression Network Analysis (WGCNA)
To identify the hub genes among the 14 model genes, we used the WGCNA algorithm. After constructing a weighted gene co-expression network, the gene co-expression modules were identified, and the correlations between the gene network and clinical phenotypes were explored. The "WGCNA" R package was used to construct the co-expression network from the GSE10846 dataset, and the 5000 genes with the highest variance were retained for subsequent analysis. The soft-thresholding power β was set to the value given by "sft$powerEstimate". The weighted adjacency matrix was transformed into a topological overlap matrix (TOM) to estimate the network connectivity, and hierarchical clustering was used to build a clustering tree from the TOM. Different branches of the clustering tree represented different gene modules, each labeled with a different color. Genes were assigned to modules based on the similarity of their expression patterns (using their weighted correlation coefficients).
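The two matrix transformations named above can be sketched for a toy network. The sketch assumes an unsigned network, in which the adjacency is the absolute Pearson correlation raised to the power β, and the standard unsigned topological overlap measure. The expression profiles and β = 6 are hypothetical; the source reports neither the network type nor the selected power.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjacency(profiles, beta):
    """Unsigned soft-threshold adjacency: |cor|^beta, with a zero diagonal."""
    g = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(g)] for i in range(g)]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene (diagonal is zero)
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j] for u in range(g) if u != i and u != j)
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Hypothetical profiles: genes 0 and 1 are perfectly correlated, gene 2 is only weakly related
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
adj = adjacency(profiles, beta=6)
tom = topological_overlap(adj)
print(round(tom[0][1], 4), round(tom[0][2], 4))
```

Raising correlations to a power suppresses weak correlations, and the topological overlap then rewards gene pairs that also share neighbors, which is why hierarchical clustering is run on the TOM rather than on raw correlations.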
TMA tissue samples
The DLBCL TMA contained 104 DLBCL tissues and 28 reactive hyperplasia tissues (from cases with the same gender ratio and age range) collected from 2008 to 2015. It was prepared by the Department of Clinical Biobank of the Affiliated Hospital of Nantong University. Clinicopathological data, including gender, age, B symptoms, Ann Arbor stage, hemoglobin (Hb) level, LDH level, and IPI score, were collected. In addition, X-tile 3.6.1 software was used to determine the optimal cutoff values for the expression of the two hub genes. This was a retrospective study, and informed consent was obtained from all patients before the study. The Ethics Committee of the Affiliated Hospital of Nantong University approved this research.
Fluorescence-based multiplex immunohistochemistry (mIHC) staining
The DLBCL TMA slides were subjected to multiplex fluorescence staining using the Opal 7-color Manual IHC Kit (PerkinElmer, MA). After dewaxing in xylene and rehydration in ethanol, the slides were heated in a microwave with AR6 Buffer (AR600, AKOYA) and AR9 Buffer (AR900, AKOYA) for antigen retrieval. The slides were incubated with primary antibodies overnight at 4 °C and then with the secondary antibody for 10 min at room temperature. Nuclei were then stained with 4',6-diamidino-2-phenylindole (DAPI; F6057, Sigma), which was also used to mount the slides. Images were acquired using the Vectra 3.0 Automated Quantitative Pathology Imaging System. Tumor and stroma images were captured at ×20 magnification. Finally, the staining was scored with inForm® Cell Analysis software based on the intensity and degree of staining. The degree of staining was compared using the Wilcoxon rank-sum test.
The primary antibodies used in this study were as follows: rabbit anti-PHKA1 (24279-1-AP, Proteintech), rabbit anti-PLTP (ab282456, Abcam), rabbit anti-CD163 (93498, Cell Signaling Technology), rabbit anti-CD68 (76437, Cell Signaling Technology), rabbit anti-CD11B (49420, Cell Signaling Technology), mouse anti-CD66b (ARG66287, Arigobio), rabbit anti-PD-1 (86163, Cell Signaling Technology) and rabbit anti-PD-L1 (13684, Cell Signaling Technology). The secondary antibody was Opal™ polymer HRP Ms + Rb (ARH1001EA, Perkin Elmer).
Statistical analysis
Survival curves were generated by the Kaplan–Meier method and compared using the log-rank test. Multivariate Cox proportional hazards regression was used to identify independent prognostic factors. The Wilcoxon rank-sum test was applied to continuous variables with a non-normal distribution. All statistical analyses were performed in R software (v4.0). All statistical tests were two-tailed, and p < 0.05 was considered statistically significant.
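As an illustration of the Kaplan–Meier method, the sketch below computes the product-limit estimate, in which the survival probability is multiplied by (1 − d/n) at each event time, with d the number of deaths and n the number of patients still at risk. The follow-up times and event indicators are hypothetical, and the function is a minimal sketch rather than the R routine used in the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct time with at least one event.

    events[i] is 1 if the patient died at times[i] and 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients who died or were censored at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times and event indicators
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored patients leave the risk set without lowering the curve, which is how the estimator uses partial follow-up information.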
Discussion
The molecular heterogeneity of DLBCL poses great challenges to precision therapy. It is generally accepted that the traditional IPI score cannot adequately predict the prognosis of DLBCL, and more reliable strategies for subtype identification and prognostic classification are urgently needed [
3,
19]. In this study, we identified two metabolism-associated molecular subtypes, and there were significant differences in prognosis and the immune microenvironment between these two subtypes. In addition, we developed a prognostic risk model based on 14 MAGs. We found that it was a powerful independent prognostic tool with better predictive performance than the IPI score and was closely related to the immunosuppressive microenvironment. Finally, we identified two hub genes among the model genes, and preliminarily verified them in our own TMA cohort using mIHC. Our results may contribute to the development of accurate immunotherapy for DLBCL that targets metabolic pathways.
Consensus clustering is an unsupervised clustering method that can identify different molecular subtypes according to a gene expression matrix [
20]. Using consensus clustering, we identified two metabolism-associated molecular subtypes, which had significant differences in prognosis and the immune microenvironment. Compared to cluster 2, the prognosis of the patients in cluster 1 was poor, accompanied by a high abundance of immunosuppressive cells and a general increase in the expression of immune checkpoints, indicating an immunosuppressive microenvironment. This is consistent with findings regarding other malignancies [
6,
13,
14,
21]. As the consensus clustering was based on a MAG expression matrix, we inferred that the expression of MAGs was related to the prognosis and immunosuppressive microenvironment of DLBCL patients.
To further evaluate the prognostic value of the MAGs, we established a 14-gene risk model in the GEO training cohort by univariate Cox regression and LASSO regression. We then constructed a prognostic nomogram that integrated the risk score based on this model and all significant clinical features. The risk score effectively predicted prognosis in the GEO training cohort and was validated in a GEO internal validation cohort and a TCGA external validation cohort. ROC curve analysis confirmed that the risk score was superior to the traditional IPI score. Multiple validation methods indicated the robustness of the risk model, suggesting that it may be broadly applicable for individualized risk management. As previously mentioned, in view of the close relationship between metabolic reprogramming and the tumor immune microenvironment, we performed multiple immune analyses (ESTIMATE, ssGSEA, and CIBERSORT) to explore the differences in the immune landscape between the high- and low-risk groups. As expected, the high-risk group had a poor prognosis and an immunosuppressive microenvironment characterized by low immune score, low immune status, high abundance of immunosuppressive cells, and high expression of immune checkpoints. The low-risk group showed the opposite trend. This is also consistent with our immune analysis of metabolism-associated molecular subtypes. An increased risk score indicates a “cold tumor” [
22], with attenuated immunotherapy effectiveness and an immunosuppressive tumor microenvironment caused by metabolic reprogramming, which is consistent with poor prognosis. These conclusions further indicated that MAGs might play important roles in the altered immune response in DLBCL.
Notably, in the two groups with poor prognosis (cluster 1 and the high-risk group), in addition to the increase in the abundance of immunosuppressive cells and the expression of immune checkpoints, there was a significant increase in the infiltration of resting and activated NK cells. This is consistent with the results of previous studies, that is, an increased abundance of activated NK cells is associated with poor prognosis [
23]. NK cell dysfunction is common in hematological cancer, and it is related to tumor immune escape [
24]. We also found that KIR2DL1 and KIR2DL3 [
25], common immune checkpoints on NK cells, were significantly overexpressed in cluster 1 and the high-risk group. In the future, immunotherapy that blocks KIR2DL1/KIR2DL3 might reduce the abundance of activated NK cells.
Most of the MAGs in the risk model have been reported to be associated with cancer. To identify the most critical genes, i.e., the hub genes, among the 14 model genes for further experimental verification, we used the WGCNA algorithm to select key genes and then identified the overlapping genes among these genes and the model genes. As a result, we identified two hub genes: PLTP and PHKA1. The potential mechanisms of these two hub genes in DLBCL deserve further discussion.
Phospholipid transfer protein (PLTP) is a widely expressed lipid transfer protein that belongs to the lipopolysaccharide (LPS)-binding/lipid transfer gene family. PLTP can promote the transfer of a series of lipid molecules, including diacylglycerol, phosphatidic acid, sphingomyelin, phosphatidylcholine, phosphatidylglycerol, cerebrosides, and phosphatidylethanolamine. These transport functions play an important role in lipid and lipoprotein metabolism [
26,
27]. PLTP is differentially expressed in many kinds of tumors, such as prostate cancer [
27], ovarian cancer [
28], breast cancer [
29], lung cancer [
30], gastric cancer [
31] and glioma [
32]. The wide range of cancer types with differential expression of PLTP indicates that PLTP may be an important regulator of some common tumor-related processes.
The phosphorylase kinase regulatory subunit alpha 1 (PHKA1) gene encodes the muscle-type isoform of the PHK alpha subunit [
33]. PHKA1 plays a key role in glycogen metabolism [
34] and PHKA1 mutations cause glycogen storage disease type 9D, also known as X-linked muscle glycogenosis [
35]. However, research on PHKA1 in tumors is still limited. Research has shown that PHKA1, as an important gene related to glycogen metabolism, is related to the metastasis of prostate cancer [
36]. In addition, increased expression of PHKA1 was associated with younger age in patients with gastrointestinal stromal tumors [
36].
We further preliminarily validated the two hub genes in our TMA cohort using mIHC, which can quantify immune cells in the tumor microenvironment more objectively than traditional semi-quantitative methods [
37]. Our verification results confirmed that the two hub genes were both overexpressed in DLBCL tissues. Thereafter, using X-tile (a valuable tool for outcome-based cutoff optimization) [
38] and the 5-year OS of patients, we determined the optimal cutoff value for PLTP and PHKA1 expression. Based on each cutoff value, we subdivided the DLBCL patients into high- and low-expression groups, and further studied the differences in the tumor immune microenvironment between the pairs of groups. We found that the prognosis of the high-expression groups was poorer, accompanied by an immunosuppressive microenvironment characterized by higher abundances of immunosuppressive cells (M2 macrophages and TAMs) and higher expression of immune checkpoints (PD-L1 and PD-1). Finally, univariate and multivariate Cox regression analyses indicated that PLTP and PHKA1 were both independent prognostic factors in DLBCL. These experimental results showed that high expression of the hub genes was closely related to the prognosis and immunosuppressive microenvironment of DLBCL, which was consistent with our bioinformatics analyses, and further verified the stability and accuracy of the risk model.
Studies have shown that metabolic reprogramming is an important feature of immune cell activation. Immune cells have different metabolic characteristics, which affect their immune function [
16,
18]. Macrophages, as the main immune-infiltrating cells in solid tumors, can polarize into inflammatory (M1) or immunosuppressive (M2) phenotypes based on external stimuli. M1 macrophages have pro-inflammatory and anti-tumor effects, while M2 macrophages have anti-inflammatory and pro-tumor effects [
39]. The metabolic reprogramming of tumors can affect the polarization process of macrophages [
40,
41]. For example, hypoxia and lactic acid accumulation can promote the production of immunosuppressive M2 macrophages. The increase in tumor glycolysis produces a large amount of lactic acid, and the accumulation of lactic acid drives macrophages toward the M2 phenotype. M2 macrophages overexpress arginase 1 (ARG1). ARG1 consumes L-arginine, which is necessary for cytotoxic T lymphocytes to exert anti-tumor activity, and produces polyamines with strong immunosuppressive effects [
18,
42]. Additionally, hypoxia promotes tumor development by inducing the production of angiogenic factors, mitogenic factors, and cytokines related to tumor metastasis in macrophages [
9]. Macrophages can also undergo lipid-based metabolic reprogramming to promote tumor progression via increased membrane cholesterol efflux [
43,
44]. Moreover, M2 macrophages up-regulate fatty acid oxidation, mitochondrial respiration, and angiogenesis, thereby promoting tumor progression [
9,
45]. Our mIHC results also confirmed that M2 macrophages in DLBCL patients with high metabolic gene expression were significantly increased. Therefore, M2 macrophages may have potential as immunotherapy targets.
Interactions between immune checkpoints and their cognate receptors can deliver inhibitory signals to immune cells, leading to their dysfunction and exhaustion and resulting in an immunosuppressive microenvironment and tumor progression [
46]. Our study showed that the expression of most immune checkpoints significantly increased with increasing risk score, indicating an immunosuppressive microenvironment that was consistent with poor prognosis. Recent studies have shown that immune checkpoints are closely related to metabolism. On the one hand, checkpoint signals can regulate metabolism [
18]. For example, PD-L1 in tumor cells can activate the PI3K-Akt-mTOR pathway, stimulate glycolysis, and enhance glucose uptake by the tumor cells [
47]. CD155-TIGIT signaling in T cells of human gastric cancer inhibits glucose uptake, lactic acid production, and glycolytic enzyme expression [
48]. On the other hand, metabolism also modulates the tumor response to checkpoint blockade immunotherapy. For instance, obesity is recognized to enhance PD-1 expression and is associated with better outcomes of checkpoint blockade immunotherapy in metastatic melanoma and renal cell carcinoma [
46]. In addition, 2-deoxy-D-glucose (2-DG), a non-metabolizable glucose analog that inhibits normal glucose metabolism, can enhance the efficacy of anti-CTLA-4 immunotherapy by decreasing PD-L1 protein abundance and increasing expression of type-I interferon (IFN) and antigen presentation genes [
49]. Moreover, Powell's team showed that a glutamine metabolism inhibitor not only improved the immunosuppressive microenvironment, but also effectively reversed PD-1 inhibitor resistance when combined with a PD-1 inhibitor [
50]. Therefore, combining metabolic inhibitors with checkpoint inhibitors is expected to improve the efficacy of checkpoint blockade.
Our research has some unique advantages. In this study, two metabolism-associated DLBCL subtypes were identified, and a risk model based on MAGs was constructed. We used multiple validation methods to evaluate the model: first, we tested the model in a GEO internal testing cohort, then in a TCGA external validation cohort, and finally we identified two hub genes and carried out preliminary verification in our own TMA cohort. Satisfactory results were obtained from the multiple validation methods, confirming the robustness and accuracy of the risk model. In addition, we not only studied the predictive performance of the risk model, but also explored the effect of MAG expression on the tumor immune microenvironment in DLBCL.