Background
Cancer treatment has changed radically over time, evolving from a one-size-fits-all approach to a more tailored, personalized one. Furthermore, whereas cancer treatment once focused on the tumor alone, the recent success of immunotherapy, which harnesses the inherent anti-tumor immune response, has highlighted the need to consider the tumor microenvironment in cancer care. Early clinical trials demonstrated the potential of immunotherapy to induce durable responses, and immunotherapy was consequently heralded as a turning point in cancer care. The first immune checkpoint inhibitor (ICI), ipilimumab, which targets cytotoxic T-lymphocyte-associated antigen 4 (CTLA-4), received FDA approval in 2011 for the treatment of advanced melanoma [1]. In the following years, the FDA approved additional immune checkpoint inhibitors and extended their use to a range of tumor types based on immune checkpoint ligand expression rather than tissue of origin [1]. To date, immunotherapy has shown promising results in 15 different cancer types, and first-line treatment with the ICI pembrolizumab even outperforms conventional chemotherapy in a few cancer types [2, 3]. Unfortunately, the success of immunotherapy is limited to a minority of patients as a result of tumor-intrinsic factors and microenvironmental modifiers, which has led to a surge of studies aiming to identify immune-related gene signatures that could predict which patients are more likely to benefit from immunotherapy.
In this study, we explored long non-coding RNA (lncRNA) profiles of tumors in relation to tumor immune phenotypes. The number and role of lncRNAs were previously underappreciated. Currently, the GENCODE project (v39) lists 18,811 human lncRNAs and 51,306 lncRNA transcripts, and lncRNAs have been implicated in various biological processes, including the regulation of gene expression and post-transcriptional modification [4]. Furthermore, emerging evidence supports a role for lncRNAs in regulating the adaptive as well as the innate immune response, with potential implications for cancer immunity and immunotherapy [5–7]. In particular, lncRNAs have been implicated in tumor immune escape through the regulation of the antigen presentation machinery as well as of immune cell development, recruitment and function [6–9]. In addition, a few lncRNAs have been shown to modulate immune checkpoint expression and hence may be associated with immunotherapy response [10, 11]. While a better understanding of the expression patterns and mechanistic roles of individual lncRNAs can help to dissect their biological functions in cancer, panels or signatures of lncRNAs are more likely to hold prognostic and predictive potential. Various immune-related lncRNA signatures with prognostic value have been identified for specific cancer types, including gastric cancer, head and neck cancer, lung cancer, colorectal cancer and hepatocellular carcinoma [12–18]. In breast cancer, a few lncRNA signatures have been associated with tumor immune infiltration or immune functional status [19–24]. Moreover, lncRNA-based immune classification has been proposed to identify “immune-active” cases that are characterized by an immune-functional lncRNA signature, high T cell infiltration in tumors and improved immunotherapy benefit [25]. Together, these studies demonstrate the potential clinical value of immune-related lncRNA signatures; however, studies with larger sample sizes and prospective designs are needed to validate these findings. Furthermore, the prognostic value of the reported signatures may be limited to the tumor type in which they were identified.
Here, we identified immune-related lncRNA signatures (and a proxy protein-coding gene network) that are associated with clinical outcome and immune checkpoint expression in breast cancer and have prognostic value in multiple cancer types. Using the large TCGA breast cancer dataset, we first identified immune-related lncRNAs (ir-lncRNAs) differentially expressed between immune-favorable and immune-unfavorable tumors, as defined by the Immunologic Constant of Rejection (ICR), a prognostic gene signature of tumor immune activation [26–30]. Next, we mapped the ir-lncRNAs to a coding-non-coding gene network and applied the random walk with restart (RWR) algorithm to identify proximal protein-coding genes. We then investigated the biological role of these proximal protein-coding genes through pathway enrichment analysis. Finally, we identified a set of three ir-lncRNAs that are additionally associated with immune checkpoint expression and show a stronger effect on overall survival than the ICR signature in multiple cancer types, highlighting the potential role of lncRNAs in defining the immune contexture of tumors.
Methods
Patient cohorts
Initial lncRNA analysis was performed using the TCGA breast cancer cohort, and the identified lncRNA signatures were validated in several TCGA cancer datasets (BRCA [n = 798], HNSC [n = 417], SKCM [n = 216], UCEC [n = 311], LIHC [n = 191], STAD [n = 247], BLCA [n = 248], CESC [n = 190], KICH [n = 65], OV [n = 249], LUSC [n = 202], READ [n = 44], COAD [n = 112], LUAD [n = 469], GBM [n = 150], KIRP [n = 188], KIRC [n = 298], LGG [n = 478]) as well as in a small breast cancer cohort from Qatar (RAQA [n = 24]) [31]. Clinical information and mRNA sequencing data for the TCGA datasets were obtained through the GDC portal as previously described [31], whereas lncRNA expression data were extracted from the TANRIC database.
RNA isolation and total RNA sequencing of the RAQA breast tumors were performed as previously reported [31]. Both gene and lncRNA expression data were subjected to quality control using FastQC (python v.2.7.1, FastQC v.0.11.2), adapter sequences were trimmed using flexbar (v.3.0.3), and reads were aligned to GRCh37 using hisat2 (v.2.1.0) and SAMtools (v.1.3). After alignment, QC was performed to verify the alignment quality and paired-end mapping overlap using Bowtie2 (v.2.3.4.2). Finally, reads were assigned to genomic features using Subread (v.1.5.1) with the GRCh37.87 (gene expression) or GRCh37.p13 (lncRNA expression) annotation.
mRNA-seq data of TCGA and RAQA datasets were normalized within lanes to correct for gene-specific effects (including GC-content and gene length) and between lanes to correct for sample-related differences (including sequencing depth) using the R package EDASeq (v.2.12.0). The resulting gene and lncRNA expression matrices were quantile normalized using R package preprocessCore (v.1.36.0). All downstream analysis was performed using R (v.3.5.1 or later).
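As a reading aid, the quantile normalization step can be sketched in plain Python. This is a minimal illustration rather than the preprocessCore implementation: the function name `quantile_normalize` is hypothetical, samples are assumed to be stored as columns (one list per sample), and ties are broken by position instead of being given averaged ranks, so results can differ from the R package when values are tied.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns (one list per sample).

    Simplified sketch: tied values are ordered by position, not averaged.
    """
    n_rows = len(columns[0])
    n_samples = len(columns)
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / n_samples for k in range(n_rows)]
    normalized = []
    for col in columns:
        # Indices of this sample's values, ordered from smallest to largest.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            # Replace each value by the reference value at the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy matrix: two samples with three features each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every sample shares the same value distribution (here 1.5, 3.5 and 5.5), while each feature keeps its within-sample rank.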
ICR consensus clustering
Consensus clustering of TCGA-BRCA samples was performed based on the expression values of the 20 ICR genes using ConsensusClusterPlus (v.1.42.0) with the following parameters: 5000 repeats and agglomerative hierarchical clustering with the Ward criterion for inner linkage and complete outer linkage. The optimal number of clusters was determined using the Calinski–Harabasz criterion, and samples were classified as ICR high (immune hot), ICR medium or ICR low (immune cold). Downstream comparative analyses were performed using ICR high and ICR low tumor samples.
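The Calinski–Harabasz criterion used to select the number of ICR clusters is the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom. The sketch below computes it for one candidate partition; the function name is hypothetical, and evaluating the index on the ICR gene expression values (rather than, for example, on the consensus matrix) is our assumption, since the text does not specify the input.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster and W the within-cluster sum of squared
    Euclidean distances; higher values indicate better-separated clusters.
    """
    n, dim = len(points), len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need more than one cluster and fewer clusters than points")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between: cluster size times squared distance of centroid to grand mean.
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        # Within: squared distances of members to their own centroid.
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 1-D samples: a sensible and a poor two-cluster partition.
samples = [[0.0], [1.0], [10.0], [11.0]]
print(calinski_harabasz(samples, [0, 0, 1, 1]))
print(calinski_harabasz(samples, [0, 1, 0, 1]))
```

In practice, the index would be computed for the partition obtained at each candidate number of clusters, and the number of clusters with the highest value retained.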
ICR-differentially expressed lncRNA and protein-coding gene network analysis
Using the TCGA-BRCA dataset, we developed an analysis pipeline involving the identification of lncRNAs differentially expressed by ICR cluster and the construction of proxy protein-coding gene networks. Linear Models for Microarray Data (limma; FDR-adjusted p < 0.05) was applied to identify ir-lncRNAs differentially expressed between ICR high (n = 115) and ICR low (n = 128) tumors. Next, the differentially expressed ir-lncRNAs were mapped to a coding-non-coding gene (CNC) correlation network (Additional file 1A) as described in the LncRNAs2Pathways method [32]. The CNC network consists of 11,391 lncRNAs and 17,222 protein-coding genes. We used the random walk with restart (RWR) global network propagation algorithm to identify the protein-coding genes most likely to be influenced by the ir-lncRNAs owing to their close network proximity. Proximal protein-coding genes were identified based on their RWR propagation scores.
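The RWR propagation can be illustrated on a toy network. In this sketch (hypothetical function and node names), the walker restarts at the seed ir-lncRNAs with probability `restart` and otherwise moves to a uniformly chosen neighbour; the steady-state probabilities then serve as propagation scores (walkscores). Unweighted edges, the restart probability of 0.5 and the convergence tolerance are assumptions made for illustration and are not taken from the LncRNAs2Pathways method; the network is also assumed to contain no isolated nodes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores: p = restart * p0 + (1 - restart) * W p.

    adjacency: dict node -> set of neighbours (undirected, unweighted,
               every node assumed to have at least one neighbour)
    seeds: nodes where the walk restarts (here, the ir-lncRNAs)
    restart: restart probability (hypothetical default)
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Column-normalised transition: u spreads its mass equally to neighbours.
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical chain network: lnc1 - geneA - geneB - geneC.
chain = {
    "lnc1": {"geneA"},
    "geneA": {"lnc1", "geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB"},
}
scores = random_walk_with_restart(chain, seeds={"lnc1"})
for node, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(s, 4))
```

With a single seed on this four-node chain, the exact steady-state scores are 26/45, 14/45, 4/45 and 1/45, decaying with distance from the seed; this decay is what makes walkscores a measure of network proximity.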
Pathway enrichment analysis
Once we had defined the proximal coding genes associated with the differentially expressed ir-lncRNAs, we explored their biological relevance through pathway enrichment analysis. First, we applied the approach described in the LncRNAs2Pathways method, whereby a pathway enrichment score is calculated from the walkscores of the ranked protein-coding genes using a Kolmogorov–Smirnov-like statistic with 1000 permutations. However, we observed that the walkscore distribution was highly skewed (Additional file 1B): the majority of protein-coding genes had a very small walkscore (~ 0) and only a small fraction had a relatively high walkscore, which may result in false-positive enriched pathways. Pathways consisting predominantly of protein-coding genes with small walkscores and a few protein-coding genes with relatively high walkscores would be associated with smaller enrichment scores than pathways represented primarily by protein-coding genes with high walkscores. To address this limitation, we applied a stringent walkscore threshold of ≥ 0.01 to generate a ranked list of the protein-coding genes most proximal to the differentially expressed ir-lncRNAs, which coincidentally corresponds to approximately 1% of protein-coding genes. Next, we submitted the ranked protein-coding gene list to ConsensusPathDB [33, 34] to identify enriched pathways, which were visualized using the func2vis R package (v.1.0.1), and to Ingenuity Pathway Analysis (IPA) to identify enriched diseases and functions.
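To make the skewness argument concrete, the sketch below computes a running-sum enrichment score in which pathway members are weighted by their walkscores, and applies the walkscore ≥ 0.01 cut-off described above. Function names are hypothetical, and the weighting is a generic weighted Kolmogorov–Smirnov-like statistic assumed for illustration rather than the exact LncRNAs2Pathways implementation; the 1000-permutation significance test is omitted.

```python
def weighted_ks_enrichment(ranked_genes, walkscores, pathway):
    """Running-sum enrichment score with walkscore-weighted hits.

    ranked_genes: genes sorted by decreasing walkscore
    walkscores: dict gene -> walkscore
    pathway: set of genes in the pathway
    Returns the maximum signed deviation of the running sum from zero.
    """
    hits = [g for g in ranked_genes if g in pathway]
    n_miss = len(ranked_genes) - len(hits)
    hit_norm = sum(walkscores[g] for g in hits)
    if not hits or n_miss == 0 or hit_norm == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g in ranked_genes:
        if g in pathway:
            running += walkscores[g] / hit_norm  # step up in proportion to walkscore
        else:
            running -= 1.0 / n_miss              # uniform step down for non-members
        if abs(running) > abs(best):
            best = running
    return best


def proximal_genes(walkscores, threshold=0.01):
    """Genes with walkscore >= threshold, ranked by decreasing walkscore."""
    kept = (g for g, s in walkscores.items() if s >= threshold)
    return sorted(kept, key=lambda g: -walkscores[g])


# Hypothetical walkscores with a strongly skewed distribution.
walkscores = {"g1": 0.5, "g2": 0.3, "g3": 0.005, "g4": 0.001}
ranked = sorted(walkscores, key=walkscores.get, reverse=True)
print(weighted_ks_enrichment(ranked, walkscores, {"g1", "g3"}))
print(proximal_genes(walkscores))  # → ['g1', 'g2']
```

In the toy example, the pathway {g1, g3} reaches its maximum deviation immediately after g1 because g3 contributes almost nothing to the running sum, which illustrates how members with near-zero walkscores are effectively invisible to the statistic.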
Single-sample gene set enrichment analysis (ssGSEA)
Single-sample gene set enrichment analysis (ssGSEA) was applied to calculate enrichment scores of specific gene sets within each individual sample using the GSVA R package (v.1.30.0).
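The ssGSEA score can be sketched for a single sample as follows. This minimal version follows the original ssGSEA formulation: genes are ranked by expression, gene-set members are weighted by their rank raised to the power α = 0.25, and the difference between the weighted in-set and the out-of-set empirical cumulative distributions is summed over all ranks. The function name is hypothetical, and the cross-sample normalization that GSVA can apply to ssGSEA scores is omitted.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Single-sample enrichment score for one sample.

    expression: dict gene -> expression value in ONE sample
    gene_set: set of genes to score
    alpha: exponent applied to ranks of gene-set members
    """
    # Order genes from highest to lowest expression; the top gene gets rank N.
    ordered = sorted(expression, key=lambda g: expression[g], reverse=True)
    n = len(ordered)
    rank = {g: n - i for i, g in enumerate(ordered)}
    in_set = [g for g in ordered if g in gene_set]
    n_miss = n - len(in_set)
    if not in_set or n_miss == 0:
        raise ValueError("gene set must overlap the sample but not cover all genes")
    hit_total = sum(rank[g] ** alpha for g in in_set)
    p_hit = p_miss = score = 0.0
    for g in ordered:
        if g in gene_set:
            p_hit += rank[g] ** alpha / hit_total  # weighted ECDF of members
        else:
            p_miss += 1.0 / n_miss                 # ECDF of non-members
        score += p_hit - p_miss                    # sum, not maximum, of the difference
    return score


# Hypothetical sample: one gene set at the top, one at the bottom of the ranking.
sample = {"a": 10.0, "b": 9.0, "c": 1.0, "d": 0.0}
print(ssgsea_score(sample, {"a", "b"}))
print(ssgsea_score(sample, {"c", "d"}))
```

A gene set concentrated among the most highly expressed genes yields a positive score, whereas one concentrated at the bottom of the ranking yields a negative score.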
Correlation analysis
Spearman’s correlation analysis was used to assess the correlation between differentially expressed ir-lncRNAs and immune checkpoints. Spearman’s rank correlation coefficients were visualized in a heatmap using the ComplexHeatmap R package (v.2.1.2), with columns ordered by the sum of the correlation scores and rows ordered by the absolute sums of the correlation scores.
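Spearman’s coefficient is Pearson’s correlation computed on ranks, with tied values given their average rank. The sketch below (hypothetical function names) implements this together with the heatmap ordering described above. Placing ir-lncRNAs in rows and checkpoints in columns, the descending sort direction, and reading “absolute sums” as the absolute value of each row sum are assumptions.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(lncrna_profiles, checkpoint_profiles):
    """Rows: ir-lncRNAs; columns: immune checkpoints (assumed layout)."""
    return [[spearman(l, c) for c in checkpoint_profiles] for l in lncrna_profiles]


def order_heatmap(matrix):
    """Columns by descending column sum; rows by descending |row sum|."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_order = sorted(range(n_cols), key=lambda j: -sum(matrix[i][j] for i in range(n_rows)))
    row_order = sorted(range(n_rows), key=lambda i: -abs(sum(matrix[i])))
    return row_order, col_order


# Hypothetical expression profiles across three samples.
m = correlation_matrix([[1, 2, 3]], [[1, 2, 3], [3, 2, 1]])
print(m, order_heatmap(m))
```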
Survival analysis
Univariate Cox proportional hazards regression was performed using the survival R package (v.2.41-3) to calculate hazard ratios (HRs) between any two groups of interest, the corresponding p values based on the χ² test, and 95% confidence intervals (95% CI). Survival analysis was performed with the lncRNA signatures and the ICR score as continuous variables, and results were visualized in forest plots generated with the forestplot R package (v.1.7.2). The horizontal lines in the forest plots represent the 95% confidence intervals and the squares represent the hazard ratios. In addition, univariate survival analysis was used to calculate the HRs of an ICR/3 ir-lncRNA combination model that sums the scaled enrichment scores of the ICR and 3 ir-lncRNA signatures. Kaplan–Meier curves were generated using the ‘ggsurv’ function from survminer (v.0.4.8), and the optimal cut-off point for stratification within each cancer type was determined by 5-fold cross-validation. The log-rank test was used to assess statistical differences in overall survival.
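A univariate Cox model with a continuous score can be sketched with Newton–Raphson maximization of the partial likelihood. This illustration (hypothetical function names) handles tied event times with the Breslow approximation and reports a Wald χ² p value; the survival R package uses the Efron approximation for ties by default, so estimates may differ slightly when ties are present. The `zscore` helper mirrors the ICR/3 ir-lncRNA combination model, in which scaled enrichment scores are summed; standardizing with the sample standard deviation is an assumption.

```python
import math


def zscore(values):
    """Center and scale by the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def cox_univariate(x, times, events, max_iter=50, tol=1e-9):
    """One-covariate Cox model (Breslow ties) fitted by Newton-Raphson.

    Returns (hazard ratio, (lower, upper) 95% CI, Wald p value).
    """
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for i in range(len(x)):
            if not events[i]:
                continue
            # Risk set: patients still under observation at this event time.
            risk = [j for j in range(len(x)) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0               # gradient of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2      # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information from the final iteration
    chi2 = (beta / se) ** 2
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, p


# Hypothetical toy cohort: higher scores tend to coincide with earlier deaths.
hr, ci, p = cox_univariate([1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [1, 2, 3, 4, 5, 6], [1] * 6)
print(f"HR = {hr:.2f}, 95% CI = {ci[0]:.2f}-{ci[1]:.2f}, p = {p:.3f}")

# Combination score: sum of scaled signature scores (hypothetical values).
combined = [a + b for a, b in zip(zscore([2.0, 3.0, 4.0]), zscore([10.0, 30.0, 20.0]))]
print(combined)  # → [-2.0, 1.0, 1.0]
```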
Multivariate Cox regression analysis
Multivariate Cox regression analysis was used to determine the contribution of individual lncRNAs to the prognostic value of the 3 ir-lncRNA signature using the survival package (v.3.2-13).
To determine whether the ICR or the 3 ir-lncRNA signature provides the better-fitting model, we estimated and compared their Akaike information criterion (AIC) values using the ‘extractAIC’ function from the stats package (v.3.6.2).
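For reference, the criterion being compared is given below. For a Cox model, we take $\hat{L}$ to be the maximized partial likelihood and $k$ the number of estimated coefficients (our reading of how `extractAIC` treats Cox fits, stated as an assumption). Because both univariate models estimate a single coefficient on the same patients, the comparison reduces to a difference in maximized partial log-likelihoods, and the model with the lower AIC is preferred; a positive $\Delta\mathrm{AIC}$ below therefore favors the 3 ir-lncRNA signature.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}

\Delta\mathrm{AIC} = \mathrm{AIC}_{\mathrm{ICR}} - \mathrm{AIC}_{\mathrm{3\,ir\text{-}lncRNA}}
  = 2\left(\ln\hat{L}_{\mathrm{3\,ir\text{-}lncRNA}} - \ln\hat{L}_{\mathrm{ICR}}\right)
  \quad \text{when } k_{\mathrm{ICR}} = k_{\mathrm{3\,ir\text{-}lncRNA}} = 1
```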
Cell composition deconvolution methods
We applied different deconvolution approaches to estimate the abundance of specific cell subsets from bulk transcriptomic data, including the Consensus Tumor MicroEnvironment cell estimation (ConsensusTME) method [35] using ConsensusTME (v.0.0.1.9), and immune cell subpopulation estimation methods based on leukocyte subgroup enrichment scores [36] or immune metagene expression profiling [37]. In addition, we applied the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm [38] using ESTIMATE (v.1.0.13) to infer the extent of stromal and immune cell infiltration. Pearson scatter plots of the model enrichment scores against the 3 ir-lncRNA enrichment scores were generated using the corrplot R package (v.0.92).
Discussion
The vast body of tumor immunology research and immunotherapy clinical trials has clearly demonstrated the importance of the tumor immunophenotype for clinical outcome and highlighted the need for predictive biomarkers of an active tumor immune microenvironment. In our previous work, we defined and validated the ICR signature as a prognostic tool to distinguish ‘hot’ tumors (ICR high) from ‘cold’ tumors (ICR low), whereby the former are associated with a more favorable clinical outcome and greater response to immune checkpoint blockade [27, 28]. Mechanistically, we found that ICR low tumors are strongly associated with mutations in the MAPK pathway and with activation of the TGF-β and Wnt/β-catenin pathways [27, 42]. In a recent pan-cancer analysis, we further demonstrated that the prognostic connotation of the ICR immune phenotype may be differentially impacted by the activation of distinct oncogenic pathways [28]. For instance, the favorable prognosis associated with ICR high tumors was abolished by activation of TGF-β signaling and by a low-proliferation molecular profile.
In the present study, we expanded our molecular analysis of ICR immune phenotypes to include lncRNAs as potential regulators of immune disposition and, consequently, of immunotherapy response. Analysis of the lncRNA profiles of ICR high versus ICR low breast tumors from the TCGA repository revealed a number of differentially expressed immune-related lncRNAs (ir-lncRNAs), which we subsequently mapped to a coding-non-coding gene network using a computational network propagation algorithm. Pathway analysis of the resulting proxy protein-coding genes showed that they are involved in multiple biological processes related to metabolic pathways and protein trafficking. Several of the identified processes play a major role in mitochondrial oxidative phosphorylation, which largely defines the metabolic fitness of cancer and immune cells. Generally, as tumors progress, cancer cells undergo metabolic reprogramming from oxidative phosphorylation to aerobic glycolysis in order to support growth and survival. This reprogramming creates metabolic competition for glucose between cancer cells and tumor-infiltrating cytotoxic T cells, which also increasingly rely on aerobic glycolysis upon activation [43, 44]. Such metabolic competition can lead to T cell dysfunction, resulting in unfavorable tumor immune phenotypes. Furthermore, the synthesis of leukotrienes and eoxins plays an important role in shaping the tumor microenvironment by regulating leukocyte migration and promoting tumor growth and metastasis [45]. Together, these findings suggest that ir-lncRNAs may be implicated in defining the immune contexture of tumors in addition to promoting tumorigenesis. The presence of an active pre-existing immune response is a crucial determinant of immunotherapy response, in particular to immune checkpoint blockade.
To address the role of ir-lncRNAs in immune checkpoint expression, we investigated the relationship of ir-lncRNA signatures with 30 immune checkpoint molecules in breast cancer. We found that ICR-associated ir-lncRNAs could be categorized into two clusters, one positively and one negatively correlated with immune checkpoint molecules. The exception was CD276 (B7-H3), whose expression showed the opposite correlation pattern with ir-lncRNA expression. CD276 is expressed in many cell types and has been shown to play a role in innate and adaptive immune responses; however, whether it functions as a co-stimulatory or co-inhibitory molecule remains controversial [46].
Finally, we sought to determine the prognostic value of ir-lncRNAs, given our findings that ir-lncRNAs are associated with metabolic activities and immune checkpoint expression, both of which regulate immune cell disposition and may therefore impact clinical outcome. We defined three different ir-lncRNA signatures using the TCGA breast cancer dataset, evaluated their prognostic significance in a local breast cancer cohort and explored their clinical value in a pan-cancer setting. Although the local breast cancer cohort (RAQA) is small, similar patterns of prognostic significance were observed, suggesting that the ir-lncRNA signatures are robust across ancestral populations, including the Arab population, which remains largely underrepresented. The first signature comprised the top 20 ir-lncRNAs differentially expressed between ICR high and ICR low tumors (20-ICRlncRNA) and demonstrated prognostic significance in 6 solid tumor types (BRCA, HNSC, SKCM, KIRP, KIRC and LGG), with a lower hazard ratio for overall survival than the ICR signature. The second signature is composed of the top 20 ir-lncRNAs positively correlated with immune checkpoint expression (20-ICPlncRNA) and overall shows a stronger effect on survival than the ICR signature. Further study is needed to investigate the correlations of individual checkpoint molecules with the 20-ICPlncRNA signature, in order to gain insight into potential molecular mechanisms, and to explore its value in predicting immunotherapy response in larger prospective cohorts of patients with cancer. Comparison of the two ir-lncRNA signatures revealed three common ir-lncRNAs, PCED1B-AS1, RP11-291B21.2 and AC092580.4, which could potentially serve as a minimal informative set of ir-lncRNAs with prognostic significance and a more practical format for clinical use than the ICR signature.
Survival analyses of the 3 ir-lncRNA signature confirmed its prognostic value in 7 cancer types: 5 in which it showed a stronger effect on survival than the ICR signature (ICR enabled [BRCA, HNSC, SKCM], ICR disabled [KIRP, LGG]) and 2 in which the ICR does not hold prognostic significance (ICR neutral [UCEC, CESC]). These findings suggest that the 3 ir-lncRNA signature could improve prognostic stratification over the ICR and could additionally offer prognostic information in tumors in which the ICR does not hold prognostic value (ICR neutral). Of note, whereas both signatures are associated with improved overall survival in the majority of cancer types, they are associated with worse survival in kidney renal papillary cell carcinoma (KIRP) and low-grade glioma (LGG). Consistent with this, several studies have reported an inverse association between high immune cell infiltration or immune activity and prognosis in these tumor types. For instance, in low-grade glioma, enhanced immune infiltration has been associated with a worse prognosis, whereby an increase in M0/M1 macrophages increases the permeability of the blood–brain barrier and promotes glioma cell growth and invasion [47–49]. In addition, high B cell infiltration has been associated with worse prognosis in KIRP and LGG and may be linked to the presence of a specific immunosuppressive B cell subset, regulatory B cells [50, 51]. Moreover, despite the presence of tumor immune cell infiltration, T cell function may be suppressed by increased immune checkpoint expression, as suggested by a 15-gene signature in KIRP [52]. Further study is needed to tease out the relationship between KIRP and LGG patient survival and the abundance and functionality of diverse immune cell subsets.
To further investigate the association of the 3 ir-lncRNAs with multiple cell types within the tumor microenvironment, we used several deconvolution methods. Using the ESTIMATE algorithm, we found that the 3 ir-lncRNA signature strongly correlates with both the ESTIMATE stromal and immune scores across cancers, suggesting that expression of the 3 ir-lncRNAs might derive from both the stromal and the immune cell compartments within tumors. Furthermore, we found an overall positive association of the 3 ir-lncRNA signature with pro-inflammatory and cytotoxic immune cell subpopulations, and a negative correlation with T helper 2 and T helper 17 cells, memory T cells and immunomodulatory CD56bright NK cells.
Given the potential clinical value of the 3 ir-lncRNA signature, we reviewed the reported molecular mechanisms and biological processes that may be affected by these 3 ir-lncRNAs. All three ir-lncRNAs have been found to be overexpressed in multiple cancer types [53–61]. Mechanistically, PCED1B-AS1 has been shown to function as an oncogenic lncRNA that regulates miRNA expression, ultimately promoting aerobic glycolysis, proliferation, invasion and epithelial-to-mesenchymal transition while reducing apoptosis of cancer cells [53, 56, 57, 59]. In addition, PCED1B-AS1 was found to be positively associated with immune checkpoint expression and, in particular, to increase the expression of PD-L1 and PD-L2 through interaction with miR-194-5p, leading to enhanced immunosuppression [10, 58]. Less is known about the function of RP11-291B21.2 in cancer; however, it has been associated with durvalumab response in patients with non-small cell lung cancer and bladder cancer, and was found to correlate with several key immune genes [62]. Single-cell RNA-seq analysis further indicates that RP11-291B21.2 is predominantly expressed in exhausted CD8+ T cells [62]. Furthermore, AC092580.4 expression is strongly correlated with key immune genes and pathways, including GATA3 expression, suggesting that it may be involved in modulating T cell polarization and hence anti-tumor immunity [63, 64]. Additional single-cell multi-omics and functional studies are needed to better characterize the cellular origin, interacting partners and downstream signaling pathways of each of these ir-lncRNAs.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.