Background
Cachexia is a metabolic syndrome found in 50% of patients with lung cancer [1]. It is associated with the risk of death in non-small cell lung cancer (NSCLC) [2], the most prevalent histological type [3]. Low muscle mass in cachectic patients with advanced NSCLC is associated with pain and poor quality of life [4]. Therefore, early detection of cachexia is of utmost importance in predicting NSCLC patient outcomes and guiding better treatment decisions.
Multiple screening tools are used to measure muscle loss [5]. One feasible approach is computed tomography (CT), which is routinely performed for cancer diagnosis and follow-up [6]. CT-based muscle quantification at the third lumbar vertebra (L3) level is commonly used to assess muscle mass in cancer patients [1]. However, CT scans of NSCLC patients usually do not include L3 [7]. Studies overcame this limitation by analyzing the pectoralis muscle area (PMA) [8, 9]. Our group demonstrated that PMA alone is a cachexia predictor in treatment-naive NSCLC patients [9]. PMA quantification in lung cancer patients is associated with clinical outcomes; however, each study used different calculations and cutoffs to select cachectic patients [8, 9]. Although these studies are informative, the best PMA cutoff for classifying cachectic patients based on CTs remains to be defined.
Molecular markers also contribute to the clinical classification of cachexia; however, none are yet approved for clinical use. The synergistic expression of tumor-derived cachexia-inducing factors (CIFs) is correlated with the prevalence of cachexia in different tumor types [10]. For example, lung cancer cachexia is characterized by increased expression of CIFs that ultimately induce muscle wasting [11]. NSCLC patients with low PMA present increased expression of IL8, IL6, and other critical CIFs [9]. Cell types in the tumor microenvironment (TME) contribute distinctively by secreting cytokines or other CIFs that act on cell surface receptors of target tissues [12]. Cancer samples can be divided into clinically relevant immune subtypes depending on TME composition and gene expression profile [13]. Thus, computational deconvolution of the bulk tumor transcriptome using digital cytometry and single-cell RNA-Seq (scRNA-Seq) can help infer the TME cellular proportions. TME analysis also has the potential to identify the cells responsible for secreting CIFs in cachectic patients.
The aim of this study was to build a machine learning (ML) model based on PMA and clinical data to identify NSCLC patients with low muscularity and poor prognosis. We additionally characterized the global secretome and TME immune fraction of these patients by utilizing tumor transcriptomes. Overall, the predictive model determined the best PMA cutoffs to select potentially cachectic patients. We demonstrated that these patients have a pro-inflammatory state characterized by high tumor gene expression of CIFs. Based on computational deconvolution of bulk transcriptomes, we also identified that patients with lower PMA and worse survival show a TME enriched with cytotoxic and exhausted CD8+ T cells.
Methods
Software, tools, and databases are described in Additional file 1.
NSCLC patients
CTs and clinical data were downloaded from The Cancer Imaging Archive (TCIA) database. The NSCLC-Radiogenomics collection contains 211 NSCLC patients [14, 15] (discovery set). CTs were taken at diagnosis, and patients were treated with surgery. We included 107 patients with primary tumors who had a whole-body PET-CT before surgery, had not been subjected to chemotherapy or radiotherapy, had their arms raised during CT to avoid positioning bias, and had overall survival data. An independent validation set of 36 primary NSCLC patients (all gave their informed consent prior to their inclusion in the study) from the Faculty of Medicine, São Paulo State University (UNESP), Botucatu, Brazil (2012–2019) was used to confirm the prediction model (approved by the Faculty of Medicine Research Ethics Committee REB#45723921.0.0000.5411). Thirty-two patients from the validation set had survival data.
CT analyses
We analyzed the PMA using a single axial slice of the CT as previously described [9]. We analyzed the total skeletal muscle cross-sectional area at the level of L3 to classify patients with low muscle mass based on pre-established skeletal muscle index (SMI) cutoffs (men < 55 cm²/m²; women < 39 cm²/m²) [4]. This classification was compared with the one generated by our model. We generated PMA-SMI (cm²/height²) to test whether height normalization affects our prediction model.
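The height normalization and the sex-specific L3 cutoffs described above can be illustrated with a minimal sketch. The function names and the example patient are hypothetical and are not taken from the study's pipeline; only the cutoffs (55 and 39 cm²/m²) come from the text.

```python
# Sex-specific L3 SMI thresholds quoted in the text (cm^2/m^2).
SMI_CUTOFFS = {"male": 55.0, "female": 39.0}


def muscle_index(area_cm2: float, height_m: float) -> float:
    """Normalize a cross-sectional muscle area (cm^2) by height squared (m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return area_cm2 / height_m ** 2


def has_low_muscle_mass(l3_area_cm2: float, height_m: float, sex: str) -> bool:
    """Return True when the L3 SMI falls below the sex-specific cutoff."""
    return muscle_index(l3_area_cm2, height_m) < SMI_CUTOFFS[sex]


# Hypothetical patient: 150 cm^2 of L3 muscle, 1.75 m tall, male.
smi = muscle_index(150.0, 1.75)
low = has_low_muscle_mass(150.0, 1.75, "male")
```

The same `muscle_index` helper applies to the pectoralis area when computing a height-normalized PMA index.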
Cachexia classification model using machine learning
Predictive analysis based on a non-parametric risk prediction model was performed by the Classification And Regression Tree (CART) method using the regression analysis available in the "rpart" package in RStudio. We pruned the tree to the one with the minimum cross-validated error (xerror; prune function, cp = 0.053). We used ten-fold cross-validation to assess the possibility of overfitting. Age, tumor stage, PMA, and survival were included as variables. The imbalance between the two groups was addressed by balanced resampling using random down-sampling. The Cutoff Finder package was independently applied in R to select the PMA cutoff that best distinguishes patient survival. Area Under the Curve-Receiver Operating Characteristics (AUC-ROC) analysis with a 95% confidence interval (CI) was performed with the easyROC web tool to determine the diagnostic value of our PMA cutoffs in predicting low muscle mass. For this comparison, muscularity was defined using L3 SMI cutoffs. EasyROC was also used to test the ability of PMA indexes to discriminate between LM and HM patients.
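As an illustration of the ROC step, the sketch below computes an AUC and the sensitivity and specificity of a candidate cutoff for a marker in which lower values indicate the positive class (here, low muscle mass by L3 SMI). It is not the easyROC implementation: the PMA values, labels, and the 32.0 cm² cutoff are invented for the example, and the AUC is obtained from the pairwise (Mann-Whitney) formulation.

```python
def auc_low_value_predicts_positive(values, labels):
    """AUC for a marker where LOWER values indicate the positive class.

    Equals the fraction of (positive, negative) pairs in which the positive
    case has the lower value; ties count as one half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(values, labels, cutoff):
    """Call value < cutoff positive and score the calls against the labels."""
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v < cutoff)
    fn = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v >= cutoff)
    fp = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up PMA values (cm^2); label 1 = low muscle mass by L3 SMI.
pma = [18.0, 22.5, 25.0, 30.0, 33.0, 36.5, 41.0, 45.0]
low_l3 = [1, 1, 0, 1, 0, 0, 0, 0]
auc = auc_low_value_predicts_positive(pma, low_l3)
sens, spec = sensitivity_specificity(pma, low_l3, cutoff=32.0)  # hypothetical cutoff
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of this size.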
Tumor transcriptomic analysis
The tumor RNA-sequencing data (Illumina HiSeq 2500) of 130 subjects from the NSCLC-Radiogenomics collection are publicly available in GEO datasets (GSE103584). Of the 107 included patients, 46 had tumor transcriptomic data. We compared count data from LM patients (N = 9) with HM patients (N = 37) to perform differential gene expression analysis using the default settings of the BioJupies platform. The differentially expressed genes (DEGs) were selected according to absolute fold change (|FC|) > 1.5 and P-value < 0.05. DEGs encoding secreted proteins were filtered based on the Human Protein Atlas database using the majority decision-based method list of 2943 secreted-protein genes (Additional file 3: Table S1). Principal component analysis (PCA) was performed using the ClustVis webtool.
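The DEG selection and secretome filter can be sketched as follows. This is an illustration rather than the BioJupies workflow: it assumes the fold change is a linear LM/HM ratio and that the threshold applies in both directions (above 1.5 or below 1/1.5). All numbers and the `GENE_X` symbol are invented.

```python
def select_degs(results, fc_threshold=1.5, p_threshold=0.05):
    """Keep genes changed more than fc_threshold-fold in either direction.

    `results` maps gene symbol -> (fold_change, p_value), where fold_change
    is assumed to be a linear LM/HM ratio (not log-transformed).
    """
    selected = {}
    for gene, (fold_change, p_value) in results.items():
        if fold_change <= 0:
            continue  # a linear ratio must be positive
        up = fold_change > fc_threshold
        down = fold_change < 1.0 / fc_threshold
        if (up or down) and p_value < p_threshold:
            selected[gene] = "up" if up else "down"
    return selected


def keep_secreted(degs, secreted_genes):
    """Restrict DEGs to those present in a secreted-protein gene list."""
    return {gene: d for gene, d in degs.items() if gene in secreted_genes}


# Invented statistics for illustration only.
toy_results = {
    "IL6": (2.4, 0.003),
    "LIF": (1.8, 0.20),     # large change but not significant
    "ACTB": (1.05, 0.60),   # essentially unchanged
    "GENE_X": (0.5, 0.01),  # hypothetical down-regulated gene
}
degs = select_degs(toy_results)
secreted_degs = keep_secreted(degs, {"IL6", "LIF"})
```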
Enrichment and protein–protein interaction (PPI)
We used EnrichR to perform a functional enrichment analysis of DEGs. We selected the top 10 most enriched biological processes using gene ontology (GO) 2018. PPI networks were constructed using the STRING tool. We considered experiments, databases, co-expression, neighborhood, and co-occurrence as active interaction sources. Only interactions with the highest confidence (interaction score > 0.9) were included, and disconnected nodes were omitted from the final network. A P-value < 0.01 was considered statistically significant. PPI network data were visualized and annotated using Cytoscape. The network was analyzed using the CytoNCA plugin, which calculated the betweenness centrality of each node.
Tumor-muscle crosstalk prediction
We used the consensus list of ligand-receptors generated by Ramilowski et al. [16] to test whether the skeletal muscles express receptors for the tumor-derived factors. We validated the gene expression of each predicted muscle receptor in human primary muscle cells from the FANTOM5 project, 491 muscle tissues from the GTEx portal, muscle tissues from the BioGPS website, muscle tissues from GeneAtlas U133A, and the gcrma human dataset. The alluvial diagram connecting the ligands to their receptors was generated using the SankeyMATIC tool.
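The matching logic behind this crosstalk prediction can be sketched as a lookup: keep a ligand-receptor pair when the ligand is among the tumor-derived factors and the receptor is expressed in muscle. The pair list below is a small hand-picked subset, and the expression values and the `min_expression` threshold are hypothetical; the study's own criteria for calling a receptor expressed are not reproduced here.

```python
def predict_crosstalk(tumor_ligands, ligand_receptor_pairs, muscle_expression,
                      min_expression=1.0):
    """Return sorted (ligand, receptor) pairs linking tumor ligands to muscle.

    tumor_ligands         : set of ligand gene symbols up-regulated in tumor
    ligand_receptor_pairs : iterable of (ligand, receptor) tuples
    muscle_expression     : dict receptor -> expression level in muscle
    min_expression        : hypothetical threshold for "expressed"
    """
    links = []
    for ligand, receptor in ligand_receptor_pairs:
        expressed = muscle_expression.get(receptor, 0.0) >= min_expression
        if ligand in tumor_ligands and expressed:
            links.append((ligand, receptor))
    return sorted(links)


# Illustrative subset of pairs; expression values are invented.
pairs = [("LIF", "LIFR"), ("LIF", "IL6ST"), ("IFNG", "IFNGR2"), ("IL6", "IL6R")]
muscle_levels = {"LIFR": 12.0, "IL6ST": 40.0, "IFNGR2": 8.5, "IL6R": 0.2}
links = predict_crosstalk({"LIF", "IFNG", "IL6"}, pairs, muscle_levels)
```

Each resulting tuple corresponds to one ribbon of an alluvial diagram connecting a ligand to its receptor.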
Expression profile of secretome genes in malignant cells
To evaluate the abundance of DEGs found in the tumor of LM in malignant lung cells, we used transcriptomic data from the Cancer Cell Line Encyclopedia (CCLE). We selected eight cell lines with higher correlation with tumor tissues using the "TCGA-110CL Cell Line Panel". Considering that lung tumor organoids recapitulate the histology, gene expression, and genomic profile of the original tumor [17], we included lung tumor-derived organoid transcriptomic data available in the HCMI (Human Cancer Model Initiative) catalog (duplicates of HCM-CSHL-0058-C34). The selected cell lines and organoids are described in Additional file 3: Table S2.
Digital cytometry and scRNA-Seq
To evaluate whether immune infiltrating cells express the DEGs found in the LM, we performed digital cytometry analysis using the CIBERSORTx tool to impute the immune cell fractions of 22 cell types (LM22 matrix signature) from the discovery bulk RNA-Seq data. We applied the default settings and batch-correction. We also performed a gene expression imputation analysis to verify whether the over-expressed genes in LM patients are expressed in the six major leukocyte subsets (B cells, CD4+ T cells, CD8+ T cells, NKT cells, NK cells, and monocytes) of patients with NSCLC derived from scRNA-Seq data [18]. The gene expression profile (GEP) was estimated by CIBERSORTx group mode analysis. In addition, the specific state of T cells (naive, cytotoxic, and exhausted) was evaluated using Immune Cell Abundance Identifier (ImmuCellAI).
Ligand–receptor interactions from gene expression
We used the computational framework CellPhoneDB to predict cell–cell communication and better understand the interplay of tumor immune cells using its repository of curated ligand–receptor interactions for scRNA-Seq. We used the default settings to select the statistically relevant interactions (P-value < 0.05) between the NSCLC T cells using the expression profile enriched in LM patients. This analysis was performed by re-analyzing the publicly available scRNA-Seq data from Guo et al. [19] (GSE99254).
Statistical analyses
Statistical analyses not described above were performed using GraphPad Prism® (GraphPad Software, v5.0, 2008, USA). The non-parametric Mann–Whitney U test was applied to compare the body composition variables between LM and HM patients. Chi-square analysis was used for categorical variables. Overall survival analysis was performed using the Kaplan–Meier method, and the log-rank test was used to compare the curves. Differences with a P-value < 0.05 were considered significant.
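For readers unfamiliar with the Kaplan–Meier method, the sketch below shows the product-limit estimate on a toy cohort. It is a minimal illustration and not the GraphPad Prism implementation; the follow-up times and event indicators are invented, and the log-rank comparison is omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up durations (any consistent unit)
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


# Toy cohort: follow-up in months; 1 = death, 0 = censored.
months = [5, 8, 8, 12, 20, 24]
died = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(months, died)
```

Censored patients contribute to the risk set until their last follow-up, which is why the drop at each event time depends on how many patients remain under observation.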
Discussion
We showed that a lower PMA, evaluated in CTs used for NSCLC diagnosis before tumor resection, predicts worse outcomes in patients with NSCLC. We also proposed PMA cutoffs for women and men at young and older ages based on ML analysis. The tumor transcriptome demonstrated that LM patients present a highly pro-inflammatory, cachexia-like profile and express CIFs. Digital cytometry based on transcriptomic data revealed that the TME is enriched with a suppressive immune microenvironment, mainly composed of CD8+ T cells. The ligand-receptor prediction revealed candidates that may ultimately lead to muscle wasting.
Cancer cachexia classification remains an important clinical challenge. The milestones for the classification of cachexia are weight loss > 5% over the past 6 months, or BMI < 20 and any degree of weight loss > 2%, or lumbar SMI determined by CT imaging (men < 55 cm²/m²; women < 39 cm²/m²) [4]. However, CTs are often taken in the thoracic region of lung cancer patients [7]. To overcome this limitation, we constructed a cachexia classification model using ML to evaluate clinical parameters routinely available for NSCLC patients. We found that lower PMA can predict worse survival. The PMA (z-scored) cutoff identified in CART was also validated by independent analysis. These results demonstrated that PMA cutoffs are strong predictors of the poor prognosis of NSCLC patients with low muscle mass. Although it did not demonstrate an association with worse overall survival, our research group previously proposed PMA cutoffs as a potential cachexia predictor in NSCLC patients [9]. The cutoffs found in that study were PMA < 32.2 cm² and < 21 cm² for men and women, respectively, similar to those found herein using robust ML. CART was recently used to predict cachexia risk levels in incurable cancer patients with more than 88% precision and sensitivity [23]. These results reinforce the importance of applying ML approaches to the clinical features of cancer patients to classify cachexia.
We found a high frequency of smokers in the LM group, which corroborates the literature indicating an increase in muscle loss in tobacco smokers [24] and is consistent with the significant role of nicotine in altering the tumor microenvironment and prognosis, as demonstrated for cachectic patients with pancreatic cancer [25]. A high risk of tumor recurrence was found in LM patients, which agrees with other cachexia studies [26].
KRAS mutation was frequently found in our LM patients, and evidence suggests KRAS-mutated tumors are commonly associated with lung cancer cachexia [27]. Since adenocarcinomas, the tumors with the highest prevalence of the KRAS mutation [28], occur at similar rates in LM (72.7%) and HM (85.7%) groups, we confirmed that KRAS mutations were associated with muscularity rather than tumor histopathology. Thus, studies investigating differences in the cachexia phenotype of preclinical models inoculated with KRAS-mutated NSCLC cell lines may help to understand the impact of KRAS mutation in the syndrome. These essential features can help guide cancer cachexia definition towards precision medicine in the future.
Our cachexia classification was validated by the increase of CIF genes in the tumor of LM patients, such as LIF, IL6, IFNG, and CCL2 [10]. We also identified increased expression of genes described in NSCLC patients with lower PMA, such as NCAM1, SCG2, CSF3, and CCL8 [9]. To understand how the secretome influences skeletal muscle phenotype and function, we characterized the landscape of cell–cell interaction between tumor and skeletal muscle cells using transcriptomic data from both tissues. This approach has a higher resolution than proteomics and provides insights into cell and tissue communication [29]. We found receptors for CIFs expressed at high levels in skeletal muscle cells, such as the IFNG receptor (IFNGR2) and LIF receptors (LIFR and IL6ST), which can directly trigger atrophy [1]. The gp130 family members IL-11, IL-6, OSM, and LIF (upregulated in the LM) potentially impact skeletal muscle wasting by inducing the STAT3 signaling pathway in preclinical models of cancer cachexia [30]. These results confirm the importance of these tumor-derived factors in NSCLC cachexia. Previous studies have individually inhibited the classical CIFs described herein [31]. Although preclinical studies inhibiting CIFs have shown some efficacy, targeting these factors has not provided the same benefit in clinical trials [31]. Our tumor-muscle interactome revealed a combinatorial action of secreted mediators from cells within the TME that directly act on muscle wasting. More studies are needed to describe the causality of these ligand–receptor interactions and their combinatorial action in inducing muscle atrophy in NSCLC cachexia.
To investigate the tumor cells responsible for CIF secretion, we further evaluated the expression profile of neoplastic lung cells. We found that inflammation-related genes are expressed at low levels, while genes associated with vessel permeability and T-cell motility, recruitment, and activation are increased [32]. We identified enrichment in genes associated with T cell apoptosis, specifically the genes CD274 (encoding PDL1) and IDO1. The processes involved in tumor immune evasion through IFN-γ (enriched in LM) have been reviewed [33]. Anti-tumor cells secrete IFN-γ that induces genomic instability and/or an immune evasive gene expression signature in malignant cells (including PDL1 and IDO1) [33]. PDL1 is the ligand of the inhibitory receptor PD-1, responsible for CD8+ T cell dysfunction in NSCLC [34]. IDO1 produces local depletion of the amino acid tryptophan, contributing to the immunosuppressive microenvironment [33]. These results corroborate our digital cytometry, which demonstrates an increase in CD8+ T cells in the TME of LM patients, enriched with exhausted cells but also with cytotoxic T cells.
Although cytotoxic CD8+ T cells infiltrated in the TME are frequently associated with intense anticancer activity [35], the CD8+ T cell signal in tumor tissues is not always related to better outcomes. Using TCGA data, six immune subtypes were identified. Subtype 2 (C2) is characterized by IFN-γ predominance and the highest content of M1 macrophages and CD8+ T cells. Cancer patients presenting subtype C2 showed less favorable survival despite having the highest lymphocyte infiltration [13]. The authors proposed that these tumors are more aggressive or escape immune recognition by pre-existing remodeling. However, our results raise the possibility that patients with C2 tumors have less favorable survival due to cachexia. In fact, CD8+ T cells induce cachexia and adipose tissue wasting in experimental models of chronic infection via type I IFN [36].
Considering the immunosuppressive profile of LM tumors and the promise of immunotherapy to restore T cell cytotoxicity and its anticancer properties, immunotherapy would probably benefit cachectic NSCLC patients. Cachectic patients with NSCLC are resistant to immunotherapy [37]; thus, restoring the cytotoxicity of CD8+ T cells may be crucial for treatment response. Indeed, clinical trials testing an agent to reverse cachexia combined with an immune checkpoint inhibitor in advanced lung cancer patients are relevant, since treatment with a ghrelin analog affects T cell development and proliferation [38]. Therefore, more studies are needed to verify whether immunotherapy combined with CIF-targeted drugs has the potential to reverse cachexia and increase the immunotherapy response.
Strengths and limitations
This study is innovative and valuable for NSCLC cachexia. Here, we integrated imaging, clinical, and transcriptomic data to capture the complex interplay between tumor biology and host response associated with cachexia by applying bioinformatics and ML analyses. The predictive model generated herein was validated in an independent set of NSCLC patients, demonstrating the great potential of our methodology. However, the study has limitations. Patients lack information on other comorbidities that may influence body composition and clinical outcomes. Considering that LM patients had a higher frequency of smokers, it is reasonable to consider that the skeletal muscle alterations identified are triggered by the combination of smoking and CIFs. However, it is difficult to determine the individual impact of each of these variables. We were also unable to apply the gold-standard definitions of cachexia, such as weight loss, in those patients. Moreover, even though transcriptomic data have high resolution and produce more robust information than proteomic data, the results should be interpreted carefully considering the complex transcriptional regulation in human cells. Therefore, the results of the secretome and ligand-receptor predictions presented in this study need experimental validation in cachectic patients. Furthermore, the immune TME profile of LM patients must be validated by histological examination to check the correlation between CD8+ T cell exhaustion and cachexia in a larger number of samples.