Background
Colon cancer is the third most common malignant tumor and affects millions of people worldwide [1]. Despite significant advances in treatment, its morbidity is rapidly increasing and its 5-year survival rate remains low [2, 3]. Accordingly, to improve the prognosis of colon cancer patients, new indicators for prognostic evaluation and targeted therapy are urgently needed.
The treatment of colon cancer has evolved to include not only the traditional methods of surgery, chemotherapy, and radiotherapy, but also the rapidly developing field of immunotherapy [4]. Reduced immune cytotoxicity [5] and lack of T-cell infiltration [6] have been found to predict adverse outcomes in patients with colorectal carcinoma. Although immunotherapy has been reported to be effective in colon cancer with microsatellite instability [4], inhibitors of PD-1/PD-L1 or CTLA-4 have not yet shown relevant efficacy in unselected colorectal cancer, in contrast to other tumor types [7]. In addition, because of the high heterogeneity of colon cancer [8], prognosis may differ considerably between patients with similar clinical characteristics. Thus, it is essential to identify a multi-molecule model that reflects patients' sensitivity to immunotherapy so that personalized treatment of colon cancer can be achieved.
In recent years, the development of high-throughput gene detection technology has provided molecular markers for prognosis prediction and personalized treatment of colon cancer [9, 10]. However, to our knowledge, none of these signatures was constructed from multiple immune genes. Therefore, in the present study, we developed and validated a reliable prognostic model of colon cancer using differentially expressed (DE) immune genes, and verified the clinical utility of this model in colon cancer patients.
Methods
Database download
Transcriptomic data and clinical data were downloaded from The Cancer Genome Atlas (TCGA) database. Immune genes and immune infiltrate data were downloaded from the ImmPort database (www.immport.org) and the Tumor Immune Estimation Resource (TIMER) (http://cistrome.org/TIMER) [11], respectively.
Identification of DE genes
The Wilcoxon signed-rank test was used for differential expression analysis. The Benjamini-Hochberg procedure was applied to control the false discovery rate (FDR). Log2(fold change [FC]) > 1 and FDR < 0.05 were set as the cut-offs. The pheatmap and gplots packages were used to draw the heatmap and volcano plot.
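The filtering step described above can be illustrated with a minimal sketch in Python (the study itself used R). It assumes the per-gene p-values from the rank test are already available, and it applies the fold-change cutoff to the absolute value of log2(FC) so that both up- and down-regulated genes are kept; that reading of the cutoff is an assumption. The gene names, group means, and p-values are invented toy values, and `select_de_genes` is a hypothetical helper, not part of the original pipeline.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(genes, mean_tumor, mean_normal, p_values,
                    lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes passing both cutoffs (hypothetical helper)."""
    fdr = benjamini_hochberg(p_values)
    selected = []
    for gene, tumor, normal, q in zip(genes, mean_tumor, mean_normal, fdr):
        log2_fc = math.log2(tumor / normal)
        # Assumption: the cutoff applies to |log2 FC| (both directions).
        if abs(log2_fc) > lfc_cutoff and q < fdr_cutoff:
            selected.append(gene)
    return selected


# Toy values for illustration only, not data from the study.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
mean_tumor = [8.0, 1.0, 2.2, 6.0]
mean_normal = [2.0, 4.0, 2.0, 2.5]
p_values = [0.001, 0.004, 0.60, 0.20]

de_genes = select_de_genes(genes, mean_tumor, mean_normal, p_values)
print(de_genes)
```

In this toy input, GENE_D has a large fold change but fails the FDR cutoff, which shows why both criteria are applied together.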
Identification of DE immune genes
Based on the identified DE genes and the immune gene list, the DE immune genes were identified using R software (v3.5.3). The pheatmap and gplots packages were used to draw the heatmap and volcano plot.
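Conceptually, this step is an intersection of two gene lists. A minimal Python sketch is given here under that assumption; the function name `intersect_de_with_immune` and the example lists are hypothetical, and the original analysis was done in R.

```python
def intersect_de_with_immune(de_genes, immune_genes):
    """Return the DE genes that also appear in the immune gene list.

    The order of `de_genes` is preserved so downstream plots stay stable.
    """
    immune_set = set(immune_genes)
    return [gene for gene in de_genes if gene in immune_set]


# Toy lists for illustration only.
de_genes = ["LBP", "GENE_X", "TFR2", "GENE_Y"]
immune_genes = ["TFR2", "LBP", "UCN"]

de_immune_genes = intersect_de_with_immune(de_genes, immune_genes)
print(de_immune_genes)
```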
Function and pathway analysis of DE immune genes
The org.Hs.eg.db and clusterProfiler packages were used to conduct gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. GO terms and KEGG pathways were considered significantly enriched when p.adjust < 0.05.
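Over-representation analysis of this kind is commonly based on a hypergeometric test: given a background of genes, how likely is it to see at least the observed overlap between the DE immune genes and a term's gene set by chance? The sketch below illustrates that calculation in Python for a single term. The counts are invented, the function name is hypothetical, and the multiple-testing adjustment that produces p.adjust is not shown.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background universe
    n_term:       background genes annotated to the term
    n_selected:   genes in the query list (e.g. DE immune genes)
    n_overlap:    query genes annotated to the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts for illustration only.
p_value = hypergeom_enrichment_p(n_background=20, n_term=5,
                                 n_selected=4, n_overlap=3)
print(round(p_value, 4))
```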
Construction of the prognostic risk model
Based on the DE immune genes in the training cohort, univariate analysis was performed, and DE immune genes with p < 0.05 were considered significant. Then, Lasso regression was performed to eliminate genes that might overfit the model. Lastly, we applied multivariate analysis to identify the optimal prognostic immune genes for the model. The risk score was calculated as a linear combination of the Cox coefficients and gene expression levels, using the following formula:
$$ \mathrm{Risk\ score}=\sum_{i=1}^{N}\left(\mathrm{Exp}_{i}\times \mathrm{Coe}_{i}\right) $$
Here N, Exp_i, and Coe_i represent the number of genes, the expression level of gene i, and the coefficient value of gene i, respectively. The median risk score was set as the cutoff value to divide all colon cancer patients into high-risk and low-risk groups. A high risk score indicates poor survival for colon cancer patients. The survival and survminer packages were used to conduct survival analysis. Time-dependent receiver operating characteristic (ROC) analysis for overall survival (OS) was used to evaluate the accuracy of the prognostic model, using the survivalROC package. An area under the ROC curve (AUC) > 0.60 was treated as an acceptable predictive value, and an AUC > 0.75 was considered excellent [12, 13]. Risk score distribution plots, survival status scatter plots, and heatmaps comparing the low-risk and high-risk groups were also used to evaluate the model.
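The risk score and the median split can be illustrated with a short Python sketch. The coefficients and expression values below are invented for illustration and are not the fitted values from the study. Assigning patients whose score equals the median to the low-risk group is also an assumption, since the text does not specify how ties are handled.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of expression level times Cox coefficient over the model genes."""
    return sum(expression[gene] * coef for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(scores.values())
    # Assumption: scores equal to the median are placed in the low-risk group.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"LBP": 0.5, "TFR2": -0.25, "UCN": 1.0}
patients = {
    "P1": {"LBP": 2.0, "TFR2": 4.0, "UCN": 1.0},
    "P2": {"LBP": 4.0, "TFR2": 0.0, "UCN": 2.0},
    "P3": {"LBP": 0.0, "TFR2": 4.0, "UCN": 0.0},
    "P4": {"LBP": 2.0, "TFR2": 0.0, "UCN": 1.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Because the cutoff is the cohort median, the two groups are balanced by construction, which keeps the survival comparison between them well powered.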
Validation of the prognostic risk model
We used the testing cohort and the entire TCGA cohort to verify the accuracy of the prognostic risk model. Survival analysis and time-dependent ROC analysis were used to validate the model. Risk score distribution plots, survival status scatter plots, and heatmaps were also used to evaluate the model.
Independent prognostic value of the model in the entire cohort
To assess the prognostic value of the immune gene risk model, we performed univariate and multivariate analyses of prognostic factors using Cox proportional hazards regression. Age, pathological stage, T, M, and N were treated as continuous variables. Gender was coded as female (0) or male (1). Factors with p < 0.05 in both the univariate and multivariate analyses were identified as independent prognostic variables.
Clinical utility of the model
To evaluate the predictive ability of the model in colon cancer patients, we assessed the relationships between our model (expression levels of the risk genes and the risk score) and clinical features (age, gender, pathological stage, T, M, and N) in the entire cohort. For each feature, patients were divided into two groups: age (≥ 70 and < 70 years old), gender (female and male), stage (stage I&II and stage III&IV), T (T1–2 and T3–4), M (M0 and M1), and N (N0 and N1–3). Differences between the two groups were assessed with independent t-tests.
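As an illustration of one such comparison, the Python sketch below splits patients at age 70 and computes the t statistic for the risk score. It assumes the classical pooled-variance (Student) form of the independent t-test, which the text does not specify. It returns only the statistic and the degrees of freedom, because converting these to a p-value needs a t-distribution that the standard library does not provide. The patient records are invented.

```python
from math import sqrt
from statistics import mean, variance


def student_t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Toy (age, risk score) records, for illustration only.
records = [(72, 3.0), (65, 1.0), (80, 4.0), (58, 2.0), (70, 2.0), (69, 3.0)]

older = [score for age, score in records if age >= 70]
younger = [score for age, score in records if age < 70]

t_stat, dof = student_t_statistic(older, younger)
print(t_stat, dof)
```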
Correlation between the model and immune cell infiltration
To understand whether the model could reflect the status of the tumor immune microenvironment in colon cancer patients, we evaluated the correlation between the risk score of the model and immune cell infiltration in the entire TCGA cohort. The Pearson correlation coefficient was used to estimate the relationship between the risk score and the abundance of different types of immune cells.
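For reference, the Pearson coefficient between the risk score and the infiltration estimate of one cell type can be computed as in the Python sketch below. The two vectors are invented toy values, not TIMER estimates from the study, and the significance test on the coefficient is omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Toy values for illustration only.
risk_scores = [1.0, 2.0, 3.0, 4.0]
cd8_infiltration = [0.4, 0.3, 0.3, 0.1]

r = pearson_r(risk_scores, cd8_infiltration)
print(r)
```

A negative coefficient here would correspond to lower estimated infiltration in patients with higher risk scores.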
Discussion
Colon cancer is one of the most common carcinomas worldwide, responsible for about 1,100,000 new cases and 550,000 deaths in 2018 [14]. Several studies have reported the role of immune genes in the initiation and progression of carcinoma [15, 16]. In the current study, we established and validated a prognostic model based on five DE immune genes, which could be used as an independent prognostic variable. We found that this model predicted OS at five years more accurately than pathological stage and age. Additionally, the immune gene model could reflect the tumor immune microenvironment, according to the correlation analysis between the model and immune cell infiltration. We also conducted functional and pathway enrichment analysis of the DE immune genes, which might provide a reference for further basic research in colon cancer.
In the current study, we developed a prognostic risk model based on five DE immune genes, namely LBP, TFR2, UCN, UTS2, and MC1R. Firstly, this model was constructed from five immune genes that are differentially expressed between colon cancer and normal tissues. These DE immune genes might reflect the progression of colon cancer, which could contribute to its early diagnosis. Secondly, multiple algorithms were applied for model selection, and the predictive value of the model was confirmed, which supports the accuracy and dependability of the prognostic model. In addition, these DE immune genes may hold promise as novel molecular targets in immunotherapy. LBP, a pattern recognition protein, can activate cells to produce cytokines in response to various microbial ligands [17]. Serum LBP has also been shown to be a useful prognostic parameter for breast cancer patients after radiation therapy [18]. TFR2, which plays a crucial role in the regulation of iron homeostasis, is highly expressed in human colon cancer cells [19, 20]. UCNs are corticotropin-releasing factor-related peptides that regulate gastrointestinal motor function and visceral pain during stress [21]. UTS2 was recently used as a new drug target in colon cancer cells [22]. Individuals carrying MC1R variants have a higher risk of melanoma, and MC1R has been used as an intervention target for melanoma [23].
To assess the predictive capability of the model, we analyzed several clinical variables as well as the risk score. Age, pathological stage, and the risk score were identified as independent prognostic variables. Age is a prominent risk factor for multiple tumors, including colorectal cancer [24], which is in line with the results of the model. Further comparison showed that the predictive value of the model was better than that of age and pathological stage. Thus, our model showed high predictive ability. To evaluate the clinical applicability of the model, we analyzed the relationships between factors in the model and certain clinical variables. We found that higher expression of the immune genes in the model was highly correlated with higher pathological stage, which further supports the reliability of our prognostic model. A previous study reported that immune infiltration is vital to treatment response and prognosis in colon cancer [25]. Galon et al. [6] reported that individual immune cell markers have prognostic impact in patients with colon cancer. It has been reported that inhibition of CD8+ T cells is associated with enhanced tumor progression, and that PD-L1 on mesenchymal stromal cells can promote colon cancer by inhibiting the antitumor immune responses of CD8+ T cells [26]. In the present study, we found that the risk score was negatively correlated with the infiltration of CD8+ T cells. These results might also support the reliability of our model in predicting the prognosis of colon cancer.
In the present study, we conducted an enrichment analysis of the DE immune genes. Leukocyte migration, cell chemotaxis, extracellular matrix, and receptor regulatory activity were enriched GO terms. Solid tumor samples from TCGA comprise both tumor cells and other cells, among which immune cells play vital roles in tumor development. Chu et al. [27] reported that leukocyte migration from blood to tissue through the vascular barrier is a natural process for dealing with invading pathogens, which might reflect the status of the tumor microenvironment. The distinct biochemical and biophysical properties of the extracellular matrix can influence cell phenotype, and dysregulation of extracellular matrix dynamics leads to the development of cancer [28]. KEGG analysis identified cytokine-cytokine receptor interaction, chemokine signaling pathway, and MAPK signaling pathway as significantly enriched pathways. Cytokine-cytokine receptor interaction has been related to the viability of colon cancer cell lines [29]. In addition, previous studies have demonstrated that an abnormally activated MAPK pathway is closely associated with the growth and metastasis of colon cancer [30, 31]. These significantly enriched functions and pathways may provide a reference for further basic experiments.
The current study has several strengths. Firstly, to our knowledge, this is the first prognostic model of colon cancer constructed from DE immune genes. Secondly, we built the model using several statistical methods and validated it in both the testing cohort and the entire cohort, which supports the accuracy and reliability of the prognostic risk model for colon cancer patients. Thirdly, the risk score could be used as an independent prognostic index and was more accurate than pathological stage and age in predicting OS. Finally, our model could also be used to predict immune cell infiltration in the progression of colon cancer.
The present study has limitations. Firstly, we developed the prognostic risk model based on public databases, and it has not been verified in prospective clinical trials. Additionally, the mechanisms by which the identified DE immune genes affect the progression of colon cancer require further study in basic experiments.