Introduction
Colorectal cancer (CRC) is one of the most common malignancies, with the third highest incidence and the second highest mortality rate among all cancers worldwide [
1,
2]. In 2022, 151,030 new CRC cases and 52,580 CRC-related deaths were estimated to have occurred in the United States [
3]. Approximately 90% of patients with early-stage primary CRC can be cured by surgical resection. However, most patients with CRC are diagnosed at advanced stages with recurrence in distant organs, and thus do not have the opportunity to undergo radical surgery [
4].
Metastasis is the predominant cause of CRC patient death. According to a recent study, 20% of CRC patients who are newly diagnosed have metastatic disease, and 25% of people with localized CRC will eventually develop metastases. Fewer than 20% of metastatic CRC patients survive for five years [
5]. In fact, the lungs are the second most prevalent location of CRC metastasis, accounting for approximately 20–30% of cases [
6]. However, therapeutic options are limited because the biology of colorectal lung metastases is poorly understood. Therefore, a better understanding of the molecular mechanisms of lung metastatic CRC is urgently needed to improve existing treatments and reduce mortality among CRC patients.
Previous studies have demonstrated that a number of different molecules participate in the development of CRC metastases. For instance, the chemokine–receptor pair CXCL12/CXCR4 is thought to be associated with liver metastasis and tumour recurrence in CRC [
7]. CXCR7 activation is thought to promote the spread of CRC cells to the lung instead of the liver [
8]. In addition, some genetic changes, such as WNT pathway activation and RAS mutation, may be linked to an increased proportion of lung metastases [
9,
10]. However, these results are scarcely sufficient to provide a comprehensive picture of CRC lung metastases.
Recently, bioinformatics analyses have emerged as an efficient and promising tool to screen for aberrantly expressed genes and genetic pathways involved in carcinogenesis, which could provide a rationale for identifying potential therapeutic targets and for understanding cancer prognosis [
11‐
13]. In particular, many studies have used integrated microarray analysis and reported that certain vital genes or pathways are potentially involved in CRC liver metastasis or lymph node metastasis [
12,
13]. However, studies of CRC lung metastases remain quite limited. In this study, the GEO2R tool was used to identify differentially expressed genes (DEGs) between primary CRC and lung metastatic CRC tissues based on the GSE41258 and GSE68468 profiles. Subsequently, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome Gene Sets analyses were conducted to uncover the top enriched biological processes and pathways regulated by the DEGs. The protein–protein interaction (PPI) network and the top 10 hub genes related to lung metastasis in CRC were identified using the Search Tool for the Retrieval of Interacting Genes (STRING) and Cytoscape. In addition, the expression and prognostic value of these hub genes in CRC patients were validated by analyzing The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Furthermore,
SFTPD, one of the hub genes specifically upregulated in lung metastatic CRC, was validated to promote cellular proliferation and lung metastasis in CRC in vitro and in vivo. In conclusion, the present study may contribute to identifying key genes and pathways for the diagnosis and prognosis of CRC patients with lung metastases, as well as yield novel and viable therapeutic targets.
Material and methods
Microarray data
The expression datasets GSE41258, GSE68468, GSE35144, GSE12945, GSE17537, GSE29621, GSE17536 and GSE38832 were obtained from the GEO database (
https://www.ncbi.nlm.nih.gov/geo/). The GSE41258 dataset includes 378 clinical CRC samples, including 186 primary CRC samples and 20 lung metastases. The GSE68468 dataset includes 386 clinical CRC samples, including 189 primary colon tumors and 20 lung metastatic samples.
Identification of DEGs
The GEO2R online tool (
https://www.ncbi.nlm.nih.gov/geo/geo2r/), an interactive web tool, was used to identify DEGs between primary CRC and lung metastatic CRC tissues as previously described [
14]. DEGs were defined by an adjusted P value < 0.05 and |log2(fold change)| (|log2FC|) > 1. Heatmaps of DEG expression were generated using TBtools. The volcano plot of gene expression was drawn with GraphPad Prism 8. The Venn diagram was generated with a web tool (bioinformatics.psb.ugent.be/webtools/Venn/).
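The selection rule above can be illustrated with a minimal Python sketch. It is not the GEO2R workflow itself. It assumes each dataset has already been exported as rows with the hypothetical keys "gene", "adj_p" and "log2fc", and that DEGs shared by the two datasets are taken as the direction-matched overlap (an assumption; the text only states that a Venn diagram was used).

```python
from typing import Any, Iterable, List, Mapping, Set, Tuple

# Thresholds stated in the text: adjusted P < 0.05 and |log2FC| > 1.
ADJ_P_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0


def select_degs(rows: Iterable[Mapping[str, Any]]) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) gene sets for one dataset.

    Each row uses the hypothetical keys "gene", "adj_p" and "log2fc".
    The sign convention (positive = higher in lung metastases) is an
    assumption; it depends on how the two groups were ordered.
    """
    up: Set[str] = set()
    down: Set[str] = set()
    for row in rows:
        adj_p = float(row["adj_p"])
        log2fc = float(row["log2fc"])
        # Both cut-offs are strict inequalities, as written in the text.
        if adj_p < ADJ_P_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF:
            (up if log2fc > 0 else down).add(str(row["gene"]))
    return up, down


def shared_degs(
    rows_a: Iterable[Mapping[str, Any]],
    rows_b: Iterable[Mapping[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Genes that pass the cut-offs with the same direction in both datasets."""
    up_a, down_a = select_degs(rows_a)
    up_b, down_b = select_degs(rows_b)
    return sorted(up_a & up_b), sorted(down_a & down_b)
```

Calling `shared_degs` on the two exported tables would return one sorted list of commonly upregulated genes and one of commonly downregulated genes. A gene that changes in opposite directions in the two datasets is excluded by this sketch.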
GO/KEGG/Reactome Gene Sets enrichment analysis
Protein–protein interaction analysis
The STRING web tool (
https://cn.string-db.org) with the default parameters (medium confidence of interaction score) was used to evaluate the potential protein–protein interaction (PPI) relationships among the DEGs. The PPI network was constructed using Cytoscape software (
http://www.cytoscape.org/) and visualized by STRING. The molecular complex detection (MCODE) plug-in in Cytoscape was used to extract the modules of the PPI network with the default settings (the degree cut-off = 2, node score cut-off = 0.2, K-core = 2, and max depth = 100).
Definitions of hub genes
Based on the information from the STRING protein query and degree analysis of the PPI network with the cytoHubba plug-in in Cytoscape, we selected the top 10 most dysregulated genes as the hub genes.
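As a rough illustration of degree-based ranking, the following Python sketch counts the distinct interaction partners of each node in an edge list and returns the highest-ranked nodes. It is not the cytoHubba implementation. The edge-list input and the alphabetical tie-breaking are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_ranking(
    edges: Iterable[Tuple[str, str]], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by degree.

    Duplicate edges are counted once and self-loops are ignored.
    Ties are broken alphabetically (an assumption for reproducibility).
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop adds no interaction partner
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(node, len(partners)) for node, partners in ranked[:top_n]]
```

With a PPI edge list exported from STRING, `degree_ranking(edges, top_n=10)` would give ten candidate hub genes together with their degrees.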
Association between expression levels of hub genes and tumour stage in CRC patients
Based on data from The Cancer Genome Atlas (TCGA) database, the UALCAN web tool (
http://ualcan.path.uab.edu/index.html) was used to analyze the correlation between the expression levels of hub genes and the tumour stage of patients with CRC.
Survival analysis in CRC patients
Based on the information from the GEO database, Kaplan–Meier survival analyses for overall survival in CRC patients were performed using GraphPad Prism 8.0. The patients with CRC were divided into two subgroups on the basis of the median expression level of each hub gene.
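The median split and the Kaplan–Meier estimate can be sketched in Python as follows. This is an illustrative re-implementation, not the GraphPad Prism procedure. Assigning patients exactly at the median to the low-expression group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_split(expression: Sequence[float]) -> List[bool]:
    """Return True for patients in the high-expression group.

    Values strictly above the median count as high; values equal to
    the median go to the low group (an assumption).
    """
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival probability) pairs.

    events uses 1 for an observed death and 0 for a censored patient.
    One pair is returned per distinct time with at least one death.
    """
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

In use, `median_split` would be applied to the expression values of one hub gene, and `kaplan_meier` would then be run separately on the follow-up times and event indicators of the high and low groups. Comparing the two curves with a log-rank test is not covered by this sketch.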
Human CRC tissue samples
Informed consent was obtained from individuals or their guardians in accordance with institutional policies and the principles of the Declaration of Helsinki. Paired primary and lung metastatic CRC tissues or serum samples were collected from patients at Gannan Medical University's First Affiliated Hospital and subjected to Western blotting or ELISA.
Cell culture
MC38 cells were obtained from the American Type Culture Collection (ATCC, Manassas, VA) and maintained in RPMI 1640 containing 10% FBS with 1% penicillin–streptomycin (Solarbio, Beijing, China).
MC38 cells (5 × 10^5) stably expressing luciferase (MC38-Luc) were injected into the tail vein of C57BL/6 mice. Two weeks later, a single nodule on the lung surface was isolated and cultured, and the resulting cells were termed lung metastatic derivatives (MC38-Luc-LM).
Enzyme-linked immunosorbent assay (ELISA)
As previously described [
16], the protein levels of CCL18 were examined in the cell culture medium of MC38-Luc or serum of patients with CRC using ELISA Kit for mice (Cloud-Clone Corp, MEB522Mu) or humans (BOSTER, EK0686). Each sample was measured in duplicate. The median values were employed for the final statistical analysis.
Western blotting
Cell or tissue lysates were prepared in RIPA buffer containing a protease inhibitor cocktail (Roche, Indianapolis, IN) and separated by SDS-PAGE. The blots were partially cut prior to incubation with antibodies. The following antibodies from Proteintech were used for Western blotting: Clusterin (12289-1-AP), SFTPD (11839-1-AP), Osteopontin (22952-1-AP), MMP3 (17873-1-AP), APOE (18254-1-AP), Biglycan (16409-1-AP) and β-actin (66009-1-Ig).
Generation of stable cell lines
The construct encoding mouse SFTPD was cloned into the pTSB-Flag-puro lentiviral vector. Viral supernatants were harvested 48 and 72 h after 293T cells were transfected with the construct together with the helper plasmids pCMV-dR8.2 and pCMV-VSVG. MC38-Luc cells were infected with the lentiviral supernatants and selected with 1.0 µg/mL puromycin for 5 days to generate stable cell lines.
Cell Proliferation Assay
For the cell proliferation assay, stable SFTPD-overexpressing cells were seeded in 24-well plates (1 × 10^4 cells per well). Cells in triplicate wells were counted daily for 6 days using trypan blue staining.
Anchorage-independent growth assay
A two-layer soft agar system was used to evaluate the colony formation ability of SFTPD-overexpressing CRC cells according to a previous study [
17]. In brief, RPMI 1640 growth medium supplemented with 1% agar and 10% FBS was used for the first layer, and 10,000 cells suspended in RPMI 1640 medium with 0.5% agar and 10% FBS were used for the second layer. After incubation for ten to fourteen days at 37 °C in a humidified incubator, the colonies (containing more than 50 cells) were counted using an inverted phase-contrast microscope.
Wound-healing scratch assay
Stable SFTPD-overexpressing MC38-Luc cells (8 × 10^5 cells/well) were plated into 6-well plates. After the cells reached 100% confluence, a straight wound was created using a 200 μL pipette tip. The debris was then removed by washing with PBS, and the medium was replaced with 1640 medium containing 1% FBS. Images were taken at 0, 12, 24 and 48 h with a phase-contrast microscope.
Migration and invasion assays
For the migration assay, stable SFTPD-overexpressing MC38-Luc cells were resuspended in FBS-free 1640 medium and seeded into the Transwell inserts (Corning, NY, USA) without Matrigel (Corning, NY, USA). For the invasion assay, cells were resuspended in FBS-free 1640 medium and seeded into the Transwell inserts precoated with 10% Matrigel. Migrated or invaded cells were fixed with 4% paraformaldehyde and stained in 0.1% crystal violet for 10 min after incubation for 24 or 48 h. Three random fields of cells were photographed and counted.
MC38-Luc cells (5 × 10^5) were injected into the tail vein of male C57BL/6 mice aged four to six weeks (GemPharmatech, Jiangsu, China), with five mice per group. In vivo bioluminescence imaging (BLI) was used to examine photon flux in the lung region of the mice. At the end of the experiments, the mice were sacrificed and the lungs were resected for BLI, followed by fixation in Bouin’s solution for 7 days. H&E staining was conducted as previously reported [
18].
Quantitative real-time PCR (qRT-PCR)
According to the manufacturer’s instructions, Trizol (TransGen Biotech, Beijing, China) was used to extract total RNA from CRC cells or lung tissues. Subsequently, a one-step RT Kit (Thermo Fisher, Shanghai) was used to reverse-transcribe the RNA into cDNA. The qRT-PCR reaction was conducted using a Bio-Rad Real-Time PCR System (Hercules, CA, USA). The qRT-PCR primers are listed in Table S1.
Statistical analysis
The data were analyzed using GraphPad Prism software and are presented as the mean ± SD. A normality test was conducted before comparisons for significant differences. For normally distributed data, a two-tailed Student’s t test was used for two-group comparisons, and one-way ANOVA followed by post hoc intergroup comparisons was used for comparisons of multiple groups. For non-normally distributed data, the Wilcoxon signed-rank test was used for two-group comparisons and the Friedman test was used for comparisons of multiple groups. The log-rank test was used for Kaplan–Meier survival analysis. A P value < 0.05 was considered statistically significant.
Discussion
Understanding the underlying molecular mechanisms of CRC lung metastasis would greatly benefit diagnosis, management and prognosis evaluation. In the present study, we identified 57 highly expressed DEGs and 18 poorly expressed DEGs between primary CRC samples and lung metastatic CRC samples by analyzing microarrays in the GEO database. The enrichment of these deregulated genes revealed that core pathways and hub genes could lead to new insights into CRC lung metastasis.
As suggested by GO analysis, the dysregulated genes were mainly enriched in inflammatory response, chemotaxis, chemokine activity, immune response, immunoglobulin receptor binding, antigen binding, cell adhesion and positive regulation of ERK1/2 cascade. This is plausible since inflammatory responses are important in the progression of cancer, including tumour initiation and metastasis. As main inflammatory mediators, chemokine activity, chemotaxis or aberrant immune reactions are critical tumourigenic signals of CRC [
20,
21]. Cell adhesion molecules play a significant role in cell permeability, polarity and migration, which are the vital steps in CRC progression and metastasis [
22]. In addition, it has been reported that the dysfunction of ERK/MAPK pathway is a crucial trigger for the progression of most cancers [
23]. Moreover, the DEGs were also found to be enriched in the formation of ECM, extracellular exosomes, extracellular space and so on, indicating that the interaction with the extracellular environment could be triggered during the CRC lung metastatic process.
The KEGG and Reactome Gene Sets analyses of DEGs and module analyses of the PPI network suggested that surfactant metabolism, phagosomes, cell–cell communication, and ECM organization may be involved in CRC lung metastasis, in addition to the cell adhesion molecules and chemokine signaling pathways already identified by GO enrichment. To date, no direct evidence of the role of surfactant metabolism in CRC lung metastases has been presented, although several studies have reported interactions between cancer metastasis and surfactant metabolism. It has been demonstrated that pulmonary and extra-pulmonary surfactant proteins play important roles in film stabilization, viral defense and modulation of immune responses [
24]. In the current study, SFTPB, SFTPC, SFTPD, and ABCA3, which are involved in the production, function, and metabolism of surfactant [
25], were shown to be highly expressed in CRC lung metastases, suggesting that they may promote CRC lung metastasis. Phagosomes are dynamic organelles generated within cells by the uptake of particles larger than 0.5 μm, which are essential for pathogen eradication and antigen presentation in the process of innate and adaptive immunity [
26]. Emerging evidence highlights the effect of immune microenvironment on colorectal metastasis [
27]. This implies that DEGs associated with phagosome formation and maturation might participate in CRC lung metastasis by influencing immunity.
Cell–cell communication is crucial for several biological events, including cell fate determination, proliferation, migration, and homeostasis. It is well recognized that cell–cell communication between cancer cells and components of the tumour microenvironment (e.g., stromal fibroblasts, epithelial cells, and multiple immune cell types) drives CRC metastasis [
28,
29]. The ECM consists of various molecules, such as laminin, collagen, elastin and fibronectin, and plays a central role in tumour initiation, progression, and metastasis. Cross-talk between the ECM and CRC metastasis has been well described in a previous report [
30]. Dysregulated ECM-related proteins induce both biochemical and biomechanical changes to promote cancer metastasis [
18]. Herein, the upregulated expression of MGP, Biglycan, LTBP2 and PRELP may facilitate the interactions between CRC cells and ECM, and therefore promote cellular survival and colonization in CRC lung metastases. The enriched pathways modulated by DEGs in this study could provide some rationales for developing novel therapeutic targets in the treatment of CRC.
Importantly, the top 10 hub genes were identified in CRC lung metastases, including 8 upregulated genes and 2 downregulated genes. We validated the transcriptional expression of the hub genes in numerous primary and metastatic CRC cases in the GEO database. The expression of these hub genes was in accordance with the data obtained through bioinformatics analysis of GSE41258 and GSE68468. The prognostic values of these hub genes were further analyzed in the TCGA and GEO databases. High expression levels of CLU, SFTPD, CCL18, SPP1, APOE and BGN were associated with poor overall survival of CRC patients, and low expression of MMP3 was associated with longer overall survival. Therefore, we hypothesized that CLU, SFTPD, CCL18, SPP1, APOE, BGN and MMP3 might be candidate biomarkers in CRC lung metastasis. To test this hypothesis, we examined protein levels of the seven genes in primary and highly lung metastatic MC38 cells, and in paired CRC primary and lung metastatic tissues. Consistently, the protein expression levels of Clusterin, SFTPD, CCL18, Osteopontin, APOE, and Biglycan were significantly higher, and that of MMP3 was lower, in lung metastatic CRC cells or tissues than in primary CRC cells or tissues.
Among the seven core genes, the expression levels of
SPP1, APOE, and
BGN were found to be upregulated, while
MMP3 was downregulated in CRC lung metastases compared with primary CRC. Indeed, several studies have demonstrated that
SPP1, APOE, and
BGN could be involved in the CRC malignant phenotype [
31‐
33].
SPP1 encodes Osteopontin, an ECM protein reported to be overexpressed in a variety of malignancies such as ovarian cancer, breast cancer and CRC [
31,
34,
35]. Osteopontin has been reported to enhance cell survival, migration, and angiogenesis, thereby driving tumourigenesis and metastasis in CRC [
31].
APOE, encoding Apolipoprotein E (APOE), is critical for lipoprotein metabolism [
36]. Recent studies have demonstrated that APOE also contributes to DNA synthesis, cell proliferation, angiogenesis, and metastasis to facilitate tumorigenesis and progression [
37]. Similar to previous reports that APOE was increased in CRC liver metastases [
32], we found that APOE was elevated in CRC lung metastases and was positively associated with advanced stages and poor overall survival in CRC.
BGN encodes Biglycan, which is a widely expressed ECM protein that provides stability and organization in tissues by interacting with other ECM proteins such as collagen and elastin [
38]. Biglycan has been reported to trigger the activation of several pathways involved in tumorigenesis by orchestrating growth factors/cytokines and cell surface receptors [
39]. In CRC, high levels of Biglycan have been linked with metastatic progression and poor prognosis [
33]. MMP3, also commonly known as matrix metallopeptidase 3, is encoded by
MMP3 and belongs to a group of zinc-dependent proteolytic enzymes. Moran et al. reported that MMP3 expression was lower in CRC patients with high microsatellite instability (MSI) when compared with low or null MSI [
40]. However, compelling evidence has shown that MMP3 promotes cancer invasion and metastasis by cleaving E-cadherin and disrupting its interaction with β-catenin [
41,
42]. Some studies have reported that MMP3 exhibits anti-tumour activities in a substrate-dependent manner [
43,
44]. For instance, MMP3-mediated cleavage of IGF-BP3 and IGF-BP5 inhibits tumorigenesis in breast cancer [
44]. Herein, MMP3 appeared to hamper CRC lung metastasis through as yet unknown substrates, which needs further investigation.
Since retrospective clinical data reveal that 24.5% of metastatic CRC patients first develop lung metastases and lung metastases account for 32.9% of all metastatic CRCs [
4], we focused on CRC lung metastasis in the present study. Here, we found that the expression levels of
CLU,
CCL18, and
SFTPD were specifically upregulated in CRC lung metastases rather than in other metastases, and were positively associated with poor prognosis in CRC patients. Clusterin, encoded by
CLU, functions as a stress-activated molecular chaperone that is highly expressed in aggressive cancers by modulating different signaling networks [
45]. It plays important roles in the regulation of protein homeostasis, pro-survival signaling and transcriptional networks [
46]. Studies have demonstrated that high Clusterin expression is associated with shorter survival and could serve as a biomarker for CRC patients [
47,
48]. Therefore, targeting Clusterin might be a promising approach for the management of CRC.
CCL18 encodes CC chemokine ligand 18 (CCL18), which is mainly expressed by macrophages and dendritic cells. CCL18 has been implicated in the stimulation of angiogenesis as well as cancer cell migration, invasion, and epithelial-to-mesenchymal transition. Recent studies have demonstrated that high expression of CCL18 in CRC patients is correlated with advanced tumour staging and liver metastasis [
49,
50], which is similar to our findings in lung metastasis of CRC. Surfactant protein D (also known as SFTPD or SP-D), encoded by the
SFTPD gene, is a collagenous glycoprotein that resides in the lungs and extra-pulmonary tissues [
51]. To date, only one study has reported that SFTPD is negatively associated with pulmonary metastases in CRC [
52]. However, our in vitro and in vivo results showed that SFTPD promoted cellular proliferation, migration, and invasion and further enhanced CRC cell lung metastasis. This inconsistency could be due to different cellular contexts and animal models.
In the current study, we highlighted that CLU, SFTPD, and CCL18 might serve as potential targets for the treatment of CRC lung metastasis. The effect of SFTPD on CRC lung metastasis was investigated through in vitro and in vivo experiments. Further investigation is warranted, especially to determine the precise mechanisms underlying the effect of these hub genes on CRC.