Background
Cervical cancer (CC) is the 3rd most common cancer in women worldwide [1], and human papillomavirus (HPV) types 16 and 18 account for more than 70% of CC cases [2]. Clinically, the International Federation of Gynecology and Obstetrics (FIGO) cervical cancer staging system is the most powerful prognostic factor in patients with cervical cancer and provides useful guidance for treatment [3, 4]. In stage-I CC, cancer cells have grown from the surface of the cervix into deeper tissues, whereas stage-II cancer is 4 cm or larger and has grown beyond the cervix and uterus but has not spread to the pelvic walls or the lower part of the vagina. Surgery, radiotherapy and chemotherapy, alone or in combination, are used to treat CC [5]. Immune checkpoint inhibitor therapy has been recommended as a second-line treatment for PD-L1-positive or MSI-H/dMMR tumours [6]. However, the immunopathological profiles of these two stages have not been clearly characterised, which makes it difficult to select stage-specific immunotherapy strategies.
Research over the last two decades has demonstrated that both the innate and adaptive immune systems participate in the elimination, equilibrium and escape phases of the immunoediting process [7, 8]. The theory of immune surveillance and subsequent immunoediting suggests that the immune system can either positively or negatively influence tumour development. Reduced immune recognition, increased tumour cell survival, or the development of an immunosuppressive tumour microenvironment (TME) contribute to the escape phase [9]. Establishing the TME is a slow process, and recent studies have shown that the early- and late-stage TMEs of multiple cancers differ in cellular heterogeneity and function [10–14]. Recent advances in multi-omics and single-cell RNA sequencing (scRNA-seq) have contributed significantly to the characterisation of the TME, leading to the discovery of new cell types and their responses to therapies [15–18]. For example, six tumour-associated macrophage populations with distinct genomic signatures were identified in the TME of a TC-1 tumour model, reflecting more complex macrophage development than the traditional M1/M2 paradigm [19, 20].
Regarding the TME of CC, a recent scRNA-seq analysis found that genes differentially expressed between chemoresistant and chemosensitive patients were enriched in the PI3K/AKT pathway [17]. The mutant allele frequency of NFKB1 (G430E) was shown to increase significantly after radiotherapy, indicating that this mutation is a potential molecular target in CC radiotherapy [21]. Another scRNA-seq study comparing CC with adjacent normal tissues discovered a subset of cancer stem cells (CSCs) significantly related to tumour progression; in addition, metabolism-related signalling pathways were enhanced in the endothelial cells of the TME, associated with upregulation of TAGLN2, KLF5, STAT1 and STAT2 [15]. Furthermore, immune and mesenchymal cells from normal cervix, intraepithelial neoplasia, primary tumour and metastatic lymph node tissues were compared using scRNA-seq, revealing a weakly and late-activated TME in intraepithelial neoplasia, whereas metastatic lymph nodes showed an early-activated immune response [22]. However, efforts to profile all cell types in the TME of CC at different stages, and thereby understand their cellular biology, remain limited.
In this study, we comprehensively compared the cellular heterogeneity in the TME of stage-I and stage-II CC patients. We performed high-precision single-nucleus RNA sequencing (snRNA-seq) on surgical tissues from four stage-I and three stage-II CC patients. Across about 80,000 tumour cells, we detected on average 1,900 genes per nucleus at a depth of ~250,000 reads per nucleus. The tumour cells were clustered into different types according to their transcriptome profiles, and the subtypes of selected immune cells were identified. We then used label-free quantitative proteomics to compare the overall proteome profiles of the tumour tissues and correlated these with the snRNA-seq results to address the questions of cell type and function raised above. We identified distinct phenotypes in the TMEs of the two stages, and the marker genes specific to stage-II CC included many collagens and a novel transcript, AC244205.1. Survival analysis based on The Cancer Genome Atlas (TCGA) data further supported the correlation between collagen expression and CC patient survival, suggesting that collagens could serve as candidate therapeutic targets for late-stage CC patients. Our work provides novel insights into the molecular characteristics of CC progression.
Methods
Patients
Pathologically confirmed tumour samples were collected from cervical cancer patients who underwent surgery, had received no prior chemotherapy or radiotherapy, and were negative for HBV, EBV and HIV. The samples were stored in liquid nitrogen until use. Patient information is summarised in Table S1. The human ethics approval number for the current research is L2016.
Isolation of tumours and single-nucleus transcriptomics
Freshly obtained CC tissues were immediately processed for nucleus isolation, library preparation and sequencing according to the guidelines of 10x Genomics (USA). Approximately 500 mg of tumour tissue from each patient was dissociated into a single-nucleus suspension. The tissue was homogenised with pestle strokes in ice-cold homogenisation buffer (0.25 M sucrose, 5 mM CaCl2, 3 mM MgAc2, 10 mM Tris–HCl (pH 8.0), 1 mM DTT, 0.1 mM EDTA, 1 × protease inhibitor (Thermo Scientific, cat. no. 78425) and 1 U/μL RiboLock RNase Inhibitor (Thermo Scientific, cat. no. O0381)). The homogenate was then filtered through a 70 μm cell strainer, and the nuclear fraction (approximately 1 mL) was collected into a 15 mL Falcon tube. The nuclear fraction was mixed with an equal volume of 50% iodixanol solution (0.16 M sucrose, 10 mM NaCl2, 3 mM MgCl2, 10 mM Tris–HCl (pH 7.4), 1 U/μL RiboLock RNase Inhibitor, 1 mM DTT and 0.1 mM PMSF protease inhibitor (Thermo Scientific, cat. no. 36978)) to a final concentration of 25%. Then, 1 mL of 33% iodixanol solution was added to the bottom of the tube, followed by a 30% iodixanol solution layered on top. The solution was mixed by inverting 10 times and centrifuged at 500 × g for 8 min at 4 °C. After the myelin layer was removed from the top of the gradient, the nuclei were collected from the 30% iodixanol interface. The nuclei were resuspended in nuclear wash and resuspension buffer (0.04% bovine serum albumin, 0.2 U/μL RiboLock RNase inhibitor, 500 mM mannitol and 0.1 mM PMSF protease inhibitor in PBS) and pelleted at 500 × g for 5 min at 4 °C. The nuclei were then filtered through a 40 μm cell strainer to remove cell debris and large clumps. Nuclear concentration was determined by trypan blue staining using a Bio-Rad TC20 cell counter. Finally, the concentration was adjusted to 700–1200 nuclei/μL, and the nuclei were processed on the 10x Chromium platform.
Reverse transcription, cDNA amplification and library preparation were performed according to the manufacturer's protocol.
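The two concentration steps above are simple C1V1 = C2V2 arithmetic: mixing the nuclear fraction with an equal volume of 50% iodixanol gives 25%, and a counted suspension is diluted into the 700–1200 nuclei/μL loading range. The minimal Python sketch below assumes volumes are additive and that the nuclear fraction itself contains no iodixanol; the helper names and the example count are hypothetical, not part of the protocol.

```python
def mixed_percentage(pct_a: float, vol_a: float, pct_b: float, vol_b: float) -> float:
    """Concentration (%) after mixing two solutions, assuming volumes are additive."""
    return (pct_a * vol_a + pct_b * vol_b) / (vol_a + vol_b)


def buffer_to_add(count_per_ul: float, target_per_ul: float, sample_ul: float) -> float:
    """Buffer volume (µL) that brings sample_ul of suspension down to target_per_ul (C1V1 = C2V2)."""
    if target_per_ul <= 0:
        raise ValueError("target concentration must be positive")
    if count_per_ul < target_per_ul:
        raise ValueError("suspension is already below target; pellet and resuspend instead")
    return count_per_ul * sample_ul / target_per_ul - sample_ul


def within_loading_range(count_per_ul: float, low: float = 700.0, high: float = 1200.0) -> bool:
    """True if the concentration lies in the 700-1200 nuclei/µL range quoted above."""
    return low <= count_per_ul <= high


# Nuclear fraction (taken as 0% iodixanol) plus an equal volume of 50% iodixanol.
print(mixed_percentage(0.0, 1.0, 50.0, 1.0))
# Hypothetical count: 20 µL at 3,000 nuclei/µL diluted to 1,000 nuclei/µL.
print(buffer_to_add(3000.0, 1000.0, 20.0))
```

The dilution helper refuses to "dilute" a suspension that is already too dilute, since that case needs re-pelleting rather than more buffer.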
Raw reads were preprocessed using Cell Ranger (version 3.1.0) with default parameters and aligned to the pre-mRNA reference (Ensembl release 108.38, Homo sapiens). For quality control, nuclei with UMI counts < 8,000, a mitochondrial gene percentage < 10% and 500–4,000 detected genes were retained. Gene expression was then normalised with the global-scaling method "LogNormalize", which divides the expression measurements of each cell by its total expression, multiplies the result by a scale factor (10,000 by default) and log-transforms it:
$$\mathrm{gene\ expression\ level} = \log_{10}\left(1 + \frac{UMI_{A}}{UMI_{Total}} \times 10000\right)$$
where UMI_A denotes the UMI count of gene A in a cell and UMI_Total the total UMI count of that cell. Seurat [23] was used to minimise batch effects and the effects of differing conditions, identifying 2,000 highly variable genes in each sample based on a variance-stabilising transformation to generate an integrated expression matrix.
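The QC thresholds and normalisation formula above can be sketched with NumPy on a nuclei × genes count matrix. This is an illustration, not the authors' Seurat pipeline: the function names are hypothetical, the strict inequalities at the 500/4,000 gene bounds are an assumption, and `base` defaults to 10 to match the printed formula, whereas Seurat's own `LogNormalize` applies the natural logarithm (`log1p`).

```python
import numpy as np


def qc_mask(counts, is_mito, max_umi=8000, max_mito_pct=10.0,
            min_genes=500, max_genes=4000):
    """Boolean mask of nuclei (rows) passing the UMI, mitochondrial and gene-count filters."""
    counts = np.asarray(counts, dtype=float)
    umi = counts.sum(axis=1)                                  # total UMIs per nucleus
    n_genes = (counts > 0).sum(axis=1)                        # detected genes per nucleus
    mito_pct = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(umi, 1.0)
    return ((umi < max_umi) & (mito_pct < max_mito_pct)
            & (n_genes > min_genes) & (n_genes < max_genes))


def log_normalise(counts, scale_factor=10_000.0, base=10.0):
    """log_base(1 + UMI_A / UMI_Total * scale_factor) for every gene A in every nucleus."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    frac = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return np.log1p(frac * scale_factor) / np.log(base)


# Toy 3-nucleus x 4-gene matrix; the last gene stands in for a mitochondrial gene.
toy = np.array([[10, 0, 90, 0], [50, 50, 50, 50], [10, 10, 0, 80]])
is_mito = np.array([False, False, False, True])
keep = qc_mask(toy, is_mito, max_umi=150, min_genes=1, max_genes=3)
normalised = log_normalise(toy[keep])
```

The toy call loosens the thresholds so a 4-gene matrix can pass; with real data the defaults reproduce the cut-offs stated in the text.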
After data integration and scaling, principal component analysis (PCA) was used for dimensionality reduction, and appropriate principal components were selected for clustering and subsequent analysis. The clustering method has been described in detail elsewhere [19, 20]. In brief, a shared nearest neighbour (SNN) graph was constructed, with edges drawn between cells with similar gene expression patterns based on Euclidean distance in PCA space. Edge weights between any two cells were then refined according to their Jaccard similarity (the overlap of their local neighbourhoods). Finally, modularity optimisation techniques [24] were applied to iteratively group cells together, with the aim of optimising the standard modularity function.
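The SNN construction described above can be illustrated on a matrix of PCA coordinates. In this sketch each cell counts as one of its own k nearest neighbours, edges are weighted by the Jaccard index of the two neighbour sets, and weights below 1/15 are pruned (Seurat's default `prune.SNN`; treat this cut-off and `k` as assumptions). It only builds the graph and scores a given partition with the standard modularity function; the iterative optimisation itself (for example Louvain or Leiden, as offered by Seurat's `FindClusters`) is not reimplemented. All function names are hypothetical.

```python
import numpy as np


def knn_sets(pcs: np.ndarray, k: int) -> list:
    """k nearest neighbours (Euclidean, PCA space) of each cell, the cell itself included."""
    sq_dist = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in order]


def snn_graph(pcs: np.ndarray, k: int = 20, prune: float = 1 / 15) -> np.ndarray:
    """Symmetric SNN adjacency matrix with Jaccard-index edge weights."""
    sets = knn_sets(pcs, k)
    n = len(sets)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(sets[i] & sets[j])
            if shared:
                jaccard = shared / len(sets[i] | sets[j])
                if jaccard >= prune:          # drop weak edges
                    w[i, j] = w[j, i] = jaccard
    return w


def modularity(w: np.ndarray, labels) -> float:
    """Standard (Newman) modularity Q of a partition on a weighted graph."""
    labels = np.asarray(labels)
    strength = w.sum(axis=1)
    two_m = strength.sum()
    same = labels[:, None] == labels[None, :]
    return float(((w - np.outer(strength, strength) / two_m) * same).sum() / two_m)


# Toy data: two well-separated groups of 10 "cells" in a 2-D PCA space.
rng = np.random.default_rng(0)
pcs = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(100, 1, (10, 2))])
w = snn_graph(pcs, k=5)
true_q = modularity(w, [0] * 10 + [1] * 10)
mixed_q = modularity(w, [0, 1] * 10)
```

A modularity optimiser searches over label assignments for the one that maximises `modularity`; here the grouping that follows the two blobs already scores higher than an arbitrary interleaved one.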
Differentially expressed genes (DEGs)
The expression of each gene in a given cluster was compared with that in all other cells using the Wilcoxon rank-sum test. Significantly upregulated genes were identified using three criteria: (i) expression at least 1.28-fold higher in the target cluster than in the other clusters; (ii) expression in more than 25% of the cells of the target cluster; and (iii) P < 0.05.
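The three marker criteria above can be expressed as a per-gene filter. The sketch implements the two-sided Wilcoxon rank-sum test through its normal approximation with tie correction (no continuity correction, a simplification) and computes the fold change on linear-scale normalised expression with a pseudocount of 1; the text does not state how the fold change was computed, so that part is an assumption. Numerically, 1.28 ≈ e^0.25, which appears to correspond to a natural-log fold-change threshold of 0.25 (Seurat v3's default `logfc.threshold`). Function names are hypothetical.

```python
import math


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1          # average 1-based rank of the tied block
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                                  # every value tied: no evidence
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail


def is_marker(target, rest, min_fold=1.28, min_pct=0.25, max_p=0.05):
    """Apply criteria (i)-(iii) to one gene; inputs are linear-scale normalised values."""
    fold = (sum(target) / len(target) + 1) / (sum(rest) / len(rest) + 1)
    pct = sum(v > 0 for v in target) / len(target)
    return fold >= min_fold and pct > min_pct and rank_sum_pvalue(target, rest) < max_p


# Hypothetical expression of one gene in the target cluster vs all other cells.
target = [5.0] * 8 + [0.0] * 2
rest = [0.0] * 15 + [0.5] * 5
print(is_marker(target, rest))
```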
Cell cycle analysis
The Seurat R package was used to assign a cell cycle score to each cell based on 100 marker genes for the G1/S phase, 113 for the S phase, 133 for the G2/M phase, 151 for the M phase and 106 for the M/G1 phase [25]. Cells whose highest score was below 0.3 were classified as non-cycling [26].
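The phase-assignment rule above (take the highest of the five phase scores, and call the cell non-cycling if that score is below 0.3) can be sketched as follows. The text does not specify which Seurat function produced the five scores, so `module_score` substitutes a simplified score (mean marker expression minus mean expression of background genes) for Seurat's binned control-gene scoring; gene names, scores and function names are hypothetical.

```python
def module_score(expr, markers, background):
    """Simplified phase score: mean marker expression minus mean background expression."""
    hits = [expr[g] for g in markers if g in expr]
    ctrl = [expr[g] for g in background if g in expr]
    if not hits or not ctrl:
        raise ValueError("no marker or background genes found in this cell")
    return sum(hits) / len(hits) - sum(ctrl) / len(ctrl)


def assign_phase(phase_scores, threshold=0.3):
    """Phase with the highest score, or 'non-cycling' if that score is below the threshold."""
    phase, best = max(phase_scores.items(), key=lambda item: item[1])
    return phase if best >= threshold else "non-cycling"


# Hypothetical per-cell expression and phase scores.
cell = {"geneA": 2.0, "geneB": 1.0, "geneC": 0.0, "geneD": 0.0}
scores = {"G1/S": 0.12, "S": 0.45, "G2/M": 0.20, "M": 0.0, "M/G1": -0.1}
print(assign_phase(scores))
```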
Protein extraction and trypsin digestion
Tumour tissue samples were transferred into lysis buffer (2% SDS, 7 M urea, 1 mg/mL protease inhibitor cocktail) and homogenised on ice for 3 min using an ultrasonic homogeniser (VCX130, Sonics & Materials Inc.). The homogenate was centrifuged at 15,000 rpm for 15 min at 4 °C, and the supernatant was collected. The protein concentration of the supernatant was determined using a BCA Protein Assay Kit (Thermo Fisher Scientific, MA, USA). For each sample, 50 μg of protein was suspended in 50 μL of solution, reduced by adding 1 μL of 1 M dithiothreitol at 55 °C for 1 h, and alkylated by adding 5 μL of 20 mM iodoacetamide in the dark at 37 °C for 1 h. The sample was then precipitated with 300 μL of prechilled acetone at −20 °C overnight. The precipitate was washed twice with cold acetone, resuspended in 50 mM ammonium bicarbonate and digested with sequencing-grade modified trypsin (Promega, Madison, WI) at a substrate/enzyme ratio of 50:1 (w/w) at 37 °C for 16 h.
High pH reverse phase separation
The peptide mixture was resuspended in buffer A (20 mM ammonium formate in water, pH 10.0, adjusted with ammonium hydroxide) and fractionated by high-pH separation on an UltiMate 3000 system (Thermo Fisher Scientific, MA, USA) connected to a reverse-phase column (XBridge C18, 4.6 mm × 250 mm, 5 μm; Waters Corporation, MA, USA). Separation was performed using a linear gradient from 5% to 45% B over 40 min (buffer B: 20 mM ammonium formate in 80% acetonitrile (ACN), pH 10.0, adjusted with ammonium hydroxide). The column was re-equilibrated at the initial conditions for 15 min. The flow rate was maintained at 1 mL/min and the column temperature at 30 °C. Twelve fractions were collected and lyophilised.
nano-HPLC–MS/MS analysis
The peptides were redissolved in 30 μL of solvent A (0.1% formic acid in water) and analysed by online nanospray LC–MS/MS on an Orbitrap Fusion Lumos mass spectrometer coupled to an EASY-nLC 1200 system (Thermo Fisher Scientific, MA, USA). Briefly, 3 μL of peptide sample was loaded onto the analytical column (Acclaim PepMap C18, 75 μm × 25 cm) and separated with a 120-min gradient from 5% to 35% B (B: 0.1% formic acid in ACN). The column flow rate was maintained at 200 nL/min and the column temperature at 40 °C. An electrospray voltage of 2 kV relative to the inlet of the mass spectrometer was used. The mass spectrometer was operated in data-independent acquisition (DIA) mode, automatically switching between MS and MS/MS acquisition. The parameters were: (1) MS: scan range (m/z) = 350–1200, resolution = 120,000, AGC target = 1E6 and maximum injection time = 50 ms; (2) HCD-MS/MS: resolution = 30,000, AGC target = 1E6, collision energy = 32 and stepped CE = 5%; (3) DIA: 60 variable isolation windows, with adjacent windows overlapping by 1 m/z.
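The text does not say how the 60 variable isolation windows were placed. One common scheme, assumed here purely for illustration, sets window boundaries at quantiles of the expected precursor m/z distribution so that each window holds roughly the same number of precursors, then widens each boundary so neighbouring windows overlap by 1 m/z. The function name and the evenly spaced toy precursor list are hypothetical.

```python
def variable_windows(precursor_mz, n_windows=60, low=350.0, high=1200.0, overlap=1.0):
    """Split [low, high] into n_windows windows holding ~equal numbers of precursors,
    then widen each internal boundary so neighbouring windows overlap by `overlap` m/z."""
    mz = sorted(m for m in precursor_mz if low <= m <= high)
    if len(mz) < n_windows:
        raise ValueError("need at least one precursor per window")
    # Internal boundaries at precursor-count quantiles.
    edges = [low] + [mz[(w * len(mz)) // n_windows] for w in range(1, n_windows)] + [high]
    half = overlap / 2
    return [(max(low, start - half), min(high, end + half))
            for start, end in zip(edges[:-1], edges[1:])]


# Hypothetical, evenly spaced precursor list (a real design would use a spectral library).
library = [350.0 + 850.0 * (i + 0.5) / 600 for i in range(600)]
windows = variable_windows(library)
print(len(windows), windows[0], windows[-1])
```

With a real library, windows become narrow where precursors are dense (typically mid-range m/z) and wide at the sparse extremes, which is the point of using variable rather than fixed widths.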
Protein identification and quantification
Raw DIA data were processed and analysed with Spectronaut X (Biognosys AG, Switzerland) using default parameters. The protein database derived from the Homo sapiens genome was downloaded from NCBI (March 2021). Retention time prediction was set to dynamic iRT. Data extraction was based on extensive mass calibration. A Q-value (FDR) cutoff of 1% was applied at both the precursor and protein levels. Decoy generation was set to "mutated", swapping a random number of amino acid positions (min = 2, max = length/2). All selected precursors passing the filters were used for quantification. The average of the top 3 filtered peptides was used to calculate major group quantities. Normalisation was based on the average abundance of all peptides, using medians for averaging. After Student's t-test, differentially expressed proteins (DEPs) were defined as those with a Q-value < 0.05 and an absolute average log2 ratio > 0.58. Principal component analysis (PCA) and correlation analysis were performed with the R package gmodels. The correlation coefficient between replicates was calculated to evaluate the repeatability of samples.
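Given peptide intensities and per-protein t-test p-values, the quantification and filtering rules above reduce to a few lines. The sketch averages the three most intense peptides per protein, converts p-values to Q-values, and keeps proteins with Q < 0.05 and |log2 ratio| > 0.58 (0.58 ≈ log2 1.5, i.e. a 1.5-fold change). The text does not state how Spectronaut derived its Q-values, so the Benjamini–Hochberg procedure is used here as a stand-in; function names and toy numbers are hypothetical.

```python
import math


def top3_quantity(peptide_intensities):
    """Protein quantity as the mean of its (up to) three most intense peptides."""
    top = sorted(peptide_intensities, reverse=True)[:3]
    return sum(top) / len(top)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (Q-values), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):                # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        q[idx] = running_min
    return q


def call_deps(mean_a, mean_b, pvalues, max_q=0.05, min_abs_log2=0.58):
    """Flag proteins with Q < max_q and |log2(mean_a / mean_b)| > min_abs_log2."""
    q = bh_qvalues(pvalues)
    return [abs(math.log2(a / b)) > min_abs_log2 and qi < max_q
            for a, b, qi in zip(mean_a, mean_b, q)]


print(top3_quantity([5.0, 1.0, 9.0, 7.0]))
print(call_deps([4.0, 1.0, 3.0, 2.0], [1.0, 1.0, 1.0, 1.0], [0.001, 0.001, 0.5, 0.001]))
```

The third toy protein shows why both filters matter: a 3-fold change is not called when its Q-value is 0.5.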
Protein domain and transcription factor (TF) analysis
Protein domains were predicted using the Pfam_scan program [27], which compares protein sequences against the Pfam database to obtain annotations of protein structure. The predicted protein sequences were also searched against a TF database (AnimalTFDB [28]) using hmmscan.
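If hmmscan is run with its `--tblout` option, the per-sequence hits can be collected as below. In HMMER's tabular output for hmmscan, the target is the profile HMM (here a Pfam or TF family) and the query is the protein sequence; column 5 holds the full-sequence E-value and column 6 the bit score, followed by a free-text description. The use of `--tblout`, the 1e-5 E-value cut-off, the function name and the toy lines are all assumptions, since the text does not state which output format or thresholds were used.

```python
def parse_hmmscan_tblout(lines, max_evalue=1e-5):
    """Map each query protein to its best-scoring family from hmmscan --tblout lines."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                              # skip comments and blank lines
        fields = line.split(maxsplit=18)          # 18 fixed columns + free-text description
        family, accession, query = fields[0], fields[1], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue
        if query not in best or score > best[query]["score"]:
            best[query] = {"family": family, "accession": accession,
                           "evalue": evalue, "score": score}
    return best


# Hypothetical toy lines in --tblout layout.
example = [
    "# target name  accession  query name  accession  E-value  score ...",
    "Homeodomain PF00046 prot1 - 1.2e-20 70.1 0.3 1.5e-20 69.8 0.3 1.0 1 0 0 1 1 1 1 Homeodomain",
    "zf-C2H2 PF00096 prot1 - 3.0e-08 35.0 9.9 0.01 14.0 1.2 3.1 3 0 0 3 3 3 2 Zinc finger",
    "bZIP_1 PF00170 prot2 - 0.5 5.1 0.1 0.9 4.2 0.1 1.1 1 0 0 1 1 1 0 bZIP",
]
hits = parse_hmmscan_tblout(example)
```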
Protein–protein interaction (PPI) analysis
Interactions among significantly regulated proteins were predicted using STRING [29]. All interaction sources were selected to generate the network, edges were set to indicate interaction confidence, and a minimum required interaction score of 0.700 was applied to all PPIs to highlight the most confident interactions. No first- or second-shell interactors were added to the network. Proteins without any interactions were excluded from the network display.
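The network rules above (keep interactions scoring at least 0.700, hide proteins left without any interaction) can be sketched on an exported edge list; node degree then identifies hub proteins. Scores are assumed to be on STRING's 0–1 confidence scale, and the function name and toy protein identifiers are hypothetical.

```python
def filter_ppi(edges, min_score=0.700):
    """Keep unique, non-self edges scoring >= min_score; return them with node degrees.
    Proteins left without any edge drop out of the degree table (and the display)."""
    kept, seen, degree = [], set(), {}
    for a, b, score in edges:
        key = frozenset((a, b))
        if a == b or score < min_score or key in seen:
            continue
        seen.add(key)
        kept.append((a, b, score))
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return kept, degree


toy_edges = [("P1", "P2", 0.90), ("P1", "P3", 0.95), ("P2", "P4", 0.72),
             ("P1", "P5", 0.40), ("P2", "P1", 0.90)]
kept, degree = filter_ppi(toy_edges)
hub = max(degree, key=degree.get)            # highest-degree node
```

Deduplicating on an unordered pair matters because exported edge lists can list the same interaction in both directions, which would otherwise double-count degrees.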
Functional annotation and enrichment analysis
DEGs and DEPs were annotated against the GO, KEGG and COG/KOG databases to obtain their functions. GO terms and pathways significantly enriched among the DEPs were identified at Q-value < 0.05. Pathway enrichment was also analysed by Gene Set Enrichment Analysis (GSEA) using GSEA v4.1.0 [30], with P < 0.05 considered significant.
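The core statistic behind GSEA is a weighted running-sum enrichment score (ES): walking down the gene list ranked by descending score, the sum steps up by a hit's |score|^p (normalised over all hits) and down by 1/(N − N_H) for each miss, and ES is the maximum deviation from zero. The sketch below uses the default weight p = 1 and omits the permutation test and normalised ES that GSEA v4.1.0 reports; the function name and toy ranking are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for gene_set over genes ranked by descending score."""
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_hit == 0 or n_hit == n or hit_norm == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes (non-zero scores)")
    miss_step = 1.0 / (n - n_hit)
    running = best = 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_norm if h else -miss_step
        if abs(running) > abs(best):
            best = running                     # keep the largest deviation from zero
    return best


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
ranking = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, ranking, {"g1", "g2"}))
print(enrichment_score(genes, ranking, {"g5", "g6"}))
```

A positive ES means the set concentrates at the top of the ranking (upregulated in the first group); a negative ES means it concentrates at the bottom.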
Survival analysis
Survival analyses were performed with the Cox proportional hazards model using TIMER2.0 [31], provided that the proportional hazards assumption was met based on weighted residuals. Hazard ratios were estimated relative to the lowest-risk group and assessed by a two-sided Wald test; P < 0.05 was considered significant.
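For a single covariate, the reported quantities follow directly from the fitted Cox coefficient β and its standard error: HR = exp(β), Wald z = β/SE, two-sided P = 2[1 − Φ(|z|)], and a 95% CI of exp(β ± 1.96·SE). The sketch assumes β and SE have already been estimated (for example by TIMER2.0); it does not fit the model or test the proportional hazards assumption, and the input values and function name are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def cox_wald_summary(beta, se):
    """Hazard ratio, Wald z, two-sided P and 95% CI from a Cox coefficient and its SE."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))
    ci = (math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se))
    return {"hr": math.exp(beta), "z": z, "p": p, "ci95": ci, "significant": p < 0.05}


summary = cox_wald_summary(beta=0.5, se=0.2)   # hypothetical estimates
print(summary)
```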
Discussion
This study compared the cellular heterogeneity and functions within the TME of stage-I and stage-II CC patients using snRNA-seq analysis. MΦs of the CCI patients showed proinflammatory function, as indicated by significantly activated IFN-α and IFN-γ response signalling, whereas MΦs of the CCII patients showed activation of many pathways related to cell growth and tissue development. CD8+ T cells were fewer but appeared more activated in the CCI group, and the populations of Treg and γδT cells were markedly reduced in the CCII group. Immune responses, particularly the MHC class-I pathway, were more pronounced in the DCs and B cells of the CCI group, whereas metabolic and developmental processes were enriched in the CCII group. The proportion of NK cells was more than 60% lower in the CCII group than in the CCI group; in the CCI group, many DEGs marking NK cell activation were upregulated, especially in mature and CD56bright NK cells. In general, immune response-related pathways were enriched in several stem cell types of the CCI group, whereas cell and tissue growth and metabolic pathways were prominent in those of the CCII group. In addition, quantitative proteomics revealed activation of IFN-α and IFN-γ response signalling in the CCI group, consistent with the snRNA-seq results.
snRNA-seq analysis revealed significant upregulation of more than 30 collagens across different cell types. Collagens are major components of the TME and contribute to cancer fibrosis; they also interact with receptors, exosomes and microRNAs to influence tumour cell behaviour [61]. COL1A1 was previously found to be significantly elevated at both the mRNA and protein levels in CC tissues relative to normal tissues and to correlate negatively with radiosensitivity [62]. COL6A1 has been suggested to act as an oncogene in the initiation and progression of CC and as a predictor of poor prognosis [63]. High expression of COL14A1 in residual CC after a 50-Gy dose of irradiation was detected by quantitative PCR [64]. COL7A1 and COL8A1 were identified bioinformatically as two of five collagen assembly-associated genes that might serve as a single combinatorial prognostic marker for stage-II CC [65]. A role for COL10A1 in CC progression has also been suggested [66]. Co-expression of COL1A1, COL4A1, COL5A1, COL5A2 and COL7A1 with P4HA2, a potential therapeutic target for CC, has been identified [67]. However, the associations of other collagens with CC and their roles in its progression remain largely unclear. The novel transcript AC244205.1 was among the most upregulated DEGs in many cell types of the CCII group, such as DCs, MΦs, T cells, MSCs, NPCs and GMPCs, implying that it may play a role in CC progression. A recent study reported upregulation of AC244205.1 in cholangiocarcinoma patients with better overall survival [68]. However, the molecular function of AC244205.1 in CC has not been reported and will be investigated in our future studies.
DEGs related to the mitochondrial respiratory machinery, such as MT-CYB, MT-CO2, MT-CO3, MT-ND1, MT-ND2 and MT-ND4, were upregulated in multiple cell types of the CCII group. This implies activation of the cytochromes of the respiratory chain and a high level of mitochondrial biogenesis, which might reflect the greater energy demand of tumour growth at stage II than at stage I; consistent with this, snRNA-seq analysis indicated enrichment of processes such as EMT, 'myogenesis' and 'hypoxia', as well as multiple metabolic processes. A previous study showed that atovaquone, which inhibits mitochondrial complex III and thereby mitochondrial respiration, inhibited proliferation and induced apoptosis in certain CC cell lines in vitro and in a mouse model in vivo [69]. Several studies have impaired mitochondrial function by interfering with signalling pathways such as mTOR and CaMKII/Parkin/mitophagy, targeting metabolic stress in CC cell lines to inhibit cancer cell growth [70–72]. Our study provides additional candidate target genes for interfering with upregulated mitochondrial activity in CC.
Compared with the CCI group, snRNA-seq analysis revealed immunosuppression in nearly all immune cell types of the CCII group, which collectively produced an overall immunosuppressive TME.
After the collagens and mitochondrial genes, IGLL5 was the next most upregulated DEG in the MΦs of the CCII group. IGLL5 has recently been shown, based on TCGA data, to be closely correlated with tumour-infiltrating immune cells, including MΦs, in clear cell renal cell carcinoma [73]. An IGLL5 fusion has been suggested to promote lymph node metastasis and to play a role in breast cancer development [74], although its role in MΦs in the TME of CC remains unclear. MMP11 was detected almost exclusively in the MΦs of the CCII group. The immunotherapeutic role of MMP11 in different cancers has been suggested previously [75–77], and its overexpression has been characterised in CC cell lines [78] and in cervical precursor lesions [79]. In breast cancer, MMP11 expression has been considered a prognostic biomarker [80] and correlated with a high CD68/(CD3 + CD20) ratio in CD68+ macrophages [81], which caused polarisation of macrophages in the tumour centre, resulting in a more metastatic phenotype [82]. This suggests that a similar MMP11-related process may occur in the TME of the CCII group.
In addition, CCL19, IL7R, SPARC, MGP, LUM and CXCR4 were highly expressed in the MΦs of the CCII group, including in each MΦ subtype. CCL19 is overexpressed in CC cell lines and patient tissues, and its siRNA-mediated knockdown inhibited CC cell proliferation in vitro via the apoptosis pathway [83]. SPARC, which was highly elevated in the CCII group, is associated with epithelial–mesenchymal transition and is overexpressed in CC patients with poor prognosis [84]. Significantly increased MGP was observed in high-grade cervical premalignant lesions with elevated hTERT mRNA expression [85]. In uterine CC tissues, LUM was expressed in most cancer cells and stromal fibroblasts, indicating roles in the growth or invasion of CC [86]. The CXCL12/CXCR4 chemokine pathway has been targeted to improve the therapeutic ratio of chemoradiotherapy in patient-derived CC models [87].
Although CD8+ T cells were more numerous in the CCII group, their function was largely suppressed compared with the CCI group. Many interferon-induced genes and receptors were expressed by γδT cells of the CCI group, whereas the γδT cell population was very small in the CCII group. γδT cells link innate and adaptive immunity and protect the epithelium from trauma and infection, and they have been suggested as potential therapeutics against HPV in patients [88]. The number of γδT cells was negatively associated with CC progression [89]. γδT cells alone were found to inhibit tumour growth, and in combination with a galectin-1 antibody they could provide more effective immunotherapy for CC [90]. However, highly expressed HPV16 oncoproteins at the cancer stage induced a reorganisation of local epithelium-associated γδT cell subpopulations that promoted angiogenesis and cancer development [91]. Thus, the marked reduction in γδT cell numbers may be associated with the higher degree of tumorigenesis in stage-II CC.
Compared with the CCI group, both the NK cell population and its immune response were markedly reduced in the CCII group; the CCI group showed strong activation of the 'IFN-γ response' and 'allograft rejection' pathways. The most enriched pathways in the CCII group were associated with cell organisation and tissue development, possibly reflecting more active tumour growth. This was particularly evident in state-2 NK cells along the NK developmental trajectory, which showed marked activation of the EMT pathway (P ≈ 0), largely attributable to upregulation of FBN1, FN1, SERPINE1, VCAN and many collagens in the CCII group. A previous study found that collagens promote NK cell accumulation at foci of infection near the lymph node capsule [92]. We therefore speculate that the high abundance of collagens in NK cells might be a cellular response to the extremely low NK cell number in the TME of stage-II CC, serving to recruit more active NK cells.
The proteomic analysis revealed distinct protein profiles in the CCI and CCII groups. The activation of IFN-α and IFN-γ responses in the CCI group was consistent with the snRNA-seq observations in multiple cell types, especially immune cells. The DEPs upregulated in the CCII group were largely associated with the extracellular matrix, cell and tissue development, and metabolism, suggesting a higher degree of tumorigenesis. AKT1 was the hub DEP with the highest degree of interactions. PI3K/AKT/mTOR signalling has been found to regulate virus/host cell crosstalk in HPV-positive CC [93], and elevated pAKT levels are associated with radiation resistance in CC [94]. Inhibition of AKT, and the consequent disruption of mTOR signalling, induced more cell death and decreased glucose uptake in CC [95]. FN1, the most upregulated DEP interacting with multiple other DEPs of the CCII group, has been shown to promote CC tumorigenesis by activating the FAK signalling pathway [96]. FN1 interacts with MPO, which has a controversial role in different diseases [97], and an MPO gene polymorphism that reduces anti-tumour activity may play a role in CC development [98]. Among the DEPs upregulated in the CCI group was a large cluster of intensively interacting histones, including H2A/H2B and H4 members, which play important roles in transcription, DNA replication and repair [99]. Histone genes, including HIST1H4A, HIST1H4E and HIST1H4K, which were detected as DEPs in this study, have been identified as independent prognostic factors for survival among CC patients. This suggests that other histones might also be considered as markers for the early detection of CC, in combination with other DEPs upregulated in the CCI group, such as HSPs and interferon response-related proteins.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.