Introduction
Gastric cancer (GC) is a major global health burden; it is the fifth most common cancer and the fourth leading cause of cancer death worldwide. In 2020, there were more than one million new cases and an estimated 769,000 deaths [1], and more than 40% of the new cases and deaths occurred in China [2, 3]. In addition, 80% of patients with GC are diagnosed at an advanced stage [4], and the 5-year mortality rate for advanced GC is between 30 and 50% [5]. Overall, the prognosis of GC remains poor, and novel biomarkers are needed to reliably predict the survival outcomes of GC patients.
Of the more than 160 post-transcriptional RNA modifications identified across RNA species, N6-methyladenosine (m6A) is the most common internal modification of mRNA in higher eukaryotes, and it plays a vital role in RNA splicing, export, stability and translation [6]. Accumulating studies have revealed that m6A modification is involved in multiple processes of tumorigenesis [7–11]. This reversible and dynamic modification is regulated by m6A regulators, including “writers” (methyltransferases), “readers” (signal transducers) and “erasers” (demethylases) [12]. Writers, including METTL3, METTL16, KIAA1429, WTAP, RBM15, RBM15B and ZC3H13, mediate RNA methylation. Erasers, including FTO and ALKBH5, mediate RNA demethylation. Readers, including YT521-B homology (YTH) domain family members (YTHDF1, YTHDF2 and YTHDF3), YTH domain-containing proteins (YTHDC1 and YTHDC2), heterogeneous nuclear ribonucleoproteins (HNRNPC and HNRNPA2B1) and insulin-like growth factor 2 mRNA-binding proteins (IGF2BPs, including IGF2BP1, IGF2BP2 and IGF2BP3), recognize m6A marks and thereby affect the translation, stability and degradation of downstream RNAs [4, 13, 14]. In summary, m6A RNA methylation has a significant impact on RNA production and metabolism and is involved in the pathogenesis of multiple diseases, including GC [15].
Long non-coding RNAs (lncRNAs), which are more than 200 nucleotides in length, represent the largest group of non-coding RNAs transcribed from the genome [16]. Accumulating evidence has revealed that various lncRNAs regulate gene expression at both the transcriptional and post-transcriptional levels. Additionally, aberrant lncRNA expression is strongly associated with multiple cancers [12, 17], and lncRNAs can serve as diagnostic and prognostic markers for tumours [18]. Furthermore, lncRNAs can regulate the expression of genes related to immune cell activation, thus altering the immune microenvironment and contributing to the malignant phenotypes of some cancers [17, 19]. m6A-related lncRNAs have been reported as potential biomarkers for predicting the overall survival (OS) of patients with lower-grade glioma and might be novel therapeutic targets [12]. However, m6A-related lncRNA signatures in GC require further exploration.
Epithelial–mesenchymal transition (EMT) is a process in which polarized epithelial cells acquire a mesenchymal phenotype with increased cellular motility, and it occurs in many types of cancer [20]. In GC, loss of E-cadherin expression drives cells towards a more invasive and less differentiated state through EMT [21]. However, the association between m6A-related lncRNAs and EMT factors in GC remains unclear.
In the present study, we analysed the value of an m6A-related lncRNA pair signature (m6A-LPS) in predicting the OS of GC patients and validated the m6A-LPS in the testing dataset and the whole dataset. Notably, the m6A-LPS served as a prognostic marker for GC independent of other clinical variables. Additionally, we identified differences in the expression of EMT biomarkers and in immune cell infiltration between the high-risk and low-risk groups.
Materials and methods
Data collection and preparation, correlation analysis and differential expression analysis
All data, including RNA-seq fragments per kilobase of transcript per million mapped reads (FPKM) data and clinical information for GC samples, were downloaded from The Cancer Genome Atlas (TCGA) database. mRNAs and lncRNAs were distinguished using the gene transfer format (GTF) annotation file. m6A-related lncRNAs were defined as lncRNAs whose expression correlated with that of m6A regulators with a Pearson correlation coefficient > 0.4 and p < 0.001. Differential expression analysis of m6A-related lncRNAs between tumour and adjacent normal tissues was then performed using the R package limma, with thresholds of |log fold change (FC)| > 1.5 and false discovery rate (FDR) < 0.05.
lncRNA pairs
The differentially expressed m6A-related lncRNAs were iteratively paired with one another, and a lncRNA pair matrix was constructed. Briefly, for each pair in each sample, a value of 1 was assigned if the expression level of the first lncRNA was higher than that of the second lncRNA; otherwise, a value of 0 was assigned. A lncRNA pair was considered a valid match only when the proportion of 0 or 1 values in the matrix exceeded 20%.
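The 0-or-1 pairing step can be illustrated with a short Python sketch. The 20% criterion is ambiguous as worded; here it is assumed to mean that each of the two values (0 and 1) must occur in more than 20% of samples, so that pairs that are almost constant across patients are discarded. The lncRNA names and values are hypothetical.

```python
from itertools import combinations


def build_pair_matrix(expr: dict, min_frac: float = 0.2) -> dict:
    """Build the 0-or-1 lncRNA pair matrix.

    expr maps each lncRNA to a list of expression values in a shared
    sample order. Each unordered pair is formed once; the value is 1 in a
    sample when the first lncRNA is expressed more highly than the second,
    otherwise 0. Pairs are kept only if the fraction of 1s lies strictly
    between min_frac and 1 - min_frac (assumed reading of the 20% rule).
    """
    pairs = {}
    for a, b in combinations(sorted(expr), 2):
        values = [1 if xa > xb else 0 for xa, xb in zip(expr[a], expr[b])]
        frac_ones = sum(values) / len(values)
        if min_frac < frac_ones < 1 - min_frac:
            pairs[f"{a}|{b}"] = values
    return pairs


# Hypothetical expression values for three lncRNAs in five samples.
expr = {
    "L1": [5.0, 1.0, 4.0, 2.0, 6.0],
    "L2": [3.0, 3.0, 3.0, 3.0, 3.0],
    "L3": [10.0, 10.0, 10.0, 10.0, 10.0],  # always highest, so its pairs are constant
}
print(build_pair_matrix(expr))  # {'L1|L2': [1, 0, 1, 0, 1]}
```

Because the values for the reversed pair (B|A) are simply the complement of A|B, forming each unordered pair once loses no information.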
Construction of a m6A-LPS and evaluation of the relative risk score
First, univariate survival analysis based on the Kaplan–Meier method with the log-rank test was used to identify prognostic m6A-related lncRNA pairs, and p < 0.05 was considered to indicate statistical significance. To avoid overfitting, least absolute shrinkage and selection operator (LASSO)-penalized regression analysis was used to construct the optimal model. The risk score of each GC patient was calculated using the following formula:
$$\text{m6A-LPS} = \left(\text{Expr}_{\text{genepair-1}} \times \text{Coef}_{\text{genepair-1}}\right) + \left(\text{Expr}_{\text{genepair-2}} \times \text{Coef}_{\text{genepair-2}}\right) + \cdots + \left(\text{Expr}_{\text{genepair-}n} \times \text{Coef}_{\text{genepair-}n}\right),$$
where n is the total number of lncRNA pairs included in the signature, “Expr” is the value of the lncRNA pair in the 0-or-1 matrix (either 1 or 0), and “Coef” is the coefficient of the lncRNA pair estimated by the LASSO regression model. All GC patients were randomly divided into a training dataset and a testing dataset, and patients were then assigned to a high-risk group or a low-risk group based on the median risk score. Kaplan–Meier analysis and ROC curve analysis were used to evaluate the ability of the m6A-LPS to predict OS and its prognostic accuracy in the training dataset, the testing dataset and the whole dataset. The sensitivity and specificity of the m6A-LPS were compared with those of other clinicopathological characteristics using ROC curve analysis and decision curve analysis (DCA) [22].
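The scoring and grouping steps reduce to a weighted sum followed by a median split. The Python sketch below is illustrative only: the pair names and coefficients are hypothetical (they are not the pairs of the published m6A-LPS), the 50/50 training/testing ratio and the random seed are assumptions because the split ratio is not stated here, and patients whose score equals the cutoff are placed in the low-risk group.

```python
import random
from statistics import median


def random_split(patient_ids, train_frac=0.5, seed=2021):
    """Randomly assign patients to training and testing sets (ratio and seed assumed)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    k = round(len(ids) * train_frac)
    return ids[:k], ids[k:]


def risk_scores(pair_values: dict, coefs: dict) -> list:
    """m6A-LPS for each patient: sum over pairs of Expr (0 or 1) * Coef."""
    n_patients = len(next(iter(pair_values.values())))
    return [sum(coef * pair_values[pair][j] for pair, coef in coefs.items())
            for j in range(n_patients)]


def risk_groups(scores, cutoff=None):
    """Label patients 'high' above the cutoff (default: median of the scores)."""
    cut = median(scores) if cutoff is None else cutoff
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical signature with two pairs and four patients.
pair_values = {"lncA|lncB": [1, 0, 1, 0], "lncC|lncD": [1, 1, 0, 0]}
coefs = {"lncA|lncB": 0.5, "lncC|lncD": -0.2}  # hypothetical LASSO coefficients
scores = risk_scores(pair_values, coefs)
print(risk_groups(scores))  # ['high', 'low', 'high', 'low']
train, test = random_split(range(10))
```

To place testing-set patients on the same scale, one option is to pass the training-set median as `cutoff`; whether the original study did so is not stated here.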
Validation of the model and predictive nomogram
The chi-square test was used to assess the association between the m6A-LPS and clinicopathological characteristics, and univariate and multivariate Cox regression analyses were used to determine whether the m6A-LPS was an independent prognostic predictor. Kaplan–Meier analysis was used to evaluate the predictive value of the risk score in subgroups defined by clinicopathological features. Additionally, a nomogram integrating the m6A-LPS and clinicopathological features was constructed to predict the 1-, 3- and 5-year OS of GC patients.
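For a 2 × 2 table (for example, high/low risk group against two categories of a clinicopathological variable), the chi-square test can be written out directly. The sketch below is limited to 2 × 2 tables so that the one-degree-of-freedom p-value can be computed with `math.erfc`; it applies Yates' continuity correction by default, mirroring the default of R's `chisq.test` for 2 × 2 tables. The counts are hypothetical, and all row and column totals must be non-zero.

```python
import math


def chi_square_2x2(table, yates=True):
    """Pearson chi-square test of independence for a 2 x 2 contingency table.

    table: [[a, b], [c, d]] observed counts. Returns (statistic, p_value).
    With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)  # continuity correction, as in R's chisq.test
            stat += diff ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = high/low risk, columns = stage I-II / III-IV.
stat, p = chi_square_2x2([[12, 38], [25, 25]])
```

Tables with more than two categories need the general chi-square distribution (or Fisher's exact test for small counts), which standard statistical software provides.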
Investigation of tumour-infiltrating immune cells
We used CIBERSORT to estimate the proportions of tumour-infiltrating immune cells in each sample and analysed their relationship with the risk score by Spearman correlation analysis, with p < 0.05 considered to indicate statistical significance. The results were visualized using the R package ggplot2.
Gene set enrichment analysis (GSEA)
GSEA was used to identify Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the m6A-LPS, and pathways with p < 0.05 and FDR < 0.05 were considered significant.
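At the core of GSEA is a running-sum enrichment score computed over a ranked gene list. The sketch below follows the weighted Kolmogorov–Smirnov-like statistic of the original GSEA method with weight exponent p = 1; ranking genes by their association with the m6A-LPS, the gene names and the weights are assumptions for illustration, and the permutation-based p-values and FDR used for the thresholds above are not reproduced.

```python
def enrichment_score(ranked_genes, weights, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most positively to most negatively
    associated with the phenotype; weights: the ranking statistic for each
    gene in the same order. Hits step up in proportion to |weight|**p and
    misses step down uniformly; the score is the maximum deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    hit_total = sum(abs(w) ** p for w, h in zip(weights, hits) if h)
    n_miss = len(ranked_genes) - sum(hits)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        running += abs(w) ** p / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = ["COL1A1", "ITGB1", "GAPDH", "ACTB"]  # hypothetical ranking
stats = [4.0, 3.0, 2.0, 1.0]                  # hypothetical ranking statistics
es = enrichment_score(genes, stats, {"COL1A1", "ITGB1"})
```

A positive score means the set is concentrated at the top of the ranking (here, the high-risk end, under the stated assumption); a negative score means it is concentrated at the bottom.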
Statistical analysis
All primary data were downloaded from TCGA, and all statistical analyses were performed using R (version 4.0.4) and Perl (version 5.32.1). Survival differences were assessed using Kaplan–Meier curves and the log-rank test, and survival curves were plotted with the R package survminer. Multivariate analyses were conducted using the Cox proportional hazards regression model. Clinical data were analysed using the chi-square test or Fisher’s exact test. For all analyses, p < 0.05 was considered to indicate statistical significance.
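Both survival methods named above can be written out in a few lines of Python, which makes the computations behind the curves and p-values explicit. The sketch below is illustrative and is not a substitute for the R survival and survminer packages: it assumes right-censored data coded as event = 1 (death) or 0 (censored), compares two groups coded 0 and 1 (for example, the two values of a lncRNA pair), and obtains the log-rank p-value from the chi-square distribution with one degree of freedom. The follow-up data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # handle tied times together
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def logrank_p(times, events, groups):
    """Two-sided log-rank test comparing group 1 against group 0."""
    data = list(zip(times, events, groups))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        risk_set = [g for tt, _, g in data if tt >= t]
        n, n1 = len(risk_set), sum(risk_set)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# Hypothetical: group 1 patients all die before any group 0 patient.
p = logrank_p(list(range(1, 21)), [1] * 20, [1] * 10 + [0] * 10)
```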
Discussion
With the rapid development of high-throughput sequencing and bioinformatics analyses, we are entering a new era of biological big data. A tremendous amount of genomic information, including potential biomarkers, can be detected in clinical samples, facilitating the diagnosis, prognostication and prediction of diseases [26]. Genomic signatures, which combine genomic data in a defined manner, have been shown to predict the prognosis of patients with diseases, especially malignant tumours [27]. GC remains one of the most prevalent and deadly cancers worldwide, especially in China. Owing to the lack of diagnostic biomarkers, most patients are diagnosed at an advanced stage, and because of disease heterogeneity, not all patients benefit equally from surgical resection, chemotherapy or chemoradiotherapy [28]. In recent years, an increasing number of studies have established signatures based on both coding genes and noncoding RNAs to evaluate the prognosis of patients with cancer [29]. Several studies have revealed that m6A-related lncRNAs participate in the development of various cancers, including GC. Thus, exploring the role of these lncRNAs in the prognosis and diagnosis of GC will contribute to a better understanding of the molecular mechanisms of GC [13]. However, most prognostic signatures published to date [13, 23–25, 30, 31] require normalization of gene expression profile data before analysis, which is a major limitation for clinical application. In the current study, we adopted the gene-pair strategy previously applied to immune-related gene pairs [28] and constructed a model from two-lncRNA combinations that does not depend on exact expression levels [32]. This strategy eliminates batch effects among different platforms and does not require normalization or scaling of the data, thereby avoiding the problems associated with comparing expression data from different platforms [33, 34].
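The normalization-free property follows from the fact that a within-sample comparison is unchanged by any strictly increasing transformation applied to a whole sample, such as library-size scaling or a log transform. The short Python check below illustrates this with hypothetical values; it covers only sample-wide monotone differences, not technical biases that affect individual lncRNAs differently.

```python
import math
from itertools import combinations


def pair_values(sample: dict) -> dict:
    """0-or-1 value for every lncRNA pair within a single sample."""
    return {f"{a}|{b}": int(sample[a] > sample[b])
            for a, b in combinations(sorted(sample), 2)}


raw = {"lncA": 12.0, "lncB": 3.5, "lncC": 7.1}  # hypothetical FPKM values
platform_like = {  # hypothetical sample-wide, strictly increasing transformations
    "scaled": {k: 2.7 * v for k, v in raw.items()},
    "log2": {k: math.log2(v + 1) for k, v in raw.items()},
    "shifted": {k: v + 100.0 for k, v in raw.items()},
}
reference = pair_values(raw)
for name, transformed in platform_like.items():
    assert pair_values(transformed) == reference, name
print(reference)  # {'lncA|lncB': 1, 'lncA|lncC': 1, 'lncB|lncC': 0}
```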
First, raw lncRNA data were downloaded from the GC project of TCGA. Through Pearson correlation analysis, iterative pairing, construction of the 0-or-1 matrix, univariate Cox regression and LASSO-penalized regression analyses, we constructed an m6A-LPS containing 14 m6A-related lncRNA pairs composed of 25 unique lncRNAs. Patients were divided into high-risk and low-risk groups based on the median risk score, and Kaplan–Meier analysis revealed that the high-risk group had shorter OS. ROC analysis further showed that the m6A-LPS predicted the 5-year OS of GC patients more accurately than traditional clinicopathological features. Moreover, multivariate Cox regression analysis revealed that the m6A-LPS was an independent risk factor for GC. We also compared the accuracy of our model with that of previously reported models. The AUC values of the seven-mRNA signature of Lv et al. for predicting 1-, 3- and 5-year OS were 0.682, 0.603 and 0.630, respectively; those of the four-gene signature of Liu et al. were 0.535, 0.617 and 0.675, respectively; and those of the six-gene signature of Mao et al. were 0.557, 0.615 and 0.577, respectively. In comparison, the AUC values of our m6A-LPS for predicting 1-, 3- and 5-year OS were 0.795, 0.818 and 0.882, respectively. Together, these results indicate that the m6A-LPS provides efficient and robust prognostic prediction and might serve as a prognostic biomarker for GC. In addition, a nomogram based on the m6A-LPS and clinicopathological factors may be applied in the clinical management of GC patients.
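The 1-, 3- and 5-year AUC values measure how well a risk score separates patients who died by a given time from those still alive after it. The sketch below is a simplified time-dependent (cumulative/dynamic) AUC that simply excludes patients censored before the time point; dedicated implementations such as the R package timeROC can additionally apply inverse-probability-of-censoring weights, so results will generally differ. The follow-up data are hypothetical.

```python
def auc_at_time(times, events, scores, t):
    """Simplified cumulative/dynamic AUC at horizon t, without censoring weights.

    Cases: patients who died at or before t. Controls: patients still under
    follow-up after t. Patients censored at or before t are dropped.
    Returns the probability that a case has a higher score than a control,
    counting ties as one half.
    """
    cases = [s for ti, e, s in zip(times, events, scores) if ti <= t and e]
    controls = [s for ti, s in zip(times, scores) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


# Hypothetical follow-up in years; a higher score means higher predicted risk.
times = [0.5, 0.8, 1.5, 2.0, 4.0, 6.0]
events = [1, 1, 0, 1, 1, 0]
scores = [2.1, 1.7, 0.9, 1.2, 0.4, 0.5]
auc_1y = auc_at_time(times, events, scores, 1.0)
auc_5y = auc_at_time(times, events, scores, 5.0)
```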
Furthermore, GSEA showed that the pathways enriched in the high-risk group mainly included ECM–receptor interaction and focal adhesion. Previous studies have demonstrated that the ECM plays a vital role in cancer progression and that focal adhesion kinase (FAK) is often associated with poor clinical outcome, highlighting FAK as a potential determinant of tumour progression and metastasis [35]. These results provide new directions for exploring the potential molecular mechanisms of GC.
Moreover, previous studies have shown that tumour-infiltrating immune cells can serve as independent prognostic markers in GC [36]. Therefore, we used CIBERSORT to explore the relationship between the risk score and tumour-infiltrating immune cells. Resting memory CD4 T cells, resting dendritic cells, monocytes and M2 macrophages were positively correlated with the risk score, whereas activated memory CD4 T cells were inversely correlated with it. Published studies have shown that increased monocytes and activated memory CD4 T cells are related to poor prognosis in GC [37, 38], which is consistent with our findings.
Finally, because EMT is a key molecular step in distant metastasis and is associated with poor prognosis [39], we compared the expression of EMT biomarkers between the high-risk and low-risk groups. N-cadherin and vimentin, markers of mesenchymal cells, were more highly expressed in the high-risk group. These results may provide new ideas for the individualized treatment of GC patients.
Overall, we developed a prognostic model based on 14 m6A-related lncRNA pairs. Because only the relative expression within each pair needs to be determined, rather than the exact expression value of every lncRNA, the model may lower detection costs and is highly practical for clinical use. Furthermore, the model showed robust value for predicting the survival of GC patients. However, this study has several limitations. First, the prognostic model was constructed using TCGA data only and was not validated in other public databases or independent patient cohorts. Second, the relationship between m6A regulators and lncRNAs should be further explored in in vitro and in vivo experiments.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.