Introduction
In the Western world, epithelial ovarian cancer (EOC) is one of the leading causes of death from gynecological malignancies [
1]. EOC is a heterogeneous disease comprising several tumor subtypes that differ in genetic risk, pathophysiology, clinical behavior, response to treatment, and prognosis. High-grade serous ovarian cancer (HGSOC) accounts for 60%-70% of all EOC cases [
2], and the majority of EOC deaths are caused by HGSOC [
3]. Currently, BRCA1/BRCA2 gene mutations, family history, infertility, use of oral contraceptives, tubal ligation, pregnancy, and lactation are recognized as factors that influence the risk of ovarian cancer [
4]. Tumor resection and platinum- and taxane-based chemotherapy are the common treatment options for ovarian cancer [
5]. Because a large proportion of HGSOC patients are diagnosed at an advanced stage, recurrence is frequent and the 5-year survival rate of these patients is < 40% [
6,
7]. Identifying non-responders and patients with primary platinum resistance plays a crucial role in improving the survival of HGSOC patients [
7]. It is therefore critical to identify prognostic biomarkers that can inform personalized medicine and improve the prediction of clinical outcomes.
With advances in sequencing technology, it has become possible to explore the molecular mechanisms of disease by mapping the genomes of cancer cases [
8]. Many of the biomarkers and mechanisms identified in this way have contributed to a deeper understanding of cancer [
9,
10]. Numerous studies have been conducted to develop biomarkers for survival prediction and the long-term prognosis of HGSOC. By analyzing high-throughput gene expression profiles, gene signatures built from several to dozens of prognostic genes could effectively predict overall survival [
11,
12], debulking status [
13], and response to platinum treatment [
14]. For HGSOC patients with extreme responses to chemotherapy, Wisman GBA et al. [
15] applied genome-wide DNA methylation analysis to identify new epigenetic markers of platinum sensitivity in HGSOC. Using transcriptome data, Liu L et al. [
16] identified a novel seven-gene signature for predicting clinical outcome and cisplatin sensitivity in high-grade stage IIIc serous ovarian cancer. However, there are currently no effective clinical biomarkers for predicting the response of HGSOC patients to treatment. The signatures reported so far contain many biomarkers, which makes them operationally complex to apply in the clinic. Thus, gene signatures related to the prognosis of HGSOC should be identified and their biological functions analyzed through bioinformatics.
In the present study, to construct a reliable gene signature for predicting the prognosis of patients with HGSOC, we proposed a systematic pipeline for screening HGSOC-related gene markers. Gene expression profiles of HGSOC patients were obtained from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. Prognostic markers were screened by combining transcriptome and genomics data, which eventually yielded a 2-gene signature. Its performance in predicting survival was validated in test sets and external validation sets. The 2-gene signature was involved in important pathways and biological processes of HGSOC, indicating that it can be used to predict prognostic risk among HGSOC patients and can provide baseline information for understanding the molecular mechanisms underlying their prognosis. This study provides a prospective scientific basis for prognostic guidance and in-depth exploration of the pathogenesis of HGSOC.
Discussion
Ovarian cancer is an extremely heterogeneous disease, and patients with comparable TNM stages show different survival outcomes. Currently, given the demand for early detection and treatment of ovarian cancer, conventional clinicopathological indicators, such as portal venous thromboembolism, vascular invasion, tumor size, and TNM staging, are insufficient for predicting individual outcomes, especially for risk stratification, because a “one-size-fits-all” treatment strategy has been found to be ineffective [
31,
32]. The identification of prognostic molecular markers that reflect the biological characteristics of tumors is important for the prevention and treatment of ovarian tumors. This study examined the expression profiles of 822 HGSOC samples from five research cohorts in TCGA, ICGC, and GEO. We examined the overall survival (OS) of patients with Kaplan-Meier (KM) curves in each cohort (Additional file
1: Fig. S1). Although the median survival time differed between cohorts, the overall survival rates were generally similar, apart from the GSE102073 dataset. These differences may result from differences in living standards and medical conditions, as well as from varied follow-up periods. GSE102073 showed the best prognosis, while GSE102073 and GSE17260 had the shortest follow-up times. Variations in study cohorts, follow-up time, and environment are always difficult to overcome in multi-dataset integration analysis, and, because of tumor heterogeneity, they also strongly affect the generalization ability of the model. Moreover, overfitting may occur when different datasets are combined into one large dataset. Therefore, in this study, GSE102073 was selected as the training set, and the other four datasets served as external validation sets to evaluate the robustness and generalizability of the 2-gene model.
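As an illustration of how KM curves and median survival times such as those compared above are derived, the following minimal Python sketch implements the product-limit estimator. It is not the code used in this study; the function names and the input format (one follow-up time and one event flag per patient) are assumptions made for the example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # remove deaths and censored patients alike
    return curve


def median_survival(curve):
    """First time at which S(t) is 0.5 or lower; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

When follow-up is short, as in some of the cohorts discussed above, the curve may never reach 0.5, in which case `median_survival` returns `None` rather than a time.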
The functions of the prognostic genes were analyzed with the R package clusterProfiler, which was used to carry out GO and KEGG functional enrichment analysis on these 148 genes. The KEGG enrichment analysis showed that the genes were enriched in biological pathways such as fatty acid degradation, cholesterol metabolism, tyrosine metabolism, and the AMPK signaling pathway (Additional file
2: Fig. S2A). In the biological process category, the genes were mainly enriched in negative regulation of endopeptidase activity, organic hydroxy compound catabolic process, cholesterol transport, regulation of cholesterol transport, and other GO terms (Additional file
2: Fig. S2B). Furthermore, the differences in KEGG pathways between Cluster1 and Cluster2 were analyzed. The expression patterns of all genes in different KEGG pathways were analyzed by GSEA. Cluster1 samples, which had a poor prognosis, showed significant activation of the METABOLISM and DRUG METABOLISM CYTOCHROME P450 pathways, whereas Cluster2 samples, which had a favorable prognosis, showed significant activation of the CIRCADIAN RHYTHM MAMMAL pathway (Additional file
2: Fig. S2C).
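Over-representation analyses of the kind reported above are commonly based on a hypergeometric test. The Python sketch below computes such a p-value for a single pathway. It illustrates the underlying statistic only and is not a reimplementation of clusterProfiler: no multiple-testing correction is applied, and the function and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background -- number of genes in the background set
    n_in_pathway -- background genes annotated to the pathway
    n_selected   -- size of the gene list being tested
    n_overlap    -- genes in the list that belong to the pathway
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For a list of 148 genes, `n_selected` would be 148, while the background size and pathway annotation counts depend on the annotation database and are not given here.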
Currently, gene signatures, such as the 21-gene Oncotype DX assay [
33‐
35] and ColoPrint, an 18-gene expression signature for colon cancer, have been applied in clinical practice [
36‐
38]. Gene expression profiling has evolved into a viable high-throughput tool for identifying new prognostic indicators in cancer. Ding Q et al. [
39] developed a 9-gene signature for evaluating the prognosis of patients with ovarian cancer by applying LASSO to tumor microenvironment-associated genes. Wang R et al. [
29] screened differentially expressed genes to develop a 5-gene signature, which was verified as an independent prognostic factor. Sun H et al. [
40] identified 28 DNA repair genes related to the prognosis of patients with ovarian cancer by performing cluster analysis, univariate analysis, and stepwise regression. Although a variety of prognostic markers have been studied, none is currently available for direct use in clinical practice for ovarian cancer. Including many genes makes a signature more difficult to measure, which supports the applicability and detection convenience of the 2-gene signature in clinical practice.
Tumor heterogeneity is an important cause of the different clinical outcomes of tumor patients, and molecular differences between patients are greater than those between cell lines. LASSO is a regularized regression method that reduces a high-dimensional set of variables to a smaller one by finding a relatively optimal solution. Tuning its penalty parameter involves cross-validation and resampling. Therefore, different results may be obtained even when the same LASSO method is applied to the same dataset (each result being an “optimal solution” in the optimization sense). This study performed LASSO regression with 100, 200, 500, 1000, 2000, 5000, and 10,000 repetitions, each on 80% of the samples randomly chosen from the training set, and recorded the 10 most frequently selected genes (Additional file
3: Fig. S3). The results demonstrated that AKR1B10 and ANGPT4 showed the highest frequency in all seven repetition settings.
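The repeated-subsampling procedure described above can be sketched as follows. This Python example covers only the resampling and frequency counting. The LASSO fit itself is not shown: it is represented by a hypothetical `select` callable that would, in practice, wrap a penalized Cox regression from a survival-analysis library. The function name, arguments, and the fixed random seed are assumptions made for the illustration.

```python
import random
from collections import Counter


def selection_frequency(samples, select, n_repeats, fraction=0.8, seed=0):
    """Fraction of random subsamples in which each gene is selected.

    samples   -- list of per-patient records (any structure `select` accepts)
    select    -- hypothetical callable: list of records -> selected gene names
                 (stands in for the LASSO fit, which is not implemented here)
    n_repeats -- number of random subsamples to draw
    fraction  -- share of samples drawn without replacement each time
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(samples)))
    counts = Counter()
    for _ in range(n_repeats):
        subset = rng.sample(samples, k)
        # set() ensures a gene is counted at most once per repetition
        counts.update(set(select(subset)))
    return {gene: n / n_repeats for gene, n in counts.items()}
```

Calling the function once for each repetition count (100, 200, and so on) and sorting the returned frequencies gives a ranking of the most frequently selected genes for each setting.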
AKR1B10 and ANGPT4 in our 2-gene signature were both risk factors. AKR1B10 is member B10 of aldo-keto reductase family 1. The glycolytic capacity of tumor cells with high AKR1B10 expression is reduced. Glucose is a cellular energy source, and an increase in the oxidative utilization of fatty acids enhances the metastasis and colonization of tumor cells [
41]. AKR1B10 has been identified as a marker of tumor proliferation and metastasis in multiple tumors. For example, AKR1B10 expression is predictive of the treatment response of locally advanced stomach cancer [
42], and its expression is associated with poor prognosis and lymph node metastasis. Qi Wang et al. [
43] found that serum AKR1B10 is a diagnostic biomarker: its expression is significantly up-regulated in patients with lung cancer that has metastasized to the brain, so measuring the AKR1B10 level can help identify lung cancer patients with brain metastasis. Many experimental studies have also demonstrated the role of AKR1B10 in the pathogenesis and development of liver cancer and in its resistance to chemotherapeutic drugs [
44‐
46]. High AKR1B10 levels in the saliva of patients with oral squamous cell carcinoma are often associated with poor prognosis and progression [
47]. AKR1B10 expression is remarkably downregulated in colorectal cancer, and its low expression is highly correlated with the unfavorable prognosis of patients with colorectal cancer [
48]. These findings indicate that abnormal expression of AKR1B10 is closely associated with the occurrence and development of tumors. At present, the relationship between AKR1B10 expression and prognosis in HGSOC is rarely reported. Our findings showed that high AKR1B10 expression was related to a poor prognosis of HGSOC. We also found that high expression of ANGPT4, a member of the angiopoietin family, was associated with an unfavorable prognosis of patients with HGSOC, which is consistent with the conclusion of Qin Yu et al. [
49]. Based on the expression of the AKR1B10 and ANGPT4 genes, a 2-gene signature was established and verified to stratify the prognosis of patients in the training set, the TCGA test set, and the GEO validation set. GSEA revealed that the pathways enriched for the 2-gene signature were strongly correlated with pathways and biological processes involved in the occurrence and progression of tumors. These findings suggest that this model has potential clinical utility and may serve as a possible target for the clinical diagnosis of patients.
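Prognostic gene signatures of this kind are often applied as a linear risk score, with patients split into high- and low-risk groups at a cutoff. The Python sketch below assumes that construction. The scoring formula, the median cutoff, and the coefficient values in the usage lines are placeholders chosen for illustration, not the fitted model reported in this study.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum of coefficient * expression.

    expression   -- list of dicts mapping gene name -> expression value
    coefficients -- dict mapping gene name -> weight (placeholder values here)
    """
    return [
        sum(coefficients[gene] * patient[gene] for gene in coefficients)
        for patient in expression
    ]


def stratify(scores, cutoff=None):
    """Label each score 'high' or 'low'; the median is the default cutoff."""
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical weights and expression values, for illustration only
weights = {"AKR1B10": 0.5, "ANGPT4": 1.0}
patients = [
    {"AKR1B10": 2.0, "ANGPT4": 1.0},
    {"AKR1B10": 0.0, "ANGPT4": 0.5},
    {"AKR1B10": 4.0, "ANGPT4": 3.0},
]
groups = stratify(risk_scores(patients, weights))
```

Positive weights for both genes correspond to their role as risk factors: higher expression raises the score and moves a patient toward the high-risk group.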
Nevertheless, several limitations remain. First, the lack of certain clinical follow-up information made it impossible to take factors such as patients’ other health conditions into consideration when identifying prognostic biomarkers. Second, results obtained from bioinformatics analysis alone are not fully reliable and require further experimental confirmation. Therefore, genetic studies with larger sample sizes and experimental verification need to be conducted in the future.
In this study, bioinformatics techniques were employed to identify possible candidate genes for cancer prognosis from large samples. In conclusion, we constructed a 2-gene prognostic stratification system with a high AUC in both the training and validation sets. The 2-gene signature was independent of clinical manifestations. Gene classifiers can optimize survival risk prediction compared with clinical characteristics. As a result, the adoption of the 2-gene signature as a molecular diagnostic test for determining prognostic risk in patients with HGSOC could be promoted.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.