Introduction
Remnant gastric cancer (RGC), also known as gastric stump cancer, was first reported by Balfour in 1922 as a cancer developing in the remnant stomach after gastric surgery for peptic ulcer disease (PUD)[1, 2]. More recently, the definition of RGC has evolved, and it is now described as any cancer occurring in the residual stomach after a partial gastrectomy for benign or malignant conditions[3]. In the literature, the incidence of RGC ranges from approximately 1 to 7%[4–8]. Because it lacks specific symptoms, RGC is often diagnosed at an advanced stage, resulting in low surgical resection rates and poor prognoses, which makes it an important clinical concern[4, 5]. Surgical outcomes for RGC vary across studies, with 5-year survival rates ranging from 7 to 80%[6, 9–12].
As the number of gastrectomies continues to rise, the incidence of RGC is increasing annually[13]. It is crucial to identify relevant prognostic factors for RGC and to develop effective follow-up treatment strategies. In clinical practice, the gastric mucosa adjacent to RGC shows a lower degree of atrophy than that in primary gastric cancer (GC), which suggests a distinct underlying pathological mechanism[14]. Furthermore, serosal tumor invasion is significantly more frequent in RGC, affecting 37 to 48% of patients, compared with 19% in primary GC[15]. Additionally, surgery for RGC yields a notably smaller total number of harvested lymph nodes than surgery for primary GC, particularly when the preceding surgery was for gastric malignancy, because nodes were already removed at that operation. As such, the lymph node grouping applied in the TNM classification system for primary GC may not be suitable for staging RGC[16]. Moreover, RGC shows a significantly higher overall frequency of splenic hilar lymph node involvement than primary GC. Notably, jejunal mesentery lymph node involvement is predominantly observed after Billroth II reconstruction[17, 18].
RGC often exhibits a higher rate of invasion into adjacent organs, and lymph node metastasis is frequently observed[19], which can lead to a worse prognosis than that of primary GC[20]. However, some studies suggest that the prognosis of RGC is similar to that of primary GC[21]. Prior research has investigated the clinical characteristics of resectable RGC in small case studies, but the factors influencing patient outcomes remain unclear or controversial[22–24]. A meta-analysis showed that the prognostic significance of tumor location varies among studies. Some reports indicate that tumor location does not significantly affect survival[25, 26], whereas other research reports that an anastomotic site tumor may be a favorable prognostic factor[27]. In contrast, another study found that patients with anastomotic site tumors experience worse outcomes[23]. Thus, additional research is necessary to resolve this discrepancy.
Machine learning (ML) forms the foundation of contemporary advances in artificial intelligence[28]. Although ML algorithms have achieved substantial success across various disciplines, their integration into medicine and healthcare is still at an early stage. Real-world data often contain non-linear effects, which challenge traditional models such as linear regression for classification and Cox regression for survival prediction, because these models are confined to a linear framework[29, 30]. Compared with traditional mathematical models, ML performs notably well on classification and regression tasks and has been widely applied to developing predictive frameworks, determining tumor stage, and prognostic grouping[31–34].
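For reference, the linearity constraint mentioned above can be made explicit with the standard textbook form of the Cox proportional hazards model. This is a general statement, not a formula taken from this study, and the symbols $h_0$, $\beta$, and $x$ are introduced here only for illustration:

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\log \frac{h(t \mid x)}{h_0(t)} = \sum_{j=1}^{p} \beta_j x_j
```

Here $h_0(t)$ is an unspecified baseline hazard and $x$ is a vector of $p$ covariates. The log hazard ratio is linear in the covariates, so non-linear effects or interactions are captured only if they are added as explicit terms.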
ML can address a range of problems, starting from patient-level observations, applying algorithms to numerous variables, searching for informative combinations, and ultimately predicting risks and outcomes reliably[35]. Numerous studies have developed valuable models using ML techniques[36–39]. However, research on applying ML to predict survival outcomes in RGC patients is scarce. Although ML offers significant benefits in constructing models to identify risk factors, the “black-box” nature of ML algorithms makes it difficult to explain why a specific prediction is made for a patient. To address this problem, the SHapley Additive exPlanations (SHAP) method has recently been introduced[40, 41]. SHAP allows the attributes that influence complex classification and prediction tasks to be identified and prioritized for any ML model. A visual predictive model that helps healthcare professionals identify individuals with poor prognoses would therefore be advantageous.
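To make the idea behind SHAP concrete, the following minimal sketch computes exact Shapley values for a single hypothetical patient. Everything in it is an assumption made for illustration: the toy risk function, its coefficients, the feature names (`lnr_high`, `t_advanced`, `margin_positive`), and the choice to represent an "absent" feature by a fixed baseline value. It is not the study's model, and SHAP implementations typically estimate the baseline from background data instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values of a set function over the given feature names."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical toy risk model (NOT the study's model); coefficients are invented.
def toy_risk(x):
    return (0.20 * x["lnr_high"] + 0.10 * x["t_advanced"]
            + 0.05 * x["lnr_high"] * x["t_advanced"])


patient = {"lnr_high": 1, "t_advanced": 1, "margin_positive": 0}
baseline = {"lnr_high": 0, "t_advanced": 0, "margin_positive": 0}


def value_fn(known):
    # Features in `known` take the patient's value; the rest stay at baseline.
    x = {k: (patient[k] if k in known else baseline[k]) for k in patient}
    return toy_risk(x)


phi = shapley_values(value_fn, list(patient))
```

In this toy case the attributions sum to the difference between the patient's prediction and the baseline prediction, the interaction term is split equally between the two features involved, and a feature whose value equals the baseline receives zero attribution.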
Consequently, a central objective of our research was to construct and evaluate ML-based models for predicting five-year survival in patients with RGC. This involved not only developing multiple ML algorithms but also visualizing the resulting models to gain insight into their inner workings. Furthermore, our study aimed to compare the performance of these ML models with that of traditional linear regression models, thereby clarifying the distinctive contributions and potential superiority of ML approaches in predicting survival for this patient population. Through visualization, we sought to enhance interpretability and transparency, enabling a comprehensive evaluation of the complex relationships learned by the ML models in the context of RGC survival prediction.
Discussion
Our research used ML techniques to build a set of models for predicting five-year survival in patients with RGC after surgery. To our knowledge, this is the first investigation to examine prognostic risk factors for RGC using ML models. Through development and validation, we have shown that the models perform consistently and reproducibly. Notably, our risk model not only demonstrates robust stability compared with conventional techniques but also addresses the “black-box” issue associated with ML models by incorporating model visualization. Visualizing the model enables healthcare professionals to assess post-surgery survival outcomes more effectively. These predictive indicators may help clinicians tailor care strategies, thereby optimizing risk factor management for high-risk patients.
The proficiency, ease of use, and robustness of ML models in recognizing complex data significantly surpass those of traditional statistical models, overcoming the limitations of those models regarding statistical efficiency[49]. In ML models, feature selection or dimensionality reduction can be used to improve accuracy or performance on high-dimensional datasets[52]. Gradient boosted decision trees (GBDTs), including XGBoost, LightGBM, and CatBoost, are powerful tools for classification tasks on large datasets. Our method not only provides a precise and clinically feasible technique for predicting survival outcomes in RGC patients but also enhances the interpretability of the predictions. The SHAP value quantifies the contribution of each feature to the model’s output, enabling comprehensive global explanations[46, 53, 54]. In the XGBoost model, the higher a clinical factor’s mean absolute SHAP value, the greater its predictive importance. To obtain a unified view, these factors were consolidated, and the SHAP interpretation was built from individual patients. SHAP effectively addresses multicollinearity issues and determines whether an influence is beneficial, because it considers both individual factor effects and their interactions[41]. According to the SHAP values, lymph node ratio (LNR), T stage, tumor size, resection margins, perineural invasion, and distant metastasis were the most important factors for predicting five-year survival in RGC. In essence, these factors can be considered an optimal subset representing the key contributors to survival risk assessment for RGC patients. The interpretability of this subset stems from capturing and visualizing the direction of each feature’s effect and the size of its contribution to the prediction. This gives clinicians specific insight into how individual predictions are influenced by each variable, affording a personalized, fine-grained understanding of each patient’s prognosis.
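The relationship between per-patient SHAP values and a global ranking by mean absolute SHAP value can be written compactly. The notation below is introduced here for illustration and is not taken from the study: $\phi_j^{(i)}$ denotes the SHAP value of feature $j$ for patient $i$, $\phi_0$ the base value (the expected model output), $n$ the number of patients, and $p$ the number of features.

```latex
f\!\left(x^{(i)}\right) = \phi_0 + \sum_{j=1}^{p} \phi_j^{(i)},
\qquad
I_j = \frac{1}{n} \sum_{i=1}^{n} \left| \phi_j^{(i)} \right|
```

The first identity is the additive property that underlies a per-patient explanation, with the sign of $\phi_j^{(i)}$ giving the direction of the effect on the model output. The second defines the global importance $I_j$ commonly used to rank features.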
Most reports indicate that RGC is often diagnosed at an advanced stage, leading to a relatively low rate of curative resection and an unfavorable prognosis. This suggests that RGC may possess biological characteristics distinct from those of primary GC[1, 55, 56]. However, some researchers have compared RGC with primary GC and found no significant difference in survival between the two[57–59]. A few studies have investigated the clinicopathologic features and prognosis of RGC, but no consensus has been reached[1, 60, 61]. Consistent with prior research[56, 62, 63], more than 80% of the RGC patients in our study were male. This may be because men are more susceptible to both gastroduodenal ulcers and GC[64, 65].
In the majority of studies, lymph node staging of RGC follows the UICC/AJCC criteria. However, lymphatic drainage changes after an initial gastrectomy for GC, so the lymph nodes retrieved at RGC surgery cannot comprehensively determine the N stage, particularly when RGC arises after GC. The total number of lymph nodes dissected during re-operation in such patients typically does not exceed 10, significantly fewer than the number dissected for RGC arising after surgery for benign lesions. This may lead to inaccurate staging. One study analyzed the prognostic significance of LNR in resectable RGC using retrospective propensity score matching and found that LNR was an independent prognostic factor for RGC, whereas the number of positive lymph nodes was not[42]. Our study supports this finding using an ML approach. Therefore, LNR may be a more dependable prognostic factor for RGC patients. However, some studies suggest that LNR is not superior to the number of positive lymph nodes[66]. Further analysis incorporating data from multiple centers with larger sample sizes is necessary.
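The reason LNR can carry information that a raw positive-node count does not is easy to see with a small calculation. The sketch below assumes the usual definition of LNR as the number of metastatic lymph nodes divided by the number of lymph nodes examined; the exact definition used in the study is not stated in this section, and the function name and patient numbers are hypothetical.

```python
def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """Ratio of metastatic (positive) lymph nodes to lymph nodes examined."""
    if examined_nodes <= 0:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive_nodes must lie between 0 and examined_nodes")
    return positive_nodes / examined_nodes


# Hypothetical patients with the same positive-node count but different yields.
low_yield = lymph_node_ratio(positive_nodes=3, examined_nodes=8)    # 0.375
high_yield = lymph_node_ratio(positive_nodes=3, examined_nodes=30)  # 0.1
```

Both hypothetical patients have three positive nodes, yet the ratio differs almost fourfold once the number of nodes examined is taken into account, which is the situation that arises when few nodes can be harvested at re-operation.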
Another study identified lymphatic invasion and pathological T stage as risk factors for lymph node metastasis in RGC[67]. Many researchers have proposed that high rates of adjacent organ invasion and lymph node metastasis contribute to the poorer prognosis of RGC[19, 20]. One study, however, found pathological T stage and venous invasion to be significant independent risk factors for survival among RGC patients, whereas pathological N stage showed no significant association with long-term survival[68]. This contradicts our findings. In our research, venous invasion was not retained in the prognostic model, suggesting that it is not an independent prognostic factor, whereas perineural invasion played a crucial role. Given that study’s small sample size (65 cases) and single-center retrospective design, the prognostic value of venous invasion deserves further examination. Tumor site has been reported to affect the prognosis of RGC[22, 23, 27], and tumor location has been described as a vital factor for predicting recurrence patterns and overall survival[69]. However, in our study, tumor location at the anastomotic site was not an independent prognostic factor, which is consistent with previous reports[70, 71].
The current study has several limitations. First, because of its retrospective nature, selection bias could not be excluded. Second, the sample size was relatively small. Third, some important information was incomplete or missing, probably because of difficulties in obtaining data about the initial operation. Further prospective studies involving RGC patients are necessary to comprehensively explore the clinicopathological characteristics of RGC.
Because the primary aim of our research was to optimize the use of pathological features in predicting mortality risk for post-gastrectomy GC patients, we intentionally confined our analysis to these characteristics. Consequently, we did not incorporate other potentially influential risk factors, such as comorbidities, laboratory indices, and other clinical attributes, for stratification. This deliberate focus on pathology data alone may have limited the model’s predictive capacity. Nonetheless, this study serves as a foundational step towards refining risk prediction. Moving forward, we plan to extend our work by integrating additional clinical indicators and biomarkers to construct a more refined and comprehensive predictive model. Such an approach will likely enhance the precision and practicality of risk assessment in this patient population.