Introduction

Colon cancer (CC) is one of the most common cancers worldwide and one of the leading causes of cancer-related mortality1,2. For resectable CC, surgery with systematic lymph node dissection is considered the primary treatment3. Although many prognostic markers have been identified, tumor stage remains the most widely used prognostic factor4. The American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) classification, which is based on the depth of tumor invasion into the intestinal wall and the number of positive lymph nodes, is the most important factor in determining prognosis and subsequent treatment.

In recent years, the number of lymph nodes examined (LNE) in pN0 CC patients has attracted substantial attention because of its prognostic value5. Studies have shown that a greater LNE is associated with better disease-free survival (DFS) and overall survival (OS), especially in pN0 patients6,7,8, and LNE is an independent prognostic factor for survival in patients with CC. Moreover, LNE is an important indicator of accurate nodal staging because it determines how thoroughly lymph node involvement can be assessed9,10. The National Comprehensive Cancer Network (NCCN) guidelines recommend that at least 12 lymph nodes be examined in CC patients to ensure reliable postoperative pathological staging11. Nevertheless, in recent clinical practice, about 30–50% of CC patients still undergo inadequate lymph node dissection12,13.

However, prognostic stratification of node-negative CC patients is currently determined by T stage alone, regardless of how many lymph nodes were examined. The conventional staging system may therefore be inadequate for pN0 patients, and LNE could be taken into account to better separate patients with different prognoses. In this study, we used data from the SEER database to determine the optimal stratification of LNE for pN0 CC patients and then constructed a modified stage (mStage) for this population by combining the conventional T stage with a novel N stage (nN stage). Data from our department were used to further validate the mStage.

Methods

Patients

CC cases from January 2010 to December 2015 were collected from the SEER database, and treatment data were obtained from the SEER custom data through an additional application.

Inclusion criteria were: (1) pathologically diagnosed CC without positive lymph nodes or distant metastasis; (2) age ≥ 18 years; (3) complete records of cancer-specific survival months and vital status; (4) CC as the only primary malignancy. Exclusion criteria were: (1) receipt of neoadjuvant and/or adjuvant therapy; (2) incomplete follow-up data; (3) incomplete baseline information.

In addition, 455 CC patients treated at the Second Affiliated Hospital of Harbin Medical University between January 2011 and December 2015 were enrolled as a validation cohort. The last follow-up was in October 2021. The inclusion and exclusion criteria for the validation cohort were the same as those for the development (SEER) cohort.

Statistical analysis

All statistical analyses were performed with SPSS 22.0 (IBM Corp, Armonk, NY, USA) and R software (version 3.6.1, https://www.r-project.org/). Clinical characteristics were summarized as numbers and percentages. To define the nN stage, the optimal cut-off values of LNE for cancer-specific survival (CSS) were determined with X-tile software (version 3.6.1, https://medicine.yale.edu/lab/rimm/research/software/). Cox proportional hazards regression was used to assess the association between mStage and CSS. The concordance index (C-index) and receiver operating characteristic (ROC) curves were used to evaluate the discriminative ability of the mStage. Kaplan–Meier curves were generated and compared with log-rank tests. A two-sided P < 0.05 was considered statistically significant.
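X-tile's internal implementation is not described here, so the following is only a minimal sketch of the general idea behind an X-tile-style cut-point search, with hypothetical function names (`logrank_chi2`, `best_two_cutpoints`). It tries every pair of LNE cut-points, splits patients into three groups and keeps the pair with the largest log-rank-type chi-square. Two assumptions apply: the statistic is the simplified Σ(O − E)²/E approximation rather than the full k-group log-rank test with its covariance matrix, and `min_group` is an assumed minimum group size that stops the search from favoring tiny groups.

```python
from itertools import combinations


def logrank_chi2(times, events, groups):
    """Approximate k-group log-rank statistic: sum over groups of (O - E)^2 / E."""
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Number at risk and number of events at time t, per group
        at_risk = {g: 0 for g in labels}
        deaths = {g: 0 for g in labels}
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei:
                    deaths[gi] += 1
        n_total = sum(at_risk.values())
        d_total = sum(deaths.values())
        for g in labels:
            observed[g] += deaths[g]
            # Events expected in group g if all groups shared the same hazard
            expected[g] += d_total * at_risk[g] / n_total
    return sum((observed[g] - expected[g]) ** 2 / expected[g]
               for g in labels if expected[g] > 0)


def best_two_cutpoints(lne, times, events, min_group=1):
    """Return (chi2, low, high) for groups LNE < low, low <= LNE < high, LNE >= high."""
    values = sorted(set(lne))
    best = None
    for low, high in combinations(values[1:], 2):
        groups = [0 if x < low else (1 if x < high else 2) for x in lne]
        if min(groups.count(g) for g in (0, 1, 2)) < min_group:
            continue  # skip splits that leave any group too small
        chi2 = logrank_chi2(times, events, groups)
        if best is None or chi2 > best[0]:
            best = (chi2, low, high)
    return best  # None if no split satisfies min_group
```

With this grouping, the cut-offs reported in this study (LNE ≤ 10, 11–25, ≥ 26) would correspond to low = 11 and high = 26. Because the statistic is maximized over many candidate splits, the winning chi-square is optimistic and its nominal P-value should not be taken at face value.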

Results

Patient characteristics

After screening, 39,637 patients from the SEER database (development cohort) and 455 patients from a Chinese population (validation cohort) were included in this study. In the development cohort, most patients were female (51.0%) and older than 65 years (65.0%), whereas in the validation cohort most were male (60.5%) and younger than 65 years (54.5%). In both cohorts, most tumors were located in the right colon (64.2% and 50.5%, respectively), were adenocarcinomas (92.5% and 77.4%) and were grade I/II (87.8% and 89.75%). The mean LNE in the development and validation cohorts was 18.98 ± 9.52 and 16.94 ± 7.77, respectively. Detailed data are summarized in Table 1.

Table 1 Characteristics of patients in the development and validation cohorts.

Construction of the modified TNM stage

In the development cohort, the optimal stratification of LNE for CSS was determined with X-tile software and used to build the nN stage (N0a: LNE ≥ 26; N0b: LNE = 11–25; N0c: LNE ≤ 10) (Table 2; Fig. 1a,b). Kaplan–Meier analysis showed significant differences in prognosis among the three LNE groups (P < 0.001) (Suppl. Fig. S1). Patients were then divided into 15 subgroups by combining the conventional T stage (T1, T2, T3, T4a and T4b) with the nN stage (N0a, N0b and N0c), and the prognosis of these subgroups was compared (Table 3; Fig. 2). Using T1N0a as the reference, the subgroups were regrouped into six modified stages (mStage) based on their 5-year CSS rates and hazard ratios (HRs): mStageA (T1N0a, T1N0b, T1N0c and T2N0a), mStageB (T2N0b, T2N0c and T3N0a), mStageC (T3N0b), mStageD (T3N0c, T4aN0a and T4bN0a), mStageE (T4aN0b and T4bN0b) and mStageF (T4aN0c and T4bN0c) (Fig. 3). The 5-year CSS rates for mStageA, B, C, D, E and F were 96.5%, 92.3%, 86.6%, 76.4%, 61.8% and 40.9%, respectively (P < 0.001).
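The regrouping of the 15 T stage and nN stage subgroups into six mStages amounts to a lookup table. The Python sketch below, with hypothetical names (`nn_stage`, `m_stage`), assigns the nN stage from the LNE cut-offs and then maps the combined subgroup to its mStage; it assumes T stage is supplied as one of the strings "T1", "T2", "T3", "T4a" or "T4b".

```python
# Groupings as reported for the development cohort.
MSTAGE_GROUPS = {
    "A": ["T1N0a", "T1N0b", "T1N0c", "T2N0a"],
    "B": ["T2N0b", "T2N0c", "T3N0a"],
    "C": ["T3N0b"],
    "D": ["T3N0c", "T4aN0a", "T4bN0a"],
    "E": ["T4aN0b", "T4bN0b"],
    "F": ["T4aN0c", "T4bN0c"],
}
# Invert to subgroup -> mStage for direct lookup
SUBGROUP_TO_MSTAGE = {sub: stage for stage, subs in MSTAGE_GROUPS.items() for sub in subs}


def nn_stage(lne: int) -> str:
    """Novel N stage from the number of lymph nodes examined."""
    if lne >= 26:
        return "N0a"
    if lne >= 11:
        return "N0b"
    return "N0c"


def m_stage(t_stage: str, lne: int) -> str:
    """Modified stage (A-F) for a pN0 patient; raises KeyError for an unknown T stage."""
    return SUBGROUP_TO_MSTAGE[t_stage + nn_stage(lne)]


print(m_stage("T3", 12))  # T3N0b -> C
print(m_stage("T2", 8))   # T2N0c -> B
```

Keeping the groupings in one dictionary makes it easy to check that all 15 subgroups are covered exactly once.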

Table 2 Grouping of nN stage.
Figure 1

X-tile analysis of CSS in the development cohort. (a) X-tile plot. (b) Histogram.

Table 3 Survival analysis among different subgroups.
Figure 2

Kaplan–Meier curves for patients in different subgroups in the development cohort.

Figure 3

Modified TNM staging system.

Superiority of the modified TNM staging system

Multivariable Cox proportional hazards regression showed that mStage remained an independent prognostic factor for CSS after adjustment for confounding factors (Table 4). mStage was also an independent prognostic factor for OS and for CSS after excluding patients who died of other causes (Suppl. Tables S1 and S2). Figures 4 and 5 show Kaplan–Meier curves stratified by mStage (a) and conventional TNM stage (b) in the development and validation cohorts, respectively; the mStage separates patients into prognostic groups more clearly than the conventional TNM stage.
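The stage-specific curves and 5-year CSS rates are Kaplan–Meier estimates. A minimal Python sketch of the product-limit estimator follows; the function names are hypothetical, and it assumes follow-up is recorded in months (so 5-year CSS is S(60)) and that deaths from other causes are coded as censored (event = 0).

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier (product-limit) estimate of S(horizon).

    times: follow-up in months; events: 1 = cancer-specific death, 0 = censored.
    """
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        surv *= 1.0 - deaths / at_risk
    return surv


def km_by_stage(times, events, stages, horizon=60):
    """S(horizon) for each stage label, e.g. mStage A-F."""
    result = {}
    for stage in sorted(set(stages)):
        idx = [k for k, s in enumerate(stages) if s == stage]
        result[stage] = km_survival_at([times[k] for k in idx],
                                       [events[k] for k in idx], horizon)
    return result


print(km_survival_at([12, 24, 36, 48], [1, 0, 1, 0], horizon=60))  # 0.375
```

In the toy call, the death at 12 months leaves 3/4 surviving and the death at 36 months (two patients still at risk) halves that, giving 0.375; the censored patient at 24 months only reduces the risk set.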

Table 4 Cox regression analyses of factors related to CSS in the development cohort.
Figure 4

Kaplan–Meier curves stratified by mStage (a) and conventional TNM stage (b) in the development cohort.

Figure 5

Kaplan–Meier curves stratified by mStage (a) and conventional TNM stage (b) in the validation cohort.

In the development cohort, the C-indices of the mStage and conventional TNM stage were 0.699 (95%CI = 0.695–0.704) and 0.678 (95%CI = 0.674–0.682), respectively (P < 0.001), indicating better discrimination by the mStage. The 5-year AUCs of the mStage and TNM stage were 0.700 (95%CI = 0.691–0.709) and 0.678 (95%CI = 0.670–0.687), respectively (P < 0.001) (Fig. 6a).
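Harrell's C-index is the share of usable patient pairs that a staging system orders correctly. A minimal Python sketch for an ordinal stage score (for example mStage A-F coded 1-6, an assumed coding) is shown below; the function name is hypothetical, pairs with tied follow-up times are simply skipped, and ties in the stage score count as one half. That is a common convention, but not necessarily how the software used in this study handled ties.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: fraction of usable pairs in which the higher-risk patient fails first.

    A pair (i, j) is usable when patient i had the event and patient j was
    still under follow-up after that time. Equal risk scores count as 0.5.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical coding of mStage as an ordinal risk score
STAGE_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
print(harrell_c_index([10, 30, 50], [1, 1, 0], [STAGE_SCORE[s] for s in "FCA"]))  # 1.0
```

Because a stage variable has only a few levels, many pairs are tied on risk and contribute 0.5, which is one reason C-indices of staging systems tend to stay well below 1 even when the stages separate survival clearly.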

Figure 6

The AUCs of the mStage and conventional TNM stage. (a) Comparison of the 5-year AUCs in the development cohort. (b) Comparison of the 5-year AUCs in the validation cohort. (c) Comparison of the 3-year AUCs in the development cohort. (d) Comparison of the 3-year AUCs in the validation cohort.

In the validation cohort, the C-indices of the mStage and conventional TNM stage were 0.644 (95%CI = 0.632–0.697) and 0.613 (95%CI = 0.587–0.640), respectively (P < 0.001), and the 5-year AUCs were 0.649 (95%CI = 0.579–0.719) and 0.603 (95%CI = 0.546–0.661), respectively (P = 0.018) (Fig. 6b).

The 3-year AUCs likewise indicated better discrimination by the mStage than by the TNM stage in both the development and validation cohorts (Fig. 6c,d).

Moreover, the calibration curves for 3- and 5-year CSS showed satisfactory agreement between predicted and observed survival in both the development and validation cohorts (Fig. 7a–d).

Figure 7

The calibration curves of the mStage. (a) Calibration curves for 5-year CSS in the development cohort. (b) Calibration curves for 5-year CSS in the validation cohort. (c) Calibration curves for 3-year CSS in the development cohort. (d) Calibration curves for 3-year CSS in the validation cohort.

Discussion

CC is one of the most common gastrointestinal cancers and, because of its high mortality, poses a major public health challenge1. The AJCC TNM staging system is the most widely applied system in clinical practice for guiding treatment and estimating prognosis. In this system, the N category is assigned according to the presence and number of positive regional lymph nodes: N0 (no regional lymph node metastasis), N1 (N1a: 1 positive node; N1b: 2–3 positive nodes; N1c: tumor deposits without regional lymph node metastasis) and N2 (N2a: 4–6 positive nodes; N2b: ≥ 7 positive nodes). N0 is not subdivided further, so pN0 patients are stratified by T stage alone, which remains controversial.
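The N-category rules for colon cancer can be written as a short Python function, which makes the point concrete: every node-negative patient without tumor deposits receives N0, whatever the number of lymph nodes examined. This is a simplified sketch with a hypothetical name and a boolean `tumor_deposits` flag; it does not cover every AJCC rule (for example NX).

```python
def ajcc_n_category(positive_nodes: int, tumor_deposits: bool = False) -> str:
    """Simplified AJCC 8th edition pN category for colon cancer."""
    if positive_nodes == 0:
        # Tumor deposits without nodal metastasis are N1c; otherwise N0
        return "N1c" if tumor_deposits else "N0"
    if positive_nodes == 1:
        return "N1a"
    if positive_nodes <= 3:
        return "N1b"
    if positive_nodes <= 6:
        return "N2a"
    return "N2b"


# LNE is not an input: 0 positive of 4 examined and 0 positive of 40 examined are both N0.
print(ajcc_n_category(0), ajcc_n_category(5))  # N0 N2a
```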

The number of LNE has been shown to be an independent prognostic factor in multiple cancer types, particularly CC. Higher LNE has been associated with improved survival in pN0 CC patients, but the underlying mechanism remains unclear6,9,14. Several hypotheses have been proposed. One is that examining more lymph nodes increases the chance of detecting a positive node, which yields more accurate tumor staging15,16. Adequate LNE reduces the likelihood of misclassifying stage III disease as stage I or II and thereby improves prognosis, particularly for pN0 CC patients17,18,19. In addition, a higher LNE may reflect better treatment quality, including complete tumor resection and adequate pathological evaluation. Another explanation is that more negative lymph nodes reflect a stronger host immune response: when the immune system detects tumor cells, regional lymph nodes may enlarge and thus be easier to identify at postoperative pathological examination. Analyses of the tumor microenvironment have found that LNE correlates with local neutrophil and lymphocyte infiltration5. These studies demonstrated a relationship between LNE and prognosis but did not define the optimal stratification of LNE in pN0 CC patients. In this study, X-tile analysis identified the optimal LNE stratification for CSS (N0a: LNE ≥ 26; N0b: LNE = 11–25; N0c: LNE ≤ 10), and Kaplan–Meier analysis showed significant differences in prognosis among the three groups (P < 0.001), supporting the prognostic relevance of this stratification.

The AJCC 8th edition TNM classification recommends that a minimum of 12 lymph nodes be examined for adequate nodal staging, and LNE can therefore serve as a marker of surgical and pathological adequacy. However, LNE is influenced by tumor location, tumor size and patient age, and especially by the skill of the surgeon and the diligence of the pathologist12,20,21,22. When too few lymph nodes are examined, staging with the conventional TNM system may misclassify patients, especially those staged as N0. Incorporating LNE into the modified staging system may therefore stratify these patients better than the conventional method.

In addition, the threshold of at least 12 examined lymph nodes is itself debated: Ning et al. found that the optimal cut-off value of LNE was 18 in pN0 CC patients23. The optimal cut-off value of LNE thus remains controversial, and a new, well-validated staging system is needed for clinical use.

In this study, X-tile analysis yielded the optimal stratification of LNE (nN stage: N0a: LNE ≥ 26; N0b: LNE = 11–25; N0c: LNE ≤ 10), with significant differences among the three groups. A modified TNM stage was then constructed from the conventional T stage and the nN stage. To distinguish patients with different outcomes more rationally, the 15 subgroups were merged into six modified stages (mStage) according to their HRs and survival curves. The Kaplan–Meier CSS curves show that the mStage groups patients with similar prognosis more effectively than the conventional stage. In addition, the AUC and C-index of the mStage were significantly higher than those of the conventional TNM staging system in both the development and validation cohorts, indicating that the mStage has potential advantages over the conventional stage in predicting survival.

This study has several strengths. First, the LNE cut-offs accounted for patients with insufficient LNE (N0c: LNE ≤ 10), making the nN stage more broadly applicable. Second, we analyzed the prognostic interaction between the nN stage and the conventional T stage and constructed a modified staging system for pN0 CC patients that showed better predictive power than the conventional TNM staging system. Finally, we used an independent validation cohort to make our results more convincing.

This study has several limitations. First, this LNE stratification is proposed here for the first time and there is no consensus on how LNE should be stratified, which may limit the adoption of the mStage system. Second, this is a retrospective analysis, and the findings need to be verified in prospective clinical studies. Third, the validation cohort was relatively small, and larger samples are needed to confirm the accuracy of the modified staging system.

In conclusion, the mStage system predicted the prognosis of pN0 CC patients and showed better predictive power than the conventional TNM staging system.

Ethical approval

This study received ethical approval from the Second Affiliated Hospital of Harbin Medical University. The study used de-identified data and adhered to the World Medical Association's Declaration of Helsinki. Because SEER is a publicly available database of anonymized data, no ethical review was required for the SEER data.

Informed consent

Informed consent was obtained from the 455 colon cancer patients in the validation cohort and their families.