Introduction

As the early diagnosis rate and systemic treatment of breast cancer are improving, breast cancer surgery is becoming less traumatic and more individualized. The axillary lymph node (ALN) status is one of the most critical prognostic factors in patients with breast cancer. In the last century, sentinel lymph node biopsy (SLNB) has been indicated to be a reliable method in the axilla stage1. Currently, axillary lymph node dissection (ALND) can be safely avoided in breast cancer patients with negative sentinel lymph nodes (SLNs)2. However, ALND remains the standard management strategy when breast cancer patients are determined to have positive SLNs3. In recent years, data from the American College of Surgeons Oncology Group Z0011 (ACOSOG Z0011) and the International Breast Cancer Study Group 23-01 (IBCSG 23-01) trials showed that further ALND did not result in additional benefit in terms of locoregional recurrence (LRR), disease-free survival (DFS) or overall survival (OS) in patients with limited SLN involvement4,5. However, most patients in these studies underwent breast-conserving surgery with whole-breast irradiation, and the ALND group included fewer patients with good prognostic features of a smaller tumour diameter, hormone receptor positivity, a lower number of positive SLNs, and the absence of lymphovascular invasion (LVI). In addition, information regarding the specific status of human epidermal growth factor receptor 2 (HER2) and KI67 and strict quality control of radiotherapy were lacking in these studies. In most developing countries, such as China, ALND is still recommended by most clinicians for patients with positive SLNs for the following reasons: (1) the differences between Eastern and Western populations; (2) a lack of evidence-based medical research in Eastern populations; (3) the uneven distribution of medical resources; and (4) the low rate of breast-conserving surgery (BCR) in China6. However, ALND is associated with increased morbidity, such as lymphoedema, paraesthesia and decreased mobility7. Moreover, clinical evidence has shown that only 40% of patients with positive SLNs have further axillary involvement, highlighting the importance of a prediction model to evaluate the risk of non-sentinel lymph node (NSLN) metastasis in breast cancer patients with 1–2 positive SLNs8. The Memorial Sloan Kettering Cancer Center (MSKCC) nomogram proposed by Van Zee et al. is the first model to predict NSLN metastasis in patients with positive SLNs. This nomogram contains the following nine independent variables: frozen section performed, pathological size, tumor type and grade, number of positive SLNs, SLN method of detection, number of negative SLNs, LVI, multifocality, and estrogen receptor(ER) status. This nomogram performed well in the training cohort (702 cases) and the validation cohort (373 cases) with AUCs of 0.76 and 0.77, respectively9. Other prediction models, such as the Louisville models, MD Anderson Cancer Center score and Tenon score, were developed based on Western populations10,11,12. A study conducted in Japan validated the performance of the MSKCC nomogram and Stanford nomogram in Asian populations. The results showed that the AUCs in macrometastasis and micrometastasis/ITC groups were 0.680 and 0.469 with the MSKCC nomogram and 0.676 and 0.574 with the Stanford nomogram, respectively, suggesting that these nomograms could not reliably predict the metastasis of non-sentinel lymph node in an Eastern population13. A prediction model designed specifically for the Eastern population is lacking. Therefore, we sought to develop a new intraoperative mathematical prediction model based on a Chinese population to evaluate the risk of NSLN metastasis in Chinese breast cancer patients with 1–2 positive SLNs. Least absolute shrinkage and selection operator (LASSO) regression was used to construct the model.

Methods

Patients

Patients were included in our study according to the following eligibility criteria: (1) diagnosis of primary breast cancer; (2) no signs of ALN involvement discernible by physical examination or imaging; (3) the presence of only one or two positive SLNs; (4) treatment with further ALND; and (5) the availability of complete clinicopathological data. We excluded patients who had undergone neoadjuvant therapy, patients with inflammatory breast cancer or bilateral breast cancer, and patients with a history of breast cancer. The clinicopathological data of breast cancer patients who underwent SLNB and ALND at The First Affiliated Hospital of Chongqing Medical University between January 2013 and December 2018 were retrospectively collected and analysed. A mathematical prediction model was developed. The data of patients treated between January 2019 and December 2019 were used for validation.

Surgical procedures

SLNB was performed by injecting a radiocolloid (99mTc) combined with methylene blue. First, the 99mTc-labelled sulfur colloid (Xinke, Beijing, China) was injected into the subareolar area 3–18 h before surgery. Then, the surgeon injected methylene blue (Jumpcan, Nanjing, China) into the subareolar area and massaged the breast for 5–10 min before the operation. Lymph nodes, any blue-stained nodes and nodes with a high radioactive count (at least 10%) as measured by a gamma detector (neo2000@, Johnson & Johnson, US) were classified as SLNs and removed for the preparation of intraoperative frozen sections. All clinicopathological data were collected from preoperative biopsy and intraoperative frozen sections. All SLNs and ALNs dissected during surgery were routinely submitted for postoperative pathological sections and immunohistochemical (IHC) staining.

Diagnostic criteria

The tumours were classified by size as T1, T2 and T3 according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging guidelines14. IHC staining was performed to determine the estrogen receptor (ER), progesterone receptor (PR) and Ki67 status. The HER2 status was determined by IHC staining combined with fluorescence in situ hybridization (FISH). The samples that were IHC (−) and IHC (1+) were considered HER2-negative, and the samples that were IHC (3+) and FISH (+) were considered HER2-positive. All cases in the study were classified as the luminal A, luminal B, HER2 overexpression or Triple negative subtype by the 2013 St Gallen International Expert Consensus15.

Statistical analysis

The χ2 test and logistic regression were utilized for the univariate and multivariate analyses, respectively. The analyses were performed using SPSS 23.0 software. The LASSO regression was performed using the glmnet package in R version 3.6.2 to establish a mathematical prediction model calculating the risk scores (RS) of the patients. The RS of each patient was calculated by the mathematical formula. We used SPSS 23.0 software to generate the receiver operating characteristic (ROC) curve. The performance of the prediction model was assessed by the area under the ROC curve (AUC) in the training cohort and the validation cohort. P < 0.05 was considered significant.

Ethical approval and consent to participate

This study was approved by the ethics committee of the First Affiliated Hospital of Chongqing Medical University (2020-309), and each participating patient provided written informed consent. All methods were confirmed to be performed in accordance with the relevant guidelines and regulations.

Results

General demographic and characteristics

Ultimately, in total, 845 patients with 1–2 positive SLNs were enrolled in our study; 714 patients were included in the training cohort and 131 patients were included in the validation cohort. All patients were females who ranged in age from 22 to 88 years. The median age was 49 years in both groups. In the training cohort, 266 patients (37.3%) were demonstrated to have NSLN metastasis by an analysis of postoperative pathological sections, indicating that the other 448 patients (62.7%) underwent unnecessary ALND. Then, the patients in the training cohort were divided into the low-RS group and the high-RS group, in which 26 patients (10.8%) and 240 patients (50.6%), respectively, were observed to have further NSLN metastasis. Additional clinicopathological parameters in the training cohort are shown in Table 1. The clinicopathological characteristics of patients in the validation cohort are shown in Table 2.

Table 1 Clinicopathological characteristics of the 714 patients in the training cohort.
Table 2 Clinicopathological characteristics of the 131 patients in the validation cohort.

Univariate and multivariate analyses in the training cohort

The univariate analysis showed that the histologic grade (P = 0.010), number of positive SLNs (P < 0.001), number of negative SLNs (P < 0.001), number of SLNs dissected (P < 0.001), SLN metastasis ratio (P < 0.001), LVI status (P < 0.001), ER status (P = 0.011), HER2 status (P = 0.005), molecular subtype (P = 0.001), and RS (P < 0.001) were associated with NSLN involvement in breast cancer patients with 1–2 positive SLNs (Table 1). In further logistic regression multivariate analysis, the histologic grade (P = 0.026), LVI status (P = 0.005), number of positive SLNs (P = 0.001), number of negative SLNs (P = 0.005), SLN metastasis ratio (P = 0.005), and molecular subtype (P = 0.007) were identified as independent predictive factors for NSLN metastasis (Table 3). Compared with the triple negative breast cancer patients, patients with the HER2 overexpression subtype were more likely to have positive NSLNs, whereas the patients with the luminal A and luminal B subtypes did not significantly differ in NSLN metastasis.

Table 3 Multivariate analysis of variables associated with NSLN metastasis.

Development of a mathematical prediction model based on LASSO regression

In our study, LASSO regression was used to develop a mathematical prediction model. Finally, the LASSO regression analysis identified the following 13 most powerful factors: the age group, clinical tumour stage, histologic type, number of positive SLNs, number of negative SLNs, number of SLNs dissected, SLN metastasis ratio, ER status, PR status, HER2 status, Ki67 staining percentage, molecular subtype and P53 status (Fig. 1). Among these factors, the SLN metastasis ratio was the most influential factor in the RS, with the maximum absolute value of the coefficient. The regression coefficients of each factor are shown in Table 4. The RS of each patient in the study was calculated using the following model equation:

$$ {\text{RS}} =\upbeta _{1} {\text{X}}_{1} +\upbeta _{2} {\text{X}}_{2} + \cdots +\upbeta _{{\text{n}}} {\text{X}}_{{\text{n}}} \left( {{\text{X}}:{\text{a\;factor}};\upbeta :{\text{the\;regression\;coefficient\;of\;that\;factor}}} \right). $$
Figure 1
figure 1

Identification of the influencing factors by LASSO regression. LASSO regression identified the following 13 most powerful predictors: (A) age group, clinical tumour stage, histologic type, number of positive SLNs, number of negative SLNs, number of SLNs dissected, SLN metastasis ratio, ER status, PR status, HER2 status, Ki67 staining percentage, molecular subtype and P53 status. Plot (B) shows the coefficients of each predictor when the 13 predictors were included in the LASSO regression model.

Table 4 Regression coefficients of the 13 most powerful factors identified by the LASSO regression analysis.

Performance of the prediction model

Subsequently, we generated the ROC curve of the prediction model. The AUC was 0.764 (95% CI 0.729–0.798), highlighting the excellent diagnostic performance of this model (Fig. 2). The RS value of 1.87239924305 was identified as the cut-off value with the highest Youden index. The sensitivity, specificity and overall accuracy of the model were 74.1%, 69.6% and 71.3%, respectively. More than 30% of the patients in the study would avoid unnecessary ALND with the prediction model. In the validation cohort of 131 patients, the AUC was 0.777 (95% CI 0.692–0.862) (Fig. 3), showing a satisfying predictive value.

Figure 2
figure 2

ROC curve in the training cohort. The AUC was 0.764 (95% CI 0.729–0.798).

Figure 3
figure 3

ROC curve in the validation cohort. The AUC was 0.777 (95% CI 0.692–0.862).

Discussion

In recent years, the results of the ACOSOG Z0011 and IBCSG 23-01 trials showed that neither the DFS nor the OS significantly differed between the SLNB-only and ALND groups among breast cancer patients with limited SLN involvement. Based on the results of these trials, the latest NCCN guidelines also recommended not performing ALND in patients with 1–2 involved SLNs who are planning to undergo breast-conserving surgery and subsequent radiotherapy16. However, in most developing countries, such as China, the BCR is only approximately 20%, while that in Western countries is 50–80%6,17,18. Even in some leading centres in China, the BCR is only 30%19. Because of the low BCR in developing countries and the absence of evidence of ALND omission in Eastern populations, most clinicians in developing countries such as China still hold a conservative view and recommend ALND for patients with positive SLNs17. In addition, the ALN status remains one of the most important prognostic factors. In our present study, only 266 (37.3%) of the patients with 1–2 SLN metastases in the training cohort were demonstrated to have NSLN metastasis after ALND, which is consistent with the results of previous studies8,20. Thus, more than 60% of the patients received unnecessary ALND. Therefore, it is of great importance to accurately predict NSLN metastasis either intraoperatively or preoperatively. Our study retrospectively analysed the clinicopathological data of the 714 breast cancer patients with 1–2 positive SLNs in the training cohort to determine the factors associated with axillary involvement. We further developed a new mathematical prediction model based on this Chinese population to evaluate the risk of NSLN metastasis.

Previous studies have shown that LVI is a feature related to a poor prognosis and that promotes local recurrence and distant metastasis of tumours21. Several recent studies have recognized LVI as an independent predictor of NSLN metastasis in patients with 1–2 positive SLNs22,23. We drew the same conclusion. In our study, 76.0% and 35.8% of the patients with LVI and without LVI, respectively, were found to have NSLN involvement, and this difference was significant. Whether histologic grade is associated with NSLN metastasis remains controversial, and the conclusions reported by Maimaitiaili A and Wang XY are inconsistent24,25. Our univariate analysis showed that the patients with higher histologic grades were more likely to have at least one positive ALN, and the histologic grade remained an independent predictor of NSLN metastasis in the subsequent multivariate analysis (OR 1.630; 95% CI 1.061–2.505; P = 0.026).

Moreover, we divided all patients in the training cohort into the luminal A, luminal B, HER2 overexpression and triple negative subtypes according to the St Gallen International Expert Consensus (2013 edition)15. In these respective molecular subtype groups, 32.0%, 35.1%, 56.5% and 37.8% of the patients exhibited NSLN involvement. Compared to the triple negative type, the HER2 overexpression type, but not the luminal A and luminal B subtypes, was associated with a statistically higher risk of positive NSLNs. Whether NSLN metastasis is associated with the molecular subtype remains controversial. The results of a recent single-centre study involving 291 patients demonstrated that patients with luminal B and HER2 overexpression breast cancer had a significantly higher possibility of having at least one positive NSLN than patients with luminal A breast cancer26. However, in another retrospective study, investigators failed to identify the molecular subtype as an independent predictor of NSLN metastasis. The patients with positive SLNs had the same risk of axillary involvement regardless of their molecular subtypes22.

The number of positive SLNs, number of negative SLNs, number of SLNs dissected and SLN metastasis ratio were important predictors of NSLN metastasis in breast cancer patients with 1–2 positive SLNs. These factors heavily rely on assessments of intraoperative frozen sections. Thus, these values are unclear prior to surgery. Two publications considered the numbers of positive and negative SLNs independent risk factors and included these factors in their prediction models27,28. The SLN metastasis ratio was incorporated into the model predictions in another clinical study29. However, the value of the number of positive SLNs, number of negative SLNs, number of SLNs dissected and SLN metastasis ratio in predicting NSLN metastasis has not been fully clarified because of the collinearity among these factors. In our study, a LASSO regression was used to construct the mathematical model, thereby effectively solving the problem of collinearity among these factors.

The MSKCC nomogram, which was the first model to predict NSLN metastasis in patients with positive SLNs, performed well in the original population, with an AUC of 0.769. The results of a previous study showed that the AUC of the MSKCC nomogram was less than 0.7, proving that the performance of the MSKCC nomogram was inferior to that of other models in other populations30,31,32. In addition, the MSKCC nomogram and other previous models were developed based on Western populations in developed countries, and thus hardly apply to the Eastern population. In the present retrospective analysis, we developed a new, LASSO algorithm-based intraoperative mathematical prediction model based on the clinical data of 714 patients in China for evaluating the risk of NSLN metastasis in Chinese breast cancer patients with 1–2 positive SLNs. The LASSO algorithm forces the sum of the absolute value of the regression coefficients to be less than a fixed value by reducing certain coefficients to zero, which helps to effectively construct a simpler model that includes only the most meaningful predictive factors. Ultimately, the LASSO regression identified the following 13 most powerful predictors: age group, clinical tumour stage, histologic type, number of positive SLNs, number of negative SLNs, number of SLNs dissected, SLN metastasis ratio, ER status, PR status, HER2 status, Ki67 staining percentage, molecular subtype and P53 status. The coefficients of each predictor are shown in Table 4. The higher the absolute value of a regression coefficient, the greater is its influence on the model. In our prediction model, the most powerful predictor was the SLN metastasis ratio, with a coefficient of 0.7672377992. This factor was also included in the Cambridge model and another recent model33,34.

Previous evidence showed that the absolute agreement rates of histologic grade and LVI between specimens obtained by core needle biopsy (CNB) and those obtained by surgical excision were only 75% and 69%, respectively35,36. The small number of specimens from CNB and intratumoural heterogeneity are possible reasons for the low concordance rate of the histologic grade and LVI status between the preoperative CNB and postoperative pathology results. Considering that the aim of our study was to develop an intraoperative prediction model, the histologic grade and LVI status, which are not entirely available via preoperative or intraoperative evaluation, were excluded from our prediction model, although these factors were identified as independent risk factors in the retrospective multivariate analysis.

Furthermore, we calculated the RS of each patient in the training cohort according to the model equation. The ROC curve of the prediction model was then generated and shown to have an AUC of 0.764 (95% CI 0.729–0.798), which is comparable to that of the MSKCC nomogram9. Thus, the predictive power of the model is acceptable. Finally, the ROC curve analysis confirmed the cut-off value of the RS to be 1.87239924305. The sensitivity, specificity and total accuracy were 74.1%, 69.6% and 71.3%, respectively. More than 30% of the patients in the study would avoid unnecessary ALND with the prediction model. Therefore, we believe that ALND may be safely ignored when the RS of a patient, as calculated by the model equation, is less than the cut-off value of 1.87239924305. We divided the patients in the training cohort into a low-RS group and a high-RS group according to the cut-off value. Significantly more patients had positive NSLNs in the high-RS group than in the low-RS group, further confirming the predictive power of our model.

To evaluate the clinical applicability of the prediction model, a subsequent independent cohort of 131 patients was used for validation. The model still showed impressive performance, with an AUC of 0.777 (95% CI 0.692–0.862). Tus, the present prediction model can be considered an intraoperative clinical tool for clinicians to predict the risk of NSLN metastasis in Chinese breast cancer patients with 1–2 positive SLNs and make decisions regarding ALND.

To the best of our knowledge, this study is the first to use the LASSO algorithm to develop a prediction model based on an Eastern population. As more than 13 factors were included in our model, this model offers a more personalized assessment for breast cancer patients. However, there are a few limitations in our study. First, this was a retrospective study, and a prospective clinical trial is greatly needed. Second, this study was a single-centre study. Our prediction model should be validated with population data from other centres.

Conclusion

In conclusion, we developed a new intraoperative mathematical prediction model with 13 predictors based on the LASSO algorithm to evaluate the risk of NSLN metastasis in Chinese breast cancer patients with 1–2 positive SLNs. The model performed well in both the training cohort and validation cohort and has good clinical applicability.