Sentinel lymph node status is a strong prognostic indicator for patients with melanoma and is regularly used to select patients for adjuvant therapy.1 However, a negative sentinel lymph node biopsy (SLNB) does not lead to intensification of management for melanoma. A systematic review of 21 studies evaluating SLNB complications found that 11.3% (95% confidence interval [CI]: 8.1–15.0) of patients undergoing SLNB for melanoma had a complication, including infection (2.9%), hematoma (0.5%), seroma (5.1%), lymphedema (1.3%), and nerve injury (0.3%).2 Current National Comprehensive Cancer Network (NCCN) guidelines recommend against SLNB for patients with pretest probability <5%, and SLNB is best utilized with a >5% pretest probability of identifying regional metastasis.3 American Joint Committee on Cancer (AJCC) staging is currently used to assess the pretest probability of positive SLNB through depth and ulceration status; per American Society of Clinical Oncology and Society of Surgical Oncology guidelines, routine SLNBs are not recommended for patients with T1a melanoma.4 SLNB can be considered for patients with T1b melanoma after discussion and is recommended for patients with T2, T3, or T4 melanoma.1 However, binning patients into these discrete groups may not accurately reflect individual risk and discourages shared decision making based on individual patient preferences.

Previous attempts to predict SLNB outcomes by using clinical factors have relied primarily on single-institution data sets, including the Memorial Sloan-Kettering Cancer Center (MSKCC) model of 979 patients, the Melanoma Institute Australia (MIA) model of 3477 patients, and for T1 melanomas, the Istituto Nazionale Tumori model of 3666 patients.5,6,7 A nomogram to predict SLNB positivity for patients with thin (0.5–1.0 mm) melanoma was created by using the National Cancer Database (NCDB) from 2012 to 2015 (21,971 patients). This nomogram was suggested to decrease the indication for SLNB by 81.3% in patients with thin melanomas.8 Gene expression tests also have been developed to improve the prediction rate of SLNB positivity; however, these tests are expensive, and the appropriate use of these tests is still being studied.9,10,11,12,13,14,15

In this study, we built a personalized, clinical decision tool to calculate the pretest probability of sentinel lymph node metastasis in primary cutaneous melanoma using a large, nationally representative database. We present Expected Lymphatic Metastasis Outcome (ELMO; https://melanoma-sentinel.herokuapp.com/), a predictive model developed by using nationally representative data with clinically actionable probability thresholds based purely on clinical factors without any additional testing.16 ELMO is a free resource.

Methods

Patient Population

Data from the Surveillance, Epidemiology, and End Results (SEER) Registry from 2000 to 2017 and the National Cancer Database (NCDB) from 2004 to 2015 were used to develop a predictive model (for each database, these were the widest timeframes for which all variables were available).17,18 SEER covers more than 28% of the U.S. population and is comparable in representation to the general population.19 NCDB collects hospital registry data from more than 1,500 treatment facilities in the United States and is the largest clinical cancer registry in the world, capturing >70% of newly diagnosed cancers in the United States.

Patients with melanoma who underwent SLNB were included. Patients were excluded if critical data were missing, including melanoma thickness, ulceration status, anatomic site, Clark’s level, or SLNB status. Patients with melanoma in situ also were excluded. SEER and NCDB data were combined and split into development (70%), test (10%; used for tuning), and internal validation (20%) cohorts by random assignment.
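The random 70%/10%/20% assignment described above can be sketched as follows. This is an illustration only: the function name, the seed, and the use of Python's standard library (rather than the study's own code) are assumptions.

```python
import random

def split_cohorts(patient_ids, seed=0, fractions=(0.70, 0.10, 0.20)):
    """Randomly assign patients to development, test (tuning), and validation cohorts."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle so the split is reproducible
    n_dev = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    development = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    validation = ids[n_dev + n_test:]  # the remainder, about 20%
    return development, test, validation

# Hypothetical patient identifiers 0..999
development, test, validation = split_cohorts(range(1000))
print(len(development), len(test), len(validation))
```

Each patient lands in exactly one cohort, so the validation cohort stays untouched during training and tuning.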

Algorithm Development

Two models were constructed and compared to predict sentinel lymph node status: one using logistic ridge regression and another using random forest classification. Model building and analysis were completed in Python using scikit-learn. Logistic regression was chosen as the final model for ELMO as it results in a highly interpretable model; ridge regression controls for variance in the dataset.20 Logistic ridge regression calculates the probability that a patient has SLN metastasis and then uses a probability threshold to assign a positive or negative prediction.
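To make the two steps concrete (estimate a probability, then apply a threshold), the sketch below fits a one-feature logistic regression with an L2 (ridge) penalty by plain gradient descent. It is not the study's implementation, which used scikit-learn. The toy data, learning rate, penalty strength, and the choice to treat a probability equal to the threshold as positive are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_ridge(xs, ys, l2=0.1, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(w*x + b) by gradient descent with an L2 (ridge) penalty on w."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + l2 * w
        grad_b = sum(errors) / n  # the intercept is left unpenalized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def recommend_slnb(probability, threshold=0.05):
    """Positive prediction when the predicted probability reaches the threshold."""
    return probability >= threshold

# Hypothetical toy data: Breslow depth in mm and SLN status (1 = positive)
depth_mm = [0.2, 0.5, 1.0, 2.0, 3.0, 4.0]
sln_positive = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic_ridge(depth_mm, sln_positive, l2=0.1)
for depth in (0.3, 1.5, 3.5):
    p = sigmoid(w * depth + b)
    print(f"depth {depth} mm: p = {p:.3f}, SLNB recommended: {recommend_slnb(p)}")
```

A larger `l2` shrinks the coefficient toward zero, which is how the penalty limits variance. In scikit-learn, the analogous control on `LogisticRegression` is `C`, the inverse of the regularization strength.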

We also analyzed a random forest classification model. These models often have more predictive power than logistic regression models, because they do not assume data linearity.21 Random forests are an ensemble of decision trees that assign a probability to an outcome based on the number of decision trees that predict that outcome. We chose a probability threshold of 5% for both models, which matches the current standard of care.7,10,22
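The vote-counting idea described above can be sketched with a handful of hand-written decision stumps. The stumps, feature names, and patient values are hypothetical; a real forest learns its trees from bootstrap samples of the training data.

```python
def forest_probability(trees, patient):
    """Fraction of trees that predict a positive sentinel node for this patient."""
    votes = [tree(patient) for tree in trees]
    return sum(votes) / len(votes)

# Hypothetical decision stumps, each returning 1 (positive) or 0 (negative)
trees = [
    lambda p: int(p["depth_mm"] > 1.0),
    lambda p: int(p["ulceration"]),
    lambda p: int(p["mitotic_rate"] >= 2),
    lambda p: int(p["age"] < 40 and p["depth_mm"] > 0.8),
]

patient = {"depth_mm": 1.4, "ulceration": False, "mitotic_rate": 1, "age": 62}
probability = forest_probability(trees, patient)
print(probability, probability >= 0.05)  # one of four trees votes positive
```

Note that scikit-learn's `RandomForestClassifier.predict_proba` averages the class probabilities predicted by each tree rather than counting hard votes, so the fraction above is a simplification of what that library computes.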

Clinical attributes that have been shown in previous work to have independent, biological relevance with regard to patient outcomes were included in our model, including Breslow depth, stage, ulceration, mitotic rate, age, sex, and primary site.

The development set was used to create and train the models. The tuning set was used to select the appropriate hyperparameters (such as the amount of regularization, a parameter that assists in tuning the model to avoid overfitting) and to select the best performing model.23 The validation cohort was kept separate and was only used to test the final model. Final model parameters were compared to the MSKCC and clinicopathological and gene expression profile (CP-GEP model) parameters through direct comparison to their published validation statistics.5,7,9 Sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and AUC were calculated for the logistic regression and random forest models and compared to previously published model parameters. SLNB reduction rate was calculated by using ELMO to evaluate the percent of patients with ≥T1b melanoma who would not be recommended for SLNB, as previously described ([TN+FN]/[TP+TN+FN+FP]).10 All code necessary to reproduce this model is available at the following link: https://github.com/karenlarson/MelanomaSentinel.
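The performance statistics named above, including the SLNB reduction rate ([TN+FN]/[TP+TN+FN+FP]), follow directly from the confusion matrix. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # share of patients with a negative prediction, who would not be biopsied
        "slnb_reduction_rate": (tn + fn) / total,
    }

# Hypothetical counts, not taken from the study
metrics = diagnostic_metrics(tp=8, fp=30, tn=60, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The reduction rate counts every negative prediction, true or false, so it should always be read alongside the NPV.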

External Validation

To evaluate real-world performance, ELMO was externally validated using 1568 patients with cutaneous melanoma and known SLNB status collected retrospectively from 2005 to 2021 at Oregon Health & Science University (OHSU). OHSU is a large tertiary referral center in Portland, Oregon, serving as Oregon’s only academic health center (city population 645,291, metropolitan area 2.4 million). Missing Clark’s Levels were assigned using available data for Breslow depth and stage, as described in the SEER Staging Procedures for Clark’s Level assignment.24

Results

The development cohort included 134,809 patients, and the internal validation cohort included 38,518 patients with cutaneous melanoma, both of which were 65% NCDB and 35% SEER (Figure S1). Sociodemographic and tumor characteristics did not differ significantly between the development/internal validation cohorts, and SLNB positivity was 6.52% in both cohorts (14.73% in external validation; Table 1).

Table 1 Clinicopathologic features of patients in development and validation cohorts

The logistic regression and random forest models were compared to assess the likelihood of sentinel lymph node metastasis (Table 2; Table S1). Both models demonstrated strong ability to predict SLNB results (AUC 0.85 for logistic regression and 0.86 for random forest; Fig. 1). We chose the logistic regression model as the basis for ELMO due to improved interpretability and generalizability. The logistic regression model determined that depth, mitotic rate, and Clark’s level are three of the most significant features in determining whether a patient had a positive SLNB, with larger values indicating a higher likelihood of a positive test. Model coefficients are provided in Table S2. Polynomial features were included to allow for combined terms (Table S2). Relative feature importance analysis was used to assess which values should be included in the final model (Fig. S2). Negative importance values (e.g., “[mitotic rate]²” and “[depth]²”) indicate that including the feature in the model worsens model performance.25 For ≥T1b melanoma, ELMO performed with sensitivity of 0.9204, specificity of 0.3258, NPV of 0.967, and PPV of 0.162 (Table 2).
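As an illustration of what polynomial features add, the sketch below expands a set of base features into squared and pairwise product terms. The degree of two is an assumption inferred from the squared terms named above, and the feature names and values are hypothetical.

```python
from itertools import combinations_with_replacement

def degree_two_features(features):
    """Add squared and pairwise product terms to a dict of base features."""
    expanded = dict(features)
    for a, b in combinations_with_replacement(sorted(features), 2):
        name = f"[{a}]^2" if a == b else f"[{a}] x [{b}]"
        expanded[name] = features[a] * features[b]
    return expanded

# Hypothetical patient: Breslow depth 2.0 mm, mitotic rate 3 per mm^2
for name, value in degree_two_features({"depth": 2.0, "mitotic rate": 3.0}).items():
    print(name, value)
```

The product term lets a linear model represent an interaction (for example, a deep lesion that is also highly mitotic), while each added term can then be kept or dropped according to its measured importance.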

Table 2 Performance of the logistic regression model

Fig. 1 Plot of ROC for our model, ELMO (Expected Lymphatic Metastasis Outcome)

Our model resulted in a 29.54% SLNB reduction rate for ≥T1b tumors compared with the current recommendations of performing SLNB for ≥T1b tumors (Fig. 2). ELMO had relatively high sensitivity in predicting SLNB status for T1b (0.69), T2a (0.82), and T2b tumors (0.97; Table 3).9 The sensitivity and specificity of ELMO in predicting SLNB status for T1a tumors were 0.33 and 0.96, respectively. The overall AUC of our model was 0.85, indicating a high discriminatory ability.
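The reported reduction rate can be cross-checked against the ≥T1b sensitivity (0.9204), specificity (0.3258), and PPV (0.162) from Table 2. The sketch below back-solves the prevalence implied by those three values using Bayes' rule, then derives the NPV and the reduction rate. The implied prevalence is a derived quantity, not a figure reported in the study, and the function names are hypothetical.

```python
def implied_prevalence(sensitivity, specificity, ppv):
    """Prevalence consistent with a given sensitivity, specificity, and PPV."""
    false_positive_rate = 1.0 - specificity
    return (ppv * false_positive_rate) / (
        sensitivity * (1.0 - ppv) + ppv * false_positive_rate
    )

def npv_and_reduction_rate(sensitivity, specificity, prevalence):
    """NPV and share of negative predictions, as fractions of all patients."""
    tn = specificity * (1.0 - prevalence)   # true-negative share
    fn = (1.0 - sensitivity) * prevalence   # false-negative share
    reduction_rate = tn + fn                # (TN + FN) / all patients
    return tn / reduction_rate, reduction_rate

prevalence = implied_prevalence(0.9204, 0.3258, 0.162)
npv, reduction_rate = npv_and_reduction_rate(0.9204, 0.3258, prevalence)
print(f"implied prevalence: {prevalence:.3f}")
print(f"NPV: {npv:.3f}, SLNB reduction rate: {reduction_rate:.3f}")
```

Under these inputs the implied SLNB positivity among ≥T1b patients is about 12.4%, the NPV is about 0.967, and the reduction rate is about 29.5%, in line with the reported values.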

Fig. 2 Plot of the sentinel lymph node biopsy reduction rates (SLNB RRs) for T1b and higher

Table 3 Comparison of ELMO and CP-GEP model in predicting sentinel lymph node status

When externally validated with cutaneous melanoma patients from OHSU using the logistic ridge regression model, ELMO performed with an accuracy of 0.7586, positive predictive value (PPV) of 0.31, sensitivity of 0.46, specificity of 0.8108, and AUC of 0.7218.

Calibration plots were produced comparing the expected and observed probabilities of positive SLNB using the development/training, internal validation, and external validation datasets to determine whether the predicted probabilities were systematically skewed (Fig. S3) and to provide further validation of this clinical prediction model as previously described.26 Using percentiles and 20 bins (yielding 5% of the data in each bin), the betas (slopes) for the training, internal validation, and external validation datasets were 1.004, 1.053, and 0.38, respectively. The alphas (intercepts) were −0.00027, −0.0029, and −0.22, respectively.
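A binned calibration slope and intercept of this kind can be computed as sketched below: patients are sorted by predicted probability, cut into equal-size bins, and a least-squares line is fitted to the observed positivity rate against the mean prediction in each bin. Regressing observed on predicted (rather than the reverse), the function name, and the toy data are assumptions; the study's exact procedure may differ.

```python
def calibration_line(predicted, observed, n_bins=20):
    """Slope (beta) and intercept (alpha) of observed rate vs. mean prediction per bin."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    xs, ys = [], []
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are fewer patients than bins
        xs.append(sum(p for p, _ in chunk) / len(chunk))  # mean predicted probability
        ys.append(sum(o for _, o in chunk) / len(chunk))  # observed positivity rate
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return beta, alpha

# Hypothetical toy data: predictions of 0.2 and 0.8 with observed rates of 0.4 and 0.6
predicted = [0.2] * 10 + [0.8] * 10
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
beta, alpha = calibration_line(predicted, observed, n_bins=2)
print(f"beta = {beta:.3f}, alpha = {alpha:.3f}")
```

A slope near 1 with an intercept near 0 indicates that predicted and observed risks agree. A slope well below 1, as in the toy data, means the observed rates vary less across bins than the predictions do.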

Discussion

In this study, we present the Expected Lymphatic Metastasis Outcome (ELMO), a free-to-use predictive model with clinically actionable probability thresholds based purely on clinical factors without any additional testing (https://melanoma-sentinel.herokuapp.com/).16 Using both the SEER and NCDB databases, this is the largest predictive model available for evaluation of SLNB efficacy in patients with melanoma.

ELMO builds on the success of previous predictive models. The MSKCC model was developed by using a cohort of 979 patients at a single institution from 1991 to 2003; this model was found to have a predictive value (AUC) of 0.694 when evaluated with the validation dataset.7 CP-GEP uses a gene expression profile test to identify melanoma patients who are good candidates for SLNB.9 The ELMO model has high discriminatory ability with an AUC of 0.85 on internal validation and 0.72 on external validation; the internal validation metrics of ELMO are greater than those reported for both the CP-GEP model (0.82) and the MSKCC (0.77).7 Similarly, the ELMO model had a comparable predictive accuracy compared with the recently published MIA model of 3477 melanoma patients (0.74, a 9.2% increase in accuracy over the MSKCC model).6 A model of patients with T1 melanoma containing 3666 patients in Italy in the development cohort utilized a random forest procedure to determine that age, growth phase, Breslow depth, ulceration, mitotic rate, regression, and lymphovascular invasion were associated with sentinel node status, and had good discriminatory ability (C index of 95.8%).5 However, collection of data from a single institution in Europe, inclusion of only T1 melanoma patients, and the range of requisite data for the nomogram limit its applicability to the general American population. ELMO also was more predictive than a prior nomogram using NCDB from 2012 to 2015 in thin melanomas (AUC 0.67).8 This is likely due to ELMO’s inclusion of a significantly larger population (integrating more years of NCDB, as well as SEER), analysis of all melanoma patients (as opposed to only those with thin melanomas), as well as due to the iterative process of internal validation used to create ELMO. 

Ultimately, to directly compare the performance of ELMO to the MIA and MSKCC models, the next step is to assess the head-to-head performance of the MIA, MSKCC, and ELMO nomograms on the same database, as well as to further validate ELMO on large external databases (as has been done in previous studies for the MSKCC model).7,27,28,29 Importantly, the AUCs derived from development models are not definitive in determining which algorithm is more accurate due to potential overfitting issues; future research should compare ELMO, MIA, and MSKCC on multiple independent samples to evaluate comparative accuracy in different populations. Although mitotic rate is no longer part of AJCC staging, it was found to be a significant predictor of SLNB positivity using both this model and ELMO; this finding is important in refining development of future guidelines for SLNB.

The data utilized to create ELMO are nationally representative and contain the largest number of melanoma cases available in the United States. When externally validated using retrospective data from a large tertiary referral center, ELMO (external AUC 0.72) performed better than the MSKCC model and was comparable to the MIA model (external AUC of 0.69–0.75).6,30 In addition, ELMO is able to achieve this AUC with fewer predictive factors and does not require melanoma subtype or lymphovascular invasion, which can be subjective and are not always reported.

The external validation cohort in this study comprises more advanced cases than the development and internal validation cohorts, with significantly higher SLNB positivity (14.7% in the external validation dataset vs. 6.5% in the development dataset), median Breslow depth, and proportion of >T1a melanoma (Table 1). This is due to the use of external validation data from a large tertiary referral center at the only academic health center in the state, which disproportionately includes more advanced melanomas than the nationally representative data used for model design and optimization. Although external validation is beneficial in contextualizing the results, this ultimately led to a skew in the prediction characteristics of the external validation curve (Supplemental Fig. 3C), as well as the lower AUC seen on external validation. This suggests that this model may be less sensitive and more specific in populations with higher pretest probability of positive SLNB (e.g., large referral centers). Although the nationally representative nature of the development/internal validation cohort suggests broad applicability, future research is necessary to externally validate this model in a variety of settings.

The creation of ELMO has substantial implications for both the prevention of unnecessary SLNB in patients with T1b or higher melanoma and early detection of metastatic melanoma. ELMO is able to discriminate patients who require SLNB with high accuracy. Patients with ≥T1b melanoma who have a negative result using our test may be able to forgo an SLNB as they have a <5% risk for a positive biopsy (NPV 96.7% for ≥T1b melanoma; Table 2). For example, using current staging and NCCN guidelines for a group of 1,000 melanoma patients, approximately 500 patients would have T1a disease and would not receive SLNB while the other 500 would have ≥T1b disease and would undergo SLNB. This would result in 438 negative (and potentially unnecessary) SLNBs in the ≥T1b group and four patients with undetected sentinel node metastasis in the T1a group. If ELMO were used on these patients, negative (and potentially unnecessary) biopsies would be reduced by 144 in the ≥T1b group.

A meta-analysis of three studies evaluating the predictive value of 31-gene expression profile (GEP) showed the sensitivity and specificity of the test for both recurrence and distant metastasis to be 76%.12 Similarly, a meta-analysis of six articles on a 31-gene signature found that high-risk GEP status is associated with increased odds of SLNB positivity (odds ratio [OR] 2.99).11 Although the CP-GEP test is better able to reduce the number of negative tests (192 vs. ELMO’s 236 of every 1000 melanoma patients), CP-GEP also misses more positive cases (6.5 vs. ELMO’s 4.7 of every 1000 patients) and requires genomic testing, which is not readily available for widespread use.9

ELMO is particularly valuable given its integration of readily available prognostic features into a predictive model. The open access model is available to anyone worldwide with internet access and can be used by patients, dermatologists, and other healthcare providers for informed decision-making regarding the use of SLNB. In addition, the model provides a personalized probability score, allowing for nuanced and individualized discussions, consistent with recent consensus guidelines from the Melanoma Prevention Working Group (MPWG).31 This personalized probability is not available from any current commercial tests.

Although SLNB status is used to determine eligibility for adjuvant therapies, such as immunotherapy in patients with cutaneous melanoma, SLNB imposes a significant cost burden on the patient/healthcare system and has risk for complications that do not differ by anatomic site, including infection, seroma, lymphedema, and nerve injury.2 Further research is important in evaluating the PPV of ELMO and other clinical decision-making tools to investigate their efficacy as potential stand-alone tests to select patients for adjuvant therapy.

The open access and free-to-use model of ELMO is critical in light of the substantial cost of genomic testing (e.g., $5800 for FoundationOne testing, $6500 for Caris Molecular Intelligence, etc.).32 The cost of SLNB is more than $11,000 on average.33 The COVID-19 pandemic has highlighted the importance of widespread access to teledermatology and an additional risk posed by SLNB with possible subsequent hospitalization.34 The broad accessibility of ELMO to all patients and providers with internet access provides a centralized resource, which can be used in diverse hospital-based settings across the nation, given its internal validation using the two largest cancer databases and external validation using data from a large tertiary referral center. Importantly, the clinical detection of adenopathy is vital to the management of patients with melanoma. This model should be used to supplement (not to replace) a thorough physical examination, current NCCN guidelines, and shared, individualized decision-making between the patient and physician.

In total, 5525 (0.64%) patients from the initial combined database of 860,205 patients had missing data, and 345,182 (40.13%) had “missing” encoded into a variable (not including patients for whom lymph node positivity was not available), causing them to be excluded from our analysis (Supplemental Fig. 1). The relatively large proportion of missing data has been documented extensively in the past for a variety of cancers, primarily with regard to NCDB.35,36,37,38,39 Although the total population size could have been increased through imputation, we chose not to impute data in order to ensure model training on real-world data and preserve the accuracy and precision of this model.36,39 Despite the missing data, ELMO was still trained on the largest dataset of melanoma patients. Additionally, there was no indication in preliminary analyses that data were systematically missing, and imputation would likely not have improved model performance. The inclusion of a large range of years of data collection minimized the chance of systematically missing data.37,38,39 Finally, the iterative process of internal validation and analysis using an external database mitigated the cost of missing data.

There are several limitations to this study inherent to the use of observational retrospective data, including potential miscoding, unaccounted confounders, effect modification, and potentially double-counted patients. SEER is subject to variations in data reporting, migration of patients through SEER registry regions, and selection bias. In total, 48.5% of all melanomas in the United States in 2005, and 52% of melanomas from 2012 to 2014 were included in NCDB; however, it may not be nationally generalizable given that it is a hospital-based registry.40,41 Another limitation is the relatively low SLNB positivity rate of 6.5% on internal validation (external validation: 14.7%); the lack of SLNB pathology status data stratified by all covariates limits the interpretability of the present data and should be explored further. Furthermore, it is known that significant miscoding of thin (particularly ultrathin) melanomas exists in SEER; some SEER patients with positive SLNB and thicker melanomas were likely erroneously included in our patient population as ultrathin melanomas, which may have artificially increased the rate of SLNB positivity in patients with lower Breslow depth.42 The relatively high specificity of ELMO (compared with the PPV) is due to the proportionally higher number of true negatives compared with true positives. The next step is to validate utilization of the model in clinical practice and other institutions prospectively to evaluate reduction in the number of unnecessary biopsies in patients with ≥T1b melanoma.

Future research should specifically evaluate the reliability, accuracy, and SLNB RR of ELMO in thin and nonulcerated melanoma; these are patients for whom SLNB often is discussed, but remains negative in the large majority.1,43 Ultimately, more prospective data are needed to evaluate the utility of SLNB in patients with T1a melanoma and determine which patients with ≥T1b melanoma do not need SLNB; in an era of expanding adjuvant therapies for patients with Stage IIC and Stage III/IV melanoma, this prediction tool should undergo further evaluation before being used as a substitute for current NCCN guidelines.

Conclusions

We developed a free-to-use predictive model for SLNB positivity for patients with melanoma. This model has been successfully internally validated using the largest publicly available dataset of melanoma patients and was found to be more accurate and discriminating than other published models. Given the cost and morbidity burden posed by SLNBs, individualized risk estimates for SLNB positivity are critical in facilitating thorough decision making for healthcare providers and patients with melanoma.