In this study, the SNACOR score differentiated between low-, intermediate-, and high-risk patients, who showed a median OS of 31.5 months, 19.9 months, and 9.2 months, respectively. However, the original SNACOR publication reported median OS values of 49.8 months, 30.7 months, and 12.4 months for these groups. Hence, the discriminative ability of the SNACOR score between the three risk groups with respect to OS was inferior in our study compared with the original one. We observed considerable overlap in the survival time distributions of the risk groups. Accordingly, Harrell’s C-index was 0.59 and the IBS was 0.175. AUROCs for overall survival were 0.641 at 1 year, 0.633 at 3 years, and 0.609 at 6 years; in the original SNACOR study, the comparable AUROC values were 0.756, 0.754, and 0.742, respectively. In summary, SNACOR does not perform well enough to be used alone to make clear-cut clinical decisions.
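For orientation, Harrell’s C-index can be read as the proportion of comparable patient pairs in which the patient assigned the higher risk also has the shorter survival time; 0.5 corresponds to chance and 1.0 to perfect ranking. The following is a minimal sketch of that calculation, not the software used in this study. The function name, the toy data, and the simplified rule of skipping pairs with tied survival times are assumptions made for illustration only.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs ranked concordantly by the risk score.

    times       -- observed follow-up times (e.g. months)
    events      -- 1/True if death was observed, 0/False if censored
    risk_scores -- higher value = higher predicted risk (e.g. 0/1/2 for
                   low/intermediate/high risk group)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is only comparable if the shorter time ended in an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study cohort.
toy_times = [5, 10, 15, 20, 25, 30]
toy_events = [1, 1, 0, 1, 1, 0]
toy_risk = [2, 2, 1, 1, 0, 0]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

Because a three-level score produces many pairs with tied risk, each of which contributes only 0.5, a grouped score such as SNACOR is structurally limited in the C-index it can reach.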
In the multivariate analysis, and in contrast to the original SNACOR study, we were only able to confirm the predictive value of tumour size, baseline alpha-fetoprotein (AFP) level, and Child-Pugh class. The objective radiological response and tumour number at baseline failed to show a significant impact on survival. Thus, two of the five parameters used to calculate the SNACOR score were not predictive in our analysis, which may at least in part be due to the moderate sample size.
Notably, tumour size and tumour number both reflect a patient’s tumour burden, and tumour size correlates with a higher risk of vascular invasion and distant metastasis [24, 25]. As tumour size is a known independent risk factor for survival [26, 27], it is part of several risk prediction models that have been published in recent years. We confirmed that tumour size is an independent predictor of survival. However, as noted above, tumour number was not an additional independent predictor of survival in our analysis. Whether tumour number is a significant prognostic factor remains unclear in the literature; some series found it to have predictive value [27–30], while others did not [5, 26]. The fact that tumour number was not an independent predictor of survival in our study cohort might be attributable to the moderate size of the final patient group of 268 patients. However, this validation group was considerably larger than the validation cohort in the original SNACOR publication, which comprised 145 patients. Furthermore, the finding might be explained at least in part by collinearity; we observed a weak positive correlation between tumour size and tumour number (Spearman r = 0.165).
AFP level was an independent predictor of survival in our analysis, which is in accordance with the majority of publications [27–29, 31]. AFP may be a surrogate marker for tumour burden and tumour aggressiveness [32, 33] and is therefore part of several prediction scores [6, 26, 30]. The Child-Pugh score describes liver function and has shown significant prognostic value in several studies [28, 34–36].
Objective radiological response was not an additional independent predictor of survival in our analysis. Although it was not predictive in several other studies either [10, 37], most authors regard objective radiological response as an important predictor [5, 6, 31, 38]. The fact that objective radiological response was not an independent predictor in our study might also be attributable, at least in part, to the moderate sample size and to collinearity. We observed a weak negative correlation between tumour size and the objective radiological response (Spearman r = −0.172).
One important reason why the SNACOR score did not show the same predictive power in our study as in the original publication might be the so-called “overfitting” effect. This has been described as “a phenomenon occurring when a model maximizes its performance on some set of data but its predictive performance is not confirmed elsewhere due to random fluctuations of patients’ characteristics in different clinical and demographical backgrounds [8]”. Our patients differed significantly from the patients in the original SNACOR study in terms of tumour number, Child-Pugh class, and aetiology [7]. For example, alcoholic cirrhosis was the main underlying cause of hepatocellular carcinoma in our study, whereas in the study by Kim et al., 71.2% of patients had hepatitis-B-related hepatocellular carcinoma and 12.9% of patients had hepatitis-C-related hepatocellular carcinoma [7].
Our analysis has several limitations. The most important ones are that our validation was conducted retrospectively and that the final sample size (n = 268) was only moderate. Ideally, prospective validation would be performed with a sufficiently large patient cohort using a multicentre approach. The original SNACOR publication included only patients who underwent cTACE; as recommended by its authors, TACE in this study was performed as either cTACE or DEB-TACE. Differences in TACE techniques might influence the applicability of the SNACOR system. cTACE and DEB-TACE have been compared multiple times in the last decade, but these comparisons have never shown a significant influence on survival [18, 39, 40]. Indeed, we drew the same conclusion when we analysed our own data [41]. Patients who underwent liver transplantation or surgery after TACE were excluded from the present analysis in order to ensure comparability with the original SNACOR data. However, from a statistical point of view, such patients should not be excluded; rather, they should be censored at the time of treatment change in order to eliminate immortal time bias.
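The censoring approach described above can be illustrated with a short sketch. It assumes a hypothetical patient record containing the follow-up time, a death indicator, and, where applicable, the time to liver transplantation or surgery after TACE; the class and field names are invented for illustration and do not reflect the data structure used in this study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record layout, chosen for illustration only.
    followup_months: float  # first TACE to death or last contact
    died: bool              # True if death was observed
    months_to_treatment_change: Optional[float] = None  # transplantation or surgery


def censor_at_treatment_change(p: Patient) -> Tuple[float, bool]:
    """Return (time, event) for survival analysis.

    A patient who changes treatment before the end of follow-up stays in
    the cohort but contributes only the time up to the change, and is
    treated as censored (no event) at that point.
    """
    t = p.months_to_treatment_change
    if t is not None and t < p.followup_months:
        return (t, False)
    return (p.followup_months, p.died)


# Hypothetical examples.
transplanted = censor_at_treatment_change(Patient(40.0, True, 12.0))
tace_only_death = censor_at_treatment_change(Patient(9.2, True))
tace_only_alive = censor_at_treatment_change(Patient(30.0, False))
```

In this sketch, the transplanted patient contributes 12 months of censored follow-up rather than being removed from the cohort, so the time at risk before the treatment change is retained.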