Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of aggressive non-Hodgkin lymphoma (NHL) in adults. Up to one third of these patients fail to achieve complete remission during first-line treatment or experience relapse, and salvage treatment regimens lead to modest cure rates [1, 2]. Identification of high-risk patients with the current prognostic scoring systems, such as the international prognostic index (IPI), is limited [3, 4]. Therefore, more accurate prognostic markers are essential to identify patients at high risk for progression or relapse. These poor responders might benefit from an early switch to novel therapies aiming to improve outcome.
Quantitative 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) parameters, especially baseline metabolic tumor volume (MTV), have been shown to be predictive of outcome in DLBCL [5–9]. MTV reflects the 18F-FDG-avid tumor burden, but it does not capture phenotypical aspects such as spatial distribution, heterogeneity, and shape of lesions. Recently developed quantitative 18F-FDG PET image features, also referred to as radiomics, may reveal biological characteristics of disease and could help to improve outcome prediction in DLBCL at baseline. Radiomics features capture detailed and quantitative information on, e.g., texture and shape of lesions. In several solid tumors, radiomics features provide prognostically relevant information [10–13]. Evidence is emerging to suggest that such parameters may also have predictive value in DLBCL [14, 15]. However, these parameters have not yet been successfully integrated with IPI components. The objective of this study was to assess the added value of baseline quantitative radiomics features in DLBCL patients compared with the currently used IPI score. Secondary objectives were to assess the added value of radiomics to other clinical characteristics and MTV.
Discussion
Results from this study indicate that baseline radiomics features are predictive of outcome and have added value compared to currently used clinical parameters. Adding radiomics features can significantly increase the efficiency of clinical trials.
Currently used clinical scoring systems, such as the IPI, fail to identify a high-risk group for which novel treatment approaches are most needed [3, 4]. Combining clinical predictors and radiomics features improved model performance significantly, from an AUC of 0.68 to an AUC of 0.79. Age and WHO performance status were the only clinical predictors that remained significant. The risk of relapse was predicted most accurately by this model, in which disease burden, expressed as MTV, dissemination, and intensity, is combined with the physical capacity to tolerate therapy, expressed as age and WHO performance status. Radiomics features had a higher relative effect on the prediction of relapse than the clinical parameters (Supplemental data). In contrast to our results, a recent study showed that in a multivariate analysis with age-adjusted IPI (aaIPI) and a radiomics feature, aaIPI was no longer a significant predictor of outcome [14], which could be caused by the smaller sample size or by their choice to add aaIPI instead of individual predictors.
The PPV increased by 15% when adding radiomics features compared with the IPI model but still remained under 50%. Because of effective treatment regimens, event rates in DLBCL are low. In our database, the prior probability (i.e., the prevalence) of an event was 16%. By selecting high-risk patients with our combined prediction model, the posterior probability (i.e., PPV) of an event in this group increased to 44%. The high-risk group identified using radiomics features combined with clinical parameters contains more truly high-risk patients than the group identified by the IPI model, as shown by the higher progression rate at 2 years (44% vs 28%, respectively). These survival rates are still rather high, meaning that even the best model poorly identifies real high-risk patients; this may be partly caused by our choice of outcome parameter. We chose TTP instead of the more commonly used PFS and overall survival (OS), because unlike TTP, both PFS and OS are affected by age [5]. Patients with DLBCL are generally older, and outcome of these elderly patients is not only determined by lymphoma but also by age-related comorbidities, adverse treatment effects, and limited life expectancy in general. In our dataset, 14 patients died within 2 years without signs of progression (i.e., 21.2% of PFS events). Death is a competing risk for progression. In our sensitivity analysis, all models had lower predictive performance with 2-year PFS as outcome parameter than with 2-year TTP, which could indicate that the outcome of these 14 patients is indeed unrelated to lymphoma.
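The relation between the prior probability (prevalence) and the posterior probability (PPV) discussed above follows from Bayes' rule. The sketch below is only an illustration of that relation: the prevalence of 16% is taken from the text, whereas the sensitivity and specificity are hypothetical placeholder values and are not results of this study. It shows why a PPV can remain below 50% when the event rate is low, even for a reasonably specific classifier.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Posterior probability of an event given a high-risk classification (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


PREVALENCE = 0.16  # prior probability of an event, as reported in the text

# Hypothetical operating point for illustration only; NOT taken from the study.
ASSUMED_SENSITIVITY = 0.60
ASSUMED_SPECIFICITY = 0.85

ppv = positive_predictive_value(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, PREVALENCE)
print(f"PPV at {PREVALENCE:.0%} prevalence: {ppv:.0%}")
```

With the same assumed sensitivity and specificity, a higher prevalence would yield a higher PPV, which is why PPV values should be interpreted against the event rate of the cohort.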
Radiomics features could increase the efficiency of the design of future clinical trials for new therapies. By selecting only the high-risk patients according to our proposed prediction model, fewer patients who will not experience an event will be included. Since about 44% of the selected patients will experience progression, the difference between standard and new therapies can, depending on the expected efficacy of the proposed drug, be studied under more favorable power conditions. This allows for smaller sample sizes and thus lower costs.
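The effect of enrichment on sample size can be sketched with Schoenfeld's approximation for the number of events required by a two-arm log-rank comparison with 1:1 allocation. This is a simplified illustration and not the trial design proposed by the authors: the hazard ratio, significance level, and power below are hypothetical choices, and the event probabilities of 16% and 44% from the text are treated, as a simplifying assumption, as the overall probability of observing an event during follow-up.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: events needed for a two-sided log-rank test, 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return 4.0 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


def required_patients(hazard_ratio, event_probability, alpha=0.05, power=0.80):
    """Total patients needed so that the expected number of events reaches the target."""
    events = required_events(hazard_ratio, alpha, power)
    return math.ceil(events / event_probability)


ASSUMED_HAZARD_RATIO = 0.6  # hypothetical treatment effect, not from the study

n_unselected = required_patients(ASSUMED_HAZARD_RATIO, event_probability=0.16)
n_high_risk = required_patients(ASSUMED_HAZARD_RATIO, event_probability=0.44)
print(f"Unselected cohort: {n_unselected} patients")
print(f"High-risk group:   {n_high_risk} patients")
```

Because the required number of events depends only on the assumed effect size, significance level, and power, a higher event probability in the selected group translates directly into fewer patients to enroll under these assumptions.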
MTV is one of the most studied radiomics features in DLBCL [5–9, 27]. In our study, the AUC for MTV was 0.66, which was similar to the AUC of other recent studies (range 0.64–0.66) [14, 15, 28]. These studies mainly included advanced stage DLBCL patients, making stratification more difficult and possibly explaining the relatively low AUCs. It should be noted that these studies used different outcome parameters (PFS) and segmentation methods (41% max and 1.5 × liver SUVmax). However, the choice of segmentation method probably does not influence the predictive value of MTV [20, 29]. Schmitz et al. [5] reported an AUC of 0.78 using the same segmentation methods and outcome parameters as in the present study. Their higher AUC may be explained by the inclusion of more low-intermediate/low-risk IPI patients in their study.
Relatively few studies have investigated the predictive value of other radiomics features in DLBCL. Moreover, because the studies differ in which features and how many features were extracted, a direct comparison between them is difficult. Generally speaking, our results confirm the findings of Parvez et al., who found that radiomics features of the hottest lesion have limited predictive value [30]. Aide et al. reported that the size of regions with similar intensity in the largest lesion (long-zone high grey-level emphasis) had the highest accuracy and that this was the only predictor of 2-year event-free survival in a multivariate analysis [14]. In our data, 48 out of 485 radiomics features of the largest lesion predicted 2-year TTP in univariate logistic regression models after Bonferroni correction (data not shown), and indeed, long-zone high grey-level emphasis was one of them. Our study confirms that radiomics features of the largest lesion are predictive of outcome, albeit not as predictive as radiomics features at patient level, involving all lesions. In our study, the radiomics model with preselected conventional PET features and dissemination features had higher discriminative power than the models that included all 490 radiomics features, indicating that more complex radiomics features did not have additional predictive abilities compared to simpler radiomics features.
Cottereau et al. [15] were the first and to our knowledge the only ones to investigate the predictive value of dissemination features. They reported that Dmaxpatient and Dmaxbulk were significantly associated with outcome and that Dmaxpatient was the only predictor of outcome in multivariate analysis. In our analysis, the predictive performance of Dmaxpatient and Dmaxbulk was similar, but the discriminative power for Dmaxbulk exceeded that of Dmaxpatient, so that Dmaxpatient was not included in our multivariate model with backward selection. We found that adding Dmaxbulk and SUVpeak to MTV significantly improved model performance (raising AUC from 0.66 to 0.76).
Risk stratification significantly improved when combining radiomics features with clinical parameters [15, 31, 32]. Baseline 18F-FDG PET/CTs are already part of clinical practice; therefore, radiomics features can be calculated at no additional cost. With software becoming available that easily and reliably calculates radiomics features [18, 33], adding radiomics features to clinical scoring systems should seriously be considered. Significant efforts have been made to standardize FDG scanning, including initiatives by the European Association of Nuclear Medicine Research Limited and the US Society of Nuclear Medicine [34, 35]. However, the absence of standardized methodology still hampers the use of quantitative PET parameters. The optimal cut-off of MTV and other radiomics features relies heavily on the segmentation method and the underlying patient data. Work is in progress to solve these methodological problems.
This study is the first to investigate the predictive value of radiomics features at patient level, for the largest lesion, and for the hottest lesion while combining them with currently used clinical predictors, making it the most comprehensive study so far. Although this is the largest study that examined the predictive value of radiomics features, only 18% of the patients included in the prediction model had progression, so this study had limited power to test more complex prediction models that included more features or to make a distinction between refractory patients and relapsed patients. Another limitation of this study is that we used a single method to segment the lymphoma lesions. Due to the large heterogeneity of tracer uptake in DLBCL lesions, choosing a single segmentation method for the whole cohort could have caused suboptimal segmentation of lesions for some patients. However, literature suggests that the fixed SUV4.0 segmentation method is successful in 78% of DLBCL patients without editing and is acceptable in 98% of patients after manual editing [20]. Moreover, the majority of our patients had advanced stage disease and were classified as high-intermediate or high risk by the IPI score. The relative lack of limited stage and low-risk DLBCL patients could influence the generalizability of our results. Lastly, harmonization methods such as ComBat have been shown to be worthwhile to retrospectively increase uniformity in large multicenter datasets. Therefore, ComBat-based data alignment could be a suitable approach to harmonize radiomics features between centers. However, in our study, the number of included patients per center was too small to apply ComBat.
To further investigate the predictive value of radiomics features in DLBCL, these results will be validated in a large cohort of DLBCL patients treated in different clinical trials (the PETRA cohort, https://petralymphoma.org). Moreover, the combination of radiomics and genomic features could be investigated, since both have promising results, and by combining these biomarkers, the identification of high-risk DLBCL patients could be further improved.
In conclusion, prediction models combining quantitative radiomics features extracted from baseline 18F-FDG PET/CT scans with components of the IPI score significantly improved identification of patients at risk of relapse at baseline compared to the currently used IPI score. Adding radiomics features can significantly increase the efficiency of clinical trials.