Introduction
[18F]FDG PET/CT is an imaging modality of high performance in the management of patients with multiple myeloma (MM) [1‐4]. Foremost, due to its ability to reliably differentiate metabolically active from inactive lesions, [18F]FDG PET/CT is considered the appropriate method for treatment response evaluation in the disease [3, 4]. On the other hand, [18F]FDG PET/CT carries some limitations in MM evaluation. Some of these limitations are rather general, derived from the non-specific nature of the tracer, such as several false-positive findings [5], while some are more specific for MM, including a non-negligible (11%) incidence of false-negative results [1, 6]. Moreover, one particular challenge that clinical specialists and radiologists commonly face in MM is the standardization of the evaluation of PET/CT scans. This issue is mainly due to the different patterns of bone marrow (BM) infiltration in the disease, which, in turn, may hamper inter-observer reproducibility in interpreting scan results [7].
In recent years, many different approaches have been developed to address the standardization of [18F]FDG PET/CT evaluation in MM. These have included visual [7, 8] as well as semi-quantitative and quantitative [1‐12] methods. Although all these attempts seem promising, none can yet be considered a standard and widely accepted method for the interpretation of PET/CT. In this context, visual evaluation of the PET/CT scans remains the mainstay in clinical routine.
Deep learning, a subfield of artificial intelligence (AI), has nowadays become the method of choice for automated image analysis [13]. This method provides new opportunities for the development of automated analysis tools for CT, PET/CT, and MRI, which have the potential to improve or replace current methods for the evaluation of these imaging modalities [14]. Still, although the number of studies in this field is constantly growing, a large body of the literature is dominated by retrospective cohort studies with limited external validation and a high probability of bias [15‐19]. Particularly with regard to MM, there are no data on the application of deep learning tools for the assessment of malignancy using PET/CT.
Accordingly, the aim of this prospective study is to evaluate a novel three-dimensional deep learning-based tool on PET/CT images for automated assessment of the intensity of BM metabolism in MM patients.
Discussion
The interpretation of [18F]FDG PET/CT in MM may prove particularly challenging since both focal and diffuse bone lesions may coexist with varying degrees of [18F]FDG uptake. In clinical routine, the evaluation of BM involvement is primarily visual and subjective in nature. Quantitative, and thus more objective, assessments are mainly restricted to the calculation of the semi-quantitative parameter SUV. The reliable and reproducible measurement of SUV is, however, affected by several factors, such as the reconstruction and acquisition parameters, partial-volume correction, blood glucose, and the time between [18F]FDG injection and image acquisition [25]. Nevertheless, the standardized and reproducible interpretation of [18F]FDG PET/CT scans is clinically relevant in both the pre- and posttreatment settings of MM. In particular, the identification of robust positivity cut-offs for outcome prediction would have beneficial implications in the management of the disease. In this context, MTV and TLG have been proposed as promising metabolic parameters for the quantification of tumor burden and outcome prediction in MM [9, 10, 12]. At the same time, however, the accurate calculation of these parameters can be a very demanding task since it requires great computing power as well as fast and reproducible computer programs, enabling proper segmentation and correction of the background activity and partial volume effect [25]. The herein proposed approach, involving a combination of AI-based segmentation of the skeleton and subsequent thresholding of metabolic activity, aimed to objectively address these issues, enabling an automated, volumetric assessment of the BM metabolism in MM patients.
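For orientation, the quantities discussed in this paragraph can be written out explicitly. The block below gives the commonly used body-weight-normalized definition of SUV and the usual voxel-based definitions of MTV and TLG. The symbols are notation introduced here for illustration only and are not taken from the study protocol: c_i is the decay-corrected activity concentration in voxel i, A_inj the injected activity, W the body weight, V_vox the voxel volume, S the set of voxels inside the segmented skeleton, and T the applied uptake threshold.

```latex
\begin{align}
\mathrm{SUV}_i &= \frac{c_i}{A_{\mathrm{inj}} / W}, \\
\mathrm{MTV}(T) &= V_{\mathrm{vox}} \cdot
  \bigl|\{\, i \in S : \mathrm{SUV}_i \ge T \,\}\bigr|, \\
\mathrm{TLG}(T) &= V_{\mathrm{vox}} \sum_{i \in S,\; \mathrm{SUV}_i \ge T} \mathrm{SUV}_i
  \;=\; \mathrm{MTV}(T) \cdot \overline{\mathrm{SUV}}(T).
\end{align}
```

Here the overlined SUV denotes the mean SUV of the voxels at or above the threshold. Written this way, it becomes apparent why any factor that shifts the measured SUV (reconstruction, blood glucose, uptake time) propagates directly into both MTV and TLG.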
There are three major findings after the initial application of our deep learning-based tool in MM. Firstly, the automated, volumetric, whole-body assessment of the intensity of BM metabolic activity in PET/CT images is feasible. Secondly, the AI-derived PET/CT biomarkers MTV and TLG are significantly correlated with the visual (subjective) analysis of the extent of BM involvement in [18F]FDG PET/CT images. Thirdly, the automatically derived MTV and TLG values are also significantly correlated with the degree of BM plasma cell infiltration and with the independent prognostic factor β2-microglobulin after the application of certain [18F]FDG uptake thresholds.
The herein applied deep learning-based, whole-body, volumetric quantification method of [18F]FDG metabolism in the BM is based on the initial CT-based segmentation of the skeleton, its transfer to the PET images, the application of different thresholds of tracer uptake, and the subsequent refinement of the resulting regions using postprocessing. Global thresholding for bone segmentation has only recently been applied in the setting of MM with promising results. Takahashi et al. developed a semi-automated, quantitative parameter, defined as the intensity of bone involvement (IBI), for the assessment of the amount and extent of [18F]FDG uptake based on SUV metrics, using liver SUV as a threshold to determine metabolically active volumes in the skeleton. After the categorization of MM patients into three groups, based on the degree of visually assessed bone involvement in PET/CT, which served as a reference, the authors found significant differences between the three groups regarding the median IBI score [25]. The same group evaluated the parameter IBI for monitoring outcomes in MM. Again, after categorization of patients into three groups based on the visual analysis of PET/CT (PET-remission, PET-stable, and PET-progression), the authors found that the IBI variation (ΔIBI) between two consecutive scans was related to the visually evaluated PET/CT outcome and that ΔIBI differed significantly between the three groups [29]. In our study, patients were also classified into three groups based on visual and semi-quantitative evaluation of the PET/CT scans, after taking into account parameters suggested by the literature [8, 24, 25, 30]. We could, similarly, demonstrate a significant positive correlation between automatically derived PET parameters for all six thresholds and the degree of BM involvement in PET/CT as assessed by visual analysis. Moreover, significant differences were highlighted between the three patient groups regarding the MTV and TLG values for all applied thresholds. Of note is the (partly marked) variance in the yielded MTV and TLG values between the different approaches (Approaches 1–6) employed, which highlights the sensitivity of whole-body calculations to the applied [18F]FDG uptake threshold, thus calling for caution in the routine use of the tool depending on the respective clinical setting.
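The thresholding step described in this paragraph can be illustrated with a minimal sketch. It assumes that the skeleton mask has already been transferred to the PET grid and that the PET volume is expressed in SUV. The function name `mtv_tlg`, the toy arrays, and the voxel volume are hypothetical and serve illustration only; the postprocessing refinement mentioned above is not reproduced, and this is not the implementation used in the study.

```python
import numpy as np


def mtv_tlg(suv, skeleton_mask, threshold, voxel_volume_ml):
    """Return (MTV in mL, TLG in g) for skeleton voxels with SUV >= threshold.

    Hypothetical helper for illustration; not the tool evaluated in the study.
    """
    suv = np.asarray(suv, dtype=float)
    mask = np.asarray(skeleton_mask, dtype=bool)
    if suv.shape != mask.shape:
        raise ValueError("SUV volume and skeleton mask must have the same shape")

    # Voxels that lie inside the skeleton AND reach the uptake threshold.
    active = mask & (suv >= threshold)

    # MTV: number of active voxels times the volume of one voxel.
    mtv = float(active.sum()) * voxel_volume_ml
    # TLG: MTV times mean SUV of the active voxels, i.e. sum(SUV) * voxel volume.
    tlg = float(suv[active].sum()) * voxel_volume_ml
    return mtv, tlg


# Toy 2 x 2 x 2 volume with made-up values (assumption, not patient data).
suv = np.array([[[1.0, 3.0], [4.0, 0.5]],
                [[2.6, 2.4], [5.0, 1.0]]])
mask = np.array([[[True, True], [True, False]],
                 [[True, True], [False, True]]])

mtv, tlg = mtv_tlg(suv, mask, threshold=2.5, voxel_volume_ml=0.5)
print(f"MTV = {mtv:.1f} mL, TLG = {tlg:.1f} g")
```

In the toy volume, the voxel with SUV 5.0 lies outside the mask and is ignored, which mirrors the restriction of the analysis to the segmented skeleton. Re-running the call with a different `threshold` shows how strongly the whole-body values depend on the chosen cut-off.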
Another distinguishing point between this work and previous ones in the field is that in our study, we went one step further and managed to show a significant moderate correlation between the AI-derived MTV and TLG and two clinically relevant biomarkers in MM. Specifically, the automated, volumetric PET parameters derived by four of the evaluated approaches (Approaches 1, 2, 4, and 5) correlated with the percentage of BM plasma cells derived from biopsies, a main histopathological biomarker in the disease, and with the levels of β2-microglobulin, a powerful predictor of survival and a key variable of ISS [31‐33]. This significantly enhanced the robustness of our analysis, suggesting four of the applied thresholds as potentially useful cut-off values for reliable segmentation of the pathological skeleton. Moreover, the application of these four thresholds provided the best results in terms of discrimination of the studied population according to the degree of disease severity, using as a cut-off the BM plasma cell infiltration of 60% [20, 34]. These approaches were based on the comparison of [18F]FDG metabolism in the BM either with the tracer activity in reference organs which show very low variability and a narrow range in tracer uptake (liver and gluteal muscles) [35‐37] or with absolute SUV values [12]. The partial use of different pathological uptake thresholds for the axial skeleton and the long bones reflects the fact that [18F]FDG uptake in the skeleton is not uniform, gradually decreasing from the axial to the appendicular skeleton [38]. Based on the present findings, these approaches will be further evaluated in future studies with larger patient cohorts. On the other hand, the two remaining thresholds (Approaches 3 and 6) failed either to demonstrate statistically significant correlations with the abovementioned clinical parameters or to discriminate the patient population based on the degree of BM plasma cell infiltration, which is attributed to the use of high [18F]FDG uptake thresholds, leading to rather low whole-body MTV and TLG values.
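The idea of region-specific and reference-organ-based thresholds can be sketched as follows. The label values, the scaling factor of 1.25, the helper names, and all SUV numbers below are assumptions chosen for illustration; they do not correspond to the definitions of Approaches 1–6, which are given in the Methods.

```python
import numpy as np

# Hypothetical label values of a skeleton label map (0 = not skeleton).
AXIAL, APPENDICULAR = 1, 2


def reference_threshold(suv, reference_mask, factor=1.0):
    """Threshold derived from a reference organ: factor times its mean SUV."""
    suv = np.asarray(suv, dtype=float)
    ref = np.asarray(reference_mask, dtype=bool)
    return factor * float(suv[ref].mean())


def threshold_map(labels, axial_threshold, appendicular_threshold):
    """Per-voxel thresholds; non-skeleton voxels get +inf and never count."""
    labels = np.asarray(labels)
    thresholds = np.full(labels.shape, np.inf)
    thresholds[labels == AXIAL] = axial_threshold
    thresholds[labels == APPENDICULAR] = appendicular_threshold
    return thresholds


# Toy 1-D "volume" with made-up values; the last two voxels mimic liver.
suv = np.array([3.0, 2.2, 2.2, 1.0, 2.0, 2.0])
labels = np.array([AXIAL, AXIAL, APPENDICULAR, APPENDICULAR, 0, 0])
liver_mask = np.array([False, False, False, False, True, True])

# Assumed rule: a stricter, scaled threshold for the axial skeleton and the
# plain liver mean for the long bones, where physiological uptake is lower.
axial_thr = reference_threshold(suv, liver_mask, factor=1.25)
long_bone_thr = reference_threshold(suv, liver_mask)

active = suv >= threshold_map(labels, axial_thr, long_bone_thr)
print(active.tolist())
```

In this toy case, the same SUV of 2.2 counts as pathological in a long bone but not in the axial skeleton, which is the intended effect of using separate thresholds for the two regions.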
The interest in volumetric PET measurements in MM is not new. Fonti et al. were the first to explore the predictive role of MTV and TLG in a mixed group of 47 MM patients who received various therapies. Their analysis was based on the identification of focal lesions and the calculation of SUVmax. Afterwards, MTV was calculated in those lesions with an SUVmax > 2.5, which was almost the same as one of the thresholds applied in our analysis (SUVmax ≥ 2.5, Approaches 5 and 6) that led to a significant correlation between the automated MTV and TLG values and the percentage of BM plasma cells and β2-microglobulin. Similarly to our results, the authors noted that MTV positively correlated with the percentage of BM infiltration by plasma cells (r = 0.46), while TLG correlated significantly with β2-microglobulin levels (r = 0.38). They could, moreover, show that an MTV value of 77.6 mL and a TLG value of 201.4 g predicted patients with a good OS [9]. In line with this, in a larger and more homogeneous MM cohort, McDonald et al. found that baseline TLG > 620 g and total MTV > 210 mL of MM lesions were significant factors in poor PFS and OS. In that study, MM lesions were defined as foci of increased [18F]FDG uptake exhibiting a peak SUV (SUVpeak) greater than that of background BM assessed in the most inferior vertebral body [10]. These findings are in agreement with the ones in the present analysis. However, an essential difference between the aforementioned studies and ours is that these approaches were not automated, and were, thus, dependent on ROI definition, which was not the case in our analysis.
Recently, in a retrospective analysis of 185 patients with newly diagnosed MM, Terao et al. investigated the predictive value of pre-treatment MTV and TLG, as assessed by a semi-automated, computer-aided analysis of the PET/CT images, and compared it with conventional PET/CT variables. The authors could show that the high-burden MTV and TLG findings were superior to the conventional high-risk PET/CT variables for outcome prediction, as assessed by PFS and OS [12]. Similarly to our results, in another study of the same group, a significant correlation between TLG and the percentage of plasma cells in the BM was demonstrated, rendering this PET parameter potentially suitable for evaluating the histopathological tumor burden in MM [39]. Notably, in the studies by Terao et al., MTV was defined as the volume of myeloma lesions with SUV ≥ 2.5, a threshold also herein applied (Approaches 5 and 6) that led to significant correlations between the PET, histopathological, and clinical parameters.
We note some limitations in our study. Foremost, the number of patients enrolled and PET/CT scans analyzed was relatively small, although the studied cohort is homogeneous, consisting of treatment-naive, symptomatic MM patients examined in the context of an ongoing prospective study. Therefore, the presented findings can only be considered the preliminary results of an ongoing study. Secondly, the vast majority of PET/CT findings were not histopathologically confirmed, since this is not possible in the clinical setting. However, the demonstration of a significant correlation with two commonly accepted reference standards, namely, the percentage of BM infiltration by malignant plasma cells as derived from biopsies of the iliac crest and the plasma levels of β2-microglobulin, essentially contributed to the validation of the results. Moreover, especially with regard to the diffuse BM uptake pattern, in an effort to reduce the incidence of false positive findings, it was ensured that no included patient had received, within at least one month before the PET/CT study, agents or medications that could lead to a diffusely increased tracer accumulation in the BM [40]. Furthermore, limitations exist with regard to the applied segmentation method: the calculation of MTV and TLG is SUV-dependent, meaning that every factor affecting SUV calculations may also affect the evaluation of these parameters. Moreover, the patient’s skull was excluded from the segmentation analysis and the subsequent calculation of metabolic parameters due to the very high diffuse [18F]FDG uptake of the brain, rendering the skull an “obscured site” [1]. Although in our sample no patient had metabolically active, focal, cranial [18F]FDG-avid lesions, this anatomical area must be analyzed independently, inevitably making the method more operator-dependent in selected MM cases with cranial involvement. Finally, extensive lytic or paramedullary lesions, i.e., soft tissue/extraosseous lesions originating from bone lesions (Fig. 3), may be an additional source of error, subsequently leading to the need for manual corrections: since the AI tool initially identifies the skeleton on CT based on the HU scale of each region, large osteolytic lesions or soft tissue infiltrations linked to skeletal involvement may be excluded from the BM segmentation. These issues will be specifically investigated in the future in a larger patient cohort in the context of this multicenter, randomized phase 3 trial, with the goal of validating the AI-based automated PET results in comparison to patient outcome data as well as the findings of whole-body MRI, which is considered the modality of choice for bone marrow evaluation and assessment of disease extent in MM patients [41].
Declarations
Competing interests
Hartmut Goldschmidt declares the following:
Grants and/or provision of Investigational Medicinal Product:
Amgen, Array Biopharma/Pfizer, BMS/Celgene, Chugai, Dietmar-Hopp-Foundation, Janssen, Johns Hopkins University, Mundipharma GmbH, Sanofi.
Research support: Amgen, BMS, Celgene, GlycoMimetics Inc., GSK, Heidelberg Pharma, Hoffmann-La Roche, Karyopharm, Janssen, Incyte Corporation, Millennium Pharmaceuticals Inc., Molecular Partners, Merck Sharp and Dohme (MSD), MorphoSys AG, Pfizer, Sanofi, Takeda, Novartis.
Advisory boards: Amgen, BMS, Janssen, Sanofi, Adaptive Biotechnology.
Honoraria: Amgen, BMS, Chugai, GlaxoSmithKline (GSK), Janssen, Novartis, Sanofi, Pfizer.
Support for attending meetings and/or travel: Amgen, BMS, GlaxoSmithKline (GSK), Janssen, Novartis, Sanofi, Pfizer.
All other authors declare no conflicts of interest.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.