Introduction
Radiomics refers to a workflow that converts digital medical images into mineable high-dimensional data, whose subsequent analysis aims to support clinical decision-making [1-3]. The potential of radiomics in precision medicine has been pointed out [2], but the generalizability of models and the robustness of radiomics features remain the main concerns [4-7]. In contrast to other omics data, the robustness of radiomics features is influenced by multiple factors throughout the workflow, including data acquisition, image reconstruction, segmentation, image processing, and radiomics feature computation [5, 6]. Indeed, imaging devices and protocols have been demonstrated to significantly affect radiomics features in single-energy CT (SECT), MRI, and PET [8-10].
Dual-energy CT (DECT), with a second x-ray spectrum, provides an additional attenuation measurement that allows the differentiation of multiple materials and the generation of a set of virtual monochromatic images (VMIs), which makes possible several new and clinically relevant CT applications [11]. Radiomics has been applied to DECT images and has shown convincing diagnostic and prognostic performance in oncology settings [12-14]. However, the factors associated with the robustness of radiomics features in DECT have not been fully investigated. Only intensity discretization [15] and the energy levels of VMIs [16] have been demonstrated as sources of uncertainty of radiomics features in DECT. It is necessary to systematically evaluate the inter- and intra-scanner robustness of radiomics features in both SECT and DECT modes to allow further multi-scanner investigations. For prospective studies with various DECT scanners, harmonizing upstream acquisition parameters can minimize the impact of imaging protocols [17]. Meanwhile, retrospective studies are usually based on archived images from various SECT and DECT scans. It is important to determine whether those images are comparable enough to serve as a basis for generating radiomics models for clinical decision-making.
Therefore, we aimed to evaluate the repeatability and reproducibility of radiomics features within and between SECT and DECT, both between scan modes and between scanners.
Discussion
Our study, for the first time, evaluated the test-retest repeatability, the intra-scanner reproducibility between different scan modes, and the inter-scanner reproducibility of radiomics features in SECT and DECT, using a phantom with rods of multiple clinically relevant densities. Our results demonstrated that the test-retest repeatability was acceptable, but the reproducibility between scan modes and between scanners was relatively low. The intra-scanner reproducibility analysis demonstrated that the radiomics features extracted from SECT 120-kVp images and DECT 120 kVp-like VMIs did not match each other, even though they were acquired on the same scanner with fixed parameters and the images had similar average photon energy. The inter-scanner reproducibility analysis suggested wide variation among different scanners in radiomics features extracted from both SECT 120-kVp images and DECT 120 kVp-like VMIs. However, correlations between inter-scanner reproducibility and material density were not detected. Additionally, we found that the first-order features were more likely to be reproducible than texture features (Supplementary Figures S3 and S4).
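This section does not state which agreement metric was used, so the following is only an illustrative sketch of how agreement between paired feature values could be quantified. It uses Lin's concordance correlation coefficient as an assumed metric; the function name and the example values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Assumption: x and y hold the same feature measured on the same ROIs
    under two conditions (e.g. two scan modes or two scanners).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("agreement is undefined when all values are identical")
    return 2 * cov_xy / denominator


# Hypothetical feature values for four ROIs.
mode_a = [10.0, 20.0, 30.0, 40.0]
mode_b = [v + 10.0 for v in mode_a]  # same ranking, constant offset

ccc_identical = concordance_correlation(mode_a, mode_a)
ccc_offset = concordance_correlation(mode_a, mode_b)
```

In this toy case the two series are perfectly correlated, yet the constant offset lowers the agreement score below 1, which is the kind of systematic shift a reproducibility analysis is meant to expose.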
The intra-scanner reproducibility analysis indicated that SECT 120-kVp images and DECT 120 kVp-like VMIs were far from alike in terms of radiomics features. The images generated by various DECT scanners differ from those of conventional SECT because of differences in their acquisition techniques, material decomposition methods, image reconstruction algorithms, and postprocessing methods [25]. Although SECT-like images were generated in DECT to mimic the SECT images, the intra-scanner reproducibility of radiomics features was low between SECT images and the corresponding SECT-like images in DECT. Given that the acquisition and processing parameters were fixed, the intra-scanner variation might reflect the influence of the different technical approaches of SECT and DECT. Our analysis further indicated that CT number values varied significantly among scanners and scan modes, and the intra-scanner difference in CT number values between SECT and DECT might be a source of variation (Supplementary Figures S5 and S6). Further investigations on the SECT and DECT energy dependency of radiomics features are needed. Considering the large variation of CT number values among scanners and scan modes, the small variations of raw input might not be the main source of radiomics variation. Investigations on the influence of small variations of CT number values might be possible in the future, when stable CT number values become available among scanners and scan modes. Since the majority of SECT and DECT radiomics features were not reproducible on the same scanner, they must be interpreted with caution, especially in retrospective studies where consistent acquisition parameters are not available. Our results also provide insights for adjusting imaging protocols in prospective study design: including images from both SECT and DECT scans might require an extra correction procedure.
The inter-scanner reproducibility analysis mainly reflects the variations among vendors and scanners. Many steps in radiomics analysis have specific drawbacks that need to be resolved. For instance, the robustness of radiomics features can vary with data acquisition, image reconstruction, segmentation, and feature extraction [8, 26-30]. A change of voxel size can increase the variability of radiomics features [26]. Therefore, in our study, we kept the field of view, reconstruction matrix, and slice thickness the same for different scanners during acquisition, to keep the voxel size the same. Since radiation dose influences reproducibility [27], the tube voltage, milliamperage, and rotation time were carefully adjusted to keep the volume CT dose index similar among scans. A rigid registration was employed to translate ROIs, avoiding variation due to delineation [28]. All the radiomics features were extracted via Pyradiomics, an Image Biomarker Standardisation Initiative compliant platform [29, 31], with harmonized calculation settings, to minimize the influence of the feature extraction platform. Unfortunately, several parameters could hardly be made uniform among different scanners. We selected the reconstruction kernels and iteration methods of a typical abdominal-pelvic examination to allow comparable results among scanners [27, 30], but most of them were vendor-dependent and impossible to harmonize. Further, CT number values vary across scanners due to their different X-ray spectra [32], which might lead to differences in radiomics features. Additional slight differences in the images caused by different calibration methods could translate into radiomics variability [8]. In addition, the introduction of DECT scanners has made it more difficult to reach high reproducibility among scanners. The best energy level for VMI reconstruction to match the SECT image differs among vendors. Therefore, corresponding DECT images have different imaging appearances, texture features, and quantitative capabilities [25]. Further, the different technical approaches to realizing DECT, namely dual-source DECT, dual-layer detector DECT, and rapid kV-switching DECT, might be unique sources of variability in our study [11, 25], resulting in low inter-scanner reproducibility of radiomics features.
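The reasoning behind fixing the field of view, reconstruction matrix, and slice thickness can be made concrete with a small sketch: the in-plane pixel spacing is the field of view divided by the matrix size, so identical settings give identical voxels. The numeric values below are hypothetical placeholders, not the protocol used in the study, and a square matrix is assumed.

```python
def voxel_size_mm(fov_mm, matrix, slice_thickness_mm):
    """Return (x, y, z) voxel dimensions in mm for a square reconstruction matrix."""
    if fov_mm <= 0 or matrix <= 0 or slice_thickness_mm <= 0:
        raise ValueError("all acquisition values must be positive")
    pixel_spacing = fov_mm / matrix  # in-plane spacing, identical in x and y
    return (pixel_spacing, pixel_spacing, slice_thickness_mm)


# Hypothetical protocols: scanners A and B share settings, scanner C uses a larger FOV.
scanner_a = voxel_size_mm(fov_mm=350, matrix=512, slice_thickness_mm=5.0)
scanner_b = voxel_size_mm(fov_mm=350, matrix=512, slice_thickness_mm=5.0)
scanner_c = voxel_size_mm(fov_mm=400, matrix=512, slice_thickness_mm=5.0)

same_voxel_ab = scanner_a == scanner_b
same_voxel_ac = scanner_a == scanner_c
```

Only the shared settings yield matching voxels; changing the field of view alone is enough to alter the in-plane spacing and therefore the sampling of texture.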
Acquisition parameters have been shown to greatly affect the reproducibility of radiomics features in SECT, MRI, and PET [8-10]. Our study further showed that the approaches that generate DECT images corresponding to SECT images might yield images with different texture characteristics, because the imaging techniques used differ among vendors and scanners. The factors associated with the robustness of radiomics features in DECT have rarely been investigated. Chatterjee et al [15] performed voxel intensity discretization with four binning algorithms and showed the impact of the HU value range on radiomics feature stability using DECT data. Baliyan et al [16] demonstrated that the energy levels of VMIs have different impacts on texture analysis. These sources of uncertainty should be taken into account when evaluating the robustness of radiomics features in DECT images, in order to increase the likelihood of replicability. Overall, we consider that the main source of radiomics variation might be a combination of the difference between SECT and DECT and the varying CT number values among scanners.
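To illustrate why intensity discretization can interact with CT number differences, the sketch below compares two commonly used binning conventions, fixed bin width and fixed bin count. The four algorithms evaluated by Chatterjee et al are not specified in this section, so these two schemes, the function names, and the HU values are assumptions chosen for illustration only.

```python
import math


def discretize_fixed_bin_width(values, bin_width):
    """Assign 1-based bin labels using bins aligned to multiples of bin_width."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


def discretize_fixed_bin_count(values, n_bins):
    """Assign 1-based bin labels by splitting the observed range into n_bins."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [1] * len(values)
    labels = []
    for v in values:
        label = math.floor(n_bins * (v - lowest) / (highest - lowest)) + 1
        labels.append(min(label, n_bins))  # the maximum falls in the last bin
    return labels


# Hypothetical HU values for one ROI, and the same ROI with a 5 HU offset.
hu_reference = [-5, 0, 12, 24, 49]
hu_shifted = [v + 5 for v in hu_reference]

width_reference = discretize_fixed_bin_width(hu_reference, bin_width=25)
width_shifted = discretize_fixed_bin_width(hu_shifted, bin_width=25)
count_reference = discretize_fixed_bin_count(hu_reference, n_bins=3)
count_shifted = discretize_fixed_bin_count(hu_shifted, n_bins=3)
```

In this toy example the constant offset changes the labels under fixed bin width but leaves them unchanged under fixed bin count, since the latter depends only on each value's position within the observed range. Real images also differ in range and noise, so neither scheme should be assumed robust without testing.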
Berenguer et al [8] found that the reproducibility of radiomics features depended on the kind of material, with the densest wood showing the highest reproducibility. Differences in reproducibility among the sixteen rods were observed in our study, but correlations between reproducibility and material density were not evident. Notably, the two ROIs covering rods of various densities showed higher intra- and inter-scanner reproducibility than those of the sixteen uniform rods. As this is a phantom study, the non-validated nature of the phantom is a concern. So far, the Credence Cartridge Radiomics phantom is the one most used for radiomics investigation [33, 34]; it provides cartridges with different textures and CT number values. However, all the scans of this phantom were performed on SECT scanners. In contrast, the phantom used in our study is dedicated to DECT quality assurance and has been scanned on both SECT and DECT scanners. The Credence Cartridge Radiomics phantom is composed of acrylonitrile butadiene styrene, acrylic beads, and polyvinyl chloride, which might not be the best representation of the human body, while our phantom could represent the physiological situation of multiple tissues using clinically relevant densities. Further, we drew ROIs 1 to 16 to represent homogeneous human tissues, and ROIs 17 and 18 to represent the human body with mixed densities. Radiomics features might be more robust in images with more obvious structural features, which is also consistent with our finding that first-order features were more likely to be reproducible than texture features. We hypothesize that small variations of input data might have a greater influence on the homogeneous ROIs. Further investigations are under consideration to validate this hypothesis.
There were several limitations in our study. First, our study did not test a wide range of acquisition parameters to be comprehensive and generalizable [8], but rather chose an imaging protocol representing a typical abdominal-pelvic examination to be more translatable to clinical practice. Second, we only compared the SECT 120-kVp images and DECT 120 kVp-like VMIs, to reflect daily research practice. We selected vendor-recommended 120 kVp-like DECT images and showed that their intra-scanner reproducibility was low, but it is worth investigating the true equivalent energy levels for generating VMIs in DECT, which could be object-dependent, that achieve high intra- and inter-scanner reproducibility with SECT images. Third, radiomics features can be expanded by extracting them from images with wavelet or Laplacian of Gaussian transformations, but we only evaluated those extracted from the original images. We did not include images with filtering or transformation because image processing affects the reproducibility of radiomics features [35], and evaluating this effect was not the aim of our work. Fourth, various feature extraction platforms have been developed for radiomics investigations; of those, we employed Pyradiomics, which is considered a reliable tool for radiomics feature extraction in phantom and clinical studies [31]. We kept the settings harmonized during the feature extraction, but it is unknown how feature extraction platforms influence robustness in DECT radiomics. Fifth, we used a DECT phantom with homogeneous rods for scanning. Compared with the radiomics phantom [33, 34], ours might lack texture. However, our phantom allows results more specific to humans, owing to its similarity to human tissue densities. Lastly, as this was a phantom study, our results cannot be directly translated into clinical practice. Due to the highly homogeneous nature of the phantom, our results cannot fully reflect the characteristics of real disease. Moreover, our results must not be compared with those of clinical predictive studies. Nonetheless, our study emphasized the intra-scanner difference between the SECT and DECT techniques, to which attention should be paid in future investigations. Meanwhile, reproducibility could be impaired if insufficient image processing is conducted to counter inter-scanner variability [17].
In summary, our study indicated that the radiomics features extracted from SECT images and corresponding DECT images did not match each other, even if their average photon energy levels were considered alike. The majority of radiomics features were not reproducible among scanners, even if multiple acquisition parameters were fixed. The first-order features were more likely to be reproducible than texture features, and might provide an opportunity for improving robustness of radiomics models. Radiomics results from multiple CT scanners and with different scan techniques must be interpreted with caution because of potential risks of non-reproducible data.