Introduction
Image properties of clinical computed tomography (CT) images vary significantly due to differences between vendors, scanner generations, software versions, imaging techniques, and reconstruction methods. This diversity affects the diagnostic quality of CT images [1], and differences are likely to increase further as CT techniques evolve. It is therefore important to ensure objective assessment and comparison of the clinical performance of CT techniques [2]. Task-based methods have been proposed for that purpose and should be applicable to evaluate the diagnostic performance of CT images regardless of the underlying imaging technology [3, 4].
Task-based assessment is typically used to test lesion detectability in CT images of uniform phantoms, and it is commonly assumed that the results can be transferred to CT images of patients acquired in the clinical setting. Yet, there is evidence that uniform phantoms may not reflect clinical performance adequately. First, previous X-ray studies have shown that background structure affects detectability and conclusions about dose effects on image quality [5–7]. Second, background texture has also been found to affect detectability and the estimated dose reduction potential of an iterative reconstruction algorithm in a CT study [8]. Conversely, the authors of another CT study report only negligible texture effects, concluding that uniform phantoms may allow sufficient assessment of clinical performance [9]. Both of these CT studies investigated cropped images mimicking vessel-free liver textures. To better understand the validity of CT assessment with uniform phantoms for clinical imaging, it would be desirable to evaluate how such assessments relate to CT images obtained in phantoms with full anatomical detail.
A recent study introduced anatomically realistic neck phantoms that can be used for such purposes [10]. The phantoms investigated in that study contained low-contrast lesions and were produced using radiopaque 3D printing based on a neck CT image of a patient. Another recent study used the same CT image as a template to produce a uniform neck phantom for low-contrast detectability experiments [11]. The present study compares low-contrast detectability between these two types of phantoms to test the hypothesis that anatomical detail affects task-based CT assessment. CT images of the phantoms acquired at two dose levels and reconstructed with filtered back projection and an iterative reconstruction algorithm were analyzed. The overall aim was to evaluate the effects of anatomical background structure on task-based image quality assessment in comparison with a uniform phantom background.
Discussion
Task-based methods have been proposed to evaluate and compare CT techniques for their diagnostic performance in clinical practice. Task-based assessment is typically performed using CT images of uniform phantoms, and it is of interest to what extent evidence from uniform phantoms actually reflects detectability in clinical images with anatomical detail. The present study therefore compared low-contrast detectability between uniform and anatomically realistic phantoms. Our results show that anatomical phantom structure affects detection accuracy at all investigated lesion contrasts (p < 0.001), interferes with dose effects on detection and influences the assessment of AIDR 3D performance compared to FBP.
The image assessment results we obtained for the uniform phantom are in good agreement with previous reports of relatively high detection sensitivities of more than 87% for lesions of the same size as investigated in our study [13, 14]. Anatomical phantom structure significantly impaired lesion detectability: a contrast increase to 30 HU was necessary to achieve detection accuracy similar to that for 9 HU lesion contrast in uniform images. Near-perfect detectability was achieved at a markedly higher lesion contrast (38 HU) than with the uniform phantom (18 HU).
An impact of anatomical detail was expected because structured tissue patterns (anatomical noise) have psychophysical effects on humans that interfere with detection tasks. Previous X-ray studies found anatomical noise to have stronger effects than quantum noise and to impair and eventually limit human lesion perception [5–7]. This, in turn, may influence how dose changes affect detection tasks [6, 7]. Our experiments confirm the effects of anatomical patterns on noise characteristics and the assessment of dose and reconstruction methods. Anatomical images had a low-frequency noise component that was predominant regardless of dose and image reconstruction mode. This component was in good agreement with reports of high NPS values at low spatial frequencies in patients [15]. Anatomical background structure also influenced the denoising power of AIDR 3D, which adds to reports on interactions between anatomical texture, noise, and spatial resolution when iterative reconstruction is applied [16–18]. Lesion detectability was clearly affected by dose in uniform FBP images. However, the dose-detection relationship was less clear in images with anatomical noise. Consistent with published results, AIDR 3D maintained detectability and was superior to FBP at a lower dose in uniform phantom images [19]. These advantages were lost when anatomical structures interfered with lesion detection.
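For readers who want to reproduce the kind of noise power spectrum (NPS) comparison referred to above, the following is a minimal sketch of a common ensemble NPS estimate. It is not the analysis pipeline used in this study: the function names `nps_2d` and `radial_profile`, the square-pixel assumption, the mean-only detrending, and the number of radial bins are all illustrative assumptions.

```python
import numpy as np

def nps_2d(rois, pixel_mm):
    """Ensemble 2D NPS estimate from a stack of ROIs.

    rois     : array of shape (n_rois, ny, nx), pixel values in HU
    pixel_mm : pixel spacing in mm (square pixels assumed)
    Returns an (ny, nx) array in HU^2 mm^2, laid out in FFT order.
    """
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Remove the mean of each ROI (simplest possible detrending; an assumption).
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    # Squared magnitude of the 2D DFT of every ROI, averaged over the ensemble.
    power = np.abs(np.fft.fft2(detrended)) ** 2
    return power.mean(axis=0) * pixel_mm ** 2 / (nx * ny)

def radial_profile(nps, pixel_mm, n_bins=16):
    """Average a 2D NPS over rings of constant radial frequency (cycles/mm)."""
    ny, nx = nps.shape
    fx = np.fft.fftfreq(nx, d=pixel_mm)
    fy = np.fft.fftfreq(ny, d=pixel_mm)
    fr = np.hypot(*np.meshgrid(fx, fy))  # radial frequency of each DFT bin
    edges = np.linspace(0.0, fr.max(), n_bins + 1)
    idx = np.clip(np.digitize(fr.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=nps.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)
```

Under these assumptions, a predominant low-frequency component would appear as elevated values in the first few bins of the radial profile for ROIs taken from anatomical images, compared with ROIs from uniform images at the same dose and reconstruction. A quick sanity check is that the NPS summed over all frequency bins and divided by the ROI area in mm^2 approximates the pixel variance.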
Significant texture effects on detectability were also observed in a previous CT study that compared liver-mimicking textures with a uniform phantom background [8]. In that study, structured background textures reduced the influence of dose changes on detection, similar to what we observed for FBP images. Another CT study came to different conclusions and reported only negligible effects of liver texture on detectability in comparison with a water background [9]. However, liver and water textures in that study were visually quite similar, which may explain why the results differ from our observations. It should also be noted that the comparability of our results with both of these CT studies is limited by differences in CT hardware and because both studies investigated cropped images with vessel-free liver textures. To the best of our knowledge, our study is the first to compare neck phantom images with full anatomical detail, which matters because anatomical detail adds complexity to CT images and has a relevant impact on human lesion perception [20, 21].
The experiments we performed here do not provide an in-depth analysis of dose reduction and image reconstruction, which requires broader testing and can be found elsewhere [22]. For example, AIDR 3D was reported to perform similarly to FBP at 120 kVp, which our experiments confirmed, and also to be superior at a lower tube voltage of 100 kVp, which we did not assess [22]. Our study evaluated the effects of phantom background on task-based CT assessment, and we used two dose levels and reconstruction methods to illustrate such effects. Based on our results, we conclude that phantom background has a relevant influence and that transferability of CT assessment to clinical imaging can be expected to improve as the realism of the test environment increases. In view of the published evidence discussed above, we believe that this should apply beyond the CT scanner and imaging technologies used here.
The limitations of our study include the rather narrow study protocol, which was selected to investigate the effects of phantom background, but not to perform a comprehensive analysis of dose and image reconstruction methods. Results may differ in less complex anatomical regions than the neck. However, the generalizability of our results is supported by previous work in liver imaging, which has arrived at similar conclusions about the importance of phantom texture [8]. It should also be noted that we deliberately chose a location-known-exactly experimental design in order to avoid introducing different lesion locations as another variable possibly influencing detectability. Yet, detection experiments with lesions in unknown locations can be considered to be more realistic and representative of clinical image interpretation [4].
Uniform phantoms differ from patients and provide an idealized environment for evaluating CT systems. Our results provide evidence that lesion contrasts detectable in CT images of uniform phantoms are below those that are clinically relevant, and they corroborate data indicating that anatomical phantom structure affects estimates of CT performance and reasonable dose selection. Investigations of CT assessment aimed at predicting and comparing clinical performance must take into account differences between phantoms and patients and should be performed in a setting that mimics clinical imaging as closely as possible.