Introduction
Adequate image quality is the basis for a reliable diagnosis in computed tomography (CT). This means that the sum of all features of an image must enable radiologists to make an informed judgment. Yet, frequently used image quality metrics such as contrast-to-noise ratios (CNRs) only evaluate selected image features and not how well an image as a whole is suited for making a diagnosis. Such metrics can therefore misleadingly indicate high image quality although diagnostic image performance is actually lower [1]. Image performance has gained in importance with the advent of iterative reconstruction (IR) techniques, which not only reduce noise very effectively but also affect image texture and spatial resolution, so that diagnostic performance can be compromised [2–4]. To evaluate the diagnostic quality of an image, appropriate methods should therefore ideally test directly how well radiologists can perform diagnostic tasks with it.
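To make the point about CNR concrete, the following minimal sketch computes a CNR from two regions of interest (ROIs). It assumes one common definition, the absolute difference between the mean lesion and mean background CT numbers divided by the background standard deviation; other definitions exist. The function name `contrast_to_noise_ratio` and the HU samples are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def contrast_to_noise_ratio(lesion_hu, background_hu):
    """CNR = |mean(lesion) - mean(background)| / SD(background).

    Assumption: one common CNR definition; other variants exist.
    """
    noise = stdev(background_hu)  # sample standard deviation of the background ROI
    if noise == 0:
        raise ValueError("background ROI has zero noise; CNR is undefined")
    return abs(mean(lesion_hu) - mean(background_hu)) / noise


# Hypothetical ROI samples in HU (illustrative values only).
lesion = [82.0, 78.0, 80.0, 81.0, 79.0]
background = [60.0, 64.0, 56.0, 62.0, 58.0]

cnr = contrast_to_noise_ratio(lesion, background)
print(f"CNR = {cnr:.2f}")
```

Because the result depends only on two means and one standard deviation, images that differ in texture or spatial resolution can yield the same CNR, which is the limitation described above.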
Detectability experiments are such a method. They assess image performance by testing how well detection tasks that are similar to clinical diagnostic tasks can be performed [5, 6]. Most previous studies performed such experiments with low-contrast lesions in uniform phantoms. In other words, the detection tasks mimicked the work of radiologists, but the uniform phantoms did not mimic patient anatomy. This limitation is relevant because the texture of a phantom affects the detectability of low-contrast lesions. A previous IR study therefore concluded that image quality should be assessed in the most realistic clinical context possible, i.e., ideally with CT images of patients [7].
3D printing provides novel opportunities to produce phantoms meeting such requirements. Previous work used 3D printing to create low-contrast lesions in cylindrical phantoms [7]. However, no previous work attempted to create low-contrast lesions in phantoms that mimic patient anatomy. The present work therefore used radiopaque 3D printing, a method that was previously shown to provide flexibility and anatomic detail in producing patient-mimicking phantoms [8, 9]. With this method, phantoms representing a patient’s neck and containing different low-contrast lesions were created. The phantoms were evaluated and used in a detectability experiment. The overall aim was to develop anatomically realistic phantoms with low-contrast lesions for detectability experiments.
Discussion
Detectability experiments use detection tasks to assess the diagnostic performance of CT images and should mimic the clinical situation realistically. To this end, anatomically realistic phantoms with low-contrast lesions of 10 to 40 HU contrast were developed, and the lesion contrasts and their impact on detectability by radiologists were evaluated. The developed approach lays the groundwork for assessing CT performance with methods that mimic the clinical work of radiologists.
Good agreement between target and measured lesion contrast values was achieved because gray scales of the printed images were correlated linearly with printer ink deposition and resulting HU values as previously described [8]. Variations in contrasts measured with different scanner settings were expected, as CT settings were previously shown to affect CT numbers [10, 11]. The observed contrast increase at a lower tube voltage was reinforced by the iodine content of the phantoms. The results are in line with previous observations in phantoms and patients after contrast medium administration [12–14], underlining that the phantoms simulate the clinical situation adequately. Contrast increase at reduced tube voltage is less pronounced for soft tissues without contrast medium enhancement [15], which may thus limit the suitability of the phantoms for studying tube voltage effects in situations where CT scans are acquired without contrast medium administration.
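The linear relationship between printed gray value and resulting CT number can be illustrated with an ordinary least-squares fit that is then inverted to choose a gray value for a target HU. The sketch below is an illustration only: the calibration pairs, the 0 to 255 gray scale and its direction, and the names `fit_line` and `gray_for_target_hu` are hypothetical and do not reproduce the calibration described in [8].

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration pairs: printed gray value (0-255) vs. measured HU.
gray = [0, 64, 128, 192, 255]
hu = [-100.0, -36.0, 28.0, 92.0, 155.0]

slope, intercept = fit_line(gray, hu)


def gray_for_target_hu(target_hu):
    """Invert the fitted line to get the gray value expected to yield target_hu."""
    return (target_hu - intercept) / slope


# Gray value that the hypothetical calibration predicts for a 28 HU target.
print(round(gray_for_target_hu(28.0)))
```

With a real printer, the calibration points would come from test prints scanned at the intended CT settings, since the measured HU values depend on those settings.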
Detection accuracy and confidence scores increased significantly from 20 to 30 HU lesion contrast, but not from 10 to 20 HU contrast as would have been expected from previous studies using similar contrast levels [16, 17]. However, these previous studies used uniform phantoms, and the results of the present study may thus be explained by the anatomical texture of the phantoms. This conclusion is to some extent supported by a previous model observer study, which reported detectability to increase between 10 and 14 HU lesion contrast for a uniform phantom, but not for a textured phantom with small-scale features and only slightly for two other textured phantoms with larger-scale features [7]. However, comparability of our findings with this study is also limited because different phantoms, lesion sizes, and scan settings were used and because observer variability is lower for model observers than for human observers as in the present study.
Remarkably, the detection accuracy results for 10 and 20 HU contrast lesions were relatively high despite the low contrast and reader confidence. This can be explained by the location-known-exactly experimental design, where the task was rather simple as the participating radiologists were aware of the expected lesion position [18]. Future work on evaluating CT techniques with the methodology presented here should consider a search task with lesions in unknown locations. The aim of the present study was to provide a groundwork for such studies by developing anatomical phantoms for detection tasks and providing an estimate of reasonable lesion contrast across different scanner settings to be used in such studies. The results suggest a contrast of 20 to 30 HU, where the participants’ confidence and detection success changed most significantly in the 2-AFC experiment.
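As an illustration of how results from a two-alternative forced-choice (2-AFC) experiment can be summarized, the sketch below computes the proportion of correct responses (PC) and converts it to a detectability index using the standard equal-variance Gaussian relation d' = sqrt(2) * z(PC). This is not the analysis performed in the present study, which reported detection accuracy and confidence scores; the response data and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_correct(responses):
    """Fraction of correct trials; `responses` is a list of booleans."""
    return sum(responses) / len(responses)


def dprime_2afc(pc):
    """Detectability index for 2-AFC: d' = sqrt(2) * z(PC).

    Assumption: equal-variance Gaussian observer model.
    """
    if not 0.0 < pc < 1.0:
        raise ValueError("PC must lie strictly between 0 and 1")
    return sqrt(2) * NormalDist().inv_cdf(pc)


# Hypothetical reader responses for one contrast level (True = correct choice).
responses = [True] * 17 + [False] * 3

pc = proportion_correct(responses)
print(f"PC = {pc:.2f}, d' = {dprime_2afc(pc):.2f}")
```

A PC of 0.5 corresponds to guessing (d' = 0), and PC values of exactly 0 or 1 are rejected because the index is unbounded there. This is one reason to choose lesion contrasts at which readers are neither at chance nor at ceiling.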
The present study did not address the relationship between CT scan settings, the resulting radiation dose, and detectability scores. Studies investigating these relationships should consider that the scan length affects the dose-length product (DLP), notably through the contribution of overscanning, which differs between pitch values. Furthermore, scan length and anatomical variation also affect tube current modulation behavior and the resulting doses. For a realistic setup in such studies, the phantoms presented here could be inserted between anatomically realistic parts of a head-and-neck phantom. Another option would be to provide data sets that study participants can scroll through. However, the appearance of the rod-shaped lesions would not change between images, and the participants would be required to evaluate substantially more images, which is why scrolling was not considered for reading in the present study.
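The dependence of the DLP on scan length and overscanning can be sketched with the basic relation DLP = CTDIvol × irradiated length, where the irradiated length is the planned scan length plus the overscan length. The overscan length is passed in as a parameter here because it depends on pitch, collimation, and vendor implementation; all numerical values and the function names are hypothetical.

```python
def dlp_mgy_cm(ctdi_vol_mgy, planned_length_cm, overscan_cm=0.0):
    """DLP = CTDIvol * (planned scan length + overscan length)."""
    if planned_length_cm <= 0 or overscan_cm < 0:
        raise ValueError("lengths must be positive (overscan may be zero)")
    return ctdi_vol_mgy * (planned_length_cm + overscan_cm)


def overscan_fraction(planned_length_cm, overscan_cm):
    """Share of the irradiated length (and thus of the DLP) due to overscanning."""
    return overscan_cm / (planned_length_cm + overscan_cm)


# Hypothetical example: same CTDIvol and overscan, short vs. long scan range.
dlp_short = dlp_mgy_cm(10.0, 5.0, overscan_cm=3.0)
dlp_long = dlp_mgy_cm(10.0, 30.0, overscan_cm=3.0)

print(dlp_short, overscan_fraction(5.0, 3.0))
print(dlp_long, overscan_fraction(30.0, 3.0))
```

For a short phantom, overscanning accounts for a much larger share of the DLP than for a clinically realistic scan range, which is one reason to embed the phantoms in a longer anatomical setup when dose is studied.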
Previous work evaluated low-contrast lesions in cylindrical phantoms with textured background [7] or in CT images with digitally inserted lesions [19, 20]. However, to the authors’ knowledge, no previous work created anatomically realistic phantoms with low-contrast lesions. Such phantoms have the advantage of simulating the entire diagnostic process that patients undergo. They can repeatedly be scanned with the same or different CT systems to study inter- and intrascanner variations and acquisition techniques. The phantom images are similar to clinical images and can be used to perform detection tasks that are similar to clinical tasks of radiologists. They thus offer novel possibilities for investigating image quality more realistically than with uniform phantoms and for studying systematic scan parameter variation, which is precluded in clinical trials.
The limitations of this study include that only one patient was simulated and that only one lesion size was used. Detectability results may differ in other anatomical regions and with smaller or larger lesions. Conclusions regarding the influence of background texture are limited because there was no direct comparison with uniform phantoms. The impact of scan settings on dose and lesion detectability was beyond the scope of this work and therefore not analyzed. Also, the results we report here apply only to the CT system and the CT settings that were used in the present work.
The method we report here for the creation of phantoms to be used for detection tasks enables CT image quality to be evaluated with images and methods that mimic the clinical practice of radiologists. This is of relevance for a broad range of clinical and scientific applications including CT protocol optimization and the assessment of novel CT techniques. Such patient-mimicking phantoms have the potential to reduce patient exposure in clinical trials and to accelerate CT optimization for safer diagnostic patient imaging.