
A post-processing algorithm for spectral CT material selective images using learned dictionaries


Published 27 February 2017 © 2017 IOP Publishing Ltd
Citation: Korbinian Mechlem et al 2017 Biomed. Phys. Eng. Express 3 025009. DOI: 10.1088/2057-1976/aa6045


Abstract

In spectral computed tomography (spectral CT), the additional information about the energy dependence of the linear attenuation coefficients can be exploited to produce material selective images. These images have proven to be useful for various applications such as quantitative imaging or clinical diagnosis. However, noise amplification on material decomposed images remains a fundamental problem which limits the utility of basis material images. In this work, we present a new post-processing algorithm for material selective images which is based on dictionary denoising and specifically tailored to take the properties of the basis material images into account. Dictionary denoising is a powerful noise reduction technique which separates image features from noise by modeling small image patches as a sparse linear combination of dictionary atoms. These dictionary atoms are learned from training images prior to the denoising process. We have adapted the dictionary denoising algorithm to make use of the structural correlation as well as the anti-correlated noise which is typically present in material selective images. Dictionary denoising is first applied to the virtual monochromatic image for which the anti-correlated noise maximally cancels out (minimum noise image) in order to identify the structures and edges of the material selective images. In a second step, the basis material images are compiled by finding local linear transformations between the minimum noise image and the basis material images. Numerical simulations as well as an experimental measurement show that our algorithm achieves improved image quality compared to two other post-processing methods, namely conventional dictionary denoising and bilateral filtering. As a post-processing method, it can be combined with image-based as well as projection-based material decomposition techniques. Our algorithm therefore has the potential to improve the usability of basis material images for various tasks such as artifact reduction, quantitative imaging and clinical diagnosis.


1. Introduction

In spectral computed tomography (spectral CT), tomographic measurements of an object are conducted with two or more distinct photon energy spectra. Compared to conventional CT, additional information about the energy dependence of the linear attenuation coefficients is obtained. This information enables the generation of material selective images [1], which can be used for quantitative imaging [2], reducing beam hardening artifacts and improving the contrast-to-noise ratio [3, 4]. Clinical applications of spectral CT include identifying contrast agent enhanced vessels [5, 6], kidney stone characterization [7] and virtual nonenhanced imaging [8].

Most spectral CT algorithms separate the processes of material decomposition and image reconstruction. Material decomposition is either performed in projection space prior to image reconstruction [1, 9–11], or an image-based decomposition algorithm is applied after reconstructing the images corresponding to the different photon energy spectra [12–15]. Projection-based material decomposition has the advantage of removing beam hardening artifacts and is quantitatively more accurate than image-based material decomposition. However, image-based material decomposition is still used since it is easier to implement, especially if the spectral projections are not perfectly spatially registered (e.g. in a dual source dual energy system). Noise amplification on material decomposed images [1, 16] remains a fundamental problem because it limits the utility of material selective images. This applies in particular to medical imaging, where the desire for low radiation exposure of the patient leads to increased noise levels. The importance of suppressing noise and improving the image quality is reflected by the development of several denoising algorithms for material decomposition in spectral CT [15, 17–19]. Another approach is to incorporate the structural correlations between different spectral images prior to material decomposition. This can be realized by creating a reference image (e.g. by summing up all spectral channels) which is then used for noise reduction in the individual spectral images. Different strategies based on similarity matrices [20], total-variation regularization [21] and the correlation coefficient [22] have been investigated.

In recent years, dictionary-based denoising has received a lot of attention in the scientific community. The key idea is to model small image patches as a sparse linear combination of learned dictionary atoms. Motivated by the success of the method in fields like image/video denoising [23, 24] and magnetic resonance imaging [25], this technique has also been applied to CT image reconstruction. Dictionary-based denoising was used for artifact suppression [26] and for few-view [27] as well as low-dose [28] image reconstruction in conventional CT. The unique capability of learned dictionaries to differentiate between noise and features leads to strong noise suppression. Recently, a tensor-based dictionary learning approach for spectral CT reconstruction has been proposed [29]. By generalizing the image patches to tensors which include the spectral dimension, the structural correlation between spectral images is taken into account. However, the strong noise correlations make it difficult to translate this approach directly to material selective images. In this work, we present a new dictionary-based algorithm for denoising material selective images.
It exploits the structural correlation between basis material images as well as the fact that noise in material selective images is typically highly anti-correlated [15]. Since the algorithm is applied to basis material images as a post-processing step, it is applicable to both image-based and projection-based material decomposition. We demonstrate, on simulated as well as real measurement data, that the denoising algorithm leads to strongly improved image quality. Moreover, our new algorithm achieves superior image quality compared to two other post-processing methods, namely conventional dictionary denoising and bilateral filtering [30].

2. Methods

Our algorithm for denoising basis material images is a modification of the well-established dictionary denoising process, aimed at exploiting the structural correlation between material selective images as well as the anti-correlated noise. First, we describe dictionary denoising, which serves as the basis for our new algorithm, before explaining the modifications made to achieve enhanced performance for basis material images.

2.1. Dictionary denoising

In dictionary denoising, the image is divided into small overlapping patches which are processed individually. The final denoised image is then compiled from the individually processed image patches. The degree of overlap of the patches can be described by the sliding distance, which is defined as the distance (measured in voxels) between the centers of adjacent image patches. For illustration purposes, figure 1 shows the extraction of overlapping patches (size 3 × 3 pixels) from a 2D image, using a sliding distance of two pixels. However, we chose to use 3D cubic patches for our algorithm in order to avoid horizontal streak artifacts in sagittal or coronal views [31].

The key assumption of dictionary denoising is that natural image patches have a sparse representation in a suitable basis. This means that they can be modeled as a linear combination of a small number n of basis functions, with $n\ll N$, where N denotes the number of voxels in an image patch. To ensure an optimal sparse representation, the basis functions (also called dictionary atoms) are obtained from application-specific training images by a dictionary learning algorithm. The final dictionary represents an over-complete basis comprised of K ($K\gg N$) dictionary atoms which reflect typical structures occurring in the training images. In general, learning the dictionary atoms from application-specific training images allows for a sparser representation of the image patches than a generic basis. Contrary to structures and image features, noise cannot be sparsely represented in the dictionary basis. By modeling the image patches as a linear combination of a few dictionary atoms, noise is therefore effectively suppressed. Since dictionary denoising operates on image patches instead of individual voxels and is able to 'recognize' image features, the tradeoff between noise and resolution can be partially mitigated. This tradeoff is typical for many denoising methods which operate on individual voxels (e.g. filtering, nearest-neighbor-based regularization).

In the following, the mathematical description of dictionary denoising is introduced. An image patch of $d\times d\times d$ voxels can be expressed as an N-dimensional vector $x\in {{\mathbb{R}}}^{N}$, $N={d}^{3}$. The dictionary comprised of K atoms is written as a matrix $D\in {{\mathbb{R}}}^{N\times K}$ whose columns represent the dictionary atoms. The denoising problem for one patch can then be formulated as:

$\hat{\alpha} = \arg\min_{\alpha} \parallel \alpha \parallel_{0} \quad \text{subject to} \quad \parallel x - D\alpha \parallel_{2}^{2} \leqslant \epsilon \qquad (1)$

where $\alpha \in {{\mathbb{R}}}^{K}$ is a vector with few nonzero entries, $\epsilon \gt 0$ is a small error tolerance and $\parallel \bullet {\parallel }_{0}$ is the l0-norm. Because solving equation (1) is NP-hard, greedy algorithms are employed to compute an approximate solution in acceptable time. We chose to use the orthogonal matching pursuit algorithm [32]. The dictionary atoms were learned from training images before the denoising process by solving the following optimization problem:

$\min_{D,\{\alpha_{s}\}} \sum_{s} \parallel \alpha_{s} \parallel_{0} \quad \text{subject to} \quad \parallel x_{s} - D\alpha_{s} \parallel_{2}^{2} \leqslant \epsilon \;\; \forall s \qquad (2)$

with the help of the fast online learning method [33]. The index s indicates the different patches which were extracted from the training images. We chose to employ a 'global dictionary' approach. This means that the dictionary is fixed prior to the denoising process and not dynamically adapted during denoising ('adaptive dictionary'). Using a fixed dictionary saves computational time and the resulting image quality is comparable to adaptive dictionary denoising for conventional CT images [28].
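For illustration, the dictionary learning and sparse coding steps can be sketched in a few lines of Python with scikit-learn. The random stand-in data, the reduced patch and dictionary sizes and the tolerance value below are our own assumptions for a runnable example, not the settings used in this work:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# Stand-in for flattened d*d*d training patches (here d = 4, i.e. N = 64;
# this work uses 8x8x8 patches and K = 2048 atoms).
train_patches = rng.standard_normal((5000, 64))

# Dictionary learning (equation (2)) with an online mini-batch method, cf. [33].
K = 256
learner = MiniBatchDictionaryLearning(n_components=K, batch_size=128,
                                      random_state=0)
D = learner.fit(train_patches).components_.T   # shape (N, K), atoms as columns

# Sparse coding of one noisy patch (equation (1)): greedily approximate the
# l0 problem with orthogonal matching pursuit, cf. [32].
x = rng.standard_normal(64)                    # stand-in for a noisy patch
eps = 1.0                                      # squared-error tolerance (epsilon)
alpha = orthogonal_mp(D, x, tol=eps)           # coefficient vector, few nonzeros
x_denoised = D @ alpha                         # patch rebuilt from few atoms
```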


Figure 1. Graphical illustration of covering a 2D-image with overlapping patches (size 3 × 3 pixels), using a sliding distance of two pixels.


2.2. Modifications for denoising of basis material images

Although our algorithm theoretically works with more than two basis material images, we will focus on two-material decomposition in the following. The performance of the algorithm for three-material decomposition will be investigated in the future. Material decomposition algorithms for spectral CT typically produce two basis material images with highly anti-correlated noise. Suitable linear combinations of the basis material images yield virtual monochromatic images which represent the attenuation at a certain reference energy. The associated coefficients of the linear combination are positive for both basis materials and therefore anti-correlated noise is decreased. By choosing the reference energy at which the anti-correlated noise maximally cancels out, a virtual monochromatic image with substantially improved signal-to-noise ratio (SNR) compared to the basis material images is obtained. In general, it is much easier to separate image features from noise in this virtual monochromatic image (called minimum noise image in the following) than performing the same task for the basis material images. Furthermore, the minimum noise image shares the same structures and edges with the basis material images while the voxel values in all three images are different. The image patches xs can be assumed to be the sum of their ground truth values (${x}_{s}^{t}$) and noise (${x}_{s}^{n}$):

${x}_{s} = {x}_{s}^{t} + {x}_{s}^{n} \qquad (3)$

with s being the image patch index. A key assumption for the following steps is that the ground truth values of the minimum noise image patches and the basis material image patches are related by linear transformations:

${x}_{s}^{a,t} = {\beta }_{s}^{a}\,({x}_{s}^{o,t} - {m}_{s}^{o}) + {m}_{s}^{a}, \qquad {x}_{s}^{b,t} = {\beta }_{s}^{b}\,({x}_{s}^{o,t} - {m}_{s}^{o}) + {m}_{s}^{b} \qquad (4)$

where ${x}_{s}^{a,t}$, ${x}_{s}^{b,t}$ and ${x}_{s}^{o,t}$ indicate the ground truth values of the sth image patch of basis material image a, basis material image b and the minimum noise image, respectively, and ${m}_{s}^{o}$ denotes the mean value of the minimum noise image patch. The fit coefficients for the linear transformation from the minimum noise image patches to the image patches of basis materials a and b are denoted by ${m}_{s}^{a},{\beta }_{s}^{a}$ and ${m}_{s}^{b},{\beta }_{s}^{b}$, respectively. In the following, the noisy image patches are denoted by ${x}_{s}^{a},{x}_{s}^{b}$ and ${x}_{s}^{o}$.
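To illustrate how a minimum noise image could be formed, the following sketch picks the relative weighting of the two basis material images that minimizes the noise variance of their linear combination, estimated from a homogeneous image region. The function name and the variance-based criterion are our assumptions for illustration; in this work, the virtual monochromatic image is selected at the reference energy where the anti-correlated noise maximally cancels out:

```python
import numpy as np

def min_noise_combination(img_a, img_b, flat_region):
    """Return the combination t*img_a + (1-t)*img_b with minimal noise
    variance, with t estimated from a homogeneous region (a stand-in for
    scanning virtual monochromatic reference energies)."""
    a = img_a[flat_region].ravel()
    b = img_b[flat_region].ravel()
    var_a, var_b = np.var(a), np.var(b)
    cov = np.cov(a, b)[0, 1]          # strongly negative for anti-correlated noise
    # Minimize Var(t*a + (1-t)*b) over t; setting the derivative to zero
    # yields the closed-form optimum:
    t = (var_b - cov) / (var_a + var_b - 2.0 * cov)
    return t * img_a + (1.0 - t) * img_b, t
```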

The first step of our algorithm is to subtract the mean values ${m}_{s}^{o}$ from the minimum noise image patches ${x}_{s}^{o}$ in order to identify edges and structures independent of a constant offset. From now on, offset-corrected patches will be marked with a tilde (e.g. ${\tilde{x}}_{s}^{o}$). After the offset correction, we perform dictionary denoising on the image patches ${\tilde{x}}_{s}^{o}$:

${\tilde{x}}_{s}^{o,d} = D\,{\hat{\alpha}}_{s}, \qquad {\hat{\alpha}}_{s} = \arg\min_{\alpha} \parallel \alpha \parallel_{0} \quad \text{subject to} \quad \parallel {\tilde{x}}_{s}^{o} - D\alpha \parallel_{2}^{2} \leqslant \epsilon \qquad (5)$

where ${\tilde{x}}_{s}^{o}$ denotes the sth patch from the minimum noise image and ${\tilde{x}}_{s}^{o,d}$ denotes the corresponding denoised patch. Using the assumption of equation (4), we compute the denoised basis material image patches ${x}_{s}^{a,d}$ and ${x}_{s}^{b,d}$ by applying linear transformations to the processed minimum noise image patches ${\tilde{x}}_{s}^{o,d}$. As for the minimum noise image, the mean values ${m}_{s}^{a}$ and ${m}_{s}^{b}$ are subtracted from the image patches ${x}_{s}^{a}$ and ${x}_{s}^{b}$ in order to obtain the offset-corrected image patches ${\tilde{x}}_{s}^{a}$ and ${\tilde{x}}_{s}^{b}$. In keeping with the notation of equation (4), the mean values of the image patches are used to approximate the offset coefficients of the linear transformations. The fit coefficients ${\beta }_{s}^{a}$ and ${\beta }_{s}^{b}$ are approximated by projecting the denoised and offset-corrected patches of the minimum noise image onto the offset-corrected basis material image patches:

${\beta }_{s}^{a} = \frac{\langle {\tilde{x}}_{s}^{a}, {\tilde{x}}_{s}^{o,d} \rangle}{\langle {\tilde{x}}_{s}^{o,d}, {\tilde{x}}_{s}^{o,d} \rangle}, \qquad {\beta }_{s}^{b} = \frac{\langle {\tilde{x}}_{s}^{b}, {\tilde{x}}_{s}^{o,d} \rangle}{\langle {\tilde{x}}_{s}^{o,d}, {\tilde{x}}_{s}^{o,d} \rangle} \qquad (6)$

where $\langle \bullet \rangle $ signifies the scalar product. We obtain the final denoised basis material image patches ${x}_{s}^{a,d}$ and ${x}_{s}^{b,d}$ by performing the following linear transformations with the denoised and offset-corrected minimum noise image patches ${\tilde{x}}_{s}^{o,d}$:

${x}_{s}^{a,d} = {\beta }_{s}^{a}\,{\tilde{x}}_{s}^{o,d} + {m}_{s}^{a}, \qquad {x}_{s}^{b,d} = {\beta }_{s}^{b}\,{\tilde{x}}_{s}^{o,d} + {m}_{s}^{b} \qquad (7)$

In summary, dictionary denoising is used to identify structures and edges in the minimum noise image. This process works much more efficiently and reliably on the minimum noise image compared to the basis material images because of the improved image quality and the reduced noise level. Exploiting the structural correlations of the basis material images and the minimum noise image, the denoised basis material images are calculated by applying local linear transformations to the processed minimum noise image. Figure 2 shows a graphical representation of the aforementioned joint dictionary denoising algorithm and summarizes the most important steps.
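A compact Python sketch of the per-patch processing (equations (5)–(7)) may help clarify the procedure. The helper names are ours, and `denoise` stands for any patch denoiser, e.g. the OMP sketch in section 2.1; the numerical guard against flat patches is an implementation assumption:

```python
import numpy as np

def joint_denoise_patch(x_a, x_b, x_o, denoise):
    """Denoise one pair of co-located basis material patches (x_a, x_b)
    via the minimum noise patch x_o, following equations (5)-(7).
    All patches are flattened 1D arrays; `denoise` is a patch denoiser."""
    m_a, m_b, m_o = x_a.mean(), x_b.mean(), x_o.mean()
    xt_a, xt_b, xt_o = x_a - m_a, x_b - m_b, x_o - m_o   # offset correction

    xt_o_d = denoise(xt_o)                               # equation (5)

    norm = np.dot(xt_o_d, xt_o_d) + 1e-12                # guard for flat patches
    beta_a = np.dot(xt_a, xt_o_d) / norm                 # equation (6)
    beta_b = np.dot(xt_b, xt_o_d) / norm

    x_a_d = beta_a * xt_o_d + m_a                        # equation (7)
    x_b_d = beta_b * xt_o_d + m_b
    return x_a_d, x_b_d
```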


Figure 2. Graphical representation of the dictionary-based algorithm for denoising of basis material images, using one image patch as an example. The key steps of the algorithm are: (1) extract image patches from the same locations in the basis material images and the minimum noise image. (2) Subtract the mean values of the image patches. (3) Apply dictionary denoising to the offset-corrected minimum noise patch. (4) Calculate ${\beta }_{s}^{a}$ and ${\beta }_{s}^{b}$ using equation (6). (5) Linearly transform the processed minimum noise image patch to obtain the corresponding basis material image patches according to equation (7). (6) The final basis material images are compiled using all the denoised basis material image patches, compare equation (8).


The size of the image patches plays an important role in the performance of the algorithm. On the one hand, larger patches lead to better dictionary denoising results and the fit coefficients for the linear transformations can be determined more reliably. On the other hand, the assumption of equation (4) might no longer be fulfilled if the patch size is chosen too large. Furthermore, the computational complexity increases drastically with increasing patch size. We chose a patch size of $8\times 8\times 8$ voxels for our experiments to balance the aforementioned tradeoffs. Additionally, the computational complexity of the algorithm can be reduced by increasing the sliding distance ${d}_{s}$ of the image patches, since the total number of image patches is proportional to $1/{d}_{s}^{3}$. However, we found that the image quality decreases slightly with increasing sliding distance. We chose to use a sliding distance of two voxels for our experiments.

In order to reduce blocking artifacts [23], which become more prevalent with increasing sliding distance, we introduce a non-uniform weighting scheme for the contributions of different image patches to a certain voxel:

${v}^{d} = \frac{\sum_{p=1}^{P} {w}_{p}\,{v}_{p}^{d}}{\sum_{p=1}^{P} {w}_{p}}, \qquad {w}_{p} = {\mathrm{e}}^{-{r}_{p}^{2}/\delta} \qquad (8)$

where ${v}^{d}$ indicates the value of the voxel v after denoising. The summation index p runs over all P image patches which contain the voxel v. The weights ${w}_{p}$ decrease exponentially with the squared distance ${r}_{p}^{2}$ between the voxel v and the center of the image patch p. The quantity ${v}_{p}^{d}$ denotes the value of voxel v in image patch p after denoising and δ is an adjustable parameter. With this weighting scheme, voxels close to the patch centers are given more weight. This reflects the idea that structures at the centers of the image patches can be detected more reliably than structures at the boundaries of the patches. Heuristically, we found that $\delta =8.5$ gives the best results for a patch size of $8\times 8\times 8$ voxels and a sliding distance of two voxels. We use the FORBILD head phantom [34] (size ${800}^{3}$ voxels) as the training image. The final dictionary contains 2048 atoms with 512 (= ${8}^{3}$) voxels each.
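The weighted aggregation of equation (8) can be sketched as follows; the corner-based patch indexing and the numerical guard for uncovered voxels are implementation assumptions:

```python
import numpy as np

def aggregate_patches(patches, corners, shape, d=8, delta=8.5):
    """Compile a volume from overlapping denoised 3D patches using the
    exponential weighting of equation (8). `patches`: list of (d,d,d)
    arrays; `corners`: list of (i,j,k) corner indices of each patch."""
    acc = np.zeros(shape)
    wsum = np.zeros(shape)
    # squared distance of every in-patch voxel to the patch center
    ax = np.arange(d) - (d - 1) / 2.0
    r2 = ax[:, None, None]**2 + ax[None, :, None]**2 + ax[None, None, :]**2
    w = np.exp(-r2 / delta)                       # w_p in equation (8)
    for patch, (i, j, k) in zip(patches, corners):
        acc[i:i+d, j:j+d, k:k+d] += w * patch
        wsum[i:i+d, j:j+d, k:k+d] += w
    return acc / np.maximum(wsum, 1e-12)          # normalized weighted average
```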

2.3. Bilateral filtering

Bilateral filtering is a well-known noise-reducing and edge-preserving post-processing method in the field of image processing. Similar to our algorithm, an adapted version of the bilateral filter is able to exploit structural correlations between several aligned images. We therefore chose to compare our algorithm to this adapted version of the bilateral filter. Conventional bilateral filtering is a generalization of Gaussian smoothing and can be written as:

${v}_{i}^{d} = \frac{\sum_{j \in {N}_{i}} f({r}_{{ij}})\,{\mathrm{e}}^{-{({v}_{i}-{v}_{j})}^{2}/2{\sigma }^{2}}\,{v}_{j}}{\sum_{j \in {N}_{i}} f({r}_{{ij}})\,{\mathrm{e}}^{-{({v}_{i}-{v}_{j})}^{2}/2{\sigma }^{2}}} \qquad (9)$

where ${N}_{i}$ is a geometrical neighborhood of voxel ${v}_{i}$ and $f({r}_{{ij}})$ is a distance dependent weighting factor (${r}_{{ij}}$ is the geometrical distance between the voxels ${v}_{i}$ and ${v}_{j}$). The weighting factor ${{\rm{e}}}^{-{({v}_{i}-{v}_{j})}^{2}/2{\sigma }^{2}}$ controls the degree of averaging depending on the difference between the voxel values. The idea is to suppress averaging across edges and to encourage averaging if the voxel values are close to each other. To ensure this, the tuning parameter σ is normally chosen to be comparable to the image noise level. A generalization of this idea is to suppress the averaging process if an edge is detected in either of the two images a and b:

${v}_{i}^{a,d} = \frac{\sum_{j \in {N}_{i}} f({r}_{{ij}})\,{\mathrm{e}}^{-{({v}_{i}^{a}-{v}_{j}^{a})}^{2}/2{\sigma }_{a}^{2}}\,{\mathrm{e}}^{-{({v}_{i}^{b}-{v}_{j}^{b})}^{2}/2{\sigma }_{b}^{2}}\,{v}_{j}^{a}}{\sum_{j \in {N}_{i}} f({r}_{{ij}})\,{\mathrm{e}}^{-{({v}_{i}^{a}-{v}_{j}^{a})}^{2}/2{\sigma }_{a}^{2}}\,{\mathrm{e}}^{-{({v}_{i}^{b}-{v}_{j}^{b})}^{2}/2{\sigma }_{b}^{2}}} \qquad (10)$

with the analogous expression for basis material image b.

In order to make bilateral filtering and dictionary denoising more comparable, we used a 3D version of the adapted bilateral filter with a geometrical neighborhood of the same size as the dictionary patches ($8\times 8\times 8$ voxels).
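A reference implementation of the adapted filter might look as follows. The Gaussian form of $f({r}_{{ij}})$, the periodic boundary handling via np.roll and the parameter defaults are simplifying assumptions (a radius of 4 gives a 9 × 9 × 9 neighborhood, comparable to the $8\times 8\times 8$ dictionary patches):

```python
import itertools
import numpy as np

def joint_bilateral_3d(img_a, img_b, sigma_a, sigma_b, sigma_d=2.0, radius=4):
    """Adapted 3D bilateral filter of equation (10): averaging is suppressed
    if an edge is present in either basis material image. Returns the
    filtered img_a; filtering img_b is symmetric. Boundaries wrap around
    (np.roll), which is acceptable for an illustrative sketch."""
    num = np.zeros_like(img_a, dtype=float)
    den = np.zeros_like(img_a, dtype=float)
    offsets = range(-radius, radius + 1)
    for di, dj, dk in itertools.product(offsets, repeat=3):
        shifted_a = np.roll(img_a, (di, dj, dk), axis=(0, 1, 2))
        shifted_b = np.roll(img_b, (di, dj, dk), axis=(0, 1, 2))
        w = np.exp(-(di**2 + dj**2 + dk**2) / (2.0 * sigma_d**2))   # f(r_ij)
        w = w * np.exp(-(img_a - shifted_a)**2 / (2.0 * sigma_a**2))
        w = w * np.exp(-(img_b - shifted_b)**2 / (2.0 * sigma_b**2))
        num += w * shifted_a
        den += w
    return num / den
```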

2.4. Numerical simulation

A spectral CT scan of the FORBILD thorax phantom [34] (size ${768}^{3}$ voxels) was simulated. We assumed acceleration voltages of 100 and $140\ \mathrm{kVp}$ for the low and high energy scan, respectively. In both cases, the spectrum was filtered with $0.2\ \mathrm{mm}$ of copper, and an ideal energy-integrating detector with a CsI-based scintillation layer ($1\ \mathrm{mm}$ thickness) was assumed. After reconstruction of the low and high energy images via filtered backprojection, an image-based material decomposition into a bone and a soft tissue image was conducted using direct matrix inversion.
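The matrix-inversion decomposition step can be sketched as follows; the coefficient values below are placeholders, since the real effective attenuation coefficients must come from a calibration:

```python
import numpy as np

def decompose_two_materials(mu_low, mu_high, M):
    """Image-based two-material decomposition by direct matrix inversion.
    mu_low, mu_high: reconstructed attenuation images for the two spectra.
    M: 2x2 matrix of effective basis material attenuation coefficients
    (rows = spectra, columns = materials), obtained from calibration.
    The inversion amplifies noise and makes it anti-correlated between
    the two output images."""
    a, b = np.linalg.inv(M) @ np.stack([mu_low.ravel(), mu_high.ravel()])
    return a.reshape(mu_low.shape), b.reshape(mu_low.shape)

# Placeholder coefficients, for illustration only:
M = np.array([[0.50, 0.22],    # low-kVp spectrum
              [0.30, 0.18]])   # high-kVp spectrum
```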

2.5. Experimental measurement

A CT scan of a human knee was conducted with an experimental setup. The use of the knee specimen for research projects was approved by the institutional review board. The donor had dedicated their body for educational and research purposes, and had provided written informed consent prior to death, in compliance with local institutional and legislative requirements. The ex vivo human knee specimen was fixed in formalin. The tube was operated at an acceleration voltage of $110\ \mathrm{kVp}$, and a CdTe-based photon-counting detector (XC-Flite FX1, XCounter AB, pixel size $200\ \mu {\rm{m}}\times 200\ \mu {\rm{m}}$) with thresholds set to 27 and $52\ \mathrm{keV}$ was used. In total, 1201 projections were taken and the tube loading was $131\ \mathrm{mAs}$. A projection-based material decomposition algorithm [35] was applied in order to obtain basis material images representing Compton scattering and photoelectric absorption.

3. Results

To investigate the performance of the new algorithm, we applied it as a post-processing method to basis material images obtained from projection-based as well as image-based material decomposition techniques. Furthermore, we compare the performance of our new joint dictionary denoising algorithm to adapted bilateral filtering as well as conventional dictionary denoising. Before turning to experimental measurements, we first present the results of a numerical simulation, since this allows the denoised images to be compared with a ground truth.

The bottom row of figure 3 shows the simulated bone (d) and soft tissue image (e) as well as the minimum noise image (f). In the top row of figure 3, the ground truth values for the bone (a) and soft tissue image (b) as well as the minimum noise image (c) are displayed. The top row of figure 4 shows the results of applying joint dictionary denoising (a), conventional dictionary denoising (b) and bilateral filtering (c) to the bone image. Similarly, soft tissue images processed with our new algorithm (d), conventional dictionary denoising (e) and bilateral filtering (f) are displayed in the bottom row of figure 4. The tuning parameters of the algorithms ($\epsilon$ for dictionary-based denoising and σ for bilateral filtering) were optimized by maximizing the structural similarity index (SSI), using the ground truth images as references. Since only one parameter was tuned for all denoising algorithms, this approach is not prone to overfitting. In table 1, the mean squared error (MSE) and SSI compared to the ground truth are given for the various denoising methods.
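For reference, the two figures of merit of table 1 can be computed as follows; the exact SSI implementation used here is not specified in the text, so the scikit-image default (SSIM) is an assumption:

```python
import numpy as np
from skimage.metrics import structural_similarity

def normalized_mse(denoised, truth, unprocessed):
    """MSE against the ground truth, normalized to the unprocessed
    basis material image (as in table 1)."""
    return np.mean((denoised - truth)**2) / np.mean((unprocessed - truth)**2)

def ssi(denoised, truth):
    """Structural similarity index against the ground truth."""
    data_range = truth.max() - truth.min()
    return structural_similarity(denoised, truth, data_range=data_range)
```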


Figure 3. Bone (a), (d), soft tissue (b), (e) and minimum noise (c), (f) images for the numerical simulation. The top row shows the ground truth while the bottom row shows the noisy images obtained from a simulated low-dose scan of the phantom. The range of the windows is [−0.19, 0.58 g cm−3], [0.42, 1.38 g cm−3] and [0.19, 0.33 cm−1] for the bone, soft tissue and minimum noise images, respectively.


Figure 4. Comparison of different denoising methods for the numerical simulation. The bone and soft tissue images are shown in the top and bottom row, respectively. The columns of the figure represent the different denoising methods: joint dictionary denoising (a), (d), conventional dictionary denoising (b), (e) and bilateral filtering (c), (f). The range of the windows is [−0.19, 0.58 g cm−3] and [0.42, 1.38 g cm−3] for the bone and soft tissue images, respectively.


Table 1.  Mean squared error (MSE) and structural similarity index (SSI) for the bone and soft tissue image compared to the ground truth for various denoising methods. The MSE was normalized with respect to the unprocessed basis material images.

                         No denoising   Bilateral filtering   Conventional dictionary denoising   Joint dictionary denoising
MSE, bone image                     1                 0.746                               0.337                        0.296
MSE, soft tissue image              1                 0.414                               0.222                        0.184
SSI, bone image                 0.332                 0.888                               0.946                        0.953
SSI, soft tissue image          0.249                 0.807                               0.906                        0.942

Figure 5 shows the influence of the dictionary patch size and the sliding distance on the image quality for joint dictionary denoising. The same parameter $\epsilon$ as in figure 4 (scaled by the dictionary patch size) was used for all experiments.


Figure 5. Influence of dictionary patch size (a) and sliding distance (b) on image quality for joint dictionary denoising. The image quality was measured by calculating the structural similarity compared to the ground truth.


Figure 6 shows images of a human knee in transverse slice orientation at the level of the patello-femoral joint. Figures 6(a) and (b) show the basis images for Compton and photo effect, respectively. In figure 6(c), the corresponding minimum noise image is displayed. Figure 7 compares the results of applying different denoising methods (bilateral filtering, conventional and joint dictionary denoising) to the Compton and photo images displayed in figures 6(a) and (b). Since there was no reference image available, the denoising parameters were visually tuned to achieve a comparable edge sharpness for all methods. The following denoising parameters were used: joint dictionary denoising: $\epsilon =0.1807\ {{\rm{cm}}}^{-2}$, conventional dictionary denoising: ${\epsilon }_{\mathrm{Compton}}=0.3315\ {{\rm{cm}}}^{-2}$, ${\epsilon }_{\mathrm{photo}}=2.885\times {10}^{10}\ {{\rm{keV}}}^{6}\,{{\rm{cm}}}^{-2}$, bilateral filtering: ${\sigma }_{\mathrm{Compton}}^{2}=1.004\times {10}^{-3}\ {{\rm{cm}}}^{-2}$, ${\sigma }_{\mathrm{photo}}^{2}=8.084\times {10}^{7}\ {{\rm{keV}}}^{6}\,{{\rm{cm}}}^{-2}.$


Figure 6. Unprocessed Compton (a), photo (b) and minimum noise (c) image for the experimental measurement of a human knee. The range of the windows is [0.0, 68400 ${{\rm{keV}}}^{3}\,{{\rm{cm}}}^{-1}$], [0.035, 0.35 ${{\rm{cm}}}^{-1}]$ and [0.09, 0.59 ${{\rm{cm}}}^{-1}]$ for the photo, Compton and minimum noise image, respectively.


Figure 7. Comparison of bilateral filtering (c), (f), joint (a), (d) and conventional (b), (e) dictionary denoising for the experimental measurement of a human knee. The top row shows Compton images whereas the bottom row shows photo images. The range of the windows is [0.0, 68400 ${{\rm{keV}}}^{3}\,{{\rm{cm}}}^{-1}]$ and [0.035, 0.35 ${{\rm{cm}}}^{-1}]$ for the photo and Compton images, respectively.


4. Discussion

For the numerical simulation, material decomposition via matrix inversion leads to a strong degradation of the SNR and highly anti-correlated noise in the material selective images. Therefore, the noise level can be greatly reduced by calculating the virtual monochromatic image at which the anti-correlated noise cancels out maximally (see figure 3(f)). Joint dictionary denoising leads to basis material images with strongly improved image quality compared to the unprocessed images. The processed images look similar to the ground truth images. The most apparent differences are slightly blurred edges and the presence of a small amount of low frequency noise in the denoised images. Numerical experiments show that this low frequency noise is mostly caused by the uncertainties in determining the correct mean values (${m}_{s}^{a},{m}_{s}^{b}$) of the image patches. However, since the variation (due to noise) of the mean value of an image patch is much smaller than the variation of an individual voxel, the noise level is reduced compared to the unprocessed images. In order to achieve a similar noise reduction with conventional dictionary denoising, spatial resolution and edge sharpness have to be sacrificed (compare the zoomed region of figures 4(b) and (e)). The loss of edge sharpness only occurs for the lower contrast features in the soft tissue region; conventional dictionary denoising is still able to accurately distinguish the edges between soft tissue and bone from noise. In the case of bilateral filtering, reduced edge sharpness in the soft tissue region and an increased noise level compared to conventional dictionary denoising can be observed. These qualitative statements are supported by quantitative image quality measurements: in terms of MSE and SSI, the images produced by our algorithm are notably closer to the ground truth than those produced by bilateral filtering and conventional dictionary denoising.

Depending on experimental parameters, one could imagine a scenario where structures of the basis material images cancel (almost) completely in the minimum noise image. In this case, structures and edges could get lost in the basis material images because they are compiled from linearly transformed minimum noise image patches. This is a limitation of the joint dictionary denoising algorithm in its current form. A possible extension of the algorithm could compare several virtual monochromatic images and locally choose the one with the best SNR. However, this would require a method to locally estimate the image signal.

As figure 5 shows, the image quality of both basis material images becomes worse if the dictionary patch size is reduced. This effect is more pronounced for the soft tissue image: since the soft tissue image has a higher noise level, the uncertainties in determining the fit parameters for the linear transformations (compare equation (4)) grow faster with decreasing patch size. The image quality also deteriorates with increasing sliding distance, although up to a sliding distance of three voxels it is only marginally reduced. It is therefore reasonable to use a sliding distance of two or three voxels to save computational time.

The goal of the experimental measurement was to demonstrate that our algorithm achieves a strong improvement in image quality for a clinically relevant image with complicated structures. Compared to the minimum noise image (figure 6(c)), the basis material images (figures 6(a) and (b)) show a decreased SNR and the noise is anti-correlated. In contrast to the numerical simulation, where maximizing the SSI leads to a comparable noise level for all denoising methods, the denoising parameters for the experimental measurement were visually tuned to achieve comparable edge sharpness for all methods. As can be seen from figures 7(a) and (d), joint dictionary denoising efficiently removes noise from the basis material images, while fine structures and features (visible in the minimum noise image) can be clearly identified. In the unprocessed basis material images, most of these structures vanish in the noise. Consequently, the image quality of the basis material images can be greatly improved by applying joint dictionary denoising. Bilateral filtering and conventional dictionary denoising lead to higher noise levels and some image features are lost (see for example the top-right region of the images in figure 7).

5. Conclusion

We have developed a new method for denoising of basis material images in spectral CT. As a post-processing method, it can be used for image-based as well as projection-based material decomposition techniques. The algorithm is based on the capability of learned dictionaries to preserve image features while suppressing noise. We have introduced several modifications of the conventional dictionary denoising algorithm in order to exploit the structural correlations of basis material images as well as the anti-correlated noise. Dictionary denoising is applied to the virtual monochromatic image at which the anti-correlated noise maximally cancels out. Suppressing noise and identifying image features is in general much more efficient and reliable for this minimum noise image compared to performing the same task directly on the basis material images. An exception to this occurs if structures cancel out in the minimum noise image, which is a potential limitation of the algorithm. The denoised basis material images are subsequently calculated by applying linear transformations [36] to the processed minimum noise image patches. We demonstrated that post-processing basis material images with the proposed algorithm leads to highly improved image quality. Noise is strongly suppressed while almost no blurring of edges and structures occurs. Furthermore, our joint dictionary denoising algorithm leads to superior image quality compared to conventional dictionary denoising and bilateral filtering. Improving the image quality of basis material images is an important goal because noise amplification on material decomposed images is a fundamental problem of spectral CT. This applies in particular to medical imaging where the desire for low radiation exposure of the patient leads to increased noise levels. Our algorithm therefore has the potential to improve the usability of basis material images for various tasks such as artifact reduction, quantitative imaging and clinical diagnosis.

Acknowledgments

We acknowledge financial support through the European Research Council (ERC, H2020, AdG 695045), the DFG Cluster of Excellence Munich-Centre for Advanced Photonics (MAP), the DFG Gottfried Wilhelm Leibniz program and the support of the TUM Institute for Advanced Study, funded by the German Excellence Initiative.
