Background
Computed tomography (CT) is one of the most widely used imaging modalities in medicine, and its usage has grown consistently over the past decades [1]. Although CT is an indispensable tool in modern medicine, one of its drawbacks is that the radiation used in this type of imaging can be harmful to patient health. Reducing the radiation dose requires reducing the number of projection measurements or the radiation intensity, which in turn results in noisier measurements. Reconstructing a high-quality CT image from such measurements is a great challenge. The traditional image reconstruction methods in CT are based on filtered back-projection (FBP). Although FBP-based methods are fast and relatively easy to implement, they perform very poorly when the number of projections is reduced or when the measurements are very noisy. With increased awareness of the harmful effects of excessive radiation, recent years have witnessed renewed interest in statistical reconstruction methods in CT. These methods promise to reconstruct a high-quality image from undersampled or noisy projections. Statistical image reconstruction methods in CT are based, as their name implies, on including the statistics of the detected photon counts in the reconstruction process. They can be classified into three categories [2, 3]:
(A)
Methods that work on the raw data (sinogram): these are mostly smoothing techniques that aim at reducing the Poisson noise in the projection data. The processed sinogram is then used to reconstruct the image with an iterative or filtered back-projection (FBP) method.
(B)
Methods that work on the reconstructed image: these are also denoising techniques, but they operate in the image domain.
(C)
Methods that use statistical modeling of the imaging process to devise iterative algorithms for image reconstruction.
Whereas the methods in category (C) usually lead to better image quality, the first two types of methods are usually easier to implement, faster, and independent of the image reconstruction method. Among the first two categories of methods, category (A) is usually much more effective. This is because while we have a good understanding of the properties of the noise in the raw (i.e. projection) data, the statistics of the noise in the reconstructed image are not as well understood. Hence, methods that filter the reconstructed image are post-processing algorithms that cannot use the photon statistics and their performance is in general inferior to filtering in the raw data domain.
Previous studies in sinogram denoising are numerous, and they present a diverse set of possible approaches to the problem. Because the variance of the Poisson noise changes with the signal amplitude, adaptive filters are a natural choice and have been researched extensively [4, 5]. Bayesian approaches, usually maximizing a weighted sum of the data likelihood and a regularity term, have also been proposed in several studies [6, 7]. These algorithms can be very elaborate because the resulting optimization problem is usually hard to solve. At the other end of the spectrum of algorithmic complexity, there exist very simple methods involving shift-invariant low-pass filters that reduce the high-frequency variations in the sinogram [8, 9]. As one might expect, these methods are much less effective because they do not consider the signal-dependent nature of the noise. Another class of algorithms that have been used for sinogram denoising comprises multi-resolution methods involving short-time Fourier and wavelet transforms [10, 11]. These methods are based on the standard signal-processing approach of separating the noise from the signal in the transform domain. A number of studies have focused on denoising the projection data in the Radon space (i.e., after the logarithm transformation). Even though the photon counts follow a Poisson distribution, the noise in the Radon space follows an approximately Gaussian distribution [12, 13]. Therefore, some studies have proposed algorithms for noise suppression in the Radon space based on the assumption that the noise is normally distributed [14, 15]. The Gaussianity assumption significantly simplifies the problem and widens the range of tools that can be employed in algorithm development. However, as the photon counts decrease, the distribution of the noise in the Radon space can depart significantly from a Gaussian distribution [13]. Therefore, these methods become less effective in low-dose settings, which is exactly where noise suppression is most needed.
In the past decade, image denoising has witnessed a growing interest in patch-based methods. The two main classes of patch-based denoising methods are dictionary-based methods and nonlocal-means methods. Dictionary-based denoising methods rest on the assumption that small patches of natural images have a sparse representation in a (usually overcomplete) dictionary that can be learned from training data [16, 17]. As opposed to more traditional denoising methods that use an off-the-shelf basis (e.g., a wavelet basis), these dictionary-based methods adapt the dictionary to the specific class of images at hand. If the dictionary is well trained, genuine image features will have a sparse representation in the dictionary, whereas noise will not. Therefore, sparse representation of image patches leads to denoising. Dictionary-based methods have been shown to be highly successful in denoising general natural images [16, 17] as well as medical images [18, 19]. The nonlocal-means methods, on the other hand, exploit the self-similarities in natural images. To estimate the denoised value of each pixel, they consider a small patch centered on that pixel and search the image for patches similar to it. The true (i.e., denoised) value of the pixel is estimated by some collaborative filtering of the similar patches found. This collaborative filtering took the form of a weighted averaging in the original nonlocal-means algorithm [20, 21], but has much more elaborate forms in more recent algorithms [22]. Although the origins of these denoising methods go back approximately 10 years, they have been extended and improved in many ways, and they represent some of the best available image denoising methods. One limitation of patch-based methods, especially for the large 3D images that are common in medical applications, is their high computational demand.
In this study, we suggest smoothing the noisy sinograms by minimizing a cost function that consists of a data likelihood term and a regularization term that promotes gradient sparsity. The resulting optimality condition suggests an adaptive filter that carries out a stronger denoising where the signal intensity is higher, which is the expected outcome under Poisson distribution. Instead of solving the resulting optimization problem directly, we suggest modifying an existing algorithm such that it approximates the exact solution locally. Therefore, our approach computes the denoised value of each sinogram pixel by solving a local optimization problem in a small neighborhood around that pixel. We will evaluate the performance of the suggested approach by applying it to noisy simulated projections and low-dose projections acquired from a micro-CT scanner.
Methods
Given a noisy image v, the goal is to find an image u that ideally preserves all important features in the image while reducing the unwanted noise. Since this is an ill-posed inverse problem, additional information needs to be introduced to regularize the problem. The denoised image can then be found by minimizing a cost function of the form:
$$ E_{\lambda}(u)= \frac{1}{2} ||u-v||_{2}^{2} + \lambda R(u) $$
(1)
The first term in \(E_{\lambda }(u)\) reflects the requirement that the denoised image should be close to the noisy image, and the second term is the regularization. In the context of the denoising problem formulated as (1), a proper regularization is necessary because without a regularizing term the noisy image itself is the only solution of the problem; i.e., if λ=0 then u=v is the only minimizer of \(E_{\lambda }(u)\). A famous example of variational denoising methods is the Rudin-Osher-Fatemi (ROF) model, which has the following form [23]:
$$ E_{\lambda}(u)= \frac{1}{2} ||u-v||_{2}^{2} + \lambda \int_{\Omega} |\nabla u| $$
(2)
where Ω denotes the image domain and λ is the regularization parameter. The second term, where ∇u is the gradient of u, is the regularization term. The choice of the \(\ell _{2}\)-norm in the first term stems from the assumption of additive Gaussian noise. For the case of the Poisson distribution considered in this study, this term has to be modified.
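Before moving to the Poisson case, the discrete counterpart of the ROF energy (2) is worth writing down explicitly. The sketch below (Python with NumPy; the function name and the forward-difference, isotropic discretization of the TV term are our assumptions) makes the two competing terms concrete:

```python
import numpy as np

def rof_energy(u, v, lam):
    """Discrete ROF energy (2): 0.5*||u - v||_2^2 + lam * TV(u).

    TV(u) is discretized with forward differences and Neumann boundary
    conditions (gradient set to zero past the last row/column).
    """
    gx = np.diff(u, axis=0, append=u[-1:, :])  # zero difference at the last row
    gy = np.diff(u, axis=1, append=u[:, -1:])  # zero difference at the last column
    tv = np.sum(np.sqrt(gx**2 + gy**2))
    return 0.5 * np.sum((u - v)**2) + lam * tv
```

For u = v the fidelity term vanishes, so the energy reduces to λ times the total variation of the noisy image; the minimizer of (2) trades these two terms against each other.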
Let u(i,j) and v(i,j) denote pixels of the discretized versions of u and v. For an arbitrary pixel location, the probability mass function of a Poisson-distributed variable gives:
$$ P(v(i,j);u(i,j))= \frac{e^{-u(i,j)} u(i,j)^{v(i,j)}}{v(i,j)!} $$
(3)
Assuming the pixel values are independent, for the entire image we have:
$$ P(v|u)= \prod\limits_{i,j} \frac{e^{-u(i,j)} u(i,j)^{v(i,j)}}{v(i,j)!} $$
(4)
We ignore the denominator, which is independent of u. Since we want a functional to minimize, we consider the negative logarithm of the numerator:
$$ -\log P(v|u) \propto \sum\limits_{i,j} \left( u(i,j)- v(i,j)\, \log u(i,j) \right) $$
(5)
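The fidelity term on the right-hand side of (5) is easy to check numerically. The sketch below (Python/NumPy; the function name is ours) evaluates it and can be used to verify that it is minimized at u = v:

```python
import numpy as np

def poisson_fidelity(u, v):
    """Data-fidelity term from Eq. (5): sum over pixels of u - v*log(u).

    The u-independent log(v!) term has been dropped; u must be positive.
    """
    return np.sum(u - v * np.log(u))
```

Differentiating the summand with respect to u(i,j) gives 1 − v(i,j)/u(i,j), which vanishes at u = v, so perturbing u away from v can only increase this term.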
With this modified fidelity term, the new energy functional to be minimized has the following form [24]:
$$ E_{\lambda}(u)= \int_{\Omega} (u-v \, \log u) + \lambda \int_{\Omega} |\nabla u| $$
(6)
Now, compare the optimality conditions for the two models (obtained from the Euler-Lagrange equation):
$$ {}\left\{ \begin{array}{ll} (u - v) + \lambda \, p =0 & \quad \text{For the ROF model in (2)}\\ (u - v) + (\lambda u) \, p =0 & \quad \text{For the new model in (6)} \end{array} \right. $$
(7)
where p is a sub-gradient of \(\int _{\Omega } |\nabla u|\). The only difference between the two equations is the dependence of the regularization parameter on u in the new model. The new model suggests that stronger smoothing must be applied where the signal has larger values. This agrees with what we expect, since under the Poisson distribution the noise variance is proportional to the signal intensity.
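For completeness, the second condition in (7) follows in one line from the fidelity term in (5): the pointwise derivative of \(u - v\log u\) with respect to u is \(1 - v/u\), and multiplying the resulting Euler-Lagrange equation through by u gives

```latex
\left(1-\frac{v}{u}\right) + \lambda\, p = 0
\quad\Longrightarrow\quad
(u - v) + (\lambda u)\, p = 0 ,
```

so the effective regularization weight is λu, growing with the signal intensity.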
Minimization of the new functional in (6) is not a straightforward problem. One approach is to first replace |∇u| with \(\sqrt {|\nabla u|^{2}+\epsilon }\) for a small ε>0 and then apply a gradient descent iteration [24]. Another approach, suggested in [25], is to use a Taylor expansion of the data fidelity term and minimize this approximate model. In this paper, we follow an algorithm that was developed for the original ROF model [26]. However, we denoise each sinogram pixel separately by minimizing \(E_{\lambda }(u)\) in a small neighborhood around that pixel, with a regularization parameter inspired by the new model.
Total variation denoising has proved to be very effective when applied to piecewise-constant images. On more complex images, however, it can lead to undesired effects. Most importantly, on images with piecewise-linear features, i.e., regions with a smooth change in intensity, ROF's original model leads to staircase artifacts. This is because, by design, ROF's model favors piecewise-constant solutions. We believe this can be a major problem when applying TV denoising to sinogram images, because even if the imaged object is piecewise-constant, its projections can be very far from piecewise-constant. This is easy to visualize: a feature with uniform intensity in the imaged object will have a piecewise-constant projection in the sinogram only if different rays from the x-ray source to the detector plane cut the same length through that feature, which is a very unlikely scenario. Hence, the projections are very likely to contain features of higher orders of regularity (i.e., piecewise-linear, piecewise-quadratic, etc.) that would suffer from a staircase effect when treated with the ROF model. A heavily researched approach to reducing the staircase effect is to replace the \(\ell _{1}\) norm of the gradient with the \(\ell _{1}\) norm of higher-order differential operators [27, 28]. A less sophisticated approach, but one with a trivial implementation, is to perform the total variation minimization locally. This approach has also been shown to alleviate the staircase effect [29]. Moreover, with a local minimization strategy, if the neighborhood considered in the minimization is small enough, one can safely assume that the signal intensity and noise level are approximately constant. Therefore, a solution based on ROF's original model will be a good approximation to the solution of the new model based on Poisson noise. This way, we can utilize efficient existing algorithms for the ROF model while avoiding the staircase artifacts.
Since our approach is based on Chambolle's famous algorithm [26], we briefly describe this algorithm here. If we denote by X and Y the spaces of the image u and of its gradient ∇u, respectively, then an alternative definition of the total variation of u is:
$$ \sum\limits_{i,j} |\nabla u|_{i,j}= \text{sup} \left\{ \langle p,\nabla u \rangle_{Y} : p \in Y, |p_{i,j}| \leq 1 \right\} $$
(8)
Chambolle introduced the discrete divergence operator as the dual of the gradient operator, i.e., \(\langle p,\nabla u \rangle _{Y}= \langle -\text{div}\, p,u \rangle _{X}\). In the discrete image domain:
$$ \left(\text{div} \, p\right)_{i,j}= \left(p_{i,j}^{1}-p_{i-1,j}^{1}\right)+ \left(p_{i,j}^{2}-p_{i,j-1}^{2}\right) $$
(9)
Because of the duality of the gradient and divergence operators, total variation can also be written as:
$$ {}\sum\limits_{i,j} |\nabla u|_{i,j}= \mathop{\text{sup}}_{z \in K} \, \langle z,u \rangle_{X} \; \; \; \; K= \{\text{div }p: p \in Y, \, |p_{i,j}| \leq 1 \} $$
(10)
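The adjoint relation \(\langle p,\nabla u \rangle _{Y}= \langle -\text{div}\, p,u \rangle _{X}\) behind (10) is easy to verify numerically. The sketch below (Python/NumPy; the forward-difference gradient with Neumann boundaries is the standard discretization that makes (9) its exact negative adjoint) checks the identity on random inputs:

```python
import numpy as np

def grad(u):
    """Forward differences; the gradient is zero past the last row/column."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence of Eq. (9), with boundary rows/columns handled
    so that div is exactly the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]
    d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] += -px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] += -py[:, -2]
    return d

rng = np.random.default_rng(0)
u = rng.standard_normal((8, 8))
px = rng.standard_normal((8, 8))
py = rng.standard_normal((8, 8))
gx, gy = grad(u)
lhs = np.sum(px * gx + py * gy)   # <p, grad u>_Y
rhs = np.sum(-div(px, py) * u)    # <-div p, u>_X
assert np.isclose(lhs, rhs)
```

The two inner products agree to machine precision, which is what licenses the dual formulation (10).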
The minimizer of the energy functional in (2) is then obtained by projecting v onto the set λK:
$$ u= \, v- \pi_{\lambda K}(v) $$
(11)
which is equivalent to minimizing the Euclidean distance between v and λ div p; this can be achieved via the following iteration for computing p:
$$ p^{0}=0; \,\,\, p_{i,j}^{n+1}= \frac{p_{i,j}^{n}+\tau \left(\nabla(\text{div} \, p^{n} -v/\lambda)\right)_{i,j}}{1+\tau |\left(\nabla(\text{div} \, p^{n} -v/\lambda)\right)_{i,j}|} $$
(12)
where τ>0 is the step size. For a small enough step size, τ≤1/8, the algorithm is guaranteed to converge [26].
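A minimal NumPy implementation of the fixed-point iteration (12) might look as follows (function names are ours; the forward-difference gradient and the divergence of (9) are the standard discretization, and we stop after a fixed number of iterations rather than testing convergence):

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    d = np.zeros_like(px)
    d[0, :] += px[0, :]; d[1:-1, :] += px[1:-1, :] - px[:-2, :]; d[-1, :] += -px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] += -py[:, -2]
    return d

def chambolle_rof(v, lam, tau=0.125, n_iter=200):
    """Solve the ROF model (2) via the projection (11), iterating (12)."""
    px = np.zeros_like(v); py = np.zeros_like(v)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - v / lam)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    # u = v - pi_{lam K}(v), with the projection given by lam * div(p)
    return v - lam * div(px, py)
```

With τ = 1/8 the iteration satisfies the convergence bound quoted above; on a noisy piecewise-constant image the output is visibly smoother while the edge is preserved.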
Instead of computing a global solution, we minimize the energy functional (6) in a small neighborhood of each pixel. To this end, let us denote by ω the set of indices that define the desired neighborhood around the current pixel. For example, for the square neighborhood of size (2m+1)×(2m+1) pixels used in this study, \(\omega =\left \lbrace (i,j): \, i,j=-m:m \right \rbrace \). We also consider a Gaussian weighting function on this neighborhood:
$$ W(i,j)= \exp \left(-\frac{(i^{2}+j^{2})}{h^{2}}\right) $$
(13)
The local problem will then become that of minimizing the following functional:
$$ E_{\lambda,W}(u')= \frac{1}{2} ||u'-v_{\omega}||_{W}^{2} + \lambda' \int_{\omega} |\nabla u'| $$
(14)
where \(||\cdot ||_{W}^{2}\) denotes the weighted norm with weights W, and \(v_{\omega }\) denotes the noisy image restricted to the window ω around the current pixel. Note that here u′ is a function defined only on the neighborhood ω. The solution of this local optimization problem is similar to Chambolle's algorithm described above [29]. The only difference is in the update formula for p:
$$ p^{0}=0; \,\,\, p_{i,j}^{n+1}= \frac{p_{i,j}^{n}+\tau \left(\nabla\left(D^{-1}\text{div} \, p^{n} -v/\lambda'\right)\right)_{i,j}}{1+\tau |\left(\nabla\left(D^{-1} \text{div} \, p^{n} -v/\lambda'\right)\right)_{i,j}|} $$
(15)
where D is a diagonal matrix whose diagonal elements are the values of W.
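Under our reading of (15), the only change relative to (12) is that div p is rescaled by D^{-1} before the gradient is taken, and the reconstruction step inherits the same rescaling. The sketch below (Python/NumPy) illustrates this; the reconstruction formula u = v − λ′ D^{-1} div p, the boundary handling, and the reduced step size (to offset the amplification by D^{-1}) are our assumptions, not statements from [29]:

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    d = np.zeros_like(px)
    d[0, :] += px[0, :]; d[1:-1, :] += px[1:-1, :] - px[:-2, :]; d[-1, :] += -px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] += -py[:, -2]
    return d

def chambolle_weighted(v, W, lam, tau=0.06, n_iter=300):
    """Iteration (15): Chambolle's update with div p rescaled by D^{-1} = 1/W.

    W holds the Gaussian window weights of Eq. (13); the smaller default
    step size compensates for the amplification by 1/W (an assumption).
    """
    px = np.zeros_like(v); py = np.zeros_like(v)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) / W - v / lam)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    # reconstruction mirroring (11), with the same D^{-1} rescaling
    return v - lam * div(px, py) / W
```

With W ≡ 1 this reduces to the plain iteration (12), which is a useful sanity check when experimenting with the weighted variant.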
The regularization parameter λ′ must be chosen in accordance with (7). The simplest approach is to set λ′=λv(i,j), where λ is a global regularization parameter and v(i,j) is the value of the current pixel in the noisy image. Since v(i,j) is noisy, a better choice is to use a weighted local average as the estimate of the intensity of the true image at the current pixel (note that the maximum-likelihood estimate of the mean of a Poisson process from a set of observations is the arithmetic mean of the observations). Therefore, we suggest the following choice for the local regularization parameter:
$$ {} \begin{aligned} \lambda'&= \lambda \, \frac{\sum_{-a \leq i',j' \leq a} W'\left(i',j'\right)v\left(i-i',j-j'\right)}{\sum_{-a \leq i',j' \leq a}W'\left(i',j'\right)}\quad \text{where}\\ W'(i,j)&= \exp \left(-\frac{\left(i^{2}+j^{2}\right)}{h'^{2}}\right) \end{aligned} $$
(16)
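The estimate (16) is simply a Gaussian-weighted local mean of the noisy sinogram scaled by the global λ. A direct translation (Python/NumPy; the function name is ours, and for simplicity the pixel is assumed to lie at least a pixels from the border):

```python
import numpy as np

def local_lambda(v, i, j, lam, a=4):
    """Local regularization parameter of Eq. (16), with h' = 2a."""
    h = 2.0 * a
    ii, jj = np.mgrid[-a:a + 1, -a:a + 1]
    W = np.exp(-(ii**2 + jj**2) / h**2)        # Gaussian weights W'
    patch = v[i - a:i + a + 1, j - a:j + a + 1]
    return lam * np.sum(W * patch) / np.sum(W)
```

On a constant image v ≡ c this returns λc, matching the simple choice λ′ = λv(i,j) in the noise-free limit.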
There are several parameters in the proposed algorithm. The global regularization parameter λ controls the strength of the denoising and should be set based on the desired level of smoothing. The parameter m sets the size of the neighborhood considered around each pixel, which in this study was a square window of size (2m+1)×(2m+1). Numerical experiments in [29] have shown that in total variation denoising, the influence map of a pixel is usually limited to a radius of approximately 10 pixels for typical values of the regularization parameter. Therefore, a good value for m is around 10, which is the value we used for all experiments reported in this paper. The width of the Gaussian weighting function W is adjusted through h; we used h=2m, which we found empirically to work well. Similarly, a and h′ in (16) determine the size of the window and the weights used for computing the local regularization parameter. These have to be set based on the noise level in the image; larger values should be chosen when the noise is stronger. In this study, we used a=4 and h′=2a.
A simple implementation of the proposed algorithm can be computationally intensive because it involves solving a minimization problem, though a very small one, for every individual pixel in the sinogram. This would be a major drawback, because a big advantage of sinogram denoising methods, compared with iterative image reconstruction methods, is their shorter computational time. To reduce the computational time, after minimizing the local cost function (14) around the current pixel, we replace the values of all pixels in the window of size (2a+1)×(2a+1) around the current pixel, instead of just the center pixel, and then shift the window by 2a+1 pixels. Our extensive numerical experiments with simulated and real projections showed that with this approach the results are almost identical to the case where only one pixel is denoised at a time. This is the approach we followed in all experiments reported in this paper.
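The blockwise strategy can be sketched independently of the local solver. In the sketch below (Python/NumPy), `local_solver` is a hypothetical callback standing in for the minimization of (14), and the edge handling (appending one extra, overlapping block so the whole image is covered) is our assumption:

```python
import numpy as np

def blockwise_denoise(v, local_solver, m=10, a=4):
    """Tile the sinogram: solve on a (2m+1)x(2m+1) window centered on each
    block, keep only the central (2a+1)x(2a+1) pixels, step by 2a+1.
    Assumes the image is larger than one block in each dimension.
    """
    H, W = v.shape
    u = v.copy()
    step = 2 * a + 1

    def centers(n):
        # block centers whose central block fits inside the image
        c = list(range(a, n - a, step))
        if c[-1] != n - a - 1:        # extra overlapping block to reach the edge
            c.append(n - a - 1)
        return c

    for ci in centers(H):
        for cj in centers(W):
            i0, i1 = max(ci - m, 0), min(ci + m + 1, H)
            j0, j1 = max(cj - m, 0), min(cj + m + 1, W)
            sol = local_solver(v[i0:i1, j0:j1])
            # write back only the central (2a+1)x(2a+1) block of the window
            u[ci - a:ci + a + 1, cj - a:cj + a + 1] = \
                sol[ci - a - i0:ci + a + 1 - i0, cj - a - j0:cj + a + 1 - j0]
    return u
```

Passing a trivial solver (e.g., one that returns a constant window) makes it easy to verify that every sinogram pixel is written exactly by this tiling before plugging in the actual local TV minimization.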
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DK and PD conceived of the study. PD and NF performed the study design and data collection. Algorithm development and data analysis was carried out by DK and RW. DK prepared and submitted the manuscript. All authors cooperated in the interpretation of the results. All authors have read and approved the final manuscript.