Iterative quality enhancement via residual-artifact learning networks for low-dose CT

Yongbo Wang; Yuting Liao; Yuanke Zhang; Ji He; Sui Li; Zhaoying Bian; Hao Zhang; Yuanyuan Gao; Deyu Meng; Wangmeng Zuo; Dong Zeng; Jianhua Ma

doi:10.1088/1361-6560/aae511

1. Introduction

Computed tomography (CT) is widely used in clinical practice. However, the radiation dose delivered during CT examinations has been linked to an increased risk of cancer, which has raised considerable concern (Pierce and Preston 2000). The radiation risk therefore needs to be balanced against the diagnostic benefit. One effective means of reducing the radiation dose is to lower the x-ray tube current (mA) (McCollough et al 2006). However, if the noise in low-dose projection measurements is not adequately handled, the quality of images reconstructed by the conventional filtered back-projection (FBP) method degrades. Producing high-quality CT images at low dose is therefore an active research topic in CT imaging.

Several commercial iterative reconstruction (IR) techniques have been released by vendors, e.g. ASIR from GE Healthcare and SAFIRE from Siemens Healthcare (Geyer et al 2015). Moreover, model-based iterative reconstruction (MBIR) has been reported to provide better diagnostic performance than the conventional FBP algorithm at lower radiation dose (Geyer et al 2015, Yan et al 2018). To further improve low-dose CT image quality, various advanced approaches have been proposed, including statistical iterative reconstruction (SIR) algorithms (Tian et al 2011, Ma et al 2012b, Xu et al 2012, Zhang et al 2014, Zhang et al 2015, Hu et al 2016, Zhang et al 2016, 2016a, 2016b, 2017a), sinogram-domain restoration algorithms (Li et al 2004, La Rivière 2005, Wang et al 2006, Forthmann et al 2010, Little and La Rivière 2015), and image-domain restoration algorithms (Schaap et al 2008, Ma et al 2011, Li et al 2014, Irrera et al 2016, Kim et al 2016, Li et al 2017). Among them, SIR algorithms, which exploit knowledge of the underlying physics, have been reported to improve CT image quality substantially compared with the FBP algorithm. A major drawback of SIR algorithms is their higher computational cost compared with non-SIR algorithms, because the reconstruction process involves repeated forward- and back-projection operations. One way to reduce this computational burden is to estimate the ideal sinogram data from the noisy sinogram measurements with statistical sinogram restoration strategies, and then to reconstruct high-quality CT images from the restored sinogram with the FBP algorithm (Li et al 2004, La Rivière 2005, Wang et al 2006, Forthmann et al 2010, Little and La Rivière 2015). This gain in efficiency, however, comes at the cost of noticeable resolution loss.
Unlike the aforementioned methods, image-domain restoration algorithms apply linear or nonlinear filters to low-dose CT images to reduce noise without noticeable resolution loss. Various such algorithms have been investigated, such as pixel-wise algorithms (Schaap et al 2008) and patch-wise algorithms (Ma et al 2011). Comprehensive reviews of these methods can be found in Zhang et al (2014, 2017a).

Recently, machine learning has demonstrated strong capability in many computer vision tasks, such as image/video denoising, inpainting, super-resolution and low-light image enhancement (Dong et al 2016, Liu et al 2016, Lore et al 2016, Bae et al 2017, Zhang et al 2017b, 2017c). In particular, deep neural network frameworks such as convolutional neural networks (ConvNets) have become the most promising technique, with impressive performance. Machine learning-based methods either stack multiple layers of collaborative auto-encoders or directly learn a nonlinear mapping from the objective space (i.e. low-quality images with severe noise and artifacts) to the desired space (i.e. the corresponding high-quality images with little noise and few artifacts). Because deep learning methods allow end-to-end training of all model components between the objective input and the desired output, they can achieve significant improvements over most conventional image restoration methods. The success of ConvNets can be attributed to many factors, including deep architectures, batch normalization (BN) (Ioffe and Szegedy 2015), the rectified linear unit (ReLU) (Nair and Hinton 2010), residual learning and high-performance graphics processing units (GPUs). ConvNet-based restoration algorithms have also been applied to low-dose CT image processing (Chen et al 2017, Hu et al 2017, Jin et al 2017, Kang et al 2017a, 2017b, Wolterink et al 2017, Wu et al 2017, Yang et al 2017). For instance, Chen et al utilized a conventional ConvNet to directly suppress noise in low-dose CT images (Hu et al 2017). Kang et al then applied a ConvNet to the wavelet transform coefficients of low-dose CT images to reduce CT noise effectively (Kang et al 2017a).
To obtain better performance in CT image restoration, Kang et al also developed an improved ConvNet in which the network estimates the noise of the wavelet transform coefficients in each band; the desired images are then estimated from the denoised wavelet coefficients obtained by subtracting the band-specific noise (Kang et al 2017b). In addition, Yang et al applied a generative adversarial network (GAN) with Wasserstein distance and perceptual similarity to suppress noise in low-dose CT images (Yang et al 2017). However, a major drawback of these ConvNet strategies is that they cannot remove the severe noise-induced artifacts in ultra-low-dose cases. One reason is that some strong noise-induced artifacts in ultra-low-dose CT images may resemble small lesions, and the ConvNet cannot distinguish between them (Wolterink et al 2017).

In this study, inspired by the success of residual networks in image restoration (Zhang et al 2017b), we develop an iterative residual-artifact learning ConvNet (IRLNet) approach that improves reconstruction performance over the conventional residual network. Specifically, the proposed IRLNet estimates the high-frequency details within the noise and removes them iteratively; once the severe streaks in the low-dose CT images have been eliminated, the remaining low-frequency details can be processed by a conventional network. Moreover, the proposed IRLNet scheme can be extended to robust handling of quantitative dual-energy CT and cerebral perfusion CT imaging, and to statistical iterative reconstruction. The experimental results demonstrate the efficacy of the proposed IRLNet in reducing both noise and artifacts, improving the detectability of low-contrast objects, and preserving the resolution of the reconstructed CT images.

2. Methods and materials

2.1. Image restoration with the residual learning convolutional neural network (RLNet) approach

Deep ConvNets have been designed to directly learn the nonlinear mapping from the objective space (i.e. low-quality images with severe noise and artifacts) to the desired space (i.e. the corresponding high-quality images with little noise and few artifacts), in a way similar to coupled sparse coding (Gregor and Lecun 2010), by building a relationship between objective and desired images represented by the deep network. A ConvNet consists of sequential modules built from convolution, BN and ReLU operations, as shown in figure 1(a). ConvNets are trained end-to-end between objective and desired images in a supervised fashion, and they can achieve significant improvements over most image restoration methods. Many ConvNet-based approaches have been developed for image restoration, operating either in the image domain or in a transform domain (Bae et al 2017, Zhang et al 2017b, 2017c). For example, the recent deep residual learning convolutional neural network (RLNet) approach (Zhang et al 2017b) uses a residual learning strategy: it estimates the residual image ${\mathcal R}(I_{\rm measured})$, i.e. the noise and artifacts, from the measurement $I_{\rm measured}$, and then obtains the desired image by subtracting this estimate, i.e. $I_{\rm estimated} = I_{\rm measured} - {\mathcal R}(I_{\rm measured})$. RLNet takes advantage of both BN and residual learning to speed up training and improve image restoration performance.

**Figure 1.** (a) The architecture of the RLNet approach, (b) the high- and low-dose CT images reconstructed by the FBP and RLNet approaches, respectively. All the images are displayed in the same window of $[-160, 240]$ HU.

As figure 1(a) shows, an RLNet of depth D contains three types of layers: Conv + ReLU, Conv + BN + ReLU, and Conv. The first layer, which has no BN, produces 64 feature maps using 64 filters of size $3 \times 3 \times 1$, followed by a ReLU. In layers 2 to (D − 1), 64 filters of size $3\times3\times64$ are used, with BN inserted between the convolution and the ReLU; BN alleviates internal covariate shift. The last layer consists of a single convolution with one filter of size $3\times3\times64$, which produces the output (the estimated residual image) from the processed feature maps. During training, the trainable parameters $\Theta$ are learned by minimizing the following loss function:

$$E(\Theta)=\frac{1}{2N}\sum_{i=1}^{N}\left\|{\mathcal R}(I_{{\rm measured},i};\Theta)-\left(I_{{\rm measured},i}-I_{{\rm estimated},i}\right)\right\|_F^2, \tag{1}$$

where $\{(I_{{\rm measured},i}, I_{{\rm estimated},i})\}_{i=1}^N$ denotes N pairs of noisy images and their corresponding clean images, and $\|\cdot\|_F$ is the Frobenius norm.
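To make equation (1) and the residual-learning inference step concrete, a minimal NumPy sketch is given below. It assumes the network outputs ${\mathcal R}(I_{{\rm measured},i};\Theta)$ have already been computed for a batch of N images stored as an array of shape (N, H, W); the function names `residual_loss` and `restore` are hypothetical and not part of the RLNet code.

```python
import numpy as np


def residual_loss(predicted_residuals, noisy, clean):
    """Equation (1) for a batch of shape (N, H, W).

    predicted_residuals holds R(I_measured,i; Theta); the regression target is
    the true residual I_measured,i - I_estimated,i (noise plus artifacts)."""
    target = noisy - clean
    # Squared Frobenius norm of each image's error, then averaged with 1/(2N)
    per_image = np.sum((predicted_residuals - target) ** 2, axis=(1, 2))
    return per_image.sum() / (2.0 * noisy.shape[0])


def restore(noisy, predicted_residual):
    """Residual learning at inference: I_estimated = I_measured - R(I_measured)."""
    return noisy - predicted_residual
```

The network never predicts the clean image directly; a perfect residual prediction gives zero loss, and subtracting it from the input recovers the clean image exactly.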

2.2. Image restoration with the iterative residual-artifact learning ConvNet (IRLNet) approach

Although the RLNet approach can remove noise in CT images to some extent, noise-induced artifacts remain in the RLNet-processed CT images, as shown in figure 1(b). One reason is that, in the feature maps, artifacts exhibit orientation and intensity features of similar prominence to those of normal tissues. To overcome this, we propose the IRLNet approach to improve CT image reconstruction performance. Its structure is illustrated in figure 2. The proposed IRLNet approach consists of three main steps: residual image estimation, estimation of the high-frequency details of the residual image, and ConvNet processing.

Residual image estimation. In this step, an RLNet (Zhang et al 2017b) is used to estimate the initial residual image; the RLNet implementation is the same as in Zhang et al (2017b). As shown in figure 2, the resulting residual image contains almost all of the noise and many of the artifacts.
High-frequency details of residual image estimation. Figure 2 shows that most of the artifacts in the residual image appear, with similar orientations, in the high-frequency band $I_{\rm HFR}^i$ of its wavelet decomposition. A 2D discrete wavelet decomposition is therefore applied to extract the high-frequency band $I_{\rm HFR}^i$ of the residual image. The rationale is that the low-frequency band $I_{\rm LFR}^i$ still contains important image features, and RLNet alone can hardly yield a clinically acceptable CT image. In this work we therefore use only the high-frequency details in the wavelet domain to capture the artifact information in the residual images.
ConvNet process. After the intermediate image $I_{\rm tmp}^i = I - I_{\rm HFR}^i$ is obtained at the ith iteration, a conventional ConvNet is applied to $I_{\rm tmp}^i$. After K iterations, the final image $I_{\rm output}^K$ is obtained.
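The three steps can be sketched in NumPy as follows. The paper does not state the wavelet family or the number of decomposition levels, so a single-level orthonormal 2D Haar transform is assumed here; $I_{\rm HFR}^i$ is taken to be the residual image reconstructed with its low-frequency (LL) band set to zero. We also assume that I denotes the input of the current iteration and that each iteration's output feeds the next. `rlnet_residual` and `convnet` are hypothetical callables standing in for the trained networks, and the default of two iterations reflects the setting chosen in section 3.1.3.

```python
from typing import Callable

import numpy as np

Array = np.ndarray


def haar_dwt2(x: Array):
    """Single-level orthonormal 2D Haar transform (array sides must be even)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency band
    d1 = (a - b + c - d) / 2.0   # differences between neighbouring columns
    d2 = (a + b - c - d) / 2.0   # differences between neighbouring rows
    d3 = (a - b - c + d) / 2.0   # diagonal differences
    return ll, (d1, d2, d3)


def haar_idwt2(ll: Array, details) -> Array:
    """Exact inverse of haar_dwt2."""
    d1, d2, d3 = details
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    x[0::2, 1::2] = (ll - d1 + d2 - d3) / 2.0
    x[1::2, 0::2] = (ll + d1 - d2 - d3) / 2.0
    x[1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return x


def high_frequency_part(residual: Array) -> Array:
    """I_HFR: the residual image with its low-frequency (LL) band zeroed."""
    ll, details = haar_dwt2(residual)
    return haar_idwt2(np.zeros_like(ll), details)


def irlnet(image: Array,
           rlnet_residual: Callable[[Array], Array],
           convnet: Callable[[Array], Array],
           iterations: int = 2) -> Array:
    """Hypothetical IRLNet loop; the callables stand in for trained networks."""
    current = image
    for _ in range(iterations):
        residual = rlnet_residual(current)     # residual image estimation
        hfr = high_frequency_part(residual)    # high-frequency details I_HFR^i
        tmp = current - hfr                    # I_tmp^i = I - I_HFR^i
        current = convnet(tmp)                 # ConvNet process
    return current
```

Only the high-frequency part of the residual is subtracted, so structure that RLNet wrongly assigns to the low-frequency band of the residual is left in the image for the subsequent ConvNet to handle.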

**Figure 2.** (a) The flowchart of the proposed IRLNet approach for low-dose CT imaging, (b) the flowchart of wavelet-based processing: the residual image is equal to $FBP(10~{\rm mA}) - RLNet(10~{\rm mA})$ .

2.3. Implementation details

Training the RLNet in the IRLNet amounts to determining the parameter set $\Theta = \{W_d, B_d; d = 1, \ldots, D\}$ by minimizing the loss between the network's estimate from the low-dose image I and the corresponding high-dose image $I'$. Given a set of training pairs $\{(I_i, I'_i)\}_{i=1}^N$, this loss, a mean-squared-error loss commonly used for regression tasks, is defined as follows:

$$E(\Theta)=\frac{1}{2N}\sum_{i=1}^{N}\left\|{\mathcal R}(I_i;\Theta)-\left(I_i-I'_i\right)\right\|_F^2, \tag{2}$$

where N is the number of training samples. The network depth D was set to 13, and the training patch size was $38\times38\times22$, with a sliding interval of 4 pixels along all three directions of the 3D CT images.
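The patch sampling described above can be sketched with NumPy's `sliding_window_view`. We assume the first two axes of the volume are in-plane (38 × 38) and the third is the slice axis (22); the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def patches_per_axis(size, patch, stride):
    """Number of valid sliding-window positions along one axis."""
    return (size - patch) // stride + 1


def extract_patches(volume, patch=(38, 38, 22), stride=4):
    """All patch-sized sub-volumes, taken every `stride` voxels along each axis.

    Returns a read-only view of shape (nx, ny, nz, *patch); no data are copied."""
    windows = sliding_window_view(volume, patch)
    return windows[::stride, ::stride, ::stride]
```

For a hypothetical 512 × 512 × 100 volume this gives 119 × 119 × 20 = 283 220 overlapping patches, which illustrates how quickly the training set grows with a 4-pixel stride.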

The loss function was minimized with mini-batch stochastic gradient descent (SGD) (Cherry et al 1998). The batch size, momentum and weight decay were set to 256, 0.7 and 10⁻⁴, respectively. The learning rate of all convolution layers was set to 10⁻² for the first five epochs and to 10⁻⁵ for the remaining epochs. The filter weights of each layer were initialized from a zero-mean Gaussian distribution with standard deviation $\sqrt{2/M}$, where M is the number of incoming connections of a neuron, and the biases of each convolution layer were initialized to zero.
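The training configuration above can be summarized in a short NumPy sketch. It assumes the 2D kernel shapes of section 2.1 (the kernel shape of the 3D variant is not stated), one bias per convolution output channel, and a learnable BN scale and shift per channel; these bookkeeping choices are assumptions. The momentum update is one common formulation of SGD with L2 weight decay and is not necessarily the exact update used in the authors' MatConvNet code.

```python
import numpy as np


def rlnet_layer_shapes(depth, features=64):
    """Kernel shapes (kh, kw, in_ch, out_ch) of a 2D RLNet with `depth` layers."""
    if depth < 3:
        raise ValueError("depth must be at least 3")
    return ([(3, 3, 1, features)]                          # layer 1: Conv + ReLU
            + [(3, 3, features, features)] * (depth - 2)   # layers 2..D-1: Conv + BN + ReLU
            + [(3, 3, features, 1)])                       # layer D: Conv


def count_parameters(depth, features=64):
    """Conv weights and biases plus a BN scale and shift per channel (assumed)."""
    shapes = rlnet_layer_shapes(depth, features)
    conv = sum(kh * kw * cin * cout + cout for kh, kw, cin, cout in shapes)
    return conv + (depth - 2) * 2 * features


def he_init(shape, rng):
    """Zero-mean Gaussian with std sqrt(2/M), where M = kh * kw * in_ch (fan-in)."""
    fan_in = shape[0] * shape[1] * shape[2]
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)


def learning_rate(epoch):
    """10^-2 for the first five epochs (0-4), 10^-5 afterwards."""
    return 1e-2 if epoch < 5 else 1e-5


def sgd_step(w, grad, velocity, lr, momentum=0.7, weight_decay=1e-4):
    """One momentum SGD update with L2 weight decay (one common formulation)."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity


print(count_parameters(13))  # 408833
```

With D = 13, the 11 middle layers dominate the count (each has 3 × 3 × 64 × 64 = 36 864 weights), and for these layers M = 576, so the initial weights have a standard deviation of about 0.059.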

The MatConvNet toolbox (Vedaldi and Lenc 2014) in MATLAB R2015b was used to train the proposed method. Training took 8 h 10 min for the 2D case and 10 h 15 min for the 3D case on a workstation with an Intel(R) Core(TM) i7-5820K 3.30 GHz CPU and a Titan X GPU.

2.4. IRLNet for real scenarios

2.4.1. Object-driven CT image restoration

Deep learning models can store the knowledge gained while solving one problem and apply it to a different but related problem (Dong et al 2016, Liu et al 2016, Lore et al 2016, Bae et al 2017, Zhang et al 2017b, 2017c). Transferring knowledge from one task to another often requires adapting the characteristics of the source task to the target task. This strategy has been used successfully in computer vision, e.g. for super-resolution (Liu et al 2016). Most recent ConvNet-based CT image restoration approaches have focused on conventional two-dimensional (2D) and/or three-dimensional (3D) CT image processing, and they improve restoration performance over conventional CT restoration approaches (Wang et al 2006, Ma et al 2011, Zhang et al 2016). However, they do not consider two promising modalities: dual-energy CT (DECT) imaging (Alvarez et al 2004) and sequential cerebral perfusion CT (CPCT) imaging (Axel 1980, Wintermark et al 2008, Nett et al 2010, Zeng et al 2016d, 2017). One reason is that images from these two modalities are less accessible than conventional CT images. Another is that DECT/CPCT images are much more complicated than conventional CT images: DECT requires scans at two different energy spectra for material decomposition and/or monochromatic image synthesis, and perfusion CT requires sequential dynamic scans to estimate perfusion hemodynamic maps (PHM). Notably, the IRLNet is object-driven and can easily be adapted to a particular task. This work aims to exploit this merit of deep models: we first train the proposed model and then obtain an object-driven approach by fine-tuning the learned IRLNet. Specifically, networks trained on conventional CT images are used to process low-dose DECT/CPCT images and yield promising results.
This work demonstrates that the characteristics of one task can be mapped to another, more complicated task with high flexibility.

2.4.2. PWLS framework with IRLNet regularization

Because IRLNet-restored images contain fewer noise-induced artifacts than low-dose FBP images, the high-quality information in the IRLNet-restored images can also be introduced as prior knowledge into a statistical iterative reconstruction framework, for example penalized weighted least-squares (PWLS) (Ma et al 2012a), to improve reconstruction performance in low-dose CT imaging. Following our previous study (Ma et al 2012b), we propose a PWLS reconstruction framework regularized by the IRLNet-restored image, whose objective function is:

$$\tilde{x} = \mathop{\arg\min}_{x \geqslant 0}\; (y - Ax)^{\prime}\,\Sigma^{-1}(y - Ax) + \beta\,\|x - IRLNet(x)\|_p^{\,p}, \tag{3}$$

where y denotes the measured sinogram data and A is the system matrix, whose element $A_{ij}$ denotes the contribution of the jth pixel of the attenuation map to the ith projection ray. Σ is a diagonal matrix whose ith diagonal element $\sigma_i^2$ is the variance of the sinogram datum $y_i$; in the implementation, $\Sigma^{-1}$ was computed as in our previous study (Ma et al 2012a). $\beta$ is a hyper-parameter that balances the data-fidelity term and the regularization term. $\|\cdot\|_p^{\,p}$ denotes the pth power of the $\ell_p$ norm, and p was set to 2 in this study. The Gauss–Seidel (GS) updating strategy was employed to minimize the objective function in equation (3). For simplicity, this framework is dubbed 'PWLS-IRLNet'; its optimization is listed in algorithm 1.

Algorithm 1. PWLS-IRLNet for CT image reconstruction.

Require: $\tilde{x} = FBP\{y\}$, $q = A\tilde{x}$, $\tilde{r} = y - q$, $D = \mathrm{diag}\left\{\frac{(N_i^o e^{-q_i})^2}{N_i^o e^{-q_i} + \sigma_e^2}\right\}$;

$\lambda_j = A_j^T D A_j,\ \forall j$

Ensure: $\tilde{x}$

1: While the stopping criteria are not satisfied do

2: $\tilde{x}_j^{\rm old} = \tilde{x}_j$;

3: $\tilde{x}_j^{\rm new} = \frac{A_j^T D \tilde{r} + \lambda_j \tilde{x}_j^{\rm old} + \beta\,[IRLNet(\tilde{x})]_j}{\lambda_j + \beta}$;

4: $\tilde{x}_j = \max\{0, \tilde{x}_j^{\rm new}\}$;

5: $\tilde{r} = \tilde{r} + A_j(\tilde{x}_j^{\rm old} - \tilde{x}_j)$;

6: $D = \mathrm{diag}\left\{\frac{(N_i^o e^{-\sum_j A_{ij}\tilde{x}_j})^2}{N_i^o e^{-\sum_j A_{ij}\tilde{x}_j} + \sigma_e^2}\right\}$;

7: $\lambda_j = A_j^T D A_j,\ \forall j$;

8: End while

Here $N_i^o$ is the mean number of photons emitted toward detector bin i before entering the object, and $\sigma_e^2$ denotes the variance of the electronic noise; both are estimated from the given scanning protocol (Ma et al 2012a, Zhang et al 2016). $A_j$ denotes the jth column of the system matrix A. Steps 2–5 are carried out pixel by pixel, following the Gauss–Seidel strategy.
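A minimal NumPy sketch of steps 2–5 of algorithm 1 is given below for a small dense system matrix. It assumes that the IRLNet output is evaluated once per outer iteration and passed in as a fixed prior image z, and it holds the weights D fixed within a sweep (step 6 would recompute them between sweeps). With p = 2, the update in step 3 is the exact minimizer of the objective along pixel j, so clipping at zero gives the non-negative coordinate minimizer and the objective never increases. The dense matrix and all function names are illustrative assumptions.

```python
import numpy as np


def pwls_weights(q, n0, sigma_e2):
    """Diagonal of D: (N0 e^{-q})^2 / (N0 e^{-q} + sigma_e^2), roughly 1/variance."""
    counts = n0 * np.exp(-q)
    return counts ** 2 / (counts + sigma_e2)


def pwls_objective(x, y, A, d, z, beta):
    """(y - Ax)^T D (y - Ax) + beta * ||x - z||_2^2, with z = IRLNet output (fixed)."""
    r = y - A @ x
    return r @ (d * r) + beta * np.sum((x - z) ** 2)


def pwls_gs_sweep(x, r, A, d, z, beta):
    """One Gauss-Seidel pass over all pixels (steps 2-5 of algorithm 1).

    x (float array) and r = y - Ax are updated in place; d is the diagonal of D."""
    for j in range(x.size):
        a_j = A[:, j]                                   # jth column A_j
        lam_j = a_j @ (d * a_j)                         # lambda_j = A_j^T D A_j
        x_old = x[j]
        x_new = (a_j @ (d * r) + lam_j * x_old + beta * z[j]) / (lam_j + beta)
        x[j] = max(0.0, x_new)                          # non-negativity constraint
        r += a_j * (x_old - x[j])                       # keep r = y - Ax
    return x, r
```

Keeping the residual $\tilde{r}$ up to date after every pixel update avoids recomputing $y - A\tilde{x}$, which is why step 5 exists in algorithm 1.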

2.5. Experimental data acquisition

In this work, clinical data from 'The 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge', used with authorization from Mayo Clinic, served to validate and evaluate the performance of the proposed IRLNet approach. These data were acquired at 120 kVp and 200 effective mA, which served as the reference tube potential and quality reference effective mA (also referred to as the high-dose case). The low-dose CT images were simulated from the high-dose data using the simulation technique of Zeng et al (2015), at tube currents from 10 to 100 mA in 5 mA steps. For the DECT and CPCT imaging studies, the institutional review board approved the study and written informed consent was obtained from each patient. Patient A, with coronary atherosclerotic plaques, and patient B, with brain deficits, were scanned on a GE Discovery CT750 HD scanner in helical mode to acquire DECT and CPCT data, respectively. The DECT protocol used 80 kVp at 100 mA and 140 kVp at 100 mA; the CPCT protocol used 120 kVp at 150 mA. The low-dose DECT/CPCT data were also generated with the same simulation method (Zeng et al 2015).
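The idea behind simulating a lower tube current can be illustrated with the simplified NumPy sketch below. This is not the exact procedure of Zeng et al (2015); it is a common approximation that treats the high-dose line integrals as noise-free, scales the incident photon flux linearly with mA, and draws Poisson photon counts plus Gaussian electronic noise. The incident flux `n0_high` and electronic-noise variance `sigma_e2` are hypothetical parameters.

```python
import numpy as np


def simulate_low_dose(sino_high, n0_high, ma_high, ma_low, sigma_e2, rng=None):
    """Simplified low-dose sinogram simulation (assumption, not Zeng et al 2015).

    sino_high holds post-log line integrals from the high-dose scan."""
    rng = np.random.default_rng() if rng is None else rng
    n0_low = n0_high * ma_low / ma_high                 # incident flux scales with mA
    mean_counts = n0_low * np.exp(-sino_high)           # Beer-Lambert expected counts
    counts = rng.poisson(mean_counts) + rng.normal(0.0, np.sqrt(sigma_e2), sino_high.shape)
    counts = np.maximum(counts, 1.0)                    # guard against log of <= 0
    return np.log(n0_low / counts)                      # back to line integrals
```

Because the variance of the post-log data is roughly inversely proportional to the detected counts, reducing the current from 200 to 10 mA increases the noise variance of each ray by roughly a factor of 20 in this model.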

3. Results

3.1. The properties of the RLNet-based approach

3.1.1. The effect of the dimension of data

Figure 3 shows the CT images reconstructed by the proposed IRLNet approach trained on 2D and 3D CT image datasets, respectively. In the 3D cases, the displayed transverse slices are the same as those in the 2D cases. The 3D results of the proposed IRLNet approach are much better than the 2D results in terms of noise-induced artifact suppression and resolution preservation, whereas some serious artifacts remain in the 2D results. A plausible explanation is that adjacent slices in the 3D cases are highly correlated and provide more structural information than the 2D cases, so the RLNet-based approaches can extract useful structural information from the 3D data during training. Table 1 lists quantitative comparisons between the 2D and 3D cases. The 3D cases of the proposed IRLNet approach achieve higher peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) index (Zhang et al 2011) values, and lower normalized mean square error (NMSE), than the 2D cases. Based on these observations, the network trained on 3D data is used as the default network in the following experiments.

Table 1. The quantitative comparisons of the two different approaches from 2D and 3D CT image datasets.

Metric	L067: FBP (10 mA, 2D)	L067: IRLNet (10 mA, 2D)	L067: IRLNet (10 mA, 3D)	L192: FBP (60 mA, 2D)	L192: IRLNet (60 mA, 2D)	L192: IRLNet (60 mA, 3D)
PSNR (dB)	13.9332	31.5381	33.6761	29.2304	39.1980	39.6139
NMSE	0.3955	0.0522	0.0407	0.0704	0.0223	0.0212
FSIM	0.7058	0.8676	0.8863	0.8896	0.9295	0.9452
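The PSNR and NMSE metrics used in table 1 and the following tables can be computed as sketched below. The paper does not give its exact definitions, so the commonly used forms are assumed here, with the PSNR peak defaulting to the maximum of the reference image; FSIM requires phase-congruency and gradient-magnitude maps and is omitted.

```python
import numpy as np


def nmse(image, reference):
    """Normalized MSE: ||image - reference||^2 / ||reference||^2."""
    return np.sum((image - reference) ** 2) / np.sum(reference ** 2)


def psnr(image, reference, peak=None):
    """PSNR in dB; the peak defaults to the reference maximum (assumption)."""
    peak = np.max(reference) if peak is None else peak
    mse = np.mean((image - reference) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Both metrics depend on the intensity scale (e.g. HU versus attenuation coefficients) and, for PSNR, on the chosen peak value, so numbers are only comparable within a single evaluation protocol.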

3.1.2. The effect of the amount of training data

Two training strategies are compared. The first uses only the CT images at the target mA (i.e. 22 800 images) to train the network, henceforth termed 'T-Net'; the second uses the CT images at all mA levels (i.e. 433 200 images, corresponding to the 19 simulated dose levels × 22 800 images) to train the network, henceforth termed 'A-Net'. Figure 4(a) shows the low-dose CT images and the corresponding images reconstructed by the proposed IRLNet approach with T-Net and A-Net, respectively. Several observations can be made. First, both networks suppress noise-induced artifacts to varying extents when used in the proposed IRLNet approach. Second, although the A-Net-based IRLNet yields high-quality CT images in some cases, e.g. at 30 and 60 mA, some severe noise-induced artifacts remain in the ultra-low-dose case. Third, the T-Net-based IRLNet removes noise-induced artifacts more effectively than the A-Net-based IRLNet while preserving the essential structures in the CT images, especially in ultra-low-dose cases such as 10 mA; see, for instance, the three ROIs (ROIs 1, 2 and 3) marked by red boxes in figure 4(a). In addition, figure 4(b) shows that the T-Net-based IRLNet preserves edges more clearly and sharply than the other approaches, as indicated by the red arrow. A possible reason is that the T-Net focuses on the specific mA level, which leads to better-targeted reconstruction performance. Note that a fair comparison would require equal amounts of training data in both cases; because of our limited data resources, however, the T-Net was trained only on data from the specific tube current. In a future study, we will augment the training data of the T-Net to allow a fair comparison.

**Figure 4.** (a) The high-dose and low-dose CT images reconstructed by three different approaches at three different mA from the two different pre-trained networks, (b) the zoom-in ROIs as indicated by the three red boxes (ROIs 1, 2, and 3) in (a). All the images are displayed in the same window of $[-160, 240]$ HU.

3.1.3. The effect of the iterations in the proposed IRLNet approach

In this subsection, we examine the results of the proposed IRLNet approach as a function of the number of iterations. As figure 5(a) shows, image quality visibly improves and the resulting images progressively approach the corresponding normal-dose images as the number of iterations increases. However, the computational burden of the proposed IRLNet approach also grows with the number of iterations. Figure 5(b) shows quantitative comparisons across iterations; the proposed IRLNet approach already produces relatively satisfactory results at the second iteration. To balance image quality against computational burden, the images at the second iteration were taken as the final results of the proposed IRLNet approach in this study.

3.2. Comparison results

Figures 6, 8 and 9 show the CT images reconstructed with the six different approaches at three tube currents, i.e. 10, 30 and 60 mA. To estimate the modulation transfer function (MTF) of the different approaches, we simulated a calcification with a diameter of 14 mm and a density of 1.35 g cm⁻³ on the patient myocardium; the reconstruction results and MTF curves are shown in figure 8. Figure 7 compares the profiles along the red line of the CT images reconstructed at 10 mA in figure 6. Tables 2 and 3 summarize the FSIM and NMSE scores of the CT images at 60 mA. From these results, we conclude that: (1) all the network-based approaches yield images with fewer noise-induced artifacts than the corresponding low-dose FBP images; (2) all the network-based approaches achieve better indices than the FBP approach in all cases, indicating that network-based approaches can reliably capture more patch information to improve restoration performance; and (3) the proposed IRLNet approach has the lowest level of noise-induced artifacts, with only minor artifacts visible at 10 mA, followed by the RLNet, ConvNet, WaveNet (Kang et al 2017a) and RED-CNN (Chen et al 2017). The proposed IRLNet approach also yields a profile much closer to the 200 mA (ground truth) profile and better image quality (i.e. less noise and better resolution) than the other approaches. The corresponding spatial resolution measurements, i.e. MTF curves, are shown in figure 8(b).
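The paper does not describe how the MTF was derived from the simulated calcification. One standard approach, sketched below as an assumption, treats an intensity profile sampled across the calcification boundary as an edge spread function (ESF): its derivative is the line spread function (LSF), and the normalized magnitude of the LSF's Fourier transform is the MTF. The function name and the Hann taper are illustrative choices.

```python
import numpy as np


def mtf_from_edge(esf, pixel_size_mm):
    """Estimate the MTF from a 1D edge spread function sampled across an edge.

    Returns spatial frequencies (cycles/mm) and the MTF normalized to 1 at f = 0."""
    lsf = np.gradient(esf)                      # line spread function
    lsf = lsf * np.hanning(lsf.size)            # taper to reduce truncation ripple
    spectrum = np.abs(np.fft.rfft(lsf))
    freqs = np.fft.rfftfreq(lsf.size, d=pixel_size_mm)
    return freqs, spectrum / spectrum[0]
```

A sharper edge yields a narrower LSF and therefore an MTF that stays high at larger spatial frequencies, which is how better resolution preservation appears in MTF curves.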

**Figure 6.** The high- and low-dose CT images reconstructed by the six different approaches at 10 mA. All the images are displayed in the same window of $[-160, 240]$ HU.

**Figure 7.** Profile comparison of the CT images in figure 6 reconstructed by the six different approaches at 10 mA along the red line.

**Figure 8.** (a) The high- and low-dose CT images reconstructed by the six different approaches at 30 mA, (b) MTF curves obtained by the five different approaches at 30 mA. All the images are displayed in the same window of $[-200, 400]$ HU.

**Figure 9.** The high- and low-dose CT images reconstructed by the six different approaches at 60 mA. All the images are displayed in the same window of $[-160, 240]$ HU.

Table 2. The FSIM measurements of low-dose CT images reconstructed by the six different approaches at 60 mA.

Patient	FBP	ConvNet	WaveNet	RED-CNN	RLNet	IRLNet
Patient L067	0.8748	0.9216	0.9054	0.9250	0.9056	0.9395
Patient L192	0.8580	0.9117	0.8968	0.8968	0.8965	0.9314
Patient L310	0.8314	0.9054	0.9046	0.9046	0.8840	0.9292

Table 3. The NMSE measurements of low-dose CT images reconstructed by the six different approaches at 60 mA.

Patient	FBP	ConvNet	WaveNet	RED-CNN	RLNet	IRLNet
Patient L067	0.1213	0.0363	0.0497	0.0408	0.0357	0.0328
Patient L192	0.1395	0.0418	0.1007	0.0492	0.0409	0.0385
Patient L310	0.1474	0.0401	0.0544	0.0501	0.0398	0.0365

In addition, the proposed IRLNet approach achieves average gains of more than 23.12%, 16.76%, 28.76% and 26.55% over the ConvNet, RLNet, WaveNet and RED-CNN, respectively, consistent with the visual observations. More details are given in our supplementary material (stacks.iop.org/PMB/63/215004/mmedia).

3.3. IRLNet for real scenarios

3.3.1. Object-driven CT image restoration

Because the proposed IRLNet approach is object-driven, it can easily be adapted to a practical task. Figure 10 shows the DECT images reconstructed by the FBP, BM3D (Dabov et al 2007), ConvNet and proposed fine-tuned IRLNet approaches. As shown in figure 10, the proposed IRLNet approach removes almost all artifacts and recovers all essential structures, which is promising for practical applications such as quality enhancement of low-dose DECT imaging. Figure 10 also shows the corresponding virtual monochromatic images (VMIs) synthesized from the DECT images; the proposed IRLNet approach yields a VMI similar to the normal-dose one. Table 4 lists quantitative measurements of the DECT images reconstructed by the FBP, BM3D, ConvNet and proposed IRLNet approaches; the proposed IRLNet approach achieves the best measurements at both tube voltages (80 and 140 kVp). Figure 11(a) shows the CPCT images reconstructed by the FBP, BM3D, ConvNet and proposed fine-tuned IRLNet approaches, and figure 11(b) shows the corresponding perfusion hemodynamic maps (PHM), i.e. cerebral blood flow (CBF) maps. The PHM can be calculated from the CPCT images via image-based deconvolution algorithms, such as the singular value decomposition (SVD)-based algorithm (Wu et al 2003). These results also demonstrate the performance of the proposed IRLNet approach, consistent with the DECT observations. In addition, three experienced physicians scored the CBF maps on a five-point scale from 1 (worst) to 5 (best) for the following attributes: image noise, artifacts, edges and structures, overall image quality, and stroke region estimation (Zeng et al 2016c). The scores are listed in table 5; the proposed IRLNet approach received the highest score.
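The SVD-based deconvolution step used to compute CBF maps can be sketched in NumPy as a plain truncated SVD; refinements such as block-circulant convolution matrices for delay insensitivity are omitted. The tissue enhancement curve is modelled as $c = \Delta t\,(\mathrm{AIF} * k)$ with $k(t) = \mathrm{CBF}\cdot R(t)$, and CBF is taken as the maximum of the recovered k. The relative truncation threshold of 0.2 is an assumed default, and physical scaling constants (tissue density, hematocrit) are left out, so the result is in arbitrary units.

```python
import numpy as np


def svd_cbf(tissue_curve, aif, dt, threshold=0.2):
    """Truncated-SVD deconvolution of a tissue enhancement curve.

    Returns (CBF, k) in arbitrary units, where k(t) = CBF * R(t)."""
    n = aif.size
    lag = np.subtract.outer(np.arange(n), np.arange(n))              # lag[i, j] = i - j
    conv = np.where(lag >= 0, dt * aif[np.maximum(lag, 0)], 0.0)     # lower-triangular Toeplitz
    u, s, vt = np.linalg.svd(conv)
    s_inv = np.zeros_like(s)
    keep = s >= threshold * s.max()                 # discard small singular values
    s_inv[keep] = 1.0 / s[keep]
    k = vt.T @ (s_inv * (u.T @ tissue_curve))       # regularized pseudo-inverse solve
    return k.max(), k
```

Truncating small singular values suppresses noise amplification in the deconvolution, which is why noise in the low-dose CPCT images propagates strongly into the CBF maps unless it is removed beforehand.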

**Figure 10.** The high- and low-dose DECT images reconstructed by the FBP, BM3D, ConvNet and proposed IRLNet approaches with the fine-tuned network, respectively. All the images are displayed in the same window of $[-200, 550]$ HU.

**Figure 11.** (a) The high- and low-dose CPCT images reconstructed by the FBP, BM3D, ConvNet and proposed IRLNet approaches with the fine-tuned network; (b) the corresponding CBF maps estimated from the CPCT images reconstructed by the FBP, BM3D, ConvNet and proposed IRLNet approaches with the fine-tuned network. The unit of the CBF maps is ml/100 g/min. The display window is [0, 200] HU for (a).

Table 4. The quantitative measurements of the DECT images reconstructed by the FBP, BM3D, ConvNet and proposed IRLNet approaches.

Tube voltage	Metric	FBP	BM3D	ConvNet	IRLNet
140 kVp	PSNR (dB)	23.8222	30.8738	33.3690	33.5865
	FSIM	0.9219	0.9324	0.9401	0.9527
	NMSE	0.0637	0.0283	0.0225	0.0207

80 kVp	PSNR (dB)	27.1051	35.2280	35.9916	36.1532
	FSIM	0.9277	0.9294	0.9411	0.9532
	NMSE	0.0660	0.0259	0.0246	0.0233
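
For reference, the two simpler metrics in tables 4 and 6 can be computed as sketched below. The sketch assumes the common definitions $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, with MAX taken as the dynamic range of the normal-dose reference, and $\mathrm{NMSE} = \|x - x_{\mathrm{ref}}\|^2/\|x_{\mathrm{ref}}\|^2$; the exact definitions and intensity ranges used in this work are given in its methods section and may differ. FSIM, which relies on phase-congruency and gradient-magnitude maps, is omitted.

```python
import numpy as np


def psnr(test, ref, data_range=None):
    """Peak signal-to-noise ratio in dB; data_range defaults to the reference's dynamic range."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((test - ref) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


def nmse(test, ref):
    """Normalized mean squared error ||test - ref||^2 / ||ref||^2."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return np.sum((test - ref) ** 2) / np.sum(ref ** 2)


ref = np.array([[0.0, 1.0], [1.0, 0.0]])
noisy = ref + 0.1  # uniform 0.1 offset: MSE = 0.01, data range = 1
print(psnr(noisy, ref), nmse(noisy, ref))  # approximately 20.0 dB and 0.02
```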

Table 5. The physicians' scores of the CBF maps estimated by the different approaches.

Approaches	FBP	BM3D	ConvNet	IRLNet
Score	2.50	3.17	3.67	4.17

3.3.2. PWLS framework with IRLNet regularization

Figure 12 shows the high- and low-dose CT images reconstructed by the FBP, PWLS-TV, PWLS-NLM, IRLNet and proposed PWLS-IRLNet approaches at 10 mA. Although the PWLS-TV and PWLS-NLM approaches reduce the noise to some extent, severe 'structure' artifacts remain in their results. In contrast, the proposed PWLS-IRLNet approach achieves better reconstruction quality and preserves fine image details without obvious artifacts, as shown in the zoomed-in ROI indicated by the red box. Table 6 lists the quantitative comparisons of the low-dose CT images reconstructed by the FBP, IRLNet, PWLS-TV, PWLS-NLM and proposed PWLS-IRLNet approaches. The two IRLNet-based approaches perform better than the other approaches, demonstrating the effectiveness of the proposed IRLNet approach. Moreover, by combining the SIR framework with the IRLNet prior information, PWLS-IRLNet achieves better image quality (i.e. less noise and better resolution), with higher PSNR and FSIM values and a smaller NMSE value than the other approaches. Table 7 lists the average computing time of the different approaches for one iteration on a single CT slice. The IRLNet approach is the fastest of the four (0.2351 s/slice), whereas the proposed PWLS-IRLNet approach (0.4940 s/slice) takes longer per iteration than the conventional PWLS-TV (0.3176 s/slice) and PWLS-NLM (0.3821 s/slice) approaches.
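
To make the idea of an image prior inside a PWLS framework concrete, the sketch below minimizes a PWLS objective with a quadratic penalty that pulls the image toward a network-restored prior image $x_{\mathrm{prior}}$: $\Phi(x) = (y - Ax)^{T} W (y - Ax) + \beta\|x - x_{\mathrm{prior}}\|^2$. This quadratic penalty, the plain gradient-descent solver and the name `pwls_quadratic_prior` are assumptions for illustration; the actual PWLS-IRLNet objective and optimizer are defined in the methods section and may differ. A small random dense matrix stands in for the CT projector.

```python
import numpy as np


def pwls_quadratic_prior(A, y, w, x_prior, beta, n_iter=500):
    """Minimize (y - A x)^T W (y - A x) + beta * ||x - x_prior||^2 by gradient descent.

    A: (m, n) system matrix, y: (m,) measured sinogram, w: (m,) diagonal of W
    (statistical weights), x_prior: (n,) prior image, e.g. a network output.
    """
    x = np.array(x_prior, dtype=float)  # warm start from the prior image
    AtWA = A.T @ (w[:, None] * A)
    # Step 1/L, with L = 2 * (largest eigenvalue of A^T W A + beta), the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * (np.linalg.norm(AtWA, 2) + beta))
    for _ in range(n_iter):
        grad = -2.0 * A.T @ (w * (y - A @ x)) + 2.0 * beta * (x - x_prior)
        x = x - step * grad
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = rng.standard_normal(10)
w = rng.uniform(0.5, 1.5, size=20)
y = A @ x_true
x_prior = x_true + 0.1 * rng.standard_normal(10)  # stands in for a network-restored image
x_hat = pwls_quadratic_prior(A, y, w, x_prior, beta=1.0, n_iter=3000)
# The objective is quadratic, so its minimizer also solves (A^T W A + beta I) x = A^T W y + beta x_prior.
x_ref = np.linalg.solve(A.T @ (w[:, None] * A) + np.eye(10), A.T @ (w * y) + x_prior)
```

Because the penalty is quadratic, the closed-form solve gives a convenient check on the iterative solver; with a real projector, the matrix is too large to form, which is why iterative updates are used in practice.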

Table 6. The quantitative measurements of low-dose CT images reconstructed by the five different approaches.

Approach	PSNR (dB)	FSIM	NMSE
FBP	14.1127	0.7108	0.3541
PWLS-TV	30.0336	0.8665	0.0566
PWLS-NLM	30.1723	0.8612	0.0557
IRLNet	32.3046	0.8874	0.0438
PWLS-IRLNet	32.6947	0.8919	0.0417

Table 7. Computing time per iteration for all compared approaches.

Approaches	PWLS-TV	PWLS-NLM	IRLNet	PWLS-IRLNet
Computing time (s/slice)	0.3176	0.3821	0.2351	0.4940

4. Discussion and conclusion

The conventional ConvNet-based approaches have achieved substantial performance improvements in low-dose CT image reconstruction (Chen et al 2017, Hu et al 2017, Kang et al 2017a, 2017b, Wolterink et al 2017), and have demonstrated superiority over traditional methods in both reconstruction accuracy and computational efficiency. Most ConvNet-based approaches learn a mapping from low-dose CT images to the corresponding high-dose ones in an end-to-end manner. However, they still have limitations. First, some noise-induced artifacts in ultra-low-dose CT images resemble small lesions, and ConvNet-based approaches may fail to distinguish these artifacts from real lesions. Consequently, residual noise-induced artifacts may not be completely removed and may cause false positives in diagnosis (Chen et al 2017, Kang et al 2017a, 2017b). Second, previous ConvNet-based approaches cannot accurately predict HU values in the reconstructed CT images, which can lead to HU value variations in material-specific regions (Wolterink et al 2017). Many strategies have been developed to address these two issues and yield high-quality CT images. In this work, we proposed an IRLNet approach to suppress noise-induced artifacts in low-dose CT images, especially in ultra-low-dose cases. Specifically, the proposed IRLNet approach estimates the high-frequency details within the noise and removes them iteratively. After severe streaks are eliminated from the low-dose CT images, the residual low-frequency details can be processed by the conventional network. The proposed IRLNet approach constructs a deeper neural network and can suppress noise-induced artifacts effectively in ultra-low-dose cases. The experimental results in section 3 showed that the proposed IRLNet approach yields larger gains than the existing ConvNet-based approaches in terms of different evaluation metrics.
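
The iterative structure described above, in which each stage predicts a residual (artifact) image and subtracts it from the current estimate, can be written as $x_{k+1} = x_k - r_k(x_k)$. In the sketch below, the trained IRLNet sub-networks are replaced by a hypothetical stand-in, `estimate_residual`, that labels a fixed fraction of the high-pass content (the image minus a 3×3 box blur) as artifact; the number of stages and the gain are likewise arbitrary assumptions. It illustrates only the data flow of iterative residual removal, not the learned behavior of the network.

```python
import numpy as np


def box_blur(img):
    """3x3 mean filter with edge padding (a simple low-pass filter)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def estimate_residual(img, gain=0.5):
    """Hypothetical stand-in for one trained residual sub-network:
    labels a fraction of the high-frequency content as artifact."""
    return gain * (img - box_blur(img))


def iterative_residual_removal(img, n_stages=3):
    """Apply x_{k+1} = x_k - r_k(x_k) for n_stages stages."""
    x = np.asarray(img, dtype=float)
    for _ in range(n_stages):
        x = x - estimate_residual(x)
    return x


def highpass_energy(img):
    """Energy of the high-frequency part, used here to track artifact suppression."""
    return float(np.sum((img - box_blur(img)) ** 2))


rng = np.random.default_rng(1)
noisy = 100.0 + 20.0 * rng.standard_normal((64, 64))  # flat phantom plus white noise
restored = iterative_residual_removal(noisy)
print(highpass_energy(noisy), highpass_energy(restored))
```

A fixed filter removes structure and noise indiscriminately; the point of learning each $r_k$ is to separate noise-induced artifacts from genuine high-frequency anatomy.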

In this work, we fine-tuned the IRLNet learnt from a conventional 3D CT image dataset for DECT and CPCT imaging applications, and obtained high-quality DECT/CPCT images and the corresponding decomposed material images/PHM. The experimental results demonstrated that the proposed IRLNet approach can be efficiently transferred from a conventional CT imaging task to a specific CPCT/DECT imaging task by reusing its intermediate representations, even though the HU values in DECT/CPCT images differ from those in conventional CT images. Moreover, the improved image quality provided by the proposed IRLNet approach can improve SIR performance, e.g. by accelerating the reconstruction and yielding clinically acceptable images when the IRLNet images are used as prior information.

The success of the proposed IRLNet approach in CT imaging stems from the iterative residual-artifact learning, through which the high-frequency noise-induced artifacts are removed from the CT images iteratively. In addition, using 3D images in network learning can improve the performance of the proposed IRLNet approach because adjacent slices are highly correlated. Last but not least, this study introduced a realistic low-dose CT simulation tool to simulate low-dose CT images from the corresponding normal-dose ones, which enables dataset augmentation and thereby improves the performance of the proposed IRLNet approach. A potential limitation of this study is that we evaluated the generalization of the proposed IRLNet approach on only two clinical tasks, i.e. coronary atherosclerotic plaques and strokes. In practice, there is much interest in developing high-quality ultra-low/low-dose CT imaging for all body areas and diseases, and more experiments with larger datasets should be performed to validate the approach more thoroughly. In the future, we will evaluate and validate the proposed IRLNet approach on a variety of clinically relevant tasks such as lung imaging (Prakash et al 2010) and abdomen imaging (Singh et al 2011), which would be a useful and interesting research topic. We will also consider incorporating the wavelet-based processing step into the network through extra network layers.
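
A common way to simulate low-dose data from a normal-dose scan is to convert the sinogram back to detector counts at a reduced incident flux and inject Poisson (quantum) plus Gaussian (electronic) noise. The sketch below shows this simplified model; it is not necessarily the simulation tool used in this work. The incident flux `i0`, the dose fraction and the electronic-noise level `sigma_e` are assumed values, and the model treats the normal-dose sinogram as noise-free, whereas more realistic tools account for the noise already present in the normal-dose data.

```python
import numpy as np


def simulate_low_dose(p, i0=1e5, dose_fraction=0.1, sigma_e=10.0, rng=None):
    """Add Poisson + Gaussian noise to normal-dose line integrals p (simplified model).

    p: normal-dose sinogram of line integrals, treated as noise-free here.
    i0: assumed normal-dose incident flux (counts per detector bin).
    dose_fraction: ratio of low-dose to normal-dose tube current-time product.
    sigma_e: assumed electronic-noise standard deviation in counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    i_low = dose_fraction * i0
    # Quantum noise: Poisson-distributed transmitted counts at the reduced flux.
    counts = rng.poisson(i_low * np.exp(-p)).astype(float)
    # Electronic noise: additive zero-mean Gaussian.
    counts += rng.normal(0.0, sigma_e, size=p.shape)
    # Clip so the logarithm stays defined, then convert back to line integrals.
    counts = np.maximum(counts, 1.0)
    return np.log(i_low / counts)


rng = np.random.default_rng(0)
p_normal = np.full((100, 200), 2.0)  # toy sinogram of constant line integrals
p_low = simulate_low_dose(p_normal, i0=1e6, dose_fraction=0.1, sigma_e=5.0, rng=rng)
p_full = simulate_low_dose(p_normal, i0=1e6, dose_fraction=1.0, sigma_e=5.0, rng=rng)
print(p_low.std(), p_full.std())  # the reduced-dose sinogram is noisier
```

The simulated low-dose sinograms can then be reconstructed with FBP to create additional low-dose/normal-dose training pairs.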

Acknowledgments

This work was supported in part by the NSFC under Grant 81701690, Grant U1708261, Grant 61871383, Grant 61701217 and Grant 61571214, in part by the Science and Technology Program of Guangzhou, China, under Grant 201705030009, and in part by the Science and Technology Program of Guangdong, China, under Grant 2015B020233008.
