Deep learning-based attenuation correction in the absence of structural information for whole-body positron emission tomography imaging

Xue Dong; Yang Lei; Tonghe Wang; Kristin Higgins; Tian Liu; Walter J Curran; Hui Mao; Jonathon A Nye; Xiaofeng Yang

doi:10.1088/1361-6560/ab652c

1. Introduction

Attenuation and scatter correction (referred throughout as 'AC') are essential components of positron emission tomography (PET) reconstruction that improves visual interpretation, and more importantly, enables absolute quantification. The prevailing AC method is to calculate attenuation factors and model scatter using information from a structural image obtained with either computed tomography (CT) or magnetic resonance imaging (MRI). CT-based AC (CTAC) methods are widely accepted given the simplicity of mapping Hounsfield units to 511 keV linear attenuation coefficients but limitations persist including propagation of CT-based artefacts and spatial inconsistencies such as PET-to-CT misalignment (Berker and Li 2016). In addition, concerns remains regarding the potential hazards of radiation exposures, especially CT doses (Journy et al 2014, 2017, Berrington de Gonzalez et al 2016, Fahey et al 2017) and excessive exposures on pediatric patients receiving sequential scans (Chawla et al 2010, Cheuk et al 2012), though the models used to estimate stochastic risks remain controversial (Fahey et al 2011).

MRI is a non-ionizing alternative to CT and has superior soft tissue contrast (Lei et al 2019c), but presents even greater challenges for AC due to an absence of a direct conversion method between MRI voxel intensity and 511 keV linear attenuation coefficients (Hofmann et al 2009, Mehranian et al 2016). Addressing this problem necessitates substantial pre-processing such as segmentation (Zaidi et al 2003, Hofmann et al 2009, Catana et al 2010, Keereman et al 2010, Fei et al 2012) and registration to atlas templates (Hofmann et al 2008, 2011, Malone et al 2011). Segmenting lung and cortical bone are difficult because the two only produce weak signals with conventional magnetic resonance (MR) sequences. Though ultrashort echo time (UTE) (Keereman et al 2010) and zero echo time (ZTE) (Leynes et al 2017) pulse sequences are investigated for bone visualization and segmentation, the performances were limited by high level of noise and image artefacts (Mehranian et al 2016). Moreover, these sequences provide limited diagnostic value compared to other conventional sequences and are employed for the sole reason of AC, which prolong the overall acquisition duration. Registration-based AC methods are usually computational costly, especially when multiple registrations are required. The accuracy of this technique depends highly on the registration accuracy, while accurate registration is not always guaranteed due to organ morphology and variability across patients. Additional concerns regarding the reproducibility of MRI and potential misalignment with PET images persist (Olin et al 2018). Image artefacts, such as truncation and distortion, can also propagate into the creation of the 511 keV attenuation map, adversely affecting PET quantification (Mehranian et al 2016).

The development of deep learning (DL) has demonstrated tremendous potential in computer vision as well as medical imaging (Shen et al 2017). Deep learning could help generate synthetic CT from MR images to predict AC maps (Lei et al 2018a, 2018b, Spuhler et al 2018, Dong et al 2019, Yang et al 2019). However, these methods still require structural images, and the accuracy is limited by image artefacts as well as inter-modality co-registration errors. To circumvent the need for structural information, we aimed to develop a deep learning-based method that learns the relationship between existing attenuation-corrected PET (AC PET) and non-attenuation-corrected PET (NAC PET) to directly map a new NAC to AC PET. This approach provides a solution to the AC problem that can bypass confounds associated with collection and processing of structural image data as described above. To generate AC PET images without the use of structural data, a supervised 3D patch-based cycle-consistent generative adversarial network (CycleGAN) architecture was employed for our deep learning-based AC (DL-AC) method to model non-linear mapping from NAC PET to AC PET. Several residual blocks, which aim to learn the difference between NAC PET and AC PET, were integrated into the CycleGAN architecture. Leave-one-out cross-validation was performed on whole-body PET images while the algorithm's reliability was further evaluated on an additional hold-out test dataset consisting of longitudinal PET scans.

2. Materials and methods

2.1. System overview

Figures 1 and 2 outline the schematic workflow for the training and correction stages of the proposed DL-AC method, respectively. For a given pair of NAC PET and corresponding AC PET, 3D patches of NAC PET and AC PET were extracted from NAC PET and AC PET images as training pairs. The 3D patches were extracted by sliding a window (with voxel size of 72 × 72 × 32) from NAC PET or AC PET image with an overlap (with voxel size of 60 × 60 × 24) between each two neighboring patches. The AC PET patch was used as the deep learning-based target of the NAC PET patch. The goal of DL-AC is to learn the mapping from NAC PET to AC PET to directly generate DL-AC PET images that can reach the same image quality level of an original AC PET image. Since the NAC PET image is contaminated with attenuation and scatter artifacts, training an NAC-PET-to-AC-PET (NAC-to-AC) mapping model is highly under-constrained, meaning artifacts may mislead the mapping. To cope with this issue, first, a CycleGAN architecture (Harms et al 2019, Lei et al 2019a) was applied into DL-AC to enforce the learned mapping from NAC-to-AC to be closer to one-to-one mapping via introducing an inverse AC-PET-to-NAC-PET (AC-to-NAC) mapping and using cycle consistent loss to supervise the two mappings. The two mappings were modeled by two generators. In addition, to increase the realism of the generated DL-AC PET image (synthetic AC), two discriminators were used to judge the realism of the synthetic image produced by the two generators. During correction stage, the patches of a new NAC PET image were fed into the trained NAC-to-AC generator to obtain DL-AC PET patches, and the final DL-AC PET image was reconstructed with patch fusion.

**Figure 1.** Training schematic flow chart of the proposed method.
Download figure:
Standard image High-resolution image

**Figure 2.** Testing (correction) schematic flow chart of the proposed method.
Download figure:
Standard image High-resolution image

2.2. Image acquisition

A retrospective sample of 25 whole-body PET patients was used for training and leave-one-out cross-validation. Each patient has one whole-body PET/CT image dataset. Each dataset contains AC PET, NAC PET and CT that were acquired in one exam. Among the 25 patients, the reasons for exam of eight patients were lung cancer, four lymphoma, four head and neck cancer, 3 for skin cancer, two breast cancer and 4 for abdominal cancer. Fourteen out of the 25 patients contain head in the PET/CT images. An additional cohort of ten whole-body patients, each with three sequential whole-body PET scans separated by approximately one month (30 datasets total) were excluded from training as hold-out test to further evaluate the proposed method. All the 10 patients were lung cancer patients, and 1 out of the ten patients contains head in PET/CT images. All PET data were acquired on a Discovery 690 PET/CT scanner (General Electric, Waukesha, WI) using a clinical 18F-FDG whole-body protocol. Briefly, patients were intravenously administered between 370 (body mass index (BMI) < 30) and 444 MBq (BMI >= 30) followed by a 60 min uptake period. Emission data were collected at a bed duration of 1.5 min (BMI <= 18.5), 2 min (18.5 < BMI < 25) or 2.5 min (BMI >= 25). AC PET Images were reconstructed as static images with a 3D order-subset expectation maximization algorithm (three iterations and 24 subsets) with time-of-flight and all corrections (scatter, randoms, attenuation, normalization and dead time) (Iatrou et al 2004). NAC PET did not include attenuation, scatter and time-of-flight correction. The final reconstructed matrix size was 192 × 192 with a pixel size is 3.65 × 3.65 × 3.27 mm³. The number of slices of PET ranges from 263 to 515.

2.3. Network architecture

Figure 3 shows the generator and discriminator network architectures used in the proposed method. As can be seen from figure 3, the network architecture of discriminator is a traditional FCN (Lei et al 2019b), which include several convolution layers followed by max-pooling and a sigmoid layer to obtain binary output. The generator architecture (of both NAC-to-AC and AC-to-NAC) is an end-to-end U-Net including encoding and decoding paths. The encoding path is composed of two convolution layers followed by max-pooling to reduce the feature maps size. The decoding path is composed of two deconvolution layers to obtain end-to-end mapping and a tanh layer to perform the regression. In order to combine the features extracted from encoding path and decoding path, a short connection was used to bypass the features extracted from previous hidden layer to current hidden layer. The short connection was implemented by six residual blocks, since the residual block could lead the feature maps extracted from deep hidden layer to learn the difference of source and target images' distributions. This enforces the DL-AC focus on learning image differences between the NAC PET and AC PET, which would be mainly the attenuation and scatter artifact.

Convolutional neural networks (CNN) with residual blocks have achieved promising results in tasks where source and target images are largely similar, much like the relationship between NAC PET and AC PET images. Each residual block includes a residual connection and multiple hidden layers. Through the residual connection, an input bypasses the hidden layers of a residual block, thus these hidden layers are enforced to learn specific differences between input and output, which would be attenuation and scatter artifacts. As shown in generator architecture of figure 3, a residual block is implemented by two convolution layers within residual connection and an element-wise sum operator.

2.4. Cycle consistent loss

Generative adversarial network (GAN) relies on two sub-networks, a generator and a discriminator, which work in competition with each other. Given a NAC PET and an AC PET image, an initial mapping is learned to be able to generate a DL-AC (synthetic AC) PET image from a NAC PET image. The generator generates a DL-AC PET image that can fool the discriminator to misrecognize the image as an original AC PET image. Conversely, the discriminators' training objective is to decrease the judgment error of the discriminator network, and enhance the ability to differentiate DL-AC PET from AC PET. As these networks are pitted against each other, the capabilities of each improve, leading to more accurate DL-AC PET generation. CycleGAN doubles the process of GAN by enforcing inverse mapping. This doubly constrains the model and can increase accuracy of output images. As shown in figure 1, during training stage, the extracted patches of the NAC PET were fed into the generator (NAC-to-AC) to get the equal-sized DL-AC PET patches. The DL-AC PET was then fed into another generator (AC-to-NAC) to generate NAC PET, referred to as cycle NAC PET. To enforce forward-backward consistency, the extracted patches of the training AC PET are also fed into the two generators to produce a synthetic NAC PET and cycle AC PET.

To train the CycleGAN, the learnable parameters of generators and discriminators were optimized iteratively and in an alternative manner. The accuracy of both networks is directly dependent on the design of their corresponding loss functions. The generator loss consists of an adversarial loss and a cycle consistency loss. The goal of the adversarial loss is to improve the generator to produce the synthetic images that can fool the discriminators via minimizing adversarial losses, which relies on the output of the discriminators, i.e. the distribution of feeding synthetic AC image (generated from NAC-to-AC generator ${{G}_{{\rm NAC}-{\rm AC}}}$ )) into the discriminator of AC and the distribution of feeding synthetic NAC image (generated from AC-to-NAC generator ${{G}_{{\rm AC}-{\rm NAC}}}$ ) into the discriminator of NAC. For clarity, we present only formulation for ${{G}_{{\rm NAC}-{\rm AC}}}$ .

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {{L}_{{\rm adv}}}({{G}_{{\rm NAC}-{\rm AC}}},{{D}_{{\rm AC}}},{{I}_{{\rm NAC}}}) ~ ={\rm SCE}\left[{{D}_{{\rm AC}}}\left({{G}_{{\rm NAC}-{\rm AC}}}({{I}_{{\rm NAC}}}) \right),1 \right]\nonumber \end{align} \tag{ 1 }$

where ${{I}_{{\rm NAC}}}$ denotes the NAC PET image and ${{G}_{{\rm NAC}-{\rm AC}}}({{I}_{{\rm NAC}}})$ is the output of the NAC-to-AC generator, i.e. the DL-AC (or synthetic AC). ${{D}_{{\rm AC}}}$ is the AC discriminator which is designed to return a binary value indicating whether a distribution is real (from AC) or fake (from synthetic AC). The function ${\rm SCE}\left(\cdot ,1 \right)$ is the sigmoid cross entropy between the distribution output of discriminator and a unit.

The cycle consistent loss is computed as the combination of the mean squared error (MSE) and gradient difference error (GDE) between the original images and the cycle images. The MSE loss forces the generator to synthesis AC images with accurate voxel intensity to a level of ground truth AC images. The GDE loss forces the synthetic AC images' gradient structure to a level of ground truth AC images.

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {{L}_{{\rm cyc}}}\left({{G}_{{\rm NAC}-{\rm AC}}},{{G}_{{\rm AC}-{\rm NAC}}},{{I}_{{\rm NAC}}},{{I}_{{\rm AC}}} \right)=\begin{matrix} {\rm MSE}\left[{{G}_{{\rm AC}-{\rm NAC}}}\left({{G}_{{\rm NAC}-{\rm AC}}}({{I}_{{\rm NAC}}}) \right),{{I}_{{\rm AC}}} \right] \nonumber \\ +\lambda \cdot {\rm GDE}\left[{{G}_{{\rm AC}-{\rm NAC}}}\left({{G}_{{\rm NAC}-{\rm AC}}}({{I}_{{\rm NAC}}}) \right),{{I}_{{\rm AC}}} \right] \nonumber \\ \end{matrix}\nonumber \end{align} \tag{ 2 }$

where $\lambda$ is a parameter which control the balance of MSE and GDE loss for cycle consistency. ${{G}_{{\rm AC}-{\rm NAC}}}\left({{G}_{{\rm NAC}-{\rm AC}}}({{I}_{{\rm NAC}}}) \right)$ is the output of first feeding ${{I}_{{\rm NAC}}}$ into the generator ${{G}_{{\rm NAC}-{\rm AC}}}$ and then feeding the output into the generator ${{G}_{{\rm AC}-{\rm NAC}}}$ , namely the output of this term denotes the cycle NAC. The parameter $\lambda$ was set to 10 in this work.

Finally, the optimization of generator is obtained by

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {{G}_{{\rm NAC}-{\rm AC}}},{{G}_{{\rm AC}-{\rm NAC}}}= ~ \underset{{{G}_{{\rm NAC}-{\rm AC}}},{{G}_{{\rm AC}-{\rm NAC}}}}{\mathop{{\rm arg}\,{\rm min}}}\,\left\{\begin{matrix} {{L}_{{\rm adv}}}({{G}_{{\rm NAC}-{\rm AC}}},{{D}_{{\rm AC}}},{{I}_{{\rm NAC}}})+{{L}_{{\rm adv}}}({{G}_{{\rm AC}-{\rm NAC}}},{{D}_{{\rm NAC}}},{{I}_{{\rm AC}}}) \nonumber \\ {{L}_{{\rm cyc}}}\left({{G}_{{\rm NAC}-{\rm AC}}},{{G}_{{\rm AC}-{\rm NAC}}},{{I}_{{\rm NAC}}},{{I}_{{\rm AC}}} \right)+{{L}_{{\rm cyc}}}\left({{G}_{{\rm AC}-{\rm NAC}}},{{G}_{{\rm NAC}-{\rm AC}}},{{I}_{{\rm AC}}},{{I}_{{\rm NAC}}} \right) \nonumber \\ \end{matrix} \right\}.\nonumber \end{align} \tag{ 3 }$

The optimization of discriminator is obtained by

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {{D}_{{\rm AC}}},{{{\rm D}}_{{\rm NAC}}}=\underset{{{D}_{{\rm AC}}},{{{\rm D}}_{{\rm NAC}}}}{\mathop{{\rm arg}\,{\rm min}}}\,\left\{\left. \begin{matrix} {\rm SCE}\left[{{D}_{{\rm AC}}}\left({{G}_{{\rm NAC}-{\rm AC}}}({{I}_{{\rm NAC}}} \right)),0 \right]+ ~{\rm SCE}\left[{{D}_{{\rm AC}}}\left({{I}_{{\rm AC}}} \right),1 \right]\quad \nonumber \\ {\rm +SCE}\left[{{D}_{{\rm NAC}}}\left({{G}_{{\rm AC}-{\rm NAC}}}({{I}_{{\rm AC}}} \right)),0 \right]+ ~{\rm SCE}\left[{{D}_{{\rm NAC}}}\left({{I}_{{\rm NAC}}} \right),1 \right] \nonumber \\ \end{matrix} \right\}. \right.\nonumber \end{align} \tag{ 4 }$

To supervise the generators and discriminators via the proposed loss functions, Adam gradient optimizer with learning rate of 2 × 10⁻⁴ was used for optimization. The batch size was set to 20. The number of training iterations was set to 8.6 × 10⁴. The proposed algorithm was implemented by Python 3.7 and TensorFlow as in-house software on a NVIDIA Tesla V100 GPU with 32GB of memory.

2.5. Validation and evaluations

To evaluate the reliability of the proposed method on predicting the quantification changes over time, we calculate the mean error (ME), normalized mean square error (NMSE), peak signal to noise ratio (PSNR) and normalized cross correlations (NCC) metrics between DL-AC PET and AC PET on the evaluation dataset. These metrics were calculated within whole-body volume and within contoured organs, such as brain, lung, heart, left and right kidney and liver. The calculation of these metrics is as follows:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm ME}=\frac{\sum\nolimits_{i\in V}\left({{I}_{{\rm AC}}}\left(i \right)-{{I}_{{\rm DL}-{\rm AC}}}\left(i \right) \right)}{\sum\nolimits_{i\in V}{{I}_{{\rm AC}}}\left(i \right)}\nonumber \end{align} \tag{ 5 }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm NMSE}=\frac{\sum\nolimits_{i\in V}{{\left({{I}_{{\rm AC}}}\left(i \right)-{{I}_{{\rm DL}-{\rm AC}}}\left(i \right) \right)}^{2}}}{\sum\nolimits_{i\in V}{{\left({{I}_{{\rm AC}}}\left(i \right) \right)}^{2}}}\nonumber \end{align} \tag{ 6 }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm PSNR}=10{\rm lo}{{{\rm g}}_{10}}\left(\frac{N\cdot {{\max}_{i\in V}}\left({{I}_{{\rm AC}}}\left(i \right),{{I}_{{\rm DL}-{\rm AC}}}\left(i \right) \right)}{\sum\nolimits_{i\in V}{{\left({{I}_{{\rm AC}}}\left(i \right)-{{I}_{{\rm DL}-{\rm AC}}}\left(i \right) \right)}^{2}}} \right).\nonumber \end{align} \tag{ 7 }$

ME and NMSE are averaged over all the voxels, $i$ , inside the contoured organs or whole-body volume, V, with N the total number of voxels inside the volumes. ${{I}_{{\rm AC}}}\left(i \right)$ and ${{I}_{{\rm DL}-{\rm AC}}}\left(i \right)$ are PET intensities after AC and the proposed DL-AC, respectively. ${\rm ma}{{{\rm x}}_{i\in V}}(\cdot)$ is the max intensity inside the delineated volume.

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm NCC}=\frac{\sum\nolimits_{i\in V}C\cdot \left({{I}_{{\rm AC}}}\left(i \right)-{\rm mean}\left({{I}_{{\rm AC}}}\left(i \right) \right) \right)\cdot \left({{I}_{{\rm DL}-{\rm AC}}}\left(i \right)-{\rm mean}\left({{I}_{{\rm DL}-{\rm AC}}}\left(i \right) \right) \right)}{{\rm std}\left({{I}_{{\rm AC}}} \right)\cdot {\rm std}\left({{I}_{{\rm DL}-{\rm AC}}} \right)}\nonumber \end{align} \tag{ 8 }$

where ${\rm mean}\left(\cdot \right)$ and ${\rm std}\left(\cdot \right)$ calculates the mean and standard deviation (STD) intensity among selected voxels. The intensity profiles of AC PET and DL-AC PET which plot the intensities of voxels one by one along a line in the image volume, are also shown in one figure to qualitatively demonstrate their differences.

In order to quantify the lesion detectability, we calculated and compared the signal-noise-ratio (SNR) and contrast-noise-ratio (CNR) of the lesion in both AC PET and DL-AC PET (Qi 2001, Bao and Chatziioannou 2010, Schaefferkoetter et al 2017). The SNR is defined as

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm SNR}=\frac{{\rm mean}\left(I\left(i \right) \right)}{{\rm std}(I\left(i \right))},\nonumber \end{align} \tag{ 9 }$

where $I\left(i \right)$ are the voxels in the lesion of AC PET or DL-AC PET. The CNR is defined as

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm CNR}=\frac{{\rm mean}\left(I(i) \right)-{\rm mean}\left({{I}_{b}}\left(i \right) \right)}{{\rm std}(I\left(i \right))},\nonumber \end{align} \tag{ 10 }$

where ${{I}_{b}}\left(i \right)$ are the intensities of voxels in the 2 mm margin around the lesion which are considered as background.

Leave-one-out cross-validation experiments were performed with 25 patients, each of which has one PET/CT image dataset. For each experiment, 24 sets of images were used for training and the remaining 1 set for validation. The experiment repeated 25 times to make each set used as test data exactly once. A separate hold-out test was also used to evaluate the proposed method, we trained the model by previous 25 patients and test the model by an additional ten patients, each of which includes three sequential scans for a total of 30 datasets. The change between two sequential PET scans is of clinical interest which may indicate the treatment outcomes or tumor growth. We calculated such change using both AC PET and DL-AC PET and compared the two results in order to demonstrate the reliability of the proposed method in quantifying the change between two sequential PET scans. Volumes of interest (VOIs) were manually delineated over structures of lung, heart, liver, bilateral kidneys and brain if patient dataset contains, based on CT, and lesions based on PET as indicated in patient's clinical report. The whole-body, which is defined as the region within patient body, was also segmented based on CT as a VOI to quantify the general performance across the whole patient body. Statistic T-test was also performed to quantify the statistically difference between AC PET and DL-AC PET images in the sequential scan cohort.

In order to compare the proposed method with other state-of-art learning based method in PET AC, we implemented UNET (Van Hemmen et al 2019) and GAN (Kurz et al 2018) using the same datasets. The UNET method's architecture is a deep convolutional encoder-decoder network structure with 50% dropout, batch normalization, and max pooling (Van Hemmen et al 2019). The GAN method's architecture consists a generator architecture, which is U-shaped deep convolutional neural network (U-Net-like), to synthesize DL-AC from NAC PET, and a discriminator architecture, which is a fully convolutional network, to judge the realism of synthesized DL-AC (Kurz et al 2018). Compared with UNET and GAN, CycleGAN could enforce the mapping of NAC-to-AC to be closer to one-to-one mapping, although NAC-to-AC mapping is an ill-posed problem. Specifically, we trained both UNET and GAN using the 25 patients and tested on the ten patients with sequential scans as we did for CycleGAN. Same quality metrics and VOIs were also used in statistical analysis.

3. Results

3.1. Leave-one-out cross-validation study

Figure 4 shows a comparison of images from the AC PET and the proposed DL-AC PET results on one patient as test dataset among the 25 patients in the leave-one-out cross-validation study. The images generated with the DL-AC method show excellent resemblance to the AC PET images. More explicit comparison is illustrated in figure 5, where the profiles of the DL-AC PET data agree well with those of AC PET. The joint histogram of voxels from the whole-body region is close to identity line indicating good intensity agreement between reference AC PET and our generated DL-AC PET. Figure 6 shows representative plans of brain, lesion in lung and kidneys of AC PET and DL-AC PET and their difference maps, in addition to those of liver shown in figure 4. Output of the validation metrics are listed in table 1. The ME on brain, lung, heart, kidneys, liver and lesion are all less than 4%, with NMSE less than 1.5%. The ME and NMSE on whole body are −0.01% ± 2.91% and 1.21% ± 1.73%. The NCC is close to identity, demonstrating excellent intensity similarity between the AC PET and DL-AC PET. For comparison, the NCC between DL-AC PET and NAC PET is 0.644 ± 0.214 in whole body region. PSNR are all around or larger than 30 dB.

**Figure 5.** Comparison of intensity profiles (top left: dashed line, top right: dotted line, bottom left: dash-dotted line in figure 4(c)) and (bottom right) a joint histogram of all voxels in the whole-body region between AC PET and DL-AC PET.
Download figure:
Standard image High-resolution image

**Figure 6.** Representative brain, lesion in lung, and kidney planes of AC PET (left column) and DL-AC PET (middle column) images and their difference maps (right column). The red dotted boxes indicate the position of the lesion. The unit of color bar is Bq/ml.
Download figure:
Standard image High-resolution image

Table 1. Model performance results of the leave-one-out cross-validation study on 25 sets of whole-body PET images. Data are reported as mean ± STD.

VOI	ME (%)	NMSE (%)	NCC	PSNR (dB)
Brain	1.23 ± 5.16	0.70 ± 0.84	0.977 ± 0.005	29.2 ± 4.2
Lung	−3.79 ± 7.89	1.25 ± 1.75	0.974 ± 0.060	40.7 ± 12.8
Heart	2.15 ± 4.61	0.54 ± 0.90	0.979 ± 0.063	38.8 ± 10.3
Lt kidney	1.37 ± 7.01	1.17 ± 2.37	0.989 ± 0.023	36.5 ± 8.9
Rt kidney	1.08 ± 6.25	1.35 ± 3.04	0.989 ± 0.022	37.8 ± 10.3
Liver	0.89 ± 6.00	0.54 ± 1.07	0.976 ± 0.061	36.9 ± 10.6
Lesion	2.34 ± 3.65	0.29 ± 0.36	0.978 ± 0.030	34.0 ± 7.8

Whole body	−0.01 ± 2.91	1.21 ± 1.73	0.989 ± 0.015	43.1 ± 4.6

3.2. Hold-out validation study

Figure 7 shows the side-by-side comparison of AC PET and DL-AC PET on one patient from a representative sequential PET scan and figure 8 plots the cranial-caudal profile and whole-body joint histograms from the same data. Excellent agreement is observed in both PET image and profile comparisons. The joint histogram shows an intensity distribution that is close to the line of identity, indicating good agreement between reference AC PET and the DL-AC PET from proposed method. Table 2 lists the average quantification results across all scans. Because only one set of images included head in the scan, brain was excluded from the statistical analysis. As indicated in table 2, the ME and NMSE generated with the proposed method are less than 3.5% on all contoured organs except lung. The NCCs are close to identity. For comparison, the NCC between DL-AC PET and NAC PET is 0.618 ± 0.168 in whole body region. The PSNR are all larger than 30dB. Among the 30 datasets, the lesion SNRs on AC PET and DL-AC PET are 4.188 ± 3.086 and 4.143 ± 3.047, respectively, with p -value = 0.393, and the lesion CNRs are 1.548 ± 0.787 and 1.528 ± 0.768, respectively, with p -value = 0.350. Thus there is no statistical significance in lesion SNR or CNR between AC PET and DL-AC PET.

Table 2. Model performance results of the sequential whole-body PET dataset (n = 30). Data are reported as mean ± STD.

VOI	ME (%)	NMSE (%)	NCC	PSNR (dB)
Lung	−17.02 ± 11.98	3.61 ± 3.60	0.992 ± 0.007	37.3 ± 7.0
Heart	2.11 ± 2.51	0.26 ± 0.37	0.997 ± 0.003	35.3 ± 5.5
Lt kidney	3.02 ± 3.90	2.15 ± 6.21	0.984 ± 0.044	35.6 ± 5.0
Rt kidney	2.88 ± 3.79	2.91 ± 7.27	0.978 ± 0.052	35.7 ± 5.1
Liver	2.93 ± 2.45	0.20 ± 0.19	0.994 ± 0.008	34.9 ± 5.7
Lesion	2.85 ± 5.21	0.52 ± 1.88	0.964 ± 0.046	33.9 ± 7.2

Whole body	0.62 ± 1.26	0.72 ± 0.34	0.992 ± 0.004	44.3 ± 3.5

Each patient received three sequential PET scans, therefore we calculated the relative PET intensity changes of the second scan over the first scan, the third over the first and the third over the second to evaluate the reliability of the proposed method. The results are summarized in table 3. Note that the STD in some VOIs such as lesion is large because the intensity change between sequential scans varies a lot from patient to patient. Figure 9 illustrates joint histogram of percentage changes calculated on the contoured organs and lesions with AC PET and DL-AC PET. The linear regressions are almost identical to the line of identity, which indicates excellent agreement of the intensity distributions captured on AC PET and DL-AC PET. The average difference of the percentage changes calculated with the two methods is less than 3% on all contoured volumes. The p -values using t-test (last column in table 3) are all larger than 0.05, indicating no statistically significant difference between the two methods.

Table 3. Intensity changes (%) on sequential PET scans. 'Difference' is the difference of intensity changes between AC PET and DL-AC PET. 'p -value' is the average p-value among all VOIs in the t-test of intensity changes between AC-PET and DL-1AC PET. Data are reported as mean ± STD.

Scan		Lung	Heart	Lt kidney	Rt kidney	Liver	Lesion	p -value
2nd scan over 1st scan	AC PET	−15.1 ± 27.5	7.7 ± 53.4	−6.8 ± 22.9	−7.2 ± 18.3	−12.3 ± 18.1	50.2 ± 83.7	0.883
	DL-AC PET	−17.4 ± 31.4	7.4 ± 51.8	−5.6 ± 20.9	−6.3 ± 17.8	−12.7 ± 18.3	51.5 ± 82.7
	Difference	−2.3 ± 4.9	−0.3 ± 2.0	1.3 ± 4.0	1.0 ± 3.5	−0.3 ± 1.6	1.2 ± 9.3

3rd scan over 1st scan	AC PET	−18.8 ± 25.7	−7.4 ± 33.7	−8.7 ± 20.6	−9.7 ± 20.4	−15.1 ± 15.6	56.3 ± 67.3	0.844
	DL-AC PET	−21.7 ± 28.8	−7.5 ± 33.1	−7.6 ± 18.5	−8.4 ± 19.1	−15.7 ± 15.0	56.8 ± 65.7
	Difference	−2.9 ± 4.7	−0.2 ± 1.2	1.1 ± 4.1	1.3 ± 4.4	−0.5 ± 1.5	0.5 ± 8.6

3rd scan over 2nd scan	AC PET	−0.1 ± 25.7	0.6 ± 47.7	1.4 ± 25.5	0.0 ± 28.6	−0.7 ± 21.8	14.7 ± 45.9	0.727
	DL-AC PET	0.1 ± 26.3	0.0 ± 46.4	0.8 ± 24.4	0.4 ± 27.5	−1.0 ± 21.0	15.0 ± 47.8
	Difference	0.1 ± 3.6	−0.6 ± 1.6	−0.5 ± 1.7	0.3 ± 2.0	−0.3 ± 1.4	0.3 ± 3.3

**Figure 9.** Linear regression analysis of PET quantification changes on both selected organs and lesions of (left) 2nd scan over 1st scan, (middle) 3rd scan over 1st scan and (right) 3rd scan over 2nd scan. The red dashed lines are linear regressions, with regression equation and R2 value displayed on the bottom right.
Download figure:
Standard image High-resolution image

3.3. Comparison Study with state-of-the-art methods

Table 4 lists the average quantification results across all scans among the 10 patients of 30 scans using UNET and GAN and is compared with those of CycleGAN, the proposed method, from table 2. Both UNET and GAN show inferior performance in most metrics and VOIs and larger variation among patients. They also have much large bias found in lesion than the proposed method.

Table 4. Model performance results of the sequential whole-body PET dataset (n = 30) using UNET, GAN and proposed CycleGAN. Data are reported as mean ± STD.

VOI	Method	ME (%)	NMSE (%)	NCC	PSNR (dB)
Lung	UNET	13.84 ± 10.11	7.79 ± 3.88	0.889 ± 0.046	32.6 ± 4.3
	GAN	13.42 ± 10.13	8.03 ± 4.21	0.885 ± 0.052	32.7 ± 4.6
	CycleGAN	−17.02 ± 11.98	3.61 ± 3.60	0.992 ± 0.007	37.3 ± 7.0

Heart	UNET	1.30 ± 5.22	1.42 ± 0.59	0.931 ± 0.051	26.0 ± 2.3
	GAN	−0.85 ± 3.96	1.34 ± 0.50	0.931 ± 0.053	26.0 ± 2.5
	CycleGAN	2.11 ± 2.51	0.26 ± 0.37	0.997 ± 0.003	35.3 ± 5.5

Lt kidney	UNET	−0.16 ± 9.40	3.83 ± 4.17	0.930 ± 0.060	27.0 ± 3.8
	GAN	−0.68 ± 8.30	3.55 ± 3.93	0.934 ± 0.049	27.3 ± 3.8
	CycleGAN	3.02 ± 3.90	2.15 ± 6.21	0.984 ± 0.044	35.6 ± 5.0

Rt kidney	UNET	1.10 ± 6.85	4.36 ± 5.31	0.950 ± 0.029	27.7 ± 2.3
	GAN	1.51 ± 6.22	4.26 ± 5.26	0.950 ± 0.028	27.7 ± 2.4
	CycleGAN	2.88 ± 3.79	2.91 ± 7.27	0.978 ± 0.052	35.7 ± 5.1

Liver	UNET	−1.96 ± 4.58	1.32 ± 0.49	0.860 ± 0.047	25.3 ± 3.0
	GAN	−2.09 ± 4.05	1.19 ± 0.37	0.872 ± 0.039	25.8 ± 3.1
	CycleGAN	2.93 ± 2.45	0.20 ± 0.19	0.994 ± 0.008	34.9 ± 5.7

Lesion	UNET	18.59 ± 18.00	27.69 ± 25.41	0.900 ± 0.070	18.1 ± 3.2
	GAN	19.72 ± 21.12	10.78 ± 9.43	0.898 ± 0.067	17.4 ± 3.0
	CycleGAN	2.85 ± 5.21	0.52 ± 1.88	0.964 ± 0.046	33.9 ± 7.2

Whole body	UNET	2.05 ± 2.21	2.36 ± 0.01	0.972 ± 0.012	39.3 ± 2.4
	GAN	2.25 ± 1.93	2.30 ± 0.66	0.973 ± 0.012	39.4 ± 2.3
	CycleGAN	0.62 ± 1.26	0.72 ± 0.34	0.992 ± 0.004	44.3 ± 3.5

4. Discussion

We present the feasibility of using a deep learning CycleGAN to perform DL-AC from NAC PET without the use of structural information. The method produces highly accurate tracer distribution estimations that are in high agreement with AC PET as evaluated with leave-one-out cross-validation. In addition, the longitudinal evaluation dataset produced excellent agreement with AC PET, demonstrating the model's high reliability. The proposed method provides an alternative to CT and MR-based AC and has the potential to substantially reduce CT dose from serial exams and eliminate the collection of structural information solely for AC. The proposed method has the potential to avoid the quantification bias caused by truncation artifacts of CT or MRI as well as CT- and MRI-PET co-registration errors.

Traditional GAN (Kurz et al 2018) methods train two networks with a generator mapping from source images to target images, and evaluate the transformation with a discriminator. Due to the presence of noise in both input sources and output targets in the model training, it would be difficult to ensure the generator in GAN learns meaningful mapping, and there is possibility for more than one mapping that yields the same output from a given input. CycleGAN adds more constraints to the generator by introducing an inverse transformation in a circular manner. This effectively prevent model collapse and helps the generator to find a unique meaningful mapping.

Though PET/MRI has been increasingly implemented in daily clinical applications, serious technical challenges remain in deriving accurate quantitative measurements. One major concern is the bias caused by current vendor implementations of MR-based AC methods and the resultant limitation in quantitative longitudinal therapy monitoring studies (Catana et al 2018). We evaluate the reliability of the proposed method on quantifying tracer update changes over sequential scans and found that tracer uptake changes calculated on DL-AC PET matches the references well within 3% ('difference' rows in table 3). The linear regression analysis also indicates excellent correlation between DL-AC PET and reference on longitudinal evaluation. These results demonstrate the capability and the potential of the proposed method for quantitative longitudinal therapy monitoring.

Performing attenuation correction (AC) with only NAC images is inherently challenging due to very limited availability of anatomical information of PET images. Regardless of those challenges, we are still able to obtain competitive performances comparing to state-of-the-art techniques. Hofmann et al proposed a MR-based AC method combined atlas and pattern-recognition, and applied this method on both brain (number of subjects N = 17) and whole body (N = 11) imaging (Hofmann et al 2008,2011). This AC method obtains a mean PET quantification error of 3.2% ± 2.5% SUV on brain imaging, and 7.7% ± 8.4% SUV on whole body PET images with 14.0% ± 11.4% SUV in thorax region. Paulus et al proposed a model based MRI AC method for whole body (N = 20) PET/MRI (Paulus et al 2015). The average PET quantification error is 2.7% and 4.9% on normal soft tissue and bone. The ME for soft tissue lesions excluding lung cases is 5.2% ± 5.2%, and that for bone lesions is 2.9% ± 5.8%. The quantification errors are over 20% on lung lesions, with the maximum error over 50%. In our study, the mean PET quantification error obtained with the proposed DL-AC method is 0.62% ± 1.26%, with the highest on lung of −17.02% ± 11.98%.

Though similar idea of using only NAC PET to perform AC was proposed previously, their performances are usually limited without the help of deep learning. Nuyts et al proposed a maximum-likelihood reconstruction method to compensate for the photon attention in the reconstruction process (Nuyts et al 1999) and evaluated on one patient. The average PET quantification error obtained with this method was over 20% across the image volume. A more recent study on performing AC with NAC PET was performed by Chang et al (2012) using an iterative AC method. Their approach segmented tissue into three types, air, soft tissue and lung, and performed AC by assigning corresponding linear attenuation coefficients. Absolute observed differences were 6%–10% on mean SUV and 3% ± 9% mean difference on max SUV with phantom studies. They also evaluated the max SUV uptakes with patient data (N = 11) and found the mean quantification errors are 3% ± 6% and 8% ± 7% on bone and soft tissue lesions. This AC method does not provide bone segmentation, which could cause large quantification errors in brain imaging.

We performed both leave-one-out cross-validation study with 25 sets of data and reliability study with 30 sets of data that were not used for model training. The low sample size of patients involved in training and testing limited this study as a proof-of-concept study. However, the training process is still valid since the number of training samples are thousands of that of patients. First, we used patch-based method to enlarge the training samples. For example, in our one training patient volume with size of 192 × 192 × 299 voxels, the training samples were obtained by extracting 3D patches (size of 72 × 72 × 32) with overlapping size of 60 × 60 × 24 between each two neighboring patches. Thus, the number of training samples for this training patient reached at 3400. In addition, data augmentation was also used to enlarge the training samples' variability. Flipping, rotation and rigid warping were used such that the size of training samples for each training patients was enlarged by 18 times.

In table 3, we used t-test to quantify the significance levels of the performance difference between our DL-AC PET and AC PET. T-test is a common method to compare the mean values between two groups with assumption of normal distribution. Wilcoxon test is usually an alternative to t-test when the assumption of normal distribution is in doubt. We recalculated the p -values in table 3 using Wilcoxon test, which were 0.513, 0.618 and 0.612 corresponding to the p -value column from up to bottom. All the p -values were larger than 0.05 as t-test did. Thus, the conclusion that no statistically significant difference between the two methods remains unchanged.

The proposed self AC method demonstrated similar quantification performances on heart, kidney, liver and lesion with both ME and NMSE less than 3.1%. Lower quantification accuracy is observed on lung with reliability study. This may be due to the discrepancy between the data used for training and the 30 sets of data for reliability evaluation. Only 8 out of 25 sets of training data are obtained from lung cancer patients, while all ten patients for reliability study were lung cancer patients. This likely generated large discrepancies in the PET quantification of the lung region. By including more representative data in the training process, the issue can be mitigated. It is also worth noting that lung has shown a much higher error compared with other region of body in other AC studies as well (Hofmann et al 2011, Paulus et al 2015). A potential reason can be that the activity in lung is usually much lower than other selected organs, thus a similar amount of error may leads to much higher relative error in percentage.

The proposed method was implemented and evaluated with PET/CT data, and could also be applied on PET/MRI. Our method captures the nonlinear relationship between NAC PET and AC PET, and corrects for patient attenuation and scatter which depends on patient anatomy and administered radiotracer. When applying the proposed method on PET/MRI, the quantification performances could be affected by MR coil attenuation as well as differences in protocol and machine settings. With appropriate scanning protocols and machine calibration, their impact on the quantification accuracy could be minimized. In the future, we will implement the proposed method with PET/MRI data for further validation.

The proposed method needs CT or MRI-based AC PET to serve as training targets. However, as long as the training stage is finished, the trained model would be used to generate DL-AC PET with input of NAC PET solely, which does not require CT or MRI. When the train model is used to predicting DL-AC PET on the same scanner that it got trained, the CT or MRI acquisition now can be skipped, which would have advantages in eliminating radiation dose from CT or long acquisition time from MRI. Meanwhile, the trained model is actually not necessarily to be used on the same scanner. It can be used to predict DL-AC PET on another PET scanner without CT or MR functionality, thus it can potentially enable a sole PET scanner to provide a comparable AC PET as a PET/CT or PET/MR. However, using training model on a different PET scanner has not been studied in this paper. It may affect the performance of the proposed method, which would be investigated in future study.

In this study, we proposed a novel learning-based PET AC method. An intermediate number of patients with anatomical variations and pathology abnormalities is investigated to demonstrate the feasibility of the proposed method. In order to further implement the proposed method in clinic, the reliability of DL-AC PET can be validated by involving a larger population of patients with diverse demographics and pathological abnormalities. Different testing and training datasets from different scanners and institutes would be valuable to further evaluate the clinical utility of our method. Subjective scoring or blind assessment in the potentially underestimated region on DL-AC PET with known ground truth would be helpful to understand its clinical impact in diagnostic accuracy.

5. Conclusions

We proposed a deep learning-based approach to create a fully corrected DL-PET dataset from NAC PET by effectively capturing the non-linear relationship between the NAC and AC PET. The CycleGAN deep learning approach adds additional constraints to model training to constrain NAC-to-AC mapping to be closer to one-to-one mapping. The residual networks enforce the network focus on learning the attenuation and scatter artifacts of NAC PET images. The method demonstrates excellent quantification accuracy and reliability and is applicable to PET data collected on a single PET scanner or hybrid platform (PET/CT or PET/MRI).

Acknowledgments

This research was supported in part by the National Cancer Institute of the National Institutes of Health under Award No. R01CA215718 and the Emory Winship Cancer Institute pilot grant.

Disclosures

Dr Kristin Higgins is consulting for AstraZeneca and Varian, on an advisory board for Genentech, and receiving research funding from RefleXion Medical.

Deep learning-based attenuation correction in the absence of structural information for whole-body positron emission tomography imaging

Article metrics

Permissions

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction