GPU-based cone beam computed tomography

doi:10.1016/j.cmpb.2009.08.006

Computer Methods and Programs in Biomedicine

Volume 98, Issue 3, June 2010, Pages 271-277

https://doi.org/10.1016/j.cmpb.2009.08.006

Abstract

The use of cone beam computed tomography (CBCT) is growing in the clinical arena due to its ability to provide 3D information during interventions, its high diagnostic quality (sub-millimeter resolution), and its short scanning times (60 s). In many situations, the short scanning time of CBCT is followed by a time-consuming 3D reconstruction. The standard reconstruction algorithm for CBCT data is filtered backprojection, which for a volume of size 256³ takes up to 25 min on a standard system. Recent developments in the area of Graphics Processing Units (GPUs) provide access to high-performance computing at a low cost, allowing their use in many scientific problems. We have implemented an algorithm for 3D reconstruction of CBCT data using the Compute Unified Device Architecture (CUDA) provided by NVIDIA (NVIDIA Corporation, Santa Clara, California), which was executed on an NVIDIA GeForce GTX 280. Our implementation reduces reconstruction times from minutes, and perhaps hours, to a matter of seconds, while also giving the clinician the ability to view 3D volumetric data at higher resolutions. We evaluated our implementation on ten clinical data sets and one phantom data set to determine whether differences occur between CPU-based and GPU-based reconstructions. With our approach, the computation time for a 256³ volume is reduced from 25 min on the CPU to 3.2 s on the GPU. The GPU reconstruction time for 512³ volumes is 8.5 s.

Introduction

Computed tomography is one of the most popular modalities in the clinical arena, but reconstruction of cone beam computed tomography (CBCT) data can be time-consuming on a standard system. Solutions that reduce the turn-around time would provide advantages during both diagnostic and treatment interventions, e.g., real-time reconstruction and high-resolution reconstruction.

The high demand for realism in computer games has pushed the development of Graphics Processing Units (GPUs). As a result, the performance of these units is many times higher than that of the supercomputers of only a decade ago. Therefore, it is practical to apply the power of GPUs to problems that exist in the field of medical imaging.

We use an NVIDIA GeForce GTX 280, which provides high performance for a relatively low cost (US$ 350). The advantage of an NVIDIA product is that a C-like programming environment, called Compute Unified Device Architecture (CUDA), is provided.

CUDA has several advantages over traditional low-level GPU programming languages. For example, it uses the standard C language, it allows access to arbitrary addresses in the device's memory, it offers user-managed shared memory (16 kB in size) that can be shared among threads, and it provides faster downloads and readbacks to and from the GPU. However, in comparison to shader-based languages, CUDA-based implementations are slightly slower. Compared to traditional CPU calculations, GPU computations also have some disadvantages. These include no support for recursive functions, bottlenecks due to bandwidth limitations and latencies between the CPU and the GPU, and the GPU's deviations from the IEEE 754 standard¹, including the lack of support for NaNs.

Since computed tomographic reconstruction is computationally very demanding, several approaches to speed up the process have been developed in recent years. The main achievements have been made using Cell Broadband Engines [2], Field Programmable Gate Arrays (FPGAs) [3], [4], and GPUs [5], [6]. A comprehensive summary of the different approaches is given in [7], where four different approaches are compared (PC Reference, FPGAs, GPU and Cell). The system parameters for all techniques are 512 projections, with a projection size of 1024² and a volume of 512³. The reconstruction times are as follows: PC 201 s, FPGA 25 s, GPU 37 s, and Cell 17 s. A direct comparison between the different approaches is difficult since the architecture of the hardware used, especially for GPUs, is frequently updated and may include additional new features.
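To put the times quoted from [7] on a common scale, the following short Python sketch converts them into speedup factors relative to the PC reference. The dictionary and function names are illustrative only; the four timings are the ones stated above.

```python
# Reconstruction times in seconds, as quoted above from the comparison in [7]
# (512 projections of 1024^2 pixels, 512^3 volume).
times_s = {"PC": 201.0, "FPGA": 25.0, "GPU": 37.0, "Cell": 17.0}


def speedup_over_pc(times):
    """Return each platform's speedup factor relative to the PC reference."""
    reference = times["PC"]
    return {name: reference / t for name, t in times.items()}


speedups = speedup_over_pc(times_s)

# Print the platforms from the smallest to the largest speedup.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{name:>4}: {factor:5.1f}x")
```

On these figures the GPU is roughly 5.4 times faster than the PC reference, the FPGA about 8 times, and the Cell about 11.8 times. As noted above, such ratios reflect the hardware generation used in [7] rather than a fixed ranking of the platforms.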

Several groups have worked on implementing CT reconstructions on GPUs. Over the last decade, the main contributions in accelerated CT have been made by Mueller and coworkers [5], [8], where different implementations and programming platforms are used to show the ability of the graphics accelerator. In [8], a streaming-shader-based CT framework is presented which pipelines the process; the convolution is done on the CPU and the backprojection on the GPU. A similar implementation using CUDA for parallel beam and cone beam is presented in Yang et al. [6]. Reconstruction of CBCT data from mobile C-arm units using NVIDIA devices is presented in [9], [10].

Our approach is distinct from the previous work. We have developed a solution that takes advantage of the available shared memory, loads all projection images into the GPU memory, and computes the intensity of each voxel by backprojecting in parallel. We investigate the limitations of, and the differences between, reconstruction on GPUs and on CPUs; these differences most likely result primarily from the deviation from the IEEE 754 standard. Since our hardware supports different GPU architectures, we evaluate our algorithm using two of them, i.e., sm_10 (basic) and sm_13 (double-precision floating point). We monitored the differences by performing a clinical evaluation of ten animal cases and one phantom case. Due to the hardware differences between GPUs (e.g., clock speed and memory size) and variations in the system parameters of different computed tomography modalities, a direct comparison between implementations is difficult to perform.
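One way to see why numerical precision can matter for a backprojection is to accumulate the same per-voxel contributions in single and in double precision. The Python sketch below does this on the CPU only; it emulates single precision by rounding through a 4-byte float and is not a model of any particular GPU's arithmetic. The 106 contributions mirror the number of projection images reported in the Results, and the value 0.1 is purely hypothetical data.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single_precision):
    """Sum values one by one, as a per-voxel backprojection loop would."""
    total = 0.0
    for v in values:
        if single_precision:
            # Round both the input and the running sum to single precision.
            total = to_float32(total + to_float32(v))
        else:
            total = total + v
    return total


# Hypothetical data: 106 equal contributions, one per projection image.
contributions = [0.1] * 106
reference = math.fsum(contributions)  # correctly rounded double-precision sum

err32 = abs(accumulate(contributions, True) - reference)
err64 = abs(accumulate(contributions, False) - reference)
print(f"single-precision error: {err32:.3e}")
print(f"double-precision error: {err64:.3e}")
```

The single-precision error is several orders of magnitude larger than the double-precision one, although still small relative to the sum itself. Whether a difference of this kind is visible in a reconstructed volume depends on the data and on the rest of the pipeline.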

Section snippets

Cone beam computed tomography

In this section, we revisit a reconstruction method for CBCT data as introduced by Feldkamp et al. [11]. Since we use a rotational angiographic system (Toshiba Infinix VSI/02), which is equipped with a flat panel detector, we only discuss the case of equally spaced planar detectors.
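As a minimal sketch of the geometry behind a Feldkamp-type, voxel-driven backprojection with an equally spaced planar detector, the Python function below maps one voxel to detector coordinates and to its distance weight for a single source angle. It follows the common textbook convention of a circular orbit of radius D about the z axis and a virtual detector placed through the rotation axis; that convention, the function name, and the parameter names are assumptions of this sketch and may differ from the paper's own notation.

```python
import math


def project_voxel(x, y, z, beta, D):
    """Map voxel (x, y, z) to detector coordinates for source angle beta.

    Assumes a circular source orbit of radius D about the z axis and a
    virtual flat detector through the rotation axis (assumption of this
    sketch). The voxel must lie inside the orbit, so that s < D.
    Returns (p, zeta, weight): horizontal and vertical detector
    coordinates and the distance weight D^2 / (D - s)^2.
    """
    # Rotate the voxel into the frame attached to the source.
    t = x * math.cos(beta) + y * math.sin(beta)
    s = -x * math.sin(beta) + y * math.cos(beta)

    # Magnification of the ray through the voxel onto the virtual detector.
    scale = D / (D - s)
    return t * scale, z * scale, scale * scale


# A voxel on the rotation axis always projects to the detector centre line.
print(project_voxel(0.0, 0.0, 0.0, 0.3, 100.0))
# A voxel halfway towards the source is magnified by a factor of two.
print(project_voxel(0.0, 50.0, 2.0, 0.0, 100.0))
```

In a full reconstruction, the filtered projection would be sampled at (p, zeta) and the weighted value added to the voxel for every angle. Because each voxel is computed independently of all others, this step maps naturally onto one GPU thread per voxel.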

In Fig. 1, the schematic drawing of the cone beam system with a planar detector is presented. During acquisition, the system follows a circular trajectory, with a radius of D placed at the origin. The detector plane

Evaluations

For all evaluations, we used a standard system (Intel Core2 Quad, 2.83 GHz, 4 GB of RAM) equipped with an NVIDIA GeForce GTX 280. The performance profile of the GPU is 240 stream processors with a 1296 MHz shader clock, which corresponds to a peak performance of almost 1 TFlops.²
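The "almost 1 TFlops" figure can be reproduced with simple arithmetic. The sketch below assumes that each stream processor can issue three single-precision floating-point operations per shader clock cycle (a multiply-add plus a multiply); that factor is an assumption of this example and is not stated in the text above, while the processor count and clock are.

```python
def peak_gflops(stream_processors, shader_clock_mhz, flops_per_cycle):
    """Theoretical peak throughput in GFlops."""
    flops = stream_processors * shader_clock_mhz * 1e6 * flops_per_cycle
    return flops / 1e9


# 240 stream processors at 1296 MHz; 3 flops per cycle is an assumption.
gtx280_peak = peak_gflops(240, 1296, 3)
print(f"Theoretical peak: {gtx280_peak:.2f} GFlops")
```

Under this assumption the result is about 933 GFlops. This is a theoretical upper bound; a memory-bound kernel such as backprojection would normally be expected to reach only a fraction of it.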

To evaluate the speed up over the CPU provided by the GPU, we determine total time,

Results

Table 1 shows the reconstruction times for two different volume sizes (256³, 512³) and for the two different GPU architectures. We additionally calculated the transfer time between the main memory and the GPU memory. The total transfer time for a 256³ volume and 106 projection images was 0.75 s, i.e., about one fourth of the total reconstruction time is devoted to the transferring of data, which is significant. However, the total reconstruction time is substantially reduced compared to the standard CPU

Discussion

In this paper, we presented an efficient and clinically oriented algorithm to reconstruct computed tomography data in almost real time, demonstrating the power of GPUs in the field of medical imaging. For future work, implementations of other medical imaging problems using a GPU should be considered. In the field of computed tomography, there exist other, more efficient reconstruction algorithms whose running times may benefit from a similar approach.

In our evaluations, we report

Conflict of interest statement

None declared.

Acknowledgements

This work was partly supported by The State University of New York at Buffalo IRD Fund, NSF grant IIS-0713489, NSF CAREER Award CCF-0546509, and the Toshiba Medical Systems Corporation.

References (13)

[1] D. Hough, Applications of the proposed IEEE-754 standard for floating point arithmetic, Computer (1981)
[2] M. Kachelriess et al., Hyperfast parallel-beam and cone-beam backprojection using the cell general purpose hardware, Med. Phys. (2007)
[3] D. Brasse et al., Towards an inline reconstruction architecture for micro-CT systems, Phys. Med. Biol. (2005)
[4] S. Coric, M. Leeser, E. Miller, M. Trepanier, Parallel-beam backprojection: an FPGA implementation optimized for...
[5] F. Xu et al., Real-time 3D computed tomographic reconstruction using commodity graphics hardware, Phys. Med. Biol. (2007)
[6] H. Yang et al., Accelerating backprojections via CUDA architecture

There are more references available in the full text version of this article.
