Abstract
Major breakthroughs in chip and software design have been observed for the last nine years. In October 2001, IBM released the world’s first multicore processor: POWER4. Six years later, in February 2007, NVIDIA made a public release of CUDA SDK, a set of development tools to write algorithms for execution on Graphic Processing Units (GPUs). Although software vendors have started working on parallelizing their products, the vast majority of existing code is still sequential and does not effectively utilize modern multicore CPUs and manycore GPUs.
This article describes Parallel Colt, a multithreaded Java library for scientific computing and image processing. In addition to describing the design and functionality of Parallel Colt, a comparison to MATLAB is presented. Two ImageJ plugins for iterative image deblurring and motion correction of PET brain images are described as typical applications of this library. Performance comparisons with MATLAB, including GPU computations via AccelerEyes’ Jacket toolbox are also given.
- }}AccelerEyes. 2009. Jacket. http://accelereyes.com/.Google Scholar
- }}Ahmed, N., Natarajan, T., and Rao, K. R. 1974. Discrete cosine transfom. Trans. Comput. C-23, 1, 90--93. Google ScholarDigital Library
- }}Amerdo, B., Bodnartchouk, V., Caromel, D., Delbé, C., Huet, F., and Taboada, G. L. 2008. Current state of Java for HPC. Tech. rep. inria-00312039, INRIA.Google Scholar
- }}Anderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J. J., Croz, J. D., Hammarling, S., Greenbaum, A., McKenney, A., and Sorensen, D. 1999. LAPACK Users’ Guide, 3rd Ed. SIAM, Philadelphia, PA. Google ScholarDigital Library
- }}Arndt, H., Bundschus, M., and Nägele, A. 2009. Towards a next-generation matrix library for Java. In Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference. Google ScholarDigital Library
- }}Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and der Vorst, H. V. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods 2nd Ed. SIAM, Philadelphia, PA.Google Scholar
- }}Bertero, M. and Boccacci, P. 1998. Introduction to Inverse Problems in Imaging. IOP Publishing Ltd., London.Google Scholar
- }}Blackford, L. S., Demmel, J., Dongarra, J., Duff, I., Hammarling, S., Henry, G., Heroux, M., Kaufman, L., Lumsdaine, A., Petitet, A., Pozo, R., Remington, K., and Whaley, R. C. 2002. An updated set of Basic Linear Algebra Subprograms (BLAS). ACM Trans. Math. Softw. 28, 2, 135--151. Google ScholarDigital Library
- }}Bluestein, L. I. 1968. A linear filtering approach to the computation of the discrete Fourier transform. Northeast Electronics Research and Engineering Meeting Record 10, 218--219.Google Scholar
- }}Buttari, A., Langou, J., Kurzak, J., and Dongarra, J. 2007. A class of parallel tiled linear algebra algorithms for multicore architectures. Tech. rep., Innovative Computing Laboratory.Google Scholar
- }}Byous, J. 2003. Java technology: The early years. http://java.sun.com/features/1998/05/birthday.html.Google Scholar
- }}Chung, J., Nagy, J. G., and O’Leary, D. P. 2008. A weighted GCV method for Lanczos hybrid regularization. Elec. Trans. Numer. Anal. 28, 149--167.Google Scholar
- }}Cooley, J. W. and Tukey, J. W. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 90, 297--301.Google ScholarCross Ref
- }}Dautelle, J.-M. 2007. JScience Project. http://jscience.org/.Google Scholar
- }}Davis, T. 2009. The University of Florida sparse matrix collection. http://www.cise.ufl.edu/research/sparse/matrices/.Google Scholar
- }}Davis, T. A. 2006. Direct Methods for Sparse Linear Systems. SIAM, Philadelphia, PA. Google ScholarDigital Library
- }}Doolin, D. M., Dongarra, J., and Seymour, K. 1999. JLAPACK - Compiling LAPACK Fortran to Java. Sci. Program. 7, 2, 111--138. Google ScholarDigital Library
- }}Dougherty, R. 2005. Extensions of DAMAS and benefits and limitations of deconvolution in beamforming. In Proceedings of the 11th AIAA/CEAS Aeroacoustics Conference.Google ScholarCross Ref
- }}DRA Systems. 2000. OR-Objects. http://opsresearch.com/OR-Objects/.Google Scholar
- }}Faber, T. L., Raghunath, N., Tudorascu, D., and Votaw, J. R. 2009. Motion correction of PET brain images through deconvolution: I. Theoretical development and analysis in software simulations. Phys. Med. Biol. 54, 3, 797--811.Google ScholarCross Ref
- }}Frigo, M. and Johnson, S. G. 2005. The design and implementation of FFTW3. Proc. IEEE 93, 2, 216--231.Google ScholarCross Ref
- }}Frigo, M. and Johnson, S. G. 2009. FFT benchmark methodology. http://www.fftw.org/speed/method.html.Google Scholar
- }}Golub, G. H. and Loan, C. F. V. 1996. Matrix Computations (Johns Hopkins Studies in Mathematical Sciences). The Johns Hopkins University Press.Google Scholar
- }}Gomez, R. 2009. JQuantLib project. http://www.jquantlib.org/index.php/Main_Page.Google Scholar
- }}Hansen, P. C. 1997. Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia, PA. Google ScholarDigital Library
- }}Hartley, R. 1942. A more symmetrical Fourier analysis applied to transmission problems. In Proceedings of the Institute of Radio Engineer (IRE).Google ScholarCross Ref
- }}Heimsund, B.-O. 2007. Matrix toolkits for Java. http://ressim.berlios.de/.Google Scholar
- }}Henriksson, J. 2009. Endrov project. http://www.endrov.net/index.php/Main_Page.Google Scholar
- }}Hicklin, J., Moler, C., Webb, P., Boisvert, R. F., Miller, B., Pozo, R., and Remington, K. 2005. JAMA: A Java matrix package. http://math.nist.gov/javanumerics/jama/.Google Scholar
- }}Hoschek, W. 2000. Uniform, versatile and efficient dense and sparse multi-dimensional arrays. http://acs.lbl.gov/%7Ehoschek/publications/ACMJava2000.pdf.Google Scholar
- }}Hoschek, W. 2004. Colt project. http://dsd.lbl.gov/%7Ehoschek/colt/index.html.Google Scholar
- }}Hudson, M. and Larkin, R. 1994. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag. 13, 601--609.Google ScholarCross Ref
- }}JCuda. 2009. JCuda project. http://www.jcuda.org/jcuda/JCuda.html.Google Scholar
- }}JUnit Community. 2009. JUnit Project. http://www.junit.org/.Google Scholar
- }}Kahan, W. and Darcy, J. D. 1998. How Java’s floating-point hurts everyone everywhere. http://www.cs.berkeley.edu/%7Ewkahan/JAVAhurt.pdf.Google Scholar
- }}Liebke, D. 2009. Incanter project. http://wiki.github.com/liebke/incanter.Google Scholar
- }}Menke, M. A. and Buckley, K. 1996. Compensation methods for head motion detected during PET imaging. IEEE Trans. Nucl. Sci. 43, 310--317.Google ScholarCross Ref
- }}Mcintosh, A. R., Bookstein, F. L., Haxby, J. V., and Grady, C. L. 1996. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage 3, 143--157.Google ScholarCross Ref
- }}Messaoudi, C., Boudier, T., Sorzano, C., and Marco, S. 2007. TomoJ: Tomography software for three-dimensional reconstruction in transmission electron microscopy. BMC Bioinf. 8, 1, 288.Google ScholarCross Ref
- }}Nagy, J. G., Palmer, K., and Perrone, L. 2004. Iterative methods for image deblurring: A MATLAB object-oriented approach. Numer. Algo. 36, 1, 73--93.Google ScholarCross Ref
- }}Nagy, J. G. and StrakošZ. 2000. Enforcing nonnegativity in image reconstruction algorithms. In Proceedings of the Mathematical Modeling, Estimation, and Imaging, D.C. Wilson et al. Ed., Vol. 4121. 182--190.Google Scholar
- }}NVIDIA Corporation. 2009. CUDA zone. http://www.nvidia.com/object/cuda_home.html.Google Scholar
- }}Ooura, T. 2006. General purpose FFT (fast Fourier/cosine/sine transform) package. http://www.kurims.kyoto-u.ac.jp/%7Eooura/fft.html.Google Scholar
- }}Picard, Y. and Thompson, C. J. 1997. Motion correction of PET images using multiple acquisition frames. IEEE Trans. Med. Imag. 16, 137--144.Google ScholarCross Ref
- }}Rasband, W. S. 2009. ImageJ, U.S. National Institutes of Health, Bethesda, MD. http://rsb.info.nih.gov/ij/.Google Scholar
- }}Saad, Y. 1994. ILUT: A dual threshold incomplete ILU factorization. Numer. Linear Algebr. Appl. 1, 387--402.Google ScholarCross Ref
- }}Schatzman, J. C. 1996. Accuracy of the discrete Fourier transform and the fast Fourier transform. SIAM J. Sci. Comput. 17, 5, 1150--1166. Google ScholarDigital Library
- }}Strother, S., McIntosh, A. R., Somji, I., Oder, A., Spreng, N., Waller, J., Wong, F., Wright, D., Yourganov, G., and Zhao, R. 2009. PLS/NPAIRS-J project. http://code.google.com/p/plsnpairs/.Google Scholar
- }}Strother, S. C., Anderson, J., and Hansen, L. K. 2002. The quantitative evaluation of functional neuroimaging experiments: The NPAIRS. NeuroImage 15, 4, 747--771.Google ScholarCross Ref
- }}Sun Microsystems. 2004. New features and enhancements J2SE 5.0. http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html.Google Scholar
- }}Swarztrauber, P. N. 2004. FFTPACK5. http://www.cisl.ucar.edu/css/software/fftpack5/.Google Scholar
- }}Vanek, P., Mandel, J., and Brezina, M. 1996. Algebraic multigrid based on smoothed aggregation for second and fourth order problems. Comput. 56, 179--196.Google ScholarCross Ref
- }}Verrill, S. 2009. Nonlinear optimization Java package. http://www1.fpl.fs.fed.us/optimization.html.Google Scholar
- }}Vogel, C. R. 2002. Computational Methods for Inverse Problems. SIAM, Philadelphia, PA. Google ScholarDigital Library
- }}Wendykier, P. 2009a. CSparseJ project. http://sites.google.com/site/piotrwendykier/software/csparsej.Google Scholar
- }}Wendykier, P. 2009b. JPlasma project. http://sites.google.com/site/piotrwendykier/software/jplasma.Google Scholar
- }}Wendykier, P. 2009c. JTransforms project. http://sites.google.com/site/piotrwendykier/software/jtransforms.Google Scholar
- }}Wendykier, P. 2009d. Parallel Colt project. http://sites.google.com/site/piotrwendykier/software/parallelcolt.Google Scholar
- }}Wendykier, P. 2009e. Parallel HRRT Deconvolution project. http://sites.google.com/site/piotrwendykier/software/deconvolution/parallelhrrtdeconvolution.Google Scholar
- }}Wendykier, P. 2009f. Parallel iterative deconvolution project. http://sites.google.com/site/piotrwendykier/software/deconvolution/paralleliterativedeconvolution.Google Scholar
- }}Whaley, R. C. and Dongarra, J. J. 1998. Automatically tuned linear algebra software ATLAS. In Proceedings of Supercomputing Conference. Google ScholarDigital Library
- }}Yip, P. and Rao, K. R. 1980. A fast computational algorithm for the discrete sine transform. IEEE Trans. Comm. 28, 2, 304 -- 307.Google ScholarCross Ref
- }}Zhang, B. 2005. Java FFTPack project. http://jfftpack.sourceforge.net/.Google Scholar
Index Terms
- Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing
Recommendations
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and AnalysisThis paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, valid performance model, and well-executed ...
A parallel implementation of a magnetic induction tomography: image reconstruction algorithm on the ClearSpeed advance accelerator board
ICIP'09: Proceedings of the 16th IEEE international conference on Image processingMagnetic Induction Tomography (MIT) is a relatively new contactless imaging modality which aims at reconstructing conductivity and permittivity distributions within objects. One of MIT's main challenges is the computational intensity required for image ...
Parallel perfusion imaging processing using GPGPU
Background and purposeThe objective of brain perfusion quantification is to generate parametric maps of relevant hemodynamic quantities such as cerebral blood flow (CBF), cerebral blood volume (CBV) and mean transit time (MTT) that can be used in ...
Comments