research-article

Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing

Authors:
Piotr Wendykier

Emory University

Emory University
View Profile

,
James G. Nagy

Emory University

Emory University
View Profile

ACM Transactions on Mathematical Software Volume 37 Issue 3Article No.: 31pp 1–22https://doi.org/10.1145/1824801.1824809

Published:01 September 2010Publication History

ACM Transactions on Mathematical Software

Abstract

Major breakthroughs in chip and software design have been observed for the last nine years. In October 2001, IBM released the world’s first multicore processor: POWER4. Six years later, in February 2007, NVIDIA made a public release of CUDA SDK, a set of development tools to write algorithms for execution on Graphic Processing Units (GPUs). Although software vendors have started working on parallelizing their products, the vast majority of existing code is still sequential and does not effectively utilize modern multicore CPUs and manycore GPUs.

This article describes Parallel Colt, a multithreaded Java library for scientific computing and image processing. In addition to describing the design and functionality of Parallel Colt, a comparison to MATLAB is presented. Two ImageJ plugins for iterative image deblurring and motion correction of PET brain images are described as typical applications of this library. Performance comparisons with MATLAB, including GPU computations via AccelerEyes’ Jacket toolbox are also given.

References

}}AccelerEyes. 2009. Jacket. http://accelereyes.com/.Google Scholar
}}Ahmed, N., Natarajan, T., and Rao, K. R. 1974. Discrete cosine transfom. Trans. Comput. C-23, 1, 90--93. Google ScholarDigital Library
}}Amerdo, B., Bodnartchouk, V., Caromel, D., Delbé, C., Huet, F., and Taboada, G. L. 2008. Current state of Java for HPC. Tech. rep. inria-00312039, INRIA.Google Scholar
}}Anderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J. J., Croz, J. D., Hammarling, S., Greenbaum, A., McKenney, A., and Sorensen, D. 1999. LAPACK Users’ Guide, 3rd Ed. SIAM, Philadelphia, PA. Google ScholarDigital Library
}}Arndt, H., Bundschus, M., and Nägele, A. 2009. Towards a next-generation matrix library for Java. In Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference. Google ScholarDigital Library
}}Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and der Vorst, H. V. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods 2nd Ed. SIAM, Philadelphia, PA.Google Scholar
}}Bertero, M. and Boccacci, P. 1998. Introduction to Inverse Problems in Imaging. IOP Publishing Ltd., London.Google Scholar
}}Blackford, L. S., Demmel, J., Dongarra, J., Duff, I., Hammarling, S., Henry, G., Heroux, M., Kaufman, L., Lumsdaine, A., Petitet, A., Pozo, R., Remington, K., and Whaley, R. C. 2002. An updated set of Basic Linear Algebra Subprograms (BLAS). ACM Trans. Math. Softw. 28, 2, 135--151. Google ScholarDigital Library
}}Bluestein, L. I. 1968. A linear filtering approach to the computation of the discrete Fourier transform. Northeast Electronics Research and Engineering Meeting Record 10, 218--219.Google Scholar
}}Buttari, A., Langou, J., Kurzak, J., and Dongarra, J. 2007. A class of parallel tiled linear algebra algorithms for multicore architectures. Tech. rep., Innovative Computing Laboratory.Google Scholar
}}Byous, J. 2003. Java technology: The early years. http://java.sun.com/features/1998/05/birthday.html.Google Scholar
}}Chung, J., Nagy, J. G., and O’Leary, D. P. 2008. A weighted GCV method for Lanczos hybrid regularization. Elec. Trans. Numer. Anal. 28, 149--167.Google Scholar
}}Cooley, J. W. and Tukey, J. W. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 90, 297--301.Google ScholarCross Ref
}}Dautelle, J.-M. 2007. JScience Project. http://jscience.org/.Google Scholar
}}Davis, T. 2009. The University of Florida sparse matrix collection. http://www.cise.ufl.edu/research/sparse/matrices/.Google Scholar
}}Davis, T. A. 2006. Direct Methods for Sparse Linear Systems. SIAM, Philadelphia, PA. Google ScholarDigital Library
}}Doolin, D. M., Dongarra, J., and Seymour, K. 1999. JLAPACK - Compiling LAPACK Fortran to Java. Sci. Program. 7, 2, 111--138. Google ScholarDigital Library
}}Dougherty, R. 2005. Extensions of DAMAS and benefits and limitations of deconvolution in beamforming. In Proceedings of the 11th AIAA/CEAS Aeroacoustics Conference.Google ScholarCross Ref
}}DRA Systems. 2000. OR-Objects. http://opsresearch.com/OR-Objects/.Google Scholar
}}Faber, T. L., Raghunath, N., Tudorascu, D., and Votaw, J. R. 2009. Motion correction of PET brain images through deconvolution: I. Theoretical development and analysis in software simulations. Phys. Med. Biol. 54, 3, 797--811.Google ScholarCross Ref
}}Frigo, M. and Johnson, S. G. 2005. The design and implementation of FFTW3. Proc. IEEE 93, 2, 216--231.Google ScholarCross Ref
}}Frigo, M. and Johnson, S. G. 2009. FFT benchmark methodology. http://www.fftw.org/speed/method.html.Google Scholar
}}Golub, G. H. and Loan, C. F. V. 1996. Matrix Computations (Johns Hopkins Studies in Mathematical Sciences). The Johns Hopkins University Press.Google Scholar
}}Gomez, R. 2009. JQuantLib project. http://www.jquantlib.org/index.php/Main_Page.Google Scholar
}}Hansen, P. C. 1997. Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia, PA. Google ScholarDigital Library
}}Hartley, R. 1942. A more symmetrical Fourier analysis applied to transmission problems. In Proceedings of the Institute of Radio Engineer (IRE).Google ScholarCross Ref
}}Heimsund, B.-O. 2007. Matrix toolkits for Java. http://ressim.berlios.de/.Google Scholar
}}Henriksson, J. 2009. Endrov project. http://www.endrov.net/index.php/Main_Page.Google Scholar
}}Hicklin, J., Moler, C., Webb, P., Boisvert, R. F., Miller, B., Pozo, R., and Remington, K. 2005. JAMA: A Java matrix package. http://math.nist.gov/javanumerics/jama/.Google Scholar
}}Hoschek, W. 2000. Uniform, versatile and efficient dense and sparse multi-dimensional arrays. http://acs.lbl.gov/&percnt;7Ehoschek/publications/ACMJava2000.pdf.Google Scholar
}}Hoschek, W. 2004. Colt project. http://dsd.lbl.gov/&percnt;7Ehoschek/colt/index.html.Google Scholar
}}Hudson, M. and Larkin, R. 1994. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag. 13, 601--609.Google ScholarCross Ref
}}JCuda. 2009. JCuda project. http://www.jcuda.org/jcuda/JCuda.html.Google Scholar
}}JUnit Community. 2009. JUnit Project. http://www.junit.org/.Google Scholar
}}Kahan, W. and Darcy, J. D. 1998. How Java’s floating-point hurts everyone everywhere. http://www.cs.berkeley.edu/&percnt;7Ewkahan/JAVAhurt.pdf.Google Scholar
}}Liebke, D. 2009. Incanter project. http://wiki.github.com/liebke/incanter.Google Scholar
}}Menke, M. A. and Buckley, K. 1996. Compensation methods for head motion detected during PET imaging. IEEE Trans. Nucl. Sci. 43, 310--317.Google ScholarCross Ref
}}Mcintosh, A. R., Bookstein, F. L., Haxby, J. V., and Grady, C. L. 1996. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage 3, 143--157.Google ScholarCross Ref
}}Messaoudi, C., Boudier, T., Sorzano, C., and Marco, S. 2007. TomoJ: Tomography software for three-dimensional reconstruction in transmission electron microscopy. BMC Bioinf. 8, 1, 288.Google ScholarCross Ref
}}Nagy, J. G., Palmer, K., and Perrone, L. 2004. Iterative methods for image deblurring: A MATLAB object-oriented approach. Numer. Algo. 36, 1, 73--93.Google ScholarCross Ref
}}Nagy, J. G. and StrakošZ. 2000. Enforcing nonnegativity in image reconstruction algorithms. In Proceedings of the Mathematical Modeling, Estimation, and Imaging, D.C. Wilson et al. Ed., Vol. 4121. 182--190.Google Scholar
}}NVIDIA Corporation. 2009. CUDA zone. http://www.nvidia.com/object/cuda_home.html.Google Scholar
}}Ooura, T. 2006. General purpose FFT (fast Fourier/cosine/sine transform) package. http://www.kurims.kyoto-u.ac.jp/&percnt;7Eooura/fft.html.Google Scholar
}}Picard, Y. and Thompson, C. J. 1997. Motion correction of PET images using multiple acquisition frames. IEEE Trans. Med. Imag. 16, 137--144.Google ScholarCross Ref
}}Rasband, W. S. 2009. ImageJ, U.S. National Institutes of Health, Bethesda, MD. http://rsb.info.nih.gov/ij/.Google Scholar
}}Saad, Y. 1994. ILUT: A dual threshold incomplete ILU factorization. Numer. Linear Algebr. Appl. 1, 387--402.Google ScholarCross Ref
}}Schatzman, J. C. 1996. Accuracy of the discrete Fourier transform and the fast Fourier transform. SIAM J. Sci. Comput. 17, 5, 1150--1166. Google ScholarDigital Library
}}Strother, S., McIntosh, A. R., Somji, I., Oder, A., Spreng, N., Waller, J., Wong, F., Wright, D., Yourganov, G., and Zhao, R. 2009. PLS/NPAIRS-J project. http://code.google.com/p/plsnpairs/.Google Scholar
}}Strother, S. C., Anderson, J., and Hansen, L. K. 2002. The quantitative evaluation of functional neuroimaging experiments: The NPAIRS. NeuroImage 15, 4, 747--771.Google ScholarCross Ref
}}Sun Microsystems. 2004. New features and enhancements J2SE 5.0. http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html.Google Scholar
}}Swarztrauber, P. N. 2004. FFTPACK5. http://www.cisl.ucar.edu/css/software/fftpack5/.Google Scholar
}}Vanek, P., Mandel, J., and Brezina, M. 1996. Algebraic multigrid based on smoothed aggregation for second and fourth order problems. Comput. 56, 179--196.Google ScholarCross Ref
}}Verrill, S. 2009. Nonlinear optimization Java package. http://www1.fpl.fs.fed.us/optimization.html.Google Scholar
}}Vogel, C. R. 2002. Computational Methods for Inverse Problems. SIAM, Philadelphia, PA. Google ScholarDigital Library
}}Wendykier, P. 2009a. CSparseJ project. http://sites.google.com/site/piotrwendykier/software/csparsej.Google Scholar
}}Wendykier, P. 2009b. JPlasma project. http://sites.google.com/site/piotrwendykier/software/jplasma.Google Scholar
}}Wendykier, P. 2009c. JTransforms project. http://sites.google.com/site/piotrwendykier/software/jtransforms.Google Scholar
}}Wendykier, P. 2009d. Parallel Colt project. http://sites.google.com/site/piotrwendykier/software/parallelcolt.Google Scholar
}}Wendykier, P. 2009e. Parallel HRRT Deconvolution project. http://sites.google.com/site/piotrwendykier/software/deconvolution/parallelhrrtdeconvolution.Google Scholar
}}Wendykier, P. 2009f. Parallel iterative deconvolution project. http://sites.google.com/site/piotrwendykier/software/deconvolution/paralleliterativedeconvolution.Google Scholar
}}Whaley, R. C. and Dongarra, J. J. 1998. Automatically tuned linear algebra software ATLAS. In Proceedings of Supercomputing Conference. Google ScholarDigital Library
}}Yip, P. and Rao, K. R. 1980. A fast computational algorithm for the discrete sine transform. IEEE Trans. Comm. 28, 2, 304 -- 307.Google ScholarCross Ref
}}Zhang, B. 2005. Java FFTPack project. http://jfftpack.sourceforge.net/.Google Scholar

Index Terms

Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing
1. Mathematics of computing
  1. Mathematical software
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types

Recommendations

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, valid performance model, and well-executed ...
Read More
A parallel implementation of a magnetic induction tomography: image reconstruction algorithm on the ClearSpeed advance accelerator board
ICIP'09: Proceedings of the 16th IEEE international conference on Image processing

Magnetic Induction Tomography (MIT) is a relatively new contactless imaging modality which aims at reconstructing conductivity and permittivity distributions within objects. One of MIT's main challenges is the computational intensity required for image ...
Read More
Parallel perfusion imaging processing using GPGPU

Background and purposeThe objective of brain perfusion quantification is to generate parametric maps of relevant hemodynamic quantities such as cerebral blood flow (CBF), cerebral blood volume (CBV) and mean transit time (MTT) that can be used in ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Mathematical Software Volume 37, Issue 3
September 2010
296 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/1824801
Issue’s Table of Contents

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 September 2010
- Accepted: 1 April 2010
- Revised: 1 January 2010
- Received: 1 December 2008
Published in toms Volume 37, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Deconvolution
FFT
PET
inverse problems
iterative methods
motion correction
multithreading
regularization
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 28
  Total Citations
  View Citations
- 1,170
  Total Downloads
- Downloads (Last 12 months)15
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing

ACM Transactions on Mathematical Software

Abstract

References

Cited By

Index Terms

Recommendations

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

A parallel implementation of a magnetic induction tomography: image reconstruction algorithm on the ClearSpeed advance accelerator board

Parallel perfusion imaging processing using GPGPU

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing

ACM Transactions on Mathematical Software

Abstract

References

Cited By

Index Terms

Recommendations

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

A parallel implementation of a magnetic induction tomography: image reconstruction algorithm on the ClearSpeed advance accelerator board

Parallel perfusion imaging processing using GPGPU

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media