DOI: 10.1145/2744769.2744794
research-article

An Analysis of Accelerator Coupling in Heterogeneous Architectures

Published: 07 June 2015

ABSTRACT

Existing research on accelerators has emphasized the performance and energy-efficiency improvements they can provide, devoting little attention to practical issues such as accelerator invocation and interaction with other on-chip components (e.g., cores and caches). In this paper, we present a quantitative study that considers these aspects by implementing seven high-throughput accelerators following three design models: tight coupling behind a CPU, loose out-of-core coupling with Direct Memory Access (DMA) to the last-level cache (LLC), and loose out-of-core coupling with DMA to DRAM. A salient conclusion of our study is that working sets of non-trivial size are best served by loosely coupled accelerators that integrate private memory blocks tailored to their needs.

Published in

DAC '15: Proceedings of the 52nd Annual Design Automation Conference
June 2015
1204 pages
ISBN: 9781450335201
DOI: 10.1145/2744769

Copyright © 2015 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions, 32%