DOI: 10.1145/2897937.2898036
research-article

Latency sensitivity-based cache partitioning for heterogeneous multi-core architecture

Published: 05 June 2016

ABSTRACT

Shared last-level cache (LLC) management is a critical design issue for heterogeneous multi-cores. In this paper, we observe two major challenges: (1) the contribution of LLC latency to overall performance varies among applications/cores and also across time; and (2) overlooking the off-chip latency factor often leads to adverse effects on overall performance. Hence, we propose a Latency Sensitivity-based Cache Partitioning (LSP) framework, which includes a lightweight runtime mechanism to quantify latency sensitivity and a new cost function to guide LLC partitioning. Results show that LSP improves overall throughput by 8% on average (and by up to 27%) compared with the state-of-the-art partitioning mechanism, TAP.
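
The abstract does not give the LSP cost function or the runtime sensitivity metric, so the following is only a minimal sketch of the general idea of sensitivity-weighted way partitioning, not the authors' method. It assumes a hypothetical per-core hit curve (`hit_curves`) and a hypothetical latency-sensitivity weight in [0, 1] (`sensitivity`), and greedily hands out cache ways to whichever core has the largest weighted marginal hit gain.

```python
from typing import List


def partition_ways(hit_curves: List[List[int]],
                   sensitivity: List[float],
                   total_ways: int) -> List[int]:
    """Greedily assign LLC ways to cores (illustrative stand-in, not LSP).

    hit_curves[c][w]: assumed number of LLC hits core c would get with
                      w ways, for w = 0..total_ways.
    sensitivity[c]:   assumed weight in [0, 1] for how strongly core c's
                      performance depends on LLC latency.
    """
    n = len(hit_curves)
    alloc = [1] * n  # start every core with one way so none is starved
    for _ in range(total_ways - n):
        # Marginal hit gain of one extra way, scaled by latency sensitivity.
        gains = [
            sensitivity[c]
            * (hit_curves[c][alloc[c] + 1] - hit_curves[c][alloc[c]])
            for c in range(n)
        ]
        best = max(range(n), key=lambda c: gains[c])
        alloc[best] += 1
    return alloc


# Core 0: many extra hits per way but (by assumption) latency-tolerant.
# Core 1: fewer extra hits per way but (by assumption) latency-sensitive.
curves = [[0, 100, 200, 300, 400], [0, 50, 90, 120, 140]]
weighted = partition_ways(curves, [0.1, 1.0], total_ways=4)
unweighted = partition_ways(curves, [1.0, 1.0], total_ways=4)
print(weighted, unweighted)
```

With these made-up numbers, the weighted run favours the latency-sensitive core while the unweighted run favours the core with the steeper hit curve, which illustrates why raw hit counts alone can be a misleading guide in a heterogeneous system.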

References

  1. R. Ausavarungnirun, K. K.-W. Chang, L. Subramanian, G. H. Loh, and O. Mutlu. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems. In ISCA, 2012.
  2. A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In ISPASS, 2009.
  3. A. R. Brodtkorb, T. R. Hagen, and M. L. Sætra. Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput., 2013.
  4. S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, 2009.
  5. M. Garrido and J. Grajal. Continuous-flow variable-length memoryless linear regression architecture. Electronics Letters, 2013.
  6. L. R. Hsu, S. K. Reinhardt, R. Iyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource. In PACT, 2006.
  7. A. Jaleel, K. B. Theobald, S. C. Steely, Jr., and J. Emer. High performance cache replacement using re-reference interval prediction (RRIP). In ISCA, 2010.
  8. O. Kayiran, N. Nachiappan, A. Jog, R. Ausavarungnirun, M. Kandemir, G. Loh, O. Mutlu, and C. Das. Managing GPU concurrency in heterogeneous architectures. In MICRO, 2014.
  9. J. Lee and H. Kim. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture. In HPCA, 2012.
  10. X. Lin and R. Balasubramonian. Refining the utility metric for utility-based cache partitioning. In WDDD, 2011.
  11. J. Lotze, P. Sutton, and H. Lahlou. Many-core accelerated LIBOR swaption portfolio pricing. In SCC, 2012.
  12. V. Mekkat, A. Holey, P.-C. Yew, and A. Zhai. Managing shared last-level cache in a heterogeneous multicore processor. In PACT, 2013.
  13. A. Patel, F. Afram, S. Chen, and K. Ghose. MARSS: A full system simulator for multicore x86 CPUs. In DAC, 2011.
  14. M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO, 2006.
  15. B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang, and Y. Solihin. Scaling the bandwidth wall: Challenges in and avenues for CMP scaling. SIGARCH Comput. Archit. News, 2009.
  16. P. Rosenfeld, E. Cooper-Balis, and B. Jacob. DRAMSim2: A cycle accurate memory system simulator. Computer Architecture Letters, 2011.
  17. G. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. The Journal of Supercomputing, 2004.
  18. P.-H. Wang, G.-H. Liu, J.-C. Yeh, T.-M. Chen, H.-Y. Huang, C.-L. Yang, S.-L. Liu, and J. Greensky. Full system simulation framework for integrated CPU/GPU architecture. In VLSI-DAT, 2014.
  19. P.-H. Wang, C.-W. Lo, C.-L. Yang, and Y.-J. Cheng. A cycle-level SIMT-GPU simulation framework. In ISPASS, 2012.

  • Published in

    DAC '16: Proceedings of the 53rd Annual Design Automation Conference
    June 2016
    1048 pages
    ISBN:9781450342360
    DOI:10.1145/2897937

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
