ABSTRACT
Fast and accurate performance and power prediction is a key challenge in the co-development of hardware and software. Traditional analytical or simulation-based approaches are often either too inaccurate or too slow. In this work, we propose LACross, a novel learning-based, analytical cross-platform prediction framework that provides fast and accurate estimation of time-varying software performance and power consumption on a target hardware platform. We employ a fine-grained, phase-based approach in which the learning algorithm synthesizes analytical proxy models that predict the performance and power of the workload in each program phase from performance statistics obtained through hardware counter measurements on the host. Our learning approach relies on a one-time training phase using a target reference model or real hardware. We applied our approach to 35 benchmarks from SPEC 2006, SD-VBS and MiBench. Results show an average prediction accuracy of over 97% for both fine-grained performance and power traces, at speeds of over 500 MIPS.
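The per-phase proxy-model idea can be illustrated with a minimal sketch. It assumes that program phases have already been identified (each sample carries a phase ID), that each proxy model is a plain linear function of host counter values fitted by ordinary least squares, and that "accuracy" means one minus the mean absolute percentage error. The abstract does not specify the actual learning algorithm, so the names `fit_linear`, `fit_per_phase`, `predict` and `accuracy`, the counter choice and the sample data are all hypothetical, not the authors' method.

```python
"""Illustrative sketch (not the LACross algorithm): one linear proxy model per
program phase, mapping host hardware-counter statistics to a target metric."""

from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               ridge: float = 0.0) -> List[float]:
    """Ordinary least squares y ~ w0 + w1*x1 + ... via the normal equations.

    Assumption: a plain linear model; `ridge` optionally regularizes a
    near-singular system. Returns [w0, w1, ..., wk]."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (R^T R + ridge*I) w = R^T y
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate a linear proxy model on one host counter vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_per_phase(samples: Iterable[Tuple[Hashable, Sequence[float], float]]
                  ) -> Dict[Hashable, List[float]]:
    """Group (phase_id, host_counters, target_metric) samples by phase and
    fit a separate proxy model for each phase."""
    groups = defaultdict(lambda: ([], []))
    for phase, x, t in samples:
        groups[phase][0].append(x)
        groups[phase][1].append(t)
    return {p: fit_linear(X, y) for p, (X, y) in groups.items()}


def accuracy(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed metric: 1 - mean absolute percentage error."""
    errs = [abs(p - a) / abs(a) for p, a in zip(pred, actual)]
    return 1.0 - sum(errs) / len(errs)


if __name__ == "__main__":
    # Hypothetical training data for one phase:
    # host counters [instructions (millions), cache misses (thousands)]
    # mapped to measured target cycles (millions).
    counters = [[1.0, 10.0], [2.0, 5.0], [1.5, 20.0], [3.0, 8.0], [2.5, 15.0]]
    data = [("A", x, 3.0 + 1.5 * x[0] + 0.2 * x[1]) for x in counters]
    models = fit_per_phase(data)
    print(models["A"])
```

Fitting a separate model per phase lets each phase weight the host counters differently (for example, a memory-bound phase can assign a larger weight to cache misses), which is the intuition behind a phase-based rather than whole-program model. Training corresponds to the one-time step on the target reference model or hardware; prediction then needs only host counter readings.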
Accurate phase-level cross-platform power and performance estimation