research-article

Exploring Energy-Efficient Cache Design in Emerging Mobile Platforms

Authors:
Kaige Yan, University of Houston, TX
Lu Peng, Louisiana State University, Baton Rouge, LA
Mingsong Chen, East China Normal University, Shanghai, China
Xin Fu, University of Houston, TX

ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 4, Article No. 58, pp. 1–20. https://doi.org/10.1145/2843940

Published: 20 July 2017


Abstract

Mobile processors are quickly becoming the most widely used processors in consumer devices. Because these devices run mainly on battery power, energy-efficient computing is highly desirable. In this article, we focus on energy-efficient cache design in emerging mobile platforms. We observe that in interactive smartphone applications, more than 40% of L2 cache accesses are OS kernel accesses. Such frequent kernel accesses cause severe interference between user and kernel blocks in the L2 cache, leading to unnecessary block replacements and a high L2 cache miss rate. We first propose to statically partition the L2 cache into two separate segments, accessed exclusively by user code and kernel code, respectively. At the same time, the combined size of the two segments is reduced, which lowers energy consumption while maintaining a similar cache miss rate. We then find that the kernel and user segments exhibit completely different access behaviors, and we explore implementing the user and kernel segments with multi-retention STT-RAM to obtain higher energy savings in this static partition-based design. Finally, we propose to dynamically partition the L2 cache into user and kernel segments to minimize the overall cache size, and we integrate short-retention STT-RAM into this dynamic partition-based design for maximal energy savings. The experimental results show that our static technique reduces cache energy consumption by 75% with a 2% performance loss, and our dynamic technique reduces cache energy consumption by 85% with only a 3% performance loss.
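The abstract does not say how the two segments are carved out of the L2 cache. The Python sketch below assumes way-based partitioning of a set-associative LRU cache, which is one common way to realize such a split; the class name `PartitionedCache`, its parameters, and the 64-byte block size are hypothetical and are not taken from the article. It illustrates the property the static design relies on: a kernel miss can only evict a kernel block, so kernel activity no longer displaces user blocks. The `resize` method grows or shrinks one segment, the basic operation a dynamic scheme would need; the article's actual policy for choosing segment sizes, and its STT-RAM retention modeling, are not reproduced here.

```python
from collections import OrderedDict


class PartitionedCache:
    """Toy set-associative LRU cache split into user and kernel segments.

    Hypothetical sketch (not the article's implementation): each segment owns
    a fixed number of ways in every set, and a miss may only evict a block
    belonging to its own segment.
    """

    def __init__(self, num_sets, user_ways, kernel_ways, block_size=64):
        self.num_sets = num_sets
        self.block_size = block_size  # assumed 64-byte cache blocks
        self.ways = {"user": user_ways, "kernel": kernel_ways}
        # Per segment, one OrderedDict of tags per set; the first key is the LRU block.
        self.sets = {
            seg: [OrderedDict() for _ in range(num_sets)] for seg in self.ways
        }
        self.accesses = {"user": 0, "kernel": 0}
        self.misses = {"user": 0, "kernel": 0}

    def access(self, addr, is_kernel):
        """Look up addr in the segment of the issuing mode; return True on a hit."""
        seg = "kernel" if is_kernel else "user"
        block = addr // self.block_size
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[seg][index]
        self.accesses[seg] += 1
        if tag in lines:
            lines.move_to_end(tag)  # hit: mark as most recently used
            return True
        self.misses[seg] += 1
        if self.ways[seg] == 0:
            return False  # segment has no ways: the access bypasses the cache
        if len(lines) >= self.ways[seg]:
            lines.popitem(last=False)  # evict the LRU block of this segment only
        lines[tag] = None
        return False

    def resize(self, seg, new_ways):
        """Change one segment's associativity, evicting LRU blocks if it shrinks."""
        self.ways[seg] = new_ways
        for lines in self.sets[seg]:
            while len(lines) > new_ways:
                lines.popitem(last=False)

    def miss_rate(self, seg):
        """Fraction of this segment's accesses that missed (0.0 if none yet)."""
        return self.misses[seg] / self.accesses[seg] if self.accesses[seg] else 0.0
```

For example, with one set, two user ways, and one kernel way, two consecutive kernel misses evict each other but leave both user blocks resident. A trace-driven study along these lines would tag each memory reference with the privilege mode that issued it and compare per-segment miss rates against a unified cache of the same total capacity.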


Index Terms

Computer systems organization > Architectures > Other architectures

Recommendations

Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity

This article presents a novel energy-efficient cache design for massively parallel, throughput-oriented architectures like GPUs. Unlike L1 data cache on modern GPUs, L2 cache shared by all of the streaming multiprocessors is not the primary performance ...
Domino Cache: An Energy-Efficient Data Cache for Modern Applications

The energy consumption for processing modern workloads is challenging in data centers. Due to the large datasets of cloud workloads, the miss rate of the L1 data cache is high, and with respect to the energy efficiency concerns, such misses are costly ...
Energy-efficient cache design in emerging mobile platforms: the implications and optimizations
DATE '15: Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition

Mobile devices are quickly becoming the most widely used processors in consumer devices. Since their major power supply is battery, the energy-efficient computing is highly desired. In this paper, we focus on the energy-efficient cache design in ...


Published in
ACM Transactions on Design Automation of Electronic Systems Volume 22, Issue 4
October 2017
430 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/3097980
Editor:
Naehyuck Chang
Korea Advanced Institute of Science and Technology, Korea
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 20 July 2017
- Accepted: 1 November 2015
- Revised: 1 October 2015
- Received: 1 June 2015
Author Tags
IC3
Mobile processors
cache
spin-transfer torque RAM (STT-RAM)
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 4
- Total Downloads: 335
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 1
