Abstract
Mobile processors are quickly becoming the most widely used processors in consumer devices. Because these devices are powered primarily by batteries, energy-efficient computing is highly desirable. In this article, we focus on energy-efficient cache design in emerging mobile platforms. We observe that more than 40% of L2 cache accesses in interactive smartphone applications are OS kernel accesses. Such frequent kernel accesses cause serious interference between user and kernel blocks in the L2 cache, leading to unnecessary block replacements and a high L2 cache miss rate. We first propose to statically partition the L2 cache into two separate segments, one accessible only by user code and the other only by kernel code. At the same time, the combined size of the two segments is reduced, which lowers energy consumption while maintaining a similar cache miss rate. We then find that the separated kernel and user segments exhibit completely different access behaviors, and we explore multi-retention STT-RAM-based user and kernel segments to obtain higher energy savings in this static partition-based cache design. Finally, we propose to dynamically partition the L2 cache into user and kernel segments to minimize the overall cache size. We also integrate short-retention STT-RAM into this dynamic partition-based cache design for maximal energy savings. The experimental results show that our static technique reduces cache energy consumption by 75% with a 2% performance loss, and that our dynamic technique reduces cache energy consumption by 85% with only a 3% performance loss.
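The interference effect behind the static partition can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' implementation: the cache geometry, the LRU policy, the synthetic trace, and all names (`LRUCache`, `PartitionedCache`, `make_trace`, `compare`) are assumptions chosen for illustration. It models a shared cache in which streaming kernel accesses evict a small, reused user working set, and a statically partitioned cache with the same total capacity in which each access is routed to its own segment.

```python
from collections import OrderedDict


class LRUCache:
    """Toy set-associative cache with LRU replacement (block granularity)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, block_addr):
        """Return True on a hit, False on a miss (the block is then filled)."""
        cache_set = self.sets[block_addr % self.num_sets]
        tag = block_addr // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)      # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)   # evict the least recently used block
        cache_set[tag] = None
        return False


class PartitionedCache:
    """Two disjoint segments: one for user accesses, one for kernel accesses."""

    def __init__(self, user_ways, kernel_ways, num_sets=1):
        self.user = LRUCache(num_sets, user_ways)
        self.kernel = LRUCache(num_sets, kernel_ways)

    def access(self, block_addr, is_kernel):
        segment = self.kernel if is_kernel else self.user
        return segment.access(block_addr)

    @property
    def misses(self):
        return self.user.misses + self.kernel.misses


def make_trace(rounds):
    """Hypothetical trace: user code reuses 3 blocks, kernel code streams."""
    trace = []
    next_kernel_block = 100
    for _ in range(rounds):
        trace += [(0, False), (1, False), (2, False)]   # reused user blocks
        for _ in range(2):                              # never-reused kernel blocks
            trace.append((next_kernel_block, True))
            next_kernel_block += 1
    return trace


def compare(rounds):
    """Return (shared_misses, partitioned_misses) for equal total capacity."""
    shared = LRUCache(num_sets=1, ways=4)
    split = PartitionedCache(user_ways=3, kernel_ways=1)
    for block, is_kernel in make_trace(rounds):
        shared.access(block)
        split.access(block, is_kernel)
    return shared.misses, split.misses
```

On this synthetic trace the partitioned configuration misses less often than the shared one, because the kernel stream can no longer evict the user working set. Real workloads, segment sizes, and replacement policies will behave differently, so the sketch shows only the mechanism, not the reported results.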
Index Terms
- Exploring Energy-Efficient Cache Design in Emerging Mobile Platforms