Abstract
Recent proposals present compression as a cost-effective technique for increasing cache and memory capacity and bandwidth. While these proposals demonstrate the potential of compression, several open questions must be answered before they can be adopted in real systems, including the following: (1) Do these techniques work for real-world workloads running for long periods? (2) Which application domains would benefit the most from compression? (3) At which level of the memory hierarchy should we apply compression: caches, main memory, or both?
In this article, our goal is to shed light on the main questions about the applicability of compression. We evaluate compression in the memory hierarchy for selected examples from different application classes, analyzing real applications with real data as well as complete runs of several benchmarks. While simulators provide a fairly accurate framework for studying the potential performance and energy impact of new ideas, they mostly limit us to a small range of workloads with short runtimes. To enable the study of real workloads, we introduce a fast and simple methodology for capturing samples of the memory and cache contents of a real machine (a desktop or a server). Compared to a cycle-accurate simulator, our methodology allows us to study real workloads as well as benchmarks. Our toolset is not a replacement for simulators but rather complements them: while a simulator can measure the performance and energy impact of a particular compression proposal, our methodology can assess the potential of compression with long-running workloads in the early stages of design.
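The kind of analysis such memory snapshots enable can be sketched as follows. This is a minimal illustration, not the authors' actual toolset: it assumes a raw snapshot file and uses zlib as a stand-in compressor to compute per-block compression ratios at cache-block (64-byte) granularity.

```python
import zlib

BLOCK = 64  # cache-block granularity, in bytes


def block_compressibility(snapshot_path):
    """Compute a per-block compression ratio over a raw memory snapshot.

    Each 64-byte block is compressed independently (as hardware
    compressors typically would), and the ratio original/compressed
    is recorded. Ratios > 1 mean the block is compressible.
    """
    ratios = []
    with open(snapshot_path, "rb") as f:
        while True:
            block = f.read(BLOCK)
            if len(block) < BLOCK:
                break  # ignore a trailing partial block
            compressed = zlib.compress(block)
            ratios.append(BLOCK / len(compressed))
    return ratios
```

A zero-filled block (common in freshly allocated memory) yields a high ratio, while a block of high-entropy data yields a ratio near or below 1; aggregating these ratios over a snapshot approximates the workload's overall compressibility.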
Using our toolset, we evaluate a collection of workloads from different domains, such as the web server of the CS department at UW–Madison (traced for 24 hours), Google Chrome (watching a one-hour movie on YouTube), and Linux games (played for about an hour). We also use several benchmarks from different domains, including SPEC, mobile, and big data, and run them to completion.
Using these workloads and our toolset, we analyze different compression properties for both real applications and benchmarks. We focus on eight main hypotheses about compression, derived from previous work. These properties (Table 2) form the foundation of several compression proposals, so the performance of those proposals depends heavily on whether the properties hold.
Overall, our results suggest that compression could be of general use in both main memory and caches. On average, the compression ratio is ≥2 for 64% and 54% of workloads for memory and cache data, respectively. Our evaluation indicates significant potential for both cache and memory compression, with higher compressibility in memory due to the abundance of zero blocks. Among the application domains we studied, servers show the highest average compressibility, while our mobile benchmarks show the lowest.
Comparing benchmarks with real workloads, we show that (1) it is critical to run benchmarks to completion, or for considerably long runtimes, to avoid biased conclusions, and (2) SPEC benchmarks are good representatives of real desktop applications in terms of the compressibility of their datasets. However, this does not hold for all compression properties. For example, SPEC benchmarks have much better compression locality (i.e., neighboring blocks have similar compressibility) than real workloads. Thus, it is critical for designers to consider a wider range of workloads, including real applications, when evaluating their compression techniques.
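Compression locality, as parenthetically defined above, can be quantified in several ways; one simple, hypothetical metric (not the paper's definition) is the fraction of adjacent block pairs whose compressed sizes fall within a small tolerance of each other:

```python
import zlib


def compression_locality(blocks, tolerance=0.1):
    """Fraction of adjacent block pairs whose compressed sizes are
    within `tolerance` (relative) of each other -- one simple way to
    quantify 'neighboring blocks compress similarly'.

    `blocks` is a list of equal-sized byte strings in address order.
    """
    sizes = [len(zlib.compress(b)) for b in blocks]
    if len(sizes) < 2:
        return 0.0
    similar = sum(
        1
        for a, b in zip(sizes, sizes[1:])
        if abs(a - b) <= tolerance * max(a, b)
    )
    return similar / (len(sizes) - 1)
```

Under a metric like this, a workload whose zero blocks and incompressible blocks are interleaved scores low, while one whose compressible regions are spatially clustered scores high, which is what makes locality exploitable by designs that pack neighboring compressed blocks together.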
Supplemental Material
A slide deck associated with this paper is available for download.
Index Terms
- Could Compression Be of General Use? Evaluating Memory Compression across Domains