
Could Compression Be of General Use? Evaluating Memory Compression across Domains

Published: 05 December 2017

Abstract

Recent proposals present compression as a cost-effective technique for increasing cache and memory capacity and bandwidth. While these proposals show the potential of compression, several questions remain open before they can be adopted in real systems, including the following: (1) Do these techniques work for real-world workloads running for long periods? (2) Which application domains would benefit the most from compression? (3) At which levels of the memory hierarchy should we apply compression: caches, main memory, or both?

In this article, our goal is to shed light on the main questions about the applicability of compression. We evaluate compression in the memory hierarchy for selected examples from different application classes, analyzing real applications with real data as well as complete runs of several benchmarks. While simulators provide a fairly accurate framework for studying the potential performance and energy impact of new ideas, they mostly limit us to a small range of workloads with short runtimes. To enable studying real workloads, we introduce a fast and simple methodology for capturing samples of the memory and cache contents of a real machine (a desktop or a server). Compared to a cycle-accurate simulator, our methodology allows us to study real workloads as well as benchmarks. Our toolset is not a replacement for simulators but rather complements them: a simulator can measure the performance and energy impact of a particular compression proposal, while our methodology can assess the potential of compression for long-running workloads in the early stages of design.
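The gist of such a sampling-based analysis can be sketched as follows. This is a hypothetical illustration, not the article's actual toolset: it uses zlib as a software stand-in for the hardware compression algorithms typically studied (e.g., FPC or BDI), applied to cache-block-sized chunks of a raw memory snapshot.

```python
import zlib

BLOCK_SIZE = 64  # a typical cache-block size in bytes


def block_compression_ratios(snapshot: bytes) -> list:
    """Split a raw memory snapshot into cache-block-sized chunks and
    estimate each block's compression ratio (uncompressed size divided
    by compressed size) using zlib as a stand-in compressor."""
    ratios = []
    for off in range(0, len(snapshot) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = snapshot[off:off + BLOCK_SIZE]
        compressed = zlib.compress(block)
        ratios.append(BLOCK_SIZE / len(compressed))
    return ratios


# Example: an all-zero block compresses far better than a ramp of
# distinct byte values (which zlib barely compresses at this size).
sample = bytes(64) + bytes(range(64))
ratios = block_compression_ratios(sample)
```

A real tool would obtain the snapshot from the machine under study (for example, by reading process memory on Linux) and aggregate ratios across many samples over time; the sketch only shows the per-block accounting.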

Using our toolset, we evaluate a collection of workloads from different domains, such as a web server of the Computer Sciences department at UW-Madison (captured over 24 hours), Google Chrome (watching a 1-hour movie on YouTube), and Linux games (played for about an hour). We also use several benchmarks from different domains, including SPEC, mobile, and big data, and we run these benchmarks to completion.

Using these workloads and our toolset, we analyze different compression properties for both real applications and benchmarks. We focus on eight main hypotheses about compression, derived from previous work. These properties (Table 2) form the foundation of several compression proposals, so the performance of those proposals depends heavily on whether the properties hold.

Overall, our results suggest that compression could be of general use in both main memory and caches. On average, the compression ratio is ≥2 for 64% of workloads for memory data and for 54% of workloads for cache data. Our evaluation indicates significant potential for both cache and memory compression, with higher compressibility in memory due to the abundance of zero blocks. Among the application domains we studied, servers show the highest average compressibility, while our mobile benchmarks show the lowest.
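The zero-block effect is easy to illustrate: a block is maximally compressible when every byte is zero, so a first-order estimate of memory compressibility is simply the fraction of all-zero blocks in a snapshot. The helper below is a hypothetical sketch under that assumption, not the paper's measurement tool.

```python
BLOCK_SIZE = 64  # a typical cache-block size in bytes


def zero_block_fraction(snapshot: bytes) -> float:
    """Return the fraction of 64-byte blocks in a snapshot that are
    entirely zero; such blocks compress to almost nothing and often
    dominate the compressibility of main-memory data."""
    blocks = [snapshot[i:i + BLOCK_SIZE]
              for i in range(0, len(snapshot) - BLOCK_SIZE + 1, BLOCK_SIZE)]
    if not blocks:
        return 0.0
    return sum(b == bytes(BLOCK_SIZE) for b in blocks) / len(blocks)


# Two zero blocks followed by one block of distinct byte values.
frac = zero_block_fraction(bytes(128) + bytes(range(64)))  # 2 of 3 blocks
```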

Comparing benchmarks with real workloads, we show that (1) it is critical to run benchmarks to completion, or at least for considerably long runtimes, to avoid biased conclusions, and (2) SPEC benchmarks are good representatives of real desktop applications in terms of the compressibility of their datasets. However, this does not hold for all compression properties. For example, SPEC benchmarks exhibit much better compression locality (i.e., neighboring blocks having similar compressibility) than real workloads. Thus, it is critical for designers to consider a wider range of workloads, including real applications, when evaluating their compression techniques.
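As one illustration, compression locality can be quantified by comparing the compressed sizes of adjacent blocks. The metric below is a hypothetical sketch (not the measure used in the article): it reports the fraction of neighboring block pairs whose compressed sizes differ by at most a small tolerance.

```python
def compression_locality(compressed_sizes: list, tolerance: int = 8) -> float:
    """Hypothetical locality metric: the fraction of adjacent block
    pairs whose compressed sizes differ by at most `tolerance` bytes.
    A value near 1.0 means neighboring blocks compress similarly, a
    property that page-granularity compression schemes benefit from."""
    if len(compressed_sizes) < 2:
        return 1.0
    similar = sum(
        1 for a, b in zip(compressed_sizes, compressed_sizes[1:])
        if abs(a - b) <= tolerance
    )
    return similar / (len(compressed_sizes) - 1)


# High locality: compressed sizes cluster tightly.
high = compression_locality([20, 22, 24, 21])  # → 1.0
# Low locality: compressed sizes alternate wildly.
low = compression_locality([8, 60, 10, 62])    # → 0.0
```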


Published in ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4 (December 2017), 600 pages.

ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3154814

Copyright © 2017 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History
• Received: 1 June 2016
• Revised: 1 July 2017
• Accepted: 1 September 2017
• Published: 5 December 2017