DOI: 10.1145/2540708.2540724
research-article

Linearly compressed pages: a low-complexity, low-latency main memory compression framework

Published: 07 December 2013

ABSTRACT

Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient.
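
To illustrate the latency problem described above (this sketch is not from the paper): when cache lines compress to different sizes, locating line i inside a compressed page requires summing the compressed sizes of all preceding lines, so every access pays for extra metadata reads and a sequential computation. The function and sizes below are purely illustrative.

```python
# Illustrative sketch: locating a cache line in a conventionally
# compressed page, where each line compresses to a different size.
# The memory controller must walk per-line size metadata.

def locate_line_conventional(line_sizes, i):
    """Byte offset of compressed cache line i within a compressed page.

    Requires an O(i) prefix sum over the per-line size metadata --
    this is the non-trivial computation that adds access latency.
    """
    return sum(line_sizes[:i])

# Example: 8 cache lines with varying compressed sizes (in bytes).
sizes = [24, 64, 8, 40, 64, 16, 32, 24]
print(locate_line_conventional(sizes, 5))  # 200
```

The cost grows with the line index and must be paid on the critical path of every memory request, which is exactly what motivates a fixed-size layout.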

By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression, Linearly Compressed Pages (LCP), that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression.
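
The key insight can be sketched as follows (a hypothetical LCP-style layout, not the paper's exact design: the slot size, exception-region layout, and names below are assumptions). When every line in a page occupies a fixed-size compressed slot, the controller finds line i with a single multiply-add; lines that do not fit the slot fall back to an uncompressed exception region.

```python
# Hypothetical LCP-style address computation sketch (illustrative only).

COMPRESSED_SLOT = 16   # assumed bytes per fixed compressed-line slot
UNCOMPRESSED_LINE = 64 # bytes per uncompressed cache line

def locate_line_lcp(page_base, i, exception_index, exception_base):
    """Address of cache line i: one multiply-add, no prefix sum.

    exception_index maps line numbers that did not compress to the
    slot size onto their position in the uncompressed exception region.
    """
    if i in exception_index:
        # Incompressible line: stored uncompressed in the exception region.
        return exception_base + exception_index[i] * UNCOMPRESSED_LINE
    return page_base + i * COMPRESSED_SLOT

# Line 5 of a page at 0x1000, with line 3 stored as an exception.
print(hex(locate_line_lcp(0x1000, 5, {3: 0}, 0x1400)))  # 0x1050
```

Because the offset no longer depends on the sizes of preceding lines, the latency-critical computation disappears, which is what lets LCP avoid the performance degradation of prior designs.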

Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).


Published in:
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
December 2013, 498 pages
ISBN: 9781450326384
DOI: 10.1145/2540708
Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MICRO-46 paper acceptance rate: 39 of 239 submissions (16%). Overall acceptance rate: 484 of 2,242 submissions (22%).
