ABSTRACT
Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient.
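The cost of locating a cache line can be made concrete with a small sketch (illustrative only; sizes and names are hypothetical, not taken from the paper). When each line in a page compresses to a different size, the byte offset of line *i* depends on the sizes of all preceding lines, so the memory controller must effectively compute a prefix sum on the critical access path; with a uniform per-line size, the offset is a single multiply:

```python
# Illustrative sketch (hypothetical sizes): finding cache line i inside a
# compressed memory page.

def locate_line_variable(compressed_sizes, i):
    """Byte offset of line i when every line compresses to a different size.

    Requires summing the sizes of lines 0..i-1 -- non-trivial computation
    on the memory access critical path.
    """
    return sum(compressed_sizes[:i])

def locate_line_fixed(slot_size, i):
    """Byte offset of line i when every line occupies the same fixed slot.

    A single multiply; independent of all other lines in the page.
    """
    return i * slot_size

sizes = [34, 12, 64, 20, 8]            # hypothetical compressed line sizes (bytes)
print(locate_line_variable(sizes, 3))  # 110: depends on lines 0, 1, and 2
print(locate_line_fixed(16, 3))        # 48: depends only on i and the slot size
```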
By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression, Linearly Compressed Pages (LCP), that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously-proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression.
Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).
REFERENCES
- B. Abali et al. Memory Expansion Technology (MXT): Software Support and Performance. IBM J. Res. Dev., 2001.
- A. R. Alameldeen and D. A. Wood. Adaptive Cache Compression for High-Performance Processors. In ISCA-31, 2004.
- A. R. Alameldeen and D. A. Wood. Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches. Tech. Rep., 2004.
- E. D. Berger. Memory Management for High-Performance Applications. PhD thesis, 2002.
- X. Chen et al. C-Pack: A High-Performance Microprocessor Cache Compression Algorithm. IEEE Transactions on VLSI Systems, 2010.
- E. Cooper-Balis, P. Rosenfeld, and B. Jacob. Buffer-On-Board Memory Systems. In ISCA, 2012.
- R. S. de Castro, A. P. do Lago, and D. Da Silva. Adaptive Compressed Caching: Design and Implementation. In SBAC-PAD, 2003.
- F. Douglis. The Compression Cache: Using On-line Compression to Extend Physical Memory. In Winter USENIX Conference, 1993.
- J. Dusser et al. Zero-Content Augmented Caches. In ICS, 2009.
- M. Ekman and P. Stenström. A Robust Main-Memory Compression Scheme. In ISCA-32, 2005.
- M. Farrens and A. Park. Dynamic Base Register Caching: A Technique for Reducing Address Bus Width. In ISCA, 1991.
- E. G. Hallnor and S. K. Reinhardt. A Unified Compressed Memory Hierarchy. In HPCA-11, 2005.
- D. Huffman. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, 1952.
- S. Iacobovici et al. Effective Stream-Based and Execution-Based Data Prefetching. In ICS, 2004.
- Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual, 2013.
- JEDEC. GDDR3 Specific SGRAM Functions, JESD21-C, 2012.
- U. Kang et al. 8Gb 3D DDR3 DRAM Using Through-Silicon-Via Technology. In ISSCC, 2009.
- S. F. Kaplan. Compressed Caching and Modern Virtual Memory Simulation. PhD thesis, 1999.
- C. Lefurgy et al. Energy Management for Commercial Servers. IEEE Computer, 2003.
- C. Li, C. Ding, and K. Shen. Quantifying the Cost of Context Switch. In ExpCS, 2007.
- S. Li et al. McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures. In MICRO-42, 2009.
- P. S. Magnusson et al. Simics: A Full System Simulation Platform. IEEE Computer, 2002.
- Micron. 2Gb: x4, x8, x16, DDR3 SDRAM, 2012.
- H. Patil et al. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. In MICRO-37, 2004.
- G. Pekhimenko et al. Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches. In PACT, 2012.
- G. Pekhimenko et al. Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency. SAFARI Technical Report No. 2012-002, 2012.
- V. Sathish, M. J. Schulte, and N. S. Kim. Lossless and Lossy Memory I/O Link Compression for Improving Performance of GPGPU Workloads. In PACT, 2012.
- A. Snavely and D. M. Tullsen. Symbiotic Jobscheduling for a Simultaneous Multithreaded Processor. In ASPLOS-9, 2000.
- SPEC CPU2006. http://www.spec.org/.
- S. Srinath et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. In HPCA-13, 2007.
- S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. CACTI 5.1. Technical Report HPL-2008-20, HP Laboratories, 2008.
- M. Thuresson et al. Memory-Link Compression Schemes: A Value Locality Perspective. IEEE TC, 2008.
- Transaction Processing Performance Council. http://www.tpc.org/.
- R. B. Tremaine et al. Pinnacle: IBM MXT in a Memory Controller Chip. IEEE Micro, 2001.
- P. R. Wilson, S. F. Kaplan, and Y. Smaragdakis. The Case for Compressed Caching in Virtual Memory Systems. In USENIX Annual Technical Conference, 1999.
- J. Yang, R. Gupta, and C. Zhang. Frequent Value Encoding for Low Power Data Buses. ACM TODAES, 2004.
- J. Yang, Y. Zhang, and R. Gupta. Frequent Value Compression in Data Caches. In MICRO-33, 2000.
- D. H. Yoon, M. K. Jeong, M. Sullivan, and M. Erez. The Dynamic Granularity Memory System. In ISCA, 2012.
- Y. Zhang, J. Yang, and R. Gupta. Frequent Value Locality and Value-Centric Data Cache Design. In ASPLOS-9, 2000.
- J. Ziv and A. Lempel. A Universal Algorithm for Sequential Data Compression. IEEE TIT, 1977.
Index Terms: Linearly compressed pages: a low-complexity, low-latency main memory compression framework