Hybrid cache architecture with disparate memory technologies

Authors:
Xiaoxia Wu, Pennsylvania State University, University Park, PA, USA
Jian Li, IBM Austin Research Lab, Austin, TX, USA
Lixin Zhang, IBM Austin Research Lab, Austin, TX, USA
Evan Speight, IBM Austin Research Lab, Austin, TX, USA
Ram Rajamony, IBM Austin Research Lab, Austin, TX, USA
Yuan Xie, Pennsylvania State University, University Park, PA, USA

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, June 2009, Pages 34–45
https://doi.org/10.1145/1555754.1555761

Published: 20 June 2009

ABSTRACT

Caching has been an effective mechanism for mitigating the effects of the processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. Advances in technology enable caches to be built from other memory technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in either 2D chips or 3D stacked chips. Caches fabricated in these technologies offer dramatically different power and performance characteristics compared with SRAM-based caches, particularly in access latency, cell density, and overall power consumption. In this paper, we propose to take advantage of the best characteristics that each technology offers through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: inter-cache-level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intra-cache-level, or cache-region-based, HCA (RHCA), in which a single level of cache is partitioned into multiple regions, each of a different memory technology. We have studied a number of HCA designs and explored the potential of hardware support for intra-cache data movement and power consumption management within HCA caches. Using a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric mean 7% IPC improvement over a baseline 3-level SRAM cache design under the same area constraint across a collection of 25 workloads. A more aggressive RHCA-based design provides a 12% IPC improvement over the baseline. Finally, a 2-layer 3D cache stack (3DHCA) of high-density memory technology within the same chip footprint gives an 18% IPC improvement over the baseline. Furthermore, the hybrid designs achieve up to a 70% reduction in power consumption relative to a baseline SRAM-only design.
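
The IPC improvements quoted in the abstract are geometric means across workloads. As a minimal sketch of that arithmetic only (not the authors' evaluation code), the snippet below computes a geometric mean of per-workload IPC ratios. The function name `geomean_speedup` and the four IPC values are hypothetical and chosen purely for illustration; they are not results from the paper.

```python
import math


def geomean_speedup(baseline_ipc, candidate_ipc):
    """Geometric mean of per-workload IPC ratios (candidate / baseline)."""
    if len(baseline_ipc) != len(candidate_ipc) or not baseline_ipc:
        raise ValueError("need two non-empty IPC lists of equal length")
    # Average the logs of the ratios, then exponentiate; this avoids
    # overflow or underflow when multiplying many ratios together.
    log_sum = sum(math.log(c / b) for b, c in zip(baseline_ipc, candidate_ipc))
    return math.exp(log_sum / len(baseline_ipc))


# Hypothetical per-workload IPC values (assumed, not taken from the paper).
baseline = [1.00, 0.80, 1.50, 0.60]   # e.g. an SRAM-only hierarchy
hybrid = [1.10, 0.84, 1.50, 0.72]     # e.g. a hybrid hierarchy

speedup = geomean_speedup(baseline, hybrid)
print(f"geometric mean IPC improvement: {(speedup - 1) * 100:.1f}%")
```

A geometric mean is the usual choice for averaging ratios such as speedups because it gives the same answer regardless of which design is treated as the reference, which an arithmetic mean of ratios does not.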

Index Terms

Hardware > Integrated circuits > Semiconductor memory > Dynamic memory

Published in
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009, 510 pages
ISBN: 9781605585260
DOI: 10.1145/1555754
General Chair: Steve Keckler, University of Texas at Austin
Program Chair: Luiz André Barroso, Google Inc.

ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
June 2009, 495 pages
ISSN: 0163-5964
DOI: 10.1145/1555815

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher
Association for Computing Machinery, New York, NY, United States
Author Tags
hybrid cache architecture
three-dimensional ic
Qualifiers
- research-article