research-article

A memory system design framework: creating smart memories

Authors:
Amin Firoozshahian, Hicamp Systems Inc., Menlo Park, CA, USA
Alex Solomatnikov, Hicamp Systems Inc., Menlo Park, CA, USA
Ofer Shacham, Stanford University, Stanford, CA, USA
Zain Asgar, Stanford University, Stanford, CA, USA
Stephen Richardson, Stanford University, Stanford, CA, USA
Christos Kozyrakis, Stanford University, Stanford, CA, USA
Mark Horowitz, Stanford University, Stanford, CA, USA

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, June 2009, Pages 406–417, https://doi.org/10.1145/1555754.1555805

Published: 20 June 2009


ABSTRACT

As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines (request-tracking, state-manipulation, and data-movement), which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system.
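The decomposition described above (one fixed set of engines, programmed per memory model as sequences of state and data manipulations) can be illustrated with a toy Python model. Everything beyond the three engine names is a hypothetical assumption made for illustration: the `Controller` and `Engine` names, the action strings such as `"alloc"` and `"set:tx_read"`, and the two program tables are not the paper's microarchitecture or its programming format. The sketch only shows that swapping the table, not the engines, changes which memory model the controller implements.

```python
from dataclasses import dataclass, field
from enum import Enum


class Engine(Enum):
    # The three engine kinds named in the abstract; everything else here is a
    # hypothetical illustration, not the paper's microarchitecture.
    TRACK = "request-tracking"
    STATE = "state-manipulation"
    DATA = "data-movement"


@dataclass
class Controller:
    """Toy controller: fixed engines, behavior chosen by a program table."""
    program: dict                                 # request type -> [(Engine, action)]
    memory: dict                                  # backing store: address -> value
    cache: dict = field(default_factory=dict)     # address -> cached value
    meta: dict = field(default_factory=dict)      # address -> set of state bits
    pending: set = field(default_factory=set)     # addresses with in-flight requests
    trace: list = field(default_factory=list)     # log of (Engine, action) steps

    def handle(self, req_type, addr):
        # Run the request as a sequence of engine steps, in program order.
        for engine, action in self.program[req_type]:
            self.trace.append((engine, action))
            getattr(self, "_" + engine.name.lower())(action, addr)
        return self.cache.get(addr)

    def _track(self, action, addr):
        # Request tracking: reject a second in-flight request to the same line.
        if action == "alloc":
            if addr in self.pending:
                raise RuntimeError(f"conflicting request to {addr:#x}")
            self.pending.add(addr)
        elif action == "retire":
            self.pending.discard(addr)

    def _state(self, action, addr):
        # State manipulation: set or clear per-line metadata bits.
        bits = self.meta.setdefault(addr, set())
        if action.startswith("set:"):
            bits.add(action[len("set:"):])
        elif action == "clear_all":
            bits.clear()

    def _data(self, action, addr):
        # Data movement: copy a line between backing memory and the cache.
        if action == "fill":
            self.cache[addr] = self.memory.get(addr, 0)
        elif action == "writeback":
            self.memory[addr] = self.cache[addr]


T, S, D = Engine.TRACK, Engine.STATE, Engine.DATA

# Two hypothetical "programs" for the same hardware: only the tables differ.
COHERENT = {
    "load_miss": [(T, "alloc"), (D, "fill"), (S, "set:shared"), (T, "retire")],
}
TRANSACTIONAL = {
    "load_miss": [(T, "alloc"), (D, "fill"), (S, "set:tx_read"), (T, "retire")],
    "commit": [(S, "clear_all")],                 # drop read-set tracking bits
}

cc = Controller(COHERENT, {0x40: 7})
tm = Controller(TRANSACTIONAL, {0x40: 7})
print(cc.handle("load_miss", 0x40), tm.handle("load_miss", 0x40))  # 7 7
print(cc.meta[0x40], tm.meta[0x40])  # {'shared'} {'tx_read'}
```

Both controllers run identical engine code and return the same data; they differ only in the metadata their program leaves behind (a coherence state versus a transactional read bit), which is the sense in which the same hardware can be "polymorphic" in this toy model.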

To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models: cache-coherent shared memory, streams, and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
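A back-of-the-envelope reading of the two overhead figures, under simplifying assumptions that are ours rather than the paper's: performance overhead is taken as relative execution time against the zero-latency ideal controller, and the system-level area growth is assumed to come only from the controller's extra area. The symbols are illustrative: $T$ is execution time, $A_c$ the original controller area, and $A_{\text{sys}}$ the original total area over which the controller is amortized.

```latex
\begin{aligned}
\text{overhead} &= \frac{T_{\text{real}} - T_{\text{ideal}}}{T_{\text{ideal}}} < 0.20
  \;\Rightarrow\; T_{\text{real}} < 1.2\,T_{\text{ideal}} \\
\frac{\Delta A}{A_{\text{sys}}} &= \frac{2A_c - A_c}{A_{\text{sys}}} = \frac{A_c}{A_{\text{sys}}} \approx 0.07
\end{aligned}
```

Under these assumptions, doubling the controller's area adding roughly 7% to the amortized system area implies the original controller occupied roughly 7% of that area.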


Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
  2. Integrated circuits
    1. Semiconductor memory



Published in
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009
510 pages
ISBN:9781605585260
DOI:10.1145/1555754
General Chair: Steve Keckler, University of Texas at Austin
Program Chair: Luiz André Barroso, Google Inc.
ACM SIGARCH Computer Architecture News Volume 37, Issue 3
June 2009
495 pages
ISSN:0163-5964
DOI:10.1145/1555815
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache coherence
memory access protocol
memory systems
multi-core processors
protocol controller
reconfigurable architecture
stream programming
transactional memory

Acceptance Rates
Overall Acceptance Rate: 543 of 3,203 submissions, 17%
Article Metrics
- Total Citations: 21
- Total Downloads: 1,038
- Downloads (Last 12 months): 15
- Downloads (Last 6 weeks): 0