ABSTRACT
As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines (request-tracking, state-manipulation, and data-movement), which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system.
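As a rough illustration of what "programming" the three engines could look like, the sketch below models a controller whose behavior is a table mapping a (request, line state) pair to an ordered sequence of engine operations. Everything in it is an assumption made for illustration: the request names, the I/S/M line states, the operation names, and the `PROGRAM` table are hypothetical and are not taken from the paper. The steps run sequentially here, whereas the engines described above are pipelined.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Engine(Enum):
    TRACK = auto()   # request-tracking engine
    STATE = auto()   # state-manipulation engine
    DATA = auto()    # data-movement engine


# Hypothetical "program": for each (request, line state) pair, an ordered
# list of (engine, operation) steps. All names are illustrative only.
PROGRAM = {
    ("read", "I"): [
        (Engine.TRACK, "allocate_entry"),
        (Engine.STATE, "set_S"),
        (Engine.DATA, "fill_from_memory"),
        (Engine.TRACK, "release_entry"),
    ],
    ("read", "S"): [
        (Engine.DATA, "return_line"),
    ],
    ("write", "S"): [
        (Engine.TRACK, "allocate_entry"),
        (Engine.STATE, "invalidate_sharers"),
        (Engine.STATE, "set_M"),
        (Engine.TRACK, "release_entry"),
    ],
}


@dataclass
class Controller:
    line_state: dict = field(default_factory=dict)  # address -> state name
    log: list = field(default_factory=list)         # executed steps, in order

    def handle(self, request, addr):
        """Look up the step sequence for this request and run it in order."""
        state = self.line_state.get(addr, "I")  # untracked lines start invalid
        steps = PROGRAM[(request, state)]
        for engine, op in steps:
            self.log.append((engine.name, op, addr))
            if op.startswith("set_"):
                # State-manipulation step: record the new line state.
                self.line_state[addr] = op[len("set_"):]
        return steps


ctrl = Controller()
ctrl.handle("read", 0x40)    # line goes from I to S
ctrl.handle("write", 0x40)   # line goes from S to M
```

Under this reading, supporting a different memory model would mean replacing the table rather than redesigning the controller, which is one way to picture the decomposition into state and data manipulations.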
To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models: cache-coherent shared memory, streams, and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
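The two area figures can be related with a short calculation. This is only a sketch: the symbols $A_c$ (baseline controller area) and $A_{sys}$ (baseline amortized effective system area) are introduced here for illustration, and it assumes that doubling the controller is the only change in area.

```latex
\begin{align*}
  \text{growth}
    &= \frac{(A_{sys} - A_c) + 2A_c}{A_{sys}} - 1
     = \frac{A_c}{A_{sys}} \approx 0.07
\end{align*}
```

Under that assumption, the baseline controller would account for roughly 7% of the amortized effective area, which is why doubling it has a modest effect at the system level.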
Index Terms
- A memory system design framework: creating smart memories