DOI: 10.1145/1555754.1555805
ISCA conference proceedings, research article

A memory system design framework: creating smart memories

Published: 20 June 2009

ABSTRACT

As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines (request tracking, state manipulation, and data movement), which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system.
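The idea of programming a controller by decomposing a memory model into sequences of state and data manipulations can be illustrated with a toy software model. The sketch below is not the paper's microarchitecture: the class name, the step format, the state letters (I, S, M), and the `fetch_line` operation are all assumptions made for illustration. It only shows how one generic controller, split into request tracking, state manipulation, and data movement, might be driven by a table that encodes the memory model.

```python
from dataclasses import dataclass, field

# Hypothetical "program" for a toy cache-coherent model (assumed, not from the paper).
# Each (request, current line state) pair maps to an ordered list of steps:
# ("data", op) is handled by data movement, ("state", s) by state manipulation.
COHERENT_PROGRAM = {
    ("load", "I"): [("data", "fetch_line"), ("state", "S")],
    ("load", "S"): [],
    ("load", "M"): [],
    ("store", "I"): [("data", "fetch_line"), ("state", "M")],
    ("store", "S"): [("state", "M")],
    ("store", "M"): [],
}


@dataclass
class ToyController:
    program: dict                                    # the memory model, as data
    line_state: dict = field(default_factory=dict)   # address -> state
    pending: dict = field(default_factory=dict)      # address -> steps still to run
    data_log: list = field(default_factory=list)     # data operations performed

    def accept(self, request, address):
        """Request tracking: allow only one outstanding request per address."""
        if address in self.pending:
            return False  # caller must retry later
        state = self.line_state.get(address, "I")
        self.pending[address] = list(self.program[(request, state)])
        return True

    def run(self, address):
        """Execute the tracked request's steps in order, then free its entry."""
        for kind, arg in self.pending.pop(address):
            if kind == "state":
                self.line_state[address] = arg        # state manipulation
            else:
                self.data_log.append((arg, address))  # data movement


ctrl = ToyController(COHERENT_PROGRAM)
ctrl.accept("store", 0x40)   # store miss: fetch the line, then mark it modified
ctrl.run(0x40)
```

In this sketch, supporting a different memory model means passing a different table to `ToyController` while the three mechanisms stay the same, which is the property the abstract calls polymorphic.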

To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models: cache-coherent shared memory, streams, and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
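One simple way to read the area figures, assuming the controller's extra area is spread over the rest of the system it serves (the abstract does not spell out its accounting, so the symbols below are illustrative):

```latex
% A_c: original controller area, A_r: area of everything else it is amortized over.
% Both symbols are assumptions introduced for this illustration.
\[
  \frac{(2A_c + A_r) - (A_c + A_r)}{A_c + A_r}
  \;=\; \frac{A_c}{A_c + A_r}
  \;\approx\; 0.07
\]
```

Under that reading, doubling the controller raises total area by exactly the controller's original share of it, so a growth of roughly 7% would correspond to a non-programmable controller that occupied roughly 7% of the area it is amortized over.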

