Design and Management of 3D Chip Multiprocessors Using Network-in-Memory

Published: 01 May 2006

Abstract

Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs in which multiple device layers are stacked together. Given the current trend toward chip multiprocessing, it is timely to consider 3D chip multiprocessor design and memory networking issues, especially in the context of data management in large L2 caches. The overall goal of this paper is to study the challenges of L2 design and management in 3D chip multiprocessors. Our first contribution is a router architecture and a topology design that make use of a network architecture embedded into the L2 cache memory. Our second contribution is to demonstrate, through extensive experiments, that a 3D L2 memory architecture generates much better results than conventional two-dimensional (2D) designs under different numbers of layers and vertical (inter-wafer) connections. In particular, our experiments show that a 3D architecture with no dynamic data migration performs better than a 2D architecture that employs data migration. This also helps reduce power consumption in L2 due to the reduced number of data movements.

Published in

ACM SIGARCH Computer Architecture News, Volume 34, Issue 2
May 2006
383 pages
ISSN: 0163-5964
DOI: 10.1145/1150019

ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture
June 2006
383 pages
ISBN: 076952608X

Copyright © 2006 Authors

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers: article
