3D-Stacked Memory Architectures for Multi-core Processors

Published: 01 June 2008

Abstract

Three-dimensional integration enables stacking memory directly on top of a microprocessor, thereby significantly reducing wire delay between the two. Previous studies have examined the performance benefits of such an approach, but all of these works only consider commodity 2D DRAM organizations. In this work, we explore more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count. Our simulation results show that with a few simple changes to the 3D-DRAM organization, we can achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on our memory-intensive multi-programmed workloads on a quad-core processor. The significant increase in memory system performance makes the L2 miss handling architecture (MHA) a new bottleneck, which we address by combining a novel data structure called the Vector Bloom Filter with dynamic MSHR capacity tuning. Our scalable L2 MHA yields an additional 17.8% performance improvement over our 3D-stacked memory architecture.
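The two headline figures in the abstract are measured against different baselines: the 1.75x speedup is relative to previously proposed 3D-DRAM approaches, while the 17.8% improvement is relative to the authors' own 3D-stacked memory architecture. As a rough sketch, and assuming the two gains compose multiplicatively on the same workloads (the abstract does not state this), the combined speedup over prior 3D-DRAM approaches can be estimated as follows. The variable names are illustrative only.

```python
# Figures quoted in the abstract.
speedup_3d_dram = 1.75  # proposed 3D-DRAM organization vs. prior 3D-DRAM approaches
mha_gain = 0.178        # additional improvement from the scalable L2 MHA

# Assumption (not stated in the abstract): the gains compose multiplicatively
# because the second figure is measured on top of the first configuration.
combined = speedup_3d_dram * (1 + mha_gain)

print(f"Estimated combined speedup: {combined:.2f}x")
```

Under that assumption the estimate is about 2.06x; the paper itself should be consulted for the reported end-to-end result.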

Published in

ACM SIGARCH Computer Architecture News, Volume 36, Issue 3
June 2008
449 pages
ISSN: 0163-5964
DOI: 10.1145/1394608

ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture
June 2008
449 pages
ISBN: 9780769531748

Copyright © 2008 Author

Publisher

Association for Computing Machinery
New York, NY, United States
