A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies

Published: 01 June 2008

Abstract

In this paper we introduce CACTI-D, a significant enhancement of CACTI 5.0. CACTI-D adds support for modeling commodity DRAM technology and main memory DRAM chip organization. It enables modeling of the complete memory hierarchy with consistent models, all the way from SRAM-based L1 caches through main memory DRAMs on DIMMs. We illustrate the potential applicability of CACTI-D to the design and analysis of future memory hierarchies by carrying out a last-level cache study for a multicore, multithreaded architecture at the 32nm technology node. In this study we use CACTI-D to model all components of the memory hierarchy, including L1, L2, last-level SRAM, logic-process-based DRAM, or commodity DRAM L3 caches, and main memory DRAM chips. We carry out architectural simulation using benchmarks with large data sets and present their execution time, the breakdown of power in the memory hierarchy, and the system energy-delay product for the different system configurations. We find that commodity DRAM technology is most attractive for stacked last-level caches, with significantly lower energy-delay products.
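The energy-delay product used above to compare configurations is conventionally the product of the energy consumed and the execution time, with lower values being better. A minimal sketch of how such a comparison might be computed follows. The `ConfigResult` type, the configuration names, and all numbers are hypothetical placeholders for illustration only. They are not taken from the paper and are not CACTI-D output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigResult:
    """Hypothetical per-configuration measurements (not from the paper)."""
    name: str
    energy_joules: float      # total system energy for the run
    exec_time_seconds: float  # benchmark execution time


def energy_delay_product(result: ConfigResult) -> float:
    """Energy-delay product: energy * delay. Lower is better."""
    return result.energy_joules * result.exec_time_seconds


def rank_by_edp(results):
    """Return configurations sorted from lowest (best) to highest EDP."""
    return sorted(results, key=energy_delay_product)


# Placeholder numbers chosen only to exercise the code.
results = [
    ConfigResult("config_a", energy_joules=10.0, exec_time_seconds=2.0),
    ConfigResult("config_b", energy_joules=8.0, exec_time_seconds=2.0),
    ConfigResult("config_c", energy_joules=12.0, exec_time_seconds=1.0),
]

for r in rank_by_edp(results):
    print(r.name, energy_delay_product(r))
```

Because the metric multiplies energy by time, a configuration that uses more energy can still rank best if it finishes enough sooner, as the placeholder `config_c` does here.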

Published in

ACM SIGARCH Computer Architecture News, Volume 36, Issue 3
June 2008
449 pages
ISSN: 0163-5964
DOI: 10.1145/1394608

ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture
June 2008
449 pages
ISBN: 9780769531748

Copyright © 2008 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
