Abstract
In this paper we introduce CACTI-D, a significant enhancement of CACTI 5.0. CACTI-D adds support for modeling commodity DRAM technology and main memory DRAM chip organization. It enables modeling of the complete memory hierarchy with consistent models, all the way from SRAM-based L1 caches through main memory DRAMs on DIMMs. We illustrate the potential applicability of CACTI-D in the design and analysis of future memory hierarchies by carrying out a last-level cache study for a multicore multithreaded architecture at the 32nm technology node. In this study we use CACTI-D to model all components of the memory hierarchy, including the L1 and L2 caches; L3 last-level caches built from SRAM, logic-process-based DRAM, or commodity DRAM; and main memory DRAM chips. We carry out architectural simulation using benchmarks with large data sets and present results for execution time, the breakdown of power in the memory hierarchy, and system energy-delay product for the different system configurations. We find that commodity DRAM technology is most attractive for stacked last-level caches, with significantly lower energy-delay products.
A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies
ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture