ABSTRACT
Over the last decade, the evolution of DRAM technology has been driven by capacity and bandwidth. In contrast, DRAM access latency has stayed relatively constant and is trending upward. Much effort has been devoted to tolerating memory access latency, but these techniques have reached the point of diminishing returns. Shortening the bitlines and wordlines in a DRAM device reduces access latency, but it also reduces array efficiency. In the mainstream market, manufacturers are not willing to trade capacity for latency. Prior work has proposed hybrid-bitline DRAM designs to overcome this problem. However, those methods are either intrusive to the circuit and layout of the DRAM design or offer no direct way to migrate data between the fast and slow levels.
In this paper, we propose a novel asymmetric DRAM capable of low-cost data migration between subarrays. Building on this design, we define a simple management mechanism and explore a range of management policies. We show that the new design, combined with our simple management technique, achieves 7.25% and 11.77% performance improvement on single- and multi-programming workloads, respectively, over a system with traditional homogeneous DRAM. This gain is above 80% of the potential performance gain of a system based on a hypothetical DRAM made entirely of short bitlines.
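The relation between the reported gains and the all-short-bitline upper bound can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the "above 80%" figure applies to both workload classes and treats 0.80 as a lower bound, and the helper name `implied_potential_upper_bound` is hypothetical rather than anything defined in the paper.

```python
# Reported gains from the abstract (percent over homogeneous DRAM).
reported_gain = {"single-programming": 7.25, "multi-programming": 11.77}

# Assumption: "above 80%" is read as a lower bound of 0.80 for both workloads.
FRACTION_OF_POTENTIAL = 0.80


def implied_potential_upper_bound(gain_pct, fraction=FRACTION_OF_POTENTIAL):
    """Bound on the all-short-bitline gain implied by gain >= fraction * potential."""
    return gain_pct / fraction


for name, gain in reported_gain.items():
    bound = implied_potential_upper_bound(gain)
    print(f"{name}: potential gain below about {bound:.2f}%")
```

Under these assumptions, the hypothetical all-short-bitline system would gain less than roughly 9.06% and 14.71% on the two workload classes, so the asymmetric design gives up only a few percentage points while keeping most of the array built from long bitlines.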
- Improving DRAM latency with dynamic asymmetric subarray