DOI: 10.1145/2830772.2830827
research-article

Improving DRAM latency with dynamic asymmetric subarray

Published: 05 December 2015

ABSTRACT

Over the last decade, the evolution of DRAM technology has been driven by capacity and bandwidth. In contrast, DRAM access latency has stayed relatively constant and is trending upward. Much effort has been devoted to tolerating memory access latency, but these techniques have reached the point of diminishing returns. Shortening the bitlines and wordlines in a DRAM device reduces access latency, but it also reduces array efficiency. In the mainstream market, manufacturers are not willing to trade capacity for latency. Prior work has proposed hybrid-bitline DRAM designs to overcome this problem. However, those methods are either intrusive to the circuit and layout of the DRAM design, or they offer no direct way to migrate data between the fast and slow levels.

In this paper, we propose a novel asymmetric DRAM capable of low-cost data migration between subarrays. With this design, we define a simple management mechanism and explore many management-related policies. We show that this new design, combined with our simple management technique, achieves 7.25% and 11.77% performance improvement on single- and multi-programming workloads, respectively, over a system with traditional homogeneous DRAM. This gain is above 80% of the potential performance gain of a system based on a hypothetical DRAM made entirely of short bitlines.
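The "above 80% of the potential gain" statement implies an upper bound on what the hypothetical all-short-bitline DRAM could deliver. The short sketch below works out that bound from the two reported figures. It assumes the 80% figure applies to each workload class separately, which the abstract does not state explicitly; the function name `implied_upper_bound` is hypothetical and chosen only for illustration.

```python
def implied_upper_bound(achieved_gain_pct: float,
                        fraction_of_potential: float = 0.80) -> float:
    """Upper bound on the potential gain, given that the achieved gain
    exceeds `fraction_of_potential` of it: achieved > f * potential,
    so potential < achieved / f."""
    return achieved_gain_pct / fraction_of_potential


# Reported improvements over homogeneous DRAM (from the abstract).
single = implied_upper_bound(7.25)    # single-programming workloads
multi = implied_upper_bound(11.77)    # multi-programming workloads

# Assumption: the 80% threshold holds per workload class.
print(f"single-programming: potential gain below {single:.2f}%")
print(f"multi-programming: potential gain below {multi:.2f}%")
```

Under that assumption, the all-short-bitline system would gain less than roughly 9.06% and 14.71% respectively, so the asymmetric design leaves at most about two to three percentage points on the table.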

Published in

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN: 9781450340342
DOI: 10.1145/2830772

Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MICRO-48 paper acceptance rate: 61 of 283 submissions, 22%. Overall acceptance rate: 484 of 2,242 submissions, 22%.
