research-article

Improving DRAM latency with dynamic asymmetric subarray

Authors:
Shih-Lien Lu, Intel Corp., Hillsboro, Oregon
Ying-Chen Lin, National Taiwan University, Taipei, Taiwan ROC
Chia-Lin Yang, National Taiwan University, Taipei, Taiwan ROC

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture, December 2015, Pages 255–266. https://doi.org/10.1145/2830772.2830827

Published: 05 December 2015

ABSTRACT

Over the last decade, the evolution of DRAM technology has been driven by capacity and bandwidth. In contrast, DRAM access latency has stayed relatively constant and is now trending upward. Much effort has been devoted to tolerating memory access latency, but these techniques have reached the point of diminishing returns. Shortening the bitlines and wordlines in a DRAM device reduces access latency, but it also lowers array efficiency. In the mainstream market, manufacturers are not willing to trade capacity for latency. Prior work has proposed hybrid-bitline DRAM designs to overcome this problem. However, those methods are either intrusive to the circuit and layout of the DRAM design or provide no direct way to migrate data between the fast and slow levels.

In this paper, we propose a novel asymmetric DRAM capable of low-cost data migration between subarrays. Building on this design, we devise a simple management mechanism and explore many management-related policies. We show that this new design, combined with our simple management technique, achieves performance improvements of 7.25% and 11.77% on single- and multi-programmed workloads, respectively, over a system with traditional homogeneous DRAM. This gain is more than 80% of the potential performance gain of a system based on a hypothetical DRAM built entirely from short bitlines.
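
As a rough illustration of the kind of management policy the abstract alludes to, the following Python sketch replays a sequence of row accesses against a small fast region and a large slow region. Everything in it is an assumption made for illustration: the latency constants, the migration cost, the access-count threshold, and the LRU demotion rule are hypothetical and are not taken from the paper, whose actual mechanism and policies are not described in the abstract.

```python
from collections import OrderedDict

# Hypothetical timings in nanoseconds (illustrative assumptions, not paper data).
FAST_NS = 30      # assumed access latency of a short-bitline (fast) subarray
SLOW_NS = 50      # assumed access latency of a regular (slow) subarray
MIGRATE_NS = 10   # assumed extra cost of moving one row into the fast region


def average_latency(trace, fast_rows, threshold=2):
    """Replay row accesses and return the mean latency per access in ns.

    All rows start in the slow region. A row is promoted to the fast
    region once it has been accessed `threshold` times while slow. When
    the fast region is full, the least recently used fast row is demoted.
    This policy is a made-up example, not the paper's mechanism.
    """
    if not trace:
        return 0.0
    fast = OrderedDict()  # rows currently in the fast region, in LRU order
    counts = {}           # slow-region access counts per row
    total = 0
    for row in trace:
        if row in fast:
            fast.move_to_end(row)  # mark as most recently used
            total += FAST_NS
            continue
        total += SLOW_NS
        counts[row] = counts.get(row, 0) + 1
        if fast_rows > 0 and counts[row] >= threshold:
            if len(fast) >= fast_rows:
                fast.popitem(last=False)  # demote the LRU fast row
            fast[row] = True
            counts[row] = 0
            total += MIGRATE_NS
    return total / len(trace)
```

Under these assumed numbers, `average_latency([1, 1, 1, 1], fast_rows=1)` evaluates to 42.5, against 50.0 when `fast_rows=0`. A trace that alternates between hot rows, such as `[1, 1, 2, 2, 1]` with a single fast row, comes out at 54.0, which is worse than the 50.0 baseline because each promotion pays the migration cost and then gets evicted. That is the trade-off any such policy has to manage.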

Published in
MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN: 9781450340342
DOI: 10.1145/2830772
General Chair:
Milos Prvulovic
Georgia Tech
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
MICRO-48 Paper Acceptance Rate: 61 of 283 submissions, 22%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%