research-article

Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication

Authors:
Miao Hu (Hewlett Packard Laboratories, Palo Alto, CA)
John Paul Strachan (Hewlett Packard Laboratories, Palo Alto, CA)
Zhiyong Li (Hewlett Packard Laboratories, Palo Alto, CA)
Emmanuelle M. Grafals (Hewlett Packard Laboratories, Palo Alto, CA)
Noraica Davila (Hewlett Packard Laboratories, Palo Alto, CA)
Catherine Graves (Hewlett Packard Laboratories, Palo Alto, CA)
Sity Lam (Hewlett Packard Laboratories, Palo Alto, CA)
Ning Ge (HP Inc, Palo Alto, CA)
Jianhua Joshua Yang (University of Massachusetts, Amherst, MA)
R. Stanley Williams (Hewlett Packard Laboratories, Palo Alto, CA)

DAC '16: Proceedings of the 53rd Annual Design Automation Conference, June 2016, Article No.: 19, Pages 1–6, https://doi.org/10.1145/2897937.2898010

Published: 05 June 2016

ABSTRACT

Vector-matrix multiplication dominates the computation time and energy for many workloads, particularly neural network algorithms and linear transforms (e.g., the Discrete Fourier Transform). Utilizing the natural current accumulation feature of memristor crossbars, we developed the Dot-Product Engine (DPE) as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication. We first devised a conversion algorithm that maps arbitrary matrix values to memristor conductances in a realistic crossbar array, accounting for device physics and circuit issues to reduce computational errors. Accurate device resistance programming in large arrays is enabled by closed-loop pulse tuning and access transistors. To validate our approach, we simulated and benchmarked a state-of-the-art neural network for pattern recognition on DPEs. The results show no accuracy degradation compared to a software approach (99% pattern recognition accuracy on the MNIST data set) with only a 4-bit DAC/ADC requirement, while the DPE can achieve a speed-efficiency product of 1,000× to 10,000× compared to a custom digital ASIC.
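The current-accumulation idea in the abstract can be illustrated with a minimal, idealized sketch. It is not the paper's conversion algorithm: it assumes a purely linear weight-to-conductance mapping, ignores wire resistance, device nonlinearity, and noise, and models only the input DAC (the output ADC is omitted). The function names and the values of `g_min`, `g_max`, and `v_max` are hypothetical, chosen for illustration only.

```python
def quantize(value, full_scale, bits=4):
    """Uniformly quantize value to 2**bits levels on [0, full_scale]."""
    steps = 2 ** bits - 1
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / full_scale * steps) / steps * full_scale


def map_to_conductance(matrix, g_min=1e-6, g_max=1e-4):
    """Linearly map matrix values onto [g_min, g_max] (siemens, assumed range)."""
    flat = [w for row in matrix for w in row]
    w_min, w_max = min(flat), max(flat)
    span = w_max - w_min
    # If all weights are equal, every device sits at g_min and scale is arbitrary.
    scale = (g_max - g_min) / span if span else 1.0
    conductances = [[g_min + scale * (w - w_min) for w in row] for row in matrix]
    return conductances, scale, w_min


def crossbar_vmm(matrix, x, v_max=1.0, dac_bits=4, g_min=1e-6, g_max=1e-4):
    """Approximate y_j = sum_i x_i * W_ij with an ideal crossbar.

    Inputs are assumed to lie in [0, v_max]; rows carry voltages and
    each column sums the currents V_i * G_ij of its devices.
    """
    conductances, scale, w_min = map_to_conductance(matrix, g_min, g_max)
    volts = [quantize(v, v_max, dac_bits) for v in x]
    n_cols = len(matrix[0])
    currents = [
        sum(volts[i] * conductances[i][j] for i in range(len(volts)))
        for j in range(n_cols)
    ]
    # Undo the linear mapping: subtract the offset term, then rescale.
    offset = (g_min - scale * w_min) * sum(volts)
    return [(i_col - offset) / scale for i_col in currents]
```

For example, `crossbar_vmm([[1.0, -2.0], [0.5, 3.0]], [1.0, 0.0])` returns approximately `[1.0, -2.0]`, the first row of the matrix. Inputs that do not fall on a DAC level are rounded to the nearest of the 16 levels, which is one source of the approximation error the abstract refers to.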

Published in

DAC '16: Proceedings of the 53rd Annual Design Automation Conference
June 2016
1048 pages
ISBN: 9781450342360
DOI: 10.1145/2897937

Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%