ABSTRACT
This paper presents a hardware-software co-design for efficient implementation of sparse deep neural networks (DNNs) on a regular systolic array for real-time on-device speech processing. The weight pruning format, which exploits pattern-indexed coordinate-assisted (PICA) sparsity, extends pattern-based pruning to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs), reducing index storage overhead while avoiding accuracy degradation. The proposed systolic accelerator leverages intrinsic data reuse and locality to accommodate PICA-based sparsity without complex data distribution networks, and it supports DNNs with different topologies. While compressing the model size by 16x, PICA sparsification reduces index storage overhead by 6.02x and still achieves a 20.7% word error rate (WER) on the TIMIT dataset. For the pruned WaveNet and LSTM, the accelerator achieves 0.62 and 2.69 TOPS/W energy efficiency, respectively, 1.7x to 10x higher than the state of the art.
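The core idea behind pattern-based pruning is that each small weight block keeps its nonzeros only in positions drawn from a small predefined pattern set, so a block needs just a short pattern index instead of per-element coordinates. The sketch below is a minimal illustration of that index-saving effect; the block size, pattern set, and selection rule are assumptions for exposition, not the paper's actual PICA format.

```python
import numpy as np

# Hypothetical pattern set: each 1x4 weight block keeps exactly 2 nonzeros,
# and the kept positions must match one of these 6 slot patterns. A block
# then needs only a 3-bit pattern index, versus 2-bit coordinates per
# nonzero (4 bits per block) in a plain coordinate (COO-style) format.
PATTERNS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def encode_blocks(weights, block=4):
    """Prune each length-4 block to its best-matching 2-nonzero pattern;
    return one pattern index and two retained values per block."""
    w = np.asarray(weights).reshape(-1, block)
    idx, vals = [], []
    for row in w:
        # choose the pattern that preserves the most weight magnitude
        best = max(range(len(PATTERNS)),
                   key=lambda p: sum(abs(row[s]) for s in PATTERNS[p]))
        idx.append(best)
        vals.append([float(row[s]) for s in PATTERNS[best]])
    return idx, vals

w = [0.9, 0.0, 0.1, 0.8,   0.0, 0.7, 0.6, 0.05]
idx, vals = encode_blocks(w)
print(idx)   # one pattern id per 4-wide block
print(vals)  # two retained weights per block
```

Because every block carries the same number of nonzeros in pattern-constrained slots, a systolic array can stream the compressed values with regular timing, which is what lets the accelerator avoid the complex distribution networks that irregular sparsity normally requires.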
Index Terms
- Systolic-Array Deep-Learning Acceleration Exploring Pattern-Indexed Coordinate-Assisted Sparsity for Real-Time On-Device Speech Processing