ABSTRACT
This paper presents a hardware-software co-design for efficient implementation of sparse deep neural networks (DNNs) on a regular systolic array for real-time on-device speech processing. The weight pruning format, which exploits pattern-indexed coordinate-assisted (PICA) sparsity, extends pattern-based pruning to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs), reducing index storage overhead while avoiding accuracy degradation. The proposed systolic accelerator leverages intrinsic data reuse and locality to accommodate PICA-based sparsity without complex data distribution networks, and it supports DNNs with different topologies. While compressing the model size by 16x, PICA sparsification reduces index storage overhead by 6.02x and still achieves a 20.7% word error rate (WER) on the TIMIT dataset. For the pruned WaveNet and LSTM, the accelerator achieves 0.62 and 2.69 TOPS/W energy efficiency, respectively, 1.7x to 10x higher than the state of the art.
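The core idea behind pattern-based pruning is that each small weight block keeps its nonzeros only in positions drawn from a small predefined pattern set, so a block needs just a short pattern index instead of per-element coordinates. The sketch below is a minimal illustration of that index-saving effect; the block size, pattern set, and selection rule are assumptions for exposition, not the paper's actual PICA format.

```python
import numpy as np

# Hypothetical pattern set: each 1x4 weight block keeps exactly 2 nonzeros,
# and the kept positions must match one of these 6 slot patterns. A block
# then needs only a 3-bit pattern index, versus 2-bit coordinates per
# nonzero (4 bits per block) in a plain coordinate (COO-style) format.
PATTERNS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def encode_blocks(weights, block=4):
    """Prune each length-4 block to its best-matching 2-nonzero pattern;
    return one pattern index and two retained values per block."""
    w = np.asarray(weights).reshape(-1, block)
    idx, vals = [], []
    for row in w:
        # choose the pattern that preserves the most weight magnitude
        best = max(range(len(PATTERNS)),
                   key=lambda p: sum(abs(row[s]) for s in PATTERNS[p]))
        idx.append(best)
        vals.append([float(row[s]) for s in PATTERNS[best]])
    return idx, vals

w = [0.9, 0.0, 0.1, 0.8,   0.0, 0.7, 0.6, 0.05]
idx, vals = encode_blocks(w)
print(idx)   # one pattern id per 4-wide block
print(vals)  # two retained weights per block
```

Because every block carries the same number of nonzeros in pattern-constrained slots, a systolic array can stream the compressed values with regular timing, which is what lets the accelerator avoid the complex distribution networks that irregular sparsity normally requires.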
Index Terms
- Systolic-Array Deep-Learning Acceleration Exploring Pattern-Indexed Coordinate-Assisted Sparsity for Real-Time On-Device Speech Processing