DOI: 10.1145/3453688.3461530
Research Article

Systolic-Array Deep-Learning Acceleration Exploring Pattern-Indexed Coordinate-Assisted Sparsity for Real-Time On-Device Speech Processing

Published: 22 June 2021

ABSTRACT

This paper presents a hardware-software co-design for efficient implementation of sparse deep neural networks (DNNs) on a regular systolic array for real-time on-device speech processing. The weight pruning format, which exploits pattern-indexed coordinate-assisted (PICA) sparsity, extends pattern-based pruning to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs), reducing index storage overhead while avoiding accuracy degradation. The proposed systolic accelerator leverages intrinsic data reuse and locality to accommodate PICA-based sparsity without complex data distribution networks, and it supports DNNs with different topologies. With a 16x reduction in model size, PICA sparsification reduces index storage overhead by 6.02x while still achieving a 20.7% word error rate (WER) on the TIMIT dataset. For the pruned WaveNet and LSTM, the accelerator achieves 0.62 and 2.69 TOPS/W energy efficiency, respectively, 1.7x to 10x higher than the state of the art.
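
The PICA encoding itself is not detailed on this abstract page. As a rough illustration only, the Python/NumPy sketch below shows the general idea behind pattern-indexed sparse weight storage and why it needs far less index metadata than per-element coordinates: each pruned kernel stores a single pattern ID from a small dictionary instead of one coordinate pair per surviving weight. The pattern dictionary, bit widths, shapes, and function names are assumptions made for illustration, not the paper's actual format.

# Hypothetical sketch: pattern-indexed weight encoding vs. per-element coordinates.
# The pattern dictionary, tensor shapes, and bit widths are illustrative
# assumptions, not the PICA format defined in the paper.
import numpy as np

# A small dictionary of allowed 3x3 sparsity patterns (4 nonzeros each),
# in the spirit of pattern-based pruning for CNN kernels.
PATTERNS = [
    np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], dtype=bool),
    np.array([[0, 0, 1], [0, 1, 1], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 0], [0, 1, 0], [1, 0, 1]], dtype=bool),
]

def prune_to_patterns(kernels):
    """Snap each 3x3 kernel to the dictionary pattern that keeps the most energy."""
    pattern_ids, values = [], []
    for k in kernels:
        energies = [np.sum((k * p) ** 2) for p in PATTERNS]
        pid = int(np.argmax(energies))
        pattern_ids.append(pid)
        values.append(k[PATTERNS[pid]])  # the 4 surviving weights of this kernel
    return np.array(pattern_ids, dtype=np.uint8), np.array(values)

def index_bits_pattern(num_kernels):
    # One pattern ID per kernel: 2 bits here, since the dictionary has 4 patterns.
    return num_kernels * 2

def index_bits_coo(num_kernels, nnz_per_kernel=4):
    # Per-element (row, col) coordinates inside a 3x3 kernel: 2 + 2 bits each.
    return num_kernels * nnz_per_kernel * 4

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernels = rng.standard_normal((1024, 3, 3))
    pids, vals = prune_to_patterns(kernels)
    print("pattern index bits:      ", index_bits_pattern(len(kernels)))
    print("per-element index bits:  ", index_bits_coo(len(kernels)))

Under these illustrative assumptions the per-kernel index cost drops from 16 bits (four coordinate pairs) to 2 bits (one pattern ID), which is the kind of index-storage saving the abstract attributes to PICA sparsification.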


Supplemental Material

glsv118p_video.mp4 (MP4, 184.5 MB)



          • Published in

GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
            June 2021
            504 pages
ISBN: 9781450383936
DOI: 10.1145/3453688

            Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 22 June 2021


            Qualifiers

            • research-article

            Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions, 27%

