ABSTRACT
Machine learning (ML) and deep learning algorithms are well suited to process and analyze large amounts of data, as has been repeatedly demonstrated in applications such as image classification, natural language processing, and recommendation systems. Both ML training and inference are compute- and memory-intensive, leading to widespread adoption of heterogeneous systems containing specialized accelerators. While graphics processing units (GPUs) are the established platform of choice to accelerate training, they are often too power-hungry to run inference tasks, or cannot meet the strict latency requirements of scientific experiments. A variety of custom solutions implemented as field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) have been proposed in their place, ranging from generic "neural processors" to accelerators that focus on a narrow set of models with great efficiency.
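As a rough illustration of the kind of design entry such FPGA flows start from, the sketch below shows a small fully connected layer with 8-bit quantized weights written as a plain C++ loop nest, the style of kernel a high-level synthesis compiler can map onto FPGA logic. It is a minimal, hypothetical example (the function name, layer sizes, and data layout are assumptions for illustration), not code taken from any of the tools referenced in the paper.

```cpp
// Illustrative sketch only: a fully connected layer with 8-bit weights,
// 32-bit accumulation, and a ReLU activation, written in the loop-nest
// style that HLS tools typically translate into FPGA hardware.
#include <cstdint>
#include <cstdio>

constexpr int IN  = 4;  // number of input features (hypothetical size)
constexpr int OUT = 3;  // number of output features (hypothetical size)

void dense_relu(const int8_t in[IN], const int8_t w[OUT][IN],
                const int32_t bias[OUT], int32_t out[OUT]) {
  for (int o = 0; o < OUT; ++o) {      // one output neuron per iteration
    int32_t acc = bias[o];
    for (int i = 0; i < IN; ++i) {     // multiply-accumulate over inputs
      acc += static_cast<int32_t>(in[i]) * static_cast<int32_t>(w[o][i]);
    }
    out[o] = acc > 0 ? acc : 0;        // ReLU
  }
}

int main() {
  const int8_t  in[IN]      = {1, -2, 3, 4};
  const int8_t  w[OUT][IN]  = {{1, 0, -1, 2}, {0, 1, 1, 0}, {-1, -1, 2, 1}};
  const int32_t bias[OUT]   = {0, 1, -2};
  int32_t out[OUT];
  dense_relu(in, w, bias, out);
  for (int o = 0; o < OUT; ++o) printf("out[%d] = %d\n", o, (int)out[o]);
  return 0;
}
```

In an actual HLS flow, loops like these are typically annotated with pipelining and unrolling directives, and the fixed-point types are tuned per layer to trade accuracy for resource usage.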