ABSTRACT
Next-generation systems, such as edge devices, will have to process machine learning (ML) algorithms efficiently while meeting constraints on energy, performance, area, and latency. However, the quickly evolving field of ML makes it extremely difficult to design accelerators able to support a wide variety of algorithms. At the same time, writing accelerators by hand in hardware description languages (HDLs) is laborious and time-consuming, and does not allow quick exploration of the design space. This paper discusses the SODA synthesizer, an automated, open-source compiler built on the LLVM infrastructure that translates high-level ML framework descriptions into Verilog, targeting ML Application-Specific Integrated Circuit (ASIC) chiplets. The SODA synthesizer will allow designers to implement optimal designs by combining templated, fully tunable IPs and macros with fully custom components generated through high-level synthesis. All these components will be provided through an extensible resource library, characterized using commercial and open-source logic design flows. Through a closed-loop design space exploration engine, developers will be able to explore their hardware designs quickly along different dimensions.
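To make the closed-loop exploration idea concrete, the sketch below shows a minimal multi-objective design space exploration (DSE) loop in Python. Everything here is illustrative and hypothetical, not part of the SODA synthesizer itself: the `Design` knobs, the toy analytical cost model in `evaluate`, and the metric names stand in for the candidate configurations and the area/latency/energy numbers that a real flow would obtain from logic synthesis reports.

```python
# Hypothetical sketch of a closed-loop multi-objective DSE engine.
# A real flow would replace evaluate() with feedback from synthesis tools;
# here a toy cost model captures the usual trade-off: more parallelism
# lowers latency but raises area and energy.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Design:
    unroll: int   # hypothetical loop-unrolling factor for an HLS datapath
    buffers: int  # hypothetical number of on-chip buffers

def evaluate(d: Design) -> dict:
    """Toy analytical cost model (placeholder for synthesis reports)."""
    return {
        "latency": 1000 / (d.unroll * d.buffers),
        "area": 50 * d.unroll + 20 * d.buffers,
        "energy": 5 * d.unroll + 2 * d.buffers,
    }

def dominates(a: dict, b: dict) -> bool:
    """a dominates b if it is no worse on every metric and better on one."""
    return all(a[k] <= b[k] for k in a) and any(a[k] < b[k] for k in a)

def pareto_front(designs):
    """Keep only the designs not dominated by any other candidate."""
    evals = {d: evaluate(d) for d in designs}
    return [d for d in designs
            if not any(dominates(evals[o], evals[d])
                       for o in designs if o is not d)]

# Enumerate a small design space and extract its Pareto-optimal points.
space = [Design(u, b) for u, b in product([1, 2, 4, 8], [1, 2, 4])]
front = pareto_front(space)
```

A real engine would, of course, search far larger spaces with heuristics (evolutionary algorithms, ant colony optimization, or Bayesian methods) rather than exhaustive enumeration, but the loop structure (propose candidates, evaluate metrics, keep the non-dominated set) is the same.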
Index Terms
- SODA: a new synthesis infrastructure for agile hardware design of machine learning accelerators