
2L-3W: 2-Level 3-Way Hardware–Software Co-verification for the Mapping of Convolutional Neural Network (CNN) onto FPGA Boards

  • Original Research
  • Published in: SN Computer Science

Abstract

FPGAs have become a popular choice for deploying Convolutional Neural Networks (CNNs), and many researchers have explored the deployment and mapping of CNNs on FPGAs. However, verifying these deployments at design time remains one of the biggest challenges, and the need for design-time verification is growing rapidly because of the use of CNNs in safety-critical applications. To the best of our knowledge, this is the first work to propose a 2-Level 3-Way (2L-3W) hardware–software co-verification methodology at design time. 2L-3W provides a step-by-step guide for the successful mapping, deployment, and verification of CNNs on FPGA boards. The 2-Level verification ensures that the implementation at each stage (software and hardware) follows the desired behavior. The 3-Way co-verification provides a cross-paradigm (software, design architecture, and hardware) layer-by-layer parameter check to ensure the correct implementation and mapping of CNNs onto FPGA boards. The proposed 2L-3W co-verification methodology has been evaluated over several test cases. In each case, the prediction and layer-by-layer outputs of the CNN deployed on the PYNQ FPGA board (hardware), the intermediate layer-by-layer outputs of the CNN implemented in Vivado HLS (design architecture), and the prediction and layer-by-layer outputs at the software level (Caffe) are compared with a Python script to obtain a similarity score. The comparison quantifies the degree of success of the CNN mapping to the FPGA and, in the case of an unsuccessful mapping, helps identify at design time the layer to be debugged. We demonstrated our technique on the LeNet CNN and the LeNet-3D CNN (a Caffe-inspired network for the CIFAR-10 dataset), and the co-verification yielded layer-by-layer similarity scores of 99%.
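The 3-way layer-by-layer comparison described above can be sketched in Python. This is a minimal illustration, not the authors' actual script: the function names, the dictionary-of-arrays layout for layer outputs, and the element-wise tolerance check are all assumptions made for the sketch.

```python
import numpy as np

def layer_similarity(ref, test, tol=1e-3):
    """Percentage of elements in `test` that match `ref` within tolerance `tol`."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    assert ref.shape == test.shape, "layer outputs must have identical shapes"
    # Element-wise agreement within an absolute tolerance, averaged over the layer.
    return 100.0 * np.isclose(test, ref, atol=tol).mean()

def co_verify(software_layers, hls_layers, hardware_layers, tol=1e-3):
    """3-way check: compare HLS and hardware outputs against the software
    (golden) outputs, layer by layer. Each argument maps layer names to
    arrays of that layer's output values."""
    report = {}
    for name, sw_out in software_layers.items():
        report[name] = {
            "sw_vs_hls": layer_similarity(sw_out, hls_layers[name], tol),
            "sw_vs_hw": layer_similarity(sw_out, hardware_layers[name], tol),
        }
    return report
```

A per-layer report like this makes a failed mapping easy to localize: the first layer whose similarity drops below the expected threshold is the one to debug.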




Funding

This work is partially supported by the National Science Foundation NSF CNS #1852126, the Carnegie Classification Funding from College of Engineering, and the Center for Manufacturing Research (CMR) at Tennessee Technological University.

Author information

Corresponding author

Correspondence to Tolulope A. Odetola.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions


About this article


Cite this article

Odetola, T.A., Groves, K.M., Mohammed, Y. et al. 2L-3W: 2-Level 3-Way Hardware–Software Co-verification for the Mapping of Convolutional Neural Network (CNN) onto FPGA Boards. SN COMPUT. SCI. 3, 60 (2022). https://doi.org/10.1007/s42979-021-00954-5

