ABSTRACT
In recent years, convolutional neural network (CNN) based methods have achieved great success in a wide range of applications and are among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computationally intensive and resource-hungry, and are therefore hard to integrate into embedded systems such as smartphones, smart glasses, and robots. The FPGA is one of the most promising platforms for accelerating CNNs, but limited bandwidth and on-chip memory size constrain the performance of FPGA accelerators for CNNs.
In this paper, we go deeper with the embedded FPGA platform for accelerating CNNs and propose a CNN accelerator design on an embedded FPGA for ImageNet large-scale image classification. We first present an in-depth analysis of state-of-the-art CNN models and show that convolutional layers are computation-centric while fully-connected layers are memory-centric.
We then propose a dynamic-precision data quantization method and a convolver design that is efficient for all layer types in a CNN to improve bandwidth and resource utilization. Results show that our data quantization flow introduces only a 0.4% accuracy loss for the very deep VGG16 model when 8/4-bit quantization is used. A data arrangement method is further proposed to ensure high utilization of the external memory bandwidth. Finally, a state-of-the-art CNN, VGG16-SVD, is implemented on an embedded FPGA platform as a case study. VGG16-SVD is the largest and most accurate network implemented end-to-end on an FPGA so far. The system on a Xilinx Zynq ZC706 board achieves a frame rate of 4.45 fps with a top-5 accuracy of 86.66% using 16-bit quantization. The average performance of the convolutional layers and of the full CNN is 187.8 GOP/s and 137.0 GOP/s respectively at a 150 MHz working frequency, which significantly outperforms previous approaches.
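The abstract names two compression ideas: dynamic-precision quantization (each layer gets its own fixed-point format, chosen to minimize quantization error) and SVD compression of the memory-bound fully-connected layers (the "SVD" in VGG16-SVD). The following numpy sketch illustrates both ideas in their simplest form; the function names, the exhaustive search over fractional bit-widths, and the rank choice are illustrative assumptions, not the paper's actual quantization flow or hardware design.

```python
import numpy as np

def quantize(data, total_bits, frac_bits):
    """Round to the nearest representable fixed-point value and saturate."""
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    qmin = -2 ** (total_bits - 1)
    return np.clip(np.round(data * scale), qmin, qmax) / scale

def choose_frac_bits(data, total_bits=8):
    """'Dynamic precision': pick, per layer, the fractional bit-width
    that minimizes total quantization error for that layer's data."""
    best_fl, best_err = 0, float("inf")
    for fl in range(total_bits):
        err = np.abs(quantize(data, total_bits, fl) - data).sum()
        if err < best_err:
            best_fl, best_err = fl, err
    return best_fl

# Per-layer fixed-point formats: a layer with small weights gets more
# fractional bits than one with large activations.
weights = np.random.randn(1000) * 0.05      # small-magnitude weights
activations = np.random.randn(1000) * 4.0   # larger activations
fl_w = choose_frac_bits(weights, total_bits=8)
fl_a = choose_frac_bits(activations, total_bits=8)

# SVD compression of a fully-connected layer W (out x in): keep rank r,
# replacing one large matrix with two thin ones.
W = np.random.randn(256, 512).astype(np.float32)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
W_approx = (U[:, :r] * S[:r]) @ Vt[:r]
# Parameter count drops from 256*512 to r*(256+512).
```

Because the fully-connected layers of VGG16 dominate its parameter count, even a modest rank reduction there shrinks the off-chip memory traffic that the abstract identifies as the bottleneck.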