Research Article
DOI: 10.1145/2847263.2847265

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network

Published: 21 February 2016

ABSTRACT

In recent years, convolutional neural network (CNN) based methods have achieved great success in a large number of applications and are among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computation-intensive and resource-hungry, and are therefore difficult to integrate into embedded systems such as smartphones, smart glasses, and robots. The FPGA is one of the most promising platforms for accelerating CNNs, but limited bandwidth and on-chip memory size constrain the performance of FPGA accelerators for CNNs.

In this paper, we go deeper with the embedded FPGA platform for accelerating CNNs and propose a CNN accelerator design on an embedded FPGA for ImageNet large-scale image classification. We first present an in-depth analysis of state-of-the-art CNN models and show that convolutional layers are computation-centric while fully-connected layers are memory-centric.
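To make the computation-versus-memory contrast concrete, the sketch below counts multiply-accumulate operations (MACs) and weight bytes for one convolutional layer and one fully-connected layer. The layer shapes are taken from the published VGG16 architecture; the 2-ops-per-MAC convention and 16-bit (2-byte) weights are assumptions for illustration, not figures from this paper.

    # Sketch: operation count vs. weight footprint for CONV and FC layers.
    # Assumes 2 ops per MAC and 16-bit (2-byte) weights.

    def conv_stats(h, w, c_in, c_out, k=3):
        """(GOP, weight MB) for a k x k convolution on an h x w feature map."""
        macs = h * w * c_in * c_out * k * k
        return 2 * macs / 1e9, 2 * (c_in * c_out * k * k) / 1e6

    def fc_stats(n_in, n_out):
        """(GOP, weight MB) for a fully-connected layer."""
        macs = n_in * n_out
        return 2 * macs / 1e9, 2 * macs / 1e6

    # One of VGG16's conv3-512 layers on a 28 x 28 feature map:
    print("CONV: %.2f GOP, %.1f MB weights" % conv_stats(28, 28, 512, 512))
    # VGG16's first fully-connected layer (25088 -> 4096):
    print("FC:   %.2f GOP, %.1f MB weights" % fc_stats(25088, 4096))

Under these assumptions the convolutional layer performs roughly 18x the operations of the FC layer (3.70 GOP vs. 0.21 GOP) while holding a small fraction of the weight bytes (4.7 MB vs. 205.5 MB), which is the computation-centric versus memory-centric split the analysis refers to.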

We then propose a dynamic-precision data quantization method and a convolver design that is efficient for all layer types in a CNN to improve bandwidth and resource utilization. Results show that our data quantization flow introduces only a 0.4% accuracy loss for the very deep VGG16 model under 8/4-bit quantization. A data arrangement method is further proposed to ensure high utilization of the external memory bandwidth. Finally, a state-of-the-art CNN, VGG16-SVD, is implemented on an embedded FPGA platform as a case study. VGG16-SVD is the largest and most accurate network implemented end-to-end on an FPGA so far. The system on a Xilinx Zynq ZC706 board achieves a frame rate of 4.45 fps with a top-5 accuracy of 86.66% using 16-bit quantization. The average performance of the convolutional layers and of the full CNN is 187.8 GOP/s and 137.0 GOP/s, respectively, at a 150 MHz working frequency, significantly outperforming previous approaches.
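As a rough illustration of what dynamic-precision quantization involves, the sketch below converts a layer's weights to fixed point and picks, per layer, the fractional bit width that minimizes quantization error. The greedy L2-error search and the function names are assumptions for illustration, not the authors' exact flow.

    import numpy as np

    def quantize(x, bits, frac_bits):
        """Round x to signed fixed point: `bits` total bits,
        `frac_bits` of them fractional."""
        scale = 2.0 ** frac_bits
        lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1
        return np.clip(np.round(x * scale), lo, hi) / scale

    def best_frac_bits(weights, bits=8):
        """Choose, per layer, the radix-point position that minimizes
        L2 quantization error (a simple stand-in for the paper's flow)."""
        err = {f: np.sum((weights - quantize(weights, bits, f)) ** 2)
               for f in range(bits)}
        return min(err, key=err.get)

Because each layer gets its own radix point, layers with small-magnitude weights keep more fractional bits while layers with large dynamic range keep more integer bits, which is what lets short 8/4-bit words cover a network as deep as VGG16 with little accuracy loss. The SVD in VGG16-SVD compresses the memory-centric fully-connected weights; a minimal sketch follows, assuming a truncation rank of 500 purely for illustration:

    def svd_compress(W, rank):
        """Factor an FC weight matrix W (n_out x n_in) into two matrices
        of total size rank * (n_out + n_in)."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :rank] * s[:rank], Vt[:rank, :]

    # VGG16's first FC layer is 4096 x 25088 (~102.8M weights); at
    # rank 500 the two factors hold ~14.6M weights, a ~7x reduction.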


Published in

FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2016, 298 pages
ISBN: 9781450338561
DOI: 10.1145/2847263
Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

FPGA '16 paper acceptance rate: 20 of 111 submissions (18%). Overall acceptance rate: 125 of 627 submissions (20%).
