DOI: 10.1145/2145694.2145704
research-article

A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications

Published: 22 February 2012

ABSTRACT

With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers are confronted with the problem of searching a huge design space that has been shown to have widely varying performance and energy metrics for different accelerators, different application domains, and different use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that FPGAs can achieve speedups of up to 11x and 57x over GPUs and multicores, respectively, while also using orders of magnitude less energy.


Published in

FPGA '12: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
February 2012
352 pages
ISBN: 9781450311557
DOI: 10.1145/2145694

          Copyright © 2012 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

FPGA '12 paper acceptance rate: 20 of 87 submissions, 23%
Overall acceptance rate: 125 of 627 submissions, 20%
