A Comparison Study on Implementing Optical Flow and Digital Communications on FPGAs and GPUs

Published: 01 May 2010

Abstract

FPGA devices have often found use as higher-performance alternatives to programmable processors for implementing computations. Applications successfully implemented on FPGAs typically contain high levels of parallelism and often use simple statically scheduled control and modest arithmetic. Recently introduced computing devices such as coarse-grain reconfigurable arrays, multi-core processors, and graphical processing units promise to significantly change the computational landscape and take advantage of many of the same application characteristics that fit well on FPGAs. One real-time computing task, optical flow, is difficult to apply in robotic vision applications because of its high computational and data rate requirements, and so is a good candidate for implementation on FPGAs and other custom computing architectures. This article reports on a series of experiments mapping a collection of different algorithms onto both an FPGA and a GPU. For two different optical flow algorithms the GPU had better performance, while for a set of digital communications MIMO computations the two devices had similar performance. In all cases the FPGA implementations required 10x the development time. Finally, a discussion of the two technologies' characteristics is given to show that they achieve high performance in different ways.

          Reviews

          Vivek Venugopal

          Bodily et al. compare applications prototyped on both field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). The authors describe the design effort and performance parameters, such as pipelining and parallelism, when implementing on both platforms.

          The first application is an optical flow calculation implemented using two algorithms: tensor based and ridge regression based. The tensor-based algorithm was implemented on a Xilinx XUP V2P board consisting of a Virtex-2 Pro XC2VP30, using an embedded development kit (EDK), which indicates the use of the embedded PowerPC and very-high-speed integrated circuit hardware description language (VHDL). It used 10,288 slices and achieved 64 frames per second (fps) for 640x480 images and 258 fps for 320x240 images. The GPU implementation was done on an NVIDIA 8800 GTX, and the host machine was an Intel Xeon 1.86 gigahertz (GHz) with 1 gigabyte (GB) of random access memory (RAM). The GPU implementation was highly optimized for block sizes, and it resulted in 238 fps for 640x480 images and 847 fps for 320x240 images. The FPGA provided better estimates for power consumption, memory architecture, and flexibility than the GPU.

          The ridge regression algorithm was implemented on a custom FPGA platform consisting of a Xilinx Virtex-4 FX60 FPGA with two PowerPC embedded processors. The FPGA implementation processed 640x480 images at 15 fps. The GPU implementation was able to process 640x480 images at 158 fps. Both algorithms had better accuracy on the GPU, but the GPU used a single-precision floating-point implementation whereas the FPGA used a fixed-point implementation. The development time was longer on the FPGA, resulting in 10 to 12 times the design effort as compared to the GPU.
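To put the frame rates quoted above on a common footing, the following minimal sketch converts them to pixel throughput and GPU-to-FPGA ratios. The numbers are copied from the review; the dictionary layout and the function names `pixel_rate_mpix` and `gpu_speedup` are illustrative assumptions, not anything taken from the article itself.

```python
# Frame rates (fps) as quoted in the review; keys are (algorithm, width, height).
# The data layout and helper names are hypothetical, chosen for illustration.
REPORTED_FPS = {
    ("tensor", 640, 480): {"fpga": 64, "gpu": 238},
    ("tensor", 320, 240): {"fpga": 258, "gpu": 847},
    ("ridge", 640, 480): {"fpga": 15, "gpu": 158},
}


def pixel_rate_mpix(width, height, fps):
    """Return pixel throughput in megapixels per second."""
    return width * height * fps / 1e6


def gpu_speedup(rates):
    """Return the ratio of GPU frame rate to FPGA frame rate."""
    return rates["gpu"] / rates["fpga"]


for (algo, w, h), rates in REPORTED_FPS.items():
    fpga_mpix = pixel_rate_mpix(w, h, rates["fpga"])
    gpu_mpix = pixel_rate_mpix(w, h, rates["gpu"])
    print(
        f"{algo} {w}x{h}: FPGA {fpga_mpix:.1f} Mpix/s, "
        f"GPU {gpu_mpix:.1f} Mpix/s, GPU/FPGA = {gpu_speedup(rates):.1f}x"
    )
```

On these figures the GPU is roughly 3.3 to 3.7 times faster for the tensor-based algorithm and about 10.5 times faster for ridge regression. The ratios compare a fixed-point FPGA design with a single-precision floating-point GPU design on different devices, so they are best read as a rough indication rather than a like-for-like benchmark.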
          The second application is based on the performance evaluation of blocks in communication systems, including the Viterbi decoder, the timing and channel estimator, and the pilot detector. The FPGA implementation of the Viterbi decoder met the timing requirement of 320 microseconds per frame, using 8,790 slices on a Xilinx Virtex-2 Pro. The GPU implementation of the Viterbi decoder delivered 20 percent higher throughput than the FPGA, but with 32 times the latency. The estimator used 7,000 slices on a Xilinx Virtex-2 Pro and calculated 70 surface points within the 320-microsecond time frame. The GPU implementation met the 320-microsecond time frame using a combination of coarse and fine search algorithms over the sample sizes. The pilot detector was implemented using seven 1,024-point complex fast Fourier transforms (FFTs), using 16,000 slices on the Xilinx Virtex-2 Pro FPGA. Each FFT on the FPGA took 12 microseconds and used additional logic blocks to keep up with the 41.6M sample data rate. The GPU implementation used the CUFFT library for the FFT processing and took about 60 microseconds for a single 1,024-point FFT.

          In summary, FPGAs offer more flexibility for custom input/output (I/O) computation, whereas GPUs offer a better compute-to-I/O ratio. The memory bottleneck is present for both, because data must be sent to the onboard memory of both the FPGA and the GPU. By leveraging the pipelining of FPGAs and the parallelism of GPUs, readers can use a combination of these architectures to solve real-time applications.

          Online Computing Reviews Service
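The pilot detector figures in the review can be checked with a short back-of-envelope calculation. The sketch below assumes that "41.6M sample data rate" means 41.6 million samples per second, which the review does not state explicitly. It also ignores batching, host-to-device transfers, and the fact that the detector uses seven FFTs. The names `block_arrival_us` and `load_factor` are hypothetical helpers.

```python
# Figures quoted in the review. SAMPLE_RATE_HZ is an assumption: the review
# says "41.6M sample data rate" without giving the unit of time.
SAMPLE_RATE_HZ = 41.6e6
FFT_POINTS = 1024
FPGA_FFT_US = 12.0   # time for one 1,024-point FFT on the FPGA
GPU_FFT_US = 60.0    # approximate time for one 1,024-point FFT on the GPU


def block_arrival_us(points, sample_rate_hz):
    """Return the time in microseconds for one block of samples to arrive."""
    return points / sample_rate_hz * 1e6


def load_factor(fft_us, arrival_us):
    """Return FFT time divided by block arrival time.

    A value at or below 1.0 means a single FFT finishes before the next
    block of samples has fully arrived.
    """
    return fft_us / arrival_us


arrival = block_arrival_us(FFT_POINTS, SAMPLE_RATE_HZ)
print(f"block arrival time: {arrival:.1f} us")
print(f"FPGA load factor:   {load_factor(FPGA_FFT_US, arrival):.2f}")
print(f"GPU load factor:    {load_factor(GPU_FFT_US, arrival):.2f}")
```

Under that assumption a 1,024-sample block arrives in about 24.6 microseconds, so a single 12-microsecond FPGA FFT fits within one block period (load factor about 0.49), while a single unbatched 60-microsecond GPU FFT does not (about 2.44). A GPU implementation could narrow this gap by batching several transforms per call, which this estimate does not model.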

          • Published in

            ACM Transactions on Reconfigurable Technology and Systems  Volume 3, Issue 2
            May 2010
            141 pages
            ISSN:1936-7406
            EISSN:1936-7414
            DOI:10.1145/1754386
            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 May 2010
            • Accepted: 1 April 2009
            • Revised: 1 November 2008
            • Received: 1 July 2008

            Qualifiers

            • research-article
            • Research
            • Refereed
