A Comparison Study on Implementing Optical Flow and Digital Communications on FPGAs and GPUs

Authors:
John Bodily, Brent Nelson, Zhaoyi Wei, Dah-Jye Lee, Jeff Chase

NSF Center for High Performance Reconfigurable Computing (CHREC), Brigham Young University

ACM Transactions on Reconfigurable Technology and Systems, Volume 3, Issue 2, Article No. 6, pp. 1–22. https://doi.org/10.1145/1754386.1754387

Published: 01 May 2010

Abstract

FPGA devices have often found use as higher-performance alternatives to programmable processors for implementing computations. Applications successfully implemented on FPGAs typically contain high levels of parallelism and often use simple, statically scheduled control and modest arithmetic. Recently introduced computing devices such as coarse-grain reconfigurable arrays, multi-core processors, and graphics processing units promise to significantly change the computational landscape and take advantage of many of the same application characteristics that fit well on FPGAs. One real-time computing task, optical flow, is difficult to apply in robotic vision applications because of its high computational and data rate requirements, and so is a good candidate for implementation on FPGAs and other custom computing architectures. This article reports on a series of experiments mapping a collection of different algorithms onto both an FPGA and a GPU. For two different optical flow algorithms the GPU had better performance, while for a set of digital communications MIMO computations the two devices had similar performance. In all cases the FPGA implementations required 10x the development time. Finally, a discussion of the two technologies' characteristics is given to show that they achieve high performance in different ways.

Reviews

Reviewer: Vivek Venugopal

Bodily et al. compare applications prototyped on both field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). The authors describe the design effort and performance parameters, such as pipelining and parallelism, when implementing on both platforms. The first application is an optical flow calculation implemented using two algorithms: tensor based and ridge regression based. The tensor-based algorithm was implemented on a Xilinx XUP V2P board with a Virtex-2 Pro XC2VP30, using an embedded development kit (EDK), which implies the use of the embedded PowerPC and very-high-speed integrated circuits hardware description language (VHDL). It used 10,288 slices and achieved a processing rate of 64 frames per second (fps) for 640x480 images and 258 fps for 320x240 images. The GPU implementation was done on an NVIDIA 8800 GTX, and the host machine was a 1.86 gigahertz (GHz) Intel Xeon with 1 gigabyte (GB) of random access memory (RAM). The GPU implementation was highly optimized for block sizes, and it achieved 238 fps for 640x480 images and 847 fps for 320x240 images. The FPGA provided better estimates for power consumption, memory architecture, and flexibility than the GPU. The ridge regression algorithm was implemented on a custom FPGA platform consisting of a Xilinx Virtex-4 FX60 FPGA with two embedded PowerPC processors. The FPGA implementation processed 640x480 images at 15 fps. The GPU implementation was able to process 640x480 images at 158 fps. Both algorithms had better accuracy on the GPU, but the GPU used a single-precision floating-point implementation whereas the FPGA used a fixed-point implementation. The development time was longer on the FPGA, which required 10 to 12 times the design effort of the GPU. The second application is based on the performance evaluation of blocks in communication systems, including the Viterbi decoder, the timing and channel estimator, and the pilot detector.
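The optical flow frame rates quoted above can be put on a common footing by converting them to pixel throughput and GPU-to-FPGA ratios. The following is a minimal sketch of that arithmetic only; the class and field names are hypothetical, and the numbers are taken directly from the review. Note that the two algorithms ran on different FPGA devices, so the ratios are not a controlled device comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRateResult:
    """One FPGA-vs-GPU frame-rate pair as quoted in the review (names are illustrative)."""
    algorithm: str
    width: int
    height: int
    fpga_fps: float
    gpu_fps: float

    def pixel_rate(self, fps: float) -> float:
        """Pixels processed per second at the given frame rate."""
        return self.width * self.height * fps

    @property
    def gpu_speedup(self) -> float:
        """Ratio of GPU frame rate to FPGA frame rate."""
        return self.gpu_fps / self.fpga_fps


# Figures quoted in the review; nothing here is measured by this sketch.
RESULTS = [
    FrameRateResult("tensor-based", 640, 480, fpga_fps=64, gpu_fps=238),
    FrameRateResult("tensor-based", 320, 240, fpga_fps=258, gpu_fps=847),
    FrameRateResult("ridge regression", 640, 480, fpga_fps=15, gpu_fps=158),
]

for r in RESULTS:
    fpga_mpx = r.pixel_rate(r.fpga_fps) / 1e6
    gpu_mpx = r.pixel_rate(r.gpu_fps) / 1e6
    print(
        f"{r.algorithm:16s} {r.width}x{r.height}: "
        f"FPGA {fpga_mpx:6.1f} Mpixel/s, GPU {gpu_mpx:6.1f} Mpixel/s, "
        f"GPU/FPGA = {r.gpu_speedup:.2f}x"
    )
```

On these figures the GPU is roughly 3.7x and 3.3x faster for the tensor-based algorithm at the two image sizes, and about 10.5x faster for ridge regression.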
The FPGA implementation of the Viterbi decoder met the timing requirement of 320 microseconds per frame, using 8,790 slices on a Xilinx Virtex-2 Pro. The GPU implementation of the Viterbi decoder provided 20 percent better throughput than the FPGA, at the cost of a 32-fold increase in latency. The estimator used 7,000 slices on a Xilinx Virtex-2 Pro and calculated 70 surface points within the 320-microsecond time frame. The GPU implementation met the 320-microsecond time frame using a combination of coarse and fine search algorithms over the sample sizes. The pilot detector was implemented using seven 1,024-point complex fast Fourier transforms (FFTs), using 16,000 slices on the Xilinx Virtex-2 Pro FPGA. Each FFT on the FPGA took 12 microseconds and used additional logic blocks to keep up with the 41.6M sample data rate. The GPU implementation used the CUFFT library for the FFT processing and took about 60 microseconds for a single 1,024-point FFT. In summary, FPGAs offer more flexibility for custom input/output (I/O) computation, whereas GPUs offer a better compute-to-I/O ratio. The memory bottleneck is present for both, as data needs to be sent to the onboard memory of both the FPGA and the GPU. By leveraging the pipelining of FPGAs and the parallelism of GPUs, readers can use a combination of these architectures to solve real-time applications.

Online Computing Reviews Service
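The pilot detector timings quoted in the review can be checked against the data rate with a back-of-the-envelope calculation. The sketch below rests on assumptions that the review does not state: that "41.6M sample data rate" means 41.6 million samples per second, that the seven FFTs run back to back, and that they share the 320-microsecond frame time quoted for the other blocks. All names are hypothetical, and a batched GPU FFT could behave quite differently from the sequential case modeled here.

```python
# Assumption: the review's "41.6M sample data rate" is read as samples per second.
SAMPLE_RATE_HZ = 41.6e6
FFT_POINTS = 1024          # from the review: 1,024-point complex FFTs
NUM_FFTS = 7               # from the review: seven FFTs in the pilot detector
FRAME_BUDGET_US = 320.0    # assumption: same frame time as the Viterbi decoder

# Per-FFT times quoted in the review, in microseconds.
FFT_TIME_US = {"FPGA": 12.0, "GPU (CUFFT)": 60.0}


def block_arrival_us(points: int = FFT_POINTS, rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Time for one FFT block's worth of samples to arrive, in microseconds."""
    return points / rate_hz * 1e6


def sequential_total_us(per_fft_us: float, count: int = NUM_FFTS) -> float:
    """Total time if `count` FFTs run strictly one after another (an assumption)."""
    return per_fft_us * count


arrival = block_arrival_us()
print(f"1,024 samples arrive every {arrival:.2f} us")

for device, per_fft in FFT_TIME_US.items():
    total = sequential_total_us(per_fft)
    print(
        f"{device:12s}: one FFT = {per_fft / arrival:.2f}x the block arrival time, "
        f"{NUM_FFTS} sequential FFTs = {total:.0f} us "
        f"({'within' if total <= FRAME_BUDGET_US else 'over'} {FRAME_BUDGET_US:.0f} us)"
    )
```

Under these assumptions a block of 1,024 samples arrives about every 24.6 microseconds, so a 12-microsecond FFT finishes before the next block is ready, while a single 60-microsecond FFT takes more than twice as long as the block takes to arrive.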

Published in

ACM Transactions on Reconfigurable Technology and Systems Volume 3, Issue 2
May 2010
141 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/1754386

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 2010
- Accepted: 1 April 2009
- Revised: 1 November 2008
- Received: 1 July 2008
Author Tags
Digital communications
FPGA
GPU
optical flow
reconfigurable computing
Qualifiers
- research-article
- Research
- Refereed