Abstract
FPGA devices have often found use as higher-performance alternatives to programmable processors for implementing computations. Applications successfully implemented on FPGAs typically contain high levels of parallelism and often use simple, statically scheduled control and modest arithmetic. Recently introduced computing devices such as coarse-grain reconfigurable arrays, multi-core processors, and graphics processing units (GPUs) promise to significantly change the computational landscape and take advantage of many of the same application characteristics that fit well on FPGAs. One real-time computing task, optical flow, is difficult to apply in robotic vision applications because of its high computational and data rate requirements, and so is a good candidate for implementation on FPGAs and other custom computing architectures. This article reports on a series of experiments mapping a collection of different algorithms onto both an FPGA and a GPU. For two different optical flow algorithms the GPU had better performance, while for a set of digital communications MIMO computations, the two devices had similar performance. In all cases the FPGA implementations required 10x the development time. Finally, a discussion of the two technologies' characteristics is given to show that they achieve high performance in different ways.
Index Terms
- A Comparison Study on Implementing Optical Flow and Digital Communications on FPGAs and GPUs