ABSTRACT
This paper describes an automatic approach to accelerating image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs point-wise, stencil, or other more complex operations on image pixels. Recent efforts have led to the development of domain-specific languages (DSLs) and optimization frameworks for image processing pipelines. In this paper, we develop an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs. Our approach maximally exploits data reuse and the available memory bandwidth (or chip resources). Compared to Darkroom, a state-of-the-art approach for compiling a high-level DSL to FPGAs, our approach (a) leads to designs that deliver significantly higher throughput, and (b) supports a greater variety of filters. Furthermore, for some of the benchmarks, the designs we generate improve even on pre-optimized FPGA implementations provided by vendor libraries.
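The pipeline model described above (stages connected in a graph, with point-wise and stencil stages, and reuse of pixel data) can be illustrated with a small software model. The following is a hypothetical Python sketch, not PolyMage code and not the paper's compilation method: it chains a point-wise stage into a 3x3 stencil stage and computes the stencil in a streaming fashion with a three-row line buffer. Line buffering is a common way to obtain stencil reuse on FPGAs and is assumed here for illustration; the paper's actual buffering scheme may differ. The function names, the gain factor, and the box-sum filter are illustrative choices.

```python
"""Toy model of a two-stage image pipeline: a point-wise stage feeding a
3x3 stencil stage. Illustrative only; not PolyMage syntax and not the
paper's generated hardware."""

from collections import deque
from typing import Iterator, List

Image = List[List[int]]


def pointwise(img: Image, gain: int) -> Image:
    # Point-wise stage: each output pixel depends only on the same input pixel.
    return [[gain * p for p in row] for row in img]


def blur3x3_reference(img: Image) -> Image:
    # Stencil stage, whole-image version: each interior output pixel is the
    # sum of its 3x3 neighbourhood (border pixels are dropped for simplicity).
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


def blur3x3_streaming(rows: Iterator[List[int]]) -> Image:
    # Same stencil, consuming one row at a time as a hardware stream would.
    # The line buffer holds only the last three rows, so each input pixel is
    # read from the stream once and reused by up to nine output pixels.
    window: deque = deque(maxlen=3)
    out: Image = []
    for row in rows:
        window.append(row)
        if len(window) == 3:
            w = len(row)
            out.append([sum(r[x + dx] for r in window for dx in (-1, 0, 1))
                        for x in range(1, w - 1)])
    return out


if __name__ == "__main__":
    img = [[x + 4 * y for x in range(4)] for y in range(4)]
    stage1 = pointwise(img, gain=2)          # first stage: point-wise
    stage2 = blur3x3_streaming(iter(stage1))  # second stage: streamed stencil
    assert stage2 == blur3x3_reference(stage1)
    print(stage2)  # [[90, 108], [162, 180]]
```

Both stencil versions produce the same result. The streaming version never holds more than three rows of the intermediate image, which is the property that lets a hardware stencil stage consume its producer's output directly from on-chip buffers instead of re-reading it from off-chip memory.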
Index Terms
- A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs