A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs

Authors:
Nitin Chugh (International Institute of Information Technology, Hyderabad, Hyderabad, India)
Vinay Vasista (Indian Institute of Science, Bengaluru, India)
Suresh Purini (International Institute of Information Technology, Hyderabad, Hyderabad, India)
Uday Bondhugula (Indian Institute of Science, Bengaluru, India)

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, September 2016, Pages 327–338
https://doi.org/10.1145/2967938.2967969

Published: 11 September 2016

ABSTRACT

This paper describes an automatic approach to accelerating image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs a point-wise, stencil, or other more complex operation on image pixels. Recent efforts have led to the development of domain-specific languages (DSLs) and optimization frameworks for image processing pipelines. In this paper, we develop an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs. Our approach maximally exploits reuse and available memory bandwidth (or chip resources). When compared to Darkroom, a state-of-the-art approach for compiling a high-level DSL to FPGAs, our approach (a) leads to designs that deliver significantly higher throughput, and (b) supports a greater variety of filters. Furthermore, for some of the benchmarks, the designs we generate improve even on pre-optimized FPGA implementations provided by vendor libraries.
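To make the pipeline vocabulary in the abstract concrete, the following is a minimal sketch in plain Python of a two-stage pipeline with one point-wise stage and one stencil stage. It is not PolyMage DSL syntax and not the FPGA design the paper generates. The function names (`pointwise_scale`, `stencil_blur3x3`, `run_pipeline`), the 4x4 test image, and the choice of a 3x3 box blur are all assumptions made for illustration.

```python
# Illustrative only: plain Python, NOT PolyMage syntax and NOT the paper's
# generated hardware. All names and the sample image are hypothetical.

def pointwise_scale(img, gain):
    """Point-wise stage: each output pixel depends only on the same input pixel."""
    return [[gain * p for p in row] for row in img]


def stencil_blur3x3(img):
    """Stencil stage: each output pixel averages a 3x3 input neighborhood.

    Border pixels are dropped, so the output is 2 smaller in each dimension.
    """
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            total = sum(img[r + dr][c + dc]
                        for dr in (-1, 0, 1)
                        for dc in (-1, 0, 1))
            out_row.append(total / 9)
        out.append(out_row)
    return out


def run_pipeline(img, stages):
    """Apply stages in order; a linear chain is the simplest pipeline graph."""
    for stage in stages:
        img = stage(img)
    return img


# 4x4 ramp image with pixel value r*4 + c.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
result = run_pipeline(image, [lambda im: pointwise_scale(im, 2.0),
                              stencil_blur3x3])
print(result)  # [[10.0, 12.0], [18.0, 20.0]]
```

In this sketch, horizontally adjacent stencil outputs share six of their nine inputs. That overlap is one source of the data reuse a hardware mapping can exploit, although how the paper's compiler does so is not shown here.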

Index Terms

A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
1. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging languages and compilers
  2. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016
474 pages
ISBN: 9781450341219
DOI: 10.1145/2967938
General Chairs: Ayal Zaks (Intel, Israel), Bilha Mendelson (Optitura, Israel)
Program Chairs: Lawrence Rauchwerger (Texas A&M University, USA), Wen-mei W. Hwu (University of Illinois at Urbana-Champaign, USA)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
domain-specific language
dsl
fpgas
hls
image processing
parallelism
reuse
Qualifiers
- research-article
Conference

Acceptance Rates
PACT '16 Paper Acceptance Rate: 31 of 119 submissions, 26%
Overall Acceptance Rate: 121 of 471 submissions, 26%