A Mapping-Scheduling Algorithm for Hardware Acceleration on Reconfigurable Platforms

Authors:
Juan Antonio Clemente, Universidad Complutense de Madrid, Spain
Ivan Beretta, École Polytechnique Fédérale de Lausanne, Switzerland
Vincenzo Rana, École Polytechnique Fédérale de Lausanne, Switzerland
David Atienza, École Polytechnique Fédérale de Lausanne, Switzerland
Donatella Sciuto, Politecnico di Milano, Italy

ACM Transactions on Reconfigurable Technology and Systems, Volume 7, Issue 2, Article No. 9, pp. 1–27. https://doi.org/10.1145/2611562

Published: 04 July 2014

Abstract

Reconfigurable platforms offer a trade-off between flexibility and performance that many recent embedded system applications demand, especially in fields such as multimedia processing. These applications typically involve multiple ad hoc tasks for hardware acceleration, which are usually represented using formalisms such as Data Flow Diagrams (DFDs), Data Flow Graphs (DFGs), Control and Data Flow Graphs (CDFGs), or Petri nets. However, none of these models captures, at the same time, the pipeline behavior between tasks (which can therefore coexist in order to minimize the application execution time), their communication patterns, and their data dependencies. This article shows that knowledge of all this information can be effectively exploited to reduce the resource requirements and improve the timing performance of modern reconfigurable systems, where a set of hardware accelerators supports the computation. For this purpose, the article proposes a novel task representation model, named Temporal Constrained Data Flow Diagram (TCDFD), which includes all this information. It also presents a mapping-scheduling algorithm that takes advantage of the new TCDFD model and aims to minimize the dynamic reconfiguration overhead while meeting the communication requirements among the tasks. Experimental results show that the presented approach achieves resource savings of up to 75% and reconfiguration overhead reductions of up to 89% with respect to other state-of-the-art techniques for reconfigurable platforms.
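
To make the abstract's ingredients concrete, the following is a minimal Python sketch of a task graph whose edges distinguish data dependencies from pipeline relations and carry a communication requirement, together with a simple count of reconfigurations for a given execution order. It is not the TCDFD model or the mapping-scheduling algorithm of the article. The names `Edge`, `TaskGraph`, `pipeline_groups`, and `count_reconfigurations`, the `area` and `bandwidth` fields, and the least-recently-used eviction policy are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str       # "data": dst waits for src; "pipeline": src and dst must coexist
    bandwidth: int  # hypothetical communication requirement, arbitrary units


@dataclass
class TaskGraph:
    area: dict                                 # task name -> hypothetical area cost
    edges: list = field(default_factory=list)  # list of Edge

    def pipeline_groups(self):
        """Return sets of tasks linked by pipeline edges (resident together)."""
        parent = {t: t for t in self.area}

        def find(t):
            while parent[t] != t:
                parent[t] = parent[parent[t]]  # path halving
                t = parent[t]
            return t

        for e in self.edges:
            if e.kind == "pipeline":
                parent[find(e.src)] = find(e.dst)

        groups = {}
        for t in self.area:
            groups.setdefault(find(t), set()).add(t)
        return list(groups.values())


def count_reconfigurations(order, slots):
    """Count loads when tasks run in `order` on `slots` regions (LRU eviction)."""
    resident = []  # least recently used first
    loads = 0
    for task in order:
        if task in resident:
            resident.remove(task)   # already loaded: reuse, no reconfiguration
        else:
            loads += 1              # not loaded: one reconfiguration
            if len(resident) == slots:
                resident.pop(0)     # evict the least recently used task
        resident.append(task)
    return loads


graph = TaskGraph(
    area={"A": 2, "B": 3, "C": 1},
    edges=[Edge("A", "B", "pipeline", 8), Edge("B", "C", "data", 4)],
)
groups = graph.pipeline_groups()
loads_two_slots = count_reconfigurations(["A", "B", "A", "C", "A", "B"], slots=2)
loads_three_slots = count_reconfigurations(["A", "B", "A", "C", "A", "B"], slots=3)
```

In this toy example, tasks A and B form one pipeline group and must be resident at the same time, while C only depends on B's data. Adding a third region lets every task stay loaded after its first use, so the same execution order needs fewer reconfigurations.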

Index Terms

Computer systems organization

Published in

ACM Transactions on Reconfigurable Technology and Systems Volume 7, Issue 2
June 2014
199 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/2638850

Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 July 2014
- Accepted: 1 November 2013
- Revised: 1 August 2013
- Received: 1 February 2013
Author Tags
Mapping
reconfigurable systems
reconfiguration overheads
runtime reconfiguration
task scheduling
Qualifiers
- research-article
- Research
- Refereed