research-article

A Mapping-Scheduling Algorithm for Hardware Acceleration on Reconfigurable Platforms

Published: 04 July 2014

Abstract

Reconfigurable platforms are a promising technology that offers an interesting trade-off between flexibility and performance, which many recent embedded system applications demand, especially in fields such as multimedia processing. These applications typically involve multiple ad-hoc tasks for hardware acceleration, which are usually represented using formalisms such as Data Flow Diagrams (DFDs), Data Flow Graphs (DFGs), Control and Data Flow Graphs (CDFGs), or Petri Nets. However, none of these models can simultaneously capture the pipeline behavior between tasks (which can therefore coexist and run concurrently to minimize the application execution time), their communication patterns, and their data dependencies. This article shows that this combined information can be exploited to reduce the resource requirements and improve the timing performance of modern reconfigurable systems in which a set of hardware accelerators supports the computation. To this end, the article proposes a novel task representation model, the Temporal Constrained Data Flow Diagram (TCDFD), which captures all of this information. It also presents a mapping-scheduling algorithm that takes advantage of the TCDFD model and aims to minimize the dynamic reconfiguration overhead while meeting the communication requirements among the tasks. Experimental results show that the proposed approach achieves up to 75% resource savings and up to 89% reduction in reconfiguration overhead compared with other state-of-the-art techniques for reconfigurable platforms.
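To make the notion of reconfiguration overhead concrete, the following minimal sketch counts how many times accelerators must be loaded into a fixed number of reconfigurable regions when tasks execute in a known order. It is not the TCDFD-based mapping-scheduling algorithm of this article: it swaps in Belady's classic furthest-next-use replacement rule and assumes that every reconfiguration has the same cost and that tasks run one at a time. The function names and task names are hypothetical.

```python
from typing import Hashable, Sequence, Set


def _next_use(schedule: Sequence[Hashable], start: int, task: Hashable) -> float:
    """Index of the next occurrence of `task` at or after `start`, or infinity."""
    for j in range(start, len(schedule)):
        if schedule[j] == task:
            return j
    return float("inf")


def count_reconfigurations(schedule: Sequence[Hashable], num_regions: int) -> int:
    """Count accelerator loads for `schedule` on `num_regions` reconfigurable
    regions, evicting the loaded task whose next use is furthest away
    (Belady's rule). Assumes unit cost per reconfiguration and sequential tasks."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    loaded: Set[Hashable] = set()
    reconfigurations = 0
    for i, task in enumerate(schedule):
        if task in loaded:
            continue  # accelerator already configured: reuse it, no overhead
        reconfigurations += 1  # task must be loaded into some region
        if len(loaded) == num_regions:
            # All regions are occupied: evict the task needed furthest in the future.
            victim = max(loaded, key=lambda t: _next_use(schedule, i + 1, t))
            loaded.remove(victim)
        loaded.add(task)
    return reconfigurations


if __name__ == "__main__":
    order = ["ME", "DCT", "Q", "ME", "DCT", "Q"]  # hypothetical task order
    for regions in (1, 2, 3):
        print(regions, count_reconfigurations(order, regions))
```

For this hypothetical sequence, one region needs six reconfigurations, two regions need four, and three regions need three. Adding regions trades area for fewer reconfigurations, which is the resource versus overhead tension that a mapping-scheduling approach has to balance.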


          • Published in

            ACM Transactions on Reconfigurable Technology and Systems  Volume 7, Issue 2
            June 2014
            199 pages
            ISSN:1936-7406
            EISSN:1936-7414
            DOI:10.1145/2638850

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 4 July 2014
            • Accepted: 1 November 2013
            • Revised: 1 August 2013
            • Received: 1 February 2013


            Qualifiers

            • research-article
            • Research
            • Refereed
