research-article

A Software Scheme for Multithreading on CGRAs

Published: 21 January 2015

Abstract

Recent industry trends show a drastic rise in the use of hand-held embedded devices, from everyday applications to medical (e.g., monitoring devices) and critical defense applications (e.g., sensor nodes). The two key requirements in the design of such devices are processing capability and battery life. There is therefore an urgent need to build high-performance, power-efficient embedded devices, which has inspired researchers to develop novel system designs. Offloading power-hungry computations to a coprocessor (application-specific hardware) is gaining favor among system designers as a way to meet their power budgets. We propose the use of CGRAs (Coarse-Grained Reconfigurable Arrays) as a power-efficient coprocessor. Though CGRAs have been widely used for streaming applications, the extensive compiler support they require limits their applicability as a general-purpose coprocessor. In addition, a CGRA can efficiently execute only one statically scheduled kernel at a time, which is a serious limitation when it is used as an accelerator for a multithreaded or multitasking processor. In this work, we envision a multithreaded CGRA on which multiple schedules (or kernels) can execute simultaneously while the CGRA serves as a coprocessor. We propose a comprehensive software scheme that transforms the traditionally single-threaded CGRA into a multithreaded coprocessor that can serve as a power-efficient accelerator for multithreaded embedded processors. Our software scheme includes (1) a compiler framework that integrates with existing CGRA mapping techniques to prepare kernels for execution on the multithreaded CGRA and (2) a runtime mechanism that dynamically schedules multiple kernels (offloaded from the processor) to execute simultaneously on the CGRA coprocessor. Our multithreaded CGRA coprocessor implementation thus makes more power-efficient computing possible in modern multithreaded embedded systems.



Published in

ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1
January 2015
443 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/2724585

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 21 January 2015
• Accepted: 1 June 2014
• Revised: 1 April 2014
• Received: 1 December 2011

Qualifiers

• research-article
• Research
• Refereed
