A Software Scheme for Multithreading on CGRAs

Authors:
Jared Pager, Compiler Microarchitecture Lab, Arizona State University, Arizona
Reiley Jeyapaul, Compiler Microarchitecture Lab, Arizona State University
Aviral Shrivastava, Compiler Microarchitecture Lab, Arizona State University

ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1, Article No. 19, pp. 1–26. https://doi.org/10.1145/2638558

Published: 21 January 2015

Abstract

Recent industry trends show a drastic rise in the use of hand-held embedded devices, from everyday applications to medical (e.g., monitoring devices) and critical defense applications (e.g., sensor nodes). The two key requirements in the design of such devices are processing capability and battery life. There is therefore an urgent need to build high-performance, power-efficient embedded devices, which has inspired researchers to develop novel system designs. Offloading power-hungry computations to a coprocessor (application-specific hardware) is gaining favor among system designers as a way to meet their power budgets. We propose the use of CGRAs (Coarse-Grained Reconfigurable Arrays) as a power-efficient coprocessor. Though CGRAs have been widely used for streaming applications, the extensive compiler support they require limits their applicability as general-purpose coprocessors. In addition, a CGRA can efficiently execute only one statically scheduled kernel at a time, which is a serious limitation when it is used as an accelerator for a multithreaded or multitasking processor. In this work, we envision a multithreaded CGRA on which multiple schedules (or kernels) can execute simultaneously while the CGRA serves as a coprocessor. We propose a comprehensive software scheme that transforms the traditionally single-threaded CGRA into a multithreaded coprocessor that can be used as a power-efficient accelerator for multithreaded embedded processors. Our software scheme includes (1) a compiler framework that integrates with existing CGRA mapping techniques to prepare kernels for execution on the multithreaded CGRA and (2) a runtime mechanism that dynamically schedules multiple kernels (offloaded from the processor) to execute simultaneously on the CGRA coprocessor. Our multithreaded CGRA coprocessor implementation thus enables more power-efficient computing in modern multithreaded embedded systems.
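To make the idea of a runtime that places several offloaded kernels on one CGRA concrete, here is a minimal sketch. It is not the paper's mechanism. The abstract does not say how the array is divided, so the sketch assumes (hypothetically) that the CGRA is split into equal partitions and that a kernel can run on fewer partitions than it asked for. The names `Kernel`, `ToyCGRA`, `launch`, and `finish` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    name: str
    wanted: int  # partitions the kernel would like (hypothetical parameter)


@dataclass
class ToyCGRA:
    partitions: int  # assumed: the array is split into equal partitions
    running: dict = field(default_factory=dict)  # kernel name -> partitions granted

    def free(self) -> int:
        """Number of partitions not currently granted to any kernel."""
        return self.partitions - sum(self.running.values())

    def launch(self, kernel: Kernel) -> int:
        """Grant up to kernel.wanted free partitions.

        Returns the number granted; 0 means the kernel has to wait.
        """
        grant = min(kernel.wanted, self.free())
        if grant > 0:
            self.running[kernel.name] = grant
        return grant

    def finish(self, name: str) -> None:
        """Release the partitions held by a completed kernel."""
        self.running.pop(name)


if __name__ == "__main__":
    cgra = ToyCGRA(partitions=4)
    for k in (Kernel("fir", 3), Kernel("fft", 2), Kernel("sobel", 1)):
        print(k.name, "granted", cgra.launch(k))
    cgra.finish("fir")
    print("free after fir finishes:", cgra.free())
```

In this toy model a second kernel starts right away on whatever partitions remain instead of waiting for the whole array, which is the kind of simultaneous execution the abstract describes. A real scheme would also have to remap each kernel's schedule onto the partitions it receives, which this sketch leaves out.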

Published in
ACM Transactions on Embedded Computing Systems Volume 14, Issue 1
January 2015
443 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2724585
Editor:
Sandeep K. Shukla
Virginia Tech, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 21 January 2015
- Accepted: 1 June 2014
- Revised: 1 April 2014
- Received: 1 December 2011
Author Tags
CGRA
compiler framework
embedded system
multithreading
power efficiency
runtime transformation
scheduling
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 13
- Total Downloads: 286
- Downloads (Last 12 months): 28
- Downloads (Last 6 weeks): 3