Abstract
Recent industry trends show a drastic rise in the use of hand-held embedded devices, in applications ranging from everyday use to medical (e.g., monitoring devices) and critical defense applications (e.g., sensor nodes). The two key requirements in the design of such devices are processing capability and battery life. There is therefore an urgent need for high-performance, power-efficient embedded devices, which has inspired researchers to develop novel system designs for this purpose. Offloading power-hungry computations to a coprocessor (application-specific hardware) is gaining favor among system designers as a way to stay within their power budgets. We propose the use of CGRAs (Coarse-Grained Reconfigurable Arrays) as a power-efficient coprocessor. Although CGRAs have been widely used for streaming applications, the extensive compiler support they require limits their applicability as general-purpose coprocessors. In addition, a CGRA can efficiently execute only one statically scheduled kernel at a time, which is a serious limitation when it is used as an accelerator for a multithreaded or multitasking processor. In this work, we envision a multithreaded CGRA coprocessor on which multiple schedules (or kernels) can execute simultaneously. We propose a comprehensive software scheme that transforms the traditionally single-threaded CGRA into a multithreaded coprocessor that serves as a power-efficient accelerator for multithreaded embedded processors. Our software scheme includes (1) a compiler framework that integrates with existing CGRA mapping techniques to prepare kernels for execution on the multithreaded CGRA and (2) a runtime mechanism that dynamically schedules multiple kernels (offloaded from the processor) to execute simultaneously on the CGRA coprocessor. Our multithreaded CGRA coprocessor implementation thus enables more power-efficient computing in modern multithreaded embedded systems.
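The abstract does not specify how the runtime shares the array among kernels. As a rough illustration only, the sketch below assumes the CGRA is divided into equal-sized groups of processing elements ("pages") and that the compiler framework emits, for each kernel, several schedule variants for different page counts, each with its own initiation interval (II). The names `Kernel`, `CGRARuntime`, `variants`, the page abstraction, and the fair-share policy are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """A kernel offloaded by a processor thread (hypothetical model)."""
    name: str  # assumed unique among kernels
    # Hypothetical compiler output: page count -> initiation interval (II).
    # Fewer pages generally means a larger II (lower throughput).
    variants: dict[int, int]

    def __post_init__(self) -> None:
        # Assumption: every kernel can be squeezed onto a single page.
        if 1 not in self.variants:
            raise ValueError(f"kernel {self.name!r} needs a 1-page variant")


@dataclass
class CGRARuntime:
    """Toy runtime that shares a CGRA, split into equal pages, among kernels."""
    total_pages: int
    active: list[Kernel] = field(default_factory=list)
    waiting: list[Kernel] = field(default_factory=list)
    allocation: dict[str, int] = field(default_factory=dict)

    def _rebalance(self) -> None:
        # Admit queued kernels (FIFO) while every active kernel can get >= 1 page.
        while self.waiting and len(self.active) < self.total_pages:
            self.active.append(self.waiting.pop(0))
        # Give each active kernel its widest variant within an equal share.
        self.allocation = {}
        if not self.active:
            return
        share = self.total_pages // len(self.active)
        for k in self.active:
            self.allocation[k.name] = max(p for p in k.variants if p <= share)

    def launch(self, kernel: Kernel) -> None:
        """Called when a processor thread offloads a kernel."""
        self.waiting.append(kernel)
        self._rebalance()

    def finish(self, name: str) -> None:
        """Called when a kernel completes; its pages are redistributed."""
        self.active = [k for k in self.active if k.name != name]
        self._rebalance()


if __name__ == "__main__":
    rt = CGRARuntime(total_pages=4)
    rt.launch(Kernel("fir", {1: 8, 2: 4, 4: 2}))
    print(rt.allocation)  # {'fir': 4}
    rt.launch(Kernel("fft", {1: 12, 2: 6}))
    print(rt.allocation)  # {'fir': 2, 'fft': 2}
    rt.finish("fft")
    print(rt.allocation)  # {'fir': 4}
```

Rebalancing on every launch and completion is about the simplest policy that lets a newly offloaded kernel start without waiting for a running kernel to finish. A real runtime would also have to account for the cost of switching a running kernel to a different schedule variant, which this sketch ignores.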
Index Terms
- A Software Scheme for Multithreading on CGRAs