Domain-Specific Architectures: Research Problems and Promising Approaches

Published: 24 January 2023

Abstract

Performance and energy-efficiency improvements driven by process technology have slowed as we approach physical design limits. General-purpose manycore architectures attempt to circumvent this challenge, but they still trail special-purpose solutions by a significant margin in both performance and energy efficiency. Domain-specific architectures (DSAs), an instance of heterogeneous architectures, combine general-purpose cores with specialized hardware accelerators to boost energy efficiency while retaining programming flexibility. The hardware, software, and system aspects of a DSA are highly tailored to maximize the energy efficiency of applications in a target domain. As DSAs and their conceptualization advance rapidly, there is a strong need to identify the research problems that require immediate attention. This article discusses the primary research directions in the design and runtime management of DSAs, then surveys promising approaches and highlights outstanding research needs.
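As a toy illustration of why pairing general-purpose cores with accelerators can improve energy efficiency, the sketch below selects, for a single task, the processing element (PE) with the lowest energy (power × execution time) that still meets a deadline. The PE names, timings, and power figures are hypothetical, and this greedy rule is a simplification for illustration only, not a runtime manager proposed in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingElement:
    name: str
    exec_time_ms: float  # hypothetical execution time of the task on this PE
    power_w: float       # hypothetical average power while running the task

    @property
    def energy_mj(self) -> float:
        # Watts multiplied by milliseconds gives millijoules.
        return self.power_w * self.exec_time_ms


def pick_pe(pes, deadline_ms):
    """Return the lowest-energy PE that finishes within the deadline, or None."""
    feasible = [pe for pe in pes if pe.exec_time_ms <= deadline_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda pe: pe.energy_mj)


# Hypothetical platform: two general-purpose cores plus one accelerator.
pes = [
    ProcessingElement("big_core", exec_time_ms=4.0, power_w=2.0),      # 8.0 mJ
    ProcessingElement("little_core", exec_time_ms=12.0, power_w=0.5),  # 6.0 mJ
    ProcessingElement("fft_accel", exec_time_ms=0.5, power_w=1.0),     # 0.5 mJ
]

print(pick_pe(pes, deadline_ms=10.0).name)  # fft_accel
```

Dropping the accelerator (`pes[:2]`) shows the trade-off a general-purpose-only platform faces: with a 10 ms deadline only the power-hungry `big_core` qualifies, while a relaxed 20 ms deadline lets the lower-energy `little_core` win.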

  106. [106] Lattner Chris and Adve Vikram. 2004. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization. 75–86.
  107. [107] Lee Seyong, Min Seung-Jai, and Eigenmann Rudolf. 2009. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. ACM SIGPLAN Notices 44, 4 (2009), 101–110.
  108. [108] Lin Ching-Chi, Syu You-Cheng, Chang Chao-Jui, Wu Jan-Jan, Liu Pangfeng, Cheng Po-Wen, and Hsu Wei-Te. 2015. Energy-efficient task scheduling for multi-core platforms with per-core DVFS. Journal of Parallel and Distributed Computing 86 (2015), 71–81.
  109. [109] Lin Shih-Chieh, Zhang Yunqi, Hsu Chang-Hong, Skach Matt, Haque E., Tang Lingjia, and Mars Jason. 2018. The architectural implications of autonomous driving: Constraints and acceleration. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 751–766.
  110. [110] Liu Chen, Rajendran Jeyavijayan, Yang Chengmo, and Karri Ramesh. 2014. Shielding heterogeneous MPSoCs from untrustworthy 3PIPs through security-driven task scheduling. IEEE Transactions on Emerging Topics in Computing 2, 4 (2014), 461–472.
  111. [111] Liu Leibo, Zhu Jianfeng, Li Zhaoshi, Lu Yanan, Deng Yangdong, Han Jie, Yin Shouyi, and Wei Shaojun. 2019. A survey of coarse-grained reconfigurable architecture and design: Taxonomy, challenges, and applications. ACM Computing Surveys 52, 6 (2019), 1–39.
  112. [112] Mack Joshua, Arda Samet, Ogras Umit Y., and Akoglu Ali. 2021. Performant, multi-objective scheduling of highly interleaved task graphs on heterogeneous system on chip devices. IEEE Transactions on Parallel and Distributed Systems 33 (2021), 2148–2162.
  113. [113] Mack Joshua, Hassan Sahil, Kumbhare Nirmal, Gonzalez Miguel Castro, and Akoglu Ali. 2022. CEDR—A compiler-integrated, extensible DSSoC runtime. ACM Transactions on Embedded Computing Systems. Online, April 13, 2022.
  114. [114] Mack Joshua, Kumbhare Nirmal, Krishnakumar Anish, Ogras Umit Y., and Akoglu Ali. 2020. User-space emulation framework for domain-specific SoC design. In Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’20). 44–53.
  115. [115] Malawade Arnav, Odema Mohanad, Lajeunesse-DeGroot Sebastien, and Faruque Mohammad Abdullah Al. 2021. SAGE: A split-architecture methodology for efficient end-to-end autonomous vehicle control. ACM Transactions on Embedded Computing Systems 20, 5s (2021), 1–22.
  116. [116] Mandal Dipan Kumar, Jandhyala Srivatsava, Omer Om J., Kalsi Gurpreet S., George Biji, Neela Gopi, Rethinagiri Santhosh Kumar, et al. 2019. Visual inertial odometry at the edge: A hardware-software co-design approach for ultra-low latency and power. In Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition (DATE’19). 960–963.
  117. [117] Mandal Sumit K., Bhat Ganapati, Patil Chetan Arvind, Doppa Janardhan Rao, Pande Partha Pratim, and Ogras Umit Y. 2019. Dynamic resource management of heterogeneous mobile platforms via imitation learning. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
  118. [118] Mandal Sumit K., Krishnakumar Anish, and Ogras Umit Y. 2021. Energy-efficient networks-on-chip architectures: Design and run-time optimization. In Network-on-Chip Security and Privacy. Springer, 55–75.
  119. [119] Mantovani Paolo, Giri Davide, Guglielmo Giuseppe Di, Piccolboni Luca, Zuckerman Joseph, Cota Emilio G., Petracca Michele, Pilato Christian, and Carloni Luca P. 2020. Agile SoC development with open ESP. In Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD’20). 1–9.
  120. [120] Mao Hongzi, Alizadeh Mohammad, Menache Ishai, and Kandula Srikanth. 2016. Resource management with deep reinforcement learning. In Proceedings of the ACM Workshop on Hot Topics in Networks. 50–56.
  121. [121] Mao Hongzi, Schwarzkopf Malte, Venkatakrishnan Shaileshh Bojja, Meng Zili, and Alizadeh Mohammad. 2019. Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM’19). ACM, New York, NY, 270–288.
  122. [122] Marculescu Radu, Ogras Umit Y., Peh Li-Shiuan, Jerger Natalie Enright, and Hoskote Yatin. 2008. Outstanding research problems in NoC design: System, microarchitecture, and circuit perspectives. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28, 1 (2008), 3–21.
  123. [123] Marsan Laurent and Sagot Marie-France. 2000. Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. Journal of Computational Biology 7, 3-4 (2000), 345–362.
  124. [124] Mashey John R. 2021. Interactions, impacts, and coincidences of the first golden age of computer architecture. IEEE Micro 41, 6 (2021), 131–139.
  125. [125] Mernik Marjan, Heering Jan, and Sloane Anthony M. 2005. When and how to develop domain-specific languages. ACM Computing Surveys 37, 4 (2005), 316–344.
  126. [126] Mittal Sparsh. 2020. A survey of FPGA-based accelerators for convolutional neural networks. Neural Computing and Applications 32, 4 (2020), 1109–1139.
  127. [127] Mittal Sparsh and Vetter Jeffrey S. 2015. A survey of CPU-GPU heterogeneous computing techniques. ACM Computing Surveys 47, 4 (2015), 1–35.
  128. [128] Moazzemi Kasra, Maity Biswadip, Yi Saehanseul, Rahmani Amir M., and Dutt Nikil. 2019. HESSLE-FREE: Heterogeneous systems leveraging fuzzy control for runtime resource management. ACM Transactions on Embedded Computing Systems 18, 5s (2019), 1–19.
  129. [129] Mohanan Ashwin Vishnu, Bonamy Cyrille, and Augier Pierre. 2018. FluidFFT: Common API (C++ and Python) for fast Fourier transform HPC libraries. arXiv preprint arXiv:1807.01775 (2018).
  130. [130] Mulas Fabrizio, Atienza David, Acquaviva Andrea, Carta Salvatore, Benini Luca, and Micheli Giovanni De. 2009. Thermal balancing policy for multiprocessor stream computing platforms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28, 12 (2009), 1870–1882.
  131. [131] Murdock Kit, Oswald David, Garcia Flavio D., Bulck Jo Van, Gruss Daniel, and Piessens Frank. 2020. Plundervolt: Software-based fault injection attacks against Intel SGX. In Proceedings of the IEEE Symposium on Security and Privacy (SP’20). 1466–1482.
  132. [132] Naghibijouybari Hoda, Neupane Ajaya, Qian Zhiyun, and Abu-Ghazaleh Nael. 2018. Rendered insecure: GPU side channel attacks are practical. In Proceedings of ACM SIGSAC Conference on Computer and Communications Security. 2139–2153.
  133. [133] Norrie Thomas, Patil Nishant, Yoon Doe Hyun, Kurian George, Li Sheng, Laudon James, Young Cliff, Jouppi Norman, and Patterson David. 2021. The design process for Google’s training chips: TPUv2 and TPUv3. IEEE Micro 41, 2 (2021), 56–63.
  134. [134] O’Mahony Niall, Campbell Sean, Carvalho Anderson, Harapanahalli Suman, Hernandez Gustavo Velasco, Krpalkova Lenka, Riordan Daniel, and Walsh Joseph. 2019. Deep learning vs. traditional computer vision. In Proceedings of the Science and Information Conference. 128–144.
  135. [135] Padoin Edson Luiz, Pilla Laércio Lima, Castro Márcio, Boito Francieli Z., Navaux Philippe Olivier Alexandre, and Méhaut Jean-François. 2015. Performance/energy trade-off in scientific computing: The case of ARM big.LITTLE and Intel Sandy Bridge. IET Computers & Digital Techniques 9, 1 (2015), 27–35.
  136. [136] Pan Zhixin and Mishra Prabhat. 2021. Automated test generation for hardware Trojan detection using reinforcement learning. In Proceedings of the 26th Asia and South Pacific Design Automation Conference. 408–413.
  137. [137] Pasricha Sudeep, Ayoub Raid, Kishinevsky Michael, Mandal Sumit K., and Ogras Umit Y. 2020. A survey on energy management for mobile and IoT devices. IEEE Design & Test 37, 5 (2020), 7–24.
  138. [138] Patterson David. 2018. 50 years of computer architecture: From the mainframe CPU to the domain-specific TPU and the open RISC-V instruction set. In Proceedings of the 2018 IEEE International Solid-State Circuits Conference (ISSCC’18). IEEE, Los Alamitos, CA, 27–31.
  139. [139] Pérez Arturo, Rodríguez Alfonso, Otero Andrés, Arjona David González, Jiménez-Peralo Alvaro, Verdugo Miguel Ángel, and Torre Eduardo De La. 2020. Run-time reconfigurable MPSoC-based on-board processor for vision-based space navigation. IEEE Access 8 (2020), 59891–59905.
  140. [140] Puig Martín Pi, Giusti Laura Cristina De, Naiouf Marcelo, and Giusti Armando Eduardo De. 2019. A study of hardware performance counters selection for cross architectural GPU power modeling. In XXV Congreso Argentino de Ciencias de la Computación (CACIC’19).
  141. [141] Pimentel Andy D., Erbas Cagkan, and Polstra Simon. 2006. A systematic approach to exploring embedded system architectures at multiple abstraction levels. IEEE Transactions on Computers 55, 2 (2006), 99–112.
  142. [142] Portugal Ivens, Alencar Paulo, and Cowan Donald. 2018. The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications 97 (2018), 205–227.
  143. [143] Pu Jing, Bell Steven, Yang Xuan, Setter Jeff, Richardson Stephen, Ragan-Kelley Jonathan, and Horowitz Mark. 2017. Programming heterogeneous systems from an image processing DSL. ACM Transactions on Architecture and Code Optimization 14, 3 (2017), 1–25.
  144. [144] Punkka Timo. 2012. Agile hardware and co-design. In Proceedings of the Embedded Systems Conference. 1–8.
  145. [145] Ragan-Kelley Jonathan, Adams Andrew, Sharlet Dillon, Barnes Connelly, Paris Sylvain, Levoy Marc, Amarasinghe Saman, and Durand Frédo. 2017. Halide: Decoupling algorithms from schedules for high-performance image processing. Communications of the ACM 61, 1 (2017), 106–115.
  146. [146] Reddy Basireddy Karunakar, Singh Amit Kumar, Biswas Dwaipayan, Merrett Geoff V., and Al-Hashimi Bashir M. 2017. Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores. IEEE Transactions on Multi-Scale Computing Systems 4, 3 (2017), 369–382.
  147. [147] Riesgo Teresa, Torroja Yago, and Torre Eduardo De la. 1999. Design methodologies based on hardware description languages. IEEE Transactions on Industrial Electronics 46, 1 (1999), 3–12.
  148. [148] Rosing Tajana Simunic, Mihic Kresimir, and Micheli Giovanni De. 2007. Power and reliability management of SoCs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 15, 4 (2007), 391–403.
  149. [149] Saeed Ahmed, Elbably M., Abdelfadeel G., and Eladawy M. I. 2009. Efficient FPGA implementation of FFT/IFFT processor. International Journal of Circuits, Systems and Signal Processing 3, 3 (2009), 103–110.
  150. [150] Sahin Onur and Coskun Ayse K. 2016. Providing sustainable performance in thermally constrained mobile devices. In Proceedings of the 14th ACM/IEEE Symposium on Embedded Systems for Real-Time Multimedia. 72–77.
  151. [151] Sarma Santanu and Dutt Nikil. 2014. FPGA emulation and prototyping of a cyberphysical-system-on-chip (CPSoC). In Proceedings of the IEEE International Symposium on Rapid System Prototyping. 121–127.
  152. [152] Sartor Anderson L., Krishnakumar Anish, Arda Samet E., Ogras Umit Y., and Marculescu Radu. 2020. HiLITE: Hierarchical and lightweight imitation learning for power management of embedded SoCs. IEEE Computer Architecture Letters 19, 1 (2020), 63–67.
  153. [153] Shao Yakun Sophia, Xi Sam Likun, Srinivasan Vijayalakshmi, Wei Gu-Yeon, and Brooks David. 2016. Co-designing accelerators and SoC interfaces using gem5-Aladdin. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). 1–12.
  154. [154] Simonyan Karen and Zisserman Andrew. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
  155. [155] Singh Amit Kumar, Shafique Muhammad, Kumar Akash, and Henkel Jörg. 2013. Mapping on multi/many-core systems: Survey of current and emerging trends. In Proceedings of the 50th ACM/EDAC/IEEE Design Automation Conference (DAC’13). 1–10.
  156. [156] Skillicorn David B. and Talia Domenico. 1998. Models and languages for parallel computation. ACM Computing Surveys, 2 (1998), 123–169.
  157. [157] Spafford Kyle L. and Vetter Jeffrey S. 2012. Aspen: A domain specific language for performance modeling. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’12). 1–11.
  158. [158] Stevens Ashley. 2014. Quality of Service (QoS) in ARM® Systems: An Overview. White Paper. ARM, Cambridge, UK.
  159. [159] Suda Naveen, Chandra Vikas, Dasika Ganesh, Mohanty Abinash, Ma Yufei, Vrudhula Sarma, Seo Jae-Sun, and Cao Yu. 2016. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 16–25.
  160. [160] Sujeeth Arvind K., Brown Kevin J., Lee Hyoukjoong, Rompf Tiark, Chafi Hassan, Odersky Martin, and Olukotun Kunle. 2014. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Transactions on Embedded Computing Systems 13, 4s (2014), 1–25.
  161. [161] Suriano Leonardo, Madroñal Daniel, Rodríguez Alfonso, Juárez Eduardo, Sanz César, and Torre Eduardo de la. 2018. A unified hardware/software monitoring method for reconfigurable computing architectures using PAPI. In Proceedings of the 13th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC’18). 1–8.
  162. [162] Swaminathan Karthik and Vega Augusto. 2021. Hardware specialization: From cell to heterogeneous microprocessors everywhere. IEEE Micro 41, 6 (2021), 112–120.
  163. [163] Szegedy Christian, Liu Wei, Jia Yangqing, Sermanet Pierre, Reed Scott, Anguelov Dragomir, Erhan Dumitru, Vanhoucke Vincent, and Rabinovich Andrew. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.
  164. [164] Tang Zhuo, Qi Ling, Cheng Zhenzhen, Li Kenli, Khan Samee U., and Li Keqin. 2016. An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment. Journal of Grid Computing 14, 1 (2016), 55–74.
  165. [165] Tariq Umair Ullah, Wu Hui, and Ishak Suhaimi Abd. 2018. Energy-aware scheduling of conditional task graphs on NoC-based MPSoCs. In Proceedings of the 51st Hawaii International Conference on System Sciences.
  166. [166] Theis Thomas N. and Wong H.-S. Philip. 2017. The end of Moore’s law: A new beginning for information technology. Computing in Science & Engineering 19, 2 (2017), 41–50.
  167. [167] Topcuoglu Haluk, Hariri Salim, and Wu Min-You. 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13, 3 (2002), 260–274.
  168. [168] Tortorella Yvan, Bertaccini Luca, Rossi Davide, Benini Luca, and Conti Francesco. 2022. RedMulE: A compact FP16 matrix-multiplication accelerator for adaptive deep learning on RISC-V-based ultra-low-power SoCs. arXiv preprint arXiv:2204.11192 (2022).
  169. [169] Tu Fengbin, Yin Shouyi, Ouyang Peng, Tang Shibin, Liu Leibo, and Wei Shaojun. 2017. Deep convolutional neural network architecture with reconfigurable computation patterns. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 8 (2017), 2220–2233.
  170. [170] Uhrie Richard, Bliss Daniel W., Chakrabarti Chaitali, Ogras Umit Y., and Brunhaver John. 2019. Machine understanding of domain computation for domain-specific system-on-chips (DSSoC). In Open Architecture/Open Business Model Net-Centric Systems and Defense Transformation 2019, Vol. 11015. International Society for Optics and Photonics, SPIE, 180–187.
  171. [171] Uhrie Richard, Chakrabarti Chaitali, and Brunhaver John. 2020. Automated parallel kernel extraction from dynamic application traces. arXiv preprint arXiv:2001.09995 (2020).
  172. [172] Ullman J. D. 1975. NP-complete scheduling problems. Journal of Computer and System Sciences 10, 3 (1975), 384–393.
  173. [173] Stralen Peter Van and Pimentel Andy. 2010. Scenario-based design space exploration of MPSoCs. In Proceedings of the IEEE International Conference on Computer Design. 305–312.
  174. [174] Varanasi Prashant and Heiser Gernot. 2011. Hardware-supported virtualization on ARM. In Proceedings of the 2nd Asia-Pacific Workshop on Systems. 1–5.
  175. [175] Vega Augusto, Wellman John-David, Franke Hubertus, Buyuktosunoglu Alper, Bose Pradip, Amarnath Aporva, Kassa Hiwot, Pal Subhankar, and Dreslinski Ronald. 2021. STOMP: Agile evaluation of scheduling policies in heterogeneous multi-processors. In Proceedings of the 3rd International Workshop on Domain Specific System Architecture in Conjunction with the 27th IEEE International Symposium on High-Performance Computer Architecture (DOSSA-3 @ HPCA’21).
  176. [176] Ventroux Nicolas, Guerre Alexandre, Sassolas Tanguy, Moutaoukil L., Blanc Guillaume, Bechara Charly, and David Raphaël. 2010. SESAM: An MPSoC simulation environment for dynamic application processing. In Proceedings of the 10th IEEE International Conference on Computer and Information Technology. 1880–1886.
  177. [177] Walker Matthew J. P. and Anderson Jason H. 2019. Generic connectivity-based CGRA mapping via integer linear programming. In Proceedings of the Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM’19). 65–73.
  178. [178] Wang Bo, Ma Sheng, Zhu Guoyi, Yi Xiao, and Xu Rui. 2022. A novel systolic array processor with dynamic dataflows. Integration 85 (2022), 42–47.
  179. [179] Wang Endong, Zhang Qing, Shen Bo, Zhang Guangyong, Lu Xiaowei, Wu Qing, and Wang Yajuan. 2014. Intel math kernel library. In High-Performance Computing on the Intel® Xeon Phi™. Springer, 167–188.
  180. [180] Wang Liang and Skadron Kevin. 2013. Implications of the power wall: Dim cores and reconfigurable logic. IEEE Micro 33, 5 (2013), 40–48.
  181. [181] Wang Yu Emma, Wei Gu-Yeon, and Brooks David. 2019. Benchmarking TPU, GPU, and CPU platforms for deep learning. arXiv preprint arXiv:1907.10701 (2019).
  182. [182] Wei Xuechao, Yu Cody Hao, Zhang Peng, Chen Youxiang, Wang Yuxin, Hu Han, Liang Yun, and Cong Jason. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Proceedings of the 54th Annual Design Automation Conference. 1–6.
  183. [183] Wiens Jenna and Shenoy Erica S. 2018. Machine learning for healthcare: On the verge of a major shift in healthcare epidemiology. Clinical Infectious Diseases 66, 1 (2018), 149–153.
  184. [184] Wijerathne Dhananjaya, Li Zhaoying, Pathania Anuj, Mitra Tulika, and Thiele Lothar. 2021. HiMap: Fast and scalable high-quality mapping on CGRA via hierarchical abstraction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 10 (2021), 3290–3303.
  185. [185] Wu Yen-Kuan, Sharifi Shervin, and Rosing Tajana Simunic. 2011. Distributed thermal management for embedded heterogeneous MPSoCs with dedicated hardware accelerators. In Proceedings of the IEEE 29th International Conference on Computer Design (ICCD’11). 183–189.
  186. [186] Xiang Yi and Pasricha Sudeep. 2015. Soft and hard reliability-aware scheduling for multicore embedded systems with energy harvesting. IEEE Transactions on Multi-Scale Computing Systems 1, 4 (2015), 220–235.
  187. [187] Xiao Yao, Nazarian Shahin, and Bogdan Paul. 2019. Self-optimizing and self-programming computing systems: A combined compiler, complex networks, and machine learning approach. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27, 6 (2019), 1416–1427.
  188. [188] Xiao Yao, Nazarian Shahin, and Bogdan Paul. 2021. Plasticity-on-chip design: Exploiting self-similarity for data communications. IEEE Transactions on Computers 70, 6 (2021), 950–962.
  189. [189] Xiong Yan, Zhou Jian, Pal Subhankar, Blaauw David, Kim Hun-Seok, Mudge Trevor, Dreslinski Ronald, and Chakrabarti Chaitali. 2020. Accelerating deep neural network computation on a low power reconfigurable architecture. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS’20). 1–5.
  190. [190] Zhang Chen, Li Peng, Sun Guangyu, Guan Yijin, Xiao Bingjun, and Cong Jason. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 161–170.
  191. [191] Zhang Yunming, Yang Mengjiao, Baghdadi Riyadh, Kamil Shoaib, Shun Julian, and Amarasinghe Saman. 2018. GraphIt: A high-performance graph DSL. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), Article 121, 30 pages.
  192. [192] Zhao Zhongyuan, Sheng Weiguang, Wang Qin, Yin Wenzhi, Ye Pengfei, Li Jinchao, and Mao Zhigang. 2020. Towards higher performance and robust compilation for CGRA modulo scheduling. IEEE Transactions on Parallel and Distributed Systems 31, 9 (2020), 2201–2219.
  193. [193] Zhou Junlong, Sun Jin, Cong Peijin, Liu Zhe, Zhou Xiumin, Wei Tongquan, and Hu Shiyan. 2019. Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT. IEEE Transactions on Services Computing 13, 4 (2019), 745–758.
  194. [194] Zhou Junlong, Zhang Mingyue, Sun Jin, Wang Tian, Zhou Xiumin, and Hu Shiyan. 2022. DRHEFT: Deadline-constrained reliability-aware HEFT algorithm for real-time heterogeneous MPSoC systems. IEEE Transactions on Reliability 71, 1 (2022), 178–189.