Domain-Specific Architectures: Research Problems and Promising Approaches

Authors:
Anish Krishnakumar, University of Wisconsin–Madison (ORCID: 0000-0003-2419-1860)
Umit Ogras, University of Wisconsin–Madison (ORCID: 0000-0002-5045-5535)
Radu Marculescu, The University of Texas at Austin (ORCID: 0000-0003-1826-7646)
Mike Kishinevsky, Intel Corporation (ORCID: 0000-0002-5593-9694)
Trevor Mudge, University of Michigan (ORCID: 0000-0001-7845-2187)

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 2, Article No. 28, pp. 1–26. https://doi.org/10.1145/3563946

Published: 24 January 2023

Abstract

Process technology-driven improvements in performance and energy efficiency have slowed as we approach physical design limits. General-purpose manycore architectures attempt to circumvent this challenge, but they have a significant performance and energy-efficiency gap compared to special-purpose solutions. Domain-specific architectures (DSAs), an instance of heterogeneous architectures, efficiently combine general-purpose cores and specialized hardware accelerators to boost energy efficiency and provide programming flexibility. Indeed, the hardware, software, and systems aspects of DSAs are highly tailored to maximize the energy efficiency of applications in a target domain. As DSAs and their conceptualization advance rapidly, there is a strong need to understand the research problems that require immediate attention. This article discusses the primary research directions in the design and runtime management of DSAs. It then surveys some promising approaches and highlights the outstanding research needs.

[131] Murdock Kit, Oswald David, Garcia Flavio D., Bulck Jo Van, Gruss Daniel, and Piessens Frank. 2020. Plundervolt: Software-based fault injection attacks against Intel SGX. In Proceedings of the IEEE Symposium on Security and Privacy (SP’20). 1466–1482.Google ScholarCross Ref
[132] Naghibijouybari Hoda, Neupane Ajaya, Qian Zhiyun, and Abu-Ghazaleh Nael. 2018. Rendered insecure: GPU side channel attacks are practical. In Proceedings of ACM SIGSAC Conference on Computer and Communications Security. 2139–2153.Google ScholarDigital Library
[133] Norrie Thomas, Patil Nishant, Yoon Doe Hyun, Kurian George, Li Sheng, Laudon James, Young Cliff, Jouppi Norman, and Patterson David. 2021. The design process for Google’s training chips: TPUv2 and TPUv3. IEEE Micro 41, 2 (2021), 56–63.Google ScholarCross Ref
[134] O’Mahony Niall, Campbell Sean, Carvalho Anderson, Harapanahalli Suman, Hernandez Gustavo Velasco, Krpalkova Lenka, Riordan Daniel, and Walsh Joseph. 2019. Deep learning vs. traditional computer vision. In Proceedings of the Science and Information Conference. 128–144.Google Scholar
[135] Padoin Edson Luiz, Pilla Laércio Lima, Castro Márcio, Boito Francieli Z., Navaux Philippe Olivier Alexandre, and Méhaut Jean-François. 2015. Performance/energy trade-off in scientific computing: The case of ARM big.LITTLE and Intel Sandy Bridge. IET Computers & Digital Techniques 9, 1 (2015), 27–35.Google ScholarCross Ref
[136] Pan Zhixin and Mishra Prabhat. 2021. Automated test generation for hardware Trojan detection using reinforcement learning. In Proceedings of the 26th Asia and South Pacific Design Automation Conference. 408–413.Google ScholarDigital Library
[137] Pasricha Sudeep, Ayoub Raid, Kishinevsky Michael, Mandal Sumit K., and Ogras Umit Y.. 2020. A survey on energy management for mobile and IoT devices. IEEE Design & Test 37, 5 (2020), 7–24.Google ScholarCross Ref
[138] Patterson David. 2018. 50 years of computer architecture: From the mainframe CPU to the domain-specific TPU and the open RISC-V instruction set. In Proceedings of the 2018 IEEE International Solid-State Circuits Conference-(ISSCC’18). IEEE, Los Alamitos, CA, 27–31.Google ScholarCross Ref
[139] Pérez Arturo, Rodríguez Alfonso, Otero Andrés, Arjona David González, Jiménez-Peralo Alvaro, Verdugo Miguel Ángel, and Torre Eduardo De La. 2020. Run-time reconfigurable MPSoC-based on-board processor for vision-based space navigation. IEEE Access 8 (2020), 59891–59905.Google ScholarCross Ref
[140] Puig Martín Pi, Giusti Laura Cristina De, Naiouf Marcelo, and Giusti Armando Eduardo De. 2019. A study of hardware performance counters selection for cross architectural GPU power modeling. In XXV Congreso Argentino de Ciencias de la Computación (CACIC’19).Google Scholar
[141] Pimentel Andy D., Erbas Cagkan, and Polstra Simon. 2006. A systematic approach to exploring embedded system architectures at multiple abstraction levels. IEEE Transactions on Computers 55, 2 (2006), 99–112.Google ScholarDigital Library
[142] Portugal Ivens, Alencar Paulo, and Cowan Donald. 2018. The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications 97 (2018), 205–227.Google ScholarCross Ref
[143] Pu Jing, Bell Steven, Yang Xuan, Setter Jeff, Richardson Stephen, Ragan-Kelley Jonathan, and Horowitz Mark. 2017. Programming heterogeneous systems from an image processing DSL. ACM Transactions on Architecture and Code Optimization 14, 3 (2017), 1–25.Google ScholarDigital Library
[144] Punkka Timo. 2012. Agile hardware and co-design. In Proceedings of the Embedded Systems Conference. 1–8.Google Scholar
[145] Ragan-Kelley Jonathan, Adams Andrew, Sharlet Dillon, Barnes Connelly, Paris Sylvain, Levoy Marc, Amarasinghe Saman, and Durand Frédo. 2017. Halide: Decoupling algorithms from schedules for high-performance image processing. Communications of the ACM 61, 1 (2017), 106–115.Google ScholarDigital Library
[146] Reddy Basireddy Karunakar, Singh Amit Kumar, Biswas Dwaipayan, Merrett Geoff V., and Al-Hashimi Bashir M.. 2017. Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores. IEEE Transactions on Multi-Scale Computing Systems 4, 3 (2017), 369–382.Google ScholarCross Ref
[147] Riesgo Teresa, Torroja Yago, and Torre Eduardo De la. 1999. Design methodologies based on hardware description languages. IEEE Transactions on Industrial Electronics 46, 1 (1999), 3–12.Google ScholarCross Ref
[148] Rosing Tajana Simunic, Mihic Kresimir, and Micheli Giovanni De. 2007. Power and reliability management of SoCs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 15, 4 (2007), 391–403.Google ScholarDigital Library
[149] Saeed Ahmed, Elbably M., Abdelfadeel G., and Eladawy M. I.. 2009. Efficient FPGA implementation of FFT/IFFT processor. International Journal of Circuits, Systems and Signal Processing 3, 3 (2009), 103–110.Google Scholar
[150] Sahin Onur and Coskun Ayse K.. 2016. Providing sustainable performance in thermally constrained mobile devices. In Proceedings of the 14th ACM/IEEE Symposium on Embedded Systems for Real-Time Multimedia. 72–77.Google ScholarDigital Library
[151] Sarma Santanu and Dutt Nikil. 2014. FPGA emulation and prototyping of a cyberphysical-system-on-chip (CPSoC). In Proceedings of the IEEE International Symposium on Rapid System Prototyping. 121–127.Google ScholarCross Ref
[152] Sartor Anderson L., Krishnakumar Anish, Arda Samet E., Ogras Umit Y., and Marculescu Radu. 2020. HiLITE: Hierarchical and lightweight imitation learning for power management of embedded SoCs. IEEE Computer Architecture Letters 19, 1 (2020), 63–67.Google ScholarCross Ref
[153] Shao Yakun Sophia, Xi Sam Likun, Srinivasan Vijayalakshmi, Wei Gu-Yeon, and Brooks David. 2016. Co-designing accelerators and SoC interfaces using gem5-Aladdin. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). 1–12.Google ScholarDigital Library
[154] Simonyan Karen and Zisserman Andrew. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).Google Scholar
[155] Singh Amit Kumar, Shafique Muhammad, Kumar Akash, and Henkel Jörg. 2013. Mapping on multi/many-core systems: Survey of current and emerging trends. In Proceedings of the 50th ACM/EDAC/IEEE Design Automation Conference (DAC’13). 1–10.Google ScholarDigital Library
[156] Skillicorn David B. and Talia Domenico. 1998. Models and languages for parallel computation. ACM Computing Surveys, 2 (1998), 123–169.Google Scholar
[157] Spafford Kyle L. and Vetter Jeffrey S.. 2012. Aspen: A domain specific language for performance modeling. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’12). 1–11.Google ScholarDigital Library
[158] Stevens Ashley. 2014. Quality of Service (QoS) in ARM® Systems: An Overview. White Paper. ARM, Cambridge, UK.Google Scholar
[159] Suda Naveen, Chandra Vikas, Dasika Ganesh, Mohanty Abinash, Ma Yufei, Vrudhula Sarma, Seo Jae-Sun, and Cao Yu. 2016. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 16–25.Google ScholarDigital Library
[160] Sujeeth Arvind K., Brown Kevin J., Lee Hyoukjoong, Rompf Tiark, Chafi Hassan, Odersky Martin, and Olukotun Kunle. 2014. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Transactions on Embedded Computing Systems 13, 4s (2014), 1–25.Google ScholarDigital Library
[161] Suriano Leonardo, Madroñal Daniel, Rodríguez Alfonso, Juárez Eduardo, Sanz César, and Torre Eduardo de la. 2018. A unified hardware/software monitoring method for reconfigurable computing architectures using PAPI. In Proceedings of the 13th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC’18). 1–8.Google ScholarCross Ref
[162] Swaminathan Karthik and Vega Augusto. 2021. Hardware specialization: From cell to heterogeneous microprocessors everywhere. IEEE Micro 41, 6 (2021), 112–120.Google ScholarDigital Library
[163] Szegedy Christian, Liu Wei, Jia Yangqing, Sermanet Pierre, Reed Scott, Anguelov Dragomir, Erhan Dumitru, Vanhoucke Vincent, and Rabinovich Andrew. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.Google ScholarCross Ref
[164] Tang Zhuo, Qi Ling, Cheng Zhenzhen, Li Kenli, Khan Samee U., and Li Keqin. 2016. An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment. Journal of Grid Computing 14, 1 (2016), 55–74.Google ScholarDigital Library
[165] Tariq Umair Ullah, Wu Hui, and Ishak Suhaimi Abd. 2018. Energy-aware scheduling of conditional task graphs on NoC-based MPSoCs. In Proceedings of the 51st Hawaii International Conference on System Sciences.Google ScholarCross Ref
[166] Theis Thomas N. and Wong H.-S. Philip. 2017. The end of Moore’s law: A new beginning for information technology. Computing in Science & Engineering 19, 2 (2017), 41–50.Google ScholarDigital Library
[167] Topcuoglu Haluk, Hariri Salim, and Wu Min-You. 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13, 3 (2002), 260–274.Google ScholarDigital Library
[168] Tortorella Yvan, Bertaccini Luca, Rossi Davide, Benini Luca, and Conti Francesco. 2022. RedMulE: A compact FP16 matrix-multiplication accelerator for adaptive deep learning on RISC-V-based ultra-low-power SoCs. arXiv preprint arXiv:2204.11192 (2022).Google Scholar
[169] Tu Fengbin, Yin Shouyi, Ouyang Peng, Tang Shibin, Liu Leibo, and Wei Shaojun. 2017. Deep convolutional neural network architecture with reconfigurable computation patterns. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 8 (2017), 2220–2233.Google ScholarDigital Library
[170] Uhrie Richard, Bliss Daniel W., Chakrabarti Chaitali, Ogras Umit Y., and Brunhaver John. 2019. Machine understanding of domain computation for domain-specific system-on-chips (DSSoC). In Open Architecture/Open Business Model Net-Centric Systems and Defense Transformation 2019, Vol. 11015. International Society for Optics and Photonics, SPIE, 180–187.Google Scholar
[171] Uhrie Richard, Chakrabarti Chaitali, and Brunhaver John. 2020. Automated parallel kernel extraction from dynamic application traces. arXiv preprint arXiv:2001.09995 (2020).Google Scholar
[172] Ullman J. D.. 1975. NP-complete scheduling problems. Journal of Computer and System Sciences 10, 3 (1975), 384–393. Google ScholarDigital Library
[173] Stralen Peter Van and Pimentel Andy. 2010. Scenario-based design space exploration of MPSoCs. In Proceedings of the IEEE International Conference on Computer Design. 305–312.Google ScholarCross Ref
[174] Varanasi Prashant and Heiser Gernot. 2011. Hardware-supported virtualization on ARM. In Proceedings of the 2nd Asia-Pacific Workshop on Systems. 1–5.Google ScholarDigital Library
[175] Vega Augusto, Wellman John-David, Franke Hubertus, Buyuktosunoglu Alper, Bose Pradip, Amarnath Aporva, Kassa Hiwot, Pal Subhankar, and Dreslinski Ronald. 2021. STOMP: Agile evaluation of scheduling policies in heterogeneous multi-processors. In Proceedings of the 3rd International Workshop on Domain Specific System Architecture in Conjunction with the 27th IEEE International Symposium on High-Performance Computer Architecture (DOSSA-3 @ HPCA’21).Google Scholar
[176] Ventroux Nicolas, Guerre Alexandre, Sassolas Tanguy, Moutaoukil L., Blanc Guillaume, Bechara Charly, and David Raphaël. 2010. SESAM: An MPSoC simulation environment for dynamic application processing. In Proceedings of the 10th IEEE International Conference on Computer and Information Technology. 1880–1886.Google ScholarDigital Library
[177] Walker Matthew J. P. and Anderson Jason H.. 2019. Generic connectivity-based CGRA mapping via integer linear programming. In Proceedings of the Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM’19). 65–73.Google ScholarCross Ref
[178] Wang Bo, Ma Sheng, Zhu Guoyi, Yi Xiao, and Xu Rui. 2022. A novel systolic array processor with dynamic dataflows. Integration 85 (2022), 42–47.Google ScholarDigital Library
[179] Wang Endong, Zhang Qing, Shen Bo, Zhang Guangyong, Lu Xiaowei, Wu Qing, and Wang Yajuan. 2014. Intel math kernel library. In High-Performance Computing on the Intel® Xeon Phi\(^{TM}\). Springer, 167–188.Google Scholar
[180] Wang Liang and Skadron Kevin. 2013. Implications of the power wall: Dim cores and reconfigurable logic. IEEE Micro 33, 5 (2013), 40–48.Google ScholarDigital Library
[181] Wang Yu Emma, Wei Gu-Yeon, and Brooks David. 2019. Benchmarking TPU, GPU, and CPU platforms for deep learning. arXiv preprint arXiv:1907.10701 (2019).Google Scholar
[182] Wei Xuechao, Yu Cody Hao, Zhang Peng, Chen Youxiang, Wang Yuxin, Hu Han, Liang Yun, and Cong Jason. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Proceedings of the 54th Annual Design Automation Conference. 1–6.Google ScholarDigital Library
[183] Wiens Jenna and Shenoy Erica S.. 2018. Machine learning for healthcare: On the verge of a major shift in healthcare epidemiology. Clinical Infectious Diseases 66, 1 (2018), 149–153.Google ScholarCross Ref
[184] Wijerathne Dhananjaya, Li Zhaoying, Pathania Anuj, Mitra Tulika, and Thiele Lothar. 2021. HiMap: Fast and scalable high-quality mapping on CGRA via hierarchical abstraction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 10 (2021), 3290–3303.Google Scholar
[185] Wu Yen-Kuan, Sharifi Shervin, and Rosing Tajana Simunic. 2011. Distributed thermal management for embedded heterogeneous MPSoCs with dedicated hardware accelerators. In Proceedings of the IEEE 29th International Conference on Computer Design (ICCD’11). 183–189.Google ScholarDigital Library
[186] Xiang Yi and Pasricha Sudeep. 2015. Soft and hard reliability-aware scheduling for multicore embedded systems with energy harvesting. IEEE Transactions on Multi-Scale Computing Systems 1, 4 (2015), 220–235.Google ScholarCross Ref
[187] Xiao Yao, Nazarian Shahin, and Bogdan Paul. 2019. Self-optimizing and self-programming computing systems: A combined compiler, complex networks, and machine learning approach. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27, 6 (2019), 1416–1427.Google ScholarDigital Library
[188] Xiao Yao, Nazarian Shahin, and Bogdan Paul. 2021. Plasticity-on-chip design: Exploiting self-similarity for data communications. IEEE Transactions on Computers 70, 6 (2021), 950–962.Google ScholarCross Ref
[189] Xiong Yan, Zhou Jian, Pal Subhankar, Blaauw David, Kim Hun-Seok, Mudge Trevor, Dreslinski Ronald, and Chakrabarti Chaitali. 2020. Accelerating deep neural network computation on a low power reconfigurable architecture. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS’20). 1–5.Google ScholarCross Ref
[190] Zhang Chen, Li Peng, Sun Guangyu, Guan Yijin, Xiao Bingjun, and Cong Jason. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 161–170.Google ScholarDigital Library
[191] Zhang Yunming, Yang Mengjiao, Baghdadi Riyadh, Kamil Shoaib, Shun Julian, and Amarasinghe Saman. 2018. Graphlt: A high-performance graph DSL. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), Article 121, 30 pages.Google ScholarDigital Library
[192] Zhao Zhongyuan, Sheng Weiguang, Wang Qin, Yin Wenzhi, Ye Pengfei, Li Jinchao, and Mao Zhigang. 2020. Towards higher performance and robust compilation for CGRA modulo scheduling. IEEE Transactions on Parallel and Distributed Systems 31, 9 (2020), 2201–2219.Google ScholarCross Ref
[193] Zhou Junlong, Sun Jin, Cong Peijin, Liu Zhe, Zhou Xiumin, Wei Tongquan, and Hu Shiyan. 2019. Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT. IEEE Transactions on Services Computing 13, 4 (2019), 745–758.Google ScholarCross Ref
[194] Zhou Junlong, Zhang Mingyue, Sun Jin, Wang Tian, Zhou Xiumin, and Hu Shiyan. 2022. DRHEFT: Deadline-constrained reliability-aware HEFT algorithm for real-time heterogeneous MPSoC systems. IEEE Transactions on Reliability 71, 1 (2022), 178–189.Google Scholar

Index Terms

1. Computer systems organization
  1. Architectures
  2. Embedded and cyber-physical systems
    1. System on a chip
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures
  2. Very large scale integration design
    1. On-chip resource management

Published in
ACM Transactions on Embedded Computing Systems Volume 22, Issue 2
March 2023
560 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3572826
Editor: Tulika Mitra, National University of Singapore, Singapore
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 24 January 2023
- Online AM: 21 September 2022
- Accepted: 10 August 2022
- Revised: 13 July 2022
- Received: 5 July 2022
Author Tags
Domain-specific architectures
domain-specific system-on-chip
DSA runtime resource management
hardware architectures
emerging systems
runtime frameworks
Qualifiers
- research-article
- Refereed