Efficient communication support in predictable heterogeneous MPSoC designs for streaming applications

https://doi.org/10.1016/j.sysarc.2013.04.005

Abstract

Streaming applications are an important class of applications in emerging embedded systems such as smart camera networks, unmanned vehicles, and industrial printing. These applications are usually very computationally intensive and have real-time constraints. To meet the increasing demand for performance and efficiency in these applications, the use of application specific IP cores in heterogeneous Multi-Processor System-on-Chips (MPSoCs) becomes inevitable. However, two of the key challenges in integrating these IP cores into MPSoCs are (i) how to properly handle inter-core communication, and (ii) how to map streaming applications in an efficient and predictable way. In this paper, we first present a predictable high-performance communication assist (CA) that helps to tackle these design challenges. The proposed CA has zero throughput overhead, negligible latency overhead, and significantly less resource usage compared to existing CA designs. The proposed CA also provides a unified abstract interface for both processors and accelerator IP cores with flexible data access support.

Based on the proposed CA design, we present a predictable heterogeneous multi-processor platform template for streaming applications. The template is used in a predictable design flow that uses Synchronous Data Flow (SDF) graphs for design time analysis. An accurate SDF model of our CA is introduced, enabling the mapping of applications onto heterogeneous MPSoCs in an efficient and predictable way. As a case study, we map the complete high-speed vision processing pipeline of an industrial application, Organic Light Emitting Diode (OLED) screen printing, onto one instance of the proposed platform. The result demonstrates that system design and analysis effort is greatly reduced with the proposed CA-based design flow.

Introduction

There is an increasing demand for running applications with high performance requirements on embedded systems that have relatively limited resources. For example, a smartphone has to run high-definition video codecs, wireless signal processing, and 3D graphics processing. A smart camera may combine high resolution video sensing, low-level to high-level vision processing, and communication within a single embedded device. All these applications require an enormous amount of computation power, and yet embedded system designers have to meet all these requirements with a very small area and power budget. In addition, embedded systems often need to provide real-time guarantees, which require that the timing behaviors of the applications can be conservatively analyzed at design time. Due to technology limitations, single processor systems can no longer keep up with the increasing demand of emerging applications. Multi-Processor System-on-Chips (MPSoCs) are becoming a promising choice to fulfill these requirements. In general, an MPSoC can be categorized as one of the following types:

  1. Homogeneous MPSoC: all processing components are of the same type of programmable processor and this processor type usually supports a wide range of applications. Such a system provides high flexibility and is relatively easy to use. But the overhead of supporting a wide range of applications in all cores may cause severe inefficiencies.

  2. Heterogeneous MPSoC: a processing component in such a system can either be a general purpose processor or an application specific IP tailored for a specific type of computation. By properly designing and configuring the system for the applications mapped onto it, a heterogeneous MPSoC can achieve much higher efficiency than a homogeneous one. A common choice is to use hardwired or weakly programmable IPs to accelerate tasks that require high performance.

Many applications, such as image processing, vision, and multimedia applications, can be categorized as streaming applications, which periodically execute similar operations on a stream of data items. Such applications are very common in embedded systems and many of them are computationally intensive. The Synchronous Data Flow (SDF) model-of-computation is a very powerful model for analyzing streaming applications [1]. SDF and its variants can be used to analyze at design-time the temporal behavior and resource requirements (e.g., buffer sizes) of applications [2], [3], [4]. To design a predictable system, we use SDF graphs to model streaming applications.
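One basic design-time analysis on an SDF graph is solving its balance equations to obtain the repetition vector, i.e., how often each actor must fire so that every channel returns to its initial token count. The following is a minimal sketch of that computation, not the implementation used in the tools cited above. The function name `repetition_vector`, the tuple layout of a channel, and the three-actor example graph are assumptions made for illustration, and the sketch assumes a connected graph.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, channels):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    channels is a list of (src, production_rate, dst, consumption_rate).
    Assumes the graph is connected (illustrative sketch only).
    """
    neighbours = {a: [] for a in actors}
    for src, prod, dst, cons in channels:
        neighbours[src].append((dst, Fraction(prod, cons)))
        neighbours[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        current = stack.pop()
        for other, ratio in neighbours[current]:
            if other not in rate:
                rate[other] = rate[current] * ratio
                stack.append(other)

    # A graph whose rates cannot be balanced is inconsistent: it would
    # need unbounded buffers or would deadlock.
    for src, prod, dst, cons in channels:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph")

    # Scale the fractional rates to integers.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# Hypothetical pipeline: capture emits 4 tokens per firing, filter handles
# one token at a time, detect consumes 2 tokens per firing.
example = repetition_vector(
    ["capture", "filter", "detect"],
    [("capture", 4, "filter", 1), ("filter", 1, "detect", 2)],
)
```

For this hypothetical graph the result is one firing of `capture`, four of `filter`, and two of `detect` per graph iteration. Throughput and buffer-size analyses build on this vector.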

The use of hardwired or weakly programmable accelerator IPs in heterogeneous MPSoCs can greatly improve the performance and efficiency of implementations of streaming applications [5]. However, implementing applications on such a system is much more difficult compared to implementing applications on a system that contains only programmable cores. When mapping an application modeled with an SDF graph onto an MPSoC that contains accelerators, several problems have to be solved: (i) how to generate accelerator IPs for the application; (ii) how to integrate these IPs into the MPSoC; and (iii) how to predict performance/resource usage at design time. The first problem can be handled through the use of IP libraries or high level synthesis tools [6], [7]. In this work, we focus on solving the other two issues, i.e., efficiently integrating IPs into an MPSoC and mapping applications on these MPSoCs using a predictable design flow.

We propose a predictable hardware module called communication assist (CA), which serves as an abstract and unified communication interface between a generic IP and the interconnect in an MPSoC. Communication is separated from computation through the proposed CA. The benefits of introducing such a predictable hardware module are: (i) the CA interface acts as a uniform interface between many different types of IP blocks and interconnects, which makes it easier to re-use IP blocks in different designs; (ii) interface design of accelerator IPs is easier as complex communication functionality is offloaded to the CA, which requires only a one-time design effort; (iii) system performance may also be increased because the CA enables overlapped communication and computation; (iv) the proposed CA is capable of providing IP cores with high data-access bandwidth, which is usually a bottleneck in the implementation of streaming applications; (v) the proposed CA also supports design-time resource and timing analysis using SDF analysis techniques. This is essential when designing systems that should have a predictable timing behavior.
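The effect of benefit (iii) can be illustrated with a simple timing model. The sketch below is a back-of-the-envelope model under stated assumptions, not a model of the proposed hardware: each token takes a fixed number of cycles to compute and to transfer, the buffer between the core and the communication resource is unbounded, and the name `finish_time` and the cycle counts are hypothetical.

```python
def finish_time(n_tokens, compute, transfer, overlapped):
    """Cycle at which the last token has been computed and sent.

    overlapped=True models a core whose transfers are offloaded, so it
    can start the next computation immediately; overlapped=False models
    a core that performs each transfer itself and stalls meanwhile.
    """
    core_free = 0  # cycle at which the core can start computing again
    link_free = 0  # cycle at which the communication resource is idle
    for _ in range(n_tokens):
        done = core_free + compute
        start_send = max(done, link_free)
        link_free = start_send + transfer
        core_free = done if overlapped else link_free
    return link_free


# Hypothetical numbers: 100 tokens, 10 cycles of computation and
# 6 cycles of communication per token.
serial_cycles = finish_time(100, compute=10, transfer=6, overlapped=False)
overlap_cycles = finish_time(100, compute=10, transfer=6, overlapped=True)
```

In this model the non-overlapped period per token is the sum of both costs, whereas the overlapped steady-state period is the larger of the two, so the benefit is greatest when computation and communication times are of similar size.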

The key contributions of this paper are the following:

  1. We propose an efficient and predictable communication assist (CA) for integrating generic IP cores into predictable heterogeneous MPSoCs. Compared to other CA designs, the proposed CA has zero throughput overhead, negligible latency overhead, and significantly less resource usage. Flexible data accesses, such as non-destructive reads and out-of-order access patterns, can be handled within our CA. This flexibility improves the performance and reduces the complexity of IP core designs.

  2. We present a cycle-accurate SDF model for the proposed communication assist. By integrating this SDF model into our SDF analysis tool, SDF3 [8], worst-case system properties, such as throughput, latency, and buffer sizes, can be conservatively analyzed at design time.

  3. We propose a CA-based heterogeneous MPSoC template for systems running embedded streaming applications. This template allows combining general purpose processors and accelerator IP cores in a single MPSoC. The proposed template is used in our MAMPS+ tool flow, which maps applications onto the proposed platform.

  4. As a case study, we map the vision processing pipeline of a typical industrial application, Organic Light Emitting Diode (OLED) screen printing, onto the proposed platform. This case study demonstrates that the proposed design flow enables efficient integration of accelerator IPs into a heterogeneous MPSoC which targets streaming applications.
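The flexible data accesses named in the first contribution, non-destructive reads and out-of-order access, can be pictured with a small software analogy. The sketch below is only an illustration of the access pattern, not the CA hardware or its interface: the class `WindowedFifo`, its methods `write`, `peek`, and `release`, and the sliding-window example are all hypothetical names chosen here.

```python
from collections import deque


class WindowedFifo:
    """FIFO whose reader may inspect any buffered token before releasing it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = deque()

    def write(self, token):
        if len(self.tokens) >= self.capacity:
            raise BufferError("buffer full; the producer has to wait")
        self.tokens.append(token)

    def peek(self, offset):
        # Non-destructive read: the token stays in the buffer, and any
        # offset inside the filled window may be read, in any order.
        return self.tokens[offset]

    def release(self, count):
        # Explicitly free the oldest tokens once they are no longer needed.
        for _ in range(count):
            self.tokens.popleft()


def sliding_sum(samples, taps=3):
    """Sum over a sliding window; each token is read several times."""
    fifo = WindowedFifo(capacity=taps)
    results = []
    for sample in samples:
        fifo.write(sample)
        if len(fifo.tokens) == taps:
            # Read newest-to-oldest to show that order is not fixed.
            results.append(sum(fifo.peek(i) for i in reversed(range(taps))))
            fifo.release(1)
    return results


window_sums = sliding_sum([1, 2, 3, 4, 5])
```

Because tokens remain available until they are released, the consumer in this analogy does not need its own local copy of the window, which is the kind of simplification of IP core design that the contribution describes.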

The remainder of this paper is organized as follows. Section 2 introduces basic SDF concepts and explains how inter-core communication can be modeled using SDF. Section 3 discusses design requirements for a CA. In this section, we also introduce our communication assist along with its SDF model. Based on the CA design, we introduce a heterogeneous MPSoC template in Section 4, as well as a complete design flow for designing heterogeneous MPSoCs with both programmable cores and accelerator IPs. In Section 5, an industrial application is mapped to the proposed platform to demonstrate its effectiveness. Section 6 discusses related work for our CA, MPSoC template, and design flow. Finally, we conclude this work in Section 7.

Section snippets

Modeling applications

Synchronous Data Flow (SDF) is a model of computation commonly used to model streaming applications [1]. There exist many analysis algorithms for SDF that can be used to analyze at design time the throughput, latency, and buffer size requirements of applications modeled with an SDF graph [2], [3], [4]. An application modeled with an SDF graph consists of nodes called actors and edges, called channels, between these actors. Actors transfer data items called tokens to each other via channels. An

Communication assist

A communication assist (CA) is a module that handles the communication between an IP core and other components in the system connected through the communication network. It enables overlapped communication and computation, thereby improving the performance significantly. Apart from performing data transfers like a Direct Memory Access (DMA) controller, a CA supports the communication protocol used by the programming model, which fully decouples an IP’s communication from its computation. For a

Proposed architecture template & tool flow

Compared to traditional design trajectories, today’s system design has become so complex that it is too time-consuming and error-prone to start the design process from the Register Transfer Level (RTL). To solve the issues we discussed in Section 1, i.e., efficiently integrating IPs into an MPSoC and mapping applications on these MPSoCs using a predictable design flow, moving up to a more abstract system level seems to be the only option. However, the drawback is that this abstraction also

Case study: vision processing in OLED printing

To demonstrate that the proposed MAMPS+ design flow enables efficient integration of accelerator IP cores into a heterogeneous MPSoC, we use an industrial high-speed camera application, Organic-Light-Emitting-Diode (OLED) printing, as a case study. In OLED manufacturing, organic materials need to be accurately injected into the tiny OLED substrates on the wafer, the sizes of which are typically in the range of 10 μm to 1000 μm. This fine process has to be done at an extremely high speed due to

Related work

There exist several works addressing the issue of inter-component communication in MPSoCs. Gangwal et al. presented a synchronization scheme for embedded systems with shared memory, in which channel controllers are used for synchronization between tasks [22]. Compared to our communication assist, it is much slower and consumes more hardware resources. The work in [23], [14] presents a CA-based platform for ISA (Instruction-Set-Architecture) processors, on which the C-HEAP protocol [24] is

Conclusions and future work

In this paper, we presented a communication assist (CA) to efficiently integrate generic IPs into an MPSoC with a predictable design flow. The CA separates inter-core communication from the IP’s computation, and provides a unified abstract interface for accelerator IPs and processors. We also presented an accurate SDF model for the proposed CA, which makes it possible to provide timing guarantees for systems using the CA.

Based on the proposed CA design, we introduced a heterogeneous MPSoC

Acknowledgement

This work is supported by the Ministry of Economic Affairs of the Netherlands, Project EVA PID07121.

References (31)

  • A. Shabbir et al., CA-MPSoC: an automated design flow for predictable multi-processor architectures for multiple applications, Journal of Systems Architecture (2010)
  • E. Lee et al., Synchronous data flow, Proceedings of the IEEE (1987)
  • A.H. Ghamarian, M.C.W. Geilen, S. Stuijk, T. Basten, B.D. Theelen, M.R. Mousavi, A.J.M. Moonen, M.J.G. Bekooij,...
  • S. Stuijk et al., Throughput-buffering trade-off exploration for cyclo-static and synchronous dataflow graphs, IEEE Transactions on Computers (2008)
  • Y. Yang, M. Geilen, T. Basten, S. Stuijk, H. Corporaal, Automated bottleneck-driven design-space exploration of media...
  • K. van Berkel, Multi-core for mobile phones, in: Proceedings of the Conference on Design, Automation and Test in,...
  • J. Villarreal, W. Najjar, Compiled hardware acceleration of molecular dynamics code, in: Proceedings of the 18th...
  • AutoESL,...
  • S. Stuijk, M. Geilen, T. Basten, SDF3: SDF For Free, in: Proceedings of the 6th International Conference on Application...
  • A. Shabbir, S. Stuijk, A. Kumar, B. Theelen, B. Mesman, H. Corporaal, A predictable communication assist, in:...
  • Xilinx,...
  • ARM, PrimeCell DMA controller,...
  • S. Han, A. Baghdadi, M. Bonaciu, S. Chae, A. Jerraya, An efficient scalable and flexible data transfer architecture for...
  • H. Nikolov, T. Stefanov, E. Deprettere, Multi-processor system design with ESPAM, in: Proceedings of the 4th...
  • A. Moonen, M. Bekooij, R. van den Berg, J. van Meerbergen, Decoupling of computation and communication with a...

    Yifan He received the B.S. and M.S. degrees (cum laude) in electrical engineering from Zhejiang University, Zhejiang, China, in 2004 and 2006, respectively. In 2008, he received a second M.S. degree (cum laude) in electrical engineering from the Eindhoven University of Technology (TU/e), Eindhoven, The Netherlands. He is currently pursuing the Ph.D. degree from the Electronic System Group, TU/e. His current research interests include low-power computer architecture design and predictable hardware/software systems.

    Dongrui She received the B.S. degree in electrical engineering from Zhejiang University, Zhejiang, China, in 2007. In 2009, he received the M.S. degree in computer science from the Eindhoven University of Technology (TU/e), Eindhoven, The Netherlands. He is currently pursuing the Ph.D. degree from the Electronic System Group, TU/e. His current research interests include low-power computer architecture and code generation.

    Sander Stuijk received his M.Sc. degree (cum laude) in Electrical Engineering in 2002 and his Ph.D. degree in 2007 from the Eindhoven University of Technology (TU/e), Eindhoven, The Netherlands. He is currently an assistant professor in the Department of Electrical Engineering at the Eindhoven University of Technology. His research interests include modeling methods and mapping techniques for the design, specification, analysis and synthesis of predictable hardware/software systems.

    Henk Corporaal received the M.S. degree in theoretical physics from the University of Groningen, Groningen, The Netherlands, and the Ph.D. degree in electrical engineering, in the area of computer architecture, from the Delft University of Technology, Delft, The Netherlands. He has been teaching at several schools for higher education. He has been an Associate Professor with the Delft University of Technology in the field of computer architecture and code generation. He was a Joint Professor with the National University of Singapore, Singapore, and was the Scientific Director of the joint NUS-TUE Design Technology Institute. He was also the Department Head and Chief Scientist with the Design Technology for Integrated Information and Communication Systems Division, IMEC, Leuven, Belgium. Currently, he is a Professor of embedded system architectures with the Eindhoven University of Technology, Eindhoven, The Netherlands. He has coauthored over 250 journal and conference papers in the (multi) processor architecture and embedded system design area. Furthermore, he invented a new class of very long instruction word architectures, the Transport Triggered Architectures, which is used in several commercial products and by many research groups. His current research interests include single and multiprocessor architectures and the predictable design of soft and hard real-time embedded systems.
