Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects

Authors:
Chunhua Xiao, Beijing University of Technology, Beijing, P. R. China
M-C. Frank Chang, University of California, Los Angeles, CA
Jason Cong, University of California, Los Angeles, CA
Michael Gill, University of California, Los Angeles, CA
Zhangqin Huang, Beijing University of Technology, Beijing, P. R. China
Chunyue Liu, University of California, Los Angeles, CA
Glenn Reinman, University of California, Los Angeles, CA
Hao Wu, University of California, Los Angeles, CA

ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 4, Article No. 60, pp. 1–27. https://doi.org/10.1145/2400682.2400719

Published: 20 January 2013

Abstract

Alternative interconnects are attractive for scaling on-chip communication bandwidth in a power-efficient manner. However, efficient utilization of the bandwidth provided by these emerging interconnects remains an open problem because of spatial and temporal communication heterogeneity. In this article, we propose a stream arbitration scheme in which, at runtime, any source can compete for any communication channel of the interconnect to talk to any destination. We apply stream arbitration to radio frequency interconnect (RF-I). Experimental results show that, compared to the representative token arbitration scheme, stream arbitration can provide an average 20% performance improvement and 12% power reduction.
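The abstract does not describe the arbitration mechanism itself, so the following is only a toy sketch of the general idea that letting any source compete for any channel can raise utilization when traffic is uneven. The function names, the rotating-priority rule, and the fixed source-to-channel baseline are all assumptions made for illustration; the baseline is not the token arbitration scheme evaluated in the article.

```python
from typing import Dict, List


def static_grants(requesters: List[int], num_channels: int) -> Dict[int, int]:
    """Hypothetical baseline: source s may only use channel s % num_channels."""
    grants: Dict[int, int] = {}
    for src in sorted(requesters):
        ch = src % num_channels
        if ch not in grants:  # channel is still free in this cycle
            grants[ch] = src
    return grants


def dynamic_grants(requesters: List[int], num_channels: int,
                   start: int = 0) -> Dict[int, int]:
    """Hypothetical any-to-any rule: requesters claim free channels in a
    rotating priority order that begins at source id `start`."""
    # Sources with id >= start come first, then the ones that wrap around.
    order = sorted(requesters, key=lambda s: (s < start, s))
    # zip stops at the shorter input, so at most num_channels grants are made.
    return {ch: src for ch, src in zip(range(num_channels), order)}


# Three sources that all map to channel 0 under the fixed assignment.
busy = [0, 4, 8]
print(len(static_grants(busy, 4)), len(dynamic_grants(busy, 4)))
```

In this contrived cycle the fixed mapping serves one of the three requesters while the any-to-any rule serves all three. The gap depends entirely on how skewed the requests are, and nothing here models RF-I timing, power, or arbitration latency.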

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures

Published in

ACM Transactions on Architecture and Code Optimization Volume 9, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2013
876 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2400682

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 January 2013
- Accepted: 1 November 2012
- Revised: 1 September 2012
- Received: 1 June 2012
Author Tags
Arbitration
Network-on-Chip
radio frequency interconnect
Qualifiers
- research-article
- Research
- Refereed