
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects

Published: 20 January 2013

Abstract

Alternative interconnects are attractive for scaling on-chip communication bandwidth in a power-efficient manner. However, efficiently utilizing the bandwidth these emerging interconnects provide remains an open problem because on-chip communication is heterogeneous in both space and time. In this article, we propose a stream arbitration scheme in which, at runtime, any source can compete for any communication channel of the interconnect to talk to any destination. We apply stream arbitration to radio frequency interconnect (RF-I). Experimental results show that, compared to the representative token arbitration scheme, stream arbitration can provide an average 20% performance improvement and a 12% power reduction.



    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 4
      Special Issue on High-Performance Embedded Architectures and Compilers
      January 2013
      876 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/2400682

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 January 2013
      • Accepted: 1 November 2012
      • Revised: 1 September 2012
      • Received: 1 June 2012
