Abstract
Alternative interconnects are attractive for scaling on-chip communication bandwidth in a power-efficient manner. However, using the bandwidth of these emerging interconnects efficiently remains an open problem because on-chip communication is heterogeneous in both space and time. This article proposes a stream arbitration scheme in which, at runtime, any source can compete for any communication channel of the interconnect to talk to any destination. We apply stream arbitration to radio frequency interconnect (RF-I). Experimental results show that, compared to a representative token arbitration scheme, stream arbitration provides, on average, a 20% performance improvement and a 12% power reduction.
Index Terms
- Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects