Abstract
Power limitations and complexity constraints demand modular designs, such as chip multiprocessors (CMPs) and systems-on-chip (SOCs). Today’s CMPs feature up to a hundred discrete cores, with greater levels of integration anticipated in the future. Supporting effective on-chip resource sharing for cloud computing and server consolidation necessitates CMP-level quality-of-service (QOS) for performance isolation, service guarantees, and security. This work takes a topology-aware approach to on-chip QOS. We propose to segregate shared resources into dedicated, QOS-enabled regions of the chip. We then eliminate QOS-related hardware and its associated overheads from the rest of the die via a combination of topology and operating system support. We evaluate several topologies for the QOS-enabled regions, including a new organization called Destination Partitioned Subnets (DPS), which uses a light-weight dedicated network for each destination node. DPS matches or bests other topologies with comparable bisection bandwidth in performance, area and energy efficiency, fairness, and preemption resilience.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grot, B., Keckler, S.W., Mutlu, O. (2011). Topology-Aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors. In: Varbanescu, A.L., Molnos, A., van Nieuwpoort, R. (eds) Computer Architecture. ISCA 2010. Lecture Notes in Computer Science, vol 6161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24322-6_28
DOI: https://doi.org/10.1007/978-3-642-24322-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24321-9
Online ISBN: 978-3-642-24322-6
eBook Packages: Computer Science, Computer Science (R0)