ABSTRACT
Parallel programs that use critical sections, when executed on a shared-memory multiprocessor with a write-invalidate protocol, incur invalidation actions that could be eliminated. For this type of sharing, called migratory sharing, each processor typically causes a cache miss followed by an invalidation request, which could be merged with the preceding cache-miss request.
In this paper we propose an adaptive protocol that invokes this optimization dynamically for migratory blocks. For other blocks, the protocol works as an ordinary write-invalidate protocol. We show that the protocol is a simple extension to a write-invalidate protocol.
Using a program-driven simulation model of an architecture similar to the Stanford DASH and a set of four benchmarks, we evaluate the potential performance improvements of the protocol. We find that it effectively eliminates most single invalidations, which improves performance by reducing both the shared access penalty and the network traffic.
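The merging of the invalidation request with the preceding cache-miss request can be illustrated with a small counting model. This is only a sketch under stated assumptions, not the protocol evaluated in the paper: it tracks a single block, assumes every processor reads and then writes the block inside a critical section, and takes the migratory classification as a given flag rather than modeling how the protocol detects it. The names `count_requests` and `merge_migratory` are hypothetical.

```python
def count_requests(accesses, merge_migratory):
    """Count coherence requests for one block under migratory sharing.

    accesses: sequence of processor ids; each entry stands for one critical
        section in which that processor reads and then writes the block.
    merge_migratory: assumption for this sketch; when True, a read miss
        returns an exclusive copy, so the later write needs no request.
    """
    owner = None        # processor whose cache currently holds the block
    exclusive = False   # True if the owner may write without a request
    requests = 0

    for proc in accesses:
        # Read: a miss occurs when another cache (or memory) holds the block.
        if owner != proc:
            requests += 1                  # cache-miss request
            owner = proc
            exclusive = merge_migratory    # merged case: copy arrives exclusive

        # Write: plain write-invalidate must first invalidate the other copy.
        if not exclusive:
            requests += 1                  # separate invalidation request
            exclusive = True

    return requests


if __name__ == "__main__":
    turns = [0, 1, 2, 3]  # four processors enter the critical section in turn
    print("write-invalidate:", count_requests(turns, merge_migratory=False))
    print("merged requests: ", count_requests(turns, merge_migratory=True))
```

In this model each handoff of the block costs two requests under plain write-invalidate and one when the requests are merged, which is the saving the abstract describes for migratory blocks.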
An adaptive cache coherence protocol optimized for migratory sharing
Special Issue: Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA '93)