Abstract
Shared-memory multiprocessors commonly use shared variables for synchronization. Our simulations of real parallel applications show that large-scale cache-coherent multiprocessors suffer significant invalidation traffic due to synchronization. Large multiprocessors that do not cache synchronization variables are often affected even more severely. Unless this traffic is reduced or managed adequately, synchronization references can cause severe congestion in the network. We propose a class of adaptive backoff methods that require no extra hardware and can significantly reduce memory traffic to synchronization variables. These methods use synchronization state to reduce polling of synchronization variables. Our simulations show that when the number of processors participating in a barrier synchronization is small compared to the time of arrival of the processors, reductions of 20 percent to over 95 percent in synchronization traffic can be achieved at no extra cost. In other situations, adaptive backoff techniques result in a tradeoff between reduced network accesses and increased processor idle time.
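The idea of using synchronization state to reduce polling can be illustrated with a small model. The sketch below is an assumption-laden toy, not the authors' simulator or their exact algorithms: it assumes each poll of the barrier counter reveals how many processors have arrived, and that a waiting processor then delays its next poll in proportion to the number still missing. The function name `simulate_barrier` and the parameter `backoff_per_missing` are hypothetical.

```python
def simulate_barrier(arrivals, backoff_per_missing=0):
    """Model one barrier episode and return (polls, wasted_cycles).

    arrivals: cycle at which each processor reaches the barrier.
    backoff_per_missing: cycles to wait per processor still missing
        before polling again (0 means poll every cycle, i.e. plain spinning).
    This is a hypothetical model for illustration only.
    """
    n = len(arrivals)
    release = max(arrivals)      # the barrier opens when the last processor arrives
    polls = 0                    # accesses to the shared barrier variable
    wasted = 0                   # cycles spent waiting after the barrier opened
    for start in arrivals:
        now = start
        while True:
            polls += 1
            # Synchronization state seen by this poll: how many have arrived.
            arrived = sum(1 for a in arrivals if a <= now)
            if arrived == n:
                wasted += now - release
                break
            missing = n - arrived
            # Adaptive step: the more processors are missing, the longer the wait.
            now += max(1, backoff_per_missing * missing)
    return polls, wasted


arrivals = [0, 10, 20, 100]      # few processors, widely spread arrival times
spin = simulate_barrier(arrivals, backoff_per_missing=0)
adaptive = simulate_barrier(arrivals, backoff_per_missing=8)
print("spin     (polls, wasted cycles):", spin)
print("adaptive (polls, wasted cycles):", adaptive)
```

In this toy setting the adaptive variant issues far fewer polls, but a processor may notice the release a few cycles late, which mirrors the tradeoff between network accesses and idle time described above.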
Adaptive backoff synchronization techniques
ISCA '89: Proceedings of the 16th Annual International Symposium on Computer Architecture