DOI: 10.1145/237090.237205
Article

Operating system support for improving data locality on CC-NUMA compute servers

Published: 01 September 1996

ABSTRACT

The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory access). These machines are attractive as compute servers because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. CC-NOW (cache-coherent network of workstations) machines extend the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency. Given the large remote access latencies of these architectures, data locality is potentially the most important performance issue. Using realistic workloads, we study the performance improvements provided by OS-supported dynamic page migration and replication. Analyzing our kernel-based implementation, we provide a detailed breakdown of the costs. We show that sampling of cache misses can be used to reduce these costs without compromising performance, and that TLB misses may not be a consistent approximation for cache misses. Finally, our experiments show that dynamic page migration and replication can substantially increase application performance, by as much as 30%, and reduce contention for resources in the NUMA memory system.
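To make the migrate-versus-replicate decision concrete, here is a minimal sketch of a policy driven by sampled cache misses. It is not the paper's kernel implementation: the class `PagePolicy`, the thresholds `HOT_THRESHOLD` and `SHARING_THRESHOLD`, and the simple 1-in-`sample_rate` sampling scheme are hypothetical choices for illustration. The idea is to count sampled misses per page and per node; once a page becomes hot, migrate it if a single remote node dominates its misses, replicate it if several nodes only read it, and otherwise leave it in place.

```python
from collections import Counter

# Hypothetical policy parameters, chosen for illustration only.
HOT_THRESHOLD = 64      # sampled misses on a page before a decision is made
SHARING_THRESHOLD = 16  # sampled misses from one node that count it as a sharer


class PagePolicy:
    """Sketch of a per-page migrate/replicate policy driven by sampled cache misses."""

    def __init__(self, sample_rate=10):
        self.sample_rate = sample_rate  # keep 1 of every `sample_rate` misses
        self.seen = 0                   # all misses observed, sampled or not
        self.counts = {}                # page -> Counter(node -> sampled misses)
        self.written = set()            # pages with at least one sampled write miss
        self.home = {}                  # page -> node holding the master copy

    def on_miss(self, page, node, is_write, home):
        """Record one cache miss; return an action tuple, or None if no decision yet.

        `home` is only used the first time a page is seen.
        """
        self.home.setdefault(page, home)
        self.seen += 1
        if self.seen % self.sample_rate != 0:
            return None  # unsampled miss: skip the per-page bookkeeping
        c = self.counts.setdefault(page, Counter())
        c[node] += 1
        if is_write:
            self.written.add(page)
        if sum(c.values()) < HOT_THRESHOLD:
            return None
        action = self._decide(page, c)
        # Start a fresh history so the next decision reflects new behavior.
        del self.counts[page]
        self.written.discard(page)
        return action

    def _decide(self, page, c):
        home = self.home[page]
        sharers = sorted(n for n, k in c.items() if k >= SHARING_THRESHOLD)
        if len(sharers) == 1 and sharers[0] != home:
            # One remote node dominates the misses: move the page to it.
            self.home[page] = sharers[0]
            return ("migrate", sharers[0])
        if len(sharers) >= 2 and page not in self.written:
            # Read-shared by several nodes: give each remote sharer a local copy.
            # (Collapsing replicas on a later write is not modeled here.)
            return ("replicate", [n for n in sharers if n != home])
        # Already local, write-shared, or no clear pattern: leave the page in place.
        return ("keep", home)
```

For example, with `sample_rate=1`, 64 read misses to page 0 from node 1 (page homed on node 0) yield `("migrate", 1)`; the same number of misses split evenly between nodes 1 and 2 yields `("replicate", [1, 2])` if they are all reads and `("keep", 0)` if any sampled miss was a write. Raising `sample_rate` means the per-page counters are updated only on sampled misses, trading reaction speed for lower bookkeeping cost.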

      Published in

        ACM Conferences
        ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
        October 1996
        290 pages
        ISBN:0897917677
        DOI:10.1145/237090

        Copyright © 1996 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        ASPLOS VII paper acceptance rate: 25 of 109 submissions, 23%. Overall acceptance rate: 535 of 2,713 submissions, 20%.
