Operating system support for improving data locality on CC-NUMA compute servers

Authors: Ben Verghese, Scott Devine, Anoop Gupta, Mendel Rosenblum

Computer Systems Laboratory, Stanford University, CA

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, October 1996, Pages 279–289
https://doi.org/10.1145/237090.237205

Published: 01 September 1996


ABSTRACT

The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture). These machines are attractive as compute servers because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency. Given the large remote access latencies of these architectures, data locality is potentially the most important performance issue. Using realistic workloads, we study the performance improvements provided by OS supported dynamic page migration and replication. Analyzing our kernel-based implementation, we provide a detailed breakdown of the costs. We show that sampling of cache misses can be used to reduce cost without compromising performance, and that TLB misses may not be a consistent approximation for cache misses. Finally, our experiments show that dynamic page migration and replication can substantially increase application performance, as much as 30%, and reduce contention for resources in the NUMA memory system.
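
The migration and replication idea in the abstract can be made concrete with a minimal sketch. This is not the authors' kernel implementation. It assumes a hypothetical per-page record of sampled cache misses per node, and the thresholds `TRIGGER`, `SHARING`, and `WRITE_LIMIT` are invented for illustration only. The rule it encodes is the common intuition: a hot page used mostly by one remote node is a migration candidate, a hot read-mostly page used by several nodes is a replication candidate, and write-shared pages are left alone. The helper `avg_latency` (also hypothetical) shows why locality matters when remote latency is a multiple of local latency.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for illustration.
TRIGGER = 8      # sampled misses from one node before a page counts as hot
SHARING = 2      # sampled misses from another node that mark the page as shared
WRITE_LIMIT = 1  # sampled write misses tolerated before replication is ruled out


@dataclass
class PageStats:
    home: int                                          # node whose memory holds the page
    misses: Counter = field(default_factory=Counter)   # sampled misses per node
    writes: int = 0                                    # sampled write misses


def record_miss(page: PageStats, node: int, is_write: bool) -> None:
    """Record one sampled cache miss to the page from the given node."""
    page.misses[node] += 1
    if is_write:
        page.writes += 1


def decide(page: PageStats) -> str:
    """Return 'none', 'migrate', or 'replicate' for one page."""
    if not page.misses:
        return "none"
    hot_node, hot_count = page.misses.most_common(1)[0]
    if hot_count < TRIGGER or hot_node == page.home:
        return "none"  # not hot, or already local to its main user
    sharers = sum(
        1 for node, count in page.misses.items()
        if node != hot_node and count >= SHARING
    )
    if sharers == 0:
        return "migrate"    # a single remote node dominates
    if page.writes <= WRITE_LIMIT:
        return "replicate"  # shared and read-mostly
    return "none"           # write-shared: replicas would be invalidated repeatedly


def avg_latency(local_fraction: float, remote_ratio: float) -> float:
    """Average miss latency, in units of the local memory latency."""
    return local_fraction * 1.0 + (1.0 - local_fraction) * remote_ratio
```

For example, with half of all misses served locally and a remote-to-local latency ratio of 4 (inside the 3 to 5 range quoted above), `avg_latency(0.5, 4.0)` evaluates to 2.5 local-latency units, so every miss moved from remote to local memory directly lowers the average.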


Published in
ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN: 0897917677
DOI: 10.1145/237090
Chairmen: Bill Dally (Massachusetts Institute of Technology), Susan Eggers (Univ. of Washington, Seattle)

Also published in
ACM SIGOPS Operating Systems Review, Volume 30, Issue 5
Dec. 1996
273 pages
ISSN: 0163-5980
DOI: 10.1145/248208

ACM SIGPLAN Notices, Volume 31, Issue 9
Sept. 1996
273 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/248209

Copyright © 1996 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
ASPLOS VII Paper Acceptance Rate: 25 of 109 submissions, 23%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%