Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

Published: 01 June 2008

Abstract

In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads’ DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result both fairness and system throughput degrade, and some threads can starve for long time periods.

This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads, while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall-time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities.

We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
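
The loss of bank-level parallelism described in the abstract can be illustrated with a toy model. The sketch below is not the PAR-BS implementation. It assumes two threads (`T0`, `T1`), two banks, one request per thread per bank, and a fixed service latency of one time unit per request; the function name `stall_times` and all of these parameters are hypothetical. It compares a per-bank service order that interleaves the threads differently in each bank with one that services the same thread first in every bank.

```python
from typing import Dict, List


def stall_times(bank_orders: Dict[int, List[str]], latency: int = 1) -> Dict[str, int]:
    """Toy model: each bank services its queue in the given order,
    one request every `latency` time units, and banks work in parallel.
    A thread's stall time is the completion time of its last request."""
    finish: Dict[str, int] = {}
    for order in bank_orders.values():
        for slot, thread in enumerate(order, start=1):
            finish[thread] = max(finish.get(thread, 0), slot * latency)
    return finish


# Assumed workload: each thread has one request to bank 0 and one to bank 1.
# Parallelism-unaware order: each bank picks a different thread first,
# so every thread has one request that waits behind the other thread.
unaware = {0: ["T0", "T1"], 1: ["T1", "T0"]}

# Parallelism-aware order: both banks service T0 first, so T0's two
# requests overlap and T1 finishes no later than before.
aware = {0: ["T0", "T1"], 1: ["T0", "T1"]}

unaware_stalls = stall_times(unaware)
aware_stalls = stall_times(aware)

print("unaware:", unaware_stalls)  # {'T0': 2, 'T1': 2}
print("aware:  ", aware_stalls)    # {'T0': 1, 'T1': 2}
```

In this model the consistent ordering lowers the average stall time from 2.0 to 1.5 time units without delaying `T1`. The model ignores row-buffer state, bus contention, and the batching mechanism, so it only shows why a consistent cross-bank ordering can help, not how much it helps on a real system.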
