skip to main content
10.1145/1508244.1508259acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

Published:07 March 2009Publication History

ABSTRACT

Miss rate curves (MRCs) are useful in a number of contexts. In our research, online L2 cache MRCs enable us to dynamically identify optimal cache sizes when cache-partitioning a shared-cache multicore processor. Obtaining L2 MRCs has generally been assumed to be expensive when done in software and consequently, their usage for online optimizations has been limited. To address these problems and opportunities, we have developed a low-overhead software technique to obtain L2 MRCs online on current processors, exploiting features available in their performance monitoring units so that no changes to the application source code or binaries are required. Our technique, called RapidMRC, requires a single probing period of roughly 221 million processor cycles (147 ms), and subsequently 124 million cycles (83 ms) to process the data. We demonstrate its accuracy by comparing the obtained MRCs to the actual L2 MRCs of 30 applications taken from SPECcpu2006, SPECcpu2000, and SPECjbb2000. We show that RapidMRC can be applied to sizing cache partitions, helping to achieve performance improvements of up to 27%.

References

  1. D. Albonesi. Selective cache ways: on-demand cache resource allocation. In MICRO, pages 248--259, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. Antonopoulos, D. Nikolopoulos, and T. Papatheodorou. Scheduling algorithms with bus bandwidth considerations for SMPs. In ICPP, pages 547--554, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  3. R. Azimi, L. Soares, M. Stumm, T. Walsh, and A. Demke Brown. PATH: page access tracking to improve memory management. In ISMM, pages 31--42, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Azimi, M. Stumm, and R. Wisniewski. Online performance analysis by statistical sampling of microprocessor performance counters. In ICS, pages 101--110, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In MICRO, pages 245--257, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. Berg and E. Hagersten. StatCache: A probabilistic approach to efficient and accurate data locality analysis. In ISPASS, pages 20--27, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. E. Berg and E. Hagersten. Fast data-locality profiling of native execution. In SIGMETRICS, pages 169--180, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. Berg, H. Zeffer, and E. Hagersten. A statistical multiprocessor cache model. In ISPASS, pages 89--99, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  9. D. Bruening, E. Duesterwald, and S. Amarasinghe. Design and implementation of a dynamic optimization framework for Windows. In FDDO, 2001.Google ScholarGoogle Scholar
  10. B. Buck and J. Hollingsworth. An API for runtime code patching. J. of High Performance Computing Applications, 14(4):317--329, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA, pages 340--351, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Cho and L Jin. Managing distributed, shared L2 caches through OS-level page allocation. In MICRO, pages 455--468, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Edler and M. Hill. Dinero IV trace-driven uniprocessor cache simulator. URL http://www.cs.wisc.edu/~markhill/DineroIV.Google ScholarGoogle Scholar
  14. A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In USENIX ATC, pages 26--26, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. F. Guo and Y. Solihin. An analytical model for cache replacement policy performance. In SIGMETRICS, pages 228--239, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Iyer. CQoS: a framework for enabling QoS in shared caches of CMP platforms. In ICS, pages 257--266, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Iyer, L. Zhao, F. Guo, R. Illikkal, D. Newell, Y. Solihin, L. Hsu, and S. Reinhardt. QoS policies and architecture for cache/memory in CMP platforms. In SIGMETRICS, pages 25--36, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Kim, J. Choi, J. Kim, S. Noh, S. Min, Y. Cho, and C. Kim. A low-overhead high-performance unified buffer management scheme that exploits sequential and looping references. In OSDI, pages 119--34, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT, pages 111--122, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Y. Kim, M. Hill, and D. Wood. Implementing stack simulation for highlyassociative memories. In SIGMETRICS, pages 212--213, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Liedtke, H. Hartig, and M. Hohmuth. OS-controlled cache predictability for real-time systems. In RTAS, pages 213--227, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In HPCA, pages 367--378, 2008.Google ScholarGoogle Scholar
  23. C. Liu, A. Sivasubramaniam, and M. Kandemir. Organizing the last line of defense before hitting the memory wall for CMPs. In HPCA, pages 176--185, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI, pages 190--200, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. Mattson, J. Gecsei, D. Slutz, and I. Traiger. Evaluation techniques and storage hierarchies. IBM Systems J., 9(2):78--117, 1970.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. K. Meng, R. Joseph, R. Dick, and L. Shang. Multi-optimization power management for chip multiprocessors. In PACT, pages 177--186, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. Olszewski, K. Mierle, A. Czajkowski, and A. Demke Brown. JIT instrumentation: a novel approach to dynamically instrument operating systems. In EuroSys, pages 3--16, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Patterson, G. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. In SOSP, pages 79--95, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Qureshi and Y. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO, pages 423--432, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Rafique, W. Lim, and M. Thottethodi. Architectural support for operating system-driven CMP cache management. In PACT, pages 2--12, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek. A resource allocation model for QoS management. In RTSS, pages 298--308, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Settle, J. Kihm, A. Janiszewski, and D. Connors. Architectural support for enhanced SMT job scheduling. In PACT, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Seward and N. Nethercote. Using Valgrind to detect undefined value errors with bit-precision. In USENIX ATC, pages 17--30, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. X. Shen, J. Shaw, B. Meeker, and C. Ding. Locality approximation using time. In POPL, pages 55--61, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. In ISCA, pages 336--349, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. A. Snavely and D. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In ASPLOS, pages 234--244, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. L. Soares, D. Tam, and M. Stumm. Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer. In MICRO, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. G. Soundararajan, J. Chen, M. Sharaf, and C. Amza. Dynamic partitioning of the cache hierarchy in shared data centers. In VLDB, pages 635--646, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. S. Srikantaiah, M. Kandemir, and M. Irwin. Adaptive set pinning: Managing shared caches in chip multiprocessors. In ASPLOS, pages 135--144, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. H. Stone, J. Turek, and J. Wolf. Optimal partitioning of cache memory. IEEE TOC, 41(9):1054--1068, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. E. Suh, L Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. The J. of Supercomputing, 28(1):7--26, Apr. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. D. Tam, R. Azimi, L. Soares, and M. Stumm. Managing shared L2 caches on multicore systems in software. In WIOSCA, pages 26--33, 2007.Google ScholarGoogle Scholar
  43. D. Tam, R. Azimi, and M. Stumm. Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In EuroSys, pages 47--58, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. D. Thiebaut, H. Stone, and J.Wolf. Improving disk cache hit-ratios through cache partitioning. IEEE TOC, 41(6):665--676, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. T. Yang, E. Berger, S. Kaplan, and J. Moss. CRAMM: virtual memory support for garbage-collected applications. In OSDI, pages 103--116, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Q. Zhao, R. Rabbah, S. Amarasinghe, L. Rudolph, and W.-F. Wong. Ubiquitous memory introspection. In CGO, pages 299--311, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar. Dynamic tracking of page miss ratio curve for memory management. In ASPLOS, pages 177--188, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Y. Zhou, J. Philbin, and K. Li. The multi-queue replacement algorithm for second level buffer caches. In USENIX ATC, pages 91--104, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
              March 2009
              358 pages
              ISBN:9781605584065
              DOI:10.1145/1508244
              • cover image ACM SIGPLAN Notices
                ACM SIGPLAN Notices  Volume 44, Issue 3
                ASPLOS 2009
                March 2009
                346 pages
                ISSN:0362-1340
                EISSN:1558-1160
                DOI:10.1145/1508284
                Issue’s Table of Contents
              • cover image ACM SIGARCH Computer Architecture News
                ACM SIGARCH Computer Architecture News  Volume 37, Issue 1
                ASPLOS 2009
                March 2009
                346 pages
                ISSN:0163-5964
                DOI:10.1145/2528521
                Issue’s Table of Contents

              Copyright © 2009 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 7 March 2009

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              Overall Acceptance Rate535of2,713submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader