DOI: 10.1145/2063384.2063454
research-article

Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Published: 12 November 2011

ABSTRACT

Two major trends in high-performance computing, namely larger numbers of cores and the growing size of on-chip cache memory, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulation is therefore needed to explore large multi-core systems within a limited simulation time budget. By combining accurate high-abstraction analytical models with fast parallel simulation, architects can trade accuracy for simulation speed, enabling longer application runs and covering a larger portion of the hardware design space. Interval simulation strikes this balance between detailed cycle-accurate simulation and one-IPC simulation: it models long-running simulations much faster than detailed cycle-accurate simulation while still providing the detail needed to observe core-uncore interactions across the entire system. Validation against real hardware shows average absolute errors within 25% for a variety of multi-threaded workloads, making interval simulation more than twice as accurate on average as one-IPC simulation. Further, we demonstrate scalable simulation speeds of up to 2.0 MIPS when simulating a 16-core system on an 8-core SMP machine.
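To make the contrast between interval and one-IPC simulation more concrete, here is a deliberately simplified sketch of the interval idea. It is not Sniper's implementation. It assumes that, between miss events, the core dispatches at a fixed width. A branch misprediction costs the branch resolution time plus a front-end refill. An I-cache miss costs its latency. A long-latency (DRAM) load costs its latency, except when it falls within one reorder-buffer (ROB) window of an already-charged DRAM miss, in which case it overlaps with that miss and adds nothing. The one-IPC baseline here is likewise an assumption: one cycle per instruction, with cache-miss latencies added serially. All names (`MissEvent`, `interval_cycles`, `one_ipc_cycles`, `avg_abs_error`), event labels, and default parameter values are hypothetical. `avg_abs_error` is an illustrative version of the average absolute (relative) error metric that the abstract reports against real hardware.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MissEvent:
    kind: str          # hypothetical labels: "branch", "icache" or "dram"
    position: int      # dynamic instruction index at which the miss occurs
    latency: int = 0   # miss latency in cycles (unused for branch mispredictions)


def interval_cycles(n_instr: int, events: List[MissEvent], dispatch_width: int = 4,
                    branch_resolution: int = 10, frontend_depth: int = 5,
                    rob_size: int = 128) -> float:
    """Simplified interval model: steady-state dispatch plus per-miss-event penalties."""
    # Between miss events the core is assumed to dispatch at full width.
    cycles = n_instr / dispatch_width
    last_charged_dram = None  # position of the last DRAM miss whose latency was charged
    for ev in sorted(events, key=lambda e: e.position):
        if ev.kind == "branch":
            # Mispredict: wait for the branch to resolve, then refill the front end.
            cycles += branch_resolution + frontend_depth
        elif ev.kind == "icache":
            # I-cache miss: the front end stalls for the miss latency.
            cycles += ev.latency
        elif ev.kind == "dram":
            # A DRAM miss within one ROB window of a charged miss overlaps with it
            # (memory-level parallelism), so it adds no extra penalty.
            if last_charged_dram is not None and ev.position - last_charged_dram < rob_size:
                continue
            cycles += ev.latency
            last_charged_dram = ev.position
    return cycles


def one_ipc_cycles(n_instr: int, events: List[MissEvent]) -> int:
    """One-IPC baseline (assumed): one cycle per instruction, miss latencies serialized."""
    return n_instr + sum(ev.latency for ev in events if ev.kind in ("icache", "dram"))


def avg_abs_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average absolute relative error of predicted vs. measured execution times."""
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


if __name__ == "__main__":
    events = [
        MissEvent("branch", 100),
        MissEvent("dram", 200, latency=200),
        MissEvent("dram", 250, latency=200),   # within 128 instructions of the previous miss
        MissEvent("icache", 500, latency=10),
    ]
    print(interval_cycles(1000, events))  # 475.0
    print(one_ipc_cycles(1000, events))   # 1410
```

In this toy trace, the second DRAM miss is hidden behind the first. The interval estimate therefore charges one memory latency on top of width-limited dispatch. The one-IPC baseline, as defined here, serializes both misses and ignores dispatch width. Overlap effects of this kind are what the interval abstraction is meant to retain at far lower cost than cycle-accurate simulation.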

    Published in

      ACM Conferences
      SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
      November 2011
      866 pages
      ISBN:9781450307710
      DOI:10.1145/2063384

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SC '11 Paper Acceptance Rate: 74 of 352 submissions, 21%. Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%.
