Abstract
Simulation has historically been the primary technique used for evaluating the performance of new proposals in computer architecture. Speed and complexity considerations have traditionally limited its applicability to single-threaded processors running application-level code. This is no longer sufficient to model modern multicore systems running the complex workloads of commercial interest today.
COTSon is a simulator framework jointly developed by HP Labs and AMD. The goal of COTSon is to provide fast and accurate evaluation of current and future computing systems, covering the full software stack and complete hardware models. It targets cluster-level systems composed of hundreds of commodity multicore nodes and their associated devices, connected through a standard communication network. COTSon adopts a functional-directed philosophy, in which fast functional emulators and timing models cooperate to improve simulation accuracy while running fast enough to simulate the full stack of applications, middleware and OSs.
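To make the functional-directed idea concrete, the toy Python sketch below has a functional emulator drive execution, a timing model consume each interval's instruction trace, and the timing model's measured IPC fed back to set how fast the emulator's virtual clock advances in the next interval. Everything here is a hypothetical illustration: the classes `Instr`, `ToyTimingModel` and `ToyFunctionalEmulator`, the per-class cycle costs, and the `simulate` loop are assumptions, not COTSon's actual interfaces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    kind: str  # hypothetical instruction class: "alu", "branch", "load" or "store"


class ToyTimingModel:
    """Hypothetical timing model: charges a fixed cycle cost per instruction class."""

    COST = {"alu": 1, "branch": 2, "load": 4, "store": 3}

    def cycles(self, trace: List[Instr]) -> int:
        return sum(self.COST[i.kind] for i in trace)


class ToyFunctionalEmulator:
    """Hypothetical functional emulator: executes instructions without timing
    detail and advances a virtual clock at whatever IPC it is told to assume."""

    def __init__(self, program: List[Instr], ipc: float = 1.0):
        self.program = program
        self.pc = 0
        self.ipc = ipc
        self.virtual_cycles = 0.0

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def run(self, n: int) -> List[Instr]:
        # Functionally execute up to n instructions and return them as a trace.
        chunk = self.program[self.pc:self.pc + n]
        self.pc += len(chunk)
        self.virtual_cycles += len(chunk) / self.ipc
        return chunk


def simulate(emu: ToyFunctionalEmulator, timing: ToyTimingModel, interval: int) -> float:
    """Functional-directed loop: the emulator drives execution, the timing model
    consumes each interval's trace, and its measured IPC is fed back to set the
    emulator's speed for the next interval."""
    while not emu.done():
        trace = emu.run(interval)
        emu.ipc = len(trace) / timing.cycles(trace)
    return emu.virtual_cycles


if __name__ == "__main__":
    program = [Instr("alu")] * 30 + [Instr("load")] * 70
    print(simulate(ToyFunctionalEmulator(program), ToyTimingModel(), interval=10))
```

In this sketch the feedback arrives one interval late: the first interval runs at the default IPC, and each later interval runs at the IPC the timing model measured for the previous one. That lag is the kind of accuracy the decoupled design gives up in exchange for letting the emulator run at full speed.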
This paper describes the changes in simulation philosophy we embraced in COTSon to address these new challenges. We base functional emulation on established, fast and validated tools that support commodity OSs and complex multitier applications. Through a robust interface between the functional and timing domains, we can leverage other existing simulators for individual sub-components, such as disks or networks. We abandon the idea of "always-on" cycle-based simulation in favor of statistical sampling approaches that can trade accuracy for speed.
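The accuracy-for-speed trade of statistical sampling can be illustrated with a simple fixed-period (SMARTS-style systematic) sampler: only every `period`-th unit of execution is simulated in detail, and the mean CPI is reported with a confidence half-width. This is a hedged sketch, not COTSon's sampler; the function names `sampled_estimate` and `synthetic_workload`, the two-phase synthetic CPI trace, and the normal-approximation interval with `z = 1.96` are all assumptions chosen for illustration. In a real simulator the unsampled units would be fast-forwarded in the functional emulator; the full trace is precomputed here only so the estimate can be compared against the exact mean.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Tuple


def sampled_estimate(cpi_per_unit: List[float], period: int,
                     z: float = 1.96) -> Tuple[float, float, int]:
    """Systematic sampling: simulate every `period`-th unit in detail and
    fast-forward the rest. Returns (mean CPI, confidence half-width, #samples)."""
    samples = cpi_per_unit[::period]
    m = mean(samples)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return m, half_width, len(samples)


def synthetic_workload(n_units: int, seed: int = 0) -> List[float]:
    # Hypothetical per-unit CPI trace: two alternating phases plus Gaussian noise.
    rng = random.Random(seed)
    return [1.0 + 0.5 * ((i // 1000) % 2) + rng.gauss(0.0, 0.1)
            for i in range(n_units)]


if __name__ == "__main__":
    workload = synthetic_workload(100_000)
    print(f"exact CPI over all units: {mean(workload):.4f}")
    for period in (10, 100, 1000):
        est, hw, n = sampled_estimate(workload, period)
        print(f"1/{period} detailed: CPI {est:.4f} +/- {hw:.4f} ({n} samples)")
```

Raising `period` cuts the detailed-simulation work proportionally, while the confidence half-width grows roughly with the square root of that reduction; choosing the period is therefore an explicit speed/accuracy knob.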
COTSon opens up a new dimension in the speed/accuracy space, allowing simulation of a cluster of nodes several orders of magnitude faster with minimal loss of accuracy.
Index Terms
- COTSon: infrastructure for full system simulation