ABSTRACT
We measure the experimental error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation-based studies. We describe the methodology that we used to validate a microprocessor simulator against a Compaq DS-10L workstation, which contains an Alpha 21264 processor. Our evaluation suite consists of 21 microbenchmarks that stress different aspects of the 21264 microarchitecture. Using the microbenchmark suite as the set of workloads, we describe how we reduced our simulator error to an arithmetic mean of 2%, and we detail the specific aspects of the pipeline that required extra care to reduce the error. We show how these low-level optimizations reduce average error from 40% to less than 20% on macrobenchmarks drawn from the SPEC2000 suite. Finally, we examine the degree to which performance optimizations are stable across different simulators, showing that in some cases researchers would draw different conclusions if they used validated simulators.
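As a rough illustration of the "arithmetic mean" error figure quoted above, the sketch below averages the absolute percent error of a simulator against hardware measurements over a set of benchmarks. The abstract does not spell out the exact metric, so treating error as the percent difference in cycle counts is an assumption, and the benchmark names and numbers are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def percent_error(simulated, measured):
    """Signed percent error of a simulated value relative to the hardware value."""
    if measured == 0:
        raise ValueError("measured value must be nonzero")
    return 100.0 * (simulated - measured) / measured


def mean_absolute_percent_error(pairs):
    """Arithmetic mean of absolute percent errors over (simulated, measured) pairs."""
    return mean(abs(percent_error(s, m)) for s, m in pairs)


# Hypothetical cycle counts: (simulated, measured on hardware).
# These values are made up for illustration only.
samples = {
    "bench_a": (1020, 1000),
    "bench_b": (1960, 2000),
    "bench_c": (505, 500),
}

if __name__ == "__main__":
    for name, (sim, hw) in samples.items():
        print(f"{name}: {percent_error(sim, hw):+.2f}%")
    print(f"mean absolute error: {mean_absolute_percent_error(samples.values()):.2f}%")
```

Taking the absolute value before averaging keeps overestimates and underestimates from cancelling each other out, which a plain mean of signed errors would allow.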
Errata on "Measuring Experimental Error in Microprocessor Simulation"
This short paper serves to correct the errors contained in the paper entitled "Measuring Experimental Error in Microprocessor Simulation," presented at the 2001 International Symposium on Computer Architecture (ISCA-28) [2]. That paper contained a study ...