ABSTRACT
Chip-multiprocessors (CMPs) are quickly gaining momentum in all segments of computing. However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development. To address this challenge, it is necessary to co-develop new CMP architectures with novel programming models. Currently, architecture research relies on software simulators, which are too slow to support interesting experiments with CMP software unless they use small datasets or significantly reduce the level of detail in the simulated models. An alternative to simulation is to exploit the rich capabilities of modern FPGAs to create FPGA-based platforms for novel CMP research. This paper presents ATLAS, the first prototype for CMPs with hardware support for Transactional Memory (TM), a technology aiming to simplify parallel programming. ATLAS uses the BEE2 multi-FPGA board to provide a system with 8 PowerPC cores that run at 100MHz; the system runs Linux. ATLAS provides significant benefits for CMP research, such as a 100x performance improvement over a software simulator and good visibility that helps with software tuning and architectural improvements. In addition to presenting and evaluating ATLAS, we share our observations about building an FPGA-based framework for CMP research. Specifically, we address issues such as overall performance, the challenges of mapping ASIC-style CMP RTL onto FPGAs, software support, the selection criteria for the base processor, and the challenges of using pre-designed IP libraries.
Index Terms
- A practical FPGA-based framework for novel CMP research