DOI: 10.1145/1216919.1216936
Article

A practical FPGA-based framework for novel CMP research

Published: 18 February 2007

ABSTRACT

Chip-multiprocessors (CMPs) are quickly gaining momentum in all segments of computing. However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development. To address this challenge, it is necessary to co-develop new CMP architectures with novel programming models. Currently, architecture research relies on software simulators, which are too slow to support interesting experiments with CMP software unless researchers use small datasets or significantly reduce the level of detail in the simulated models. An alternative to simulation is to exploit the rich capabilities of modern FPGAs to create FPGA-based platforms for novel CMP research. This paper presents ATLAS, the first prototype for CMPs with hardware support for Transactional Memory (TM), a technology aiming to simplify parallel programming. ATLAS uses the BEE2 multi-FPGA board to provide a system that has 8 PowerPC cores running at 100 MHz and runs Linux. ATLAS provides significant benefits for CMP research, such as a 100x performance improvement over a software simulator and good visibility that helps with software tuning and architectural improvements. In addition to presenting and evaluating ATLAS, we share our observations about building an FPGA-based framework for CMP research. Specifically, we address issues such as overall performance, the challenges of mapping ASIC-style CMP RTL onto FPGAs, software support, the selection criteria for the base processor, and the challenges of using pre-designed IP libraries.


    • Published in

      FPGA '07: Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
      February 2007
      248 pages
      ISBN:9781595936004
      DOI:10.1145/1216919

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 125 of 627 submissions, 20%
