Atomic Vector Operations on Chip Multiprocessors

Published: 01 June 2008

Abstract

The current trend is for processors to deliver dramatic improvements in parallel performance while only modestly improving serial performance. Parallel performance is harvested through vector/SIMD instructions as well as multithreading (through both multithreaded cores and chip multiprocessors). Vector parallelism can be more efficiently supported than multithreading, but is often harder for software to exploit. In particular, code with sparse data access patterns cannot easily utilize the vector/SIMD instructions of mainstream processors. Hardware to scatter and gather sparse data has previously been proposed to enable vector execution for these codes. However, on multithreaded architectures, a number of applications spend significant time on atomic operations (e.g., parallel reductions), which cannot be vectorized using previously proposed schemes. This paper proposes architectural support for atomic vector operations (referred to as GLSC) that addresses this limitation. GLSC extends scatter-gather hardware to support atomic memory operations. Our experiments show that GLSC improves performance on a set of important RMS (recognition, mining, and synthesis) kernels by an average of 54% for 4-wide SIMD.
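To make the idea of an atomic vector reduction concrete, the following is a minimal software sketch of a histogram update built on a gather/scatter pair with atomic semantics. It assumes a mechanism in the style of load-linked/store-conditional applied per vector lane; the abstract does not spell out how GLSC works, so this is an illustrative model rather than the paper's design. The names Memory, gather_linked, scatter_conditional, vector_atomic_add, and histogram are all hypothetical, and the per-address version counter is a stand-in for whatever reservation state real hardware would keep.

```python
# Software model of a vectorized atomic reduction (a histogram).
# Assumption: a linked gather plus a conditional scatter, per lane.
# None of these names come from the paper; they are illustrative only.

SIMD_WIDTH = 4  # the abstract reports results for 4-wide SIMD


class Memory:
    """Word-addressed memory; version[] stands in for link/reservation state."""

    def __init__(self, size):
        self.data = [0] * size
        self.version = [0] * size

    def gather_linked(self, addrs, mask):
        """Read each active lane's word and record the version that was seen."""
        values = [self.data[a] if m else 0 for a, m in zip(addrs, mask)]
        links = [self.version[a] if m else None for a, m in zip(addrs, mask)]
        return values, links

    def scatter_conditional(self, addrs, values, links, mask):
        """Write an active lane only if its word is unchanged since the gather.

        Returns a per-lane success mask.
        """
        ok = [False] * len(addrs)
        for lane, (a, v, link, m) in enumerate(zip(addrs, values, links, mask)):
            if m and self.version[a] == link:
                self.data[a] = v
                self.version[a] += 1  # invalidates other lanes linked to a
                ok[lane] = True
        return ok


def vector_atomic_add(mem, addrs, increments):
    """Retry failed lanes until every lane's increment has been applied."""
    mask = [True] * len(addrs)
    while any(mask):
        old, links = mem.gather_linked(addrs, mask)
        new = [o + inc for o, inc in zip(old, increments)]
        ok = mem.scatter_conditional(addrs, new, links, mask)
        mask = [m and not s for m, s in zip(mask, ok)]


def histogram(bins, num_bins):
    """Count occurrences of each bin index, SIMD_WIDTH elements at a time."""
    mem = Memory(num_bins)
    for i in range(0, len(bins), SIMD_WIDTH):
        chunk = bins[i:i + SIMD_WIDTH]
        vector_atomic_add(mem, chunk, [1] * len(chunk))
    return mem.data
```

For example, histogram([0, 0, 1, 0, 2, 2, 2, 2, 3], 4) returns [3, 1, 4, 1]. When several lanes of one vector target the same bin, only the first conditional write succeeds and the remaining lanes are retried under a mask, which is why a plain gather followed by a plain scatter would lose updates in this kind of reduction.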

Published in

ACM SIGARCH Computer Architecture News, Volume 36, Issue 3, June 2008, 449 pages
ISSN: 0163-5964
DOI: 10.1145/1394608

ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture, June 2008, 449 pages
ISBN: 9780769531748

Copyright © 2008 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
