DOI: 10.1145/3350755.3400252
SPAA Conference Proceedings · Extended Abstract · Public Access

A Computational Model for Tensor Core Units

Published: 09 July 2020

ABSTRACT

To respond to the need for efficient training and inference of deep neural networks, a plethora of domain-specific architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is hardware support for efficiently computing the product of dense matrices of a given small size. To broaden the class of algorithms that can exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model to design fast algorithms for several problems, including dense and sparse matrix multiplication and the Discrete Fourier Transform. Finally, we highlight a relation between the TCU model and the external memory model.
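The idea of exposing a small dense matrix product as a native primitive can be illustrated with a minimal sketch. The snippet below is not from the paper: `S`, `tcu_multiply`, and `blocked_matmul` are hypothetical names, and the tile product is emulated with NumPy rather than real tensor-core hardware. It shows how a large matrix multiplication decomposes into calls to a unit that only multiplies fixed-size S × S tiles, which is the kind of decomposition a TCU-style model reasons about.

```python
import numpy as np

S = 4  # hypothetical tile size natively supported by the tensor core unit


def tcu_multiply(a_tile, b_tile):
    """Stand-in for the hardware primitive: multiply two S x S tiles."""
    assert a_tile.shape == (S, S) and b_tile.shape == (S, S)
    return a_tile @ b_tile  # emulated here; on real hardware this is one native op


def blocked_matmul(A, B):
    """Multiply n x n matrices using only S x S tile products.

    Each (i, j) output tile accumulates n/S calls to the primitive,
    so the total work is expressed entirely in tile multiplications.
    """
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % S == 0
    C = np.zeros((n, n))
    for i in range(0, n, S):
        for j in range(0, n, S):
            for k in range(0, n, S):
                C[i:i + S, j:j + S] += tcu_multiply(A[i:i + S, k:k + S],
                                                    B[k:k + S, j:j + S])
    return C
```

Counting calls to the primitive (here (n/S)^3) rather than scalar operations is the kind of cost accounting such a model makes natural.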


Published in

SPAA '20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
July 2020, 601 pages
ISBN: 9781450369350
DOI: 10.1145/3350755

        Copyright © 2020 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 447 of 1,461 submissions, 31%
