DOI: 10.1145/3061639.3072944

Accelerator Design for Deep Learning Training: Extended Abstract: Invited

Published: 18 June 2017

ABSTRACT

Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques, showing successes on challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, autonomous cars, natural language processing, speech synthesis and recognition, genomics, and many others have embraced deep learning as their foundation. DNNs achieve superior accuracy for these applications at the cost of high computational complexity, using very large models that require hundreds of megabytes of data storage, exaops of computation, and high bandwidth for data movement. Despite these impressive advances, it still takes days to weeks to train state-of-the-art deep networks on large datasets, which directly limits the pace of innovation and adoption. In this paper, we present a multi-pronged approach to address the challenges of meeting both the throughput and the energy-efficiency goals for DNN training.
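To put the scale claims in the abstract in perspective, the short sketch below is a rough back-of-envelope estimate, not taken from the paper: the model size, per-image FLOP count, epoch count, and sustained accelerator throughput are all assumed, ResNet-50/ImageNet-class values chosen only for illustration.

```python
# Illustrative back-of-envelope estimate of DNN training cost.
# All numbers below are assumptions (ResNet-50/ImageNet-class), not figures from the paper.

params = 25e6                 # ~25M parameters (assumption)
bytes_per_param = 4           # FP32 weights
flops_fwd_per_image = 4e9     # ~4 GFLOPs for one forward pass (assumption)
flops_train_per_image = 3 * flops_fwd_per_image  # forward + backward ~ 3x forward (rule of thumb)

images = 1.28e6               # ImageNet-scale training set
epochs = 90                   # typical training schedule (assumption)

total_flops = flops_train_per_image * images * epochs
model_mb = params * bytes_per_param / 1e6

sustained_flops = 5e12        # assume one accelerator sustaining 5 TFLOP/s
seconds = total_flops / sustained_flops

print(f"Model size:       {model_mb:.0f} MB of weights")
print(f"Training compute: {total_flops/1e18:.2f} exaFLOPs")
print(f"Time on one device at 5 TFLOP/s sustained: {seconds/86400:.1f} days")
```

Under these assumptions the weights alone occupy about 100 MB, a single 90-epoch run costs roughly 1.4 exaFLOPs, and one accelerator sustaining 5 TFLOP/s would need about three days; larger models, larger datasets, or longer schedules push this into weeks, which is the training-time gap the paper's throughput and energy-efficiency goals target.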


  • Published in

    DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
    June 2017, 533 pages
    ISBN: 9781450349277
    DOI: 10.1145/3061639
    Copyright © 2017 ACM


    Publisher

    Association for Computing Machinery, New York, NY, United States



    Acceptance Rates

    Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%

