research-article

Convolution engine: balancing efficiency and flexibility in specialized computing

Published: 23 March 2015

Abstract

General-purpose processors, while tremendously versatile, pay a huge cost for their flexibility by wasting over 99% of the energy in programmability overheads. We observe that reducing this waste requires tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the algorithms. Hence, by backing off from full programmability and instead targeting key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications within that domain.

We present the Convolution Engine (CE), a programmable processor specialized for the convolution-like data-flow prevalent in computational photography, computer vision, and video processing. The CE achieves energy efficiency by capturing data-reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We demonstrate that the CE is within a factor of 2–3× of the energy and area efficiency of custom units optimized for a single kernel. The CE improves energy and area efficiency by 8–15× over data-parallel Single Instruction Multiple Data (SIMD) engines for most image processing applications.
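The data-reuse idea behind "a large number of operations per memory access" can be illustrated with a small software sketch. This is not the CE's hardware design or instruction set; it is a minimal Python model, under the assumption of a 1D sliding-window filter, in which each input sample is fetched once into a small window (standing in for a shift register) and then reused for several multiply-accumulate operations. The function name `convolve_1d_with_reuse` and the `loads`/`macs` counters are hypothetical and exist only for illustration.

```python
def convolve_1d_with_reuse(signal, taps):
    """Sliding-window 1D filter that fetches each input sample exactly once.

    Illustrative model only: `window` stands in for a small shift register,
    `loads` counts input fetches, and `macs` counts multiply-accumulates.
    The taps are applied without flipping (strictly, a correlation), which
    is enough to show the reuse pattern.
    """
    k = len(taps)
    window = []   # the most recent k samples, reused across outputs
    outputs = []
    loads = 0     # input samples fetched from "memory"
    macs = 0      # multiply-accumulate operations performed
    for sample in signal:
        loads += 1
        window.append(sample)
        if len(window) > k:
            window.pop(0)          # shift the oldest sample out
        if len(window) == k:
            acc = 0
            for w, t in zip(window, taps):
                acc += w * t
                macs += 1
            outputs.append(acc)
    return outputs, loads, macs


if __name__ == "__main__":
    outs, loads, macs = convolve_1d_with_reuse([1, 2, 3, 4, 5], [1, 0, -1])
    print(outs, loads, macs)
```

In this model, a filter with k taps performs roughly k multiply-accumulates per fetched sample once the window is full, so the ratio of operations to memory accesses grows with the filter size.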


      • Published in Communications of the ACM, Volume 58, Issue 4 (April 2015), 86 pages
        ISSN: 0001-0782
        EISSN: 1557-7317
        DOI: 10.1145/2749359
        Editor: Moshe Y. Vardi

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher: Association for Computing Machinery, New York, NY, United States


        Qualifiers: research-article, Research, Refereed
