research-article

Area-optimized low-latency approximate multipliers for FPGA-based hardware accelerators

Authors:
Salim Ullah, Technische Universität Dresden, Germany
Semeen Rehman, Vienna University of Technology, Austria
Bharath Srinivas Prabakaran, Vienna University of Technology, Austria
Florian Kriebel, Vienna University of Technology, Austria
Muhammad Abdullah Hanif, Vienna University of Technology, Austria
Muhammad Shafique, Vienna University of Technology, Austria
Akash Kumar, Technische Universität Dresden, Germany

DAC '18: Proceedings of the 55th Annual Design Automation Conference, June 2018, Article No. 159, pages 1–6. https://doi.org/10.1145/3195970.3195996

Published: 24 June 2018


ABSTRACT

The architectural differences between ASICs and FPGAs limit the performance gains achievable when ASIC-based approximation principles are applied to FPGA-based reconfigurable computing systems. This paper presents a novel approximate multiplier architecture customized for FPGA fabrics, an efficient design methodology, and an open-source library. Our designs provide higher area, latency, and energy gains, along with better output accuracy, than state-of-the-art ASIC-based approximate multipliers. Moreover, compared to the multiplier IP offered by Xilinx Vivado, our proposed design achieves gains of up to 30%, 53%, and 67% in area, latency, and energy, respectively, while incurring an insignificant accuracy loss (an average relative error below 1%, on average). Our library of approximate multipliers is open-source and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area and thereby enable a new research direction for the FPGA community.
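The accuracy figure above is expressed as an average relative error (ARE). As a rough illustration of how such a metric can be measured for a small multiplier, the sketch below exhaustively compares an approximate product against the exact product over all unsigned input pairs. The approximate design used here is a hypothetical, generic truncated array multiplier (`truncated_multiply`, which drops every partial-product bit in the `k` least-significant columns). It is not the FPGA-specific architecture proposed in the paper. The function names, the parameter `k`, and the choice to skip pairs whose exact product is zero (where relative error is undefined) are all assumptions made for this example.

```python
from itertools import product


def truncated_multiply(a: int, b: int, width: int, k: int) -> int:
    """Hypothetical approximate multiplier (illustration only): an unsigned
    width x width array multiplier that omits every partial-product bit
    a_i * b_j whose column index i + j is below k."""
    result = 0
    for i in range(width):
        for j in range(width):
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                result += 1 << (i + j)
    return result


def average_relative_error(width: int, k: int) -> float:
    """Mean of |exact - approx| / exact over all unsigned input pairs.
    Pairs whose exact product is 0 are skipped (assumption: relative
    error is undefined there)."""
    total = 0.0
    count = 0
    for a, b in product(range(1 << width), repeat=2):
        exact = a * b
        if exact == 0:
            continue
        approx = truncated_multiply(a, b, width, k)
        total += abs(exact - approx) / exact
        count += 1
    return total / count


if __name__ == "__main__":
    # Sweep the number of truncated columns for a 4x4 multiplier.
    for k in range(4):
        are = average_relative_error(width=4, k=k)
        print(f"4x4, k={k}: average relative error = {are:.4%}")
```

With `k=0` no partial products are dropped, so the error is zero; each additional truncated column can only lower the approximate product, so the ARE grows with `k`. Exhaustive evaluation is practical only for small operand widths; for wider multipliers, sampling the inputs would be the usual alternative.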

Published in

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018
1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970

Copyright © 2018 ACM
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates
Overall acceptance rate: 1,770 of 5,499 submissions (32%)
Article Metrics
- Total citations: 50
- Total downloads: 673
- Downloads (last 12 months): 83
- Downloads (last 6 weeks): 17