Accelerating multi-media processing by implementing memoing in multiplication and division units

Authors:
Daniel Citron
Department of Computer Science, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel

Dror Feitelson
Department of Computer Science, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel

Larry Rudolph
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA

ACM SIGPLAN Notices, Volume 33, Issue 11, Nov. 1998, pp. 252–261. https://doi.org/10.1145/291006.291056

Published: 01 October 1998

Abstract

This paper proposes a technique that allows multi-cycle computations (multiplication, division, square root, …) to complete in a single cycle. The technique is based on memoing: saving the inputs and outputs of previous calculations and reusing the output when the same inputs are encountered again. It is especially suitable for Multi-Media (MM) processing, where the local entropy of the data tends to be low, so the same operations are repeated on the same data.

The inputs and outputs of assembly-level operations are stored in cache-like lookup tables that are accessed in parallel with the conventional computation. A successful lookup delivers the result of a multi-cycle computation in a single cycle, and a failed lookup incurs no penalty in computation time.

Simulations show that, on average, a modestly sized memo-table lets about 40% of the floating-point multiplications and 50% of the floating-point divisions in Multi-Media applications be avoided by using values from the table, for an average computational speedup of more than 20%.
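The memo-table idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `MemoTable`, the 32-entry size, the direct-mapped organization, and the XOR-folding index function are all hypothetical choices made for illustration. The model is also sequential, whereas the paper describes a lookup performed in parallel with the functional unit.

```python
import operator
import struct


class MemoTable:
    """Software model of a direct-mapped memo table for a two-operand FP unit.

    Assumptions (not from the paper): table size, index hash, and tag layout.
    """

    def __init__(self, entries=32):
        # A power-of-two size lets the index be taken with a simple bit mask.
        assert entries > 0 and entries & (entries - 1) == 0
        self.mask = entries - 1
        self.slots = [None] * entries  # each slot holds (a_bits, b_bits, result)
        self.hits = 0
        self.lookups = 0

    @staticmethod
    def _bits(x):
        # Raw 64-bit IEEE 754 pattern of a double, as hardware would compare it.
        return struct.unpack("<Q", struct.pack("<d", x))[0]

    def _index(self, a_bits, b_bits):
        # Hypothetical hash: XOR the operands, then fold the high bits down.
        h = a_bits ^ b_bits
        h ^= h >> 32
        h ^= h >> 16
        return h & self.mask

    def compute(self, a, b, op):
        """Return op(a, b), reusing a stored result when both operands match."""
        self.lookups += 1
        a_bits, b_bits = self._bits(a), self._bits(b)
        i = self._index(a_bits, b_bits)
        entry = self.slots[i]
        if entry is not None and entry[0] == a_bits and entry[1] == b_bits:
            self.hits += 1           # hit: the multi-cycle operation is skipped
            return entry[2]
        result = op(a, b)            # miss: fall back to the normal computation
        self.slots[i] = (a_bits, b_bits, result)  # replace whatever was there
        return result


# Low-entropy input, e.g. neighbouring pixels that share a few distinct values.
table = MemoTable(entries=32)
pixels = [0.5, 0.5, 0.25, 0.5, 0.25]
scaled = [table.compute(p, 3.0, operator.mul) for p in pixels]
hit_ratio = table.hits / table.lookups
```

Because the stored operands are compared bit for bit, a hit always returns exactly the value the operation would have produced. The hit ratio depends on how often operand pairs repeat, which is why low-entropy data such as the pixel list above is the favorable case.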



Published in
ACM SIGPLAN Notices, Volume 33, Issue 11
Nov. 1998
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/291006
Chairmen: Dileep Bhandarkar (Intel), Anant Agarwal (Massachusetts Institute of Technology, Cambridge)

ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
October 1998
326 pages
ISBN: 1581131070
DOI: 10.1145/291069
Chairmen: Dileep Bhandarkar (Intel), Anant Agarwal (Massachusetts Institute of Technology, Cambridge)
Copyright © 1998 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
