
Accelerating multi-media processing by implementing memoing in multiplication and division units

Published: 01 October 1998

Abstract

This paper proposes a technique that allows multi-cycle computations (multiplication, division, square root, …) to be performed in a single cycle. The technique is based on the notion of memoing: saving the inputs and outputs of previous calculations and reusing the stored output when the same inputs are encountered again. This technique is especially suitable for Multi-Media (MM) processing. In MM applications the local entropy of the data tends to be low, which results in repeated operations on the same data.

The inputs and outputs of assembly-level operations are stored in cache-like lookup tables and accessed in parallel with the conventional computation. A successful lookup gives the result of a multi-cycle computation in a single cycle, and a failed lookup incurs no penalty in computation time.

Simulation results show that, on average, for a modestly sized memo-table, about 40% of the floating-point multiplications and 50% of the floating-point divisions in Multi-Media applications can be avoided by using the values within the memo-table, leading to an average computational speedup of more than 20%.
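The lookup scheme described in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design: the class name `MemoTable`, the 32-entry direct-mapped organization, and the index function (an XOR of the top mantissa bits of the two operands) are all assumptions chosen for illustration. Software can only model the lookup and the division sequentially, whereas the abstract describes them as running in parallel.

```python
import struct


class MemoTable:
    """Software model of a direct-mapped memo table for FP division (a sketch)."""

    def __init__(self, entries=32):
        self.entries = entries          # assumed size; must be a power of two
        self.slots = [None] * entries   # each slot holds (a_bits, b_bits, result)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _bits(x):
        # Reinterpret a double as its raw 64-bit pattern, as hardware would see it.
        return struct.unpack("<Q", struct.pack("<d", x))[0]

    def _index(self, a_bits, b_bits):
        # Hypothetical index function: XOR bits 47 and up of both operands,
        # then keep the low bits so the index fits the table.
        return ((a_bits >> 47) ^ (b_bits >> 47)) & (self.entries - 1)

    def divide(self, a, b):
        a_bits, b_bits = self._bits(a), self._bits(b)
        i = self._index(a_bits, b_bits)
        slot = self.slots[i]
        # Hit: both operand bit patterns match the stored tag, reuse the result.
        if slot is not None and slot[0] == a_bits and slot[1] == b_bits:
            self.hits += 1
            return slot[2]
        # Miss: compute normally and overwrite the slot with the new entry.
        self.misses += 1
        result = a / b
        self.slots[i] = (a_bits, b_bits, result)
        return result


if __name__ == "__main__":
    # A low-entropy stream, such as neighbouring pixels with equal intensity.
    table = MemoTable()
    pixels = [200.0, 200.0, 200.0, 64.0, 64.0, 200.0]
    normalized = [table.divide(p, 255.0) for p in pixels]
    print(normalized, table.hits, table.misses)
```

Comparing full operand bit patterns rather than numeric values keeps the model close to a tag comparison in a cache. Under this model, if a fraction h of divisions hit and a normal division takes L cycles, the mean latency is h + (1 - h)L cycles, since a miss costs no more than the ordinary computation.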



          Published in

          ACM SIGPLAN Notices, Volume 33, Issue 11
          Nov. 1998
          309 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/291006

          ASPLOS VIII: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems
          October 1998
          326 pages
          ISBN: 1581131070
          DOI: 10.1145/291069

          Copyright © 1998 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States
