Abstract
This paper addresses the problem of multiplication with large operand sizes (N≥32). We propose a new recursive recoding algorithm that shortens the critical path of the multiplier and also reduces the hardware complexity of the partial-product generators. The new recoding algorithm provides an optimal space/time partitioning of the multiplier architecture for any operand size N. As a result, the critical path is drastically reduced to $3\sqrt[3]{N/2} - 3$ adder stages with no area overhead, compared with the modified Booth algorithm, whose critical path is N/2 adder stages. For instance, only 7 adder stages are needed for a 64-bit two's complement multiplier. Compared with reference algorithms for N=64, gain ratios of 1.62, 1.71, and 2.64 are obtained in multiply time, energy consumption per multiply operation, and total gate count, respectively.
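As a quick arithmetic check, the two critical-path figures quoted in the abstract can be compared for a few operand sizes. This is a minimal sketch under stated assumptions: the critical-path expression is read as 3·∛(N/2) − 3, the result is rounded up to a whole number of adder stages (which agrees with the 7 stages quoted for N=64), and the function names are hypothetical, not taken from the paper.

```python
import math


def booth_stages(n: int) -> int:
    """Critical path of the modified Booth multiplier, N/2 adder stages (as quoted in the abstract)."""
    return n // 2


def recoded_stages(n: int) -> int:
    """Critical path read as 3 * cbrt(N/2) - 3, rounded up to whole adder stages.

    Rounding up is an assumption made here; it matches the abstract's
    figure of 7 stages for N = 64.
    """
    raw = 3 * (n / 2) ** (1 / 3) - 3
    # Round away floating-point noise before taking the ceiling, so that
    # exact cubes such as N/2 = 64 are not pushed up by one stage.
    return math.ceil(round(raw, 9))


if __name__ == "__main__":
    for n in (32, 64, 128):
        print(f"N={n}: modified Booth {booth_stages(n)} stages, "
              f"recoded {recoded_stages(n)} stages")
```

For N=64 the unrounded expression evaluates to about 6.52, so the sketch reports 7 stages against 32 for the modified Booth figure.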
Index Terms
- A new high radix-2^r (r≥8) multibit recoding algorithm for large operand size (N≥32) multipliers