

# High Performance Energy Efficient Computation Elements of Processing Unit

Dhanabal Rengasamy, Ramakrishnan V. N.

Abstract: In this manuscript, various adder circuits are compared. Cadence 90nm technology and Quartus II (EP2C20F484C7) are used to implement the designs. The characteristics of logic gate-based adders and of adders based on the PFCA, TG and HSD techniques are analyzed. The finding is that the PFCA with 10 transistors (10T) performs slightly more efficiently than its 11T counterpart. The Exclusive OR-NOR design gives the least delay, making it the optimum adder for a high performance, energy efficient processing unit.

Keywords: Adders, Multipliers, Power Dissipation, Delay.

# I. INTRODUCTION

The use of portable devices such as laptops, mobile phones and tablets has been increasing tremendously over the years. An ALU consists of arithmetic and logic units; its arithmetic operations are performed by adders, multipliers and subtractors. The PFCA has been implemented using 11 transistors (11T), but voltage degradation occurs in that implementation. The number of transistors has therefore been reduced to 10, which reduces the voltage degradation to some extent. If buffers are added to both the sum and carry-out outputs of the 10T design, a waveform without glitches or voltage degradation is obtained.

#### II. ADDER

#### A. Adder with ripple carry design (RCA)

An RCA [1] is a chain of full adders: an n-bit RCA requires n full adders, and the carry bit generated by each stage is propagated to the next stage.





**Dhanabal Rengasamy\***, Assistant Professor, DMNE, SENSE, Vellore Institute of Technology, Vellore (Tamil Nadu), India. E-mail: rdhanabal@vit.ac.in

**Ramakrishnan V. N.**, Associate Professor, DMNE, SENSE, Vellore Institute of Technology, Vellore (Tamil Nadu), India.

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

The RCA shown in Fig.1 is a power-efficient processing element. However, each stage of the addition has to wait for the carry computed by the previous stage.



Fig.2. Waveforms of 4-bit RCA.

Here, input 1 = 4'b0101 and input 2 = 4'b0010, giving the output sum = 4'b0111; the carry out of the 1<sup>st</sup> stage is rippled into the 2<sup>nd</sup> stage.
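The rippling of the carry can be illustrated with a small behavioural model. The following is a minimal Python sketch, not the authors' Quartus II or Cadence design; the function names `full_adder` and `ripple_carry_add` are assumptions introduced only for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4, cin=0):
    """Add two unsigned `width`-bit integers one stage at a time.

    Each stage consumes the carry produced by the stage before it,
    which is why the delay of an RCA grows with the word length.
    """
    carry = cin
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        s, carry = full_adder(bit_a, bit_b, carry)
        result |= s << i
    return result, carry


# Same operands as the waveform example: 0101 + 0010.
total, carry_out = ripple_carry_add(0b0101, 0b0010)
print(format(total, "04b"), carry_out)  # prints: 0111 0
```

The loop makes the serial dependency explicit: stage `i` cannot produce its sum until `carry` from stage `i - 1` is available.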

# **B.** Adder with Carry look ahead design (CLA)

To reduce the delay, the CLA [1] uses propagate (P) and generate (G) signals: $P = A \oplus B$ and $G = A \cdot B$, where A and B denote input 1 and input 2. The delay is lower than that of the RCA because the output of each stage does not have to wait for the carry out of the previous stage.
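A minimal Python sketch of this idea follows. It expands the standard recurrence $c_{i+1} = G_i + P_i c_i$ so that every carry is written directly in terms of the P and G signals and the carry-in, rather than in terms of the previous carry. The function name `cla_add` and its parameters are illustrative assumptions, not part of the original design.

```python
def cla_add(a, b, width=4, cin=0):
    """Carry look-ahead addition of two unsigned `width`-bit integers."""
    # Per-bit propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    carries = [cin]
    for i in range(1, width + 1):
        # c_i = g[i-1] | p[i-1]g[i-2] | ... | p[i-1]...p[0]cin
        # Only p, g and cin are used; no earlier carry is consulted.
        c = 0
        prefix = 1  # running product p[i-1] & ... & p[j+1]
        for j in range(i - 1, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & cin
        carries.append(c)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


print(cla_add(0b0010, 0b1011))          # prints: (13, 0)
print(cla_add(0b0010, 0b1011, cin=1))   # prints: (14, 0)
```

In hardware the inner loop corresponds to a two-level AND-OR network per carry, so all carries are available after a fixed number of gate delays.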



Fig.3. Carry Look Ahead Adder.

Retrieval Number: B3964129219/2019©BEIESP DOI: 10.35940/ijeat.B3964.129219 Journal Website: www.ijeat.org



#### Simulation Report of CLA (4-bit)



Fig. 4: Wave forms of CLA

Here, a = 4'b0010 and b = 4'b1011, and the output is sum = 4'b1110. (This value is consistent with a carry-in of 1: 2 + 11 + 1 = 14.)

# C. CARRY SELECT ADDER (CSA)

In a CSA it is not necessary to wait for the carry at every stage: once the carry is known, the result can be obtained immediately. A 4-bit CSA takes four-bit inputs, which are applied to full adders. Multiplexers are then used to select the resultant values in the final stage.
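The selection step can be sketched behaviourally as below. This is a hedged Python illustration only: splitting the 4-bit word into two 2-bit blocks, and the names `ripple_block` and `carry_select_add`, are assumptions made for the example and are not taken from the circuit in Fig. 5.

```python
def ripple_block(a, b, width, cin):
    """Small ripple-carry block used inside the carry-select adder."""
    carry, total = cin, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a, b, cin=0, block=2):
    """Carry-select addition of two unsigned (2 * block)-bit integers."""
    mask = (1 << block) - 1
    # The lower block is computed normally.
    lo_sum, lo_carry = ripple_block(a & mask, b & mask, block, cin)

    # The upper block is computed twice, once for each possible carry-in.
    a_hi, b_hi = (a >> block) & mask, (b >> block) & mask
    hi_if_0 = ripple_block(a_hi, b_hi, block, 0)
    hi_if_1 = ripple_block(a_hi, b_hi, block, 1)

    # Multiplexer: the real carry only selects a precomputed result.
    hi_sum, cout = hi_if_1 if lo_carry else hi_if_0
    return (hi_sum << block) | lo_sum, cout


print(carry_select_add(0b0010, 0b1011))         # prints: (13, 0)
print(carry_select_add(0b0010, 0b1011, cin=1))  # prints: (14, 0)
```

Because both candidate results for the upper block are ready before `lo_carry` arrives, the only remaining delay after the carry is known is the multiplexer.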



Fig. 5: CSA circuit.





Fig.6. Input and output wave forms of Carry Select Adder.

Here, a = 4'b0010 and b = 4'b1011, and sum = 4'b1110. (This value is consistent with a carry-in of 1: 2 + 11 + 1 = 14.)


#### D. PARALLEL FEEDBACK CARRY ADDER (PFCA)

The basic building block of the PFCA is the half adder (HA). A full adder is made up of two HAs and an OR gate, and each HA consists of one AND gate and one XOR gate. The PFCA operates in parallel with feedback, which speeds up the adder and keeps the computation time low. The architecture of the 4-bit PFCA is shown in Fig. 7.
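The composition of a full adder from two half adders and an OR gate can be written out as a short Python sketch. It models only this gate-level composition, not the parallel feedback structure of the PFCA itself, and the function names are illustrative assumptions.

```python
def half_adder(a, b):
    """Half adder: one XOR gate for the sum, one AND gate for the carry."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Full adder built from two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)      # first HA adds the two operand bits
    s, c2 = half_adder(s1, cin)    # second HA adds the carry-in
    return s, c1 | c2              # OR gate merges the two carries


# Print the full truth table as: a b cin -> sum carry
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            print(a, b, cin, "->", s, cout)
```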



Fig.7: PFCA with half-adder as the component.

Implementing the circuit with CMOS gates shows that voltage degradation occurs when a PMOS transistor passes logic 0 and when an NMOS transistor passes logic 1. Fig. 8 shows the schematic of the PFCA using 10T. The power dissipation is also high because of the leakage current of the inverter, which can turn on the NMOS transistor in unwanted situations. This can be avoided by placing a PMOS transistor in series with the pull-up transistor of the inverter. The schematic is shown in Fig. 6. It has been implemented in Cadence using the gpdk90 library with $V_{DD}$=1.5V.



Fig.8. Schematic of 1 bit full-adder using 10T.





### **III. SIMULATION RESULT**



For Carry Output:



Fig. 9. PFCA I/O waveforms using 10T.

# IV. ADDER USING HSD (HIGH SPEED DOMINO) TECHNIQUE

In adders based on the HSD technique [2], a P-type keeper transistor (kt) in the pull-up network reduces the leakage current at the node to which it is connected. A clock signal is applied to its gate.



Fig.10. Schematic Diagram of Adder using HSD Technique with Sum as the output.

When the clock is low, the dynamic node is pre-charged to $V_{DD}$ and the output is low. When the CLK signal goes from low to high, the output is evaluated. It has been implemented in Cadence using the gpdk90 library with $V_{DD}$=1.5V.



Fig.11. Schematic Diagram of Adder using HSD Technique with Carry as output.





Fig. 12. Input and Cout waveforms of adders using HSD Technique.





Fig. 13. Domino Logic



The Domino Logic [3] is shown in Fig. 13. When the clock is at zero, $M_p$ is on, pre-charging takes place and the dynamic node is charged. When the clock is high, the pull-down network (PDN) operates and evaluates the dynamic node; this is called the evaluation phase. A dynamic node that remains high is preserved until the next pre-charge phase. The schematic diagram of the 2-bit Domino Logic is shown in Fig. 14.
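The two clock phases can be mimicked with a small behavioural model. This is a hedged Python sketch at the logic level only (no voltages, leakage or keeper transistor); the class name `DominoGate`, its `step` method and the AND-type pull-down network used in the example are assumptions for illustration.

```python
class DominoGate:
    """Logic-level model of a domino gate with a static output inverter."""

    def __init__(self, pdn):
        # `pdn` is a function of the inputs that returns a truthy value
        # when the pull-down network conducts.
        self.pdn = pdn
        self.dynamic_node = 1

    def step(self, clk, *inputs):
        if clk == 0:
            # Pre-charge phase: M_p charges the dynamic node.
            self.dynamic_node = 1
        elif self.pdn(*inputs):
            # Evaluation phase: a conducting PDN discharges the node.
            self.dynamic_node = 0
        # Otherwise the node keeps its value until the next pre-charge.
        return 1 - self.dynamic_node  # output of the static inverter


gate = DominoGate(lambda a, b: a & b)  # hypothetical 2-input AND-type PDN
print(gate.step(0, 1, 1))  # pre-charge: output is low, prints 0
print(gate.step(1, 1, 1))  # evaluate with PDN conducting, prints 1
print(gate.step(0, 1, 1))  # next pre-charge resets the output, prints 0
print(gate.step(1, 1, 0))  # evaluate with PDN off, node stays high, prints 0
```

The model also shows why domino inputs may only rise during evaluation: once the dynamic node has been discharged, it cannot recover until the next pre-charge.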



Fig. 14. Adder using 2-bit Domino Logic having Sum Output



Fig.15. Schematic of 2-bit adder based on domino logic with carry as output

Simulation Results:



Fig.16. Input and sum waveforms of 2-bit Domino Logic



Fig. 17. Input and carryout waveforms of 2-bit domino logic adder

EX-OR with EX-NOR adder design



Fig.18. Full Adder Circuit using EX-OR /EX-NOR.

Published By: Blue Eyes Intelligence Engineering & Sciences Publication






To obtain a full voltage swing [4] at the outputs, the transmission gates (TG) $M_1$, $M_2$, $M_3$ and $M_4$ are used. $M_1$ and $M_2$ are connected to the XOR/XNOR outputs and produce the sum. $M_3$ and $M_4$ are likewise connected to the XOR/XNOR outputs and produce the carry.

When XOR generates logic 1 and XNOR generates logic 0, $M_1$ switches on and $M_2$ switches off, so the inverted input Cin is passed to the sum output. When XOR is at logic 0 and XNOR is at logic 1, $M_1$ is off and $M_2$ is on, so the sum output is the same as the input Cin. This is the operation for the sum output.

When XOR is at logic 1 and XNOR is at logic 0, $M_3$ switches on and $M_4$ switches off, so the carry-out bit equals Cin. When XOR is at logic 0 and XNOR is at logic 1, $M_4$ turns on and $M_3$ turns off, so the carry-out equals the B input. This circuit can be implemented in Cadence in 90nm technology using $V_{DD}$=1.5V.
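The selection behaviour described above can be checked against ordinary addition with a short Python sketch. It is a logic-level model only; each transmission gate is represented by an AND with its control signal, and the function name `tg_full_adder` is an illustrative assumption.

```python
def tg_full_adder(a, b, cin):
    """Logic-level model of the XOR/XNOR full adder with four TGs."""
    xor = a ^ b
    xnor = 1 - xor

    # Sum: M1 (on when XOR = 1) passes the inverted Cin,
    #      M2 (on when XNOR = 1) passes Cin.
    s = (xor & (1 - cin)) | (xnor & cin)

    # Carry: M3 (on when XOR = 1) passes Cin,
    #        M4 (on when XNOR = 1) passes B.
    cout = (xor & cin) | (xnor & b)
    return s, cout


# Compare with arithmetic addition for all eight input combinations.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = tg_full_adder(a, b, cin)
            assert s + 2 * cout == a + b + cin
print("all 8 input combinations match")
```

Passing B as the carry when XOR is 0 works because A and B are equal in that case, so the carry is 1 exactly when both operand bits are 1.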



Fig. 19. Schematic of the EX-OR and EX-NOR FA design

*Simulation result:*



Fig.20. I/O waveforms of adder with X-OR and X-NOR gates.

## PROPOSED CIRCUIT

In the 10T PFCA, the voltage degradation can be prevented by adding buffers (in inverter form) to both outputs. The resulting output is free from glitches and shows no voltage degradation.

### TABLE 1. Comparison of adder circuits using Cadence and V<sub>DD</sub>=1.5V

| Adders                       | Power (µW) | Delay (ps) |
|------------------------------|------------|------------|
| 2-bit Domino logic           | 6.455      | 39.8       |
| High Speed Domino            | 41.74      | 35.6       |
| PFCA (11T)                   | 120.4      | 45.76      |
| PFCA (10T)                   | 106.2      | 41.55      |
| Based on X-OR and X-NOR gates | 34.8      | 1.989      |
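One common way to combine the two columns of Table 1 into a single energy figure is the power-delay product (PDP = power × delay). The PDP is not reported in the original table; the Python sketch below merely derives it from the tabulated values, together with the relative saving of the 10T PFCA over the 11T PFCA. The helper names are assumptions for illustration.

```python
# Power (µW) and delay (ps) copied from Table 1.
TABLE_1 = {
    "2-bit Domino logic": (6.455, 39.8),
    "High Speed Domino": (41.74, 35.6),
    "PFCA (11T)": (120.4, 45.76),
    "PFCA (10T)": (106.2, 41.55),
    "X-OR and X-NOR based": (34.8, 1.989),
}


def pdp_attojoule(power_uw, delay_ps):
    """Power-delay product; µW x ps = 1e-18 J, i.e. attojoules."""
    return power_uw * delay_ps


def percent_saving(reference, improved):
    """Relative reduction of `improved` with respect to `reference`."""
    return 100.0 * (reference - improved) / reference


for name, (power, delay) in TABLE_1.items():
    print(f"{name:22s} PDP = {pdp_attojoule(power, delay):9.2f} aJ")

p11, d11 = TABLE_1["PFCA (11T)"]
p10, d10 = TABLE_1["PFCA (10T)"]
print(f"10T vs 11T power saving: {percent_saving(p11, p10):.1f} %")
print(f"10T vs 11T delay saving: {percent_saving(d11, d10):.1f} %")
```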

### TABLE 2. Comparison of adder circuits implemented in Cyclone II Family in Quartus II

| Adders | No. of pins | No. of logic elements | Dynamic power dissipation (mW) | Static power dissipation (mW) | Delay (ns) |
|--------|-------------|-----------------------|--------------------------------|-------------------------------|------------|
| RCA    | 14          | 10                    | 0                              | 47.35                         | 20.751     |
| CLA    | 14          | 15                    | 0                              | 47.35                         | 17.234     |
| CSA    | 15          | 8                     | 0                              | 47.35                         | 15.012     |



Fig.21. Simulation result of 10T PFCA with buffers.






Fig. 22. Comparison of 10T with 11T PFCA power dissipation.

### V. CONCLUSION

The 11T implementation of the PFCA suffers from voltage degradation. Replacing it with the 10T design reduces the voltage degradation to some extent. The analysis of the different adder circuits, implemented with an ASIC library and an FPGA family, is tabulated. A comparison of existing multipliers and the proposal of a new design for optimized performance [11] may be the next level of development. The basic building blocks of an ALU [9], MAC [15], FPU [14], RNS MAC [13], ALU-DBGPU [12], FinFET-based design [5], reversible-logic-based ALU [6], [8] and floating-point fused multiply-add unit [7] are a few designs in which the modified basic blocks can be used as library components for future analysis.

#### REFERENCES

1. Singh, R., Singh, J. and Singla, M., 2012. Comparative analysis of tg based 16-bit adders using 180 nm technology. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (IJAREEIE), 2, p.136.
2. Ajayan, J., Nirmal, D., Sivasankari, S., Sivaranjani, D. and Manikandan, M., 2014, March. High speed low power Full Adder circuit design using current comparison-based domino. In 2014 2nd International Conference on Devices, Circuits and Systems (ICDCS) (pp. 1-5). IEEE.
3. Rabaey, J.M., Chandrakasan, A.P. and Nikolic, B., 2002. Digital integrated circuits (Vol. 2). Englewood Cliffs: Prentice hall.
4. Masala, S. and Reddy, B.R., 2013, December. Implementation of a full adder circuit with new full swing EX-OR/EX-NOR gate. In 2013 IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (Prime Asia) (pp. 29-33). IEEE.
5. Dhanabal, R. and Sahoo, S.K., 2016. Design and implementation of floating-point unit using 15 nm FinFET. Indian Journal of Science and Technology, 9 (37), art. no. 102131.
6. Dhanabal, R., Sahoo, S.K., Bharathi, V., Bhavya, V., Chandrakant, P.A. and Sarannya, K., 2016. Design of Reversible Logic Based ALU. In Advances in Intelligent Systems and Computing, 397, pp. 303-313. Springer, New Delhi.
7. Dhanabal, R., Sahoo, S.K. and Bharathi, V., 2016. Implementation of Low Power and Area Efficient Floating-Point Fused Multiply-Add Unit. Advances in Intelligent Systems and Computing, 397, pp. 329-342. Springer, New Delhi.
8. Dhanabal, R., Sahoo, S.K., Bharathi, V., Bhavya, V., Chandrakant, P.A. and Sarannya, K., 2016. Design of reversible logic based ALU. In Advances in Intelligent Systems and Computing, 397, pp. 303-313. Springer, New Delhi.
9. Dhanabal, R., Sahoo, S.K., Bharathi, V., Devi, A., Sarma, R. and Chowdary, D., 2016. Design of basic building blocks of ALU. Advances in Intelligent Systems and Computing, 397, pp. 315-327. Springer, New Delhi.
10. Dhanabal, R., Bharathi, V., Shilpa, K., Sujana, D.V. and Sahoo, S.K., 2014. Design and implementation of low power floating point arithmetic unit. International Journal of Applied Engineering Research, 9 (3), pp. 339-346.
11. Dhanabal, R., Bharathi, V., Anand, N., Joseph, G., Oommen, S.S. and Sahoo, S.K., 2013. Comparison of existing multipliers and proposal of a new design for optimized performance. International Journal of Engineering and Technology, 5 (2), pp. 1704-1709.
12. Dhanabal, R., Bharathi, V., Salim, S., Thomas, B., Soman, H. and Sahoo, S.K., 2013. Design of 16-bit low power ALU-DBGPU. International Journal of Engineering and Technology, 5 (3), pp. 2172-2180.
13. Dhanabal, R., Barathi, V., Sahoo, S.K., Samhitha, N.R., Cherian, N.A. and Jacob, P.M., 2014. Implementation of floating-point MAC using Residue Number System. ICROIT 2014 - Proceedings of the 2014 International Conference on Reliability, Optimization and Information Technology, art. no. 6798385, pp. 461-465.
14. Ushasree, G., Dhanabal, R. and Kumar Sahoo, S., 2013. VLSI implementation of a high-speed single precision floating point unit using Verilog. 2013 IEEE Conference on Information and Communication Technologies, ICT 2013, art. no. 6558204, pp. 803-808.
15. Dhanabal, R., Barathi, V., Sahoo, S.K., Samhitha, N.R., Cherian, N.A. and Jacob, P.M., 2014, February. Implementation of floating-point MAC using residue number system. In 2014 International Conference on Reliability Optimization and Information Technology (ICROIT) (pp. 461-465). IEEE.

#### **AUTHOR PROFILE**



Dhanabal Rengasamy, Assistant Professor (Sr) in SENSE, VIT, Vellore, received the B.E. degree in Electronics and Communication Engineering from Bharathidasan University, Tiruchirappalli, Tamil Nadu, India in 2001, and the M.Tech degree in VLSI Design from SASTRA University, Tanjore, Tamil Nadu, India in 2002. His research interests are in the area of Low power VLSI design and Mixed Signal IC Design.



**Dr. Ramakrishnan V. N.**, Associate Professor, SENSE, VIT, Vellore. His areas of research are carbon nanotube field effect transistors, MOSFETs, logic gates, low power electronics, memristors and nanoelectronics.

