# FPGA Carry Chain Adder (1A)

- - - •

Copyright (c) 2010 -- 2020 Young W. Lim.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Please send corrections (or suggestions) to youngwlim@hotmail.com.

This document was produced by using OpenOffice and Octave.



$$s_i = (a_i \oplus b_i) \oplus c_i = p_i \oplus c_i$$
  

$$c_{i+1} = (a_i \cdot b_i) + (a_i \oplus b_i) c_i = \overline{p_i} \cdot g_i + p_i \cdot c_i = \overline{p_i} \cdot a_i + p_i \cdot c_i = \overline{p_i} \cdot b_i + p_i \cdot c_i$$

| when $\overline{p}_i = 1$ , then $a_i = b_i$ | p(i) | 0 | 1 | g(i) | 0 | 1 |
|----------------------------------------------|------|---|---|------|---|---|
|                                              | 0    | 0 | 1 | 0    | 0 | 0 |
| when $g_i = 1$ , then $a_i = b_i = 1$        | 1    | 1 | 0 | 1    | 0 | 1 |

Synthesis of Arithmetic Circuits: FPGA, ASIC and Ebedded Systems, J-P Deschamps et al



Synthesis of Arithmetic Circuits: FPGA, ASIC and Ebedded Systems, J-P Deschamps et al

4

### **Carry Chain Adder**

Young Won Lim 10/7/20

# **FPGA** Carry Chain

FPGAs generally contain dedicated computation resources for generating fast adders

The Virtex family programmable arrays include logic gates (**XOR**) and **multiplexers** that along with the general purpose **lookup tables** allow one to build effective carry-chain adders

The carry chain is made up of multiplexers belonging to adjacent configurable blocks

the lookup table is used for implementing the exclusive or function

p(i) = x(i) xor y(i)

https://en.wikipedia.org/wiki/Carry-lookahead\_adder



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 0     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 1     | 0     | ΧY                          |
| 1 | 0 | 1     | 0     | XΫ                          |
| 1 | 1 | 1     | 1     | ХҮ                          |

Cout : functions of X, Y, Cin

Cout(X, Y, 1) = Cout1 = X + YCout(X, Y, 0) = Cout0 = X Y

Cout1 = X + Y when Cin=1 Cout0 = XY when Cin=0

Cout1 = P'  $\underline{Cin}$  ... P' = relaxed P Cout0 = G  $\overline{Cin}$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

If  $\underline{Cin}$ , then  $Cout = (\overline{X} Y + X \overline{Y} + X Y)$ If  $\overline{Cin}$ , then Cout = X Y

 $Cin (X + Y) + \overline{Cin} X Y$   $Cin (\overline{X} Y + X \overline{Y} + X Y) + \overline{Cin} X Y$   $Cin (\overline{X} Y + X \overline{Y}) + (Cin + \overline{Cin}) X Y$ P Cin + G

```
Cin (X + Y) + \overline{Cin} X Y
Cin P' + Cin G ... P' : relaxed P
```



| Cout1 | Cout                      | Name              |
|-------|---------------------------|-------------------|
| 0     | 0                         | Kill              |
| 1     | Cin                       | Propagate         |
| 0     | Cin                       | Inverse Propagate |
| 1     | 1                         | Generate          |
|       | Cout1<br>0<br>1<br>0<br>1 |                   |

| Cout0 | Cout                      | Name              |
|-------|---------------------------|-------------------|
| 0     | 0                         | Kill              |
| 1     | Cin                       | Inverse Propagate |
| 0     | Cin                       | Propagate         |
| 1     | 1                         | Generate          |
|       | Cout0<br>0<br>1<br>0<br>1 |                   |



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

# **Carry Chain**



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 0     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 1     | 0     | ΧY                          |
| 1 | 0 | 1     | 0     | ΧŸ                          |
| 1 | 1 | 1     | 1     | ХҮ                          |

| Cout1 | Cout0 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     | Cin  | Inverse Propagate |
| 1     | 0     | Cin  | Propagate         |
| 1     | 1     | 1    | Generate          |

Carry Out

|   | <b>J</b> | -   |     |  |
|---|----------|-----|-----|--|
| Х | Y        | Cin |     |  |
| 0 | 0        | Cin | Cin |  |
| 0 | 1        | Cin | Cin |  |
| 1 | 0        | Cin | Cin |  |
| 1 | 1        | Cin | Cin |  |
|   |          |     |     |  |

Cout1=1 when Cin=1 Cout0=0 when Cin=0 Cout = Cin propage Cout1=0 when Cin=1 Cout0=1 when Cin=0

Cout = Cin inverse propagate

# Parity Checker



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 1     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 0     | 1     | ΧY                          |
| 1 | 0 | 0     | 1     | XΫ                          |
| 1 | 1 | 1     | 0     | ΧY                          |

| Cout1 | Cout0 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     | Cin  | Inverse Propagate |
| 1     | 0     | Cin  | Propagate         |
| 1     | 1     | 1    | Generate          |

Cout1=1 when Cin=1 Cout0=0 when Cin=0 Cout = Cin propagate Cout1=0 when Cin=1 Cout0=1 when Cin=0 Cout = Cin inverse propagate

Computing Parity  $X \oplus Y \oplus Cin$ 

| 0 ⊕ 0 ⊕ Cin<br>0 ⊕ 1 ⊕ Cin | Cin<br>Cin |
|----------------------------|------------|
| 1 ⊕ 0 ⊕ Cin                | Cin        |
| 1 ⊕ 1 ⊕ Cin                | Cin        |

# **Ripple Carry Chain**



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



the logic cells - resources to compute a function the exact location of logic cells depends on the user. a user can start or end a carry computation at any place in an fpga.

to start a carry chain, the first cell in the chain must be programmed to ignore the Cin signal

program mux2 in the cell to route input Z to mux1 instead of Cin

When it is desired to have a carry input to the first cell of the chain (implementing combined adder/subtractors)

But in many carry computations, the first cell has only 2 inputs, and forcing the carry chain to wait for the arrival of an additional, unnecessary input will only needlessly slow down the circuit's computation.





#### the first cell in the chain

the same LUTs

the <u>same</u> output regardless of Z and Cin Cout1 = Cout0 = Cout regardless of the select

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

# **Ripple Carry Chain**





fig1b shows an implementation of a mux that does not obey this requirement

since the carry chain is part of an fpga, the input to this mux could be connected to some unused logic in another row which is generating unknown values.

if that unused logic had multiple transitions which caused the signal to change quicker than the gate could react, then it is possible that **the select signal** to this mux could be stuck midway between true and false (2.5V for 5V CMOS)

in this case, it will <u>not</u> be able to <u>pass a true value</u> from the input to the output and thus will not function properly for this application.

# **Ripple Carry Chain**



however a mux built with both n-transistor and p-transistor pass gates will operate properly for this case

assume this mux implementation will be used

tristate driver based muxes could be used, which restore signal drive and cut series RC chains

# Unit Gate Delay Model

All simple gate of two or three inputs that are directly implementable in one logic level in CMOS are considered to have a delay of one.

All other gate must be implemented by such gates, and have the delay of the underlying circuit.

#### Delay of one

- inverters and
- 2 to 3 input NAND
- 2 to 3 input NOR gates

A 2:1 mux has a delay of one from the I0 or I1 inputs to the output, But has a delay of two from the select input to the output due to the Inverter delay

#### Delay of zero (constant delay)

- the delay of the 2-LUTs,
- any routing leading to them,







Delay 1 for first cell Delay 3 for each additional cell in the carry chain 1 delay for mux2 and 2 delays for mux1

Overall 2n-2 for an n-cell carry chain

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



The critical path comes from the 2-LUTs and not input Z since the delay through the 2-LUTs will be larger than through mux 2 in the first cell

to reduce the delay of the ripple carry chain

- remove mux2 from the carry path.
- no need to choose between Cin and Z for the select line to the output mux1
- two separate muxes, mux1 and mux2, controlled by Cin and Z respectively.
- the circuit chooses between these outputs with mux3.



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



although this design is 1 gate delay slower than that of fig 2a, it provides the ability to have a carry input to the first cell in a carry chain, something that is important in many computations.

Also, for carry computations that do not need this feature, the first cell in a carry chain built from fig 2b can be configured to bypass mux1, reducing the overall delay to 2n, which is identical to that of fig2a.



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



on the other hand, in order to implement a n-bit carry chain with a carry input, the design of fig 2a requires an additional cell at the beginning of the chain to bring in this input, resulting in a delay of 2(n+1)=2n+2, which is lower than that of the design in fig2b thus, the design of fig 2b is the preferred ripple carry design among those presented so far



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

various high performance carry chains can be developed based on the carry cell of fig 2c

this cell is very similar to that of fig 2b, except that the actual carry chain (mux4) has been replaced by an abstract fast carry logic unit and mux5 has been added

this extra mux5 is present because although some of our faster carry chains will have much faster carry propagation for long carry chains, they incur significant delay for non-carry computations

thus, when the cell is used as a simple normal 3LUT, using inputs X, Y, and Z mux5 allows us to bypass the carry chain by selecting the output of mux1





#### 1 gate delay slower

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

However, carry chains built from this design have a delay of 3 in the first cell

(1 in mux1, 1 in mux2, 1 in mux4) and 2 in all other cells in the carry chain,

yielding an overall delay of 2n+1 for an n-bit carry chain.

thus, although this design is 1 gate delay slower than that of fig 2a,

it provides the ability to have a carry input to the first cell in a carry chain,

something that is important in many computations.

Also, for carry computations that do not need this feature, the first cell in a carry chain built from fig 2b can be configured to bypass mux1, reducing the overall delay to 2n, which is identical to that of fig2e

which is identical to that of fig2a.

on the other hand, in order to implement a n-bit carry chain with a carry input,

the design of fig 2a requires an additional cell at the beginning of the chain to bring in this input,

resulting in a delay of 2(n+1)=2n+2, which is lower than that of the design in fig2b

thus, the design of fig 2b is the preferred ripple carry design among those presented so far



to optimize a ripple carry chain structure for use in FPGAs while this provides some performance gain over the basis ripple carry scheme found in many current FPGAs, it is still much slower than what is done in custom logic There have been tremendous amounts of work done on developing alternative carry chain scheme that overcome the linear delay growth of ripple carry adders Although these techniques have not yet been applied to FPGAs, demonstrate how these advanced adder techniques can be integrated into reconfigurable logic





- not logically equivalent
- no longer use the Z input in the <u>first</u> cell since Z is only attached to mux2 and mux 2 does not lead to the carry cells







For the first cell in a carry chain with a carry input (provided by input Z), mux2 and mux3 both pass the value from mux1

the two main inputs to mux4 are identical the output of mux4 (Cout) will be the same as the output of mux1 (ignoring Cin)

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



mux1's main inputs are driven by two 2-LUTs (OR, AND) controlled by X and Y mux1 forms a 3-LUT with the other 2-LUTs

When mux2 and mux3 pass the value from mux1 (Cout1 and Cout2 respectively) the circuit is configured to continue the carry chain

Functionally equivalent

29



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



a delay of 3 in the first cell(1 in mux1, 1 in mux2, 1 in mux4)2 in all other cells in the carry chainan total delay of 2n+1 for an n-bit carry chain

1 gate delay slower than that of fig 2a, a carry input to the first cell is enabled

30



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

#### **Carry Chain Adder**

Young Won Lim 10/7/20



Also, for carry computations that do not need this feature, the first cell in a carry chain built from fig 2b can be configured to bypass mux1, reducing the overall delay to 2n, which is identical to that of fig2a.

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

in order to implement a n-bit carry chain with a carry the design of fig 2a requires an additional cell at the beginning of the chain to bring in this input, resulting in a delay of 2(n+1)=2n+2, which is lower than that of the design in fig2b

thus, the design of fig 2b is the preferrred ripple carry design among those presented so far



in order to implement a n-bit carry chain with a carry input the design of fig 2a requires an additional cell at the beginning of the chain to bring in this input, resulting in a delay of 2(n+1)=2n+2, which is lower than that of the design in fig2b

thus, the design of fig 2b is the preferrred ripple carry design among those presented so far

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



a delay of 3 in the first cell
(1 in mux1, 1 in mux2, 1 in mux4)
2 in all other cells in the carry chain
an total delay of 2n+1 for an n-bit carry chain

t1 gate delay slower than that of fig 2a, a carry input to the first cell is enabled

Also, for carry computations that do not need this feature, the first cell in a carry chain built from fig 2b can be configured to bypass mux1, reducing the overall delay to 2n, which is identical to that of fig2a.

in order to implement a n-bit carry chain with a carry input, the design of fig 2a requires an additional cell at the beginning of the chain to bring in this input, resulting in a delay of 2(n+1)=2n+2, which is lower than that of the design in fig2b

thus, the design of fig 2b is the preferrred ripple carry design among those presented so far





High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



 $Cout_i = (Cout_{i-1} \cdot C1_i) + (\overline{Cout_{i-1}} \cdot C0_i)$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



$$Cout_i = (Cout_{i-1} \cdot C1_i) + (\overline{Cout_{i-1}} \cdot C0_i)$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

# Fast Carry Logc

Carry Select Adder Carry Lookahead Adder Brent-Kung Variable Block Ripple Carry Adder

https://en.wikipedia.org/wiki/Carry-lookahead\_adder



 $Cout_i = (Cout_{i-1} \cdot C1_i) + (\overline{Cout_{i-1}} \cdot C0_i)$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



 $(C1_{3}C1_{2}+C0_{3}\overline{C1}_{2})Cout_{1}+(C1_{3}C0_{2}+C0_{3}\overline{C0}_{2})\overline{Cout}_{1}$ 

$$Cout_{i} = (Cout_{i-1} \cdot C \mathbf{1}_{i}) + (\overline{Cout_{i-1}} \cdot C \mathbf{0}_{i})$$

$$Cout_{i+1} = (Cout_{i} \cdot C \mathbf{1}_{i+1}) + (\overline{Cout_{i}} \cdot C \mathbf{0}_{i+1})$$

$$Cout_{2} = (Cout_{1} \cdot C \mathbf{1}_{2}) + (\overline{Cout_{1}} \cdot C \mathbf{0}_{2})$$

$$Cout_{3} = (Cout_{2} \cdot C \mathbf{1}_{3}) + (\overline{Cout_{2}} \cdot C \mathbf{0}_{3})$$

$$= (((Cout_{1} \cdot C \mathbf{1}_{2}) + (\overline{Cout_{1}} \cdot C \mathbf{0}_{2})) \cdot C \mathbf{1}_{3})$$

$$+ (\overline{((Cout_{1} \cdot C \mathbf{1}_{2}) + (\overline{Cout_{1}} \cdot C \mathbf{0}_{2}))} \cdot C \mathbf{0}_{3})$$

$$(((Cout_1 \cdot C \mathbf{1}_2) + (\overline{Cout_1} \cdot C \mathbf{0}_2)) \cdot C \mathbf{1}_3)$$
  
=  $(C\mathbf{1}_3 C \mathbf{1}_2 Cout_1 + C \mathbf{1}_3 C \mathbf{0}_2 \overline{Cout_1})$ 

$$((\overline{(Cout_1 \cdot C 1_2)} \cdot \overline{(Cout_1 \cdot C 0_2)}) \cdot C 0_3)$$
  
=  $(((\overline{Cout_1} + \overline{C 1_2}) \cdot (Cout_1 + \overline{C 0_2})) \cdot C 0_3)$   
=  $(\overline{Cout_1}Cout_1 + \overline{C 1_2}Cout_1 + \overline{Cout_1}\overline{C 0_2} + \overline{C 1_2}\overline{C 0_2}) \cdot C 0_3$   
=  $(\overline{C 1_2}Cout_1 + \overline{C 0_2}\overline{Cout_1}) \cdot C 0_3$   
=  $(C 0_3\overline{C 1_2}Cout_1 + C 0_3\overline{C 0_2}\overline{Cout_1})$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



 $= (\overline{Cout_1}Cout_1 + \overline{C1_2}Cout_1 + \overline{Cout_1}\overline{C0_2} + \overline{C1_2}\overline{C0_2}) \cdot C0_3$  $= (\overline{C1_2}Cout_1 + \overline{C0_2}\overline{Cout_1}) \cdot C0_3$  $= (C0_3\overline{C1_2}Cout_1 + C0_3\overline{C0_2}\overline{Cout_1})$ 

 $(C1_{3}C1_{2}+C0_{3}\overline{C1}_{2})Cout_{1}+(C1_{3}C0_{2}+C0_{3}\overline{C0}_{2})\overline{Cout}_{1}$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

#### **Carry Chain Adder**

42



$$Cout_{i} = (Cout_{i-1} \cdot C \mathbf{1}_{i}) + (\overline{Cout_{i-1}} \cdot C \mathbf{0}_{i})$$
$$Cout_{i+1} = (Cout_{i} \cdot C \mathbf{1}_{i+1}) + (\overline{Cout_{i}} \cdot C \mathbf{0}_{i+1})$$

$$Cout_{i+1} = \left( \left[ \left( Cout_{i-1} \cdot C \mathbf{1}_i \right) + \left( \overline{Cout_{i-1}} \cdot C \mathbf{0}_i \right) \right] \cdot C \mathbf{1}_{i+1} \right) \\ + \left( \overline{\left[ \left( Cout_{i-1} \cdot C \mathbf{1}_i \right) + \left( \overline{Cout_{i-1}} \cdot C \mathbf{0}_i \right) \right]} \cdot C \mathbf{0}_{i+1} \right)$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### References

- [1] http://en.wikipedia.org/
- [2] J-P Deschamps, et. al., "Sunthesis of Arithmetic Circuits", 2006