# FPGA Carry Chain Adder (1A)

- - - •

Copyright (c) 2010 -- 2020 Young W. Lim.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Please send corrections (or suggestions) to youngwlim@hotmail.com.

This document was produced by using OpenOffice and Octave.



$$s_i = (a_i \oplus b_i) \oplus c_i = p_i \oplus c_i$$
  

$$c_{i+1} = (a_i \cdot b_i) + (a_i \oplus b_i) c_i = \overline{p_i} \cdot g_i + p_i \cdot c_i = \overline{p_i} \cdot a_i + p_i \cdot c_i = \overline{p_i} \cdot b_i + p_i \cdot c_i$$

| when $\overline{p}_i = 1$ , then $a_i = b_i$ | p(i) | 0 | 1 | g(i) | 0 | 1 |
|----------------------------------------------|------|---|---|------|---|---|
|                                              | 0    | 0 | 1 | 0    | 0 | 0 |
| when $g_i = 1$ , then $a_i = b_i = 1$        | 1    | 1 | 0 | 1    | 0 | 1 |

Synthesis of Arithmetic Circuits: FPGA, ASIC and Ebedded Systems, J-P Deschamps et al



Synthesis of Arithmetic Circuits: FPGA, ASIC and Ebedded Systems, J-P Deschamps et al

4

### **Carry Chain Adder**

Young Won Lim 1/19/21

### **FPGA** Carry Chain

FPGAs generally contain dedicated computation resources for generating fast adders

The Virtex family programmable arrays include logic gates (**XOR**) and **multiplexers** that along with the general purpose **lookup tables** allow one to build effective carry-chain adders

The carry chain is made up of multiplexers belonging to adjacent configurable blocks

the lookup table is used for implementing the exclusive or function

p(i) = x(i) xor y(i)

https://en.wikipedia.org/wiki/Carry-lookahead\_adder



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 0     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 1     | 0     | ΧY                          |
| 1 | 0 | 1     | 0     | XΫ                          |
| 1 | 1 | 1     | 1     | ХҮ                          |

Cout : functions of X, Y, Cin

Cout(X, Y, 1) = Cout1 = X + YCout(X, Y, 0) = Cout0 = X Y

Cout1 = X + Y when Cin=1 Cout0 = XY when Cin=0

Cout1 = P'  $\underline{Cin}$  ... P' = relaxed P Cout0 = G  $\overline{Cin}$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

If  $\underline{Cin}$ , then  $Cout = (\overline{X} Y + X \overline{Y} + X Y)$ If  $\overline{Cin}$ , then Cout = X Y

 $Cin (X + Y) + \overline{Cin} X Y$  $Cin (X + Y) + \overline{Y} + X Y + \overline{Y} + \overline$ 

```
Cin (X + Y) + \overline{Cin} X Y
Cin P' + Cin G ... P' : relaxed P
```



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 0     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 1     | 0     | ΧY                          |
| 1 | 0 | 1     | 0     | ΧŸ                          |
| 1 | 1 | 1     | 1     | ΧY                          |

| Х | Y | Cin | Cout |       |
|---|---|-----|------|-------|
| 0 | 0 | 0   | 0    | Cout0 |
| 0 | 1 | 0   | 0    | Cout0 |
| 1 | 0 | 0   | 0    | Cout0 |
| 1 | 1 | 0   | 1    | Cout0 |
| 0 | 0 | 1   | 0    | Cout1 |
| 0 | 1 | 1   | 1    | Cout1 |
| 1 | 0 | 1   | 1    | Cout1 |
| 1 | 1 | 1   | 1    | Cout1 |
|   |   |     |      |       |

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



| Cout1 | Cout                      | Name              |
|-------|---------------------------|-------------------|
| 0     | 0                         | Kill              |
| 1     | Cin                       | Propagate         |
| 0     | Cin                       | Inverse Propagate |
| 1     | 1                         | Generate          |
|       | Cout1<br>0<br>1<br>0<br>1 |                   |

| Cout1 | Cout0 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     | Cin  | Inverse Propagate |
| 1     | 0     | Cin  | Propagate         |
| 1     | 1     | 1    | Generate          |



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



| Cout0 | Cout1 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     |      | Propagate         |
| 1     | 0     | Cin  | Inverse Propagate |
| 1     | 1     | 1    | Generate          |
|       |       |      |                   |

| Х | Y | Cin | Cout |       | Cout1 | Cout0 |
|---|---|-----|------|-------|-------|-------|
| 0 | 0 | 0   | 0    | Cout0 | 0     | 0     |
| 0 | 1 | 0   | 0    | Cout0 | 1     | 0     |
| 1 | 0 | 0   | 0    | Cout0 | 1     | 0     |
| 1 | 1 | 0   | 1    | Cout0 | 1     | 1     |
| 0 | 0 | 1   | 0    | Cout1 | 0     | 0     |
| 0 | 1 | 1   | 1    | Cout1 | 1     | 0     |
| 1 | 0 | 1   | 1    | Cout1 | 1     | 0     |
| 1 | 1 | 1   | 1    | Cout1 | 1     | 1     |

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

# **Carry Chain**



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 0     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 1     | 0     | ΧY                          |
| 1 | 0 | 1     | 0     | XΫ                          |
| 1 | 1 | 1     | 1     | ΧY                          |

| Cout1 | Cout0 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     | Cin  | Inverse Propagate |
| 1     | 0     | Cin  | Propagate         |
| 1     | 1     | 1    | Generate          |

Carry Out

|   | <b>J</b> | -   |     |  |
|---|----------|-----|-----|--|
| Х | Y        | Cin |     |  |
| 0 | 0        | Cin | Cin |  |
| 0 | 1        |     | Cin |  |
| 1 | 0        | Cin | Cin |  |
| 1 | 1        | Cin | Cin |  |
|   |          |     |     |  |

Cout1=1 when Cin=1 Cout0=0 when Cin=0 Cout = Cin propagate

Cout1=0 when Cin=1Cout0=1 when Cin=0Cout =  $\overline{Cin}$  inverse propagate

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Parity Checker



|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 1     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 0     | 1     | ΧY                          |
| 1 | 0 | 0     | 1     | XΫ                          |
| 1 | 1 | 1     | 0     | ΧY                          |

| Cout1 | Cout0 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     | Cin  | Inverse Propagate |
| 1     | 0     | Cin  | Propagate         |
| 1     | 1     | 1    | Generate          |

Cout1=1 when Cin=1 Cout0=0 when Cin=0 Cout = Cin propagate Cout1=0 when Cin=1 Cout0=1 when Cin=0 Cout = Cin inverse propagate

Computing Parity  $X \oplus Y \oplus Cin$ 

| 0 ⊕ 0 ⊕ Cin | Cin |
|-------------|-----|
| 0 ⊕ 1 ⊕ Cin | Cin |
| 1 ⊕ 0 ⊕ Cin | Cin |
| 1 ⊕ 1 ⊕ Cin | Cin |
|             |     |

# **Ripple Carry Chain**



the first cell in the chain

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



the logic cells - resources to compute a function the exact location of logic cells depends on the user. a user can start or end a carry computation at any place in an fpga. But in many carry computations, the first cell has only 2 inputs, and forcing the carry chain to wait for the arrival of an additional, unnecessary input Z will only needlessly <u>slow down</u> the circuit's computation.



when Cin is ignored, Z can also be ignored by having the <u>same</u> LUTs



#### the first cell in the chain

the same LUTs

the <u>same</u> output regardless of Z and Cin Cout1 = Cout0 = Cout regardless of the select

# **Ripple Carry Chain**





fig1b shows an implementation of a mux that does not obey this requirement

since the carry chain is part of an fpga, the input to this mux could be connected to some unused logic in another row which is generating unknown values.

if that unused logic had multiple transitions which caused the signal to change quicker than the gate could react, then it is possible that **the select signal** to this mux could be stuck midway between true and false (2.5V for 5V CMOS)

in this case, it will <u>not</u> be able to <u>pass a true value</u> from the input to the output and thus will not function properly for this application.

# **Ripple Carry Chain**



however a mux built with both n-transistor and p-transistor pass gates will operate properly for this case

assume this mux implementation will be used

tristate driver based muxes could be used, which restore signal drive and cut series RC chains

### Unit Gate Delay Model

All simple gate of two or three inputs that are directly implementable in one logic level in CMOS are considered to have a delay of one.

All other gate must be implemented by such gates, and have the delay of the underlying circuit.

#### Delay of one

- inverters and
- 2 to 3 input NAND
- 2 to 3 input NOR gates

A 2:1 mux has a delay of one from the I0 or I1 inputs to the output, But has a delay of two from the select input to the output due to the Inverter delay

#### Delay of zero (constant delay)

- the delay of the 2-LUTs,
- any routing leading to them,





Significantly slower two muxes on the carry chain in each cell

Delay 1 for first cell Delay 3 for each additional cell in the carry chain delay 1 for mux2 delays 2 for mux1

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



The critical path comes from the 2-LUTs and not from the input Z since the delay through the 2-LUTs will be larger than through mux 2 in the first cell

Overall 3n-2 for an n-cell carry chain



delay of 3n-2 for an n-bit ripple carry chain

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

#### **Carry Chain Adder**

20

the linear delay growth of ripple carry adders

optimize a ripple carry chain structure for use in FPGAs

while this provides some performance gain over the basis ripple carry scheme found in many current FPGAs,

still much slower than what is done in custom logic

advanced adder techniques in custom logic can be integrated into reconfigurable logic



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

#### **Carry Chain Adder**

22

to reduce the delay of the ripple carry chain

- remove mux2 from the carry path.
- no need to choose between Cin and Z for the select line to the output mux1

- two separate muxes, mux1 and mux2, controlled by Cin and Z, respectively.
- the circuit chooses between these outputs with mux3.



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry





- not logically equivalent
- the Z input in the <u>first</u> cell cannot be used
  - Z is only attached to mux2
  - mux2 does not lead to the carry cells
  - not connected to Cout

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

delay of 2



an additional cell for generating Cin

delay of 2



 need an <u>additional cell</u> to use Z as a carry input

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



delay of 2(n+1) for an n-bit ripple carry chain with a carry input

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

although this design is 1 gate delay slower than that of fig 2a, it provides the ability to have a carry input to the first cell in a carry chain, something that is important in many computations.

Also, for carry computations that do not need this feature, without a carry input the first cell in a carry chain built from fig 2b can be configured to bypass mux1, reducing the overall delay to 2n, which is identical to that of fig2a.



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

#### **Carry Chain Adder**

28

the other cells



for cells in the <u>middle</u> of a carry chain mux2 passes Cout1 mux3 passes Cout0 mux4 receives Cout1 and Cout0 provides a standard ripple carry path.



For the <u>first</u> cell in a carry chain with a carry input (provided by input Z), mux2 and mux3 both pass the value from mux1

the two main inputs to mux4 are identical the output of mux4 (Cout) will be the same as the output of mux1 (ignoring Cin)

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



mux1's main inputs are driven by two 2-LUTs (OR, AND) controlled by X and Y mux1 forms a **3-LUT** with the other 2-LUTs

When mux2 and mux3 pass the value from mux1 (Cout1 and Cout2 respectively) the circuit is configured to continue the carry chain

Functionally equivalent



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

**Carry Chain Adder** 

30



A delay of 2 in all other cells <u>except</u> the first cell in the carry chain

an total delay of **2n+1** for an n-bit carry chain when a carry input to the first cell is enabled

1 gate delay slower than that of fig 2a,

a delay of 3 in the first cell 1 in mux1, 1 in mux2, 1 in mux4

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



A delay of 2 in all other cells <u>except</u> the first cell in the carry chain

an total delay of **2n** for an n-bit carry chain when a carry input to the first cell is **disabled** 

a delay of 2 in the first cell when a carry input is not used

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



delay of 2 without Z with Z delay of 2 delay of 2 delay of 3 Х Y Y Ζ Х Y Ζ Х Ζ OR AND OR AND OR AND Cout0 Cout1 Cout1 Cout0 Cout1 Cout0 3 Ρ 3 Ρ Ρ 2 Ρ Ρ Ρ Cin Cin Cin Cout Cout Cout F F F c2 c1 cn delay of 2n for an n-bit ripple carry chain without a carry input Z delay of 2n+1 with a carry input Z

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

without Z delay of 2 with Z delay of 2 delay of 2 delay of 3 Х Y Y Ζ Х Ζ Х Y Ζ OR AND OR AND OR AND Cout0 Cout1 Cout1 Cout0 Cout1 Cout0 3 Ρ Ρ Ρ Ρ 2 Ρ Ρ Cin Cin Cin Cout Cout Cout F F F c2 c1 cn without a carry input Z delay of 2n for an n-bit ripple carry chain delay of 2n+1 with a carry input Z

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

# Design C (1)

various high performance carry chains can be developed based on the carry cell of Design C

very similar to Design B except that the actual carry chain (mux4) has been replaced by an abstract fast carry logic unit and mux5 has been added

this extra mux5 is present because although some of our faster carry chains will have much <u>faster</u> carry propagation for long carry chains, they incur <u>significant delay</u> for non-carry computations

thus, when the cell is used as a simple normal **3 LUT**, using inputs X, Y, and Z mux5 allows us to bypass the carry chain by selecting the output of mux1



# Design C (1)

The important thing to realize about the logic of Design C is that any logic that can compute the value

$$Cout_i = (Cout_{i-1} \cdot C \mathbf{1}_i) + (\overline{Cout_{i-1}} \cdot C \mathbf{0}_i)$$

where i is the position of the cell within the carry chain, can provide the functionality necessary to support the needs of FPGA computations

thus, the fast carry logic unit can contain any logic structure implementing this (including Brent-Kung), Variable Bit, and Ripple Carry.

Note that because of the needs and requirements of carry chains for FPGAs, we will have to develop new circuits, inspired by the standard adder structures, but which are more appropriate for FPGAs



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

36

## Design C (2)

the main difference is to support all states

- Generate
- Propagate
- Kill
- Inverse Propagate

These 4 states are encoded on signals C1 and C0

Also, while standard adders are concerened only with the maximum delay through an entire n-bit adder structure, the delay concerns for FPGAs are more complicated

Specifically, when an n-bit carry chain is built into the architecture of an FPGA it does <u>not</u> represent an <u>actual</u> computation, but only the <u>potential</u> for a computation.



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

# Design C (2)

|   |   | Cin   | Cin   |                             |
|---|---|-------|-------|-----------------------------|
| Х | Y | Cout1 | Cout0 |                             |
| 0 | 0 | 0     | 0     | $\overline{X} \overline{Y}$ |
| 0 | 1 | 1     | 0     | $\overline{X}$ Y            |
| 1 | 0 | 1     | 0     | ΧŸ                          |
| 1 | 1 | 1     | 1     | ΧY                          |

| Cout1 | Cout0 | Cout | Name              |
|-------|-------|------|-------------------|
| 0     | 0     | 0    | Kill              |
| 0     | 1     | Cin  | Inverse Propagate |
| 1     | 0     | Cin  | Propagate         |
| 1     | 1     | 1    | Generate          |



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

| C1 | C0 |     | Name              |
|----|----|-----|-------------------|
| 0  | 0  | 0   | Kill              |
| 0  | 1  | Cin | Inverse Propagate |
| 1  | 0  | Cin | Propagate         |
| 1  | 1  | 1   | Generate          |



| Х | Y | C1 | C0 |                            |
|---|---|----|----|----------------------------|
| 0 | 0 | 0  | 0  | $\overline{X}\overline{Y}$ |
| 0 | 1 | 1  | 0  | ΧY                         |
| 1 | 0 | 1  | 0  | ΧŸ                         |
| 1 | 1 | 1  | 1  | ΧY                         |

 $Cout_i = (Cout_{i-1} \cdot C \mathbf{1}_i) + (\overline{Cout_{i-1}} \cdot C \mathbf{0}_i)$ 

 $(\overline{Cout_{i-1}} \cdot C \mathbf{0}_i) = \overline{Cout_{i-1}} \cdot X Y$ 

Υ

 $(Cout_{i-1} \cdot C1_i) = Cout_{i-1} \cdot (\overline{X} Y + X \overline{Y} + X Y)$ 

Cout,

| $C1_i = X_i + Y_i$      |  |
|-------------------------|--|
| $C 0_i = X_i \cdot Y_i$ |  |
|                         |  |
|                         |  |

| C1 | C0 |     | Name              |
|----|----|-----|-------------------|
| 0  | 0  | 0   | Kill              |
| 0  | 1  | Cin | Inverse Propagate |
| 1  | 0  | Cin | Propagate         |
| 1  | 1  | 1   | Generate          |



Х

| 1 |  |
|---|--|
| 1 |  |
|   |  |

Cout<sub>i+1</sub>

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### Generate and propagate



$$Cout_{3} = (Cout_{1} \cdot (C \operatorname{1}_{3} \cdot C \operatorname{1}_{2} + C \operatorname{0}_{3} \cdot \overline{C \operatorname{1}_{2}})) + (\overline{Cout_{1}} \cdot (C \operatorname{1}_{3} \cdot C \operatorname{0}_{2} + C \operatorname{0}_{3} \cdot \overline{C \operatorname{0}_{2}}))$$

$$= (Cout_1 \cdot (C \, 1_3 \cdot (\bar{X}_2 Y_2 + X_2 \bar{Y}_2 + X_2 Y_2) + C \, 0_3 \cdot \bar{X}_2 \bar{Y}_2)) + (\overline{Cout_1} \cdot (C \, 1_3 \cdot X_2 Y_2 + C \, 0_3 \cdot (\bar{X}_2 Y_2 + X_2 \bar{Y}_2 + \bar{X}_2 \bar{Y}_2)))$$

40

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### **Carry Chain Adder**

Young Won Lim 1/19/21

## Cout3 in terms of Cout1

| $X_{3}Y_{3}$ | $X_2 Y_2$ | Cout2 | Cout3 | Cout3 |
|--------------|-----------|-------|-------|-------|
| 0 0          | 0 0       | 0     | 0     | 0     |
| 0 0          | 0 1       | Cout1 | 0     | 0     |
| 0 0          | 1 0       | Cout1 | 0     | 0     |
| 0 0          | 1 1       | 1     | 0     | 0     |
| 0 1          | 0 0       | 0     | Cou2  | 0     |
| 0 1          | 0 1       | Cout1 | Cou2  | Cout1 |
| 0 1          | 1 0       | Cout1 | Cou2  | Cout1 |
| 0 1          | 1 1       | 1     | Cou2  | 1     |
| 1 0          | 0 0       | 0     | Cou2  | 0     |
| 1 0          | 0 1       | Cout1 | Cou2  | Cout1 |
| 1 0          | 1 0       | Cout1 | Cou2  | Cout1 |
| 1 0          | 1 1       | 1     | Cou2  | 1     |
| 1 1          | 00        | 0     | 1     | 1     |
| 1 1          | 0 1       | Cout1 | 1     | 1     |
| 1 1          | 1 0       | Cout1 | 1     | 1     |
| 1 1          | 1 1       | 1     | 1     | 1     |



$$\begin{array}{l} Cout_{3} \end{array} = \begin{pmatrix} Cout_{1} \cdot \left(C \operatorname{1}_{3} \cdot C \operatorname{1}_{2} + C \operatorname{0}_{3} \cdot \overline{C \operatorname{1}_{2}}\right) \\ + \left(\overline{Cout_{1}} \cdot \left(C \operatorname{1}_{3} \cdot C \operatorname{0}_{2} + C \operatorname{0}_{3} \cdot \overline{C \operatorname{0}_{2}}\right) \end{pmatrix} \end{array}$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### **Carry Chain Adder**

## Cout3 in terms of Generate and Propagate

|              |           |                 |                 |                 |                 | Cout1          | Cout1                     | Cout1                           | Cout1                 |       |
|--------------|-----------|-----------------|-----------------|-----------------|-----------------|----------------|---------------------------|---------------------------------|-----------------------|-------|
| $X_{3}Y_{3}$ | $X_2 Y_2$ | C1 <sub>3</sub> | C0 <sub>3</sub> | C1 <sub>2</sub> | C0 <sub>2</sub> | $C1_{3}C1_{2}$ | $C0_{3}\overline{C1}_{2}$ | C1 <sub>3</sub> C0 <sub>2</sub> | $C0_3\overline{C0}_2$ | Cout3 |
| 0 0          | 0 0       | 0               | 0               | 0               | 0               | 0              | 0                         | 0                               | 0                     | 0     |
| 0 0          | 0 1       | 0               | 0               | 1               | 0               | 0              | 0                         | 0                               | 0                     | 0     |
| 0 0          | 1 0       | 0               | 0               | 1               | 0               | 0              | 0                         | 0                               | 0                     | 0     |
| 0 0          | 1 1       | 0               | 0               | 1               | 1               | 0              | 0                         | 0                               | 0                     | 0     |
| 0 1          | 0 0       | 1               | 0               | 0               | 0               | 0              | 0                         | 0                               | 0                     | 0     |
| 0 1          | 0 1       | 1               | 0               | 1               | 0               | 1              | 0                         | 0                               | 0                     | Cout1 |
| 0 1          | 1 0       | 1               | 0               | 1               | 0               | 1              | 0                         | 0                               | 0                     | Cout1 |
| 0 1          | 1 1       | 1               | 0               | 1               | 1               | 1              | 0                         | 1                               | 0                     | 1     |
| 1 0          | 0 0       | 1               | 0               | 0               | 0               | 0              | 0                         | 0                               | 0                     | 0     |
| 1 0          | 0 1       | 1               | 0               | 1               | 0               | 1              | 0                         | 0                               | 0                     | Cout1 |
| 1 0          | 1 0       | 1               | 0               | 1               | 0               | 1              | 0                         | 0                               | 0                     | Cout1 |
| 1 0          | 1 1       | 1               | 0               | 1               | 1               | 1              | 0                         | 1                               | 0                     | 1     |
| 1 1          | 00        | 1               | 1               | 0               | 0               | 0              | 1                         | 0                               | 1                     | 1     |
| 1 1          | 0 1       | 1               | 1               | 1               | 0               | 1              | 0                         | 0                               | 1                     | 1     |
| 1 1          | 1 0       | 1               | 1               | 1               | 0               | 1              | 0                         | 0                               | 1                     | 1     |
| 1 1          | 1 1       | 1               | 1               | 1               | 1               | 1              | 0                         | 1                               | 0                     | 1     |

 $Cout_{3} = (Cout_{1} \cdot (C \operatorname{1}_{3} \cdot C \operatorname{1}_{2} + C \operatorname{0}_{3} \cdot \overline{C \operatorname{1}_{2}}))$  $+ (\overline{Cout_{1}} \cdot (C \operatorname{1}_{3} \cdot C \operatorname{0}_{2} + C \operatorname{0}_{3} \cdot \overline{C \operatorname{0}_{2}}))$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



| $C1_{3}$                | $\cdot C 1_2 \cdot$     | $Cout_1$ |
|-------------------------|-------------------------|----------|
| prop                    | prop                    |          |
| $\overline{X_3}Y_3$     | $\overline{X_2}Y_2$     |          |
| $X_{3}\overline{Y_{3}}$ | $X_{2}\overline{Y_{2}}$ |          |
| $X_{3}Y_{3}$            | $X_2Y_2$                |          |
|                         |                         |          |

$$\begin{array}{c} C \ \mathbf{0}_{3} \cdot \overline{C} \ \mathbf{1}_{2} \cdot Cout_{1} \\ \\ \begin{array}{c} \text{gen} & \overline{\text{prop}} \\ \overline{X_{3}Y_{3}} & \overline{X_{2}\overline{Y_{2}}} \end{array}$$

 $\begin{array}{l} Cout_{3} \end{array} = \begin{pmatrix} Cout_{1} \cdot (C \ 1_{3} \cdot C \ 1_{2} + C \ 0_{3} \cdot \overline{C} \ \overline{1_{2}}) \end{pmatrix} \\ + \begin{pmatrix} \overline{Cout_{1}} \cdot (C \ 1_{3} \cdot C \ 0_{2} + C \ 0_{3} \cdot \overline{C} \ \overline{0_{2}}) \end{pmatrix} \end{array}$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



| $C1_{3}$                | $\cdot C 0_2 \cdot \overline{Cout}_1$ |
|-------------------------|---------------------------------------|
| prop                    | gen                                   |
| $\overline{X_3}Y_3$     | $X_2Y_2$                              |
| $X_{3}\overline{Y_{3}}$ |                                       |
| $X_{3}Y_{3}$            |                                       |

 $\begin{array}{c} C \ 0_3 \cdot \overline{C \ 0_2} \cdot \overline{Cout}_1 \\ \\ gen & gen \\ x_3 Y_3 & \overline{X_2 Y_2} \\ & \overline{X_2 Y_2} \\ & \overline{X_2 Y_2} \\ & \overline{X_2 Y_2} \end{array}$ 

$$(C1_3C1_2 + C0_3\overline{C1}_2)Cout_1 + (C1_3C0_2 + C0_3\overline{C0}_2)\overline{Cout}_1$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

 $(C1_{3}C1_{2}+C0_{3}\overline{C1}_{2})Cout_{1}+(C1_{3}C0_{2}+C0_{3}\overline{C0}_{2})\overline{Cout}_{1}$ 

## Design C - Carry Select (1)



$$\begin{array}{l} Cout_3 \end{array} = \begin{pmatrix} Cout_1 \cdot \left(C \operatorname{1}_3 \cdot C \operatorname{1}_2 + C \operatorname{0}_3 \cdot \overline{C \operatorname{1}_2}\right) \right) \\ + \left(\overline{Cout_1} \cdot \left(C \operatorname{1}_3 \cdot C \operatorname{0}_2 + C \operatorname{0}_3 \cdot \overline{C \operatorname{0}_2}\right) \right) \end{array} = \begin{pmatrix} Cout_1 \cdot \left(C \operatorname{1}_3 \cdot \left(\overline{X}_2 Y_2 + X_2 \overline{Y}_2 + X_2 Y_2\right) + C \operatorname{0}_3 \cdot \overline{X}_2 \overline{Y}_2 \right) \right) \\ + \left(\overline{Cout_1} \cdot \left(C \operatorname{1}_3 \cdot X_2 Y_2 + C \operatorname{0}_3 \cdot \left(\overline{X}_2 Y_2 + X_2 \overline{Y}_2 + \overline{X}_2 \overline{Y}_2 + \overline{X}_2 \overline{Y}_2 \right) \right) \end{pmatrix}$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



$$Cout_{2} = (Cout_{1} \cdot C \mathbf{1}_{2}) + (\overline{Cout_{1}} \cdot C \mathbf{0}_{2})$$

$$Cout_{3} = (Cout_{2} \cdot C \mathbf{1}_{3}) + (\overline{Cout_{2}} \cdot C \mathbf{0}_{3})$$

$$= (((Cout_{1} \cdot C \mathbf{1}_{2}) + (\overline{Cout_{1}} \cdot C \mathbf{0}_{2})) \cdot C \mathbf{1}_{3})$$

$$+ (\overline{((Cout_{1} \cdot C \mathbf{1}_{2}) + (\overline{Cout_{1}} \cdot C \mathbf{0}_{2}))} \cdot C \mathbf{0}_{3})$$

 $(((Cout_1 \cdot C \mathbf{1}_2) + (\overline{Cout_1} \cdot C \mathbf{0}_2)) \cdot C \mathbf{1}_3)$ =  $(C \mathbf{1}_3 C \mathbf{1}_2 Cout_1 + C \mathbf{1}_3 C \mathbf{0}_2 \overline{Cout_1})$ 



 $(C1_{3}C1_{2}+C0_{3}\overline{C1}_{2})Cout_{1}+(C1_{3}C0_{2}+C0_{3}\overline{C0}_{2})\overline{Cout}_{1}$ 

 $C1 = \overline{X} Y + X \overline{Y} + X Y \qquad C0 = X Y$ 

 $\overline{C1} = \overline{X} \, \overline{Y} \qquad \overline{C0} = \overline{X} \, Y + X \, \overline{Y} + \overline{X} \, \overline{Y}$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C - Carry Select (1)

| $C1 = \bar{X}Y + X\bar{Y} + XY$             | C 0 = X Y                                                                     |
|---------------------------------------------|-------------------------------------------------------------------------------|
| $\overline{C1} = \overline{X} \overline{Y}$ | $\overline{C0} = \overline{X} Y + X \overline{Y} + \overline{X} \overline{Y}$ |

| C1 | C0 | Name |                   |
|----|----|------|-------------------|
| 0  | 0  | 0    | Kill              |
| 0  | 1  | Cin  | Inverse Propagate |
| 1  | 0  | Cin  | Propagate         |
| 1  | 1  | 1    | Generate          |

| $Cout_1 = (Cout_0 \cdot C 1_1) + (\overline{Cout_0} \cdot C 0_1)$ |  |
|-------------------------------------------------------------------|--|
| $Cout_2 = (Cout_1 \cdot C 1_2) + (\overline{Cout_1} \cdot C 0_2)$ |  |
| $Cout_3 = (Cout_2 \cdot C 1_3) + (\overline{Cout_2} \cdot C 0_3)$ |  |

| Cout <sub>3</sub> | $= (Cout_1 \cdot (C  1_3 \cdot C  1_2 + C  0_3 \cdot \overline{C  1_2}))$                                                |
|-------------------|--------------------------------------------------------------------------------------------------------------------------|
|                   | + $(\overline{Cout_1} \cdot (C  \mathbb{1}_3 \cdot C  \mathbb{0}_2 + C  \mathbb{0}_3 \cdot \overline{C  \mathbb{0}_2}))$ |

 $= Cout_1 \cdot \left[ (\bar{X}Y + X\bar{Y} + XY)_3 \cdot (\bar{X}Y + X\bar{Y} + XY)_2 + (XY)_3 \cdot (XY)_2 \right]$  $+ Cout_1 \cdot \left[ (\bar{X}Y + X\bar{Y} + XY)_3 \cdot (XY)_2 + (XY)_3 \cdot (\bar{X}Y + X\bar{Y} + \bar{X}\bar{Y})_2 \right]$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

 $= (Cout_1 \cdot (C \, 1_3 \cdot (\overline{X}_2 Y_2 + X_2 \overline{Y}_2 + X_2 Y_2) + C \, 0_3 \cdot \overline{X}_2 \overline{Y}_2))$  $+ (\overline{Cout_1} \cdot (C \, 1_3 \cdot X_2 Y_2 + C \, 0_3 \cdot (\overline{X}_2 Y_2 + X_2 \overline{Y}_2 + \overline{X}_2 \overline{Y}_2)))$ 

## Design C (3)

A carry chain resource may span the entire height of a column in the FPGA, but a mapping to the logic may use only a small portion of this chain, with the carry logic in the mapping starting and ending at <u>arbitrary</u> points in the column

concerned with not just the **carry delay** from the first to the last position in a carry chain, but must consider the delay for a **carry computation** beginning <u>at any point</u> within this column.

For example,

even though the FPGA architecture may provide support for **carry chains** of up to 32 bits, it must also efficiently support 8 bit carry computations placed at any point within this carry chain resource



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C (4)

#### **Carry Select**

the problem with a ripple carry structure is that the computation of the Cout for bit position i <u>cannot</u> begin until after the computation has been completed in bit positions 0 .. i-1

A carry select structure overcomes this limitation

the main observation is that for any bit position, the only information it received from the previous bit positions is its Cin signal, which can be either **true** or **false**.

In a carry select adder the carry chain is <u>broken</u> at a specific column, and two separate additions occur



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C (5)

one assuming the Cin signal is **true**, the other assuming it is **false** 

These computations can take place before the previous columns complete their operation since they do <u>not</u> depend on the <u>actual value</u> of the <u>Cin</u> signal

This Cin signal is instead used to determine which adder's outputs should be used

if the Cin signal is **true**, the output of the following stages comes from the adder that assumed that the Cin would be **true** 

likewise, a false Cin chooses the other adder's output



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C (6)

This <u>splitting</u> of the **carry chain** can be done multiple times, breaking the computation into several pairs of short adders with <u>output muxes</u> choosing which adder's output to select

the length of the adders and the breakpoint are carefully chosen such that the small adders finish computation just as their Cin signals become available

Short adders handle the low-order bits, and the adder length is increased further along the carry chain, since later computations have more time until their Cin signal is available



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C - Carry Select (1)

A **Carry Select carry chain** structure for use in FPGAs the carry computation for the <u>first two cells</u> is performed with the simple **ripple-carry** structure implemented by **mux1** 

For cell2 and cell3 we use <u>two</u> ripple carry adders, with one adder (implemented by mux2) assuming the Cin is true, and the other (mux3) assuming the Cin is false

Then mux4 and mux5 pick between these two adders' outputs based on the actual Cin coming from mux1.



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### **Carry Chain Adder**

## Design C - Carry Select (2)

Similarly, cell4, cell5, cell6 have two ripple carry adders (mux6 & mux7 for a Cin of 1, mux8 & mux9 for a Cin of 0), with output muxes (mux10, mux11, mux12) deciding between the two based upon the actual Cin (from mux5).

Subsequent stages will continue to grow in length by one, with cells7, cell8, cell9, cell10 in one block, cell11, cell12, cell13, cell14, cell15 in another, and so on.

timing values showing the delay of the Carry Select carry chain relative to other carry chain will be presented later



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C - Carry Select (3)

A Carry Select carry chain structure for use in FPGAs The carry computation for the first two cells is performed with the simple ripple-carry structure implemented by mux1

For cells 2 and 3 we use two ripple carry adders, with one adder (implemented by mux2) assuming the Cin is true, and the other (mux3) assuming the Cin is false

Then muxes 4 and 5 pick between these two adders' outputs based on the actual Cin coming from mux1.

Similarly, celss 4-6 have two ripple carry adders (mux6 & mux7 for a Cin of 1, mux8 & mux9 for a Cin of 0), with output muxes (muxes 10-12) deciding between the two based upon the actual Cin (from mux5).

Subsequent stages will continue to grow in length by one, with cells 7-10 in one block, cells 11-15 in another, and so on.

timing values showing the delay of the Carry Select carry chain relative to other carry chain will be presented later





High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

## Design C

delay of 2 delay of 2 delay of 3 Х Υ Ζ Х Х Y Ζ Y Ζ OR AND AND OR OR AND Cout1 Cout0 Cout1 Cout0 Cout1 Cout0 3 3 3 2 Ρ Ρ Ρ Ρ 2 Ρ Ρ 2 C0 C0 C0 C1 C1 C1 **Fast Carry Logic Fast Carry Logic** Cout Cout Cout 5 5 5 Ρ Ρ Ρ F F F

delay of 2n+2 for an n-bit ripple carry chain

(1 for mux1, 1 for mux2, 1 in mux4)

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



 $Cout_i = (Cout_{i-1} \cdot C1_i) + (\overline{Cout_{i-1}} \cdot C0_i)$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



$$Cout_i = (Cout_{i-1} \cdot C1_i) + (\overline{Cout_{i-1}} \cdot C0_i)$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### **Carry Chain Adder**

## Fast Carry Logc

Carry Select Adder Carry Lookahead Adder Brent-Kung Variable Block Ripple Carry Adder

https://en.wikipedia.org/wiki/Carry-lookahead\_adder



62

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



 $= (\overline{Cout_1}Cout_1 + \overline{C1_2}Cout_1 + \overline{Cout_1}\overline{C0_2} + \overline{C1_2}\overline{C0_2}) \cdot C0_3$  $= (\overline{C1_2}Cout_1 + \overline{C0_2}\overline{Cout_1}) \cdot C0_3$  $= (C0_3\overline{C1_2}Cout_1 + C0_3\overline{C0_2}\overline{Cout_1})$ 

 $(C1_{3}C1_{2}+C0_{3}\overline{C1}_{2})Cout_{1}+(C1_{3}C0_{2}+C0_{3}\overline{C0}_{2})\overline{Cout}_{1}$ 

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



$$Cout_{i} = (Cout_{i-1} \cdot C \mathbf{1}_{i}) + (\overline{Cout_{i-1}} \cdot C \mathbf{0}_{i})$$
$$Cout_{i+1} = (Cout_{i} \cdot C \mathbf{1}_{i+1}) + (\overline{Cout_{i}} \cdot C \mathbf{0}_{i+1})$$

$$Cout_{i+1} = \left( \left[ \left( Cout_{i-1} \cdot C \mathbf{1}_i \right) + \left( \overline{Cout_{i-1}} \cdot C \mathbf{0}_i \right) \right] \cdot C \mathbf{1}_{i+1} \right) \\ + \left( \overline{\left[ \left( Cout_{i-1} \cdot C \mathbf{1}_i \right) + \left( \overline{Cout_{i-1}} \cdot C \mathbf{0}_i \right) \right]} \cdot C \mathbf{0}_{i+1} \right)$$

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### References

- [1] http://en.wikipedia.org/
- [2] J-P Deschamps, et. al., "Synthesis of Arithmetic Circuits", 2006