# Carry Skip Adder (5A)

•

•

Copyright (c) 2024 – 2013 Young W. Lim.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

https://en.wikipedia.org/wiki/AND\_gate https://en.wikipedia.org/wiki/OR\_gate https://en.wikipedia.org/wiki/XOR\_gate https://en.wikipedia.org/wiki/NAND\_gate

Please send corrections (or suggestions) to youngwlim@hotmail.com.

This document was produced by using OpenOffice and Octave.

## Carry Kill, Propagate, Generate conditions (1)

| Χ | Υ |   |            |
|---|---|---|------------|
| 0 | 0 | K | Kill (=PG) |
| 0 | 1 | Р | Propagate  |
| 1 | 0 | Р | Propagate  |
| 1 | 1 | G | Generate   |



https::/electronics.stackexchange.com/questions/21251/critical-path-for-carry-skip-adder

## Carry Kill, Propagate, Generate conditions (2)



### K, P, and G conditions in a 2-bit adder (1)

| Χ | Υ |   |            |
|---|---|---|------------|
| 0 | 0 | K | Kill (=PG) |
| 0 | 1 | Р | Propagate  |
| 1 | 0 | Р | Propagate  |
| 1 | 1 | G | Generate   |



Unless the two FA's are in propagate mode, the transition of Cin does <u>not</u> affect the transition of Cout

Critical path – all FA's in propagate mode

Broken paths for any FA in other mode - kill mode, generate mode

https::/electronics.stackexchange.com/questions/21251/critical-path-for-carry-skip-adder

### K, P, and G conditions in a 2-bit adder (2)

| Χ | Υ |   |            |
|---|---|---|------------|
| 0 | 0 | K | Kill (=PG) |
| 0 | 1 | Р | Propagate  |
| 1 | 0 | P | Propagate  |
| 1 | 1 | G | Generate   |



### 1. Cases when **FA1** is in the **K** mode



### 2. Cases when **FA1** is in the **P** mode



### 3. Cases when FA1 is in the G mode



### Cases for $C_{out} = 1$



### Cases for $C_{out} = 0$



### FA with P & G





| а | b | C S |
|---|---|-----|
| 0 | 0 | 0 0 |
| 0 | 1 | 0 1 |
| 1 | 0 | 0 1 |
| 1 | 1 | 1 0 |



Half Adder

https://en.wikipedia.org/wiki/Carry-skip\_adder

Full adder with additional generate and propagate signals.

## Ripple Carry Adder

$$\begin{array}{c} \textbf{p}_{i} = \textbf{a}_{i} \oplus \textbf{b}_{i} \\ \textbf{g}_{i} = \textbf{a}_{i} \wedge \textbf{b}_{i} \\ \\ \textbf{c}_{2} = \textbf{g}_{1} + \textbf{p}_{1} \wedge \textbf{c}_{1} \\ \\ \textbf{c}_{3} = \textbf{g}_{2} + \textbf{p}_{2} \wedge \textbf{c}_{2} \\ \\ \textbf{c}_{4} = \textbf{g}_{3} + \textbf{p}_{3} \wedge \textbf{c}_{3} \\ \\ \end{array}$$



### 4-bit Full Adder with P and G



https://upload.wikimedia.org/wikiversity/en/1/18/RCA.Note.H.1.20151215.pdf

## C<sub>o</sub> propagation condition



 $c_0$  can be propagated to  $c_{out}$  only when s = 1

$$\mathbf{S} = \mathbf{p}_{3} \wedge \mathbf{p}_{2} \wedge \mathbf{p}_{1} \wedge \mathbf{p}_{0} = \mathbf{p}_{[3:0]}$$

$$= (\mathbf{a}_{3} \oplus \mathbf{b}_{3})$$

$$\wedge (\mathbf{a}_{2} \oplus \mathbf{b}_{2})$$

$$\wedge (\mathbf{a}_{1} \oplus \mathbf{b}_{1})$$

$$\wedge (\mathbf{a}_{0} \oplus \mathbf{b}_{0})$$





The n-bit Carry Skip Adder consists of

a n-bit **carry-ripple-chain**, a n-input **AND-gate** and one **multiplexer**.

a multiplexer switches either the last carry-bit  $\mathbf{c}_{\rm n}$  or the carry-in  $\mathbf{c}_{\rm o}$  to the carry-out signal  $\mathbf{c}_{\rm out}$ 

$$s = p_3 \wedge p_2 \wedge p_1 \wedge p_0 = p_{[3:0]}$$

when 
$$s = 1$$
,  $c_{out} \leftarrow c_0$   
otherwise, internally generated carries  
can be propagated to  $c_{out} \leftarrow c_4$ 





The critical path of a Carry Skip Adder begins at the first full adder, passes through all adders and ends at the sum bit  $s_{n-1}$ 

Since a <u>single</u> *n-bit* Carry Skip Adder has <u>no</u> real speed <u>benefit</u> compared to a *n-bit* Ripple Carry Adder

$$T_{CSA}(n) = T_{RCA}(n)$$



#### the skip logic consists of a k-input AND gate and one MUX





As the propagate signals are computed in parallel and are early available,

$$\mathbf{p}_{i} = \mathbf{a}_{i} \oplus \mathbf{b}_{i}$$

The <u>critical path</u> in a Carry Skip Adder consists of <u>ripple carry path</u> and <u>mux path</u> for ripple carry  $(T_{SK2})$ 

the <u>critical path</u> for the <u>skip logic</u> in a Carry Skip Adder consists of the delay imposed by the <u>multiplexer</u> (conditional skip)

T<sub>CSK</sub> skip logic delay in the critical path

$$T_{CSK} = T_{SK2} = T_{MUX} = 2D$$





### Block Carry Skip Adder

Block carry skip adders are composed of a number of carry skip adders

There are two types of block carry skip adders

The two operands  $A = (a_{n-1}, a_{n-2}, \dots a_1, a_0)$  and  $B = (b_{n-1}, b_{n-2}, \dots b_1, b_0)$  are split in k blocks of  $(m_k, m_{k-1}, \dots m_2, m_1)$  bits

- Why are block carry skip adders used
- Should the block size be constant or variable?
- Fixed block size vs. variable block size

### Fixed-size Block Carry Skip Adder

Carry Skip Adders are <u>chained</u> to reduce the <u>overall</u> <u>critical</u> <u>path</u>, (Block Carry Skip Adders)

Fixed size block Carry Skip Adders (FCSA) split the n bits of the input bits Into blocks of k bits each, resulting in R = n / k blocks.

# Fixed-size block CSA (FCSA)





Www.cs.tufts.edu

the critical path

the longest carry path must be

- generate in the first block
- <u>terminated</u> in the <u>last</u> block
- propagated in the blocks between the first and the last

| Χ | Υ |   |            |
|---|---|---|------------|
| 0 | 0 | K | Kill (=PG) |
| 0 | 1 | Р | Propagate  |
| 1 | 0 | Р | Propagate  |
| 1 | 1 | G | Generate   |

Fixed-size block CSA k bits
(FCSA) n = R·k

the last block (R-2) blocks the first block

carry terminated in the last FA

carry generated in the first FA



The critical path consists of

Fixed-size block CSA (FCSA)

• the <u>ripple path</u> and the <u>skip element</u> of the first block

 $T_{CRA}(k) + T_{CSK}$ 

 $T_{CRA[0:cout]}(k)$ 

R groups k bits

• the <u>skip paths</u> that are enclosed between the <u>first</u> and the <u>last</u> block  $(R-2)T_{CSK}$ 

 $n = R \cdot k$ 

• finally the <u>ripple path</u> of the last block  $T_{CRA}(k)$ 



### 4-bit Full Adder with P and G



k bits

$$T_{CRA}(k) = 2D \cdot k$$

https://upload.wikimedia.org/wikiversity/en/1/18/RCA.Note.H.1.20151215.pdf

### Critical Path Delay

The critical path consists of

Fixed-size block CSA (FCSA)

- the <u>ripple path</u> and the <u>skip element</u> of the first block  $T_{CRA}(k) + T_{CSK}$
- the <u>skip paths</u> that are enclosed between the <u>first</u> and the <u>last</u> block (R-2)T<sub>CSK</sub>
- finally the <u>ripple path</u> of the last block  $T_{CRA}(k)$

```
T_{FCSA}(n) = T_{CRA}(k) + T_{CSK} + (R-2)T_{CSK} + T_{CRA}(k)
= k 2D + 2D + (R-2)2D + k2D
= k2D + 2D + (R-1)2D - 2D + k2D
= k2D + (R-1)2D + k2D
= 2k2D + (R-1)2D
= (2k+R)2D
= (2k+R)2D
= (2k+n/k)2D
= 3D + k2D + R2D - 2D + k2D + 4D
= (2k+R)2D + 5D
```

R groups k bits

$$n = R \cdot k$$

### Optimal block size k

$$T_{FCSA}(n) = T_{CRA}(k) + T_{CSK} + (R-2)T_{CSK} + T_{CRA}(k)$$
  
=  $(2k+R)2D$   
=  $(2k+n/k)2D$   $(2k+\frac{n}{k})2D$ 

The optimal block size k for a given adder width n

$$dT_{FCSA}(n) / dk = 0$$

$$\frac{dT_{FCSA}(n)}{dk} = 0$$

$$(2-n(1/k^2)) = 0$$

$$2 = n/k^2$$

$$k^2 = n / 2$$

$$k = sqrt(n/2)$$

$$k_{opt} = \sqrt{\frac{n}{2}}$$

 $k^2 = \frac{n}{2}$ 

5.6 = 
$$sqrt(64/2)$$
  $n = 64bits \rightarrow k = 6$   
4 =  $sqrt(32/2)$   $n = 32bits \rightarrow k = 4$ 

https://en.wikipedia.org/wiki/Carry-skip\_adder

# Fixed-size block CSA (FCSA)

R groups

k bits

$$n = R \cdot k$$

### **Examples of Optimal Block Sizes**

$$T_{FCSA,opt}(n) = \left(2k + \frac{n}{k}\right) 2D$$

$$T_{FCSA}(32) = (2k+32/k)2D$$

#### (2k+32/k)



$$4 = \text{sqrt}(32/2)$$
  $k_{opt} = \sqrt{\frac{n}{2}}$ 

$$n = 32bits$$
  
 $\rightarrow k = 4$ 

https://en.wikipedia.org/wiki/Carry-skip\_adder

$$T_{FCSA,opt}(n) = \left(2k + \frac{n}{k}\right) 2D$$

$$T_{FCSA}(64) = (2k+64/k)2D$$

#### (2k + 64/k)



5.6 = sqrt(64/2) 
$$k_{opt} = \sqrt{\frac{n}{2}}$$

$$n = 64bits$$
  
 $\rightarrow k = 6$ 

# Fixed-size block CSA (FCSA)

R groups k bits

$$n = R \cdot k$$

### **Asymptotic Analysis**

$$T_{FCSA}(n) = (2k+n/k)2D$$

The optimal block size k for a given adder width n k = sqrt(n/2)

$$T_{FCSA, opt}(n)$$
 =  $(2sqrt(n/2) + n/sqrt(n/2))2D$   
=  $(sqrt(2n) + sqrt(n^2 /n / 2)) 2D$   
=  $(sqrt(2n) + sqrt(2n))2D$   
=  $(2sqrt(2n))2D$ 

$$T_{FCSA,opt}(n) = \left(2\sqrt{n/2} + \frac{n}{\sqrt{n/2}}\right) 2D$$
  
=  $\left(2\sqrt{2n}\right) 2D$  when  $k_{opt} = \sqrt{\frac{n}{2}}$ 

https://en.wikipedia.org/wiki/Carry-skip\_adder

# Fixed-size block CSA (FCSA)

R groups k bits

 $n = R \cdot k$ 

$$T_{CRA}(n) = 2D \cdot n$$
  
 $T_{FCSA,opt}(n) = 2D \cdot (2sqrt(2n))$ 



### Critical Carry Path (1)

 $T_s < 3T_p$ 

For longest carry path, if any block <u>generates</u> a carry, that carry will <u>propagate</u> through the remaining 3 FA's of that block



Least Significant Block

 $C_{in} = 1$  or 0

and then through the carry skip gates to the final block,



Middle Blocks

1

since all FA's are in the propagate mode, skip path is taken → no ripple delay

## Critical Carry Path (2)

At the <u>final</u> block, it may have to <u>propagate</u> through 3 FA's to reach the most significant adder

#### **Cases for the Most Significant Block**



the propagated carry must be terminated at the most significant adder

the new carry can be generated at the most significant adder

a new path starts here



Otherwise, the whole block is to be skipped

smaller delay than ripple delay

 $T_s < 3T_p$ 

cannot be a critical path since all FA's are propagate mode, skip path is taken

### Critical Carry Path (3)

$$T_s < 3T_p$$

#### **Cases for the Least Significant Block**







 $T_s < 3T_p$ 

cannot be a critical path since all FA's are propagate mode, skip path is taken → no ripple delay

### Critical Carry Path (4)



The longest delay path from  $c_1$  to  $c_{n-1}$ 

begins with a carry generated in FA0 in the least significant block 0, propagates through FA3 in block 0, then through the skip element (MUX can be replaced with OR gate), then through carry skip units of (R-2) blocks, and then through fa0, fa1, fa2 in the most significant block (R-1), to the  $c_{n-1}$  signal

# Fixed-size block CSA (FCSA)

R groups k bits

 $n = R \cdot k$ 



The longest delay path from C<sub>1</sub> to C<sub>n-1</sub>

$$(k-1)T_p + D + (n/k-2)T_s + (k-1)T_p$$

 $T_p$  is the time to propagate a carry through one stage of the full adder (from  $c_i$  to  $c_{i+1}$ )

T<sub>s</sub> is the delay through one carry-skip stage

More Fast Adders, Ivor Page, University of Texas at Dallas

# Fixed-size block CSA (FCSA)

R groups

k bits

$$n = R \cdot k$$

A carry signal centering a certain block can be propagated past the block without waiting for the signal to propagate through the 4 individual stages of the block



If all n/4 blocks propagate, a carry entering the least significant stage will pass to the most significant carry-out in time n/4 times the delay through the carry-skip unit



#### Ripple through k-1 bits

$$(k-1)\Delta_{rca}$$





#### Skip carry





The maximal delay  $\Delta$  of a Carry Skip Adder is encountered when carry is generated in the least-significant bit position,

- rippling through k-1 bit positions,
- skipping over R-2 = N/k-2 groups in the middle,
- rippling to the k-1 bits of most significant group and
- being assimilated in the *N-th* bit position to produce the sum  $S_N$ :

$$\Delta_{CSA} = (k - 1) \Delta_{rca} + (R - 2) \Delta_{SKIP} + (k - 1) \Delta_{rca}$$

$$= 2 (k - 1) \Delta_{rca} + (R - 2) \Delta_{SKIP}$$

$$= 2 (k - 1) \Delta_{rca} + (N/k - 2) \Delta_{SKIP}$$

$$\begin{split} \Delta_{\text{CSA}} &= (\mathsf{k} - 1) \, \Delta_{\text{rca}} + (\mathsf{R} - 2) \, \Delta_{\text{SKIP}} + (\mathsf{k} - 1) \, \Delta_{\text{rca}} \\ &= \, 2 \, (\mathsf{k} - 1) \, \Delta_{\text{rca}} + (\mathsf{R} - 2) \, \Delta_{\text{SKIP}} \\ &= \, 2 \, (\mathsf{k} - 1) \, \Delta_{\text{rca}} + (N/k - 2) \, \Delta_{\text{SKIP}} \end{split}$$

Carry Skip Adder is faster than RCA at the expense of a few relatively simple modifications.

The delay is still linearly dependent on the size of the adder N, however this linear dependence is reduced by a factor of 1/k

 $N = R \cdot k$ 



## Design C (9) – When Cout1 = 1





High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

If an arbitrary block generated a carry by itself,
The carry will always propagate to the next block
However, if the second block generates a carry itself,
Or kill the carry, then that is the end of the critical path

If the second block propagates the carry, then we see The advantage of the CSA architecture

https::/electronics.stackexchange.com/questions/21251/critical-path-for-carry-skip-adder

https::/electronics.stackexchange.com/questions/21251/critical-path-for-carry-skip-adder

### Variable size Block Carry Skip Adder

The performance can be improved, ie. all carries propagated quickly by <u>varying</u> the <u>block sizes</u>

Accordingly the initial blocks of the adder are made <u>smaller</u> so as to <u>quickly detect carry generates</u> that must be <u>propagated</u> the furthers, the <u>middle blocks</u> are made <u>larger</u> because they are not the problem case, and then the <u>most significant blocks</u> are again made smaller so that the <u>late arriving</u> carry inputs can be processed quickly

https://electronics.stackexchange.com/questions/21251/critical-path-for-carry-skip-adder

#### References

- [1] en.wikipedia.org
- [2] Parhami, "Computer Arithmetic Algorithms and Hardware Designs"